Connect to a SharePoint Online List using Azure Data Factory

Retrieving SharePoint Online List data

SharePoint Online list data is one of those sources that can be interesting to integrate with other data in your Azure data platform. If your organization or client already uses SharePoint, lists are often easy to set up and provide users with a friendly way of entering data.

One method of getting this data into Azure Storage, a data lake or an Azure SQL Database is by configuring an Azure Logic App. However, it is also possible to connect to your list directly from Azure Data Factory.

Providing an example pipeline

In this example we create an Azure Data Factory pipeline that connects to the list through the Microsoft Graph API. A Web activity requests a token, and a Copy activity then uses this token to ingest the response of the Graph call into blob storage as a JSON file.
Big thanks for your help @Anton!

Prerequisites

To call the Microsoft Graph API we need to acquire an access token from the Microsoft identity platform. To request this token we need an Azure AD app with the correct API permission. The credentials of this app are used to send the token request; the response contains a token that can then be used in the actual data request.

There is a very helpful guide from Anoopt on Medium on how to register this app, grant the API permissions and send a request. The post mentions the API permission “Create, edit, and delete items and lists in all site collections”; I’ve used the narrower permission “Sites.Read.All” instead, since the pipeline only reads data. Follow these steps and get an example up and running in Postman first. After that, you can implement the same logic in your Azure Data Factory pipeline.
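As a sketch of what the token request looks like, the snippet below builds the client-credentials request for the Microsoft identity platform token endpoint. The tenant id, client id and secret are placeholders; the endpoint and form fields follow the standard client-credentials flow:

```python
from urllib.parse import urlencode

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the client-credentials token request for the Microsoft identity platform."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The body is form-encoded (content type application/x-www-form-urlencoded).
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body

# POSTing this body returns a JSON response containing an access_token field,
# which is what the pipeline passes on to the Graph call.
```

This is the same request you set up in Postman while following the guide; the Web activity in the pipeline sends it with the pipeline parameters filled in.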

The pipeline

The pipeline has two activities. The GetToken Web activity requests the token. The Copy Data activity uses the token in the REST source dataset to request the SharePoint list items, and stores the response in the blob sink dataset:

 

Pipeline parameters

  • AdAppClientId: The identifier (client id) of your Azure AD app
  • AdDirectoryId: The identifier of your Azure AD directory (tenant id)
  • AdAppSecret: The secret of your Azure AD app
  • SharePointUrl: The endpoint URL of the SharePoint list
  • FileName: The name of the file that will be created on the Azure Storage account
  • FolderPath: The folder the file will be placed in

The first step is to request a token, using the app secret. You should store this secret in your Azure Key Vault and retrieve it with an activity; in this example we store it in a parameter for simplicity.
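For illustration, the GetToken Web activity can be configured roughly like this. The activity and parameter names match the ones listed above, but the exact JSON is an assumption, not an export of the original pipeline:

```json
{
    "name": "GetToken",
    "type": "WebActivity",
    "typeProperties": {
        "url": "@concat('https://login.microsoftonline.com/', pipeline().parameters.AdDirectoryId, '/oauth2/v2.0/token')",
        "method": "POST",
        "headers": { "Content-Type": "application/x-www-form-urlencoded" },
        "body": "@concat('grant_type=client_credentials&client_id=', pipeline().parameters.AdAppClientId, '&client_secret=', pipeline().parameters.AdAppSecret, '&scope=https://graph.microsoft.com/.default')"
    }
}
```

The Copy activity source can then pass the token along as an additional header, for example `@concat('Bearer ', activity('GetToken').output.access_token)`.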

The SharePointUrl parameter should be the endpoint that retrieves the items of your SharePoint list. The correct endpoint is: GET https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?expand=fields
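A small helper showing how that endpoint is assembled (the site and list ids are placeholders you look up for your own site):

```python
def build_list_items_url(site_id: str, list_id: str) -> str:
    """Build the Microsoft Graph endpoint that returns all items of a SharePoint list.

    The ?expand=fields query option makes Graph include the column values of
    each item under its 'fields' property."""
    return (
        f"https://graph.microsoft.com/v1.0/sites/{site_id}"
        f"/lists/{list_id}/items?expand=fields"
    )
```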

Datasets:

Rest dataset (Source)

By adding accept: application/json to the request headers, the response will be returned as JSON.

Blob dataset (Sink)

The schema only maps the value column: this is the part of the JSON response that is relevant to push to your blob storage. It is the same for any SharePoint list, which makes the solution generic and means you do not need to map individual columns:

Linked Services

The Linked Services require no special configuration.

Wrap up

This pipeline provides the functionality to directly ingest SharePoint data as JSON into your data platform, using an Azure AD app and the Microsoft Graph API. The result is a JSON file in blob storage that can be picked up and, for example, transformed into SQL data.

Lars Bouwens is a Business Intelligence consultant at Motion10 and likes to read and write about Power BI and Azure.
