
Move files from Azure Data Lake to Sharepoint

Problem Statement

Recently, an internal team asked for the CSV reports that were previously generated from an Azure SQL Database and dropped into an Azure Data Lake Gen1 folder to also be delivered to a SharePoint folder. After some research, I discovered that there weren't many articles demonstrating how to move files from Data Lake to SharePoint, so I decided to create a project and share the step-by-step process.


Proposed Solution

To accomplish this task, I utilized the following resources:

  1. Azure SQL Database - to extract the information using a SQL Query

  2. Azure Data Factory pipeline - to create the CSV files and drop them across the Azure destinations

  3. Azure Data Lake Gen1 - to store the CSV files

  4. Azure Storage Account container - to stage the CSV files before they are moved to SharePoint

  5. Logic App - to move the files into SharePoint

  6. SharePoint folder - the final destination for the reports



The steps are the following:

  1. Retrieve the information through a SQL Query

  2. Use an ADF pipeline to create a CSV file in a Data Lake folder

  3. Since there is no direct way to move files from Data Lake to SharePoint, we create an Azure Storage Account container to act as a staging placeholder, and a Logic App then moves the files on to their SharePoint destination (a quick setup sketch follows this list).
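
Creating the staging container is a one-off setup step. As a minimal sketch (using the Python azure-storage-blob SDK; the connection string and container name below are placeholders of my own, not values from the original project):

from azure.storage.blob import BlobServiceClient

# One-off creation of the staging container that sits between the Data Lake and SharePoint.
# The connection string and container name are placeholders.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
service.create_container("report-staging")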

Azure Data Factory Pipeline

There were a number of issues I had to solve when I started building these pipes.


The first challenge was to parameterise @FileName and pass it into the ADF dataset so the files can be created dynamically. However, when I tried to store the output from my Lookup activity, it was stored as an array. That's why I went down the path of having two variables:

  • @FileNames - the array output from the Lookup activity

  • @FileName - the single string extracted from the output array
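
To see why two variables are needed, here is a rough Python illustration of the shape the Lookup output takes (the property name and filename are assumptions for the example, not the exact payload):

# Illustrative shape of the Lookup activity output - an assumption for this example
lookup_output = {
    "count": 1,
    "value": [{"FileName": "FileName_1_2024_01_08.csv"}]
}

file_names = lookup_output["value"]       # what the @FileNames array variable holds
file_name = file_names[0]["FileName"]     # the single string we actually want in @FileName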

The logical progression of the Azure Pipeline is the following:

  1. We generate the filenames using a Lookup activity against the database, so the dates are appended to the filename variables dynamically.


select DISTINCT TOP 1
TRIM('FileName_1_' + REPLACE(CAST(DATEADD(DAY, 2 - DATEPART(WEEKDAY, GETDATE()), CAST(GETDATE() AS DATE)) AS NVARCHAR(255)), '-', '_') + '.csv') as [FileName]

FileName 1: FileName_1_YYYY_MM_DD.csv


select DISTINCT
TRIM('FileName_2_' + REPLACE(CAST(DATEADD(DAY, 2 - DATEPART(WEEKDAY, GETDATE()), CAST(GETDATE() AS DATE)) AS NVARCHAR(255)), '-', '_') + '.csv') as [FileName]

FileName 2: FileName_2_YYYY_MM_DD.csv
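
A quick note on the date logic: with SQL Server's default DATEFIRST setting of 7, the DATEADD/DATEPART expression resolves to the Monday of the current week, so each weekly file carries that Monday's date. Here is a small Python sketch of the same calculation (the prefix and sample date are just illustrative):

from datetime import date, timedelta

def sql_weekday(d: date) -> int:
    # T-SQL DATEPART(WEEKDAY, ...) with the default DATEFIRST of 7: Sunday = 1 ... Saturday = 7
    return (d.weekday() + 1) % 7 + 1

def report_date(d: date) -> date:
    # Mirrors DATEADD(DAY, 2 - DATEPART(WEEKDAY, d), d)
    return d + timedelta(days=2 - sql_weekday(d))

print(f"FileName_1_{report_date(date(2024, 1, 10)).strftime('%Y_%m_%d')}.csv")
# -> FileName_1_2024_01_08.csv (the Monday of that week)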


2. Store the result outputs from the Lookup activities into the array variable @FileNames

3. Transform the output from the array variable into the @FileName string variable using the following expression.

@substring(string(variables('FileNames')[0]),13,42)
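
What that expression does: string(variables('FileNames')[0]) serialises the first element of the array to its JSON text, and substring then cuts the filename out of it. A rough Python illustration (the property name, filename and offsets are assumptions based on the example above; the real start position and length in the ADF expression have to match the actual serialised value):

import json

element = {"FileName": "FileName_1_2024_01_08.csv"}    # assumed first element of @FileNames
serialized = json.dumps(element, separators=(",", ":"))
print(serialized)         # {"FileName":"FileName_1_2024_01_08.csv"}

# Skipping the 13-character prefix '{"FileName":"' and dropping the trailing '"}'
# is what the hard-coded offsets in the ADF substring() achieve.
print(serialized[13:-2])  # FileName_1_2024_01_08.csv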

4. Copy the data from the database query into a CSV file in the Data Lake.

5. Move the CSVs created in step 4 from the Data Lake into a Storage Account Container.

6. Send Success / Failure Email

7. Trigger Logic App using an HTTP request
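
In ADF, step 7 is typically a Web activity posting to the Logic App's HTTP trigger URL, and the same call can be made outside the pipeline when testing the Logic App on its own. A minimal sketch, assuming a hypothetical callback URL:

import requests

# Hypothetical callback URL - the real one is copied from the Logic App's HTTP request trigger.
LOGIC_APP_URL = "https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?<sas-query>"

response = requests.post(LOGIC_APP_URL, json={"source": "adf-pipeline"}, timeout=30)
response.raise_for_status()
print(response.status_code)  # typically 202 Accepted unless the Logic App returns its own Response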


Azure Logic App

For the Azure Logic App, my challenge was to trigger the app once per pipeline run instead of once for every file that was created. I used a webhook-style HTTP request trigger, fired from the last step of the ADF pipeline, to address this.



The logical progression of the Logic App is:

  1. Receive the HTTP request from the last step of the ADF Pipeline

  2. List all blobs in the Storage Account container (the blob-handling steps are sketched in code after this list)

  3. Get the blob content

  4. Create a CSV file in the SharePoint folder

  5. Send a success email confirming that the file has been uploaded to SharePoint

  6. Trigger a cleanup pipe that empties the Storage Account container so we don't use unnecessary storage, since the files are already kept in the Azure Data Lake Gen1 destination.
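
To make the blob steps concrete, here is a rough Python equivalent of what steps 2 and 3 do (placeholder connection string and container name; the actual Logic App uses the Azure Blob Storage connector, and step 4 hands the name and content over to the SharePoint connector):

from azure.storage.blob import BlobServiceClient

# Rough equivalent of "List blobs" and "Get blob content"; names are placeholders.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
container = service.get_container_client("report-staging")

for blob in container.list_blobs():                         # step 2: list all blobs
    content = container.download_blob(blob.name).readall()  # step 3: get the blob content
    # step 4 passes blob.name and content on to SharePoint to create the CSV file
    print(f"{blob.name}: {len(content)} bytes ready for SharePoint")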

Cleanup Pipe

The cleanup pipe has only one step, which is to clear the contents of the Storage Account Container and prepare it for the next batch.
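
As a rough Python equivalent of that single step (placeholder names again; the real cleanup pipe does this with its own activity rather than Python), emptying the container just means deleting every blob in it:

from azure.storage.blob import BlobServiceClient

# Delete every blob in the staging container so the next batch starts from an empty container.
# Connection string and container name are placeholders.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
container = service.get_container_client("report-staging")

for blob in container.list_blobs():
    container.delete_blob(blob.name)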
