Problem Statement
Recently, an internal team requested that CSV reports, which were previously generated from an Azure SQL Database and dropped into an Azure Data Lake Gen1 folder, also be stored in a SharePoint folder. After conducting thorough research, I discovered that there weren't many articles demonstrating how to move files from Data Lake to SharePoint, so I decided to create a project and share the step-by-step process.
Proposed Solution
To accomplish this task, I utilized the following resources:
Azure SQL Database - to extract the information using a SQL Query
Azure Data Factory pipeline - to create a CSV file and drop it into the Azure destinations
Azure Data Lake Gen1 - to store the CSV files
Azure Storage Account Container - to store CSV files
Logic App - to move the files into SharePoint
SharePoint folder - the final destination for the reports
The steps are the following:
Retrieve the information through a SQL Query
Use an ADF pipeline to create a CSV file in a Data Lake folder
Since files cannot be moved directly from Data Lake to SharePoint, we create an Azure Storage Account container to serve as a staging area, which lets us move the files to our destination using a Logic App.
Azure Data Factory Pipeline
There were a number of issues I had to solve when I started building these pipes.
The first challenge was to parameterise a @FileName and pass it into an ADF Dataset variable so that files can be created dynamically. However, the Lookup activity stores its output in an array, which is why I went down the path of having two variables:
@FileNames - the array (the output result from the Lookup activity)
@FileName - the string extracted from the output array
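To make this concrete, here is a sketch of what the two variables hold (the activity name 'Lookup FileName 1' is illustrative, not taken from the original pipeline):

@FileNames - set to @activity('Lookup FileName 1').output.value, which looks like [{"FileName":"FileName_1_YYYY_MM_DD.csv"}]
@FileName - the string inside that single element, i.e. FileName_1_YYYY_MM_DD.csv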
The logical progression of the ADF pipeline is the following:
1. Generate the filenames using a Lookup Activity against the database, so that dates are dynamically appended to the filename variables.
select DISTINCT TOP 1
TRIM('FileName_1_' + REPLACE(CAST(DATEADD(DAY, 2 - DATEPART(WEEKDAY, GETDATE()), CAST(GETDATE() AS DATE)) as NVARCHAR(255)), '-', '_') + '.csv') as [FileName]
FileName 1: FileName_1_YYYY_MM_DD.csv
select DISTINCT
TRIM('FileName_2_' + REPLACE(CAST(DATEADD(DAY, 2 - DATEPART(WEEKDAY, GETDATE()), CAST(GETDATE() AS DATE)) as NVARCHAR(255)), '-', '_') + '.csv') as [FileName]
FileName 2: FileName_2_YYYY_MM_DD.csv
2. Store the result output from each Lookup Activity in an array variable (@FileNames).
3. Transform the output from the array variable into the @FileName string variable using the following expression (a more direct alternative is sketched after this list).
@substring(string(variables('FileNames')[0]),13,42)
4. Use a Copy Data activity to copy the database query results into a CSV file in the Data Lake.
5. Move the CSVs created in step 4 from the Data Lake into a Storage Account Container.
6. Send Success / Failure Email
7. Trigger Logic App using an HTTP request
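As an aside to step 3: if the Lookup only ever returns a single row, enabling 'First row only' on the activity lets you read the column directly instead of slicing the stringified array element at fixed offsets. The expression below is only a sketch under that assumption (and with an assumed activity name), not the way the original pipeline is configured:

@activity('Lookup FileName 1').output.firstRow.FileName

The Copy activity in step 4 can then pass the value to the sink through a dataset parameter, e.g. a parameter named FileName that is referenced as @dataset().FileName in the dataset's file path and populated with @variables('FileName').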
Azure Logic App
For the Azure Logic App, my challenge was to trigger the app once rather than once for every file that was created. I used a webhook trigger to address this issue.
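One detail worth spelling out: if the pipeline calls the Logic App through ADF's Webhook activity (rather than a plain Web activity), ADF adds a callBackUri property to the request body and keeps the activity running until something posts back to that URI. In that case the Logic App's final action should be an HTTP POST to the address read from the trigger body, along these lines (a sketch, not taken from the original app):

@triggerBody()?['callBackUri']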
The logical progression of the Logic App is:
Receive the HTTP request from the last step of the ADF Pipeline
List all blobs in the Storage Account Container
Get the blob content
Create the CSV file in the SharePoint folder (the connector expressions involved are sketched after this list)
Send a success email confirming that the file has been uploaded to SharePoint
Trigger a cleanup pipe that empties the Storage Account Container, so we don't use too much storage given that the files are already stored in the Azure Data Lake Gen1 destination.
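For reference, this is roughly what the expressions inside that flow look like when the default connector action names are kept; the action names ('List blobs', 'For each', 'Get blob content') are assumptions rather than details taken from the original app:

For each - input: @body('List_blobs')?['value']
Get blob content - blob: @items('For_each')?['Path']
Create file - File Name: @items('For_each')?['Name']
Create file - File Content: @body('Get_blob_content')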
Cleanup Pipe
The cleanup pipe has only one step, which is to clear the contents of the Storage Account Container and prepare it for the next batch.