Chapter 2
Resources
This chapter looks at the resources we'll need to deliver the database solution. Sub-articles on how to set up each resource in Azure will be hyperlinked throughout. I'll also be covering the system design here.
Figure 1 – Data Flow Diagram
Figure 1 illustrates, at a very high level, what our system design is going to look like. For this project we'll need:
On-premises SQL Server DB – in my case this will be my localhost server, which hosts several databases. We'll be focusing on AdventureWorksDW2017 for this project.
Azure SQL DB – This will be created in a personal subscription that I own. We'll look at two ways you can migrate your data to the cloud and cover some of the compatibility challenges you might encounter.
Azure Automation Account – We'll look at how you can control your Azure resources with PowerShell Runbooks, as well as use them to trigger your database ETL processes.
Azure Data Factory – This resource will be compared against the Azure Automation Account. We'll look at whether it's more cost-effective to use Azure Data Factory or an Azure Automation Account. We'll also look at how to deploy SSIS packages to the cloud.
Storage Account – We'll use this resource to create two blob containers and link them as external sources to our Azure SQL DB. We'll then upload a couple of CSV files and show how we can BULK INSERT data into our Azure SQL DB using our reports blob container. The other container will be used for storing backups.
Azure Analysis Services – This SSAS Tabular model will be hosted in Azure and built on top of AdventureWorksDW2017. We'll be applying Row-Level Security to our model for the different user groups.
Power BI – We'll look at the difference between building your model in Power BI and using an AAS model as your data source. What are the pros, cons, and limitations of using AAS models? A sample report will be developed from AdventureWorksDW2017.
Databases
For this project, we'll use an on-premises SQL Server DB and an Azure SQL DB. I set up the databases in advance, so I won't be covering how to create them. The specs of my Azure SQL Database are covered in the section below.
To keep costs down, I've set up my database at the lowest tier possible, which is a Basic (5 DTU) database with a 2 GB max size. If you're doing this in a commercial environment, you should consider the Standard or Premium tiers, depending on your data warehouse size.
To find out which pricing tier works best for your organization, use this tool: https://azure.microsoft.com/en-us/pricing/details/sql-database/single/
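If you prefer scripting to the portal, here's a minimal sketch of provisioning the same Basic-tier database with the Az PowerShell module. It assumes the logical server already exists, and the resource group and server names ("bi-rg", "bi-sqlserver") are placeholders for your own:

# Sign in, then create a Basic-tier database on an existing logical server.
Connect-AzAccount
New-AzSqlDatabase `
    -ResourceGroupName "bi-rg" `
    -ServerName "bi-sqlserver" `
    -DatabaseName "AdventureWorksDW2017" `
    -Edition "Basic"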
Automation
In this section, we’ll cover how to set up an Azure Automation Account and an Azure Data Factory. We’ll start by setting up the automation account.
Azure Automation Account
Azure Automation delivers a cloud-based automation and configuration service that supports consistent management across your Azure and non-Azure environments. It comprises process automation, configuration management, update management, shared capabilities, and heterogeneous features. Automation gives you complete control during deployment, operations, and decommissioning of workloads and resources.
To read more about Azure Automation Accounts follow this link: https://docs.microsoft.com/en-us/azure/automation/automation-intro
Step 1 – Navigate to the Azure Portal and log in.
Step 2 – Click on Automation Accounts, then click Create an automation account.
Step 3 – Give it a name, choose your subscription, resource group, and location. Also create an Azure Run As account, which will help us later when we come to creating Runbooks.
Step 4 – Click Create and wait until your resource is deployed. Once you receive a notification that it completed successfully, click the refresh button and your Automation Account should appear.
Great, your Automation Account is all set and ready to use! :)
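If you'd rather script this, here's a hedged sketch using the Az modules. The resource names and location are placeholders, and the second half shows the pattern a Runbook typically uses to authenticate with the Run As account from Step 3:

# Create the Automation Account (placeholder names).
New-AzAutomationAccount `
    -ResourceGroupName "bi-rg" `
    -Name "bi-automation" `
    -Location "uksouth"

# Inside a PowerShell Runbook: authenticate with the Run As connection,
# then manage resources or kick off ETL processes.
$conn = Get-AutomationConnection -Name "AzureRunAsConnection"
Connect-AzAccount -ServicePrincipal `
    -Tenant $conn.TenantId `
    -ApplicationId $conn.ApplicationId `
    -CertificateThumbprint $conn.CertificateThumbprint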
Azure Data Factory
Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Note that to create Data Factory instances, the user account you use to sign in to Azure must be a member of the contributor or owner role, or an administrator of the Azure subscription.
Azure Data Factory does not store any data itself. It allows you to create data-driven workflows to orchestrate the movement of data between supported data stores, and the processing of data using compute services in other regions or in an on-premises environment. It also allows you to monitor and manage workflows using both programmatic and UI mechanisms.
The Data Factory service allows you to create data pipelines that move and transform data and then run the pipelines on a specified schedule (hourly, daily, weekly, etc.). This means the data that is consumed and produced by workflows is time-sliced data, and we can specify the pipeline mode as scheduled (once a day) or one time.
To read more about data factory please go to this link: https://docs.microsoft.com/en-us/azure/data-factory/
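To make the scheduling idea concrete, here's a rough sketch of a daily schedule trigger deployed with the Az.DataFactory module, to be run once your factory exists (we create it below). The pipeline name, start time, and resource names are all hypothetical:

# Define a trigger that runs the (hypothetical) CopyAdventureWorks pipeline daily.
$trigger = @'
{
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": { "frequency": "Day", "interval": 1, "startTime": "2021-01-01T02:00:00Z", "timeZone": "UTC" }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "CopyAdventureWorks", "type": "PipelineReference" } }
    ]
  }
}
'@
Set-Content -Path "DailyTrigger.json" -Value $trigger

# Deploy the trigger; it stays stopped until you run Start-AzDataFactoryV2Trigger.
Set-AzDataFactoryV2Trigger `
    -ResourceGroupName "bi-rg" `
    -DataFactoryName "bi-adf" `
    -Name "DailyTrigger" `
    -DefinitionFile "DailyTrigger.json"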
Here’s a guide to how you can create your data factory:
Step 1 – Go to Azure Portal
Step 2 – From the Azure Portal menu, select Create a resource
Step 3 – Select Integration, and then select Data Factory
Step 4 – On the Create Data Factory page, on the Basics tab, select the Azure subscription in which you want to create the data factory.
Step 5 – For Resource Group, take one of the following steps:
a. Select an existing resource group from the drop-down list
b. Select Create new, and enter the name of a new resource group
For our example, I’m just going to use the same resource group as my Azure SQL Database and my Automation Account.
Step 6 – For the region, select the same region as your other resources
Step 7 – Give it a name
Step 8 – For Version, keep it V2
Step 9 – Select Next: Git configuration, and then select the Configure Git later check box
Step 10 – Select Review + create, and select Create once validation passes. After the creation is complete, select Go to resource to navigate to the Data Factory page.
Step 11 - Select the Author & Monitor tile to start the Azure Data Factory user interface (UI) application on a separate browser tab.
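If you'd rather script Steps 1–10, this minimal sketch with the Az.DataFactory module creates the same V2 factory; the resource group and factory names are placeholders:

# Set-AzDataFactoryV2 creates the factory if it doesn't already exist.
Set-AzDataFactoryV2 `
    -ResourceGroupName "bi-rg" `
    -Name "bi-adf" `
    -Location "uksouth"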
Azure Storage Account
Azure Storage offers several types of storage accounts. Each type supports different features and has its own pricing model. Consider these differences before you create a storage account to determine the type of account that is best for your applications. The types of storage accounts are:
General-purpose v2 accounts: Basic storage account type for blobs, files, queues, and tables. Recommended for most scenarios using Azure Storage.
General-purpose v1 accounts: Legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts instead when possible.
BlockBlobStorage accounts: Storage accounts with premium performance characteristics for block blobs and append blobs. Recommended for scenarios with high transaction rates, or scenarios that use smaller objects or require consistently low storage latency.
FileStorage accounts: Files-only storage accounts with premium performance characteristics. Recommended for enterprise or high performance scale applications.
BlobStorage accounts: Legacy Blob-only storage accounts. Use general-purpose v2 accounts instead when possible.
To read more about Storage Accounts go to this link: https://docs.microsoft.com/en-us/azure/storage/blobs/
For the purpose of our project, we’re going to use General-purpose v2.
In order to create it, follow the steps below:
Step 1 – Log in to the Azure Portal
Step 2 – Navigate to Storage Accounts in the resources tab and click Create
Step 3 – Give it a name, subscription, and location. Select Performance – Standard, Account kind – StorageV2, and Replication – RA-GRS
Step 4 – Click Review + create. Leave the other settings at their defaults; if you want additional security, you can amend them on the Advanced tab.
Your Storage Account is all set up! :)
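For reference, the portal steps above roughly correspond to this Az.Storage sketch; the account name ("bistorageacct") is a placeholder and must be globally unique:

# StorageV2 account with Standard performance and RA-GRS replication.
New-AzStorageAccount `
    -ResourceGroupName "bi-rg" `
    -Name "bistorageacct" `
    -Location "uksouth" `
    -SkuName "Standard_RAGRS" `
    -Kind "StorageV2"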
Creating Blob Containers
The next step of this process is to create 3 blob containers – one for Azure Data Factory, another for Azure database backups, and one for CSV reports; a scripted sketch follows the list. The naming conventions are:
bireports - Container
bibackups - Container
biadf - Container
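Here's a small sketch that creates the three containers and uploads a sample CSV to bireports; the storage account name and the local file path are placeholders:

# Grab the account context, create the containers, then upload a CSV.
$ctx = (Get-AzStorageAccount -ResourceGroupName "bi-rg" -Name "bistorageacct").Context
foreach ($name in "bireports", "bibackups", "biadf") {
    New-AzStorageContainer -Name $name -Context $ctx
}
Set-AzStorageBlobContent `
    -Container "bireports" `
    -File ".\SalesReport.csv" `
    -Blob "SalesReport.csv" `
    -Context $ctx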
Create a linked service between Azure Data Factory and your Storage Account Container
Step 1 - On the Azure Data Factory UI page, open the Manage tab from the left pane.
Step 2 - On the Linked services page, select +New to create a new linked service.
Step 3 - On the New Linked Service page, select Azure Blob Storage, and then select Continue.
Step 4 - On the New Linked Service (Azure Blob Storage) page, complete the following steps:
a. For Name, enter AzureStorageLinkedService.
b. For the Storage account name, select the name of your Azure Storage account.
c. Select Test connection to confirm that the Data Factory service can connect to the storage account.
d. Select Create to save the linked service.
Your linked service is all set! :)
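The same linked service can be deployed from PowerShell by writing its JSON definition to a file; <account-name> and <account-key> are placeholders for your storage credentials:

# JSON definition for the Azure Blob Storage linked service.
$ls = @'
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>"
    }
  }
}
'@
Set-Content -Path "AzureStorageLinkedService.json" -Value $ls

# Deploy it to the factory created earlier (placeholder names).
Set-AzDataFactoryV2LinkedService `
    -ResourceGroupName "bi-rg" `
    -DataFactoryName "bi-adf" `
    -Name "AzureStorageLinkedService" `
    -DefinitionFile "AzureStorageLinkedService.json"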
Azure Analysis Services Server
Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud. Use advanced mashup and modeling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model. The data model provides an easier and faster way for users to perform ad hoc data analysis using tools like Power BI and Excel.
Azure Analysis Services is available in Developer, Basic, and Standard tiers. Within each tier, plan costs vary according to processing power, Query Processing Units (QPUs), and memory size. When you create a server, you select a plan within a tier. You can change plans up or down within the same tier, or upgrade to a higher tier, but you can't downgrade from a higher tier to a lower tier.
To read more about the tiers follow this link: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview
For the purpose of the project, we’ll be using Developer Tier D1.
As with the other guides, just navigate to the Analysis Services resource and create one.
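Or, as a scripted sketch with the Az.AnalysisServices module; the server name and administrator UPN are placeholders:

# Developer-tier (D1) server; the administrator becomes the server admin.
New-AzAnalysisServicesServer `
    -ResourceGroupName "bi-rg" `
    -Name "biaasserver" `
    -Location "uksouth" `
    -Sku "D1" `
    -Administrator "admin@contoso.com"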
Power BI
Microsoft Power BI is used to find insights within an organization's data. Power BI can help connect disparate data sets, transform and clean the data into a data model and create charts or graphs to provide visuals of the data.
Since I don't have a Power BI subscription, all of the reports will be developed locally. Make sure you have Power BI Desktop installed; I won't be covering how to install and set it up in this article.
Conclusion
Please read the next article where we’ll cover the different ways you can migrate your data warehouse schema and data to the cloud. We’ll also be covering the different challenges and compatibility issues that you might encounter.