Data Migration Project

The document outlines a detailed architecture for migrating data from an on-premises environment to Azure, utilizing services like Azure Data Factory, Azure Data Lake Storage Gen2, and Azure Synapse Analytics. It describes the necessary components, data movement processes, transformation stages, and security measures, including the use of Azure Key Vault and Logic Apps for monitoring. The architecture ensures a secure, automated, and scalable solution for data migration and analytics preparation in Azure.


On-Prem to Azure Data Migration Architecture

This architecture outlines a comprehensive approach for migrating data from an on-premises
environment to Azure using various Azure services. The required resources include:
1. On-Prem VM and File System: The on-prem VM will host the file system with data in
formats such as TXT, CSV, and Parquet.
2. Azure Data Factory (ADF): ADF will serve as the primary tool for orchestrating and
automating the data migration process.
3. Azure Data Lake Storage Gen2 (ADLS Gen2): Used to store raw, preprocessed, and
processed data. The raw data will be stored in the landing layer, while cleaned and
transformed data will be moved to the preprocessed and processed layers.
4. Azure Synapse Analytics: The destination data warehouse where the final processed
data will be loaded for analytics and reporting.
5. Azure Databricks with PySpark: Used to create the bronze, silver, and golden layers. This
involves cleaning the data by removing nulls and duplicates, transforming data by joining
tables, and applying business logic to create structured and ready-for-reporting datasets.
6. Azure App Registration: Facilitates secure connections to ADLS Gen2 via mount points
for data access.
7. Azure Logic Apps: Used to set up alert mechanisms for monitoring the migration and
data processing workflows.
8. Azure Key Vault: Stores and manages secrets, keys, and credentials for secure access to
Azure resources.

Architecture Overview:
1. On-Prem Data Storage: We start by establishing an on-prem VM with a file system
containing data in formats such as TXT, CSV, and Parquet.
2. Connecting On-Prem to Azure: Once Azure Data Factory (ADF) is set up, we use the Self-
Hosted Integration Runtime (SHIR) to create a secure gateway between the on-prem VM
and Azure. The SHIR allows ADF to connect to on-prem data sources.
3. Data Movement to ADLS Gen2: Using ADF, we perform various activities like Lookup,
Metadata, Copy, and Stored Procedure activities to move the data from on-premises to
Azure Data Lake Storage Gen2 (ADLS Gen2), specifically into the raw or landing layer.
4. Data Transformation in Azure Databricks: In Azure Databricks, we create the bronze,
silver, and golden layers:
o Bronze Layer: Raw data is cleaned by removing nulls, duplicates, and irrelevant
records.
o Silver Layer: Preprocessed data is transformed with additional logic, such as
applying joins or other business rules.
o Golden Layer: The final, cleaned, and transformed data is ready for reporting and
analytics.
5. Data Loading into Synapse: The processed data is then loaded into Azure Synapse
Analytics as a data warehouse, where it will be used for analytics and reporting.
6. Security and Connectivity:
o App Registration: Provides secure access to ADLS Gen2 by using mount points.
o Azure Key Vault: Ensures the safe storage of secrets and credentials required for
accessing resources.
7. Monitoring and Alerts: Azure Logic Apps are configured to send alerts on various
activities, such as task failures or successful migrations, ensuring smooth monitoring of
the pipeline.
This architecture ensures a secure, automated, and scalable solution for migrating on-prem
data to Azure, transforming it for analytics, and storing it in a centralized data warehouse.
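
As a rough illustration of the layer-to-layer flow described above, the PySpark sketch below walks one dataset through the lake layers. The paths, column names, and join are assumptions for illustration only; the actual notebooks used in this project appear in the later steps.

# Minimal sketch of the medallion flow, assuming an ADLS container mounted at /mnt/global
# with raw, bronze, and silver folders (as created later in this document).
# Column names ('cid') and the reference file are illustrative, not from this project.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raw/landing: data copied as-is from on-prem
raw_df = spark.read.format("csv").option("header", True).load("/mnt/global/raw/cust/")

# Bronze: basic cleansing (drop duplicates and fully-null rows)
bronze_df = raw_df.dropDuplicates().na.drop(how="all")
bronze_df.write.mode("overwrite").format("csv").option("header", True).save("/mnt/global/bronze/cust/")

# Silver: apply business logic, e.g. a join with reference data
ref_df = spark.read.format("csv").option("header", True).load("/mnt/global/bronze/custphone/")
silver_df = bronze_df.join(ref_df, on="cid", how="inner")
silver_df.write.mode("overwrite").format("csv").option("header", True).save("/mnt/global/silver/dimcust/")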

Here are the pipeline headings:


1. Pipeline 1: On-Prem to ADLS
2. Pipeline 2: Raw ADLS to Bronze ADLS
3. Pipeline 3: Bronze ADLS to Silver ADLS
4. Pipeline 4: Silver ADLS to SQL Data Warehouse (Synapse)
5. Pipeline 5: Master Pipeline
Step 1→ Create the on-prem VM
Under Image, select a SQL Server image.
Proceed through the wizard (Next→ Next) and enable SQL authentication on the SQL Server settings tab.

Step 2→ Create ADF (Azure Data Factory)

Step 3→ Create the ADLS Gen2 storage account
Create a container named global.
Under global, create three directories: raw, bronze, and silver.

Step 4→ Create the Key Vault

Under the access policy principal, grant access to your ADF so that ADF can read secrets from the Key Vault.

After granting access, review and create the Key Vault.

Step 5→ Create a dedicated SQL pool (i.e. the SQL DW)


Step 6→ Connect to the VM

On your local machine, open an RDP client, paste the VM's IP address, enter the credentials, and log in.
Keep the RDP session open (minimized).

Step 7→ Go to ADF and create the Self-Hosted Integration Runtime

In ADF, create the self-hosted integration runtime and copy the authentication keys it generates.
Step→ Go to the VM→ Server Manager→ Local Server→ turn off IE Enhanced Security Configuration.
Open the browser→ download the Integration Runtime installer inside the RDP session.

Once downloaded, install it in the RDP session.

Once installed, paste the key copied from ADF and register the runtime in the RDP session.
Step 8→ Meanwhile, connect to your SQL pool from SSMS

Step 9→ Put the CSV files in a folder on the C: drive of the VM to create the file system data
(keep them as plain text/CSV files, since the VM has no Excel installed to open them)
Step 10→ We need to disable local folder path validation using dmgcmd.exe in the Integration Runtime folder on the VM, which is on the C: drive:
C:\Program Files\Microsoft Integration Runtime\5.0\Shared
Open PowerShell in the RDP session. Navigate to the directory where dmgcmd.exe is located using the cd (Change Directory) command:
cd "C:\Program Files\Microsoft Integration Runtime\5.0\Shared"
After you have navigated to the correct directory, run the executable as follows:
.\dmgcmd.exe -DisableLocalFolderPathValidation
Step 11→ Create the linked services in ADF

Linked service→ File system→ use the self-hosted IR
Host→ the path on the VM where the file system data was copied
Username→ the VM/RDP username
Password→ store the password in Key Vault and reference that secret as the password
Then run Test connection→ the connection should be established (it will connect only if local folder path validation was disabled with dmgcmd in Step 10).

Similarly, create the linked service for ADLS Gen2 using Key Vault:

URL→ https://<storageaccountname>.dfs.core.windows.net
Copy the connection string of the storage account (Storage account→ Access keys→ Connection string), then go to Key Vault→ Secrets→ create a secret for it.
Create the linked service for Key Vault (if not already created) so the secret can be accessed.
Similarly, create the linked service for the SQL pool (linked service→ Synapse Analytics).
Here we use a Key Vault secret holding the .NET connection string; replace the password in the connection string and create the Key Vault secret.
➔ Configure the firewall for the SQL pool:

there is an option to allow Azure services to access the server; it has to be enabled.
Use the self-hosted IR created earlier as the integration runtime.
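
As a quick sanity check that the secrets are in place, you can read one back with the Azure SDK for Python outside ADF. This is an optional aside, not part of the original steps; the vault URL and secret name below are placeholders.

# Hypothetical check that a Key Vault secret is readable; requires the azure-identity and
# azure-keyvault-secrets packages, and an identity with Get permission on secrets.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(vault_url="https://<your-key-vault-name>.vault.azure.net/",
                      credential=DefaultAzureCredential())
secret = client.get_secret("<your-secret-name>")
print("Secret retrieved, length:", len(secret.value))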

Step 12→ Execute the following scripts in the testpool database using SSMS


Table syntax:
-- Create the table 'metadata'
CREATE TABLE metadata (
sourcefoldername VARCHAR(50),
storagepath VARCHAR(50),
isactive INT,
status VARCHAR(50)
);

-- Insert data into the 'metadata' table


INSERT INTO metadata (sourcefoldername, storagepath, isactive, status)
VALUES
('cust', 'cust', 0, 'ready'),
('orders', 'orders', 0, 'ready'),
('emp', 'emp', 0, 'ready'),
('discounts', 'discounts', 0, 'ready');

-- Create the 'metadata_usp' stored procedure


CREATE PROCEDURE metadata_usp (@status VARCHAR(50), @sourcefoldername VARCHAR(50))
AS
BEGIN
UPDATE metadata
SET status = @status
WHERE sourcefoldername = @sourcefoldername;
END;

-- Create the 'reset_status_usp' stored procedure


CREATE PROCEDURE reset_status_usp
AS
BEGIN
UPDATE metadata
SET status = 'ready';
END;
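
The metadata table is what drives the pipelines: each row represents a source folder, and the stored procedures update its status as files are copied. If you want to inspect or update it from Python instead of SSMS, a minimal sketch using pyodbc could look like this; the server name, user, and password are placeholders, not values from this project.

# Hypothetical pyodbc sketch for inspecting the metadata table and calling metadata_usp.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-synapse-workspace>.sql.azuresynapse.net;Database=testpool;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cur = conn.cursor()

# List the folders the Lookup activity will iterate over
cur.execute("SELECT sourcefoldername, storagepath, isactive, status FROM metadata")
for row in cur.fetchall():
    print(row)

# Mark one folder as succeeded, the same way the Stored Procedure activity does
cur.execute("EXEC metadata_usp @status = ?, @sourcefoldername = ?", ("succeeded", "cust"))
conn.commit()
conn.close()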

Step 13→ Create the pipeline in ADF

The first activity is a Lookup→ it looks up the metadata table from Azure Synapse.
Take the Lookup activity→ Settings→ create the dataset for the Synapse metadata table.
Step→ Add a ForEach activity after the Lookup, and pass the output of the Lookup as the input to the ForEach loop.
Inside the ForEach→ add a Copy activity.

For the Copy activity, the source is the file system→ create the dataset→ file system→ CSV file.

For the sink, use ADLS Gen2→ create a dataset for it.
Here the sink file path should point to the raw folder, as we are going to land the raw data there.

Parameterize the directory in the file system (source) dataset, and parameterize the sink dataset as well.

Add the sink directory in the form:
raw/<filename>/yyyy/mm/dd
It should be created as a folder path. Take the values from the Lookup output:

Source→ sourcefoldername

Sink→ storagepath
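
To make the intended folder layout concrete, the small Python sketch below just builds an example of the path string the sink dataset should resolve to at run time; in ADF itself this is done with dynamic content expressions on the dataset parameters, not with Python.

# Illustration only: the shape of the sink directory the copy activity writes to,
# e.g. raw/cust/2024/05/01 for the 'cust' source folder.
from datetime import datetime, timezone

sourcefoldername = "cust"  # comes from the Lookup output in the real pipeline
run_date = datetime.now(timezone.utc)
sink_directory = f"raw/{sourcefoldername}/{run_date:%Y/%m/%d}"
print(sink_directory)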

Following the Copy activity, add a Stored Procedure activity.

Use metadata_usp as the stored procedure and import its parameters.
Pass sourcefoldername dynamically from the ForEach item, and hard-code status as 'succeeded' to indicate the Copy activity finished.
Add another Stored Procedure activity for the failure path: use the same stored procedure and hard-code status as 'failed'.

Next, add a Stored Procedure activity outside the ForEach.
Here, on success, we reset the statuses using reset_status_usp (created in Step 12):
CREATE PROCEDURE reset_status_usp
AS
BEGIN
UPDATE metadata
SET status = 'ready';
END;
Step→ Create a Logic App

Create the Logic App→ go to the resource→ create a blank workflow.
Search for HTTP→ select the Request trigger ("When a HTTP request is received")→ add new parameter→ Method→ GET.
Add the next step→ Gmail→ Send email→ name = gmail→ sign in→ add the To address and the Subject→ Save.
A URL will be generated on the HTTP trigger→ copy this URL.

Step→ Go to ADF→ on the Failed output of the ForEach, add a Web activity that calls the copied Logic App URL.

Small change→ add a Wait activity (30 seconds) after the ForEach loop.

In the source, add a wildcard path with * because the files on the VM are placed inside folders.
If any failure occurs, such as a change in a file name or in the metadata, the email is triggered.
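
To test the Logic App trigger by itself, before wiring it into ADF, you can simply call the generated URL from Python; the URL below is a placeholder for the one copied in the previous step.

# Hypothetical smoke test of the Logic App HTTP trigger (configured with method GET above).
import requests

logic_app_url = "https://<your-logic-app-trigger-url>"
resp = requests.get(logic_app_url)
print(resp.status_code)  # a 2xx response means the run was triggered and the email should be sent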

Step 14→ Perform data quality checks and move cleansed data from the raw layer to the bronze layer.
Data quality checks are performed in Databricks→ create the Azure Databricks service in Azure.
The data source is the ADLS container→ create a mount point to connect to the container (since we are using only one container, and inside it we move data from the raw folder to the bronze folder, one mount point is enough).
To mount ADLS, the required services are→ ADB (Databricks), Azure Key Vault, and an SPN, i.e. an App registration (where we create a new client secret, extract the client secret value, client ID, and tenant ID, and create Key Vault secrets for them).
Create the app registration→ go to Azure Active Directory→ App registrations→ once created→ go to Certificates & secrets→ New client secret→ copy the secret value and keep it, as it will be hidden once the page is closed.
Step→ Go to your ADLS storage account→ IAM (Access control)→ add the Storage Blob Data Contributor role→ assign it to the app registration created above→ review and assign.

Step 15→ Go to ADB→ create a notebook

Create dbutils widgets (once executed, the widget boxes appear at the top of the notebook, through which we can filter for a particular folder and date and run the entire notebook):
dbutils.widgets.text('processeddate', '')
dbutils.widgets.text('foldername', '')

Step→ Create the ADLS mount point

configs = {"fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="adlsgenkey", key="appid"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="adlsgenkey", key="apppwd"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/f5ea40f2-c7b8-4658-8d25-0aac8535e48c/oauth2/v2.0/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://global@adlsgenstorageaccountny.dfs.core.windows.net/",
    mount_point = "/mnt/global",
    extra_configs = configs)

Create the secret scope by appending #secrets/createScope to the end of the Databricks workspace URL.

Give the scope name→ adlsgenkey
DNS name and Resource ID→ take them from the Azure Key Vault properties.
In the mount point configs:
client.secret→ the apppwd secret, i.e. the secret value copied from the app registration
client.id→ the application (client) ID
tenant ID→ paste it into fs.azure.account.oauth2.client.endpoint: "https://login.microsoftonline.com/<tenantID>/oauth2/v2.0/token"
source = "abfss://<containername>@<storageaccountname>.dfs.core.windows.net/"

Once executed, comment out the mount cell (select all the text and use the Ctrl+/ shortcut to comment everything at once) so that re-running the notebook does not try to mount an already-mounted path.
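
After mounting, you can confirm the mount point is working before moving on; these dbutils calls are standard in Databricks notebooks.

# Verify the mount exists and list the top-level folders (raw, bronze, silver are expected)
display(dbutils.fs.mounts())
display(dbutils.fs.ls("/mnt/global"))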

Step 16→ The aim is to move data from raw to bronze.

src_path = "/mnt/global/raw/"
dest_path = "/mnt/global/bronze/"

dbutils.widgets.text('processeddate', '')
dbutils.widgets.text('foldername', '')

foldername = dbutils.widgets.get('foldername')
pdate = dbutils.widgets.get('processeddate')

print(foldername)
print(pdate)

src_final_path = src_path + foldername + "/" + pdate
print(src_final_path)
dest_final_path = dest_path + foldername + "/" + pdate
print(dest_final_path)
# following is the code for cleaning the data
try:
    # Read data from the source path
    df = spark.read.format("csv").option("header", True).load(src_final_path)

    # Count the number of rows in the source DataFrame
    src_count = df.count()
    print("Source count:", src_count)

    # Remove duplicates
    df1 = df.dropDuplicates()

    # Count the number of rows in the destination DataFrame
    dest_count = df1.count()
    print("Destination count:", dest_count)

    # Write the cleaned data to the destination path
    df1.write.mode("overwrite").format("csv").option("header", True).save(dest_final_path)

    # Report the success message and counts
    print("Success: Source count = " + str(src_count) + ", Destination count = " + str(dest_count))
except Exception as e:
    # Handle exceptions and exit with an error message
    dbutils.notebook.exit("Error: " + str(e))
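
The notebook above only drops duplicates; the architecture also calls for removing nulls. If you want that as part of the bronze cleansing, the deduplication line could be extended as below. This is an assumed addition, not part of the original notebook, and the key column name is illustrative.

# Optional extra data-quality step: also drop rows where every column is null,
# or rows missing a specific key column ('cid' is an assumption, not from this project).
df1 = df.dropDuplicates().na.drop(how="all")
# df1 = df1.na.drop(subset=["cid"])  # stricter: require the key column to be present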

Step 17→ Go to ADF to create the pipeline that moves data from raw to bronze.
Take a Lookup activity→ create the source dataset for Synapse.
Next take a ForEach activity→ pass the output of the Lookup as its input→ inside the ForEach, take a Notebook activity→ create the Databricks linked service for the notebook.
Under Settings→ pass two base parameters (since we defined processeddate and foldername as widgets in the notebook).

Similarly to the previous pipeline→ after the Notebook activity, add two Stored Procedure activities, one for success and one for failure, and add the parameters.
Outside the ForEach, add the reset stored procedure and a Web activity for failure that calls the Logic App.
If there are any errors in the notebook, the pipeline fails, and the activity output contains the notebook run URL; click on it and it goes directly to ADB, where the highlighted cell is the one with the error. Fix it and run the pipeline again.
You cannot edit directly in that run view→ go to the main development notebook and edit the code there.

Step 18→ Perform the transformations and move data from the bronze layer to the silver layer.
Below is the script that performs the transformation in a notebook from bronze to silver; here only a join transformation for the cust table is performed.
# Set source and destination paths
src_path = "/mnt/global/bronze/"
dest_path = "/mnt/global/silver/"

# Input widgets for folder name and processing date
dbutils.widgets.text('foldername', '')
dbutils.widgets.text('pdate', '')

try:
    # Get user input for folder name and processing date
    foldername = dbutils.widgets.get('foldername')
    pdate = dbutils.widgets.get('pdate')

    print("Folder Name:", foldername)
    print("Processing Date:", pdate)

    # Create the source path based on user input
    src_final_path = src_path + foldername + "/" + pdate
    print("Source Path:", src_final_path)

    # Destination path for writing processed data
    dest_final_path = dest_path + 'dim' + foldername
    print("Destination Path:", dest_final_path)

    # Load data from the source path
    df = spark.read.format("csv").option("header", True).load(src_final_path)
    src_count = df.count()
    print("Source Count:", src_count)

    # Display the DataFrame
    df.show()

    # Create a sample DataFrame (df11) - replace this with your actual data
    df11 = spark.createDataFrame([(2, '78654345'), (3, '67865467')], ['cid', 'cphone'])
    df11.show()

    # Join dataframes if foldername is 'cust', otherwise use df as is
    from pyspark.sql.functions import col

    if foldername == 'cust':
        df1 = df.alias('a').join(df11.alias('b'), col('a.cid') == col('b.cid'), "inner").select('a.*', 'b.cphone')
        df1.show()
    else:
        df1 = df

    # Count rows in the destination DataFrame
    dest_count = df1.count()

    # Write processed data to the destination path
    df1.coalesce(1).write.mode("overwrite").format("csv").option("header", True).save(dest_final_path)

    print("Processing completed successfully.")
    print("Source Count:", src_count)
    print("Destination Count:", dest_count)
    dbutils.notebook.exit("Processing completed successfully.")
except Exception as e:
    print("Error:", str(e))
    dbutils.notebook.exit("Error: " + str(e))
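
The df11 enrichment DataFrame above is hard-coded sample data. In a real run you would more likely read the customer-phone reference data from a file in the lake, for example as below; this path is hypothetical and not part of the original notebook.

# Hypothetical replacement for the hard-coded df11 sample: read the reference data from CSV.
df11 = spark.read.format("csv").option("header", True).load("/mnt/global/bronze/custphone/")
df11.show()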

Create a pipeline similar to the raw-to-bronze pipeline above; change the notebook and provide the base parameters accordingly.

Step 19→ Move data from the silver layer to the SQL DW (Synapse dedicated pool).
Below is the notebook script that loads the silver data into the data warehouse.

# Load the silver data into the SQL data warehouse
dbutils.widgets.text('foldername', '')

foldername = dbutils.widgets.get('foldername')
print("Folder Name:", foldername)

# Set source and destination paths for the SQL load
src_path = "/mnt/global/silver/" + 'dim' + foldername
dest_path = "dim" + foldername
print("Source Path:", src_path)
print("Destination Path:", dest_path)

# Read data from the source path
df = spark.read.format("csv").option("header", True).load(src_path)
src_count = df.count()
print("Source Count:", src_count)

# Set the Azure Storage account key
spark.conf.set("fs.azure.account.key.onpremdatasynasegen.dfs.core.windows.net",
    "o82RdY56QpidiJOBzA0+c0xBYomGajKVXZ8oZKRr+TtVSjYOTI5+i6IVTmOFL5E73Ha5wJHe7aQ1+AStdIFwNA==")

# Write data to the SQL Data Warehouse (using the JDBC connection string from Key Vault)
df.write \
    .mode("overwrite") \
    .format("com.databricks.spark.sqldw") \
    .option("url", dbutils.secrets.get(scope="adlsgenkey", key="sqljdbcpwd")) \
    .option("dbtable", dest_path) \
    .option("tempDir", "abfss://global@onpremdatasynasegen.dfs.core.windows.net/tmp/synapse") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .save()

# Display the source count and exit
print("Source Count:", src_count)
dbutils.notebook.exit("Source Count: " + str(src_count))
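
To verify the load, you can read the table back through the same Synapse connector, reusing the JDBC secret and tempDir from the write above, and compare the count; this check is an optional addition, not part of the original notebook.

# Read the table back from the dedicated SQL pool to confirm the row count matches the source.
df_check = (spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", dbutils.secrets.get(scope="adlsgenkey", key="sqljdbcpwd"))
    .option("tempDir", "abfss://global@onpremdatasynasegen.dfs.core.windows.net/tmp/synapse")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbtable", dest_path)
    .load())
print("Rows in Synapse table", dest_path, ":", df_check.count())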

Create a pipeline similar to the above; here we pass only one base parameter (foldername).


Step 20→ Create a master pipeline that executes all the pipelines using the Execute Pipeline activity.
