Task 03: Using data pipelines and dataflows for data ingestion
There are multiple ways to ingest data into a Lakehouse. In this exercise, Contoso focuses on using data pipelines and dataflows to efficiently move diverse datasets into their system, setting the stage for advanced analytics and insights.
- In the lower left of the navigation pane for the workspace, select Data Engineering and then select Data Factory.
- Select the Data pipeline tile.
- Enter Azure SQL DB Pipeline for the pipeline name and then select Create.
- Select the Copy data assistant tile.
- On the Choose data source page, select Azure SQL Database. You may need to scroll down to see the Azure SQL Database option.
- Configure the connection by using the values in the following table. Leave all other settings at their default values.

  If there is no value listed for the Server setting, right-click the instructions pane in the lab environment and select Refresh.

  | Setting | Value |
  |---------|-------|
  | Server | [Your SQL Server Name] |
  | Database | Adventureworks |
  | Authentication kind | Basic |
  | Username | [Your SQL Admin Username] |
  | Password | [Your SQL Admin Password] |
- Select Next. Close any pop-up windows that appear and wait for the connection to be created.
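Optionally, you can sanity-check the same credentials outside the assistant. The following is a minimal sketch, assuming Python with the pyodbc package and ODBC Driver 18 for SQL Server installed; the server, username, and password values are hypothetical placeholders for the values in the table above.

```python
import pyodbc

# Hypothetical placeholders; substitute the Server, Username, and Password
# values from the connection table above.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=your-sql-server.database.windows.net;"
    "DATABASE=Adventureworks;"
    "UID=your-sql-admin;"
    "PWD=your-password;"
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str, timeout=30) as conn:
    # A trivial round trip: if this prints a version string, the Basic
    # (SQL) authentication values are correct.
    print(conn.cursor().execute("SELECT @@VERSION;").fetchone()[0])
```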
- On the Connect to data source page, select Tables.
- Select Select all. Clear the dbo.BuildVersion and dbo.ErrorLog checkboxes and then select Next.
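dbo.BuildVersion and dbo.ErrorLog hold database metadata and error logging rather than business data, which is why they're excluded. If you want to preview the remaining selection programmatically, this sketch reuses the hypothetical connection string from the previous example:

```python
import pyodbc

# Same hypothetical connection string as in the previous sketch.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=your-sql-server.database.windows.net;"
    "DATABASE=Adventureworks;"
    "UID=your-sql-admin;PWD=your-password;Encrypt=yes;"
)

# List every base table except the two cleared in the assistant.
query = """
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND CONCAT(TABLE_SCHEMA, '.', TABLE_NAME)
      NOT IN ('dbo.BuildVersion', 'dbo.ErrorLog')
ORDER BY TABLE_SCHEMA, TABLE_NAME;
"""

with pyodbc.connect(conn_str, timeout=30) as conn:
    for schema, name in conn.cursor().execute(query):
        print(f"{schema}.{name}")
```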
- On the Choose data destination page, search for and select Azure Data Lake Storage Gen2.
- On the Connect to data destination page, enter the following to create a new connection:

  | Setting | Value |
  |---------|-------|
  | URL | [Your ADLS Gen2 URL] |
  | Authentication kind | Organizational account |

  The connection URL for the Data Lake Storage account can be located here: Storage account > Settings > Endpoints > Data Lake Storage.
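The ADLS Gen2 URL is the account's Data Lake Storage (DFS) endpoint, which follows the pattern https://&lt;storage-account&gt;.dfs.core.windows.net. As a minimal sketch, assuming the azure-identity and azure-storage-file-datalake packages and a hypothetical account name, you can confirm the endpoint is reachable with your organizational account before entering it in the assistant:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical account name; substitute the URL from the table above.
account_url = "https://yourstorageaccount.dfs.core.windows.net"

# DefaultAzureCredential resolves the signed-in organizational account
# (for example, from the Azure CLI or environment variables).
service = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())

# Listing containers (file systems) confirms both the URL and the credential.
for fs in service.list_file_systems():
    print(fs.name)
```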
- Select Sign in.
- Select the account that's already authenticated and then select Next.
- On the Connect to data destination page, next to the Folder path box, select Browse.
- Select medallion > bronze and then select OK.

  In a medallion architecture, the bronze layer holds raw, as-ingested data.
- In the File name suffix box, enter .csv and then select Next to test the connection.
- Select Next and then select Save + Run. After a brief delay, the Pipeline Run window displays.
- In the Pipeline run window, select OK. The pipeline will start processing.
- On the upper-right of the page, select Notifications. You can use the Notifications area to monitor the pipeline.
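Notifications are the quickest way to watch the run, but similar information can be pulled programmatically. The sketch below is an assumption-heavy illustration: it presumes the Fabric REST API's job scheduler endpoint for listing item job instances, and all IDs and the access token are hypothetical placeholders.

```python
import requests

# Hypothetical placeholders; this assumes a valid Microsoft Entra access
# token for the Fabric REST API (https://api.fabric.microsoft.com).
workspace_id = "<your-workspace-id>"
pipeline_item_id = "<your-pipeline-item-id>"
token = "<your-access-token>"

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_item_id}/jobs/instances"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# Each job instance reports a status such as InProgress, Completed, or Failed.
for run in resp.json().get("value", []):
    print(run.get("id"), run.get("status"))
```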
- At the bottom left of the page, select Data Factory. Then, in the Synapse section, select Data Engineering.
- In the left navigation pane for the Synapse Data Engineering Home page, select Monitor.

  You may need to select the ellipsis (…) icon to display the Monitor option.

  The name of this page is in flux; you may see either Monitor or Monitoring Hub.
- Verify that the value in the Status field for the pipeline is Succeeded.

  Wait for the pipeline to finish executing. If the notification still shows the pipeline as running after 10 minutes, check the monitoring hub for a Succeeded status.
- After the status shows Succeeded, your data has been transferred from Azure SQL Database to ADLS Gen2. You can confirm the result with the sketch that follows.
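To double-check the transfer outside the portal, the following sketch lists the files the pipeline wrote. It reuses the assumptions from the earlier ADLS example, and it treats medallion as the container and bronze as the folder path, matching the destination chosen above.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    "https://yourstorageaccount.dfs.core.windows.net",  # hypothetical account
    credential=DefaultAzureCredential(),
)

# "medallion" is assumed to be the container selected earlier;
# "bronze" is the folder path within it.
file_system = service.get_file_system_client("medallion")
for path in file_system.get_paths(path="bronze"):
    if path.name.endswith(".csv"):
        print(path.name)
```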
Similarly, you can ingest data into Lakehouses by using pipelines from various other sources, such as Snowflake, Dataverse, and so on.