Task 03: Using data pipelines/data flow for data ingestion

There are multiple ways to ingest data into a Lakehouse. In this exercise, Contoso focuses on using data pipelines and dataflows to efficiently bring diverse datasets into their system, setting the stage for advanced analytics and insights.
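
Under the hood, the Copy data assistant used in this task produces a pipeline that contains a Copy activity. As a rough orientation, the sketch below models the general shape of such an activity definition as a Python dict; the property names follow the public Azure Data Factory activity schema, but the activity name and the property subset shown are illustrative assumptions, not the assistant's exact output.

```python
# Illustrative sketch only: the rough shape of a Copy activity definition
# (Azure Data Factory activity schema), modeled as a Python dict. The name
# and the property subset shown are assumptions for orientation, not the
# Copy data assistant's exact output.
copy_activity = {
    "name": "CopyAdventureWorksToBronze",  # hypothetical activity name
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "AzureSqlSource"},   # reads from Azure SQL Database
        "sink": {"type": "DelimitedTextSink"},  # writes .csv files to ADLS Gen2
    },
}
```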

  1. In the lower left of the navigation pane for the workspace, select Data Engineering and then select Data Factory.

    selectDataFactory.jpg

  2. Select the Data pipeline tile.

    dataFactory_pipeline.jpg

  3. Enter Azure SQL DB Pipeline for the pipeline name and then select Create.

  4. Select the Copy data assistant tile.

    Copydataassistant.png

  5. On the Choose data source page, select Azure SQL Database. You may need to scroll down to see the Azure SQL Database option.

    selectSQLDatabase.jpg

  6. Configure the connection by using the values in the following table. Leave all other settings at their default values. An optional way to test these values from code appears after this step.

    If there is no value listed for the Server setting, right-click the instructions pane in the lab environment and select Refresh.

    | Setting | Default Value |
    | --- | --- |
    | Server | [Your SQL Server Name] |
    | Database | AdventureWorks |
    | Authentication kind | Basic |
    | Username | [Your SQL Admin Username] |
    | Password | [Your SQL Admin Password] |

    selectSQLDatabase2.jpg
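
    If you want to sanity-check the connection values from the table above outside the assistant, here is a minimal sketch using the pyodbc package. The server, username, and password placeholders stand in for your lab-provided values, and the sketch assumes the ODBC Driver 18 for SQL Server is installed.

    ```python
    # Minimal connectivity check for the values in the table above.
    # Assumes the "ODBC Driver 18 for SQL Server" ODBC driver is installed.
    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<your-sql-server>.database.windows.net;"  # [Your SQL Server Name]
        "DATABASE=AdventureWorks;"
        "UID=<your-admin-username>;"                      # [Your SQL Admin Username]
        "PWD=<your-admin-password>;"                      # [Your SQL Admin Password]
        "Encrypt=yes;"
    )
    with pyodbc.connect(conn_str, timeout=10) as conn:
        print(conn.execute("SELECT 1").fetchone())  # prints (1, ) when the connection works
    ```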

  7. Select Next. Close any pop-up windows that display and wait for the connection to be created.

  8. On the Connect to data source page, select Tables.

  9. Select Select all. Clear the dbo.BuildVersion and dbo.ErrorLog checkboxes and select Next. A scripted way to enumerate the same tables appears after this step.

    selecttables.jpg
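
    For reference, the list the assistant shows corresponds to the base tables in the database. Below is a minimal pyodbc sketch that enumerates them while skipping the two tables cleared in this step; the connection string is the same placeholder as in the sketch under step 6.

    ```python
    # Enumerate the base tables the Copy data assistant offers, skipping the
    # two tables cleared in this step.
    import pyodbc

    conn_str = "<same connection string as in the step 6 sketch>"  # placeholder
    excluded = {("dbo", "BuildVersion"), ("dbo", "ErrorLog")}
    with pyodbc.connect(conn_str, timeout=10) as conn:
        rows = conn.execute(
            "SELECT TABLE_SCHEMA, TABLE_NAME "
            "FROM INFORMATION_SCHEMA.TABLES "
            "WHERE TABLE_TYPE = 'BASE TABLE'"
        ).fetchall()
    for schema, name in rows:
        if (schema, name) not in excluded:
            print(f"{schema}.{name}")
    ```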

  10. On the Choose data destinations page, search for and select Azure Data Lake Storage Gen2.

    adlsgen2.jpg

  11. On the Connect to data destination page, enter the following to create a new connection:

    | Setting | Default Value |
    | --- | --- |
    | URL | [Your ADLS Gen2 URL] |
    | Authentication kind | Organizational account |

    The connection URL for the Data Lake Storage account can be located here: Storage account > Settings > Endpoints > Data Lake Storage. A sketch that exercises this endpoint from code appears after this step.

    endpoint.jpg
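
    The Organizational account option corresponds to signing in with your Microsoft Entra identity. As a rough illustration, the sketch below builds the Data Lake Storage endpoint in the format shown above and signs in with the azure-identity and azure-storage-file-datalake packages; the storage account name is a placeholder for your lab value.

    ```python
    # Illustrative check of the ADLS Gen2 endpoint and organizational sign-in.
    # Requires: pip install azure-identity azure-storage-file-datalake
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    account_url = "https://<your-storage-account>.dfs.core.windows.net"  # [Your ADLS Gen2 URL]
    service = DataLakeServiceClient(account_url=account_url,
                                    credential=DefaultAzureCredential())
    # Listing file systems (containers) confirms the URL and credential work.
    for fs in service.list_file_systems():
        print(fs.name)
    ```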

  12. Select Sign in.

  13. Select the account that’s already authenticated and then select Next.

  14. On the Connect to data destination page, next to the Folder path box, select Browse.

  15. Select medallion > bronze and then select OK.

  16. In the File name suffix box, enter .csv and then select Next to test the connection.

    selectbronzeandcsv.jpg

  17. Select Next, and then select Save + Run. After a brief delay, the Pipeline run window displays.

    save+run.jpg

  18. In the Pipeline run window, select OK. The pipeline will start processing.

  19. In the upper-right corner of the page, select Notifications. You can use the Notifications area to monitor the pipeline.

    notifications.jpg

  20. At the bottom left of the page, select Data Factory. Then, in the Synapse section, select Data Engineering.

  21. In the left navigation pane for the Synapse Data Engineering Home page, select Monitor.

    You may need to select the ellipsis (…) icon to display the Monitor option.

    The name for this page is in flux. You may see Monitor or you may see Monitoring Hub.

  22. Verify that the value in the Status field for the pipeline is Succeeded.

    Wait for the pipeline to finish executing. If the notification still shows the pipeline as running after 10 minutes, check the monitoring hub for a Succeeded status.

    0faouzwm.png

  23. After the status shows Succeeded, your data has been transferred from Azure SQL Database to ADLS Gen2. A quick check of the copied files appears after this step.

    completedtransfer.jpg
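
    If you want to confirm the result outside the portal, here is a minimal sketch that lists the CSV files the pipeline wrote to the bronze folder of the medallion container. It assumes the same placeholder storage account and packages as the sketch under step 11, and the printed file names follow an assumed naming pattern.

    ```python
    # List the CSV files the pipeline copied into medallion/bronze.
    # Requires: pip install azure-identity azure-storage-file-datalake
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    account_url = "https://<your-storage-account>.dfs.core.windows.net"  # [Your ADLS Gen2 URL]
    service = DataLakeServiceClient(account_url=account_url,
                                    credential=DefaultAzureCredential())
    bronze = service.get_file_system_client("medallion")  # container selected in step 15
    for path in bronze.get_paths(path="bronze"):
        if path.name.endswith(".csv"):
            print(path.name)  # e.g. bronze/SalesLT.Customer.csv (naming pattern assumed)
    ```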

    Similarly, you can use pipelines to ingest data into Lakehouses from various other sources, such as Snowflake and Dataverse.