
Azure Data Factory: Your Solution for Cloud-Based Data Integration

By Rish


What is Azure Data Factory? 

A cloud-based data integration service that orchestrates data movement and transformation between diverse data sources and cloud compute resources, at scale.



Let's Consider a Scenario 


A game development company has petabytes of game log data from which it needs actionable insights into its customers' gaming preferences, player demographics, and user behavior. Analysts can use this data to make decisions, such as developing new features that improve the player experience, drive upsells, and grow the business.

 

How Does ADF Work?


ADF is a set of interconnected systems that gives you an end-to-end platform for your data engineering work.

  • Connect and Collect 

    • Connect to all of your data sources and ingest from each at its own interval and speed.

    • Collect all data in a centralized location to facilitate processing. 

  • Transform and Enrich 

    • Transform the collected data with actions that aggregate, filter, clean, and so on.

    • Build transformation graphs with the code-free, UI-based mapping data flows, or bring your own compute to transform data by hand.

  • CI/CD and Publish 

    • Practice DataOps with Azure DevOps and GitHub to incrementally develop your ETL process, then publish the processed data to Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or your BI analytics engine.

  • Monitor and Alert 

    • Once your data pipeline is created and deployed, monitor its success and failure rates using Azure Monitor, the REST API, PowerShell, or the health panels in the Azure portal. A minimal sketch of kicking off and monitoring a run follows this list.
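
To make the run-and-monitor step concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory, and pipeline names are placeholder assumptions, not values from this post.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder names -- substitute your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "game-analytics-rg"
FACTORY_NAME = "game-logs-adf"
PIPELINE_NAME = "IngestGameLogs"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# A pipeline run is one execution instance of the pipeline.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)

# Check the same status the ADF monitoring view shows:
# Queued, InProgress, Succeeded, Failed, or Cancelled.
result = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
print(f"Run {run.run_id}: {result.status}")
```

The same status values can also be pulled from PowerShell (Get-AzDataFactoryV2PipelineRun) or surfaced as alerts through Azure Monitor.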

 

Components of ADF 


  • Pipelines: A pipeline is a logical grouping of activities that together perform one unit of work.

  • Activities: Represents a single processing step in the pipeline. 

  • Datasets: Represents a data structure that gives a selective view into a data store, pointing to the specific subset of data an activity uses as input or output.

  • Linked Services: Represents the connection information that links ADF to external services. Works closely with datasets: the linked service defines where the data lives, while the dataset defines what the data looks like.

  • Mapping Data Flows: Create and manage graphs of data transformation logic that work on data of any size, and build up a reusable library of data transformation routines.

  • Integration Runtime: The compute infrastructure ADF uses to provide its data integration capabilities across different network environments. (IR types: Azure auto-resolve, self-hosted, and Azure-SSIS.) A sketch of how these components fit together follows this list.
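
As a rough sketch of how these pieces fit together (reusing the client and names from the earlier example), the following defines a Blob storage linked service, two datasets over it, and a pipeline with a single copy activity. Every name here is an illustrative assumption, and constructor details vary slightly across SDK versions.

```python
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService, LinkedServiceResource, LinkedServiceReference,
    AzureBlobDataset, DatasetResource, DatasetReference,
    BlobSource, BlobSink, CopyActivity, PipelineResource, SecureString,
)

# Linked service: connection info -- *where* the data lives.
client.linked_services.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "GameLogStorage",
    LinkedServiceResource(properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>"))))

# Datasets: selective views into the store -- *what* the data looks like.
for name, path in [("RawGameLogs", "raw/gamelogs"),
                   ("CuratedGameLogs", "curated/gamelogs")]:
    client.datasets.create_or_update(
        RESOURCE_GROUP, FACTORY_NAME, name,
        DatasetResource(properties=AzureBlobDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference", reference_name="GameLogStorage"),
            folder_path=path)))

# Pipeline: a logical grouping of activities; here, one copy activity that
# reads the raw dataset and writes the curated one.
copy_step = CopyActivity(
    name="CopyRawToCurated",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="RawGameLogs")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="CuratedGameLogs")],
    source=BlobSource(),
    sink=BlobSink())
client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
    PipelineResource(activities=[copy_step]))
```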

 

Additional Concepts 


  • Control Flow: The orchestration of pipeline activities, such as chaining activities together, defining parameters, and branching.

  • Pipeline Run: A single instance of a pipeline execution, initiated either manually or by a trigger.

  • Trigger: A unit of processing that determines when to kick off pipeline runs; a sketch of a schedule trigger follows this list.

  • Parameters: Key-value pairs populated from the run context at execution time.

  • Variables: Used inside pipelines to store temporary values and state.
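
As a sketch of triggers and parameters together (again reusing the client from the examples above), this schedules the hypothetical IngestGameLogs pipeline to run hourly and passes it one parameter. The trigger name, parameter, and schedule are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import (
    ScheduleTrigger, ScheduleTriggerRecurrence, TriggerResource,
    TriggerPipelineReference, PipelineReference,
)

# Kick off a run every hour, starting shortly after the trigger is created.
recurrence = ScheduleTriggerRecurrence(
    frequency="Hour",
    interval=1,
    start_time=datetime.now(timezone.utc) + timedelta(minutes=5),
    time_zone="UTC")

trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name=PIPELINE_NAME),
        # Parameters: key-value pairs the pipeline reads at execution time.
        parameters={"logFolder": "raw/gamelogs"})])

client.triggers.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "HourlyIngest",
    TriggerResource(properties=trigger))
client.triggers.begin_start(RESOURCE_GROUP, FACTORY_NAME, "HourlyIngest").result()
```

For the parameter to take effect, the pipeline itself would need to declare a matching logFolder parameter.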

 

Benefits of Using ADF 


  • Enterprise Ready: Data integration at cloud scale.

  • Enterprise Data Ready: 100+ data connectors.

  • Code-Free Transformation

  • Run Code on Any Azure Compute

  • ADF Can Make DataOps Seamless

  • SSIS Packages Run on Azure: Rehost on-premises SSIS packages in ADF in three steps.

  • Secure Data Integration: Managed virtual networks protect against data exfiltration and simplify your networking.
