Extracting data out of Excel Files using Azure Batch and Azure Data Factory

Sometimes you need to get data out of Excel files as part of your data ingestion process. Perhaps business users create reports in Excel and send them to you on a regular basis. Unfortunately, there is not always a good mechanism for extracting data from Excel files, especially if you want to use that data in a data processing pipeline with Azure Data Factory. In this post I outline an approach to extracting data from Excel files as part of an Azure Data Factory pipeline. ...

May 14, 2016 · 4 min

Using Azure Functions to clean up Azure Data Factory

When you build a pipeline using Azure Data Factory, you have to associate it with a storage account. If your pipeline runs certain jobs, such as an HDInsight On Demand job, it will generate a container in the storage account for each slice run. This is great for debugging, but if you run a pipeline for an extended period these job containers build up, and you need a way to purge them periodically. Typically the ADF job container is named something like: adfname_of_factory-name_of_ondemand_service-timestamp ...

April 20, 2016 · 2 min
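
As a rough illustration of the cleanup idea in the excerpt above, the selection step might look something like the sketch below. It is not the post's actual Azure Function: the function name `select_stale_containers`, the `adf` prefix check, and the 7-day retention window are all assumptions made for this example, and listing or deleting containers through a storage SDK is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def select_stale_containers(containers, prefix="adf", max_age_days=7, now=None):
    """Return names of containers that look like old ADF job containers.

    containers: iterable of (name, last_modified) tuples, where
    last_modified is a timezone-aware datetime.
    prefix and max_age_days are illustrative assumptions, not values
    taken from the original post.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        name
        for name, last_modified in containers
        # Keep only containers that match the ADF-style prefix and are
        # older than the retention cutoff.
        if name.startswith(prefix) and last_modified < cutoff
    ]
```

The returned names could then be handed to whatever storage client you use to delete them. Keeping the selection logic separate from the deletion call makes it easy to dry-run the purge before removing anything.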