Extracting data out of Excel Files using Azure Batch and Azure Data Factory
Sometimes you have a requirement to get data out of Excel files as part of your data ingestion process. Perhaps the business users create reports in Excel and then send them to you on a regular basis. Unfortunately, there is not always a great mechanism for extracting data from Excel files, especially if you want to use the data as part of a data processing pipeline with Azure Data Factory. In this post I outline an approach for extracting data from Excel files as part of an Azure Data Factory pipeline.
Using Azure Batch to unzip a large number of files
If you ever need to unzip a large number of files sitting in Azure Storage, one option is to use Azure Batch. In this post I will show how easy it is to create an application that leverages Azure Batch to unzip files sitting in Azure Storage and place the extracted files back into Azure Storage. A full working solution is available in my GitHub repository.
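As a rough illustration of the work a single Batch task might do for one archive, here is a minimal Python sketch. It is not the solution from the repository: the function name extract_zip_bytes is hypothetical, and downloading the archive from Azure Storage and uploading the extracted files back are deliberately left out.

```python
import io
import zipfile
from typing import Dict


def extract_zip_bytes(zip_bytes: bytes) -> Dict[str, bytes]:
    """Return a mapping of member name -> file content for one zip archive.

    Hypothetical helper: the caller is assumed to have already downloaded
    the archive from storage and to upload the results afterwards.
    """
    extracted: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            extracted[info.filename] = archive.read(info)
    return extracted
```

Working on bytes in memory keeps the sketch independent of any storage client, but a very large archive would be better streamed to local disk on the Batch node first.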
Using Azure Functions to clean up Azure Data Factory
When you build a pipeline using Azure Data Factory, you have to associate it with a storage account. If your pipeline runs certain jobs, such as an HDInsight On Demand job, each slice run generates a container in the storage account for that run. This is great for debugging, but if you run a pipeline for any extended period of time these job containers build up, and you need a way to periodically purge them. Typically the ADF job container is named something like: adfname_of_factory-name_of_ondemand_service-timestamp
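The selection step of such a purge could look roughly like the Python sketch below. It assumes the naming pattern above, assumes the caller has already listed the containers as (name, last modified) pairs, and uses a hypothetical function name, containers_to_purge. Listing and deleting the containers through a storage client is left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def containers_to_purge(
    containers: Iterable[Tuple[str, datetime]],
    factory_name: str,
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Pick ADF job containers older than max_age.

    Assumption: job containers start with "adf" + factory name + "-".
    Container names are always lowercase, so the prefix is lowercased too.
    """
    prefix = f"adf{factory_name}-".lower()
    cutoff = now - max_age
    return [
        name
        for name, last_modified in containers
        if name.startswith(prefix) and last_modified < cutoff
    ]
```

Keeping the selection logic free of storage calls makes it easy to check which containers would be removed before anything is actually deleted.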
Using Azure Resource Manager templates for deploying Drupal 7
This post is about automating the deployment of Drupal 7 into Azure. It focuses on leveraging Azure Resource Manager (ARM) and ARM templates to define the infrastructure as well as wire up the continuous deployment processes.
Azure Resource Manager template apiVersion
If you have been working with Azure Resource Manager templates, you will have come across the need for the apiVersion property on every resource. You will also have noticed that its value is not consistent between resources.
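To see the inconsistency in a template of your own, a small Python sketch like the following can list the apiVersion used by each resource type. The function name api_versions_by_type is hypothetical, and the template is assumed to be already parsed from JSON into a dict.

```python
from typing import Any, Dict, List, Tuple


def api_versions_by_type(template: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (resource type, apiVersion) pairs found in an ARM template.

    Child resources nested under a parent's "resources" property are included.
    A resource without an apiVersion is reported as "<missing>".
    """
    pairs = set()

    def visit(resources: List[Dict[str, Any]]) -> None:
        for resource in resources:
            pairs.add((
                resource.get("type", "<missing>"),
                resource.get("apiVersion", "<missing>"),
            ))
            visit(resource.get("resources", []))

    visit(template.get("resources", []))
    return sorted(pairs)
```

Load the template with json.load and pass the resulting dict to the function to get one line per distinct type and version.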