How To Scale Azure SQL Data Warehouse in Azure Data Factory
You can use Azure SQL Data Warehouse as part of your Azure Data Factory pipeline, but you probably don’t want the data warehouse running at its maximum Data Warehouse Units (DWU) all the time, especially if the pipeline runs infrequently. I want to share some steps for scaling SQL Data Warehouse up and down right within your Data Factory pipeline.
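As a rough illustration of what a scale operation can look like, here is a minimal sketch that builds the T-SQL statement for changing a data warehouse's service objective. This is an assumption about one possible mechanism, not necessarily the approach the post takes; the helper name `build_scale_statement` and the database name `mydw` are hypothetical, and the statement is typically run while connected to the master database.

```python
import re


def build_scale_statement(database: str, service_objective: str) -> str:
    """Return a T-SQL statement that scales a SQL Data Warehouse.

    Hypothetical helper for illustration only. It validates its inputs
    because the values are interpolated directly into the statement.
    """
    # Allow only simple identifiers so nothing unexpected lands in the SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", database):
        raise ValueError(f"unexpected database name: {database!r}")
    # Service objectives look like DW100, DW400, DW1000c and so on.
    if not re.fullmatch(r"DW\d+c?", service_objective):
        raise ValueError(f"unexpected service objective: {service_objective!r}")
    return (
        f"ALTER DATABASE [{database}] "
        f"MODIFY (SERVICE_OBJECTIVE = '{service_objective}');"
    )


# Scale up before the pipeline's heavy work, then back down afterwards.
scale_up = build_scale_statement("mydw", "DW400")
scale_down = build_scale_statement("mydw", "DW100")
```

A pipeline could run the first statement before its load steps and the second once they finish, so the higher DWU setting is only paid for while it is needed.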
Using Azure Functions to extract named entities from news articles
Recently a custom visual was released for Power BI that enables browsing and analyzing collections of text. Visuals like this can provide a powerful set of tools for analysis.
Extracting data out of Excel Files using Azure Batch and Azure Data Factory
Sometimes you need to get data out of Excel files as part of your data ingestion process. Perhaps the business users create reports in Excel and send them to you on a regular basis. Unfortunately, there is not always a great mechanism for extracting data from Excel files, especially if you want to use the data as part of a data processing pipeline with Azure Data Factory. In this post I outline an approach to extracting data out of Excel files as part of an Azure Data Factory pipeline.
Using Azure Batch to unzip a large number of files
If you ever need to unzip a large number of files sitting in Azure Storage, one option is to use Azure Batch. In this post I will show how easy it is to create an application that leverages Azure Batch to unzip files sitting in Azure Storage and place the extracted files back into Azure Storage. A full working solution is available in my GitHub repository.
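To give a feel for the per-file work each Batch task would do, here is a minimal sketch of the unzip step on its own. It is not the code from the repository: the function name `unzip_bytes` is hypothetical, and downloading the archive from Azure Storage and uploading the results are deliberately left out, so the example only covers turning archive bytes into extracted file contents.

```python
import io
import zipfile
from typing import Dict


def unzip_bytes(zip_data: bytes) -> Dict[str, bytes]:
    """Extract every file in a zip archive held in memory.

    Returns a mapping of archive member name to file contents.
    Directory entries are skipped because they carry no data.
    """
    extracted: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            extracted[info.filename] = archive.read(info)
    return extracted
```

Working on bytes in memory keeps the sketch independent of any storage client, although very large archives would be better streamed to disk instead.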
Using Azure Functions to clean up Azure Data Factory
When you build out a pipeline using Azure Data Factory you have to associate it with a storage account. If, as part of your pipeline, you run certain jobs such as an HDInsight On Demand job, each slice run will generate a container in the storage account for that run. This is great for debugging, but if you run a pipeline for any extended period of time these job containers build up and you need a way to periodically purge them. Typically the ADF job container is named something like: adfname_of_factory-name_of_ondemand_service-timestamp
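The naming pattern above suggests a simple way to pick out the containers that are candidates for purging. The sketch below is only an illustration of that matching step: the function names and the sample factory and service names are hypothetical, the timestamp is treated as an opaque suffix because its exact format is not given here, and listing or deleting containers in the storage account is left out.

```python
from typing import Iterable, List


def is_adf_job_container(name: str, factory: str, service: str) -> bool:
    """Check whether a container name follows adf<factory>-<service>-<timestamp>."""
    prefix = f"adf{factory}-{service}-".lower()
    # Require something after the prefix, since the timestamp must be present.
    return name.lower().startswith(prefix) and len(name) > len(prefix)


def select_job_containers(
    names: Iterable[str], factory: str, service: str
) -> List[str]:
    """Return only the container names that look like ADF job containers."""
    return [n for n in names if is_adf_job_container(n, factory, service)]


# Hypothetical container names as they might come back from a listing call.
containers = [
    "adfmyfactory-ondemandhdi-20170101",
    "logs",
    "adfotherfactory-ondemandhdi-20170101",
]
to_purge = select_job_containers(containers, "myfactory", "ondemandhdi")
```

Matching on the full factory and service prefix, rather than just "adf", reduces the chance of deleting containers that belong to another factory sharing the same storage account.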