Demystifying The Ways to Use R in the Microsoft Ecosystem

In 2015, Microsoft acquired Revolution Analytics, and Microsoft R began as a rebranding of Revolution R. Since the R landscape at Microsoft can be a bit confusing, I want to try to lay it out simply. First, note that Microsoft R Server has been rebranded as Microsoft Machine Learning Server (ML Server). At the time of writing, ML Server 9.2 was available.

So, what are the different ways to use R from Microsoft?

Microsoft R Open

  • This is Microsoft's enhanced, open-source distribution of R.
  • It is based on, and extends, open-source R, and it is compatible with all R packages, scripts, and applications that work with the underlying version of R.
  • Adds a set of specialized packages to enhance the R experience, including multi-threaded math libraries and other performance optimizations.
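
One rough way to see what the multi-threaded math libraries buy you is to time a dense matrix multiplication, which R hands off to whatever BLAS library it was built against. This is only a sketch in base R: the matrix size is arbitrary, and the optional thread-count check assumes the RevoUtilsMath package that ships with Microsoft R Open is installed (it is skipped otherwise).

```r
# Time a dense matrix product. R delegates this to its linked BLAS library,
# so a multi-threaded BLAS should finish it faster than a reference BLAS.
set.seed(42)
n <- 500                                   # arbitrary size for illustration
a <- matrix(rnorm(n * n), nrow = n)
b <- matrix(rnorm(n * n), nrow = n)

elapsed <- system.time(product <- a %*% b)[["elapsed"]]
cat("Multiplied two", n, "x", n, "matrices in", elapsed, "seconds\n")

# Assumption: RevoUtilsMath is only present on Microsoft R Open installs.
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  cat("MKL threads in use:", RevoUtilsMath::getMKLthreads(), "\n")
}
```

Running the same script under CRAN R and under Microsoft R Open on the same machine gives a like-for-like comparison.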

Microsoft R Open in Azure ML

  • You can execute R scripts as part of Azure Machine Learning Studio experiments.
  • This supports Microsoft R Open and CRAN.
  • Note that this is currently a couple of versions behind the latest R releases; the CRAN option supports R 3.1.0.

RevoScaleR

  • Ships as part of Microsoft Machine Learning Server and Microsoft R Client.
  • A collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale.
  • Can run locally or in a remote compute context (for example, to scale out).
  • Remote compute contexts include Machine Learning Server, Spark, Hadoop, and SQL Server.
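
To make the compute context idea concrete, here is a minimal sketch, assuming RevoScaleR is installed (for example through Microsoft R Client). The model formula and the use of the built-in mtcars data are just illustrations; the point is that the modelling call stays the same while the compute context decides where it runs.

```r
# Sketch: the same RevoScaleR call runs wherever the compute context points.
# Assumption: RevoScaleR is installed (Microsoft R Client or ML Server).
revo_available <- requireNamespace("RevoScaleR", quietly = TRUE)

if (revo_available) {
  library(RevoScaleR)

  # Run on the local machine (the default compute context).
  rxSetComputeContext("local")
  local_fit <- rxLinMod(mpg ~ wt + hp, data = mtcars)
  print(summary(local_fit))

  # To scale out, swap in a remote context and keep the modelling code as is,
  # for example on a Spark cluster:
  #   rxSetComputeContext(RxSpark())
  # The data source would then need to live where the cluster can read it.
} else {
  message("RevoScaleR is not installed; skipping the example.")
}
```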

MicrosoftML

  • R package that adds state-of-the-art data transforms, machine learning algorithms, and pre-trained models; equivalent functionality is available for Python.
  • Installed as part of Machine Learning Server, Microsoft R Client, and SQL Server Machine Learning Services.
  • Works in tandem with RevoScaleR.

mrsdeploy

  • R package for establishing a remote session and for publishing and managing a web service that is backed by R.
  • It comes installed and loaded with Microsoft R Client. On ML Server and SQL Server it is installed but not loaded by default.
  • It makes it easy to run your jobs on a remote server and to publish your models as web services on Machine Learning Server.
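
As a rough illustration of the publishing workflow, the sketch below fits a small model locally and, only if a server is configured, publishes it with mrsdeploy. The environment variable names (MLSERVER_ENDPOINT, MLSERVER_USER, MLSERVER_PASSWORD), the service name, and the model itself are my own placeholders rather than anything prescribed by the package.

```r
# Sketch: publish a small model as a web service with mrsdeploy.
# Assumptions: the environment variables and service name are placeholders.
cars_model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# The function the service exposes: the probability that a car with the
# given horsepower and weight has a manual transmission.
predict_manual <- function(hp, wt) {
  new_car <- data.frame(hp = hp, wt = wt)
  answer <- unname(predict(cars_model, new_car, type = "response"))
  answer
}

endpoint <- Sys.getenv("MLSERVER_ENDPOINT")   # e.g. http://localhost:12800

if (nzchar(endpoint) && requireNamespace("mrsdeploy", quietly = TRUE)) {
  library(mrsdeploy)

  # Authenticate without opening an interactive remote R session.
  remoteLogin(endpoint,
              username = Sys.getenv("MLSERVER_USER"),
              password = Sys.getenv("MLSERVER_PASSWORD"),
              session = FALSE)

  api <- publishService(
    "manualTransmissionService",
    code = predict_manual,
    model = cars_model,
    inputs = list(hp = "numeric", wt = "numeric"),
    outputs = list(answer = "numeric"),
    v = "v1.0.0"
  )

  # Call the freshly published service from R.
  result <- api$predict_manual(120, 2.8)
  print(result$output("answer"))

  remoteLogout()
}
```

Because the scoring function is plain R, it can be checked locally with predict_manual(120, 2.8) before anything is published.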

Microsoft R Client

  • This is a free data science tool built on top of Microsoft R Open.
  • Allows you to work with data locally and then offload to a remote compute context for more power.
  • Includes the RevoScaleR package, so you can use its functions as part of this.
  • Its aim is to enable local development and exploration.

Microsoft Machine Learning Server

  • A standalone server, installed on a computer that is not running SQL Server.
  • Enterprise data analysis at scale - providing high performance and enterprise robustness.
  • Supports R and Python.
  • Secure environment for deploying and operationalizing machine learning models.
  • Makes it easy to deploy your models as a web service.
  • Ability to scale out using either Spark, Hadoop, SQL Server, or multiple nodes of ML Server.
  • Microsoft Machine Learning Server stand-alone for Linux or Windows is licensed core-for-core as SQL Server 2017.
  • All customers who have purchased Software Assurance (SA) for SQL Server Enterprise Edition are entitled to use 5 nodes of Microsoft Machine Learning Server for Hadoop/Spark for each core of SQL Server 2017 Enterprise Edition under SA. In addition, Microsoft is removing the per-node core limit; customers can have unlimited cores per node of Machine Learning Server for Hadoop/Spark.

Microsoft SQL Server 2017 Machine Learning Services

  • Builds on R support in SQL Server 2016
  • Integrates machine learning into the database, with support for both R and Python.
  • Can perform far better than conventional R because scripts can use the server's resources and RevoScaleR to scale out.
  • This is built into the database engine (as opposed to the standalone server described above).
  • Execute R scripts via sp_execute_external_script
  • Supports in-database package management
  • Supports native scoring via the T-SQL PREDICT function, which can score data without needing to load the R environment.
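
Scripts passed to sp_execute_external_script exchange data with SQL Server through data frames: by default the input query arrives as InputDataSet, and whatever the script assigns to OutputDataSet is returned as the result set. The sketch below shows only the R side, with InputDataSet stubbed from the built-in mtcars data so it can be tried outside the database; the columns and the aggregation are illustrative assumptions.

```r
# Stand-in for the data frame SQL Server would supply from the input query.
# Inside the database this assignment is not needed.
InputDataSet <- mtcars[, c("cyl", "mpg")]

# --- body that would be passed as the @script argument ---
# Average miles per gallon for each cylinder count.
OutputDataSet <- aggregate(mpg ~ cyl, data = InputDataSet, FUN = mean)
names(OutputDataSet) <- c("cyl", "avg_mpg")
# --- end of script body ---

print(OutputDataSet)
```

Developing the script body this way, against a local stand-in, makes it easier to debug before wrapping it in the stored procedure call.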

Power BI and R

  • The Power BI service supports viewing and interacting with visuals created with R scripts.
  • Note that the service does not support every R package.
  • R visuals that are created in Power BI Desktop, and then published to the Power BI service, for the most part behave like any other visual in the Power BI service; you can interact, filter, slice, and pin them to a dashboard, or share them with others.

Azure Databricks

  • Can create notebooks and workflows using R or SparkR
  • Supports CRAN packages.
  • Leverage SparkR to take advantage of Spark (scale out etc.) for R jobs.
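
As a small taste of SparkR, the sketch below copies a built-in data set into Spark and aggregates it there. It assumes it is run somewhere SparkR and a Spark installation are available, such as a notebook attached to a cluster; the data set and the aggregation are only for illustration.

```r
# Sketch: aggregate a data set with SparkR so the work runs on the cluster.
# Assumption: SparkR and a working Spark installation are available.
spark_available <- requireNamespace("SparkR", quietly = TRUE)

if (spark_available) {
  library(SparkR)
  sparkR.session()                      # connect to (or start) a Spark session

  df <- as.DataFrame(faithful)          # copy a local data frame into Spark

  # Count eruptions per waiting time, computed by Spark rather than local R.
  counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
  print(head(arrange(counts, desc(counts$count))))
} else {
  message("SparkR is not installed; skipping the example.")
}
```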

R with HDInsight

  • HDInsight includes an option to spin up a Machine Learning Server (previously called R Server) to integrate with your HDI cluster.
  • Execute R scripts with a Spark or Hadoop compute context to distribute the job across the cluster.
  • Use the ScaleR functions from the RevoScaleR package to ensure your R functions run across the cluster.

R in Azure Batch

  • doAzureParallel is a lightweight R package that allows you to use Azure Batch directly from your R session.
  • Built on top of the R foreach package - takes each iteration of the foreach loop and submits it as an Azure Batch task.
  • Leverage low-priority VMs to significantly reduce the cost.
  • Azure Batch allows you to create a pool of VMs which you can use to run jobs in parallel achieving better scale out and more efficiency.
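
Here is a minimal sketch of that foreach pattern. It assumes credentials.json and cluster.json are configuration files you have already generated and filled in for your own Azure account (the file names are placeholders, and the pool size and VM types are defined in the cluster file); the simulation function is purely illustrative. When those files are missing, the sketch falls back to running the same function sequentially.

```r
# Sketch: run foreach iterations as Azure Batch tasks with doAzureParallel.
# Assumptions: credentials.json and cluster.json are placeholder config files;
# simulate_mean is an illustrative stand-in for real work.
simulate_mean <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

azure_ready <- file.exists("credentials.json") &&
  file.exists("cluster.json") &&
  requireNamespace("doAzureParallel", quietly = TRUE)

if (azure_ready) {
  library(doAzureParallel)   # also attaches foreach

  setCredentials("credentials.json")
  cluster <- doAzureParallel::makeCluster("cluster.json")  # provisions the pool
  registerDoAzureParallel(cluster)

  # Each iteration is submitted as an Azure Batch task.
  results <- foreach(seed = 1:10) %dopar% simulate_mean(seed)

  doAzureParallel::stopCluster(cluster)    # release the pool when done
} else {
  # Without Azure configured, the same function still runs sequentially.
  results <- lapply(1:10, simulate_mean)
}

print(unlist(results))
```

Keeping the per-iteration work in a plain function means it can be tested locally before paying for a pool.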