RevoScaleR
Original author(s)	Microsoft
Initial release	2016
Written in	R
Platform	Windows, Linux
Available in	R
Website	docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler

RevoScaleR is a machine learning package in R created by Microsoft. It is available as part of Machine Learning Server, Microsoft R Client, and Machine Learning Services in Microsoft SQL Server 2016.

The package contains functions for fitting linear models, logistic regressions, random forests, decision trees, boosted decision trees, and K-means clustering, in addition to summary functions for inspecting and visualizing data.

It has a Python counterpart called revoscalepy. Another closely related package is MicrosoftML, which contains machine learning algorithms that RevoScaleR does not have, such as neural networks and support vector machines (SVMs).

In June 2021, Microsoft announced that it would open-source the RevoScaleR and revoscalepy packages, making them freely available under the MIT License.

Concepts

Many R packages are designed to analyze data that fits in the memory of a single machine, and they usually do not make use of parallel processing. RevoScaleR was designed to address these limitations. Its functions are organized around three main abstractions that users specify in order to process large amounts of data that might not fit in memory and to exploit parallel resources to speed up the analysis.
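
The chunk-at-a-time idea can be illustrated outside R. The following is a minimal Python sketch, not RevoScaleR code: it computes a count, mean, and sample variance over data delivered in chunks, so that only one chunk is held in memory at a time. The names `chunked` and `summarize_in_chunks` and the default chunk size are hypothetical and chosen for illustration only.

```python
from itertools import islice


def chunked(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def summarize_in_chunks(values, chunk_size=1000):
    """Return (count, mean, sample variance), holding one chunk in memory at a time."""
    count, total, total_sq = 0, 0.0, 0.0
    for chunk in chunked(values, chunk_size):
        # Only running totals survive between chunks, never the rows themselves.
        count += len(chunk)
        total += sum(chunk)
        total_sq += sum(v * v for v in chunk)
    if count < 2:
        raise ValueError("need at least two values to compute a sample variance")
    mean = total / count
    variance = (total_sq - count * mean * mean) / (count - 1)
    return count, mean, variance
```

The sum-of-squares formula is used here for brevity; it can lose precision on data with a large mean, so a production implementation would more likely merge per-chunk means and variances.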

Compute contexts

A compute context refers to the location where the computation on the data happens. It can be "local" (on the client machine) or "remote" (on a data platform such as SQL Server or Spark). Pushing the computation to a remote server lets users take advantage of the greater compute resources that the remote machine may have. If the data being analyzed resides on that remote machine, using a remote compute context also removes the need to pull the data across the network onto the client machine.
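
The difference between moving the data and moving the computation can be shown with a toy model. This is a conceptual Python sketch under stated assumptions, not the RevoScaleR API: `ToyServer`, `run_analysis`, and the `"local"` / `"remote"` strings are hypothetical stand-ins, and the "network" is just a counter of rows handed to the client.

```python
class ToyServer:
    """Stands in for a remote data platform that holds the rows."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_sent = 0  # rows shipped over the pretend network

    def fetch_rows(self):
        """Send every row to the caller, as a local compute context requires."""
        self.rows_sent += len(self._rows)
        return list(self._rows)

    def run(self, func):
        """Run func next to the data and send back only its result."""
        return func(self._rows)


def run_analysis(func, server, context):
    """Run func on the server's data in the given (hypothetical) compute context."""
    if context == "local":
        return func(server.fetch_rows())  # the data moves to the computation
    if context == "remote":
        return server.run(func)  # the computation moves to the data
    raise ValueError(f"unknown compute context: {context!r}")


def mean(rows):
    """Example analysis: arithmetic mean of a list of numbers."""
    return sum(rows) / len(rows)
```

Calling `run_analysis(mean, server, "local")` and `run_analysis(mean, server, "remote")` returns the same value, but only the local call increases `server.rows_sent`.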

Data sources

A data source defines where the data comes from. RevoScaleR provides various data sources, such as text files, XDF files, in-SQL data, and Spark DataFrames. Users wrap their data in a data source object and pass that object to the analytics functions, which can then run in different compute contexts. Which data sources are available depends on the compute context. For example, if the compute context is set to SQL Server, the only data source one can use is an in-SQL data source.
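
The value of wrapping data in a source object is that the analysis code no longer depends on where the rows come from. The Python sketch below illustrates that pattern only; `TextSource`, `InMemorySource`, and `column_sum` are hypothetical names and do not correspond to RevoScaleR or revoscalepy classes.

```python
import csv
import io


class TextSource:
    """Hypothetical data source that reads one numeric column from CSV text."""

    def __init__(self, text, column):
        self.text = text
        self.column = column

    def chunks(self, chunk_size):
        """Yield lists of at most chunk_size floats parsed from the column."""
        reader = csv.DictReader(io.StringIO(self.text))
        chunk = []
        for row in reader:
            chunk.append(float(row[self.column]))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


class InMemorySource:
    """Hypothetical data source backed by a list already in memory."""

    def __init__(self, values):
        self.values = list(values)

    def chunks(self, chunk_size):
        """Yield slices of at most chunk_size values."""
        for start in range(0, len(self.values), chunk_size):
            yield self.values[start:start + chunk_size]


def column_sum(source, chunk_size=2):
    """Analysis that works with any object exposing chunks(chunk_size)."""
    return sum(sum(chunk) for chunk in source.chunks(chunk_size))
```

Because `column_sum` relies only on the `chunks` method, the same call works for `TextSource("x,y\n1,10\n2,20\n", "y")` and for `InMemorySource([10, 20])`.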

Analytics

Analytic functions in RevoScaleR take a data source object, a compute context, and the other parameters needed to build the specific model, such as the formula for a logistic regression or the number of trees in a random forest. In addition to those parameters, one can also specify the level of parallelism, such as the size of the data chunk for each process or the number of processes used to build the model. However, parallelism is available only in the non-Express edition.
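
How a chunk size and a worker count can drive a model fit is sketched below in Python. This is an illustration under stated assumptions, not RevoScaleR's algorithm: it fits a one-variable linear model by computing sufficient statistics per chunk and summing them. The names `chunk_statistics` and `fit_line` and the parameters `chunk_size` and `workers` are hypothetical. Threads stand in for worker processes to keep the example self-contained; in CPython they show the structure of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_statistics(chunk):
    """Sufficient statistics of one chunk of (x, y) pairs for a simple linear fit."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def fit_line(pairs, chunk_size=100, workers=2):
    """Fit y = intercept + slope * x by combining per-chunk statistics.

    pairs must be a non-empty list of (x, y) tuples with at least two distinct x.
    """
    chunks = [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_statistics, chunks))
    # Each statistic is additive, so chunk results can be merged by summing.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*partials))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

The result does not depend on `chunk_size` or `workers`, because the per-chunk statistics are summed before the coefficients are computed; those two parameters only change how the work is divided.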

Limitations

The package is mostly meant to be used with SQL Server or other remote machines. To fully leverage the abstractions it uses to process a large dataset, one needs a remote server and a non-Express edition of the package. Unlike most open-source R packages, it cannot be installed simply by running install.packages("RevoScaleR"). It is available only through Microsoft R Client (a distribution of R for data science), Microsoft Machine Learning Server (stand-alone, with no SQL Server attached), or Microsoft Machine Learning Services (a SQL Server service). However, one can still use the analytics functions in the free Express edition of the package.

