During this course you will learn how to use Microsoft R Server to create and run an analysis on a large dataset, and how to use it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.
The primary audience for this course is people who wish to analyze large datasets within a big data environment. The secondary audience is developers who need to integrate R analyses into their solutions.
Module 1: Microsoft R Server and R Client
This module gives an overview of how Microsoft R Server and Microsoft R Client work.
- What is Microsoft R Server
- Using Microsoft R Client
- The ScaleR functions
Module 2: Exploring Big Data
This module covers how to use R Client with R Server to explore big data held in different data stores.
- Understanding ScaleR data sources
- Reading data into an XDF object
- Summarizing data in an XDF object
Module 3: Visualizing Big Data
This module covers how to visualize data by using graphs and plots.
- Visualizing in-memory data with ggplot2
- Visualizing big data with rxLinePlot and rxHistogram
Module 4: Processing Big Data
This module explains how to transform and clean big data sets.
- Transform big data using rxDataStep
- Perform sort and merge operations over big data sets
Module 5: Parallelizing Analysis Operations
This module explains how to implement options for splitting analysis jobs into parallel tasks.
- Use the rxLocalParallel compute context with rxExec
- Use the RevoPemaR package to write customized scalable and distributable analytics
Module 6: Creating and Evaluating Regression Models
This module covers how to build and evaluate regression models generated from big data.
- Cluster big data to reduce the size of a dataset.
- Create linear and logit regression models and use them to make predictions.
Module 7: Creating and Evaluating Partitioning Models
This module explains how to create and score partitioning models generated from big data.
- Create partitioning models using the rxDTree, rxDForest, and rxBTrees algorithms.
- Test partitioning models by making and comparing predictions.
Module 8: Processing Big Data in SQL Server and Hadoop
This module covers how to use R to process big data held in SQL Server and Hadoop.
- Using R in SQL Server
- Using Hadoop MapReduce
- Using Spark on Hadoop
Prerequisites
- Programming experience using R, and familiarity with common R packages.
- Knowledge of common statistical methods and data analysis best practices.
- Basic knowledge of the Microsoft Windows operating system and its core functionality.