Analyzing Big Data with Microsoft R

During this course you will learn how to use Microsoft R Server to create and run an analysis on a large dataset, and how to utilize it in Big Data environments, such as a Hadoop or Spark cluster, or a SQL Server database.

Target audience

The primary audience for this course is people who wish to analyze large datasets within a big data environment. The secondary audience is developers who need to integrate R analyses into their solutions.

Topics

Module 1: Microsoft R Server and R Client

This module gives an overview of how Microsoft R Server and Microsoft R Client work.

  • What is Microsoft R Server
  • Using Microsoft R Client
  • The ScaleR functions

Module 2: Exploring Big Data

This module covers how to use R Client with R Server to explore big data held in different data stores.

  • Understanding ScaleR data sources
  • Reading data into an XDF object
  • Summarizing data in an XDF object

Module 3: Visualizing Big Data

This module covers how to visualize data by using graphs and plots.

  • Visualizing in-memory data with ggplot2
  • Visualizing big data with rxLinePlot and rxHistogram

Module 4: Processing Big Data

This module explains how to transform and clean big data sets.

  • Transform big data using rxDataStep
  • Perform sort and merge operations over big data sets

Module 5: Parallelizing Analysis Operations

This module explains how to implement options for splitting analysis jobs into parallel tasks.

  • Use the rxLocalParallel compute context with rxExec
  • Use the RevoPemaR package to write customized scalable and distributable analytics.

Module 6: Creating and Evaluating Regression Models

This module covers how to build and evaluate regression models generated from big data.

  • Cluster big data to reduce the size of a dataset.
  • Create linear and logit regression models and use them to make predictions.

Module 7: Creating and Evaluating Partitioning Models

This module explains how to create and score partitioning models generated from big data.

  • Create partitioning models using the rxDTree, rxDForest, and rxBTree algorithms.
  • Test partitioning models by making and comparing predictions.

Module 8: Processing Big Data in SQL Server and Hadoop

This module covers how to transform and clean big data sets held in SQL Server and Hadoop.

  • Using R in SQL Server
  • Using Hadoop Map/Reduce
  • Using Hadoop Spark

Prerequisites

  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.

About the course

Price: SEK 25,950.00 (excluding VAT)

Duration: 3 days
Course code: M20773

Software Assurance: SA vouchers are valid for this course.
Kompetenskort: Kompetenskort is valid for this course.

Location and course start

LiveClass (lc) means that the course is delivered as instructor-led, interactive online training.

5 March