Meet and learn from an internationally recognised master! You now have the chance to attend an exclusive masterclass with SQL and BI guru Rafal Lukawiecki. The course is an intensive four-day class in which you learn the latest in Azure Machine Learning, SQL data mining, and Revolution Analytics R software.
You will learn machine learning, data mining, some statistics, data preparation, and how to interpret the results. You will see how to formulate business questions in terms of data science hypotheses and experiments, and how to prepare inputs to answer those questions. We will cover common issues and mistakes, such as overtraining, and how to resolve them, as well as how to cope with rare events, such as fraud. At the end of this course you will be able to plan and run data science projects.
As a practicing data miner, Rafal will also share his decade of hands-on experience while teaching you about Azure Machine Learning (Azure ML), which is the foundation of Cortana Analytics Suite, and its highly visual, on-premises companion, the SQL Server Analysis Services Data Mining engine, supplemented with R, both the free open-source software and Cortana's Revolution Analytics version. We will use some Excel; however, most of our time will be spent in ML Studio, with some in R, RStudio, SSDT, SSMS, and the Azure Portal.
Above all, this course will focus on the process of analytics, consisting of problem formulation, data preparation, modelling, validation, and deployment. All fundamental concepts will be introduced, including a good discussion of inputs and predictive outcomes; cases, observations, and signatures; algorithm classes and the key algorithms in each class; and approaches to validation, including balancing the fundamental statistics of classifiers using lift charts, ROC curves, or precision-recall charts.
At the end of these four days you will be able to begin your own advanced analytical projects using the best combination of software, statistics, and algorithmic approaches.
About Rafal Lukawiecki
As Strategic Consultant at Project Botticelli Ltd (projectbotticelli.com), Rafal focuses on making advanced analytics easy, insightful, and useful, helping clients achieve better organizational performance. Passing those skills to consultants, developers, and board members is important to him. He specializes in business intelligence, looking for valuable patterns and correlations using data mining, and he is also known for his work in cryptography, enterprise architecture, and solution delivery. Rafal has been a popular, well-travelled speaker at major IT conferences since 1998. He even had the honour of sharing keynote platforms with Bill Gates, Neil Armstrong, and Steve Ballmer. A natural educator, he explains complex concepts in simple terms in an engaging, enjoyable, energetic style. Outside IT, Rafal spends a quarter of every year finding abstractions in natural landscapes, expressing them through traditional, black-and-white, large-format film photography in his hand-made, silver-gelatin prints; see rafal.net.
Analysts, analytical power users, predictive developers, BI power users and developers, budding data scientists, consultants.
There are no formal prerequisites for this course, because everyone will benefit from the lectures and the discussions. If you want to follow the demos and examples on a PC, however, some knowledge of SQL Server Data Tools and Excel, as well as a basic ability to write SQL queries, will help. It will also help if you have some experience of analytical projects, and if you already have questions in mind that you think predictive analytics and machine learning could answer.
60% lectures interspersed with 10% demos, plus approximately 30% of the time allocated for you to practice the demos while Rafal helps you resolve any issues and answers group questions. You will be provided with a PC, but you are welcome to bring your own laptop instead. You should have your own account with access to Azure Machine Learning configured (both the free and paid-for versions are acceptable). You do not need to practice: if you prefer, you can use the available time for a discussion of your own data and projects. You are free to take our data samples and PPT slides, but no formal notes or workbooks are provided. A follow-up book-reading list will be shared.
Please note that this agenda is subject to last-minute alterations to best suit the needs and the flow of this live classroom course. Learning points marked with an asterisk (*) are optional, and will be covered subject to interest and time remaining.
Day 1: Overview of Practical Data Science for Business
- Introduction to data science and its components (machine learning/data mining, statistics, big data and data wrangling)
- Team, process, and tools
- Inputs and outputs
- Stating business questions in data science terms
- Scientific method and experiments
- Data formats: cases/observations, signatures
- High-level overview of algorithm classes (classifiers, clustering, regression, recommenders)
- Moving data around and its storage
- Getting started with and using Azure ML, SSAS DM, and R (structures, models, data flow)
Day 2: Segmentation and Classification
- Introduction to segmentation
- Clustering algorithms (k-means, EM, and others)
- Interpreting clusters (cluster characteristics, discrimination, tornado charts)
- Introduction to classifiers (two-class and multi-class)
- Key algorithms (decision trees/forests, neural networks, naïve Bayes, boosting, and others)
- Class imbalance problem (fraud analytics and rare event prediction) *
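To give a flavour of the segmentation topics listed for Day 2, here is a minimal sketch of k-means clustering on one-dimensional data. It is written in plain Python for brevity, which is an assumption of this illustration and not one of the course tools (Azure ML, SSAS, and R). The function name `kmeans_1d`, the sample `spend` values, and the starting centroids are all hypothetical.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(
    values: Sequence[float],
    initial_centroids: Sequence[float],
    max_iterations: int = 100,
) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D values by repeating two steps: assign, then re-centre."""
    centroids = list(initial_centroids)
    clusters: List[List[float]] = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customer spend figures forming two obvious segments.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(spend, initial_centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

Real projects cluster on many attributes at once and are sensitive to the choice of starting centroids and the number of clusters, which is part of why interpreting clusters gets its own agenda item.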
Day 3: Model Validation and Statistics
- Descriptive statistics with R
- Interpreting classifier quality
- Testing model accuracy, reliability, and usefulness
- False positives vs. false negatives: classification (confusion) matrix
- Charting precision-recall (sensitivity-specificity) with ROC curves, lift charts, and precision-recall charts
- Optimising binary classifier thresholds to meet a known business goal for prediction quality
- Refining models to improve accuracy and reliability *
- Avoiding over-training (over-fitting) in critical situations *
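The Day 3 ideas of a confusion matrix, precision and recall, and threshold optimisation can be sketched in a few lines. The example below is plain Python, chosen for this illustration and not one of the course tools. The scores, labels, cost figures, and function names are hypothetical. It assumes the business goal can be expressed as a cost per false positive and a cost per false negative.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], cost_fp: float, cost_fn: float
) -> Tuple[float, float]:
    """Try every observed score as a threshold and keep the cheapest one."""
    best: Optional[Tuple[float, float]] = None
    for threshold in sorted(set(scores)):
        _, fp, _, fn = confusion_counts(scores, labels, threshold)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (threshold, cost)
    assert best is not None, "scores must not be empty"
    return best


# Hypothetical classifier scores and true labels (1 = fraud, 0 = not fraud).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.5)
print(tp, fp, tn, fn)                # 3 1 3 1
print(precision_recall(tp, fp, fn))  # (0.75, 0.75)

# Assume a missed fraud costs five times as much as a false alarm.
print(best_threshold(scores, labels, cost_fp=1, cost_fn=5))  # (0.2, 3)
```

With these made-up costs the cheapest threshold is far below 0.5, because a missed positive is assumed to cost more than a false alarm. Different cost assumptions would move it.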
Day 4: Regressions, Recommenders, Other Algorithms, Production & Model Maintenance
- Deploying models to production (Azure ML web services, DMX queries, PMML)
- On-going maintenance and model updates *
- Introduction to recommender concepts
- Key recommender algorithms (association rules, collaborative filtering, matchbox recommenders, associative decision trees, Market Basket Analysis)
- Understanding itemsets and rules
- Rule importance vs. rule probability
- Introduction to simple regressions
- Key regression algorithms (linear regression, regression decision trees)
- Measuring linear regression quality (R-squared, predictor p-values, additional testing using R) *
- Brief overview of the remaining algorithms of interest: sequence clustering, SVM, perceptrons, Bayes point machine *
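As a small illustration of the itemset and rule topics listed for Day 4, the sketch below computes the support, confidence (the rule's probability), and lift of a rule over a handful of shopping baskets. It is plain Python, chosen for this illustration and not one of the course tools. The baskets and function names are hypothetical, and lift is used here only as a generic stand-in for an importance-style measure; individual tools may define rule importance differently.

```python
from typing import Dict, Iterable, List, Set


def support(baskets: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(items)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def rule_metrics(
    baskets: List[Set[str]], antecedent: Iterable[str], consequent: Iterable[str]
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    s_antecedent = support(baskets, antecedent)
    s_consequent = support(baskets, consequent)
    s_both = support(baskets, antecedent | consequent)
    # Confidence: probability of the consequent given the antecedent.
    confidence = s_both / s_antecedent if s_antecedent else 0.0
    # Lift: how much more likely the consequent is than its baseline rate.
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

butter_rule = rule_metrics(baskets, {"bread"}, {"butter"})
bread_rule = rule_metrics(baskets, {"milk"}, {"bread"})
print(butter_rule)
print(bread_rule)
```

Both rules have the same confidence of about 0.67, yet bread -> butter has a lift above 1 while milk -> bread has a lift below 1. This is one way to see why a rule's probability alone does not say how informative the rule is.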