R is one of the most popular and powerful data analytics languages and environments in use by data scientists. Actionable business data is often stored in Relational Database Management Systems (RDBMS), and one of the most widely used RDBMS is Microsoft SQL Server. Much more than a database server, it's a rich ecosystem with advanced analytic capabilities. Microsoft SQL Server R Services combines these environments, allowing direct interaction between the data in the RDBMS and the R language while preserving the security and safety that the RDBMS provides. In this book, you'll learn how Microsoft has combined these two environments and how a data scientist can use this new capability, and you'll work through practical, hands-on examples of using SQL Server R Services to create real-world solutions.

How this book is organized

This book is divided into three primary sections: an introduction to SQL Server R Services and to SQL Server in general; a description and explanation of how a data scientist works in this new environment (useful, given that many data scientists work in "silos," and this new way of working brings them into the business development process); and practical, hands-on examples of working through real-world solutions. The reader can either review the examples or work through them alongside the chapters.

Who this book is for

The intended audience for this book is technical, specifically data scientists, who are assumed to be familiar with the R language and environment. We do, however, briefly introduce data science and the R language and point to many resources for learning those disciplines, which puts this book within reach of database administrators, developers, and other data professionals. Although we do not cover the totality of SQL Server in this book, we provide references and explain some concepts in case you are not familiar with SQL Server, as is often the case with data scientists.

Wee-Hyong Tok

Wee-Hyong Tok is a senior program manager on the SQL Server team at Microsoft. Wee-Hyong has a wide range of experience working with data, with more than six years of data platform experience in industry and six years of academic experience. After obtaining his PhD in data streaming systems from the National University of Singapore, he joined Microsoft and worked on SQL Server Integration Services (SSIS). He was responsible for shaping the SSIS Server, bringing it from concept to its inclusion in SQL Server 2012. Wee-Hyong has published 20 academic papers and speaks regularly at technology conferences.

Buck Woody

Buck Woody is a senior technical specialist for Microsoft, working with enterprise-level clients to develop computing platform architecture solutions within their organizations. With more than 25 years of professional and practical experience in computer technology, he is also a popular speaker at TechEd, PASS, and many other conferences. Buck is the author of more than 500 articles and five books on databases and teaches a database design course at the University of Washington.

Debraj GuhaThakurta

Debraj GuhaThakurta is a senior data scientist at Microsoft in the Algorithms and Data Science group. His work focuses on using platforms and toolkits such as Microsoft's Cortana Intelligence Suite, Microsoft R Server, SQL Server, Hadoop, and Spark to create scalable, operationalized analytical processes for business problems. Debraj has extensive industry experience in the biopharma and financial forecasting domains. He has a Ph.D. in chemistry and biophysics, and post-doctoral research experience in machine learning applications in bioinformatics. He has published more than 25 peer-reviewed papers, book chapters, and patents.

Danielle Dean

Danielle Dean is a senior data scientist lead at Microsoft in the Algorithms and Data Science group. She leads a team of data scientists and engineers on end-to-end analytics projects that use Microsoft's Cortana Intelligence Suite for applications ranging from automating the ingestion of data to analyzing and implementing algorithms, creating web services of these implementations, and integrating them into customer solutions or building end-user dashboards and visualizations. Danielle holds a Ph.D. in quantitative psychology from the University of North Carolina at Chapel Hill, where she studied the application of multilevel event history models to understand the timing and processes leading to events between dyads within social networks.

Gagan Bansal

Gagan Bansal is a data scientist leading the development of financial forecasting capabilities in Cortana Analytics at Microsoft. Gagan joined Microsoft from Yahoo Labs, where he was a lead engineer building and deploying large-scale user modeling and scoring pipelines on both grid (Hadoop) and stream scoring systems for display-ad targeting applications. Prior to Yahoo!, he worked on social targeting in online advertising at 33Across. Before that, he worked for another startup, where he was involved in developing real-time video processing algorithms for advertising in sports broadcasts. Gagan obtained his master's degree in computer science from Johns Hopkins University, where he worked on pedestrian detection in videos for his thesis. Before that, he graduated with a bachelor's degree in computer science from the Indian Institute of Technology, Delhi. Gagan enjoys working on problems related to machine learning, large-scale data processing, computer systems, and image processing.

Matt Conners

Matt Conners is a senior data sciences program manager in Microsoft's Algorithms and Data Sciences group. He is focused on the forecasting domain, working with customers, partners, and data scientists to operationalize machine learning financial forecasting solutions. He has extensive business operations and industry domain experience, with more than 20 years of financial technology experience across sales, marketing, business operations, securities, and banking. He has an undergraduate degree in economics and master's degrees in finance and statistics.