Chapter 1 Preface

Data Science is a popular buzzword, but in reality it has existed in some form since the birth of applied statistics. Some might even argue that data science is nothing more than applied statistics practiced at scale. The exact definition of data science should be left to others, but this book needs a working definition. Here, data science means applied statistics, predictive analytics and modeling applied to large data in order to derive insights, understand the world and make better data-driven decisions. It brings into its toolkit principles that emerged from statistics, computer science and the social sciences while taking advantage of large, and often open, data sources. That’s that. There will always be detractors from this definition, but at the very least it is a start for the book.

Now on to Institutional Research, which is itself a somewhat hard-to-define discipline. Some might even argue about whether Institutional Research is a discipline at all. Institutional Research is definitely a discipline in search of a definition, but one definition that I tend to agree with is by Joe Saupe:

“to provide information which supports institutional planning, policy formation, and decision making”

This statement encompasses at least the core functions of any Institutional Research department. However, what separates Institutional Research from many other enterprises in analytics and data-driven decision making is that the subjects of its study are generally students. These students are afforded all kinds of protections by the Family Educational Rights and Privacy Act (FERPA) and can be minors. Institutional Research often operates in a somewhat gray area regarding Institutional Review Boards, as its studies are certainly human-subject based. In addition to these more focused studies, there are studies on the “brass tacks” of the operation of Higher Education, in the form of workload studies and government reporting (such as the Integrated Postsecondary Education Data System, or IPEDS).

The purpose of this book, then, is to be a practical guide to actually practicing data science and applied statistics in Institutional Research. The programming language of choice is R. R is open source, freely available and a powerful tool. This should allow all readers to put the items discussed in this book directly into practice.