Data Science for Institutional Research
1 Preface
2 Introduction
2.1 Taking on the Discipline
3 APIs
3.1 What is it?
3.2 Knock, Knock
3.3 Where Do I Find Them?
3.4 How to use it
3.5 Commonly Used APIs
4 Cleaning and Transforming Data
4.1 Searching for issues
4.1.1 Checks for data integrity
4.2 Wrangling
4.2.1 Gather/Spread
4.2.2 SQL-esque
4.3 Transformations
4.3.1 Logs
4.3.2 Square Roots
5 Import
5.1 Generic Files
5.2 Databases
6 Literature
6.1 Borrow From
7 Maps and Geospatial Analysis
7.1 Map It!
7.2 Like the census!
7.3 So What?
8 Computing
8.1 High Performance Clusters
8.1.1 Compute or Memory
8.2 Cloud Computing and Storage
8.3 On Premises Storage
9 Confounding
9.1 Impact
9.2 Matched Pairs
9.3 Propensity Scores
9.4 Regression
10 Methods
10.1 Linear Regression
10.2 Logistic Regression
10.3 Hierarchical Modeling
10.3.1 Bayesian Hierarchical Modeling
10.4 Being Certain about What we can
10.5 Hypothesis Testing
10.6 Dimensionality Reduction
10.6.1 Principal Components Analysis
10.6.2 Factor Analysis
10.7 Advanced Statistical Learning
10.8 Supervised Learning
10.8.1 Naive Bayes
10.8.2 Linear Discriminant Analysis
10.8.3 Decision Trees
10.8.4 Random Forest
10.8.5 Gradient Boosted Machines
10.8.6 Partial Least Squares Regression
10.8.7 Neural Networks
10.9 Unsupervised Learning
10.9.1 Hierarchical Clustering
10.9.2 Cluster Analysis
11 Reproducibility and Documentation
11.1 A Common Reporting Structure
11.1.1 Executive Summary
11.1.2 Background
11.1.3 Method
11.1.4 Results
11.1.5 Conclusions
11.1.6 Recommendations
11.1.7 What is it good for?
11.2 Short Form Report
12 Dealing with Survey Data
12.1 Survey Analysis
12.1.1 Simple Random Sample
12.1.2 Stratified Random Sample
12.1.3 Post Stratification
12.1.4 Finite Population Corrections
12.1.5 Finite Population Correction
12.1.6 But is it representative
12.1.7 Non-Response Bias
12.1.8 Confirmation Bias
12.2 Constructs
13 Applications
13.1 Example one
13.2 Example two
14 Visualisation
15 Ethics
15.1 Institutional Review Board
16 Experimental Design
16.1 The Fundamental Problem of Causal Inference
16.1.1 Draw it (DAGs)
16.1.2 Validity
16.1.3
16.2 Causal Diagrams
16.3 Power
17 Data Formatting Standards
17.1 Formatting
17.2 Some guidelines
18 Missing Data
18.1 Missing Completely at Random
18.2 Missing at Random
18.3 Missing Not at Random
18.4 Missing
18.5 Solutions
18.5.1 Multiple Imputation
18.5.2 Bayesian Missing Data Techniques
18.5.3 Can you go without?
18.6 Meaning in Missing?
19 Model testing
20 The Analysis Workflow
20.1 Answering (and asking) the Right Question
20.1.1 The Right Question
20.2 A Common Project Structure
20.2.1 Initiate with a project
20.2.2 File Structure
20.2.3 Function Writing
20.3 Version Control
21 Introduction to R and RStudio
22 Final Words
References
Chapter 14 Visualisation
Tables
Charts
Exploratory Data Analysis
Graphics for Communication
Fonts