Data Science Interviews

This deck helps you study the material behind common data science interview questions. It focuses on data science theory, mostly statistics and machine learning, rather than practice (it contains no code). It does not contain specific problems but rather the knowledge needed to solve them. Requires Anki 2.1+ for MathJax equations. Questions are tagged as high-level (broad, important, conceptual knowledge), medium-level, or low-level (less common material, details, equations). A good study strategy is to start with all the high-level questions, then move on to medium and low. Questions are grouped into sub-decks by topic, such as stats, supervised learning, and clustering, so you can study just the sub-topics you want to learn and skip those you already know. Material is sourced from around the web and from "Data Science Interviews Exposed," by You et al. As you study, a great list of actual questions (and answers) on which to test your knowledge is http://rpubs.com/JDAHAN/172473.

Front How does a random forest work?
Back Bagging multiple decision trees, plus "feature bagging": selecting each split feature from a random subset of all features, to reduce correlation between trees.
Tags high-level
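
The random forest card above can be illustrated with a small sketch. This is not part of the deck, and it is a toy implementation under stated assumptions rather than a reference one: the function names (`fit_forest`, `build_tree`, and so on), the toy data, the Gini split criterion, and the defaults (`n_trees=25`, `n_sub=1`, `max_depth=5`) are all choices made for illustration. It shows the two ingredients the card names: each tree is trained on a bootstrap sample (bagging), and each split considers only a random subset of features (feature bagging).

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Best (feature, threshold) among the candidate features, or None if no split helps."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree; every split sees only n_sub randomly chosen features."""
    feature_ids = rng.sample(range(len(X[0])), n_sub)  # feature bagging
    split = best_split(X, y, feature_ids) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f, t,
        build_tree([X[i] for i in left], [y[i] for i in left], n_sub, rng, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], n_sub, rng, depth + 1, max_depth),
    )

def predict_tree(node, row):
    """Walk from the root to a leaf. Assumes class labels are not tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, n_sub=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bagging: sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data (made up): two well-separated clusters in two dimensions.
X = [[0.0, 0.2], [0.5, 0.1], [0.3, 0.9], [1.0, 0.4],
     [5.0, 5.2], [5.5, 4.8], [4.7, 5.9], [6.0, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [-1.0, -1.0]), predict_forest(forest, [10.0, 10.0]))
```

Restricting each split to a random feature subset is what keeps the trees from all choosing the same dominant feature, so their errors are less correlated and the vote averages them out more effectively.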
Front Significance test for binary (binomial) outcomes (e.g., conversions out of visitors)
Back \(\chi^2\) test with 1 d.o.f. or, for small samples or unbalanced classes, Fisher's exact test, based on the hypergeometric distribution.
Tags medium-level
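
As a supplement to the card above (again, not part of the deck), here is a minimal standard-library sketch of both tests on a 2x2 table `[[a, b], [c, d]]`, for example conversions and non-conversions for two page variants. The function names and the example counts are made up for illustration. The chi-square version is the plain Pearson statistic without a continuity correction, and the Fisher version uses one common two-sided convention (summing the probabilities of all tables no more likely than the observed one); other conventions exist.

```python
import math

def chi2_test_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.o.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # A chi-square variable with 1 d.o.f. is a squared standard normal,
    # so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical A/B test: 30 of 1000 visitors convert on A, 50 of 1000 on B.
stat, p_chi2 = chi2_test_2x2(30, 970, 50, 950)
p_fisher = fisher_exact_2x2(30, 970, 50, 950)
print(f"chi2 = {stat:.3f}, p = {p_chi2:.4f}, Fisher p = {p_fisher:.4f}")

# Small-sample case where the exact test is the safer choice.
p_small = fisher_exact_2x2(3, 1, 1, 3)
print(f"Fisher p (small sample) = {p_small:.4f}")
```

With large, balanced samples the two p-values tend to be close; with small or very unbalanced counts the chi-square approximation degrades, which is why the card points to the exact test in that case.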
Front Describe something you've enjoyed in your previous experience
Back <your response here>
Tags high-level

