This deck helps you study the material behind common data science interview questions. It focuses on data science theory, mostly statistics and machine learning, rather than practice (it contains no code). It contains general knowledge needed for problem solving, rather than specific problems.
Updated 9/20: Revised to make notes simpler, more concise, and more accurate. Many more pictures. Cards now have a reference link for easy access to source material. Split large cards into several parts.
Statistics (counting and probability, hypothesis testing, distributions)
Questions are grouped into sub-decks by topic, such as stats, supervised learning, and clustering. You can study just the sub-topics you want to learn and skip those you already know.
Questions are tagged as high-level (broad, important, conceptual knowledge), medium-level, and low-level (less common, details, equations). A good study strategy would be to start with all high-level questions, then move on to medium and low.
Material is sourced from around the web and from Data Science Interviews Exposed by You et al. As you study the material, one great list of actual questions (and answers) on which to test your knowledge is Data Science Interview Questions & Detailed Answers.
Does not cover:
Data wrangling; Programming, engineering; Databases, SQL; Natural Language Processing; Deep Learning; Recommender Systems; Bayesian Methods; Time Series Analysis; Anomaly Detection; Visualization; Calculus; or the very basics.
Requires Anki 2.1+ for MathJax equations.
Sample (from 117 notes)
Bootstrap aggregating
Train multiple models on subsamples and average predictions to reduce variance. Usually uses "strong," low-bias models.
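
The deck itself contains no code, but the sample card above can be illustrated with a minimal Python sketch. It assumes a 1-nearest-neighbor regressor as the "strong," low-bias base model and uses made-up toy data; the function names (bootstrap_sample, fit_1nn, fit_bagged) are hypothetical and do not come from the deck. Each subsample is drawn with replacement, as the name "bootstrap" implies.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) points from data, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_1nn(train):
    """Return a 1-nearest-neighbor regressor (low bias, high variance)."""
    def predict(x):
        # Predict the y value of the training point whose x is closest.
        nearest = min(train, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict


def fit_bagged(data, n_models=50, seed=0):
    """Train n_models base models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = [fit_1nn(bootstrap_sample(data, rng)) for _ in range(n_models)]

    def predict(x):
        # Average the individual predictions to reduce variance.
        return statistics.mean([model(x) for model in models])
    return predict


# Toy data (an assumption for illustration): y = 2x plus Gaussian noise.
noise_rng = random.Random(42)
data = [(i / 10, 2 * (i / 10) + noise_rng.gauss(0, 0.5)) for i in range(100)]

single = fit_1nn(data)     # one model: returns a single noisy neighbor
bagged = fit_bagged(data)  # ensemble: averages over nearby neighbors
print(single(5.03), bagged(5.03))
```

A single 1-NN model returns exactly one noisy training value, while the bagged version averages over the slightly different neighbors picked in each bootstrap sample, which is where the variance reduction comes from.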