Diamonds in a Sea of Silver: Handpicked Resources for Data Science

Resources for data science are anything but scarce. However, concise, to-the-point, useful, and thought-provoking resources are less common than one might think. Below, I’ve curated a list of resources that I’ve found succinct, straightforward, practical, and stimulating.

This post will be updated periodically as I discover new gems.


Inference vs Prediction

Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199-231.

Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122.

Bzdok, D., Altman, N., & Krzywinski, M. (2018). Statistics versus machine learning. Nature Methods, 15(4), 233-234.

Statistics Papers

Cinelli, C., Forney, A., & Pearl, J. (2024). A crash course in good and bad controls. Sociological Methods & Research, 53(3), 1071-1104.

Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.

Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231.

Machine Learning

Lantz, B. (2013). Machine learning with R: Learn how to use R to apply powerful machine learning methods and gain an insight into real-world applications. Packt Publishing.
A great place to start. Its explanations, for example of k-nearest neighbors (KNN), are very easy to understand.

Data Sources for Cross-Cultural Research on Threats

Big Data for Psychology

Sample Size Determination and Power Analysis

Random Variables and Probability Distributions

Software Tutorials

Data Preprocessing

Phylogenetic Non-Independence

Presenting the Results