I was doing some research on an algorithm this morning and came across a new book that I wasn’t aware of. That prompted me to look for more. The list of what I came up with is below.
Each of these is free-as-in-beer, which means you can download the complete version without being expected to give anything in return. I think most of them are available for purchase as well, if you prefer a hard copy. Some of them include code samples in R, Python or MATLAB.
Regardless of your background, skills or goals, there’s something for you in this list. Here they are, in no particular order.
- An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie & Tibshirani – This book is fantastic and has helped me quite a bit. It provides an overview of several methods, along with the R code to apply them. 426 Pages.
- The Elements of Statistical Learning by Hastie, Tibshirani & Friedman – This is an in-depth overview of methods, complete with theory, derivations & code. I’d definitely consider this a graduate level text. I’d also consider it one of the best books available on the topic of data mining. 745 Pages.
- A Programmer’s Guide to Data Mining by Ron Zacharski – This one is an online book, each chapter downloadable as a PDF. It’s also still in progress, with chapters being added a few times each year.
- Probabilistic Programming & Bayesian Methods for Hackers by Cam Davidson-Pilon – This book is absolutely fantastic. The author explains Bayesian statistics, provides several diverse examples of how to apply it and includes Python code. Each chapter is an IPython notebook that can be downloaded.
- Think Bayes, Bayesian Statistics Made Simple by Allen B. Downey – Another great, easy-to-digest introduction to Bayesian statistics. The author’s premise is that Bayesian statistics is easier to learn & apply within the context of reusable code samples. It includes a number of examples complete with Python code. 195 Pages.
- Data Mining and Analysis, Fundamental Concepts and Algorithms by Zaki & Meira – This title is new to me. It’s a textbook that looks to be a complete introduction with derivations & plenty of sample problems. 599 Pages.
- An Introduction to Data Science by Jeffrey Stanton – Overview of the skills required to succeed in data science, with a focus on the tools available within R. It has sections on interacting with the Twitter API from within R, text mining, plotting, regression as well as more complicated data mining techniques. 195 Pages.
- Machine Learning by Chebira, Mellouk & others – This is an introduction to more advanced machine learning methods. It includes chapters on neural networks, discriminant analysis, natural language processing, regression trees & more, complete with derivations. Each chapter is downloadable as a PDF. 422 Pages.
- Machine Learning – The Complete Guide – This one is new to me. It’s a collection of Wikipedia articles organized into chapters & downloadable in a number of formats. I didn’t realize they did this, but it’s a great idea. Because it’s a collection of individual articles, it covers quite a bit more material than a single author could write. This is an incredible resource.
- Bayesian Reasoning and Machine Learning by David Barber – This is an undergraduate textbook. It includes an overview, derivations, sample problems and MATLAB code. 648 Pages.
- A Course in Machine Learning by Hal Daumé III – Another complete introduction to machine learning topics. Each chapter is individually downloadable. 189 Pages.
- Information Theory, Inference and Learning Algorithms by David J.C. MacKay – Nice overview of machine learning topics, including an introduction and derivations. One nice feature of this book is that it has a chart that shows how various topics are related to one another. 628 Pages.
I love it that so much material is available for free. All you need is time & motivation and you can be an expert on this topic. If you were to only select one book to pursue from this list, I’d recommend either of the first two.
========================
May 29, 2014: A couple of books have been mentioned in the comments that look noteworthy.
- Modeling with Data by Ben Klemens – Surprisingly, all of the code in this book is C, and Klemens includes a section to defend this choice. The book includes plenty of code samples. 454 Pages.
- Mining of Massive Datasets by Rajaraman & Ullman – This book covers concepts and includes several domain-specific examples. It includes plenty of derivation and little code. 493 Pages.

Please also check out BI and … which is available on Amazon.
Thanks for your comment Chandu. In this post I’m only listing free resources. I’m sure it’s a great book but this isn’t the place to mention items for sale.
Excellent resources. Thanks for taking the time to post. Just bookmarked this page.
Glad you enjoyed it Jacob. I’ve referred to it a few times myself since posting.
Thank you for listing these incredible resources! You don’t need formal education any more with such free resources and some curiosity to explore them!
Thank you for your great work. I have benefited from this information and these resources.
Here is a link describing how to make your own Wikipedia book:
http://en.m.wikipedia.org/wiki/Help:Books
Awesome, thanks for sharing this!
Nice collection of resources!
There’s another free book that you might want to add to the list: Modeling with Data.
This looks great, thanks for the link Miguel.
Let’s harness the power of the user community:
Another free book is Mining of Massive Datasets.
Disclaimer: I only read chapter 9, and it is a tad dry.
Thank you for the aggregation work Chris!
Great find, thanks for sharing this Arthur. I’m definitely going to be referring to some of the examples in this book.
Hi Chris,
This is wonderful. Thanks a lot for posting such great resources.
Keep updating the list so that we can visit frequently and check for new books. I will also send you any useful materials I find.
Chris, excellent list and look forward to updates in the future.
Chris,
Thank you for sharing these links. Creative Commons is a wonderful way of moving from a Global Brain toward a Global Heart.
Arnlodo
Hi Chris,
Truly a great collection of free resources in the Data Science / Machine Learning domain. This list will definitely help students, practitioners and professionals.
A small change is needed to the link for the book (6th item) by Zaki & Meira. It looks like the download link has changed to
http://www.dataminingbook.info/pmwiki.php/Main/BookDownload
Good catch, thanks for letting me know Renga. I’ve corrected the URL.
Thanks a lot for the post, this was immensely helpful. Just curious: are there any links or websites where we can read about some live or dummy projects for practice in R or any other statistical application for predictive modelling?
Hi Partha, thanks for the feedback. Kaggle always has at least one project aimed at beginners. These are great because some of them include tutorials for R or Python’s scikit-learn.
Hi Chris,
Nice list.
You might find some of the free stuff on my web site to be of interest. There are pretty clean typed notes for machine learning, Bayes analysis, and time series/forecasting. In particular, the machine learning stuff is here:
http://www.analyticsiowa.com/course-materials/modern-multivariate-statistical-learning/
Best Regards,
Steve Vardeman
This looks great, thanks very much for sharing Steve!
Hi Chris,
Thanks much for the great collection. There is no free lunch, but free books are available around…
I also suggest one more…
Thanks, Linh
Hi Linh, and thanks for your contribution. That looks like a great book, but it isn’t completely free. The author is asking for information in exchange for the book, therefore it doesn’t belong on this list.
On the list, #12 and #15 are the same title. Thanks for publishing the collection.
Good catch, thank you Jeff. I’ve updated the post.
Thank you for giving information about those books.
This is great!! An entire bookshelf on a current, relevant field…all for free. Hats off to you Chris!