2014 has been a great year. I’ve been writing programs in various languages for the better part of 3 decades, though I would never have defined myself as a ‘programmer’. In 2013 I identified a handful of projects that would simplify some of my work, so earlier this year I picked a couple and completed them. The result is that I now have MUCH more confidence in my ability to write code and am pursuing more complicated projects to continue developing my skill set. Here’s a list of some of the things that I’ve completed.
In July I started taking classes within Coursera’s Data Science specialization, taught by 3 professors at Johns Hopkins University. Until now I’ve been taking 2 courses at a time, and I just finished numbers 5 & 6. Before signing up for this certificate, I had read a number of negative reviews of the program on Reddit and in the Coursera forums of other MOOCs that I’d taken. That’s all bunk. This is a great program and I’m glad to have found it.
So far I’ve completed:
- The Data Scientist’s Toolbox
- R Programming
- Getting & Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
The first 5 of those courses were pretty easy for me. I have a background in programming and a couple of years of experience with R. I did pick up a couple of tips along the way, and the course has also introduced me to a couple of great packages that I’d been intending to learn about anyway.
The courses consist of video lectures where they explain concepts. You can download the accompanying presentation or even the video itself. There’s about an hour of videos for each week. For each idea that they’re presenting, they introduce the R code for working with the problem. Each week there is a quiz, and there is a course project as well (some courses have 2 projects).
The videos are well done. Some show the slide being discussed, some show the instructor talking and some show the desktop (or more specifically, RStudio).
The quizzes usually require some thought and a review of slides or notes, but honestly they aren’t difficult. You have 3 chances, so anyone that’s paying attention should be able to ace these. I usually spend about an hour on each quiz.
The projects are the real meat of the course. They provide a dataset and ask a few questions. You have to manipulate the data, find the answer and write a small report. The grading criteria are available, and they’re all yes/no questions, so you know exactly what you’re aiming for. You usually post your work on GitHub or RPubs (which ensures that you know how to use git & knitr), and the reports are usually 2-4 pages. You’re showing blocks of R code and a plot or two, so the length of the reports hasn’t been difficult at all.
I’m just now getting to the point in the program that I’m most interested in. The Statistical Inference class was a good review for me, and a bit of a challenge (in a good way). I’m pretty eager to get into regression models and machine learning. From this point forward I’m only going to take one class at a time.
Will This Certificate Get Me a Job?
I’ve been asked this question a few times since starting the course. I’m not in a position to hire any data scientists so take this with a grain of salt, but if you understand the concepts in the course I believe you’ll be prepared to take on real analysis projects.
I can say that enrolling in this program has created a couple of opportunities for me. When I told my team that I signed up for this certificate, my manager shared ~2GB of data with me and said he needed to understand it. I cleaned it, provided some summary statistics and a few plots. That simple project has opened the door for others, and now I’m working on predictive modeling and creating real-time dashboards for my organization. In other words, this program has led to work that I really enjoy and to valuable experience.
I’m glad that I signed up for the program, and I’d encourage anyone that’s interested in data analysis to sign up as well. You can get this training for free, or you can pay $49 per course ($490 for all) and get an authenticated certificate. That’s well worth it in my opinion.
Any project involving data requires that data to be in a specific format. Visualization libraries such as ggplot2 or matplotlib work with specific types of data, and any modeling or prediction is going to require a specific format as well. Most of the time a project requires several iterations of plotting or analysis, so data munging is a skill that you’ll use a LOT.
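To give a rough idea of what that munging looks like, here’s a minimal sketch of one common reshape: turning a "wide" table (one column per year) into a "long" one (one row per observation), which is the shape plotting libraries often expect. The sales numbers, the column names and the `wide_to_long` helper are all made up for illustration, and it uses plain Python so it runs without any packages.

```python
def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Reshape a list of wide dicts into one dict per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the id column is repeated on every output row instead
            long_rows.append({id_col: row[id_col], var_name: col, value_name: val})
    return long_rows


# Hypothetical wide table: one row per city, one column per year.
wide = [
    {"city": "Austin", "2013": 10, "2014": 12},
    {"city": "Boston", "2013": 7, "2014": 9},
]

# Long table: one row per city/year combination.
tidy = wide_to_long(wide, "city", var_name="year", value_name="sales")

for record in tidy:
    print(record)
```

In practice you’d probably reach for a library function to do this (reshaping helpers exist in both the R and Python ecosystems), but the idea is the same: the values don’t change, only the layout does.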