2014 has been a great year. I’ve been writing programs in various languages for the better part of three decades, though I would never have described myself as a ‘programmer’. In 2013 I identified a handful of projects that would simplify some of my work, so earlier this year I picked a couple and completed them. The result is that I now have MUCH more confidence in my ability to write code, and I am pursuing more complicated projects to continue developing my skill set. Here’s a list of some of the things that I’ve completed.
Web Project with PHP
I was talking with a co-worker about workflows and together we realized that there was an opportunity for simplification. It took me about a week to think through what was needed, and about a month to build and test a simple web app that does much of the heavy lifting for her. I created this site using CodeIgniter, which was a great choice for this particular project. The result is a simple form and some simple processing. Later I added integration with bitly to save her an additional step. She likes the tool enough that she has shared it with her team and has publicly praised my work a number of times.
I’ve been using PHP for more than a decade. Choosing a framework for this project was great because it forced me to get comfortable with the MVC pattern of web apps. CodeIgniter in particular simplified application security, and it’s well documented so I never got stuck.
I could take this further by either integrating directly with the social platforms to simplify broadcasting of messages, or making her work easily sharable via email notifications.
Predictive Modeling with R
A year ago my manager asked me how we could show that a portion of the work my team does should be cut. It took me a while to wrap my head around how to solve this, but I decided to build a predictive model using R in order to show which efforts generate results for us. This took me quite a while, but man was it fun. I asked our BI team for data, and they responded with a 750 MB file (awesome!). I cleaned the data through a combination of PowerShell & R scripts. Then I split it into training & testing data sets and created models via linear regression, lasso, naive Bayes and random forests. After evaluating the performance of each on the test data I chose random forests and generated summary plots via ggplot2. My team has begun to shift focus.
This was great because it forced me to become more familiar with data.tables and evaluating models. Also, I’ve done each piece of this project many times over the past 4-5 years, but this was the first time I completed them in this order. I learned a lot in the process, and have referred back to the code & methods that I used dozens of times.
I could take this further by simplifying the data via dimension reduction, recreating models with support vector machines, and automating the scripts so that this can be quickly updated at regular intervals.
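The split-then-compare workflow described above can be sketched in a few lines. The original work was done in R on real data; what follows is only a Python illustration on made-up numbers, with a mean-only baseline and a one-variable least-squares line standing in for the lasso, naive Bayes and random forest models. Every function name here is hypothetical.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train):
    """Least-squares line y = intercept + slope * x fitted on the training rows."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, rows):
    """Root mean squared error of a fitted model on the given rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


def make_synthetic_rows(n=200, seed=0):
    """Made-up (x, y) pairs: a noisy straight line, purely for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 + 2.0 * x + rng.gauss(0, 1)))
    return rows


if __name__ == "__main__":
    train, test = train_test_split(make_synthetic_rows())
    candidates = [("mean baseline", fit_mean), ("linear regression", fit_linear)]
    for name, fit in candidates:
        # Each model is fitted on the training rows and scored on held-out rows.
        print(name, round(rmse(fit(train), test), 3))
```

The point of the design is that every candidate model is fitted on the training rows only and scored on rows it has never seen, so the comparison reflects how well each one generalizes rather than how well it memorizes.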
Building a Data Store
My team has been working toward improving the information available from our analytics packages. Some of the things we’d like to offer aren’t easily accomplished within the tools we have, so we’ve decided to build some systems in-house. We’re now generating log files (at least) daily. Initially I thought I’d be working with this data in SQL Server, so that’s where I began storing the data. I quickly realized that was messy & prone to errors. I wrote logic in Python scripts to handle the most common errors, and notify me of anything else. In my research I realized that NoSQL would be much easier, so I installed MongoDB and wrote a Python script to push the data there as well. Now we’re storing all data in both SQL Server and MongoDB until we’re confident that one or the other will fully meet our needs.
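To give a flavor of that "handle the common errors, flag everything else, write to both stores" pattern, here is a minimal sketch. It is not the actual script: the three-field log format is invented, the standard-library sqlite3 module stands in for SQL Server, a plain Python list stands in for a MongoDB collection, and all function and table names are hypothetical.

```python
import sqlite3
from datetime import datetime


def open_sql_store():
    """In-memory SQLite database, used here only as a stand-in for SQL Server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (timestamp TEXT, page TEXT, duration_ms INTEGER)"
    )
    return conn


def parse_line(line):
    """Parse one 'timestamp,page,duration_ms' line; raise ValueError if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    timestamp, page, duration = parts
    return {
        "timestamp": datetime.fromisoformat(timestamp).isoformat(),
        "page": page,
        "duration_ms": int(duration),
    }


def load_lines(lines, sql_conn, documents):
    """Write each parseable line to both stores.

    Returns (loaded, skipped, problems), where problems is a list of
    (line_number, message) pairs that a real script could send as a notification.
    """
    loaded, skipped, problems = 0, 0, []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            skipped += 1  # common, harmless case: blank line
            continue
        try:
            record = parse_line(line)
        except ValueError as err:
            problems.append((number, str(err)))  # anything else gets reported
            continue
        sql_conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (record["timestamp"], record["page"], record["duration_ms"]),
        )
        documents.append(record)  # stand-in for inserting into a document store
        loaded += 1
    sql_conn.commit()
    return loaded, skipped, problems


if __name__ == "__main__":
    sample = [
        "2014-12-01T08:00:00,/home,120",
        "",
        "not-a-date,/home,5",
        "2014-12-01T08:00:05,/about",
        "2014-12-01T08:00:09,/contact,340",
    ]
    conn, docs = open_sql_store(), []
    print(load_lines(sample, conn, docs))
```

Keeping the parsing step separate from the two writes means a bad line never reaches either store, so the two copies stay in step with each other while they are being compared.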
The fun thing about this project is that I’m learning to work with a LOT of data (some days we have several GB). This has sharpened my skill with logic, Python and SQL. I’ve gotten stuck a couple of times along the way, but each time I’ve been able to figure out a solution within a day.
I will take this further by automating all of my scripts and improving the error handling. This project is still just getting legs though. Eventually I’ll need to shift attention from data processing to cleaning and analysis. Initially I’ll be manually creating charts and models. Later I envision bringing MapReduce into the project and creating at least a couple of dashboards. At some point we’ll need to add more servers to handle the volume of data. This is definitely a fun project!
I doubt all of these projects will get more attention from me. Some of them certainly will. Others will pop up along the path. I’m very much looking forward to 2015.