Tag Archives: python

The Year of Code

2014 has been a great year. I’ve been writing programs in various languages for the better part of three decades, though I never would have defined myself as a ‘programmer’. In 2013 I identified a handful of projects that would simplify some of my work, so earlier this year I picked a couple and completed them. The result is that I now have MUCH more confidence in my ability to write code, and I’m pursuing more complicated projects to keep developing my skill set. Here’s a list of some of the things I’ve completed.

Scathing Matlab Review from a Google Employee

A screen capture of this email was posted to the datascience subreddit earlier today.

Re: students’ preparedness for internships. Interesting that they all know Matlab. I have some strong opinions about that, but I will save them until the next paragraph. Software engineer candidates who list Matlab as their primary language when they show up at Google for an interview will be treated with suspicion, so they should stress the C++ and Python instead. Good that the stats people know R, it seems to be the industry standard.

Matlab rant coming now: you need to stop teaching your students Matlab now. Matlab is a broken, outdated language that is proprietary and has extortionate pricing policies for licenses outside education. The language has been completely superseded by modern languages in the numerical computing space, such as the numerical extensions to Python (numpy etc), and Julia. Matlab only still exists for two reasons: one is the large amounts of legacy code at big defense contracting firms that is too expensive to rewrite, and the other is academic institutions who get sucked in by the free or cheap software licenses and keep exposing their students to it despite it being a relic that deserves to die. The language had some very big mistakes baked into it when it was first designed back in 1985, which the company is afraid to fix because of the legacy code base issue; computer scientists look at it with despair, to be honest, because it would have earned a B- in a language design class even back then. Optimizing compilation will never work properly with Matlab, for instance, because of one of the mistakes in the language. It does a few things very fast (matrix multiplication and solving Ax=b) but it is painfully slow at many other things (function calls, most critically) and its design promotes terrible software practices. On top of all this, it’s not even free: it’s a very expensive language to use unless you’re an academic. (By comparison, R is also a broken language, in some similar ways, but it at least has the saving grace of being open source.) If students turn up at Google (or any other software company that isn’t a Matlab shop – mostly just defense contractors these days, and one hedge fund I know of) listing Matlab as their language and they want a software engineer job, they will be treated as “might be able to program, but probably not”.

Not very gentle. I’m not surprised by the position; while I’ve never worked for Google, they’re a company very focused on programmers. Matlab isn’t a general-purpose language, and it wasn’t designed by a computer scientist. I am somewhat surprised by the vitriol, though. Whatever its flaws as a language, it is useful in engineering. I used it a fair amount in college, and while it wasn’t my favorite, it handled structural engineering problems easily. I’m also surprised that anyone wanting to work for Google would list it as a primary language on their resume.

How I Learned Data Munging

Any project involving data requires that data to be in a specific format. Visualization libraries such as ggplot2 or matplotlib expect particular data structures, and any modeling or prediction will require its own format as well. Most projects go through several iterations of plotting or analysis, so data munging is a skill that you’ll use a LOT.
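As a small illustration of what "a specific format" can mean, here is a minimal Python sketch of one common munging step: reshaping a "wide" table (one column per year) into a "long" table (one row per observation), the shape ggplot2-style plotting generally expects. The function name, column names, and values are hypothetical, and it uses only the standard library; pandas users would typically reach for `DataFrame.melt` instead.

```python
def wide_to_long(rows, id_field, value_fields, var_name="variable", value_name="value"):
    """Melt each wide row into one record per value field (hypothetical helper)."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            # One output record per (id, field) pair.
            long_rows.append({
                id_field: row[id_field],
                var_name: field,
                value_name: row[field],
            })
    return long_rows


# Hypothetical wide table: one row per page, one column per year.
wide = [
    {"page": "/home", "2013": 120, "2014": 150},
    {"page": "/blog", "2013": 80, "2014": 95},
]

long_rows = wide_to_long(wide, "page", ["2013", "2014"], var_name="year", value_name="visits")
for r in long_rows:
    print(r)
```

Each of the two input rows becomes two output rows, so the year is now a value you can map to an axis or a color rather than a column name.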

Scraping Page Title & Meta Description with Google Docs

Last week I posted several scripts useful for cleaning up URL data with PowerShell. If you work in search marketing, one of the next logical steps is to gather data on these URLs, for example when doing a content audit of your website. There are (at least) a hundred ways to scrape content from the web. One of the easiest methods I’ve found is within Google Drive, using the IMPORTXML() function and a couple of XPath queries.

Scrape page titles & meta descriptions with Google Drive.
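In a spreadsheet, the two queries would look something like `=IMPORTXML(A2, "//title")` and `=IMPORTXML(A2, "//meta[@name='description']/@content")`, assuming the URL sits in cell A2. For readers who would rather script the same extraction, here is a minimal Python sketch. It swaps XPath for the standard library's `html.parser`, so it is not the Google Drive method itself. The class and function names are hypothetical, and the fetch helper assumes the page is UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleMetaParser(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>.
        if self._in_title:
            self.title += data


def title_and_description(html):
    """Return (title, meta description or None) for an HTML string."""
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


def fetch_title_and_description(url):
    """Hypothetical helper: download a page (assumed UTF-8) and parse it."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return title_and_description(html)


sample = """<html><head>
<title>Example Page</title>
<meta name="description" content="A short summary of the page.">
</head><body></body></html>"""

title, desc = title_and_description(sample)
print(title)
print(desc)
```

Looping `fetch_title_and_description` over a cleaned list of URLs would give the same two columns as the spreadsheet approach, at the cost of handling network errors and encodings yourself.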