Update 8/21: I’ve gotten a lot of feedback in the comments about issues with these rankings, and have tried to address some of them here. The data there has been updated to include confidence intervals.
A few weeks ago I described how I used Git commit metadata plus the Rapleaf API to build aggregate demographic profiles for popular GitHub organizations (blog post here, per-organization data available here).
I was also interested in slicing the data somewhat differently, breaking down demographics per programming language instead of per organization. Stereotypes about developers of various languages abound, but I was curious how these lined up with reality. The easiest place to start was age, income, and gender breakdowns per language. Given the data I’d already collected, this wasn’t too challenging:
- For each repository, I used GitHub’s estimate of its language composition. For example, GitHub estimates this project…