A few weeks ago, I wrote a blog post describing how I used Git metadata and the RapLeaf API to build a demographic overview of a GitHub organization (see that earlier post, along with the data for each organization).
I have also been trying other ways to slice the data, building a demographic overview for each programming language rather than for each organization. There are a lot of stereotypes about the developers who use different programming languages, and I was curious how well they match reality. It is fairly straightforward to break down basic demographics such as age, income, and gender for each language:
- I used GitHub's own estimate of the language composition of each repository. For example, GitHub might estimate that a project is 75% Java;
- For each project where a single language makes up more than 50% of the code, I treated that language as the project's primary language, identified the project's developers, and aggregated their income under that language;
- I then kept only the programming languages with more than 100 data points.
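The three steps above can be sketched in Python. This is a minimal illustration under my own assumptions, not the code behind the original analysis: the input shapes (`repos`, `contributors`, `incomes`) and the function names are hypothetical, and counting each developer only once per language is an assumption the post does not state.

```python
from collections import defaultdict


def primary_language(breakdown, threshold=0.5):
    """Return the language making up more than `threshold` of a repo, else None.

    `breakdown` maps language name -> fraction of the code (hypothetical shape,
    e.g. {"Java": 0.75, "Shell": 0.25}).
    """
    for lang, share in breakdown.items():
        if share > threshold:
            return lang
    return None  # no single language has a majority


def incomes_by_language(repos, contributors, incomes, min_points=100):
    """Group developer incomes under each repository's primary language.

    repos:        repo name -> language breakdown (hypothetical)
    contributors: repo name -> list of developer ids (hypothetical)
    incomes:      developer id -> household income, only for matched developers
    """
    grouped = defaultdict(dict)  # language -> {developer id: income}
    for repo, breakdown in repos.items():
        lang = primary_language(breakdown)
        if lang is None:
            continue  # skip repos without a majority language
        for dev in contributors.get(repo, []):
            if dev in incomes:
                # Assumption: each developer is counted once per language,
                # even if they contribute to several repos in that language.
                grouped[lang][dev] = incomes[dev]
    # Keep only languages with more than `min_points` income data points.
    return {
        lang: list(devs.values())
        for lang, devs in grouped.items()
        if len(devs) > min_points
    }


# Tiny made-up example (not real data); min_points lowered so something survives.
repos = {
    "repo-a": {"Java": 0.75, "Shell": 0.25},
    "repo-b": {"PHP": 0.40, "JavaScript": 0.35, "CSS": 0.25},
    "repo-c": {"Java": 0.90, "XML": 0.10},
}
contributors = {"repo-a": ["u1", "u2"], "repo-b": ["u3"], "repo-c": ["u2", "u4"]}
incomes = {"u1": 80000, "u2": 95000, "u3": 50000}

print(incomes_by_language(repos, contributors, incomes, min_points=1))
# {'Java': [80000, 95000]}
```

In the example, `repo-b` has no language above 50%, so it is skipped, and `u4` has no income record, so only two Java data points remain. With the default threshold of 100 data points, nothing in this toy input would survive the filter.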
The figure below shows income by language, sorted from lowest to highest average household income:
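The ranking itself amounts to a per-language mean followed by an ascending sort. A minimal Python sketch, assuming incomes have already been grouped into a dictionary mapping each language to a list of household incomes (the function name and input shape are hypothetical):

```python
from statistics import mean


def rank_by_average_income(incomes_by_lang):
    """Return (language, average income) pairs, lowest average first.

    `incomes_by_lang` maps language -> list of household incomes
    (a hypothetical shape; any per-language grouping would do).
    """
    averages = {lang: mean(values) for lang, values in incomes_by_lang.items()}
    # Ascending sort matches the low-to-high ordering of the figure.
    return sorted(averages.items(), key=lambda pair: pair[1])


# Made-up numbers purely to show the ordering, not results from the post.
sample = {"PHP": [40000, 60000], "Java": [90000, 110000], "Haskell": [70000]}
for lang, avg in rank_by_average_income(sample):
    print(f"{lang}: {avg:,.0f}")
```

`statistics.mean` raises an error on an empty list. The minimum-data-point filter guarantees that every remaining language has at least one value.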
Most of the rankings matched my expectations:
- Haskell is a very academic language, so it is not surprising that it ranks low on income;
- PHP is easy to pick up and is often used by non-professional or novice programmers, so its relatively low income is expected;
- Java and ActionScript are considered enterprise languages and are mostly used for corporate software development, so their high income is expected.
On the other hand, I am not very familiar with some of the languages at the bottom and top of the ranking, such as XSLT, Puppet, and CoffeeScript, so I can't explain why they land where they do.
There are also limits to what conclusions this data can support:
- These projects are all open source, so developers who work only on closed-source code are not represented;
- RapLeaf does not have income data for every developer, so the sample may be biased;
- I did not control for age, gender, or other factors that could skew the distribution;
- I did not analyze every GitHub repository, so the sampled users may not be representative.
In summary, even if the absolute values are off, this is still a starting point for comparing relative income differences between programming languages.