Accurately count GitHub contributors' lines of code

Source: Internet
Author: User
Tags: commit, json, git commands

A GitHub repository can report the number of lines of code contributed by each contributor. At our company's annual meeting, a "Code God Award" went to the engineer who contributed the most code last year, and according to GitHub's statistics the winner had submitted 1.1 million lines of code in that year, which is astonishing. One person cannot write that much code, so out of curiosity I looked into it. It turned out that the total included many third-party libraries he had committed, which GitHub also counts, and that code he merged was counted as his as well. Is there a way to strip out this invalid data and get the actual amount of code contributed? After checking the GitHub API and combining it with git commands, it turns out there is. Here is the code:

# Copy this script into your target repo.
# Run `python github-stats.py <token>` to collect data.
import os
import re
import sys

import requests

# Get the GitHub token from the command line.
tk = sys.argv[1]
user_stats = {"dummy": {"additions": 0, "deletions": 0, "total": 0}}

# Query the GitHub API for last year's commits.
payload = {'since': '2013-01-01T00:00:00Z', 'until': '2014-01-01T00:00:00Z'}
# GitHub no longer accepts access_token as a query parameter,
# so the token is sent in the Authorization header instead.
auth = {'Authorization': 'token ' + tk}


def is_merge(commit_sha):
    # -s suppresses the diff so only the one-line title is searched.
    cmd = "git show -s --oneline " + commit_sha
    title = os.popen(cmd).read()
    return re.search("Merge", title) is not None


def collect_stats(commit_list):
    for m in commit_list:
        if is_merge(m['sha']):
            continue
        # Author name of the commit.
        git_show_command = "git show -s --format=%an " + m['sha']
        user = os.popen(git_show_command).read().strip(' \t\n\r')
        # Diff from the parent to the commit, so insertions are lines added.
        git_diff_command = "git diff --shortstat " + m['sha'] + "^ " + m['sha']
        data = os.popen(git_diff_command).read()
        ins_data = 0
        del_data = 0
        r_ins = re.search(r"(\d+) insertion", data)
        if r_ins is not None:
            ins_data = int(r_ins.group(1))
        r_del = re.search(r"(\d+) deletion", data)
        if r_del is not None:
            del_data = int(r_del.group(1))
        # Ignore oversized commits (usually imported third-party code).
        if ins_data + del_data > 5000:
            print(user)
            print('ins:' + str(ins_data))
            print('del:' + str(del_data))
            ins_data = 0
            del_data = 0
        if user in user_stats:
            stats = user_stats[user]
            stats['additions'] += ins_data
            stats['deletions'] += del_data
            stats['total'] += ins_data + del_data
        else:
            user_stats[user] = {'additions': ins_data, 'deletions': del_data,
                                'total': ins_data + del_data}


r = requests.get("https://api.github.com/repos/cocos2d/cocos2d-x/commits",
                 params=payload, headers=auth)
collect_stats(r.json())
print(user_stats)
pattern = re.compile(r'<(\S+)>; rel="next"')
print(r.headers['x-ratelimit-remaining'])
# Follow the Link header until there is no "next" page.
result = pattern.search(r.headers.get('link', ''))
while result is not None:
    next_url = result.group(1)
    r = requests.get(next_url, headers=auth)
    collect_stats(r.json())
    result = pattern.search(r.headers.get('link', ''))
print(r.headers['x-ratelimit-remaining'])
print(user_stats)

The code can also be obtained on GitHub: https://github.com/heliclei/githubtools/blob/master/github-stats.py
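
The script's two filtering rules, skipping merge commits and ignoring any single commit that changes more than 5,000 lines, can be tried out without git or network access. The sketch below is one possible way to isolate them, not the author's code: the names `parse_shortstat`, `is_merge_commit`, `countable_lines` and `MAX_LINES_PER_COMMIT` are made up for illustration. It also detects merges from the `parents` array that the GitHub commits API returns for each commit, whereas the script searches the commit title.

```python
import re

# Hypothetical constant; the value mirrors the 5,000-line cutoff in the article.
MAX_LINES_PER_COMMIT = 5000

_INSERTIONS = re.compile(r"(\d+) insertion")
_DELETIONS = re.compile(r"(\d+) deletion")


def parse_shortstat(text):
    """Return (insertions, deletions) parsed from `git diff --shortstat` output."""
    ins = _INSERTIONS.search(text)
    dels = _DELETIONS.search(text)
    return (int(ins.group(1)) if ins else 0,
            int(dels.group(1)) if dels else 0)


def is_merge_commit(commit):
    """Treat a commit object from the GitHub API as a merge if it has 2+ parents."""
    return len(commit.get("parents", [])) > 1


def countable_lines(commit, shortstat_text):
    """Return the (insertions, deletions) that should count for this commit."""
    if is_merge_commit(commit):
        return (0, 0)
    ins, dels = parse_shortstat(shortstat_text)
    if ins + dels > MAX_LINES_PER_COMMIT:
        # Probably an imported third-party library rather than hand-written code.
        return (0, 0)
    return (ins, dels)


# Sample input in the format git prints for --shortstat.
sample = " 3 files changed, 120 insertions(+), 7 deletions(-)"
print(countable_lines({"parents": [{"sha": "abc"}]}, sample))  # (120, 7)
```

Keeping the parsing separate from the `git` calls makes the threshold easy to adjust and to check against hand-written sample output.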
This script ignores any single commit that changes more than 5,000 lines, and it also skips merge commits. To use it, first clone the repository you want to analyze, then copy the script into the local git repository, and change this line to the commits URL of your own repository:

https://api.github.com/repos/cocos2d/cocos2d-x/commits
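
As a rough illustration of what changes from one repository to another, the sketch below builds the commits URL from an owner and repository name and pulls the next page's URL out of a `Link` response header, which is how the script walks through the commit list. The helper names `commits_url` and `next_page_url` are hypothetical, and the sample header uses made-up page URLs.

```python
import re

API_ROOT = "https://api.github.com"
_NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def commits_url(owner, repo):
    """Build the commits endpoint URL for the given repository."""
    return "{}/repos/{}/{}/commits".format(API_ROOT, owner, repo)


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


print(commits_url("cocos2d", "cocos2d-x"))

# Made-up header in the shape GitHub uses for paginated responses.
header = ('<https://api.github.com/repositories/1/commits?page=2>; rel="next", '
          '<https://api.github.com/repositories/1/commits?page=9>; rel="last"')
print(next_page_url(header))
```

When the response has no `rel="next"` entry, `next_page_url` returns `None`, which is the signal to stop requesting pages.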
A GitHub token can be generated with the script from the previous article.
Run: python github-stats.py xxxxxxxxxxxxxgithub-oauth-tokenxxxxxxxxxxxxxxxxxxx
PS: After filtering, the top cocos2d-x contributor still contributed more than 100,000 lines of code last year, which is very impressive, but this figure is nowhere near as surreal as 1.1 million lines.
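
The script prints `user_stats` as a raw dictionary, so finding the top contributor means scanning it by eye. A small sketch of ranking the result by total changed lines might look like this. The function name `rank_contributors` is hypothetical and the numbers are invented purely to show the shape of the data.

```python
def rank_contributors(user_stats):
    """Return (author, stats) pairs sorted by total changed lines, largest first."""
    return sorted(user_stats.items(),
                  key=lambda item: item[1]["total"],
                  reverse=True)


# Invented sample data in the same shape as the script's user_stats.
sample_stats = {
    "dummy": {"additions": 0, "deletions": 0, "total": 0},
    "bob": {"additions": 40, "deletions": 10, "total": 50},
    "alice": {"additions": 900, "deletions": 100, "total": 1000},
}

for author, stats in rank_contributors(sample_stats):
    print("{:<10} +{:<6} -{:<6} total {}".format(
        author, stats["additions"], stats["deletions"], stats["total"]))
```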
