Linux source code passes 10 million lines

The number of lines of source code comprising the Linux kernel recently surpassed the ten million mark after the release of Linux version 2.6.27, an analysis has found.

However, that count includes the blank lines, comments and text files found in a full checkout of the kernel source. Counted slightly differently, the number of lines of actual text is "only" over nine million, but we rather like that larger figure of ten million, because white-space really is important for code readability and, well... it's a nice round number.

As with all long-term programming projects, the size of the Linux kernel code base varies over time, as old code is discarded and replaced.

Newer features and functions are constantly being added, though, so the overall size of the Linux kernel continually increases.

Analyses of the Linux kernel code base using David Wheeler's sloccount program yield some interesting facts. (The acronym "SLOC" stands for source lines of code.) It finds only 6,399,191 lines of source code, since it doesn't count blank lines, comments and other input. One breakdown of the code base by sloccount comes up with the following figures (percentages are rounded to one decimal place):

Type               Count  Per cent
Drivers        3,301,081      51.6
Architectures  1,258,638      19.7
Filesystems      544,871       8.5
Networking       376,716       5.9
Sound            356,180       5.6
Include          320,078       5.0
Kernel            74,503       1.2
Memory Mgmt       36,312       0.6
Cryptography      32,729       0.5
Security          25,303       0.4
Other             72,780       1.1
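
The percentages in this breakdown follow directly from the counts, and the arithmetic can be checked with a short sketch. The function name `percent_breakdown` is only illustrative. The Cryptography count is taken as 32,729, the value in sloccount's per-directory listing, because that is the figure with which the rows add up to the reported total of 6,399,191.

```python
def percent_breakdown(counts):
    """Return (total, shares), with each share rounded to one decimal place."""
    total = sum(counts.values())
    shares = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return total, shares


# Counts per category as reported by sloccount for the 2.6.27-era tree.
# Cryptography uses 32,729 (per-directory listing), see the note above.
KERNEL_COUNTS = {
    "Drivers": 3301081,
    "Architectures": 1258638,
    "Filesystems": 544871,
    "Networking": 376716,
    "Sound": 356180,
    "Include": 320078,
    "Kernel": 74503,
    "Memory Mgmt": 36312,
    "Cryptography": 32729,
    "Security": 25303,
    "Other": 72780,
}

total, shares = percent_breakdown(KERNEL_COUNTS)
print(f"total SLOC: {total:,}")
for name, share in shares.items():
    print(f"{name:<14} {share:>5.1f}")
```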

Categorisation by language finds that the overwhelming majority of the Linux kernel code is written in ANSI C, at 96.39 per cent, with assembly language accounting for almost all of the rest at 3.32 per cent. Other programming languages used in the kernel source files, in descending order of the number of lines of code, include Perl, C++, yacc, shell, lex, Python, Lisp, Pascal and awk.

More interestingly perhaps, sloccount also produces an estimate of the Linux kernel source code's value, that is, what it might cost to redevelop the code base from scratch, using the COCOMO development model.

Sloccount estimates that it would take a team of over 200 developers about nine and a half years to rewrite the Linux kernel from scratch. Based upon a four-year-old assumption of programmers' average salary level, sloccount calculates that this would cost nearly $268 million.

Given inflation and adding in management overhead, $500 million might be a fair estimate of what it might actually cost a proprietary software vendor to redevelop Linux.

In fact, thousands of programmers have contributed to developing the Linux kernel, over a period of more than 15 years.

And in terms of what it costs one to download a full Linux distribution, they did it for free.


A more detailed version

After the release of Linux 2.6.27, kernel developers are currently busy integrating patches for the next kernel version into the main development branch of Linux. This usually involves discarding some old code and adding new code, though on balance there are usually more new lines than old ones, making the kernel grow continually.

In this process, the kernel developers have now passed the 10 million line mark if blank lines, comments and text files are included in a current git checkout of the Linux source code (find . -type f -not -regex '\./\.git.*' | xargs cat | wc -l). It is also worth counting the lines of text in source code files, as that number has recently passed 9 million (find . -name '*.[hcS]' -not -regex '\./\.git.*' | xargs cat | wc -l).
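
For readers without a shell at hand, the two counts can be approximated in Python. This is a minimal sketch rather than an exact replacement for the pipelines. The name `count_lines` is made up for illustration, symbolic links are skipped (an assumption, not something the commands specify), and every top-level entry whose name starts with ".git" is excluded, which is how the regular expression in the commands is read here.

```python
import fnmatch
import os


def count_lines(root, pattern=None):
    """Count newline characters in regular files below root, like `wc -l`.

    pattern is an optional shell-style glob such as "*.[hcS]" that is
    matched against the bare file name, similar to find's -name test.
    """
    total = 0
    for depth, (dirpath, dirnames, filenames) in enumerate(os.walk(root)):
        if depth == 0:
            # os.walk yields root first: drop .git, .gitignore and friends.
            dirnames[:] = [d for d in dirnames if not d.startswith(".git")]
            filenames = [f for f in filenames if not f.startswith(".git")]
        for name in filenames:
            if pattern is not None and not fnmatch.fnmatchcase(name, pattern):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # assumption: count regular files only
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    total += chunk.count(b"\n")
    return total
```

Called from the top of a kernel checkout, `count_lines(".")` would correspond to the first command and `count_lines(".", "*.[hcS]")` to the second.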

Programs like sloccount can be used to inspect the Linux kernel's source code in more detail. According to this tool, the source code line count is not 9 million but exactly 6,399,191 (source lines of code, or SLOC), as the program doesn't count blank lines, comments and several other types of input. More than half of the lines are part of hardware drivers; the second largest chunk is the arch/ directory, which contains the source code of the various architectures supported by Linux.

SLOC     Directory      SLOC-by-Language (Sorted)
3301081  drivers        ansic=3296641,yacc=1680,asm=1136,perl=829,lex=778,sh=17
1258638  arch           ansic=1047549,asm=209655,sh=617,yacc=307,lex=300,awk=96,python=45,pascal=41,perl=28
544871   fs             ansic=544871
376716   net            ansic=376716
356180   sound          ansic=355997,asm=183
320078   include        ansic=318367,cpp=1511,asm=125,pascal=75
74503    kernel         ansic=74198,perl=305
36312    mm             ansic=36312
32729    crypto         ansic=32729
25303    security       ansic=25303
24111    scripts        ansic=14424,perl=4653,cpp=1791,sh=1155,yacc=967,lex=742,python=379
17065    lib            ansic=17065
10723    block          ansic=10723
7616     Documentation  ansic=5615,sh=926,perl=857,lisp=218
5227     ipc            ansic=5227
2622     virt           ansic=2622
2287     init           ansic=2287
1803     firmware       asm=1598,ansic=205
833      samples        ansic=833
493      usr            ansic=491,asm=2
0        top_dir        (none)

According to sloccount, 96.4 per cent of the code is written in C and 3.3 per cent in assembler. The other programming languages are only used marginally: Perl, for example, was used for some helper scripts during kernel development and only accounts for a tiny 0.1 per cent. Sloccount also claims to have found 116 lines of Pascal code, 41 of them in the assembler-heavy architecture directory and 75 in the include directory, but that could well be a misinterpretation by sloccount.

Totals grouped by language (dominant language first):
ansic:   6168175 (96.39%)
asm:      212699 (3.32%)
perl:       6672 (0.10%)
cpp:        3302 (0.05%)
yacc:       2954 (0.05%)
sh:         2715 (0.04%)
lex:        1820 (0.03%)
python:      424 (0.01%)
lisp:        218 (0.00%)
pascal:      116 (0.00%)
awk:          96 (0.00%)

Sloccount also tries to give a rough calculation of the source code's value; according to the program's estimates, it would take more than 200 developers about nine and a half years and cost nearly $268 million to rewrite the code from scratch. Given that the program has not been updated for four years, the accuracy of this calculation is arguable; the cost per developer, especially, would now surely need to be increased.

Total Physical Source Lines of Code (SLOC)                = 6,399,191
Development Effort Estimate, Person-Years (Person-Months) = 1,983.63 (23,803.60)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 9.59 (115.10)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 206.81
Total Estimated Cost to Develop                           = $ 267,961,839
 (average salary = $56,286/year, overhead = 2.40).
Generated using David A. Wheeler's 'sloccount'
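
The figures in this report can be reproduced from the two Basic COCOMO formulas it quotes. The sketch below assumes that the cost is simply person-years multiplied by the average salary and the overhead factor, which is consistent with the numbers shown; the function name `basic_cocomo` and the dictionary keys are illustrative and not part of sloccount.

```python
def basic_cocomo(sloc, salary=56286.0, overhead=2.40):
    """Estimate effort, schedule, team size and cost from a SLOC count.

    Uses the formulas quoted in the report:
      person-months = 2.4 * KSLOC ** 1.05
      months        = 2.5 * person-months ** 0.38
    """
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05
    schedule_months = 2.5 * person_months ** 0.38
    developers = person_months / schedule_months
    # Assumption: cost = person-years * average salary * overhead factor.
    cost = (person_months / 12.0) * salary * overhead
    return {
        "person_months": person_months,
        "person_years": person_months / 12.0,
        "schedule_months": schedule_months,
        "schedule_years": schedule_months / 12.0,
        "developers": developers,
        "cost": cost,
    }


estimate = basic_cocomo(6399191)
print(f"effort:     {estimate['person_years']:,.2f} person-years")
print(f"schedule:   {estimate['schedule_years']:.2f} years")
print(f"developers: {estimate['developers']:.2f}")
print(f"cost:       $ {estimate['cost']:,.0f}")
```

Because the salary and overhead are plain parameters, the same function can be rerun with a more current salary figure to see how strongly the total depends on that single assumption.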

There is no end in sight for kernel growth, which has been ongoing in the Linux 2.6 series for several years: with every new version, the kernel hackers extend the Linux kernel further to include new functions and drivers, improving the hardware support or making it more flexible, better or faster. A look at the figures for the latest kernel versions also shows that it is not only the number of lines of source code that is continually increasing, but also the number of changes per kernel version.
