In association with heise online

12 January 2012, 12:32

Kernel Log: 15,000,000 lines, 3.0 promoted to long-term kernel

By Thorsten Leemhuis

With the first changes for Linux 3.3 merged, the kernel source code has passed the 15 million line mark. Maintenance of Linux 2.6.32 is set to end in one month's time, while Linux 3.0 and real-time kernels based on it will be maintained for the next two years.

The source code of Linux 3.2, released last week, fell just short of the 15 million line mark. The kernel finally reached this milestone over the weekend, when the first changes were merged into the main development branch, which is now building towards Linux 3.3. Since the kernel passed the 10 million line mark in October 2008, its source code has grown by more than 50 per cent in a little over three years. Note that these figures include comments, blank lines, documentation, scripts and the userland tools shipped with the kernel, because they count every line in every file outside the Git metadata (find . -type f -not -regex '\./\.git.*' | xargs cat | wc -l).
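For readers who want to reproduce the raw count without a shell pipeline, the following Python sketch aims to mirror what the find/xargs/wc command does: it walks a checkout, skips top-level entries whose names begin with ".git" (the paths the regex excludes when the command is run from the tree's root), ignores symlinks just as find -type f does, and counts newline bytes the way wc -l does. The function name count_newlines is hypothetical; this is an illustration, not the method used to produce the figures in this article.

```python
import os
import sys


def count_newlines(root: str) -> int:
    """Count newline bytes in every regular file under root, skipping
    top-level entries whose names start with ".git" (the paths that the
    find -regex filter excludes when run from the tree's root)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune ./.git (and similar top-level names) like the regex filter.
            dirnames[:] = [d for d in dirnames if not d.startswith(".git")]
            filenames = [f for f in filenames if not f.startswith(".git")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # find -type f matches regular files only, so skip symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                # wc -l counts newline characters, so count b"\n" in chunks.
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    print(count_newlines(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Pass the path of a kernel checkout as the first argument. Walking the tree directly also avoids xargs splitting file names that contain spaces.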

Criticism of this growth is rarely aired in kernel developer circles. Linux veteran Theodore 'tytso' Ts'o recently noted on the kernel developer mailing list that, while analysing the size of the codebase certainly had "entertainment value", size shouldn't be taken as an indicator of complexity. Complexity, by contrast, has been much discussed among kernel developers recently. While reviewing some recent changes, Andrew Morton, for example, noted that kernel development had gone "beyond the point where any additional kernel complexity should be considered a regression". Avoiding regressions – ensuring that new kernels do not introduce problems absent in previous versions – is considered an almost unbreakable commandment.

Torvalds himself criticised the growing complexity of the kernel in a recent interview (in German) with Zeit Online, saying that he worries about the day when kernel developers are confronted with a bug that no one is able to get to the bottom of. According to an LWN.net report, complexity was also a topic at last year's kernel developers' conference, where the memory management code came in for particular criticism. The report notes that a "[…] problem involving page migration took three core developers to solve. Nobody really knows how the whole thing is implemented." Another example can be found in an LWN.net article published last year, which explains the background to a memory management bug that arose during 2.6.34 development and took Torvalds and several other developers days to track down.

Analysis

At the time of writing this Kernel Log, a git checkout of the main Linux development tree contained 15,046,951 lines of code. It is possible, though unlikely, that the line count could drop back below the 15 million mark, but in the long term further growth appears inevitable since, with the exception of some outliers, recent versions have typically been around 100,000 to 300,000 lines larger than their direct predecessor.
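The arithmetic behind these growth figures can be checked with a few lines of Python. The sketch assumes that the October 2008 milestone was exactly 10,000,000 lines; the other two figures are the Linux 3.2 and current checkout counts quoted in this article, and the variable names are illustrative.

```python
# Assumption: treat the October 2008 milestone as exactly 10,000,000 lines.
MILESTONE_2008 = 10_000_000
LINUX_3_2 = 14_998_651   # Linux 3.2 release (from the article)
CHECKOUT = 15_046_951    # main tree after the first 3.3 merges (from the article)

# Percentage growth since the 10 million line mark.
growth_pct = (CHECKOUT - MILESTONE_2008) / MILESTONE_2008 * 100
# Lines added by the first merges of the 3.3 cycle.
first_merges = CHECKOUT - LINUX_3_2
# How many lines would have to disappear to fall back below 15 million.
margin = CHECKOUT - 15_000_000

print(f"growth since the 10M mark: {growth_pct:.1f}%")         # 50.5%
print(f"added by the first 3.3 merges: {first_merges:,}")      # 48,300
print(f"margin above 15 million: {margin:,}")                  # 46,951
```

A margin of under 47,000 lines is small compared with the typical per-release growth of 100,000 to 300,000 lines, which is why a drop back below the mark looks unlikely.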


The majority of the code is for drivers, filesystems and architecture-specific code
A tool called SLOCCount provides a more detailed analysis of the kernel codebase. Unlike the raw line count above, it counts only physical source lines of code (SLOC), ignoring blank lines and comments. In Linux 3.2 – whose codebase, at 14,998,651 lines, fell just short of the 15 million mark – nearly 1.9 million source lines are devoted to supporting different processor architectures, and the filesystems directory (fs/) contains just under 700,000. The largest directory is drivers/, at 5.6 million lines. The total amount of driver code is in fact even greater than this, as some of it lives in other directories: drivers for audio hardware, for example, are located under sound/.

SLOC    Directory       SLOC-by-Language (Sorted)
5615064 drivers         ansic=5610304,yacc=1688,asm=1475,perl=792,lex=779,sh=26
1876166 arch            ansic=1632759,asm=241881,sh=692,awk=470,pascal=231,perl=58,python=45,sed=30
698974  fs              ansic=698974
533134  sound           ansic=532951,asm=183
493711  net             ansic=493615,awk=96
301646  include         ansic=299895,cpp=1709,asm=42
120454  kernel          ansic=120149,perl=305
56177   tools           ansic=51029,perl=3272,python=1399,sh=476,asm=1
54529   mm              ansic=54529
44171   security        ansic=44171
42627   crypto          ansic=42627
37307   scripts         ansic=22487,perl=8287,sh=2028,cpp=1820,yacc=1291,lex=947,python=447
28486   lib             ansic=28473,awk=13
14382   block           ansic=14382
11579   Documentation   ansic=6896,perl=2369,sh=1018,python=949,lisp=218,awk=129
5705    ipc             ansic=5705
4661    virt            ansic=4661
2377    init            ansic=2377
1876    firmware        asm=1660,ansic=216
1232    samples         ansic=1232
564     usr             ansic=550,asm=14
0       top_dir         (none)
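SLOCCount's figures are physical source lines of code: lines that contain at least one character that is neither whitespace nor part of a comment. The Python function below is a deliberately simplified, hypothetical illustration of that rule for C source; it is not SLOCCount's own implementation, and it does not recognise comment markers that appear inside string or character literals.

```python
def physical_sloc_c(source: str) -> int:
    """Count lines holding at least one non-whitespace character outside
    /* ... */ and // comments. Simplification: comment markers inside
    string or character literals are not treated specially."""
    count = 0
    in_block = False  # True while inside a /* ... */ comment
    for line in source.splitlines():
        i, has_code = 0, False
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)          # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2            # resume after the closing */
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            elif line.startswith("//", i):
                break                      # rest of the line is a comment
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            count += 1
    return count


SAMPLE = (
    "/* licence\n * header */\n#include <stdio.h>\n\n"
    "int main(void) /* entry */\n{\n    // say hello\n"
    "    return 0; // done\n}\n"
)

if __name__ == "__main__":
    print("raw lines:", SAMPLE.count("\n"), "physical SLOC:", physical_sloc_c(SAMPLE))
```

For the sample, the raw count is 9 lines but only 5 contain code, which is the same kind of gap as the one between the kernel's 15 million raw lines and its roughly 10 million SLOC.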

SLOCCount also examines each file to determine the programming language it is written in. According to its analysis, Linux 3.2 contains just under 10 million physical source lines of code, 97% of which is ANSI C and 2.5% assembler.

Totals grouped by language (dominant language first):
ansic:      9667982 (97.22%)
asm:         245256 (2.47%)
perl:         15083 (0.15%)
sh:            4240 (0.04%)
cpp:           3529 (0.04%)
yacc:          2979 (0.03%)
python:        2840 (0.03%)
lex:           1726 (0.02%)
awk:            708 (0.01%)
pascal:         231 (0.00%)
lisp:           218 (0.00%)
sed:             30 (0.00%)
Total Physical Source Lines of Code (SLOC)  = 9,944,822
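The per-language totals can be reproduced from the per-directory breakdown in the first table by summing each language's figures across all directories. The Python sketch below does this with the numbers copied from that table (the helper name language_totals is hypothetical); its results should match SLOCCount's summary, including the 9,944,822 overall total.

```python
from collections import Counter

# "SLOC-by-Language" column from the per-directory SLOCCount table (Linux 3.2).
BREAKDOWN = {
    "drivers": "ansic=5610304,yacc=1688,asm=1475,perl=792,lex=779,sh=26",
    "arch": "ansic=1632759,asm=241881,sh=692,awk=470,pascal=231,perl=58,python=45,sed=30",
    "fs": "ansic=698974",
    "sound": "ansic=532951,asm=183",
    "net": "ansic=493615,awk=96",
    "include": "ansic=299895,cpp=1709,asm=42",
    "kernel": "ansic=120149,perl=305",
    "tools": "ansic=51029,perl=3272,python=1399,sh=476,asm=1",
    "mm": "ansic=54529",
    "security": "ansic=44171",
    "crypto": "ansic=42627",
    "scripts": "ansic=22487,perl=8287,sh=2028,cpp=1820,yacc=1291,lex=947,python=447",
    "lib": "ansic=28473,awk=13",
    "block": "ansic=14382",
    "Documentation": "ansic=6896,perl=2369,sh=1018,python=949,lisp=218,awk=129",
    "ipc": "ansic=5705",
    "virt": "ansic=4661",
    "init": "ansic=2377",
    "firmware": "asm=1660,ansic=216",
    "samples": "ansic=1232",
    "usr": "ansic=550,asm=14",
}


def language_totals(breakdown: dict) -> Counter:
    """Sum the SLOC of each language over all directories."""
    totals = Counter()
    for entry in breakdown.values():
        for pair in entry.split(","):
            lang, sloc = pair.split("=")
            totals[lang] += int(sloc)
    return totals


totals = language_totals(BREAKDOWN)
grand_total = sum(totals.values())
for lang, sloc in totals.most_common():
    print(f"{lang + ':':<8}{sloc:>10} ({100 * sloc / grand_total:.2f}%)")
print(f"Total SLOC: {grand_total:,}")
```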

However, SLOCCount has not been updated for some years and, as the Pascal figure shows, its language detection can get confused. Perl and Python code are nonetheless definitely present in the kernel, and building some kernel versions requires a Perl interpreter.


Permalink: http://h-online.com/-1408062