Friday, September 11, 2009

Proposal for a new source code measurement unit


With the pace and scale of modern software development being what it is, it is very rare to find any application, beyond university projects and weekend hack-jobs, that amounts to fewer than hundreds of thousands of lines of code in source form.  The truth is that measuring in Source Lines Of Code has become about as efficient and logical as measuring the distance between stars, not in parsecs, but in kilometers.


What, then, could we use as the programming equivalent of the astronomical parsec?  I propose the unit torvalds, uncapitalised to differentiate it from the name, to represent 200,000 lines of code.  It is shortened to Tr, with an initial capital to differentiate it from the Unix tr command.


Linus Torvalds, the creator of the Linux kernel and an outspoken man in his own right, has many critics.  Recently, those wishing to criticize the quality - or worse, the magnitude - of his contributions have taken to pointing out that only 2% of the kernel source code was actually written by him.  This may be true by certain estimates, but they invariably leave out the context:  the kernel currently (as of version 2.6.30) comprises 11,637,173 lines of source.  That's almost twelve million lines of source code.  Since the first mention I could find of the 2% figure references an unspecified minor revision of the 2.6 kernel, I'd say a safe estimate places the then-current kernel size at 10 million lines.

By this measurement, Linus' contribution of 2% amounts to roughly 200,000 lines of code which, by any individual standard, is substantial.


  • The Debian 4.0 distribution, at 283 million lines of code, amounts to 1415 torvalds (Tr).
  • Mac OS X version 10.4, at 86 million SLOC, equals 430 Tr.
  • The FreeBSD operating system, which totals 8.8 million lines, comes to 44 Tr.
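
For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion in Python.  The function and constant names are my own invention for illustration, not part of any existing tool, and the line counts are simply the figures quoted above.

```python
# Sketch: converting source lines of code (SLOC) to torvalds (Tr).
# LINES_PER_TORVALDS and sloc_to_torvalds are illustrative names only.

# One torvalds: 2% of an estimated 10-million-line kernel.
LINES_PER_TORVALDS = 200_000


def sloc_to_torvalds(sloc: int) -> float:
    """Convert a raw line count to torvalds (Tr)."""
    return sloc / LINES_PER_TORVALDS


# Line counts as quoted in this post.
projects = {
    "Debian 4.0": 283_000_000,
    "Mac OS X 10.4": 86_000_000,
    "FreeBSD": 8_800_000,
    "Linux 2.6.30": 11_637_173,
}

for name, sloc in projects.items():
    print(f"{name}: {sloc_to_torvalds(sloc):,.1f} Tr")
```

By the same conversion, the 2.6.30 kernel itself comes to roughly 58 Tr.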
