Fighting Analysis Paralysis With Open Source?

By: Savio Rodrigues
Expert Author
2007-02-14

I stumbled across this analysis of the Linux Kernel which brought back "fond" memories of my market opportunity forecasting days.

In the analysis, the author, kripken, estimates that "at most, 60% of the Linux Kernel is GPLv2 code". His post describes the methodology in full, but I'll summarize.

He wrote a program that scanned the license statements found at the beginning of source code files. The program then matched the license text against patterns to determine whether each file was licensed under GPLv2 and above, GPLv2 only, GPL with no version specified, or Other. It tallied file sizes in bytes under each license, rather than counting files or lines of code. The results:

License                     # Bytes   % Bytes
GPL 2 or above           60,637,907       39%
GPL 2 only               32,215,150       21%
GPL, Ver unspecified     19,773,264       13%
Other                    43,762,840       28%
All Combined            156,389,161      100%
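
To make the approach concrete, here is a minimal sketch in Python of a scanner along the lines described above. It is not kripken's program: the regular expressions, the 4 KiB header cutoff, and the function names classify, tally_bytes and scan_tree are all assumptions made for illustration, since the article does not list the patterns he actually used.

```python
import os
import re
from collections import Counter

# Hypothetical patterns for illustration; the real program's patterns are not
# given in the article.
GPL_RE = re.compile(r"GNU General Public License|\bGPL\b", re.IGNORECASE)
OR_LATER_RE = re.compile(r"any later version", re.IGNORECASE)
V2_RE = re.compile(r"version 2|GPL\s*v?2", re.IGNORECASE)


def classify(header: str) -> str:
    """Map the text at the top of a source file to one of four buckets."""
    # Drop comment markers and collapse whitespace so a phrase that wraps
    # across comment lines ("any later\n * version") still matches.
    text = " ".join(re.sub(r"[/*#]+", " ", header).split())
    if not GPL_RE.search(text):
        return "Other"
    # Simplification: any "or later" grant is counted as "GPL 2 or above"
    # without checking which base version it names.
    if OR_LATER_RE.search(text):
        return "GPL 2 or above"
    if V2_RE.search(text):
        return "GPL 2 only"
    return "GPL, version unspecified"


def tally_bytes(files):
    """Sum file sizes in bytes per bucket; files yields (header_text, size)."""
    totals = Counter()
    for header, size in files:
        totals[classify(header)] += size
    return totals


def scan_tree(root: str, header_bytes: int = 4096):
    """Yield (header_text, size) for every file under root.

    Only the first header_bytes bytes are read (an assumed cutoff).
    """
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                header = fh.read(header_bytes).decode("utf-8", errors="replace")
            yield header, os.path.getsize(path)
```

Calling tally_bytes(scan_tree("linux")), where "linux" is a placeholder path to a kernel checkout, would give byte totals per bucket. Patterns this crude will misfile some headers, which is the trade-off the rest of this post is about.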


In a follow-up post, kripken compares his results with a much less thorough analysis that Linus Torvalds did using:

[torvalds@g5 linux]$ git-ls-files '*.c' | wc -l
Result= 7978
[torvalds@g5 linux]$ git grep -l "any later version" '*.c' | wc -l
Result= 2720


Comparing the two, we see that Linus's quick count puts 34% (2720 / 7978) of the kernel's .c files under "GPL 2 or above", while kripken estimated 39% of the kernel's bytes. The two measure slightly different things, but as kripken says himself, they point towards a relatively similar result. His analysis took several hours; Linus needed about 10 seconds.
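
For anyone who wants to check the arithmetic, the short Python sketch below recomputes both shares from the numbers quoted above. The variable names are mine; the figures come from the git output and from kripken's table.

```python
# File counts from the two git commands quoted above.
total_c_files = 7978
or_later_c_files = 2720
file_share = or_later_c_files / total_c_files  # share of .c files

# Byte counts from kripken's table.
bytes_by_license = {
    "GPL 2 or above": 60_637_907,
    "GPL 2 only": 32_215_150,
    "GPL, Ver unspecified": 19_773_264,
    "Other": 43_762_840,
}
total_bytes = sum(bytes_by_license.values())
byte_share = bytes_by_license["GPL 2 or above"] / total_bytes  # share of bytes

# Share of bytes that is explicitly GPLv2 (either "only" or "or above"),
# the figure behind the "at most, 60%" quote.
gplv2_share = (
    bytes_by_license["GPL 2 or above"] + bytes_by_license["GPL 2 only"]
) / total_bytes

print(f"Linus, by file count:   {file_share:.0%}")
print(f"kripken, by byte count: {byte_share:.0%}")
print(f"Explicit GPLv2 bytes:   {gplv2_share:.0%}")
```

Note that the two headline numbers use different units (files versus bytes), so the five-point gap between them is not purely measurement error.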

So what did we learn?

I'm all for using "perfect" data and analysis to make decisions. But sometimes, actually most of the time, perfect data isn't available, and that can call into question any analysis that relies on the imperfect data. In my days of forecasting, I'd often explain to colleagues and execs that the right data wasn't available, then lay out the assumptions I was making and their impact on the final results. Some would quickly "get it" and make a decision based on "the best data and analysis available within the timeframe at hand". Others couldn't get over the hurdle of using imperfect data to make decisions, and would attempt to find "the missing data".

I remember discussing this with a manager at the time. He said something like:

"You'll find that there's very little you can tell a really good executive that he/she doesn't already know or have a gut feeling for. These people probably got to where they are because they are able to combine disparate sources of imperfect data (e.g. a customer call, a conference pitch, talking with their friends, kids, neighbors) to spot trends before the rest of us can. As a result, they're much more likely to accept analysis based on imperfect data. They're more worried about acting on the best analysis available than about deliberating so long that the opportunity has passed."

That's one thing open source developers, projects and vendors seem to do really well: spot trends and make decisions without "all the data in the world". This could be because they're closer to the user, and because open source communities foster two-way dialogue between creators and users. Come to think of it, maybe open source actually allows for "better data" collection?

About the Author:
I am taking a semi-break from IBM life as I return to finish a PhD in Industrial Engineering. I've held roles in market intelligence, strategy and product management. I'm ex-product manager of IBM WAS Community Edition, and blog about enterprise open source topics.

LinuxDeveloperNews is an iEntry, Inc.® publication © 1998-2008 All Rights Reserved