02.15.07
Fighting Analysis Paralysis With Open Source?
By Savio Rodrigues
I stumbled across this analysis of the Linux Kernel which brought back "fond" memories of my market opportunity forecasting days.
In the analysis, the author, kripken, estimates that "at most, 60% of the Linux Kernel is GPLv2 code". Read his methodology here, but I'll summarize.
He wrote a program that scanned the license statements found at the beginning of source code files. The program then attempted to match the license text against patterns to determine whether the file was licensed under GPLv2 and above, GPLv2 only, GPL version unspecified or Other. For each license, the program tallied the size of the matching files in bytes, not the number of files or the number of lines. The results:
| License | # Bytes | % Bytes |
|---|---|---|
| GPL 2 or above | 60,637,907 | 39% |
| GPL 2 only | 32,215,150 | 21% |
| GPL, Ver unspecified | 19,773,264 | 13% |
| Other | 43,762,840 | 28% |
| All Combined | 156,389,161 | 100% |
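To make the method concrete, here is a minimal sketch of that kind of scanner in Python. It is not kripken's program: the regular expressions, the 4096-byte header window, the `.c`/`.h` filter and the function names `classify_license` and `tally_bytes` are all my own assumptions, chosen only to illustrate the idea of matching a file header against patterns and adding up bytes per category.

```python
import re
from collections import Counter
from pathlib import Path

# Illustrative patterns only (assumptions); the post does not give the real ones.
_GPL = re.compile(r"GNU General Public License|\bGPL", re.IGNORECASE)
_V2 = re.compile(r"version 2|\bGPL\s*v?2\b", re.IGNORECASE)
_OR_LATER = re.compile(r"any later version|or later", re.IGNORECASE)


def classify_license(header: str) -> str:
    """Map the opening text of a source file to one of the four categories."""
    # Drop comment punctuation, then collapse whitespace so that phrases
    # wrapped across comment lines still match.
    text = " ".join(re.sub(r"[*/#]+", " ", header).split())
    if not _GPL.search(text):
        return "Other"
    if not _V2.search(text):
        return "GPL, Ver unspecified"
    if _OR_LATER.search(text):
        return "GPL 2 or above"
    return "GPL 2 only"


def tally_bytes(root: Path, suffixes=(".c", ".h"), header_bytes=4096) -> Counter:
    """Sum file sizes in bytes per license category under `root`."""
    totals = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open("rb") as f:
                header = f.read(header_bytes).decode("utf-8", errors="replace")
            totals[classify_license(header)] += path.stat().st_size
    return totals
```

Calling `tally_bytes(Path("linux"))` on a checked-out source tree (the directory name is hypothetical) would return a byte count per category. Crude patterns like these will misfile some headers, which is part of why the numbers are an estimate.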
In a follow-up post, kripken compares his results with a much less thorough analysis that Linus did using:
Comparing the two, we see that Linus estimates that 34% (2720 / 7978) of the kernel is "GPL 2 or above", while kripken estimated 39%. As kripken says himself, the two analyses point to roughly the same result, but his took several hours, while Linus needed about 10 seconds.
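The arithmetic behind that comparison is easy to check. The short Python sketch below recomputes the byte shares from the table and Linus's ratio from the two numbers quoted above; the variable names are mine, and nothing here goes beyond the figures already in the post.

```python
# Byte counts per category, copied from the results table.
bytes_by_license = {
    "GPL 2 or above": 60_637_907,
    "GPL 2 only": 32_215_150,
    "GPL, Ver unspecified": 19_773_264,
    "Other": 43_762_840,
}

total = sum(bytes_by_license.values())
shares = {name: 100 * n / total for name, n in bytes_by_license.items()}

# Linus's ratio as quoted in the post.
linus_share = 100 * 2720 / 7978

for name, share in shares.items():
    print(f"{name:22s} {share:5.1f}%")
print(f"{'Linus, GPL 2 or above':22s} {linus_share:5.1f}%")
```

The individually rounded percentages in the table add up to 101% rather than 100%, which is just rounding.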
So what did we learn?
I'm all for using "perfect" data and analysis to make decisions. But sometimes, actually most of the time, perfect data isn't available. This can call into question any analysis that relies on the imperfect data. In my days of forecasting, I'd often explain to colleagues and execs that the right data wasn't available, so here are the assumptions I'm making and their impact on the final results. Some would quickly "get it" and make a decision based on "the best data and analysis available within the timeframe at hand". Others couldn't get over the hurdle of using imperfect data to make decisions, and would attempt to find "the missing data".
I remember discussing this with a manager at the time. He said something like:
"You'll find that there's very little you can tell a really good executive that he/she doesn't already know or have a gut feeling for. These people probably got to where they are because they are able to combine disparate sources of imperfect data (i.e. a customer call, a conference pitch, talking with their friends, kids, neighbors, etc) to spot trends before the rest of us can. As a result, they're much more likely to accept analysis based on imperfect data. They're more worried about acting based on the best analysis available, than deliberating so long that the opportunity has passed."
That's one thing open source developers, projects and vendors seem to do really well: spot trends and make decisions without "all the data in the world". This could be because they're closer to the user, and because open source communities foster two-way dialogue between creators and users. Come to think of it, maybe open source actually allows for "better data" collection?
About the Author:
I am taking a semi-break from IBM life as I return to finish a PhD in Industrial Engineering. I've held roles in market intelligence, strategy and product management. I'm ex-product manager of IBM WAS Community Edition, and blog about enterprise open source topics.