max spevack's blog - netcraft confirms it -- ubuntu is dying?

netcraft confirms it -- ubuntu is dying? [Oct. 17th, 2007|03:50 pm]
[Location |raleigh, nc]

December 28, 2006 -- Mark Shuttleworth says "We know now that there are probably at least 8 million [Ubuntu] users."

October 17, 2007 -- Mark Shuttleworth says Ubuntu has "in excess of 6 million users."

I have a follow up question, Mr. Shuttleworth. How did you manage to lose about 2 million users in less than 10 months? What does that mean for the future of Canonical and the funding that you provide to Ubuntu? Or were your numbers then, and now, simply made up? How many users do you actually have? Your inconsistent tales to reporters detract from the good work that the Ubuntu community guys like Jono and others are doing.

At least Fedora makes a good-faith attempt at statistical transparency.

Comments:
(Deleted comment)
From: codergeek42
2007-10-17 10:19 pm (UTC)

I think the point is not that we can or cannot accurately get demographics of this, but that those of Ubuntu seem highly falsified or exaggerated, whereas (with Fedora) we have an actual basis for our userbase estimates in the form of simple IP tracking on the updates mirrors and things of that nature.
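To make the "simple IP tracking" idea concrete, here is a minimal sketch of counting distinct client addresses in a mirror's access log. It assumes Apache-style "common log format" lines where the client address is the first field; that format is an assumption, not something stated in this thread. Note that it counts addresses, not machines or people.

```python
import re
from typing import Iterable, Set

# Assumes Apache/NCSA "common log format", where each line starts with the
# client address, e.g.:
#   203.0.113.7 - - [17/Oct/2007:15:50:00 -0400] "GET /mirrorlist HTTP/1.1" 200 512
# The regex grabs the first whitespace-delimited field.
_CLIENT_FIELD = re.compile(r"^(\S+)\s")


def unique_client_ips(log_lines: Iterable[str]) -> Set[str]:
    """Return the set of distinct client addresses seen in the log lines."""
    ips: Set[str] = set()
    for line in log_lines:
        match = _CLIENT_FIELD.match(line)
        if match:  # skip blank or malformed lines
            ips.add(match.group(1))
    return ips
```

Calling `len(unique_client_ips(f))` on an open log file gives the raw unique-address count; NAT, proxies and dynamic addresses all push that number away from the true number of installs.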
From: spevack
2007-10-18 01:27 am (UTC)

that is, in fact, EXACTLY the point
From: fooishbar
2007-10-18 02:09 am (UTC)

'Seem'? Do you have anything at all to back that assertion up, or is this just all breeze? Accusing people (another free software project, indeed) of jacking up their numbers for their own gain is pretty serious, so you should back that up when you say it (not to mention, think).
From: codergeek42
2007-10-18 02:42 am (UTC)

Max's original post asked about this potential forgery of stats. I merely restated it for clarification in my post.
From: mmcgrath
2007-10-18 02:48 am (UTC)

Time to spell it out.

:: sigh :: From the article:

"Shuttleworth also touched upon the issue of Unbuntu's user base, saying that it has "in excess of 6 million users." He did, however, admit that he couldn't be sure of the number since there is no formal registration process for most users, and neither Canonical nor the Ubuntu community do any active monitoring of installations."

Shuttleworth: 6 million!

Kerner (author): But how did you get 6 million?

Shuttleworth: I don't know, neither Canonical nor Ubuntu do any active monitoring. 6 million!!!

From: mjg59
2007-10-18 02:49 am (UTC)

Re: Time to spell it out.

Because, clearly, Canonical don't run the servers used for security updates.

Oh. Wait.
From: mmcgrath
2007-10-18 02:54 am (UTC)

Re: Time to spell it out. (this time shorter!!)

It came from Shuttleworth's million dollar (and getting smaller every day) mouth

"neither Canonical nor the Ubuntu community do any active monitoring of installations."

They have access to monitor, but for whatever reason don't.
From: mjg59
2007-10-18 03:22 am (UTC)

Re: Time to spell it out. (this time shorter!!)

To the best of my knowledge, there's no regular attempt to work out precisely how many users there are. That's somewhat different to occasionally checking the number of unique IPs that hit a specific server, given that there's no easy way of getting from there to the number of unique hosts - it's something of a guesstimate.
From: codergeek42
2007-10-18 04:13 am (UTC)

Re: Time to spell it out. (this time shorter!!)

"there's no regular attempt to work out precisely how many users there are. [...] given that there's no easy way of getting from there to the number of unique hosts - it's something of a guesstimate."

You're correct in this; but an estimate based on statistics such as these (among others) is assuredly many orders of magnitude more accurate than a simple "guess" as Mr. Shuttleworth has done without any attempt at accuracy.
From: fooishbar
2007-10-18 01:11 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

Again, your assertion is that this isn't based on statistics. I'm interested to know how you know this.
From: mihmo
2007-10-18 02:44 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

Daniel, I'm curious how the following quote from Mark's interviewer that folks have referenced a number of times in the original post and in reply to you does not back the assertion up:

"He did, however, admit that he couldn't be sure of the number since there is no formal registration process for most users, and neither Canonical nor the Ubuntu community do any active monitoring of installations."

Does that not satisfy your question?
From: mjg59
2007-10-18 03:00 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

Active monitoring is clearly different to mining data that's been passively collected from various sources. I know that figures have been produced inside Canonical, but I have absolutely no idea what the results or error bars are.
From: fooishbar
2007-10-18 04:16 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

'No formal registration process' is true of everyone, and 'active monitoring' doesn't mean that they can't gather meaningful statistics. If you guys want to be dicks about it, though, then I'm not in a position to stop you.
From: mihmo
2007-10-18 04:45 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

it was more the 'couldn't be sure of the number' quote than the methodology he admitted to not using?
From: spaz_own_joo
2008-02-09 05:08 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

Perhaps the idea was to express this point and remind everyone that OS market share statistics are to be taken with a grain of salt.

I might suggest one reason that the 8M might have shrunk to 6M is that they're watching the number of unique IPs and, maybe, unique combinations of hardware-specific packages being downloaded per IP. It's reasonable to assume that most everybody will go on a bit of an Aptitude spree after first installing, but once they've gotten all the software they need, there's no reason to assume they'll be showing up in Canonical's server logs ever again. Some users don't trust the automatic update process, some are ignorant of its existence, and some use the computer strictly with non-administrator accounts and the dude who installed it is never around to click "OK" to the update window.

So if, of the 8 million they counted in the PR craze when Gutsy came out, only 6 million are still hitting the Canonical servers, they may be playing it safe and assuming that the other 2 million went back to Windows.
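The distinction drawn here, between everyone ever counted and machines that are still checking in, can be sketched as two counts over the same log data. This is a hypothetical illustration: it assumes hits have already been reduced to `(ip, date)` pairs, and the 30-day "still active" window is an arbitrary choice, not anything Canonical is known to use.

```python
from datetime import date, timedelta
from typing import Iterable, Set, Tuple


def active_vs_cumulative(
    hits: Iterable[Tuple[str, date]],
    as_of: date,
    window_days: int = 30,  # hypothetical "still active" window
) -> Tuple[int, int]:
    """Return (active, cumulative) counts of distinct addresses.

    active:     addresses seen in the `window_days` days ending on `as_of`
    cumulative: addresses seen on or before `as_of`, at any time
    """
    cutoff = as_of - timedelta(days=window_days)
    active: Set[str] = set()
    cumulative: Set[str] = set()
    for ip, day in hits:
        if day > as_of:
            continue  # ignore hits after the reporting date
        cumulative.add(ip)
        if day > cutoff:
            active.add(ip)
    return len(active), len(cumulative)
```

A machine that installed software once and never updated again shows up in the cumulative figure but not the active one, which is one way a headline number could shrink without anyone actually leaving.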
From: skvidal
2007-10-18 05:27 am (UTC)

Re: Time to spell it out. (this time shorter!!)

So if the data exists, parse it. If Canonical discovers that they have 80 million unique IP hits, then even if that's just one guy changing IPs and checking in every few seconds, that still says something. Now, if the number is only 1 million or maybe 2 million, then that says something. But to say the number is 6 million or 8 million w/o even giving a rough basis for coming up with the number? That's just pulling it straight out of thin air.

We all realize the variances of checking web server logs. In fact, Fedora's stats site talks about them in some detail. However, the issues which impact Fedora's connection counting also impact Canonical, and probably equally.

So, at the very least, the numbers should be roughly comparable to one another, right?

-sv
From: mjg59
2007-10-18 02:56 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

There are multiple ways to guess at the installation figures - number of CDs shipped, number of ISOs downloaded to unique addresses, number of security update downloads, number of hits to the NTP server, number of checks for stable updates, number of clickthroughs from the Firefox start page to the Ubuntu website, contributions to the hardware database and so on. How you interpret those will dramatically alter the figure you come up with and we're certainly going to be looking at error bars in the millions. At a guess, 8 million was the best estimate of the total number of users a year ago and 6 million is the minimum plausible number of users now.

Clearly the figures Mark comes up with aren't comparable to the Fedora figures - they're the entire installed userbase, whereas the Fedora figures seem to be per release. We're not going to be able to come up with any sort of reliable metric, and defining one as a reference technique is inevitably going to end up favouring one distribution over another. Why worry about it? It's not like anyone chooses which Linux distribution to install based on figures on a website.
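One way to read "how you interpret those will dramatically alter the figure" is to keep each method's estimate separate and report the spread rather than a single number. The sketch below does only that: the method names are placeholders supplied by the caller, and it is not a reconstruction of whatever Canonical actually computes, since that was never published.

```python
from typing import Dict, Tuple


def estimate_range(
    estimates: Dict[str, float],
) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Given one point estimate per counting method, return the lowest and
    highest estimates together with the method that produced each."""
    if not estimates:
        raise ValueError("need at least one estimate")
    low_method = min(estimates, key=estimates.__getitem__)
    high_method = max(estimates, key=estimates.__getitem__)
    return (low_method, estimates[low_method]), (high_method, estimates[high_method])
```

The gap between the two ends only shows how much the methods disagree; it is not a statistical confidence interval, and quoting the low end ("in excess of") versus a best guess gives very different headlines from the same data.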
From: jspaleta
2007-10-18 07:54 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

The fundamental question here is what is the methodology behind the quoted numbers.

Without a statement of the methodology used to generate the numbers, it's not clear how reliable the number is as a metric for anything. Did the 8 million quote drop to a 6 million quote because the methodology changed in the meantime? Without a statement of how metrics are being aggregated into a number, we can't even be sure that the 8 million from the earlier quote is comparable to the 6 million quoted more recently, just looking at Ubuntu's own growth and without getting into the complication of comparing with Fedora's metrics at all.

All methodologies have caveats, but we have to be transparent about the methodologies being used, or the numbers mean absolutely nothing, even as a relative, time-evolving metric.

If you have any pull with Canonical, please, please, please encourage them to publish a representative discussion of the methodology used in the calculation they quote in the press. Some of us care rather deeply about linux adoption, and we want to make sure the numbers being bandied about from month to month by distros tell at least a self-consistent story about adoption trends. The fact that we have high-profile Canonical numbers being quoted in the press that show a significant drop in client adoption in under a year is not something that makes a whole lot of sense to me. And it would help if we had a public statement of how those numbers are generated, so as to explain the drop.

-jef
From: mjg59
2007-10-18 09:45 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

No, I don't think the differing numbers are due to a difference in methodology. I think that the 8 million figure was the most likely number of users in 2006 and the 6 million figure is the absolute minimum number of users in 2007. I don't think Canonical are likely to see any specific benefit from describing the methodology used, and I also don't think anyone else should care. There's no great benefit to Canonical from providing grossly overinflated figures, and the idea of us actually being able to generate comparable figures for Ubuntu and Fedora is unrealistic. Different demographics are going to use Linux installs differently, and that's going to disadvantage one distribution over another in some way.
From: jspaleta
2007-10-18 10:12 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

Everyone has an opinion as to what the numbers might mean. Hence the problem with not knowing what the actual methodology is. If you personally don't think that the trend of the numbers stated publicly matters, then please refrain from participating in the discussion concerning what the numbers might or might not mean.

As someone who cares about advancing linux adoption broadly, I would appreciate it if you and everyone else who don't think the trends in client metrics matter would take the time to find something you do think is important and spend your time discussing that.

-jef
From: mjg59
2007-10-18 10:28 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

I've told you what these numbers are likely to mean, and that there's no plausible way of determining what the year-long trend of Ubuntu installs has been from them. But perhaps a more interesting question is what kind of comparable metric do you think would be appropriate in terms of determining trends of different distributions? The Ubuntu shipit program has been highly successful in shipping CDs to people without any sort of significant internet access, so anything purely involving the internet is going to disproportionately favour Fedora over Ubuntu. The remaining solutions seem to involve carrying out physical surveys in large parts of the world, which is sufficiently unrealistic that I don't think it's possible to produce comparable figures. And if you can't produce comparable figures, what's the point?
From: jspaleta
2007-10-18 10:44 pm (UTC)

Re: Time to spell it out. (this time shorter!!)

I'm not as interested in comparing distributions to each other. I am most interested in looking at the trends over time of linux as broadly as possible, not just in total global numbers but also by location. Watching distributions get into a pissing match over numbers is something I'll leave to OSNews. I care about linux adoption metrics as a surrogate for open source adoption in a broad sense. I care about making it easier for people to make the case to their local government and their local institutional entities to provide equitable access to digital services that work for open source users/linux users in their area, by showing them that people in their area are in fact using linux in sustainable numbers (GeoIP isn't complete crap for basic location trending).

Does Ubuntu have stats for Shipit?
The Fedora FreeMedia program keeps at least a minimal public stats log for media shipped every month:
http://fedoraproject.org/wiki/Distribution/FreeMedia/Information

And I'll probably be able to work with them to get zipcodes/postcodes and datamine the location information for the FreeMedia program on top of what I'm doing with the ip logs. I just haven't gotten that far yet.

-jef
From: maco.myopenid.com
2007-11-04 12:46 am (UTC)

Re: Time to spell it out.

I don't use Canonical's repos for updates because I found I get faster download rates from a different mirror. So, they'd need to get all of the mirrors (there's what? 100??) to give stats on how many times $package was downloaded, and then they might get a close measure of how many boxes have a specific version installed. Then you have to take into account things like there being 1 Ubuntu desktop at my house and 5 people using it. I say "might" because sometimes it's easier, especially if you have dial-up, to get a friend with broadband to use AptOnCD to burn you an update disk right after they update.
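If every mirror did hand over its logs, the combining step matters: simply adding up per-mirror counts counts a machine twice when it fetches from more than one mirror. The sketch below swaps the per-package download counts mentioned above for sets of distinct client addresses per mirror (a hypothetical input; mirrors may not share raw client lists at all) and takes their union.

```python
from typing import Iterable, Set


def combined_unique_clients(per_mirror: Iterable[Set[str]]) -> int:
    """Count distinct client addresses across several mirrors' logs.

    Taking the union, rather than summing per-mirror counts, avoids
    double-counting a machine that fetched from more than one mirror.
    """
    combined: Set[str] = set()
    for clients in per_mirror:
        combined |= clients
    return len(combined)
```

This still counts addresses rather than people, so a five-person household behind one box, or an AptOnCD disc passed to a friend, stays invisible either way.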
From: mjg59
2007-11-04 01:21 am (UTC)

Re: Time to spell it out.

There aren't any official security mirrors, but of course you're right that a single IP address doesn't necessarily correspond to a single person (or vice versa). There's no way to know precisely how many systems are running Ubuntu, so the best you can manage is to use various different methods to get a range of ballpark figures.
From: jspaleta
2007-10-17 10:50 pm (UTC)

Speaking of Fedora usage stats....we've got maps!!!!!!

Here's a map showing the density of unique IPs associated with the Fedora mirrorlist logs:
http://fedoraproject.org/maps/mirrorlist/

I'm hoping to roll out time-lapse animation eye candy of the first week of the F8 release, so we can get a visual depiction of not only the total number of unique IPs but also how they come online across the globe over the first week or so of F8's life.

-jef
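A density map like this boils down to tallying distinct addresses per location. Here is a minimal sketch of that tally, assuming a caller-supplied `locate` function (for example one wrapping a GeoIP database) that returns a location label or `None`; it is a hypothetical illustration, not the code behind the Fedora maps.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def ips_per_location(
    ips: Iterable[str],
    locate: Callable[[str], Optional[str]],
) -> Counter:
    """Count distinct IPs per location label.

    `locate` maps an address to a label such as a country code, or returns
    None when the address can't be placed.
    """
    counts: Counter = Counter()
    for ip in set(ips):  # de-duplicate first so each address counts once
        location = locate(ip)
        counts[location if location is not None else "unknown"] += 1
    return counts
```

Bucketing the same tally by day would give the per-day frames needed for the time-lapse of the first week of a release.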
From: codergeek42
2007-10-18 12:03 am (UTC)

Re: Speaking of Fedora usage stats....we've got maps!!!!!!

You rock, good sir! 8)
From: (Anonymous)
2007-10-18 07:56 am (UTC)

just words

It's just a matter of choice of words. After all, 8 million is also 'in excess of 6 million'. He didn't lose 2 million, he just rephrased with a little more caution.
From: mihmo
2007-10-18 02:46 pm (UTC)

Re: just words

if my bank 'rephrased with a little caution' my statements on the order of 2 million I'd be a bit upset ;)
(Deleted comment)
From: spevack
2007-10-20 03:12 am (UTC)

Re: Who gives a shit? Ubuntu is better and he has the marketing plan and you have?

I had to delete the parent comment due to profanity (and bad grammar).
From: (Anonymous)
2007-10-23 12:35 pm (UTC)

Either way...

...we know it's OVER 9000!!! Sorry, I had to say it. Keep up the great work on Fedora, guys. I'm lovin' it!
From: (Anonymous)
2008-01-12 06:08 pm (UTC)

Love!

Make peace, not war!