Thursday, December 08, 2005

Downloading the Web in an Afternoon

That's the equivalent of what Caltech recently accomplished.

Bandwidth is a measure of how much data can be sent across a network in a set amount of time. It's what you experience online when waiting for a web page to load.

But recently, Caltech reached a stunning milestone in how fast data can be sent, transferring 475 terabytes of data in 24 hours. That's fast enough to download the entire Web as indexed by Google in an afternoon. (Cool idea too. Can you imagine having your own personal cache of the web, updated every couple of hours in the background while you surf with no page load waits?)

OK, for math heads, here are the figures I used, rounded to the nearest sane numbers:

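475 terabytes (the amount of data transferred in 1 day by Caltech), divided by 156 terabytes (the size of Google's web) = (roughly) 3. In other words, at that rate you could pull down Google's whole index about three times a day, or once every eight hours or so.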
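For anyone who wants to check the division, here is a minimal Python sketch of it. The function names are mine, and it assumes a terabyte means 10^12 bytes, which the post doesn't actually specify, so treat the throughput figure as a rough estimate.

```python
# Figures from the post; the terabyte = 10**12 bytes convention is an assumption.
TRANSFERRED_TB = 475   # data Caltech moved in one day
GOOGLE_INDEX_TB = 156  # estimated size of Google's index


def downloads_per_day(transferred_tb, index_tb):
    """How many copies of the index fit into one day's transfer."""
    return transferred_tb / index_tb


def hours_per_download(transferred_tb, index_tb, window_hours=24):
    """Hours needed to move one copy of the index at the same average rate."""
    return window_hours * index_tb / transferred_tb


def average_gbit_per_second(transferred_tb, window_hours=24):
    """Average throughput implied by the transfer, in gigabits per second."""
    total_bits = transferred_tb * 10**12 * 8
    return total_bits / (window_hours * 3600) / 10**9


if __name__ == "__main__":
    print(f"copies per day: {downloads_per_day(TRANSFERRED_TB, GOOGLE_INDEX_TB):.2f}")
    print(f"hours per copy: {hours_per_download(TRANSFERRED_TB, GOOGLE_INDEX_TB):.1f}")
    print(f"average Gbit/s: {average_gbit_per_second(TRANSFERRED_TB):.1f}")
```

The hours-per-copy figure is the one behind the "afternoon" in the title.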

More math? OK, here is how I determined Google's size:

I did a Google search with my preferences set to return 100 results per page. I searched for everything using *.* (Note: This search query doesn't work anymore. Use a creative search of your own with common words like "home" or "welcome").
From those 100 hits, I added up the size of each page for a total of:
1,721 K
and then divided that by 100 to get the average size of each page.
That gave me an average of about 17 K per page.
Multiply that by the number of pages on Google (9,180,000,000)
and that gives 156,060,000,000,000 bytes, or about 156 terabytes (counting 1 K as 1,000 bytes).
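The same estimate as a small Python sketch, in case you want to plug in your own sample. The function and parameter names are made up for illustration, and it assumes 1 K = 1,000 bytes, which is what the arithmetic above implies.

```python
def estimate_index_bytes(sample_total_kb, sample_count, indexed_pages,
                         bytes_per_kb=1000):
    """Estimate total index size from a sample of page sizes.

    Rounds the average page size to a whole number of K first,
    as the post does (17.21 K becomes 17 K).
    """
    average_kb = round(sample_total_kb / sample_count)
    return average_kb * bytes_per_kb * indexed_pages


if __name__ == "__main__":
    total_bytes = estimate_index_bytes(
        sample_total_kb=1721,         # combined size of the 100 sampled pages
        sample_count=100,             # number of search hits sampled
        indexed_pages=9_180_000_000,  # page count used in the post
    )
    print(f"{total_bytes:,} bytes")
    print(f"about {total_bytes / 10**12:.0f} terabytes")
```

Keep in mind that 100 search hits is a tiny sample, so the result is only as good as that 17 K average.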


Anonymous said...

Interesting thought! What's more interesting is the fact that the first result I get by searching *. in Google is Macromedia. Does it mean Macromedia is the most "backlinked" site in the web??

Anonymous said...

I think that you forgot about databases. They got a lot of stuff.... might take you a whole day to download all the PORN that many sites have.

Kiltak said...

Hey, there's probably A LOT of stuff that you missed with your theory. You forgot newsgroups, sites that aren't indexed by Google's robots (those that require membership and the ones that tell bots to ignore them) and databases of all kinds.

You could probably double your total easily.


[Geeks Are Sexy] Tech. News

Niels Madsen said...

Even doubling the total means nothing. Downloading the internet once a day should be enough to prove the point, shouldn't it?