Thursday, December 08, 2005

How to Visualize Data

Did you know that 1 CD can hold the entire Human genome? That a Gigabyte of text printed out would fill the bed of a pickup truck with paper? That it takes 20,000 trees to print out 400 Gigabytes on paper? That 100 Terabytes is a high-side guess of Human brain storage space, slightly less than one day's Web Page views from Google in 2009?

I've been collecting these data bits for a couple of years now, finding gems and doing my best math to see if they fall in line with everything else. It's pretty close. Close enough to give you some useful comparisons.

Bits


1 bit: any 2 choice decision. Yes/No Run/Stop...
3 bits: A very basic color pixel. One on/off bit each for Red, Green and Blue gives 8 colours; a full-colour pixel usually takes 24 bits (8 bits per channel).
8 bits: 1 byte
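
As a quick check on how far a handful of bits goes: n bits can encode 2 to the power n distinct values. Here is a minimal Python sketch; the function name distinct_values is just made up for illustration.

```python
def distinct_values(bits: int) -> int:
    """Number of different values that `bits` binary digits can encode."""
    return 2 ** bits

# 1 bit = a yes/no choice, 3 bits = 8 colours, 8 bits = 1 byte,
# 24 bits = a typical full-colour pixel (8 bits per channel).
for n in (1, 3, 8, 24):
    print(f"{n} bit(s): {distinct_values(n):,} values")
```

So a single bit already covers a two-way choice, and 3 bits cover only 8 colours.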

Bytes


1 byte: Any character on your computer keyboard (not including your cat).
5 bytes: The average English word

Kilobyte


1,024 bytes often rounded to 1,000 bytes
1 Kilobyte: A joke or a couple of paragraphs
2 Kilobytes: Typewritten page
3.2 Kilobytes: The amount of data in the H1N1 Swine Flu virus
3.5 Kilobytes: The size of the first web page
5 Kilobytes: A Desktop Icon
10 Kilobytes: A page out of an encyclopedia
17 Kilobytes: The size of an average Web Page
50 Kilobytes: A (roughly) 4 by 6 inch image
100 Kilobytes: A low-resolution photograph
750 Kilobytes: The file size necessary to categorize the entire range of human experience and interest (11 categories and about 450 unique sub-categories), as indicated by the Yahoo! directory on November 3rd 2007. (I wrote a small program to fetch all the headings and categories, saved them to a text file, then viewed the text file size.)

Megabyte


1,048,576 bytes
1 Megabyte: Small novel, 3-1/2 inch diskette
2 Megabytes: 12 Megapixel Digital Photo, high resolution
3 Megabytes: The average mp3 song. (Rough rule of thumb: an mp3 plays about 1 megabyte per minute.)
4 Megabytes: A non-illustrated King James Bible. (I downloaded a text version from Project Gutenberg and viewed the file size.)
5 Megabytes: 1 minute of a YouTube High Quality video.
10 Megabytes: (Roughly) 1 minute MPEG movie.
20 Megabytes: Typical hard drive in the first desktop PCs
100 Megabytes: Roughly the text info contained in 1 meter (3 feet) of bookshelf
750 Megabytes: 1 CD. The Human genome (props to Max for his comment below)
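
The 750 Megabyte genome figure can be reproduced with a little arithmetic. This is a rough Python sketch, assuming the numbers from the comments below (about 3 billion nucleotides, 2 bits each) and decimal Megabytes; the names are made up for illustration.

```python
BASE_PAIRS = 3_000_000_000  # approximate genome length, taken from the comments
BITS_PER_BASE = 2           # four letters (A, C, G, T) fit in 2 bits

def genome_megabytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Storage needed for the genome, in decimal Megabytes (assumption)."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 1_000_000

print(genome_megabytes())                 # packed at 2 bits per nucleotide
print(genome_megabytes(bits_per_base=8))  # plain text, 1 byte per nucleotide
```

Packed at 2 bits per nucleotide this gives 750 Megabytes; stored as plain text at one byte per letter it gives 3,000 Megabytes, which is where the 3 Gigabyte figure comes from.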

Gigabyte


1,073,741,824 bytes
1 Gigabyte: The bed of a pickup truck filled with paper.
7 Gigabytes: 1 DVD
10 Gigabytes: A 1 inch stack of CDs
28 Gigabytes: Tweets Per Day on Twitter as of Jan 1, 2011
30 Gigabytes: From My Life In A Terabyte - Roughly the entire collection of Gordon Bell's articles, books, correspondence (letters and email), CDs, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings by 2003.
400 Gigabytes: 20,000 trees made into paper and printed.
500 Gigabytes: 100 DVD Movies

Terabyte


1 Terabyte: 1000 gigabytes
An 8 foot stack of CDs or about 150 DVDs.
It would hold all 350 episodes of The Simpsons or all 238 episodes of Friends.
Roughly 250,000 MP3s (about 2 years of non-stop listening).
About 2 weeks of non-stop DVD movies.
500,000 digital camera pictures.
About 50,000 trees made into paper would be needed to print out a Terabyte of data: 250 million pages printed both sides, over 10 miles high.
4 Terabytes: The YouTube record of U.S. user names and IP addresses including every record of every video watched by them as of 2008.
10 Terabytes: Enough to store everything you look at for a year, and could include a heart monitor, personal GPS, everything you type and every move of your mouse. From Charlie's Diary Shaping the Future.
45 Terabytes: All the videos on YouTube as of Aug 2006
100 Terabytes: High guess of Human brain storage space. The monthly growth of The Internet Archive in 2009. From this Google search
122 Terabytes: The size of one day's Web Page views from Google in 2009: 7.2 Billion daily page views x 17 Kilobytes (the size of the average web page).
150 Terabytes: Estimated size of all Web pages indexed by Google on Dec 8th 2005 (not including databases or video). (See this article for the figures I used)
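
Several of the Terabyte figures above can be cross-checked against numbers given elsewhere in this list. The following is a rough Python sketch, assuming decimal units (1 Terabyte = 10^12 bytes) and the rules of thumb quoted in this post (17 Kilobytes per web page, 2 Kilobytes per typewritten page, about 1 Megabyte per minute of MP3). The function names are made up for illustration.

```python
# Rough cross-checks of the Terabyte figures, using decimal units
# (an assumption; the list mixes 1,000 and 1,024 multipliers).
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000
TB = 1_000_000_000_000

def google_daily_views_tb(page_views=7_200_000_000, page_kb=17):
    """Daily page views times average page size, in Terabytes."""
    return page_views * page_kb * KB / TB

def trees_per_terabyte(trees=20_000, gigabytes=400):
    """Scale '20,000 trees per 400 Gigabytes' up to 1 Terabyte."""
    return trees / gigabytes * (TB / GB)

def mp3_years_per_terabyte(mb_per_minute=1):
    """Years of non-stop MP3 playback that fit in 1 Terabyte."""
    minutes = TB / (mb_per_minute * MB)
    return minutes / (60 * 24 * 365)

def double_sided_sheets_per_terabyte(page_kb=2):
    """Sheets printed on both sides, at about 2 Kilobytes per page side."""
    return TB / (page_kb * KB) / 2

print(f"Google page views per day: {google_daily_views_tb():.1f} TB")
print(f"Trees per Terabyte: {trees_per_terabyte():,.0f}")
print(f"Non-stop MP3 playback: {mp3_years_per_terabyte():.1f} years")
print(f"Double-sided sheets: {double_sided_sheets_per_terabyte():,.0f}")
```

Under these assumptions the numbers come out at about 122.4 Terabytes, 50,000 trees, 1.9 years and 250 million sheets, which is in line with the list.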

Petabyte


1 Petabyte: 1 thousand Terabytes. Storage at this level signals the dawn of a new era with powerful implications to the sciences and Artificial Intelligence.
About 100 years of television. The amount of data storage space the Internet Archive had in 2004.
Roughly the amount of new video added to YouTube every day in 2007
A stack of CDs 3 kilometers high
2 Petabytes: The amount of data Google processed every day in 2008
3.5 Petabytes: 2007 Estimated capacity of Google's Data centers in a box.
4 Petabytes: Estimated amount of Internet data stored in RAM by Google in 2006.
4.5 Petabytes: The capacity of The Internet Archive as of 2009. From this Google search
15 Petabytes: The amount of data the Large Hadron Collider (LHC) generates per year as of 2008.
20 Petabytes: Google daily total workload in January 2008. The storage capacity of all hard drives produced in 1995.
60 Petabytes: Estimated total size of Flickr photos by December 2011
200 Petabytes: The estimated amount of data contained within the Googleplex in 2006.

Exabyte


1000 Petabytes.
2.2 Exabytes: According to Charles Stross, all data recorded by our species in 2003
246 Exabytes: Total storage of the Internet in 2006.

Zettabyte


1000 Exabytes
1.8 Zettabytes: Estimated amount of total electronic data in existence by 2011

Yottabyte


1000 Zettabytes.

12 comments:

Anonymous said...

very interesting and useful - thanks!

-jay

Steve said...

Fascinating info! Thanks for sharing this as it gives us a real world perspective on what these various data points actually mean in "terms" that everyone can relate to.

Steve

Pluto said...

To have two hi-definition camcorders record the video and audio of every moment of your life, waking and sleeping, from the moment you were born until your 100th birthday would fill 21 petabytes.

Anonymous said...

Excellent list of examples putting things into such perspective that even the Wife now understands!

Very useful, thanks...

KJ. (UK)

Anonymous said...

Unless I'm doing my math wrong, Pluto's comment implies that hi-definition video requires 12.57 Gigabytes/hour, which is about what standard definition video requires...

(21 * 1024 * 1024) / (2 * 100 * 365 * 24)

Anonymous said...

You might want to adjust the length of the human genome from 750MB to 3GB.

"Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space are needed to store the entire genome."

Reference:
http://www.ornl.gov/sci/techresources/Human_Genome/faq/faqs1.shtml

Max said...

Not really correct. Each nucleotide is one of only four letters (A, C, T, G), so you need only 2 bits per nucleotide. The human genome has 3 billion nucleotides, so you need only 750 MB to store it. As our hard disks are big enough these days, we don't care about compression and just use a normal word-like text file, which has a size of 3 GB. It could easily be compressed (NIB or 2BIT formats, only the UCSC genome browser is using them), but most people don't do this.

RodMunday said...

I work in television and the data needed for a 1 hour standard definition program is considerably more than the 50 Gigabytes for 26 hours you quote. In fact one hour of uncompressed standard definition TV uses 100 Gigabytes storage capacity in an Avid media composer.

Anonymous said...

I think you need to adjust your definitions in the Bits section...only one bit is needed to represent two choices (because one bit is either a 0 or a 1). You can't represent all colours in just 3 bits--that only gives you eight colours!

provashi jack said...

It's a really informative post. Thanks for sharing. This post has cleared up some of my misunderstandings. I think if you keep up your activities you will achieve much popularity. At last, thanks.

mahasiswa teladan said...

hi..Im student from Informatics engineering, this article is very informative, thanks for sharing :)

Ben Haslett said...

Very useful when I was searching for analogies for various amounts of data. Boiling an abstract concept down to X Simpsons episodes is genius!