The Internet Archive

The Internet Archive is a non-profit organization working to establish permanent online access to digitized historical collections for future generations. It was founded in 1996 in San Francisco and collaborates with institutions such as the Library of Congress and the Smithsonian to build its collections. Many of the materials in the Internet Archive were created online or intended for computer applications, which makes them difficult to find in conventional libraries. The Internet Archive has collected three petabytes of data so far and adds another 100 terabytes each month. It is intended as a resource for researchers, scholars and the general public. While some basic computer programming knowledge is required to access the materials, the following types of information are available from its web site:

The Wayback Machine (archived web pages)

The Internet Archive's most extensive collection is the Wayback Machine, a catalogue of more than 150 billion web pages published online since 1996. After entering a specific web address in the search box, users may view that web page as it appeared on any date on which it was recorded in the archive. Searching the archive for "nytimes.com," for example, returns 1,316 screenshot results archived since 1996. The "compare dates" feature lets users select captures of a page from two different dates and view them together to spot changes. The advanced search narrows the query to a specific date or range of dates for a web address, but a keyword search has not yet been developed. Pages can be saved as screenshots or sent to an external service that converts them to PDF format.
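The address-and-date lookup described above can also be expressed as a direct link. The following Python sketch is an illustration only, not something documented in this guide: it assumes the publicly visible URL pattern web.archive.org/web/<timestamp>/<address>, and the function names wayback_url and compare_dates are hypothetical. It builds the links without contacting the archive, and it assumes the Wayback Machine will show the capture closest to the requested date when no capture exists for that exact day.

```python
from datetime import date

# Assumed URL prefix for archived pages (not stated in this guide).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(address: str, day: date) -> str:
    """Build a link to the archived copy of `address` nearest to `day`."""
    timestamp = day.strftime("%Y%m%d")  # e.g. 2007-01-15 becomes "20070115"
    return f"{WAYBACK_PREFIX}/{timestamp}/{address}"


def compare_dates(address: str, first: date, second: date) -> tuple[str, str]:
    """Return links for two dates so the captures can be opened together."""
    return wayback_url(address, first), wayback_url(address, second)


if __name__ == "__main__":
    older, newer = compare_dates("nytimes.com", date(1996, 12, 31), date(2007, 1, 15))
    print(older)
    print(newer)
```

Opening the two printed links in adjacent browser windows gives a manual equivalent of the "compare dates" feature.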

Several collections of aggregated web pages related to a single topic (e.g. an election year or a major current event) are available for browsing, as well as a collection of 2 billion pages assembled during a World Wide Web crawl in 2007.

Text (books, articles)

The Internet Archive houses over 2.3 million text publications, which can be viewed in a web page or saved as PDFs. Most files can also be downloaded in specialized text formats, including EPUB, Kindle/Mobi, DAISY, Full Text or DjVu. For more information about downloading and the available text formats, read here.

Moving images (television programs, movies, news, broadcasts, vlogs)

Many of the 292,000 video uploads can be streamed from the Internet Archive's web site, and embed code is available for placing the videos in other web pages. Alternatively, the movies can be downloaded as Ogg Video, MPEG1 (VCD), MPEG2 (DVD) or MPEG4 (QuickTime) files. The Internet Archive recommends VLC Media Player or QuickTime as the most compatible video players for viewing downloads. To sample moving image content before downloading, users can download an animated GIF or thumbnail images. More information about the Moving Image Archive is available here.

Audio (radio, podcasts, audio books, music)

The Internet Archive has aggregated more than 575,000 items in its Audio Archive. These files can be streamed through M3U playlists at two different speeds, or downloaded in two MP3 sizes or in Ogg Vorbis format. As with video, VLC Media Player and QuickTime are suggested for listening to downloaded items.

Software (rare computer programs, games, USGS DRG maps)

The majority of the 34,000 programs available in the Software Archive come from the Tucows Software Library.  Users can view JPEG or GIF screenshots to preview software before downloading EXE and XML files to run it on a personal computer.

For more information about the Internet Archive and its resources, read here.