
Finding web archives

The web as a primary source

“The Internet is the largest and fastest-growing contemporary cultural text. It captures social actions and interactions. It provides more up-to-date information about the world than any traditional media source. Web resources are unique digital documentary heritage that may not be duplicated in any other medium. This makes digital preservation of web resources via web archiving essential to the continuity of the nation’s memory, culture, and history” (Library and Archives Canada).


Web archives

The following is a list of significant web archives covering a range of types of materials and topics.

Government of Canada Web Archive
The Government of Canada Web Archive provides access to Canadian web content collected and preserved by Library and Archives Canada. The archive offers full-text search and provides access to federal government web content dating back to 2005. It also features curated thematic collections, including websites related to the COVID-19 pandemic in Canada, Federal Royal Commissions and Commissions of Inquiry, and websites of organizations connected with the Truth and Reconciliation Commission of Canada.

Bibliothèque et Archives nationales du Québec (BAnQ) web collection
The collection gives access to a selection of Quebec websites archived by BAnQ, including the websites of public bodies, news websites, and websites related to curated themes or topics. Content is primarily in French.

Archive-It collections
The Internet Archive’s Archive-It is a subscription service that allows institutions to archive and provide access to collections of web content. Archive-It provides full-text search for all public collections. The public can browse and search collections by keyword, subject, and organization type. Since 2006, Archive-It has provided web archiving services to over 800 organizations in over 24 countries, including libraries, cultural memory and research institutions, social impact and community groups, and educational and open knowledge initiatives.

Rhizome ArtBase
The Rhizome ArtBase, established in 1999, is an archive of net art and other born-digital artworks from 1983 to the present day. Users can browse the archive by date or by artist name.

Web Archives for Longitudinal Knowledge (WALK) Portal
The WALK Portal is a prototype site for a national web archives portal, featuring the University of Toronto's Canadian Political Parties and Political Interest Groups collection. It allows you to search content from 50 political parties and political interest groups, captured from October 2005 to March 2015. Search options include keyword searching, graphing trends over time, and advanced features such as proximity searching (finding words that appear near each other).

Library of Congress Web Archive
The Library of Congress web archives are organized in thematic and event-based collections, and contain websites documenting a variety of U.S. and international organizations representing a broad range of subjects and topic areas. Examples include select U.S. government sites from the Legislative, Judicial, and Executive branch agencies; select foreign government sites; campaign websites and political parties documenting U.S. and select foreign elections; non-profit organizations; journalism and news; creative sites such as those documenting comics, music, authors, and art; legal sites; and international organizations. While most web archives are collected as a part of one or more event or thematic archives, the Library also preserves other sites within its general web archives.


Web archives as data

Archives Unleashed Toolkit
The Archives Unleashed Toolkit is an open-source platform for analyzing web archives. It is built on Apache Spark, which provides powerful tools for analytics and data processing.
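
To give a sense of what treating web archives as data can look like, here is a minimal sketch that counts captured pages per domain, a common first step in web archive analysis. It uses only the Python standard library and does not use the Archives Unleashed Toolkit or Apache Spark. The capture records, the `captures` list, and the `domain_counts` function are hypothetical and exist only for illustration; a real analysis would read records from archive files rather than a hard-coded list.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical capture records: (crawl date as YYYYMMDD, captured URL).
# These values are made up for illustration only.
captures = [
    ("20150301", "http://www.example.org/news/budget"),
    ("20150301", "http://www.example.org/about"),
    ("20150302", "http://blog.example.net/post/1"),
    ("20150302", "http://www.example.org/news/election"),
]

def domain_counts(records):
    """Count captures per host, ignoring a leading 'www.'."""
    counts = Counter()
    for _date, url in records:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts

counts = domain_counts(captures)

# Print hosts from most to least frequently captured.
for host, n in counts.most_common():
    print(host, n)
```

On a full collection, the same kind of tally can show which sites dominate an archive and how that changes from one crawl to the next.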
