Int J Digit Libr

DOI 10.1007/s00799-016-0171-9

The evolution of web archiving

Miguel Costa1 · Daniel Gomes2 · Mário J. Silva3

Received: 1 May 2015 / Revised: 12 April 2016 / Accepted: 12 April 2016

Abstract Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which help to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there was a significant growth in the number of initiatives and countries hosting these initiatives, volume of data and number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data that is being published online.

Keywords Web archiving · Digital preservation · Survey

✉ Miguel Costa
migcosta@gmail.com

1 Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
2 Foundation for National Scientific Computing, Lisbon, Portugal
3 INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

1 Introduction

The world wide web has a democratic nature, where everyone can publish all kinds of information using different types of media. News, blogs, wikis, encyclopedias, photos, interviews and public opinions are just a few examples of this vast list. Part of this information is unique and historically valuable. For instance, the speech of a president after winning an election or the announcement of an imminent invasion of a foreign country might become as valuable in the future as ancient manuscripts are today. However, since the web is so dynamic, a large amount of information is lost every day. Several studies quantify this loss: 80 % of web pages are not available in their original form after 1 year [1]; 13 % of web references in scholarly articles disappear after 27 months [2]; 11 % of social media resources, such as the ones posted on Twitter, are lost after 1 year [3]. All this information will likely vanish in a few years, creating a knowledge gap about the present for future generations. We are already experiencing unsatisfied information needs due to missing pages or old formats of documents that are not readable by the latest software version.1 Pioneers of the Internet, such as Vint Cerf, recently warned about the danger of future generations having little or no record of the twenty-first century.2 International organizations are also concerned with the web ephemerality problem. UNESCO recognized the importance of digital preservation in 2003, by stating that the disappearance of digital information constitutes an impoverishment of the heritage of all nations [4]. In 2010, UNESCO endorsed the Universal Declaration on Archives, which states that archives play an essential role in the development of societies by safeguarding and contributing to individual and community memory [5]. It is, therefore, important to preserve these data, not only for historical and social research [6–12], but also to support current technology, such as assessing the trustworthiness of statements [13], detecting web spam [14], improving web information retrieval [15] or forecasting events [16].

At least 68 web archiving initiatives undertaken by national libraries, national archives, private companies and consortia of organizations are acquiring and preserving parts of the web. Together, they hold more than 534 billion files (17 PB) and this number continues to grow as new initiatives arise. Some country code top-level domains and thematic collections are being archived regularly,3 while other collections related to important events, such as September 11, are created at particular points in time.4 Web archives also contribute to the preservation of content born in non-digital formats that were afterwards digitized and published online, such as The Times Archive5 with news since 1785. As a result, web archives often contain millions or billions of archived documents and cover decades or even centuries in the case of digitized publications. The historic interest in these documents is also growing as they age, becoming a unique source of past information for widely diverse areas, such as sociology, history, anthropology, politics, journalism, linguistics or marketing.

However, despite the existence of web archives since 1996 and their joint efforts to preserve digital information, information about web archiving initiatives and the services they provide is scarce. Without knowing the status of current web archiving it is impossible to understand its strengths, limitations and the developments that are still needed to turn these document repositories into useful sources of information. Without knowing the preferences, trends and needs of the web archiving community it is difficult to adapt current technology to the emerging challenges and develop strategies to anticipate future problems. Motivated by this lack of knowledge in the research community, we conducted two surveys to gather results about existing web archiving initiatives across the globe. The first survey, already published, provided a comprehensive characterization of world wide web archiving initiatives in 2010 [17]. The second survey was carried out in 2014 and provides an updated characterization of these initiatives. Both surveys analyzed the same metrics, which enabled us to study the evolution of the characteristics of web archiving initiatives, such as the location, creation year, selection policy, used formats, number of people engaged, volume of archived data, access type and employed technology. We also compared our two surveys against the results obtained from other surveys whenever possible.

The analysis evidences a significant growth in the number of initiatives, countries hosting these initiatives, volume of data and number of contents preserved, which indicates a growing effort by the web archiving community to preserve the web. A cause for concern is the small amount of archived data in comparison with the amount of data being published on the web. This will likely create a knowledge gap about the present time. On the other hand, the amount of archived data is larger and grows faster than the amount processed by any commercial web search engine, which raises scalability challenges in giving efficient and effective data access. In fact, the search tools have not changed in recent years, being essentially based on commonly used web search technology that does not take into account the specificities of web archiving. These tools perform poorly, which greatly hinders the finding of historical information [18].

The remainder of this paper is organized as follows. Section 2 describes the background and covers related work. Section 3 describes the methodology for conducting the surveys on web archiving initiatives in 2010 and 2014. Section 4 presents the results obtained in the surveys and the analysis of the advancements made in web archiving during that period. Section 5 presents the conclusions.

1 http://en.wikipedia.org/wiki/Digital_obsolescence.
2 http://www.bbc.com/news/science-environment-31450389.
3 E.g., Internet Archive available at http://www.archive.org.
4 E.g., Library of Congress Web Archives available at http://www.loc.gov/minerva.
5 http://www.thetimes.co.uk/tto/archive/.

2 Related work

Cultural heritage institutions, such as museums, libraries and archives, have been preserving the intangible culture of our society (e.g., folklore, traditions, language) and the legacy of physical artifacts (e.g., monuments, books, works of art). Web archives are a novel form of cultural heritage institutions mandated to preserve similar artifacts. However, the artifacts of web archives are born-digital and digitized contents.

6 http://pandora.nla.gov.au.
7 http://www.loc.gov/minerva.

Web archives are a special type of digital libraries. Both share the responsibility of preserving information for future generations. This includes all types of multimedia, such as images and videos, besides the digital counterparts of printed documents. The main difference is that web archives usually grow to a data size that exceeds traditional organization and management of typical digital libraries. Digital libraries are based on meta-data describing manually curated artifacts and catalogs of these artifacts, which are usually used to explore and search digital collections, for instance, through faceted search. However, the experience from the Pandora (National Library of Australia)6 and the Minerva (Library of Congress)7 projects showed that this is not a viable option for
web archives. The size of the web makes traditional methods for cataloging too time consuming and expensive, beyond the capability of library staff. One of the conclusions from the final report of the Minerva project is that automatic indexing should be the primary strategy for information discovery [19].

The first web site, presented in Fig. 1, was created by Tim Berners-Lee at the European Organisation for Nuclear Research (CERN) and published in August 1991. This site describes the basis of the world wide web and is back online at its original URL.8 The first web archives appeared only in 1996 and do not contain sites prior to this date, with the exception of some pages recovered from backups stored on floppy disks or CDs. The Internet Archive, a US-based non-profit foundation, was one of the first web archives and has been broadly archiving the web since 1996. It leads the most ambitious initiative. In 2013, the Internet Archive was preserving 240 billion archived documents with a total of about 5 PB of data [20]. In 2014, it held 376 billion archived web pages, which represent 13.8 PB of data. The Pandora and Tasmanian web archives from Australia, and the Kulturarw3 web archive from Sweden, were also created in 1996. Many other initiatives have followed since then and a significant effort has been employed by the research community in the web archiving domain. Many of these initiatives are members of the International Internet Preservation Consortium (IIPC), which leads the development of several open-source tools, standards and best practices for web archiving [21]. A time line of some of these initiatives can be obtained online.9

Fig. 1 A 1992 version of the first web site. This earliest version found at CERN describes the world wide web project

The previous initiatives archived a large number of web sites according to some selection policy. In addition to these, there are services that enable any person to permanently archive a web page given a URL, such as Perma.cc,10 WebCitation11 or Archive.is.12 Each archived page receives a unique link, such as a Digital Object Identifier, to direct readers to its original version that will remain available online. Several user needs are met by these services, such as scholars preserving web pages cited in their work [22] or Supreme Courts preserving citations in their published decisions [23].

8 http://info.cern.ch/hypertext/WWW/TheProject.html.
9 http://timeline.webarchivists.org.
10 https://perma.cc/.
11 http://webcitation.org/.
12 http://archive.is/.

2.1 Data access

Much of the effort on web archive development focuses on acquiring, storing, managing and preserving data [19]. However, data must also be accessible to users who need to exploit and analyze them. Due to the challenge of indexing all the collected data, the prevalent discovery method in web archives is based on URL search, which returns a list of chronologically ordered versions for a given URL, such as in the Internet Archive’s Wayback Machine [24,25]. Figure 2 depicts the user interface of the Wayback Machine after searching a URL. A survey on European web archives reported that 68 % of web archives support this type of search [26]. However, URL search is limited, as it forces the users to remember the URLs, some of which refer to content that ceased to exist many years ago.

Fig. 2 User interface of the Internet Archive’s Wayback Machine

Another type of access is meta-data search, i.e., the search by meta-data attributes, such as category or theme. Meta-data search is provided by 65 % of European web archives [26]. For instance, the Library of Congress Web Archives13 supports search on bibliographic records. Some web archives support filtering results by domain and media type, while others organize collections by subject or genre to provide browsing functionality, such as the Pandora Australia’s web archive [27]. Most web archives support narrowing the search results by date range.

Full-text search has become the dominant form of information discovery, especially in web search systems such as Google. These systems have a strong influence on the way users search in other settings. This explains why full-text search was reported as the most desired web archive functionality [28] and the most used when supported [29]. Despite the high computational resources required for this purpose, 70 % of the European web archives surveyed support full-text search for at least a part of their collections. Still, previous studies showed that the search services provided by these web archives are poor and frequently deemed unsatisfactory [18,30].

There are several access tools created for web archiving. The site14 of the International Internet Preservation Consortium (IIPC) has a list with many tools for acquisition, curation, storage and access. Thomas et al. present a comprehensive list of available tools and services that can be used in web archives [31].

13 http://www.loc.gov/webarchiving.
14 http://www.netpreserve.org/web-archiving/tools-and-software.

2.2 Data analysis

The existing search tools require a substantial human effort when exploring and analyzing complex topics. Hence, analytical tools are being researched to fulfill informational needs for specific users requiring richer answers, such as historians or journalists [32,33]. Such tools would help to explain the stories of the past and predict future events through the analysis and modeling of the evolution of data. Web archives are an exceptional data source to extract and leverage this evolution. A good example is the work of Leskovec et al., who tracked short units of information (e.g., phrases) from news as they spread across the web and evolve throughout time [34]. This tracking provided a coherent representation of the news cycle, showing the rise and decline of main topics in the media. Another example is the work of Radinsky and Horvitz, who mined news and the web to predict future events [16]. For instance, they found a relationship between droughts and storms in Angola that catalyze cholera outbreaks. Anticipating these events may have a huge impact on world populations. Hoffart et al. built a large knowledge base in which entities, facts, and events are anchored in both time and space [35]. Web archives can be the source to extract these data, which will then be used for temporal analysis. For instance, since the veracity of facts is time dependent, it would be interesting to identify whether and when they become inaccurate.

Fig. 3 Time Explorer application

Novel types of interfaces are also being researched to support data analysis over time. The Time Explorer, depicted in Fig. 3, combines several interfaces integrated in the same application designed for analyzing how topics evolve over time [36]. The core of the interface is a time line with the main titles extracted from the news and a frequency graph with the number of news items and entities most frequently associated with a given query displayed over the time axis. The interface also displays a list of the most representative entities (people and locations) that occur on matching news and that can be used to narrow the search. The Zoetrope system
also enables exploring archived data [37]. It introduces the concept of lenses that can be placed on any part of a web page to see all its previous versions. These lenses can be filtered by queries and time, and combined with other lenses to compare and analyze archived data (e.g., check traffic maps at 6 p.m. on rainy days). There are other examples, such as the visualization resources offered by the UK web archive,15 which include N-gram charts of the occurrence of terms or phrases over time and tag clouds of content written on web sites. Browser plug-ins that highlight changes between pages, such as the DiffIE Add-on for Internet Explorer, are also of great help for data analysis [38].

2.3 Research projects

Several research projects have been initiated for improving web archiving technologies. The Living Web Archives (LiWA) aimed to provide contributions to make archived information accessible and addressed IR challenges, such as web spam detection, terminology evolution, capture of stream video, and assuring temporal coherence of archived content [39]. LiWA was followed by the Longitudinal Analytics of Web Archive data (LAWA), which aimed to build an experimental testbed for large-scale data analytics [40]. Particular emphasis is given to developing tools for aggregating, querying and analyzing web archive data that have been crawled over extended time periods. The Web Archive Retrieval Tools (WebART) project focuses on the development of web archive access tools specifically tailored to facilitate research in humanities and social sciences [41]. The Collect-all ARchives to COmmunity MEMories (ARCOMEM) project was about developing innovative tools and methods to help preserve and exploit the social web [42], while the SCAPE project16 addressed solutions for large-scale digital preservation. The Memento project adds a temporal dimension to the HTTP protocol so that archived versions of a document can be served by the web server holding that document or by existing web archives if the web server does not contain the requested versions [43]. Users only have to install a browser plug-in, which makes this an easy solution to adopt. Users can also search via the Time Travel portal17 across several web archives. This portal works as a metasearch engine for web archives. Old versions of web pages can be reconstructed by combining parts returned by web archives that support the Memento API, which enables the integration of archived content and cooperation among web archives.

15 http://www.webarchive.org.uk/ukwa/visualisation.
16 http://www.scape-project.eu.
17 http://timetravel.mementoweb.org.

3 Methodology

During October 2010, we gathered information from web archiving initiatives across the globe [17]. We read the official sites of known web archive initiatives and published documentation, but had little success because the published information was frequently insufficient or obsolete. In addition, many official sites were exclusively available in the native language of the hosting country (e.g., Chinese) and automatic translation tools were insufficient to obtain the required information. Thus, we decided to contact the community directly to obtain answers to the following questions:

1. What is the name of your web archiving initiative (please state if you want to remain anonymous)?
2. How many people work at your web archive (in person-month)?
3. Which is the amount of data that you have archived (number of files, disk space occupied)?

The questions were sent to a web archive discussion list, published on the site of the Portuguese Web Archive and disseminated through its communication channels (Twitter, Facebook, RSS). We obtained 27 answers. Then, we sent direct e-mails to the remaining web archives referenced by the International Internet Preservation Consortium [21], National Library of Australia in its Preserving Access to Digital Information (PADI) page18 and International Web Archiving Workshops.19 We were able to establish contact and obtain direct answers from 33 web archiving initiatives. Finally, we distributed the collected data among the respondents for validation.

18 http://www.nla.gov.au/padi.
19 http://iwaw.europarchive.org.

The methodology used in this research enabled web archivists to openly present information about their initiatives. For some situations, we had to actively interact with the respondents to clarify our intents and obtain the required information. We observed that terminology and language barriers led to different interpretations of the questions by the respondents, who involuntarily provided inaccurate answers. For instance, we assumed in the third question that each archived file was the result of a successful HTTP download (e.g., page, image or video), but some respondents interpreted it as the number of files created to store web content in bulk, such as files in ARC format [44]. The post hoc statistical analysis of the obtained answers enabled the detection of abnormal values and correction of these errors through interaction with the respondents. We believe that the adopted methodology enabled the extraction of more accurate information and valuable insights about web archiving initiatives
than a typical one-shot online survey with closed answers. However, the cost of processing the results for statistical analysis was significantly higher.

This survey was published in 2011 [17]. The data collected and validated enabled the creation of a Wikipedia page named List of Web Archiving Initiatives,20 so that the published information could be collaboratively kept up-to-date. Since then, the web archiving community has been updating this information, making it a useful resource. Figure 4 shows the Wikipedia page that contains three tables populated with information about the web archiving initiatives, such as their name, country, creation year, employed technologies, number of employees, number and volume of archived contents, archived formats, type of crawl and access methods.

Fig. 4 Wikipedia page with list of web archiving initiatives

To observe how web archiving changed since the first survey, in 2014 we conducted the same analysis on the data published in the Wikipedia page and compared it against the results of 2010. In case of doubt or lack of information, we consulted the official sites of the initiatives or their scientific publications.

20 http://en.wikipedia.org/wiki/List_of_Web_Archiving_Initiatives.

3.1 Comparison with other surveys

After our first survey in 2010, three other surveys were conducted on web archiving which obtained related information, such as the access type provided by the initiatives and the technology used to support them. The first survey was conducted by the Internet Memory Foundation over European web archives in 2010, from now on referred to as the IMF2010 survey [26]. The second and third surveys were conducted by the National Digital Stewardship Alliance (NDSA) in 2011 and 2013, and they covered organizations of the USA involved or planning to archive content from the web [45,46]. These surveys are referred to from now on as the NDSA2011 and NDSA2013 surveys. In this paper, we analyze and compare the results of the surveys whenever possible, despite our surveys having covered world wide web archiving initiatives, while the IMF2010 survey focused just on initiatives from Europe and the NDSA surveys on initiatives from the USA. Still, all surveys took place between 2010 and 2014, which makes their results comparable in time.

4 Results

4.1 Web archiving initiatives

Table 1 shows general statistics about web archiving initiatives surveyed in 2010 and 2014. Web archiving initiatives are very heterogeneous in size and scope. For instance, the web archive (WA) of Čačak aims to preserve sites related to this Serbian city, while the Internet Archive has the objective of archiving the global web. The obtained results show that most web archives exclusively hold content related to their hosting country, region or institution. However, there are a few initiatives, such as the Internet Memory Foundation and the Portuguese Web Archive, that also preserve information related to foreign countries.

Table 1 General statistics of web archiving initiatives

Characteristics                 2010   2014   Change (%)
Total initiatives               42     68     +61.9
Countries hosting initiatives   26     33     +26.9
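
The percentages in the last column of Table 1 appear to be the relative change between the 2010 and 2014 values. The following is a minimal Python sketch under that assumption; the function name `percent_change` and the dictionary layout are illustrative and are not taken from the paper.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    if old == 0:
        raise ValueError("relative change is undefined when the baseline is zero")
    return (new - old) / old * 100.0


# (2010 value, 2014 value) pairs copied from Table 1.
table1 = {
    "Total initiatives": (42, 68),
    "Countries hosting initiatives": (26, 33),
}

# Round to one decimal place, the precision used in the table.
changes = {
    name: round(percent_change(v2010, v2014), 1)
    for name, (v2010, v2014) in table1.items()
}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f} %")
```

Under this reading, the computed values match the +61.9 and +26.9 reported in the table.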


We detected an increase in the number of web archiving initiatives, from 42 in 2010 to 68 in 2014. Since the creation and operation of a web archive is complex and costly, several initiatives exist to provide web archiving services (WAS) that can be independently operated by third-party archivists to harvest, build and preserve collections of digital content. These WAS enable focused archiving of web content by organizations, such as universities or libraries, that otherwise could not manage their own archives. In 2014, there were 11 initiatives (16 %) providing WAS, against the 3 (7 %) offered in 2010. Some of these new WAS are the Aleph Archives,21 Hanzo Archives22 and Reed Archives.23 The oldest WAS are Archive-It,24 ArchiveTheNet25 and Web Archiving Service.26 Of the 11 WAS, 6 operate in the USA, where most of them offer electronic discovery (ediscovery) services for enterprises, which are required by law since 2006 for the discovery of information in civil litigation or government investigations. In 2014, at least 19 % of the initiatives were using WAS. In 2010, this percentage was 16 %.

21 http://aleph-archives.com/.
22 http://www.hanzoarchives.com/.
23 http://www.reedarchives.com/.
24 http://www.archive-it.org.
25 http://archivethe.net.
26 http://webarchives.cdlib.org.

4.1.1 Human resources

The measurement of human resources engaged in web archiving activities was not straightforward (question 2). Most respondents could not provide an effort measurement in person-month. The presented reasons were that the teams were too variable and some services were outsourced to third-party organizations out of their control. Instead, most of the respondents described their staff and hiring conditions. The obtained results of 2010 show that web archiving engaged at least 112 people in full time and 166 in part-time. The web archive teams were typically small, presenting a median staff of 2.5 people in full time (average of 3.5) and 2 people in part-time (average of 5). The staff was mostly composed of librarians and computer engineers. The results show that 11 initiatives (26 %) did not have any person dedicated full time. The effort of part-time workers was variable; for instance, at the Library of Congress they spent only a few hours a month. Most of the human resources were invested in data acquisition and quality control. The IMF2010 survey corroborates that web archive teams are small, but the number of staff depends on the phase of the project. Its results show that 38 % of fully operational initiatives count more than five full-time employees, while 67 % that started a project count between two and five employees.

In 2014, the size of the teams continued to be highly variable: some initiatives had no one working full time, such as the University of Texas at San Antonio WA, while other teams had 12 people working full time, such as the Internet Archive, or 80 people working part-time, such as the Library of Congress. As shown in Table 2, in 2014, the web archiving initiatives had in total 108 people working in full time and 197 in part-time. There was an increase from 278 to 305 people working in this area. The teams continued to be mostly small, having a median staff of 2 people in full time (average of 2.2) and 2 people in part-time (average of 4). There were 3 initiatives that did not have any person dedicated full time, against the 11 of 2010. Despite the large increase in the number of initiatives, the total number of people working on them increased only slightly, which led to a decrease in the median and average team size. The NDSA2013 survey shows a different reality with fewer people working in web archiving. The USA initiatives have a median staff of 0.25 people in full time. Only 19 % of the USA initiatives devote at least one person to handle web archiving tasks. The small size of the teams is likely due to the high percentage of initiatives that use WAS instead of running their own web archiving system.

Table 2 Staff statistics of web archiving initiatives

Characteristics              2010   2014   Change
Total people (full time)     112    108    −3.6 %
Total people (part-time)     166    197    +18.7 %
Total people                 278    305    +9.7 %
Median people (full time)    2.5    2      −20.0 %
Median people (part-time)    2      2      0.0 %
Average people (full time)   3.5    2.2    −37.1 %
Average people (part-time)   5      4      −20.0 %

4.1.2 Geographic location

Figure 5a presents the countries that hosted web archiving initiatives in 2010. The 42 initiatives were spread across 26 countries. There were 23 initiatives hosted in Europe, 10 in North America, 6 in Asia and 3 in Oceania. Half of the initiatives were hosted in countries belonging to the Organisation for Economic Co-operation and Development (OECD). From the 34 countries that belong to the OECD, 21 (62 %) hosted at least one web archiving initiative, which is an indicator of the importance of web archiving in developed countries. Most of the countries hosted one (74 %) or two initiatives (22 %). The only country that hosted more than two was the USA with a total of nine initiatives. Although part of a country, initiatives like the Tasmanian WA (Australia), North Carolina WA (USA) or Digital Heritage
Fig. 5 Countries hosting web

archiving initiatives in a 2010
and b 2014 (in green) (color
figure online)

Catalonia (Spain) were hosted at autonomous states and ica (previously 10), 8 in Asia (previously 6), 3 in Oceania
aimed at preserving regional content. (equal) and 1 in Africa (previously 0). Notice that some ini-
Figure 5b presents the location of all countries hosting tiatives have more than one location. There were increases in
web archiving initiatives in 2014. The 68 web archiving ini- almost all continents, especially in Europe and North Amer-
tiatives are spread by 33 countries. In 2010, there were only ica. Africa received its first initiative hosted in Egypt, while
26 countries hosting web archiving initiatives, which shows South America does not have any yet.
a growing awareness of the importance of web archiving all When comparing the number and location of initiatives
over the world. The USA continues to be the country with the with other surveys, we detected that many were missing. The
most initiatives, increasing from 9 in 2010 to 19 in 2014. The IMF2010 survey found 41 European initiatives fully opera-
second country with most initiatives is France, with five ini- tional in 2010, while we found 38 in 2014. The NDSA2011
tiatives. Germany and Switzerland share the third place with and NDSA2013 surveys found 49 and 64 active initiatives in
four initiatives each. The distribution of the initiatives over the USA, but we found only 19 in 2014. This difference is
the world is 38 in Europe (previously 23), 22 in North Amer- mostly due to college and universities, i.e., 36 in 2011 and

123
M. Costa et al.

Fig. 6 Cumulative number of initiatives created per year (y-axis: cumulative nr. of initiatives, up to 70; x-axis: creation year, 1996 to 2013)

48 in 2013, that were included in the NDSA surveys but not in our surveys. Future surveys should make an effort to cover all these initiatives. Nevertheless, both the NDSA surveys and ours show a growing trend of initiatives.

4.1.3 Growth

Figure 6 displays the evolution of the number of web archiving initiatives created per year, including the new initiatives recorded on the Wikipedia page. There was a growth from 4 initiatives in 1996 to 14 initiatives in 2003, which represents an average of 1.8 new initiatives per year. After 2003, many new initiatives appeared to address the web ephemerality problem. For instance, in 2005 and 2007, nine and eight initiatives were created, respectively. There was an average growth of 5.4 initiatives per year from 2004 to 2012. There is no information on new initiatives created in 2013. One possible explanation for the significant and constant growth since 2003 is the concern raised by the United Nations Educational, Scientific and Cultural Organization (UNESCO) regarding the preservation of the digital heritage [4]. The NDSA2013 survey also shows a constant growth, especially between 2006 and 2013, when there was a great increase of initiatives mainly due to universities starting their web archiving programs. Universities created 39 (out of 67) initiatives during these 8 years, which indicates an emergent awareness in the academic community of the USA about the importance of preserving web content.

4.2 Archived data

4.2.1 Selection policy

Since resources are scarce and not all the web can be preserved, the selection policy of most web archiving initiatives is to preserve the most relevant parts of the web from their own perspective. In the survey of 2010, all web archives selected specific sites for archiving. This selection is determined by multiple factors such as consent by the authors or relevance for inclusion in thematic collections (e.g., elections or natural disasters). However, 80 % of the web archives exclusively held content related to their hosting country, region or institution. Of the 42 initiatives, 11 (26 %) also performed broad crawls of the web, including all sites hosted under a given domain name or geographical location. The IMF2010 survey reported that 23 % of European web archives ran domain crawls, while 71 % performed thematic or selective crawls. The NDSA2011 survey reported that all USA initiatives archived web content from their own institution, as well as content from other organizations or individuals for future research.

Our results show that in 2014, at least 45 initiatives (66 %) performed selective crawls and 20 (29 %) performed country code top-level domain (ccTLD) or broad crawls of the web. Almost all initiatives continue to exclusively hold content related to their hosting country, region or institution. There are three initiatives that archive the ccTLD of other countries besides their own. The Internet Archive and the Internet Memory Foundation share a vision to preserve web content from all over the world. The Portuguese Web Archive preserves content from 4 countries that have Portuguese as their official language.

4.2.2 Volume size

Figure 7 presents the distribution of the size of archived collections measured in total volume of data and number of contents. Notice that one HTML page containing three embedded images results in the archive of four contents. There was an increase of initiatives with collections between 10 and 100 TB at the expense of collections between 1 and 10 TB. While in 2010, 50 % of the initiatives preserved collections smaller than 10 TB and 31 % preserved collections between 10 and 100 TB, in 2014 these percentages were 42 and 40 %, respectively. The percentage of initiatives with
Fig. 7 Size of archived collections measured in: a volume of data (terabytes) and b number of contents (e.g., images, pages, videos) (y-axis: % of initiatives; series: 2010 and 2014)

collections larger than 100 TB continues to be 19 %. In accordance with this finding, the percentage of initiatives with collections between 100 and 1000 million contents decreased from 43 to 33 %, mostly because the percentage of initiatives with collections with more than 1000 million contents increased from 22 to 33 %.

Web archives worldwide preserved from 1996 to 2010 a total of 181,978 million contents (6.6 PB). The Internet Archive by itself held 150,000 million contents (5.5 PB). In 2014, all initiatives together had archived at least 534,604 million contents, which sums to around 17 PB of data. This means that the 2014 totals amount to 294 % of the 2010 number of contents and 258 % of the 2010 volume of data, i.e., increases of about 194 and 158 %, respectively. The Internet Archive continues to be by far the web archive with the largest collection, with 376,000 million contents. Information on its volume of data was not available on the Wikipedia page. Hence, we extrapolated from the 2010 results and estimated 13.8 PB of data.

The selection policies of some initiatives intersect, which leads to a replication of archived content [47,48]. For instance, initiatives hosted in the same country may preserve some of the same sites. Initiatives with a broader scope, such as the Internet Archive, preserve some content that is also archived by national initiatives. The overlap of archived content is not addressed in this paper.

4.3 Access and technologies

4.3.1 Formats to store archived content

Figure 8 presents the evolution of file formats used to store archived content. The ARC format defined by the Internet Archive was the de facto standard in 2010 [44]. In 2009, the WARC format was published by the International Organization for Standardization (ISO) as the official standard format for archiving web content and it was exclusively used by 10 % of the initiatives in 2010 [49]. The ARC and WARC formats were dominant in 2010, being used by 54 % of the initiatives.

Fig. 8 Usage of file formats to store web content (y-axis: % of initiatives; series: 2010 and 2014)

There was a decrease, from 26 % in 2010 to 13 % in 2014, of initiatives using exclusively the ARC format. These initiatives likely changed to the WARC format, whose exclusive use increased 3 percentage points, and to the combined ARC/WARC formats, whose use also increased 3 percentage points. The ARC and WARC formats continue to be by far the most predominant, being used in 2014 by 47 % of web archiving initiatives against 54 % in 2010. Besides historical reasons, the widespread adoption of the ARC/WARC formats was motivated by the Archive-Access project, which freely provides open-source tools to process this type of file [50]. Only 10 % of initiatives used other file formats in 2014, such as the HTTrack format. Still, 43 % of the initiatives did not report the adopted format on the Wikipedia page.
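As a rough illustration of what a WARC file stores, the sketch below builds one `resource` record by hand and parses its header back, using only the Python standard library. It assumes the WARC/1.0 record layout of ISO 28500 [49] (a version line, named header fields, an empty line, the content block and two trailing CRLFs) and shows only the four mandatory fields plus two optional ones. The helper names `build_warc_record` and `parse_warc_header`, the example URL and the payload are hypothetical; they are not taken from any surveyed archive or from the Archive-Access tools.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple

CRLF = "\r\n"


def build_warc_record(target_uri: str, payload: bytes, when: datetime) -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    fields = [
        # The first four fields are the mandatory ones in WARC/1.0.
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("Content-Length", str(len(payload))),
        # Optional fields describing the archived resource.
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/html"),
    ]
    header = "WARC/1.0" + CRLF + "".join(f"{k}: {v}{CRLF}" for k, v in fields)
    # Header, empty line, content block, then two CRLFs closing the record.
    return header.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


def parse_warc_header(record: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a single record into (version line, header fields, content block)."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split(CRLF)
    version = lines[0]
    fields = dict(line.split(": ", 1) for line in lines[1:])
    # Content-Length tells the reader where the block ends.
    block = rest[: int(fields["Content-Length"])]
    return version, fields, block


record = build_warc_record(
    "http://example.org/",
    b"<html><body>archived page</body></html>",
    datetime(2014, 1, 1, tzinfo=timezone.utc),
)
version, fields, block = parse_warc_header(record)
print(version, fields["WARC-Type"], fields["WARC-Date"], len(block))
```

A production archive would normally rely on an existing WARC library and would also write `warcinfo`, `request` and `response` records; the point here is only that each archived content is a self-describing, length-delimited record, which is what lets independent tools process the same files.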

Fig. 9 Search methods provided by web archives (x-axis: URL, meta-data, full-text; y-axis: % of initiatives; series: 2010 and 2014)

4.3.2 Search methods

Figure 9 presents the search methods provided by the initiatives over their collections in 2010 and 2014. The results of 2010 showed that 89 % of the initiatives supported search over multiple versions of a given URL published over time, 79 % enabled searching through meta-data and 67 % provided full-text search over archived contents. These results differ from the IMF2010 survey, which reported 68, 65 and 70 % of European initiatives supporting URL, meta-data and full-text search, respectively. The percentages of European web archives offering URL and meta-data search are significantly lower, while the percentage offering full-text search is slightly higher. The NDSA surveys show similar results in 2011 and 2013. The URL search and full-text search are also the most provided search methods. The NDSA surveys reported other methods frequently used, such as browsing by URL and title.

Our results of 2014 are almost the same as in 2010, with a small relative decrease in all search methods. The most predominant is the search by URL, then the search by meta-data and last, the full-text search. There were two initiatives that provided full-text search, but only over a part of their collections (one 30 % and the other 15 %).

4.3.3 Access restrictions

In 2010, some initiatives held the copyright of the archived contents (e.g., German Bundestag, Canada WA) or explicitly required the consent of the authors before archiving (e.g., UK WA, OASIS of Korea). The Tasmanian WA operated since its inception under the assumption that web sites fall within the definition of books. Thus, no permission to capture from publishers was required. The Internet Archive and the Portuguese Web Archive proactively archive and provide access to contents, but remove access on demand. On the other hand, for 16 initiatives (38 %) the access to collections was somehow restricted. The Library of Congress, WebArchiv of Czech Republic and Australia Web Archive provided public online access to part of their collections. Netarkivet.dk of Denmark provided online access on demand only for research purposes. The Finnish Web Archive provided online access to meta-data, but not to archived contents. The Bibliothèque nationale de France (BnF), Web@rchive of Austria and Preservation .ES of Spain granted access exclusively through special rooms at their facilities.

The IMF2010 survey found that 50 % of the European initiatives performed web archiving protected by a law enacted or passed. Regarding the policy for accessing archived data, 41 % of the initiatives provided access for everyone, 28 % online access with restrictions, 18 % on-site access for anyone, 21 % on-site access with restrictions and 21 % did not provide any access to their contents. The NDSA2013 survey indicates that when providing public access to archived web content, 63 % of the USA initiatives neither notified nor sought permission from content owners, 15 % notified content owners, and 21 % sought permission.

The information available on the Wikipedia page about the access restrictions is not sufficient for a statistical analysis. Still, some initiatives recorded their restrictions. The WebArchiv of Czech Republic provides unlimited access only from public terminals in the National Library. The Chinese Web Archive and the Web@rchive of Austria provide access to content in their National Libraries. The Finnish Web Archive also provides on-site access to contents. For the Netarkivet.dk of Denmark, online access is granted only to researchers, and the BnF Web Legal Deposit of France grants access only to authorized users.

4.3.4 Technology

Fig. 10 Technologies used by web archives (y-axis: % of initiatives; series: 2010 and 2014)

Figure 10 depicts the technologies being used by the initiatives that manage their own systems. In 2010, the Archive-Access tools were dominant (62 %), including the Heritrix, NutchWAX and Wayback Machine projects that support content harvesting, full-text and URL search,

respectively. However, respondents frequently mentioned that full-text search was hard to implement and that the performance of NutchWAX was unsatisfactory, being one reason for the partial indexing of their collections. Nonetheless, in 2010, NutchWAX supported full-text search for the Finnish Web Archive (148 million), Canada Web Archive (170 million), Digital Heritage of Catalonia (200 million), California Digital Library (216 million) and BnF (15 % of a collection of 200 TB). The IMF2010 survey shows that the European initiatives used similar tools. They used Heritrix to crawl web content (80 %), and for search, they used the Wayback Machine (67.5 %) or NutchWAX (70 %).

Despite the increase from 3 in 2010 to 11 in 2014 of web archive services (WAS), the share of initiatives that used WAS increased just 3 percentage points, from 16 to 19 %. Archive-It is the most used service, with a total of seven initiatives. There was an increase from 9 to 19 % of initiatives doing some in-house development. This software was mostly developed by WAS, such as the Hanzo Archives' access tools, or consisted of curation tools developed by libraries, such as the DigiBoard of the Library of Congress Web Archives. These increases contributed to the decrease of the use of Archive-Access tools. Still, the Archive-Access tools continue to predominate, with 57 % of the initiatives using at least one of these tools in 2014, against 62 % in 2010. Lucene and Solr together continue to be used by 10 % of the initiatives, with a growing trend toward Solr.

The NDSA surveys show different results, where the USA initiatives contracted WAS much more often. There were 60 % of initiatives in 2011 and 63 % in 2013 that exclusively used WAS. Archive-It is the dominant external service, used by approximately 70 % of the initiatives, and the California Digital Library WAS is the second most used with 17 %. Regarding technology to capture web content, Heritrix is the most used tool by USA initiatives (29 %), followed by HTTrack (18 %). The Wayback Machine increased from 76 % in 2011 to 89 % in 2013 as the preferred tool to view contents.

5 Conclusion

Web archiving has been gaining interest and recognition from modern societies around the world. Still, there is a lack of knowledge in the research community about the most recent developments in web archiving and the existing initiatives. This paper provides an updated global overview on these issues and discusses evolution trends.

Based on two conducted surveys, we observed that web archiving initiatives are typically hosted by developed countries, but we can find them spread all over the world in almost every continent. Web archives are generally composed of small teams that mainly work on the acquisition and curation of data. Almost all initiatives exclusively hold content related to their hosting country, region or institution, which stresses the need for each country to finance at least one initiative at national level.

Web archiving initiatives have been in existence since 1996 and their number has been growing since then. Particularly, from 2010 to 2014 there was a large increase in the number of initiatives, hosting countries, number of contents and volume of archived data. Currently, web archiving initiatives hold 17 PB (534,604 million contents), which shows a growing awareness of the importance of web archiving all over the world and a continued effort of the community in mitigating the web ephemerality problem.

On the other hand, despite the social and economic impact of losing the information that is being exclusively published on the web, the obtained results show that the human resources invested in web archiving are still scarce and team sizes are even decreasing. The lack of resources will probably create a historical void in the future about our current time. Our results already show that only a small part of the web has been preserved.

The web archiving community is adopting common data formats and tools. ARC and WARC are the predominant data formats to store archived content, but in recent years there was a shift from ARC to WARC, likely to take advantage of the enhancements of the new format, which make it possible, for instance, to manage duplicated content and record contextual meta-data. Regarding technology, most initiatives continue to use Lucene-based solutions to support full-text search, such as NutchWAX or Solr, the Wayback Machine to support URL search and display archived content, and Heritrix to crawl web content. This continuity could be explained by the significant number of developers and web archive initiatives that contribute to enhancing these projects.

The predominant methods for discovering archived content have remained the URL, meta-data and full-text search. However, the respondents of the surveys mentioned that the existing technology provides unsatisfactory search results and that full-text search, which is the method preferred by users, is hard to implement. Moreover, recent studies show that these technologies provide poor search results, making it difficult for users to find the desired information. With the fast growth of archived data, this problem is only exacerbated. Hence, the development of efficient and effective search technology is urgent to access the massive data already stored in web archives.

Acknowledgments This work could not have been done without the support of the Portuguese Web Archive team. We also thank FCT for the financial support of the Research Units of LaSIGE (PEst-OE/EEI/UI0408/2014) and INESC-ID (UID/CEC/50021/2013), and the DataStorm Research Line of Excellency (EXCL/EEI-ESS/0257/2012).

References

1. Ntoulas, A., Cho, J., Olston, C.: What's new on the web? The evolution of the web from a search engine perspective. In: Proc. of the 13th International Conference on World Wide Web, pp. 1–12 (2004)
2. Dellavalle, R., Hester, E., Heilig, L., Drake, A., Kuntzman, J., Graber, M., Schilling, L.: Going, going, gone: lost internet references. Science 302(5646), 787–788 (2003)
3. SalahEldeen, H., Nelson, M.: Losing my revolution: how many resources shared on social media have been lost? In: Theory and Practice of Digital Libraries, pp. 125–137 (2012)
4. UNESCO: Charter on the preservation of digital heritage. In: Adopted at the 32nd Session of the General Conference of UNESCO (2003). http://portal.unesco.org/ci/en/files/13367/10700115911Charter_en.pdf/Charter_en.pdf. Accessed 17 Oct 2003
5. UNESCO: Universal declaration on archives. In: Adopted at the ICA Annual General Meeting in Malta (2010). http://www.ica.org/6573/reference-documents/universal-declaration-on-archives.html. Accessed 17 Sept 2010
6. Kitsuregawa, M., Tamura, T., Toyoda, M., Kaji, N.: Socio-sense: a system for analysing the societal behavior from long term web archive. In: Proc. of the 10th Asia-Pacific Web Conference on Progress in WWW Research and Development, pp. 1–8 (2008)
7. Arms, W.Y., Aya, S., Dmitriev, P., Kot, B., Mitchell, R., Walle, L.: A research library based on the historical collections of the Internet Archive. D-Lib Mag. 12(2) (2006)
8. Arms, W., Huttenlocher, D., Kleinberg, J., Macy, M., Strang, D.: From Wayback Machine to Yesternet: new opportunities for social science. In: Proc. of the 2nd International Conference on e-Social Science (2006)
9. Ackland, R.: Virtual observatory for the study of online networks (VOSON) - progress and plans. In: Proc. of the 1st International Conference on e-Social Science (2005)
10. Foot, K., Schneider, S.: Web Campaigning. The MIT Press, Cambridge (2006)
11. Franklin, M.: Postcolonial Politics, the Internet, and Everyday Life: Pacific Traversals Online. Routledge (2004)
12. Gomes, D., Costa, M.: The importance of web archives for humanities. Int. J. Humanit. Arts Comput. 8(1), 106–123 (2014)
13. Yamamoto, Y., Tezuka, T., Jatowt, A., Tanaka, K.: Honto? Search: estimating trustworthiness of web information by search results aggregation and temporal analysis. In: Advances in Data and Web Management, pp. 253–264 (2007)
14. Chung, Y., Toyoda, M., Kitsuregawa, M.: A study of link farm distribution and evolution using a time series of web snapshots. In: Proc. of the 5th International Workshop on Adversarial Information Retrieval on the Web, pp. 9–16 (2009)
15. Elsas, J., Dumais, S.: Leveraging temporal dynamics of document content in relevance ranking. In: Proc. of the 3rd ACM International Conference on Web Search and Data Mining, pp. 1–10 (2010)
16. Radinsky, K., Horvitz, E.: Mining the web to predict future events. In: Proc. of the 6th ACM International Conference on Web Search and Data Mining, pp. 255–264 (2013)
17. Gomes, D., Miranda, J., Costa, M.: A survey on web archiving initiatives. In: Proc. of the International Conference on Theory and Practice of Digital Libraries, pp. 408–420 (2011)
18. Costa, M., Couto, F.M., Silva, M.J.: Learning temporal-dependent ranking models. In: Proc. of the 37th Annual ACM SIGIR Conference (2014)
19. Masanès, J.: Web Archiving. Springer, New York (2006)
20. Kahle, B.: Wayback machine: now with 240,000,000,000 (2013). http://blog.archive.org/2013/01/09/updated-wayback/. Accessed 30 Apr 2016
21. Grotke, A.: IIPC - 2008 member profile survey results. Technical report, International Internet Preservation Consortium (IIPC) (2008)
22. Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., Tobin, R.: Scholarly context not found: one in five articles suffers from reference rot. PloS One 9(12), 1–39 (2014)
23. Lazun, M.J.: "Link Rot" and legal resources on the web: a 2013 analysis by the chesapeake digital preservation group. Technical Report, The Chesapeake Digital Preservation Group (2013)
24. Tofel, B.: 'Wayback' for accessing web archives. In: Proc. of the 7th International Web Archiving Workshop (2007)
25. Jaffe, E., Kirkpatrick, S.: Architecture of the Internet Archive. In: Proc. of SYSTOR 2009: The Israeli Experimental Systems Conference, pp. 1–10 (2009)
26. Internet Memory Foundation: Web archiving in Europe. Technical Report, Internet Memory Foundation (2010)
27. Niu, J.: Functionalities of web archives. D-Lib Mag. 18(3/4) (2012)
28. Ras, M., van Bussel, S.: Web archiving user survey. Technical Report, National Library of the Netherlands (Koninklijke Bibliotheek) (2007)
29. Costa, M., Silva, M.J.: Characterizing search behavior in web archives. In: Proc. of the 1st International Temporal Web Analytics Workshop, pp. 33–40 (2011)
30. Costa, M., Silva, M.J.: Evaluating web archive search systems. In: Proc. of the 13th International Conference on Web Information Systems Engineering, pp. 440–454 (2012)
31. Thomas, A., Meyer, E.T., Dougherty, M., Van den Heuvel, C., Madsen, C., Wyatt, S.: Researcher engagement with web archives: challenges and opportunities for investment. Technical Report, Joint Information Systems Committee (JISC) (2010)
32. Spaniol, M., Masanès, J., Baeza-Yates, R.: The 5th temporal web analytics workshop (tempweb'15). In: Proc. of the Companion Publication of the 24th International Conference on World Wide Web, pp. 863–864 (2015)
33. Spaniol, M., Masanès, J., Baeza-Yates, R.: The 4th temporal web analytics workshop (tempweb'14). In: Proc. of the Companion Publication of the 23rd International Conference on World Wide Web, pp. 863–864 (2014)
34. Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506 (2009)
35. Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: Yago2: a spatially and temporally enhanced knowledge base from wikipedia. Artif. Intell. 194, 28–61 (2013)
36. Matthews, M., Tolchinsky, P., Blanco, R., Atserias, J., Mika, P., Zaragoza, H.: Searching through time in the New York Times. In: Proc. of the 4th Workshop on Human–Computer Interaction and Information Retrieval, pp. 41–44 (2010)
37. Adar, E., Dontcheva, M., Fogarty, J., Weld, D.S.: Zoetrope: interacting with the ephemeral web. In: Proc. of the 21st Annual ACM Symposium on User Interface Software and Technology, pp. 239–248 (2008)
38. Teevan, J., Dumais, S., Liebling, D., Hughes, R.: Changing how people view changes on the web. In: Proc. of the 22nd Annual ACM Symposium on User Interface Software and Technology, pp. 237–246 (2009)
39. Masanès, J.: LiWA news #3: living web archives (2011). http://liwa-project.eu/images/videos/Liwa_Newsletter-3.pdf. Accessed March 2011
40. Weikum, G., Ntarmos, N., Spaniol, M., Triantafillou, P., Benczur, A.A., Kirkpatrick, S., Rigaux, P., Williamson, M.: Longitudinal analytics on web archive data: it's about time! In: Proc. of the 5th Conference on Innovative Data Systems Research, pp. 199–202 (2011)
41. Huurdeman, H.C., Ben-David, A., Sammar, T.: Sprint methods for web archive research. In: Proc. of the 5th Annual ACM Web Science Conference, pp. 182–190 (2013)
42. Risse, T., Peters, W.: ARCOMEM: from collect-all ARchives to COmmunity MEMories. In: Proc. of the 21st International Conference Companion on World Wide Web, pp. 275–278 (2012)
43. Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., Shankar, H.: Memento: time travel for the web. CoRR (2009). arXiv:0911.1112
44. Burner, M., Kahle, B.: Arc file format (1996). http://www.archive.org/web/researcher/ArcFileFormat.php. Accessed Sept 1996
45. NDSA Content Working Group: Web archiving survey report. Technical Report, National Digital Stewardship Alliance (2012)
46. Bailey, J., Grotke, A., Hanna, K., Hartman, C., McCain, E., Moffatt, C., Taylor, N.: Web archiving in the United States: a 2013 survey. Technical Report, National Digital Stewardship Alliance (2014)
47. Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C., Nelson, M.L.: How much of the web is archived? In: Proc. of the 11th Annual International ACM/IEEE joint Conference on Digital Libraries, pp. 133–136 (2011)
48. AlSum, A., Weigle, M.C., Nelson, M.L., Van de Sompel, H.: Profiling web archive coverage for top-level domain and content language. Int. J. Digit. Libr. 14(3–4), 149–166 (2014)
49. ISO 28500:2009: Information and documentation - WARC file format (2009). http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717. Accessed 30 Apr 2016
50. IIPC: Internet Archive ARC access tools (2009). http://archive-access.sourceforge.net/. Accessed 30 Apr 2016

123

Literature Review of User Needs: Descriptive Metadata For Web Archiving
No ratings yet
Literature Review of User Needs: Descriptive Metadata For Web Archiving
52 pages
The Metainterface Part 1 PDF
No ratings yet
The Metainterface Part 1 PDF
43 pages
Digital Labor Scholz
No ratings yet
Digital Labor Scholz
74 pages
Marxist Analysis of Digital Labor
100% (1)
Marxist Analysis of Digital Labor
11 pages
Alle Artikelen Van Negroponte
No ratings yet
Alle Artikelen Van Negroponte
200 pages
Making Media Production Practices and PR PDF
No ratings yet
Making Media Production Practices and PR PDF
28 pages
Hacktivism & Net Politics Unveiled
No ratings yet
Hacktivism & Net Politics Unveiled
12 pages
POSTINTERNET Art After The Internet
No ratings yet
POSTINTERNET Art After The Internet
5 pages
Safiya Umoja Noble. Algorithms of Opression. Cap. 4
No ratings yet
Safiya Umoja Noble. Algorithms of Opression. Cap. 4
16 pages
Kucklich - Precarious Playbour
No ratings yet
Kucklich - Precarious Playbour
13 pages
Queering Cyberspace Towards A Space Identity Discussion
No ratings yet
Queering Cyberspace Towards A Space Identity Discussion
9 pages
Hall, Stuart. Introduction To Paper Voices
No ratings yet
Hall, Stuart. Introduction To Paper Voices
260 pages
The Communication Review: Click For Updates
100% (1)
The Communication Review: Click For Updates
11 pages
IMedia The Gendering of Objects, Environments and Smart Materials (Sarah Kember (Auth.) ) (Z-Lib.
No ratings yet
IMedia The Gendering of Objects, Environments and Smart Materials (Sarah Kember (Auth.) ) (Z-Lib.
129 pages
Academic Insights on Archives
100% (1)
Academic Insights on Archives
34 pages
Lovink Rossiter Organizationaftersocialmedia-Web PDF
No ratings yet
Lovink Rossiter Organizationaftersocialmedia-Web PDF
185 pages
Wark McKenzie Telesthesia Communication Culture and Class 2012
No ratings yet
Wark McKenzie Telesthesia Communication Culture and Class 2012
236 pages
The YouTube Bibliography Ver 4-0 - Michael Strange Love
100% (1)
The YouTube Bibliography Ver 4-0 - Michael Strange Love
63 pages
Papacharissi-Virtual Sphere 2.0
No ratings yet
Papacharissi-Virtual Sphere 2.0
16 pages
In Search of Space in The Digital City
100% (1)
In Search of Space in The Digital City
3 pages
Bermingham - Landscape and Ideology - 16.50.13
No ratings yet
Bermingham - Landscape and Ideology - 16.50.13
281 pages
Marvin Carolyn When Old Technologies Were New Thinking About Electric Communication in The Late Nineteenth Century 1988
No ratings yet
Marvin Carolyn When Old Technologies Were New Thinking About Electric Communication in The Late Nineteenth Century 1988
294 pages
Digital Prosumption Labour On Social Media in The Context of The Capitalist Regime of Time
No ratings yet
Digital Prosumption Labour On Social Media in The Context of The Capitalist Regime of Time
27 pages
RICHARD FLORIDA - Transformation of Everyday Life, From Rise of The Creative Class
No ratings yet
RICHARD FLORIDA - Transformation of Everyday Life, From Rise of The Creative Class
3 pages
MIS 3302 Exam 1 Review and Study Guide
100% (1)
MIS 3302 Exam 1 Review and Study Guide
19 pages
Resurrecting The Technological Past: Towards A New Media History
No ratings yet
Resurrecting The Technological Past: Towards A New Media History
6 pages
02 - Digital Storytelling 2022-23 - The Narrative Turn (Christian Salmon)
No ratings yet
02 - Digital Storytelling 2022-23 - The Narrative Turn (Christian Salmon)
52 pages
A Sociological Analysis of "OK Boomer"
No ratings yet
A Sociological Analysis of "OK Boomer"
17 pages
The Digital Dilemma
No ratings yet
The Digital Dilemma
84 pages
Opensource Community Yearbook 2010-2019
No ratings yet