The Darknet and the Future of Content Distribution

Peter Biddle, Paul England, Marcus Peinado, and Bryan Willman
Microsoft Corporation
1 Introduction
People have always copied things. In the past, most items of value were physical
objects. Patent law and economies of scale meant that small scale copying of physical
objects was usually uneconomic, and large-scale copying (if it infringed) was stoppable
using policemen and courts. Today, things of value are increasingly less tangible: often
they are just bits and bytes or can be accurately represented as bits and bytes. The
widespread deployment of packet-switched networks and the huge advances in computers
and codec technologies have made it feasible (and indeed attractive) to deliver such digital
works over the Internet. This presents great opportunities and great challenges. The
opportunity is low-cost delivery of personalized, desirable high-quality content. The
challenge is that such content can be distributed illegally. Copyright law governs the
legality of copying and distribution of such valuable data, but copyright protection is
increasingly strained in a world of programmable computers and high-speed networks.
For example, consider the staggering burst of creativity by authors of computer
programs that are designed to share audio files. This was first popularized by Napster, but
today several popular applications and services offer similar capabilities. CD-writers have
become mainstream, and DVD-writers may well follow suit. Hence, even in the absence
of network connectivity, the opportunity for low-cost, large-scale file sharing exists.
Our analysis rests on three assumptions:
1. Any widely distributed object will be available to a fraction of users in a form that permits copying.
2. Users will copy objects if it is possible and interesting to do so.
3. Users are connected by high-bandwidth channels.
The darknet is the distribution network that emerges from the injection of objects
according to assumption 1 and the distribution of those objects according to assumptions
2 and 3.
One implication of the first assumption is that any content protection system will leak
popular or interesting content into the darknet, because some fraction of users (possibly
experts) will overcome any copy prevention mechanism, or because the object will enter
the darknet before copy protection occurs.
The term “widely distributed” is intended to capture the notion of mass market
distribution of objects to thousands or millions of practically anonymous users. This is in
contrast to the protection of military, industrial, or personal secrets, which are typically not
widely distributed and are not the focus of this paper.
Like other networks, the darknet can be modeled as a directed graph with labeled
edges. The graph has one vertex for each user/host. For any pair of vertices (u,v), there is
a directed edge from u to v if objects can be copied from u to v. The edge labels can be
used to model relevant information about the physical network and may include
information such as bandwidth, delay, availability, etc. The vertices are characterized by
their object library, object requests made to other vertices, and object requests satisfied.
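To make the model concrete, the following Python sketch (ours and purely illustrative; the class and method names are our own inventions, not part of any deployed system) encodes the graph just described:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeLabel:
    """Label for a directed edge (u -> v): objects can be copied from u to v."""
    bandwidth: float = 0.0      # e.g. bytes per second
    delay: float = 0.0          # e.g. seconds of latency
    availability: float = 1.0   # fraction of time the link is usable

@dataclass
class Vertex:
    """One user/host, characterized as in the text."""
    library: set = field(default_factory=set)               # objects held locally
    requests_made: list = field(default_factory=list)       # objects requested from peers
    requests_satisfied: list = field(default_factory=list)  # requests served to peers

class DarknetGraph:
    def __init__(self):
        self.vertices: dict[str, Vertex] = {}
        self.edges: dict[tuple[str, str], EdgeLabel] = {}

    def add_host(self, name: str) -> None:
        self.vertices[name] = Vertex()

    def connect(self, u: str, v: str, **label) -> None:
        # Directed: an edge (u, v) means objects can be copied from u to v only.
        self.edges[(u, v)] = EdgeLabel(**label)

    def copy(self, obj: str, u: str, v: str) -> bool:
        """Copy an object along an edge, updating both vertices' statistics."""
        if (u, v) in self.edges and obj in self.vertices[u].library:
            self.vertices[v].requests_made.append(obj)
            self.vertices[u].requests_satisfied.append(obj)
            self.vertices[v].library.add(obj)
            return True
        return False
```

A simulation built on such a structure could, for example, replay request traces and measure how quickly objects spread as a function of the edge labels.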
To operate effectively, the darknet has a small number of technological and
infrastructure requirements, which are similar to those of legal content distribution
networks. These infrastructure requirements are:
1. facilities for injecting new objects into the darknet (input)
2. a distribution network that carries copies of objects to users (transmission)
3. ubiquitous rendering devices, which allow users to consume objects (output)
4. a search mechanism to enable users to find objects (database)
5. storage that allows the darknet to retain objects for extended periods of time.
Functionally, this is mostly a caching mechanism that reduces the load and
exposure of nodes that inject objects.
The dramatic rise in the efficiency of the darknet can be traced back to the general
technological improvements in these infrastructure areas. At the same time, most attempts
to fight the darknet can be viewed as efforts to deprive it of one or more of the
infrastructure items. Legal action has traditionally targeted search engines and, to a lesser
extent, the distribution network. As we will describe later in the paper, this has been
partially successful. The drive for legislation on mandatory watermarking aims to deprive
the darknet of rendering devices. We will argue that watermarking approaches are
technically flawed and unlikely to have any material impact on the darknet. Finally, most
content protection systems are meant to prevent or delay the injection of new objects into
the darknet. Based on our first assumption, no such system constitutes an impenetrable
barrier, and we will discuss the merits of some popular systems.
We see no technical impediments to the darknet becoming increasingly efficient
(measured by aggregate library size and available bandwidth). However, the darknet, in
all its transport-layer embodiments, is under legal attack. In this paper, we speculate on
the technical and legal future of the darknet, concentrating particularly, but not exclusively,
on peer-to-peer networks.
The rest of this paper is structured as follows. Section 2 analyzes different
manifestations of the darknet with respect to their robustness to attacks on the
infrastructure requirements described above and speculates on the future development of
the darknet. Section 3 describes content protection mechanisms, their probable effect on
the darknet, and the impact of the darknet upon them. In sections 4 and 5, we speculate
on the scenarios in which the darknet will be effective, and how businesses may need to
behave to compete effectively with it.
Figure 1 (diagram not reproduced; its panels are summarized here): Historical evolution of the darknet, from the sneakernet, through HTTP/FTP web servers with central search engines, to (c) Napster (central search engine, with peer-to-peer transfers over the Napster protocol) and (d) Gnutella (fully distributed search and transfer over TCP/UDP). We highlight the location of the search engine (if present) and the effective bandwidth (thicker lines represent higher bandwidth). Network latencies are not shown, but are much longer for the sneakernet than for the IP-based networks.
2.3.1. Napster
Napster was the service that ignited peer-to-peer file sharing in 1999 [14]. There
should be little doubt that a major portion of the massive (for the time) traffic on Napster
was of copyrighted objects being transferred in a peer-to-peer model in violation of
copyright law. Napster succeeded where central servers had failed by relying on the
distributed storage of objects not under the control of Napster. This moved the injection,
storage, network distribution, and consumption of objects to users.
However, Napster retained a centralized database with a searchable index on the file
name. (In fact, Napster used a farm of weakly coupled databases, with clients attaching to
just one of the server hosts.) The centralized database itself became a legal target [15]. Napster was first
enjoined to deny certain queries (e.g. “Metallica”) and then to police its network for all
copyrighted content. As the size of the darknet indexed by Napster shrank, so did the
number of users. This illustrates a general characteristic of darknets: there is positive
feedback between the size of the object library and aggregate bandwidth on one hand,
and the appeal of the network to its users on the other.
2.3.2. Gnutella
The next technology that sparked public interest in peer-to-peer file sharing was
Gnutella. Gnutella differs from Napster in several respects. First, in addition to distributed
object storage, Gnutella uses a fully distributed database, described more fully in [13]; it
does not rely upon any centralized server or service – a peer just needs the IP address of
one or a few participating peers to (in principle) reach any host on the Gnutella darknet.
Second, Gnutella is not really “run” by anyone: it is an open protocol, and anyone can
write a Gnutella client application. Finally,
Gnutella and its descendants go beyond sharing audio and have substantial non-infringing
uses. This changes its legal standing markedly and puts it in a similar category to email.
That is, email has substantial non-infringing use, and so email itself is not under legal
threat even though it may be used to transfer copyrighted material unlawfully.
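To illustrate how a fully distributed database answers queries, the sketch below (a deliberate simplification of ours; the real Gnutella protocol uses binary ping/pong and query-hit messages rather than Python calls) floods a request through the overlay with a time-to-live:

```python
from collections import deque

def gnutella_query(overlay, start, wanted, libraries, ttl=7):
    """Flood a query Gnutella-style: each peer forwards it to its neighbours,
    decrementing a time-to-live, until some peer holding the object answers.

    overlay:   dict peer -> list of directly connected peers
    libraries: dict peer -> set of objects that peer holds
    """
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in libraries.get(peer, set()):
            return peer              # query hit: download directly from this peer
        if hops_left == 0:
            continue
        for neighbour in overlay.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return None                      # no hit within the TTL horizon

# Tiny overlay: A knows only B; B knows C; C holds the object.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
libraries = {"C": {"some-song.mp3"}}
print(gnutella_query(overlay, "A", "some-song.mp3", libraries))  # -> C
```

Note that no node holds a global index: a peer discovers who has an object only by asking, which is exactly what removes the single legal target that Napster presented.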
2.4.3 Attacks
In light of these weaknesses, attacks on Gnutella-style darknets focus on their object
storage and search infrastructures. Because of the prevalence of super-peers, the Gnutella
darknet depends on a relatively small set of powerful hosts, and these hosts are promising
darknet depends on a relatively small set of powerful hosts, and these hosts are promising
targets for attackers.
Darknet hosts owned by corporations are typically easy to remove. Often, these hosts
are set up by individual employees without the knowledge of corporate management.
Corporations generally respect intellectual property laws. This, together with their
reluctance to become targets of lawsuits and their hierarchical management structures,
makes it relatively easy to remove darknet hosts in the corporate domain.
While the structures at universities are typically less hierarchical and strict than those
of corporations, ultimately, similar rules apply. If the .com and .edu T1 and T3 lines were
pulled from under a darknet, the usefulness of the network would suffer drastically.
This would leave DSL, ISDN, and cable-modem users as the high-bandwidth servers
of objects. We believe that a darknet limited to this class of hosts would be a far less
effective piracy network today: high-bandwidth consumer connections are still relatively
rare, acquisition would suffer accordingly, and users would abandon such a darknet.
However, consumer broadband is becoming more popular, so in the long run it is probable
that there will be adequate consumer bandwidth to support an effective consumer darknet.
The obvious next legal escalation is to bring direct or indirect (through their affiliation,
for example a university or employer) challenges against users who share large libraries
of copyrighted material. This is already happening, and the legal threats or actions appear
to be successful [7]. This requires the collaboration of ISPs in identifying their customers,
which appears to be forthcoming due to steps that carriers must take to avoid liability
(see footnote 3) and, in some cases, because of
corporate ties between ISPs and content providers. Once again, free riding makes this
attack strategy far more tractable.
It is hard to predict further legal escalation, but we note that the DMCA (Digital
Millennium Copyright Act) is a far-reaching (although not fully tested) example of a law that
is potentially quite powerful. We believe it probable that there will be a few more rounds of
technical innovations to sidestep existing laws, followed by new laws, or new
interpretations of old laws, in the next few years.
Footnote 3: The Church of Scientology has been aggressive in pursuing ISPs that host its
copyrighted material on newsgroups. The suit that appeared most likely to result in a clear
finding, filed against Netcom, was settled out of court. Hence it is still not clear whether an
ISP has a responsibility to police the users of its network.
Figure 2 (diagram not reproduced; its annotations are summarized here): Policing the darknet. Gnutella-style networks appear hard to police because they are highly distributed, with thousands or millions of peers, but looking more closely there are several potential vulnerabilities: universities, schools, and corporations can police their intranets and servers; rendering devices (PCs) can be subjected to mandated watermark detection; and access network providers (ISPs) can perform surveillance and IP-address-to-user mapping.
2.4.4 Conclusions
All attacks we have identified exploit the lack of endpoint anonymity and are aided by
the effects of free riding. We have seen effective legal measures against all peer-to-peer
technologies that are used to provide effectively global access to copyrighted material.
Centralized web servers were effectively closed down. Napster was effectively closed
down. Gnutella and Kazaa are under threat because of free rider weaknesses and lack of
endpoint anonymity.
Lack of endpoint anonymity is a direct result of the globally accessible global object
database, and it is the existence of the global database that most distinguishes the newer
darknets from the earlier small worlds. At this point, it is hard to judge whether the darknet
will be able to retain this global database in the long term, but it seems clear that
legal setbacks to global-index peer-to-peer will continue to be severe.
However, should Gnutella-style systems become unviable as darknets, systems such
as Freenet or Mnemosyne might take their place. Peer-to-peer networking and file sharing
do seem to be entering the mainstream – for both illegal and legal uses. If we
couple this with the rapid build-out of consumer broadband, the dropping price of storage,
and the fact that personal computers are effectively establishing themselves as centers of
home-entertainment, we suspect that peer-to-peer functionality will remain popular and
become more widespread.
Figure 3: Interconnected small-worlds darknets. Threats of surveillance close global darknets. Darknets form around social groups, but use high-bandwidth, low-latency communications (the Internet) and are supported by search engines. Custom applications, Instant-Messenger-style applications, or simple shared file systems host the darknet. People’s social groups overlap, so objects available in one darknet diffuse to others: in the terminology used in this paper, each peer that is a member of more than one darknet is an introduction host for objects obtained from other darknets.
Questions of this type have been studied in different contexts in a variety of fields
(mathematics, computer science, economics, and physics). A number of empirical studies
seek to establish structural properties of different types of small world networks, such as
social networks [20] and the world-wide web [3]. These works conclude that the diameter
of the examined networks is small, and observe further structural properties, such as a
power law of the degree distribution [5]. A number of authors seek to model these
networks by means of random graphs, in order to perform more detailed mathematical
analysis on the models [2],[8],[21],[22] and, in particular, study the possibility of efficient
search under different random graph distributions [18],[19]. We will present a quantitative
study of the structure and dynamics of small-worlds networks in an upcoming paper, but to
summarize, small-worlds darknets can be extremely efficient for popular titles: very few
peers are needed to satisfy requests for top-20 books, songs, movies or computer
programs. If darknets are interconnected, we expect the effective introduction rate to be
large. Finally, if darknet clients are enhanced to actively seek out new popular content, as
opposed to the user-demand based schemes of today, small-worlds darknets will be very
efficient.
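As a toy illustration of the “very few peers” claim, consider the following sketch (our own, not the quantitative study referred to above; the topology parameters and the assumption that 1% of peers hold a given title are purely illustrative):

```python
import random
import networkx as nx

random.seed(1)

# A small-world overlay: 10,000 peers, each linked to 8 acquaintances,
# with 10% of edges rewired to random long-range contacts (Watts-Strogatz).
G = nx.connected_watts_strogatz_graph(n=10_000, k=8, p=0.1, seed=1)

# Illustrative assumption: 1% of peers already hold a given top-20 title.
holders = set(random.sample(sorted(G.nodes), 100))

def hops_to_title(start: int) -> int:
    """BFS distance from a requesting peer to the nearest peer holding the title."""
    dist = nx.single_source_shortest_path_length(G, start)
    return min(dist[h] for h in holders)

sample = [hops_to_title(random.randrange(G.number_of_nodes())) for _ in range(200)]
print("mean hops to a popular title:", sum(sample) / len(sample))
```

In graphs of this kind the mean hop count stays small even as the network grows, which is the intuition behind the efficiency claim for popular titles.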
3.3 Software
The DRM systems described above can be used to protect software in addition to
other objects (e.g. audio and video). Alternatively, copy protection systems for
computer programs may embed the copy protection code in the software itself.
The most important copy-protection primitive for computer programs is for the
software to be bound to a host in such a way that the program will not work on an
unlicensed machine. Binding requires a machine ID: this can be a unique number on a
machine (e.g. a network card MAC address), or can be provided by an external dongle.
For such schemes to be strong, two things must be true. First, the machine ID must
not be “virtualizable.” For instance, if it is trivial to modify a NIC driver to return an invalid
MAC address, then the software-host binding is easily broken. Second, the code that
performs the binding checks must not be easy to patch. A variety of technologies that
revolve around software tamper-resistance can help here [4].
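The following minimal sketch, assuming a MAC-address-derived machine ID (the function names are ours), shows the binding primitive and, in its comments, the two attack surfaces just described:

```python
import hashlib
import uuid

def machine_id() -> str:
    """Derive a machine ID from the network card's MAC address.

    Weakness 1: uuid.getnode() is "virtualizable" -- a modified NIC driver
    or a virtual machine can report any MAC address it likes, so the
    software-host binding is easily broken at this layer.
    """
    mac = uuid.getnode().to_bytes(6, "big")
    return hashlib.sha256(mac).hexdigest()

def binding_check(licensed_id: str) -> bool:
    """Weakness 2: as an ordinary branch in the program, this comparison is
    an easy target for patching; without software tamper resistance, an
    attacker simply flips the test in the binary.
    """
    return machine_id() == licensed_id

if __name__ == "__main__":
    print("this machine's ID:", machine_id())
```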
We believe that binding software to a host is a more tractable problem than protecting
passive content, as the former only requires tamper resistance, while the latter also
requires the ability to hide and manage secrets. However, we observe that all software
copy-protection systems deployed thus far have been broken. The definitions of BOBE-
strong and BOBE-weak apply similarly to software. Furthermore, software is as much
subject to the dynamics of the darknet as passive content.
4 Policing Hosts
If there are subverted hosts, then content will leak into the darknet. If the darknet is
efficient, then content will be rapidly propagated to all interested peers. In the light of this,
technologists are looking for alternative protection schemes. In this section we will
evaluate watermarking and fingerprinting technologies.
4.1 Watermarking
Watermarking embeds an “indelible,” invisible mark in content. A plethora of schemes
exist for audio/video content, still images, and computer programs.
There are a variety of schemes for exploiting watermarks for content-protection.
Consider a rendering device that locates and interprets watermarks. If a watermark is
found, then special action is taken. Two common actions, illustrated in the sketch after this list, are:
1) Restrict behavior: For example, a bus-adapter may refuse to pass content that
has the “copy once” and “already copied once” bits set.
2) Require a license to play: For example, if a watermark is found indicating that
content is rights-restricted then the renderer may demand a license indicating
that the user is authorized to play the content.
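The sketch below (ours; the flag names and two-bit encoding are illustrative assumptions, loosely modeled on CGMS-style copy-control bits) shows what both actions might look like in a compliant device:

```python
from enum import Enum

class CCI(Enum):
    """Hypothetical two-bit copy-control information carried by a watermark."""
    COPY_FREELY = 0b00
    COPY_ONCE = 0b10
    COPY_NO_MORE = 0b11   # "copy once" content that has already been copied

def bus_adapter_allows(cci: CCI, copying: bool) -> bool:
    """Action 1 (restrict behavior): refuse to pass content whose mark forbids copying."""
    if not copying:
        return True   # plain playback passes through in this sketch
    return cci in (CCI.COPY_FREELY, CCI.COPY_ONCE)

def after_copy(cci: CCI) -> CCI:
    """A compliant device re-marks 'copy once' content once it has been copied."""
    return CCI.COPY_NO_MORE if cci is CCI.COPY_ONCE else cci

def renderer_allows(cci: CCI, has_license: bool) -> bool:
    """Action 2 (require a license): rights-restricted marks demand a license to play."""
    return cci is CCI.COPY_FREELY or has_license
```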
Such systems have been proposed for audio content – for example, by the Secure Digital
Music Initiative (SDMI) [16] – and are under consideration for video by the Copy
Protection Technical Working Group (CPTWG) [12].
There are several reasons why it appears unlikely that such systems will ever become
an effective anti-piracy technology. From a commercial point of view, building a
watermark detector into a device renders it strictly less useful to consumers than a
competing product that lacks one. This argues that watermarking schemes are unlikely to
be widely deployed unless mandated by legislation. The recently proposed Hollings bill is
a step along these lines [11].
We contrast watermark-based policing with classical DRM: If a general-purpose
device is equipped with a classical DRM-system, it can play all content acquired from the
darknet, and have access to new content acquired through the DRM channel. This is in
stark contrast to the reduction of functionality inherent in watermark-based policing.
Even if watermarking systems were mandated, this approach is likely to fail due to a
variety of technical inadequacies. The first inadequacy concerns the robustness of the
embedding layer. We are not aware of systems for which simple data transformations
cannot strip the mark or make it unreadable. Marks can be made more robust, but in
order to recover marks after adversarial manipulation, the reader must typically search a
large phase space, and this quickly becomes untenable. In spite of the proliferation of
proposed watermarking schemes, it remains doubtful whether robust embedding layers for
the relevant content types can be found.
A second inadequacy lies in unrealistic assumptions about key management. Most
watermarking schemes require widely deployed cryptographic keys: they follow the
standard cryptographic principle of a public algorithm and secret keys, and most use a
shared key between the marker and the detector.
In practice, this means that all detectors need a private key, and, typically, share a single
private key. It would be naïve to assume that these keys will remain secret for long in an
adversarial environment. Once the key or keys are compromised, the darknet will
propagate them efficiently, and the scheme collapses. There have been proposals for
public-key watermarking systems. However, so far, this work does not seem practical and
the corresponding schemes do not even begin to approach the robustness of the
cryptographic systems whose name they borrow.
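The following toy model (ours; a highly simplified spread-spectrum scheme, not any fielded system) makes the key-compromise argument concrete: once the single shared key leaks, anyone can compute the mark and subtract it, defeating every detector at once.

```python
import numpy as np

N, ALPHA = 100_000, 0.05   # toy signal length and embedding strength
SHARED_KEY = 1234          # the single key baked into every deployed detector

def mark(key: int) -> np.ndarray:
    """Pseudo-random +/-1 pattern derived from the (supposedly secret) key."""
    return np.sign(np.random.default_rng(key).standard_normal(N))

def embed(x: np.ndarray, key: int) -> np.ndarray:
    return x + ALPHA * mark(key)

def detect(x: np.ndarray, key: int) -> bool:
    return np.dot(x, mark(key)) / N > ALPHA / 2

content = np.random.default_rng(0).standard_normal(N)
marked = embed(content, SHARED_KEY)
# An attacker who extracts the key from any one detector can compute the
# mark and simply subtract it from the marked content.
stripped = marked - ALPHA * mark(SHARED_KEY)
print(detect(marked, SHARED_KEY), detect(stripped, SHARED_KEY))  # True False
```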
A final consideration bears on the location of mandatory watermark detectors in client
devices. On open computing devices (e.g. personal computers), these detectors could, in
principle, be placed in software or in hardware. Placing detectors in software would be
largely meaningless, as circumvention of the detector would be as simple as replacing it
by a different piece of software. This includes detectors placed in the operating system, all
of whose components can be easily replaced, modified and propagated over the darknet.
Alternatively, the detectors could be placed in hardware (e.g. audio and video cards).
Given the problems described above, this would lead to untenable renewability problems:
the hardware would be ineffective within days of deployment, whereas consumers expect
hardware to remain in use for many years. Finally, consumers themselves are likely to
rebel against “footing the bill” for these ineffective content protection systems. It is
virtually certain that the darknet would be filled with a continuous
supply of watermark removal tools, based on compromised keys and weaknesses in the
embedding layer. Attempts to force the public to “update” their hardware would not only be
intrusive, but impractical.
In summary, attempts to mandate content protection systems based on watermark
detection at the consumer’s machine suffer from commercial drawbacks and severe
technical deficiencies. These schemes, which aim to provide content protection beyond
DRM by attacking the darknet, are rendered entirely ineffective by the presence of even a
moderately functional darknet.
4.2 Fingerprinting
Fingerprint schemes are based on similar technologies and concepts to watermarking
schemes. However, whereas watermarking is designed to perform a priori policing,
fingerprinting is designed to provide a posteriori forensics.
In the simplest case, fingerprinting is used for individual-sale content (as opposed to
super-distribution or broadcast – although it can be applied there with some additional
assumptions). When a client purchases an object, the supplier marks it with an
individualized mark that identifies the purchaser. The purchaser is free to use the content,
but if it appears on a darknet, a policeman can identify the source of the content and the
offender can be prosecuted.
Fingerprinting suffers from fewer technical problems than watermarking. The main
advantage is that no widespread key distribution is needed – publishers can use
whatever secret or proprietary fingerprinting technology they choose, and are entirely
responsible for the management of their own keys.
Fingerprinting has one problem that is not found in watermarking. Since each
fingerprinted copy of a piece of media is different, if a user can obtain several different
copies, he can launch collusion attacks (e.g. averaging). In general, such attacks are very
damaging to the fingerprint payload.
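The following toy model (ours; the same simplified spread-spectrum embedding as the sketch above, not any real fingerprinting product) shows why averaging is so damaging: the detector's correlation with any one colluder's mark shrinks in proportion to the number of colluders.

```python
import numpy as np

N, ALPHA = 100_000, 0.05   # toy signal length and embedding strength
master = np.random.default_rng(0).standard_normal(N)   # the unmarked original

def user_mark(user_id: int) -> np.ndarray:
    """Per-user pseudo-random +/-1 spread-spectrum pattern."""
    return np.sign(np.random.default_rng(user_id).standard_normal(N))

def sell_copy(user_id: int) -> np.ndarray:
    """Individualize a copy by adding the buyer's mark."""
    return master + ALPHA * user_mark(user_id)

def detect(copy: np.ndarray, user_id: int) -> float:
    """Correlation detector: ~ALPHA if the user's mark is intact, ~0 otherwise."""
    return float(np.dot(copy - master, user_mark(user_id)) / N)

single = sell_copy(1)
averaged = np.mean([sell_copy(u) for u in (1, 2, 3, 4)], axis=0)
print(f"mark 1 in one copy:      {detect(single, 1):.4f}")    # ~ 0.0500
print(f"mark 1 in 4-way average: {detect(averaged, 1):.4f}")  # ~ 0.0125
```

With enough colluders, the residual correlation sinks into the detector's noise floor, which is what makes collusion attacks on the fingerprint payload so effective.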
It remains to be seen whether fingerprinting will act as a deterrent to theft. There is
currently no legal precedent for media fingerprints being evidence of a crime, and this case
will probably be hard to make – after all, detection is a statistical process with false
positives, and plenty of opportunity for deniability. However, we anticipate that there will
be uneasiness in sharing a piece of content that may contain a person’s identity, and that
ultimately leaves that person’s control.
Note also that with widely distributed watermarking detectors, it is easy to see
whether you have successfully removed a watermark. There is no such assurance for
determining whether a fingerprint has been successfully removed from an object because
users are not necessarily knowledgeable about the fingerprint scheme or schemes in use.
However, if it turns out that the deterrence of fingerprinting is small (i.e. everyone shares
their media regardless of the presence of marks), there is probably no reasonable legal
response. Finally, distribution schemes in which objects must be individualized will be
expensive.
5 Conclusions
There seem to be no technical impediments to darknet-based peer-to-peer file sharing
technologies growing in convenience, aggregate bandwidth and efficiency. The legal
future of darknet-technologies is less certain, but we believe that, at least for some classes
of user, and possibly for the population at large, efficient darknets will exist. The rest of this
section will analyze the implications of the darknet from the point of view of individual
technologies and of commerce in digital goods.
5.1 Technological Implications
DRM systems are limited to protecting the content they contain; beyond our first
assumption about the darknet, the darknet itself is not impacted by DRM systems. In light
of that assumption, DRM design details, such as the properties of the tamper-resistant
software, may be strictly less relevant than the question of whether the current darknet
has a global database. In the presence of an infinitely efficient darknet – which
allows instantaneous transmission of objects to all interested users – even sophisticated
DRM systems are inherently ineffective. On the other hand, if the darknet is made up of
isolated small worlds, even BOBE-weak DRM systems are highly effective. The interesting
cases arise between these two extremes – in the presence of a darknet that is connected,
but in which factors such as latency, limited bandwidth, or the absence of a global
database limit the speed with which objects propagate through the darknet. It
appears that quantitative studies of the effective “diffusion constant” of different kinds of
darknets would be highly useful in elucidating the dynamics of DRM and the darknet.
Proposals for systems involving mandatory watermark detection in rendering devices
seek to impact the effectiveness of the darknet directly by detecting and eliminating
objects that originated in the darknet. In addition to severe commercial and social
problems, these schemes suffer from several technical deficiencies, which, in the
presence of an effective darknet, lead to their complete collapse. We conclude that such
schemes are doomed to failure.