Collage is a new platform for constructing censorship-resistant applications. This document gives an overview of how to build new applications using Collage.
For general information on Collage, please see http://gtnoise.net/collage.
Collage provides a framework for constructing censorship-resistant message channels, by storing data inside user-generated content. Some examples of applications you could build using Collage:
- Publish daily news articles hidden inside photos on Flickr.
- Send secret emails hidden inside your tweets on Twitter.
- Distribute Tor bridge node addresses using blog posts.
Collage only provides tools that you can use to build these message channels. It doesn't by itself allow for communication. Each application using Collage must provide:
- A set of user-generated content (e.g., photos on Flickr, videos on YouTube, etc.)
- A mechanism for storing data inside this user-generated content. We provide some example techniques, e.g., for storing data inside JPEGs.
- A set of tasks that upload and download content from user-generated content hosts
This may look like a lot of work. Luckily, we include examples of these components with Collage that you can modify for your application. Collage uses these components to construct a custom message channel for your application.
This is a brief summary of the code tree.
collage/: The core of Collage.collage_apps/: A collection of sample Collage applications.proxy/: An application for fetching news articles stored hidden in user-generated content.taskmodules/: Pluggable modules for fetching vectors on content hosts.
vectors/: For embedding messages inside user-generated content vectors (e.g., JPEGs, tweets, etc.).providers/: For obtaining sets of vectors, e.g., from a directory, from generous users, etc.tasks/: For uploading and downloading vectors to and from user-generated content hosts.
collage_donation/: A system for accepting donations of user-generated content from end-users.server/: The server backend for the donation system, which stores vectors while they are encoded with censored messages.client/: The donation clients, which accept donations (e.g., photos) from generous users.flickr_web_client/: A Web application that accepts photo donations from Flickr users.tk_client/: A standalone GUI application that accepts photos from Flickr users.
This section describes how to build a Collage message channel. You must specify three components:
- A
Vector, which represents user-generated content (e.g., photos, videos, etc). - A
VectorProvider, which obtainsVectors from an external source. - A set of
Tasks that upload and download Vectors to and from user-generated content hosts (e.g., Flickr, YouTube, etc.)
These components fit together as follows:
- One user (the "sender") composes a message and stores it inside a set of
Vectors he obtains from aVectorProvider. - The sender executes the
sendmethod of someTasks to upload hisVectors to a user-generated content host. - Later, another user (the "receiver") executes the
receivemethod of the sameTasks to download the sender'sVectors from the content host. - The receiver recovers the sender's message from the downloaded
Vectors.
Vector, VectorProvider and Task are Python classes which I now describe
in more detail. I encourage you to also look at the source code for these
classes.
A Vector represents user-generated content, e.g., a single JPEG image. Aside
from storing vector's data (e.g., an in-memory representation of the JPEG), the
Vector must provide an encode method that hides a message inside the
Vector (or throws an exception if that message will not fit). It must also
provide a corresponding decode method to extract the hidden message.
Typically these encode and decode methods are implemented using an
information hiding algorithm like steganography or watermarking. For example,
we use OutGuess to store messages inside JPEG data.
Important: The encode method must store the message inside the vector
without altering its visual appearence. Otherwise, it would be easy to detect
uses of Collage and make it much easier to block.
- Examples:
collage_apps/vectors - What you must do: Provide a new
Vectorsubclass, withencode,decodeandis_encodedmethods. - For more information:
collage/vectorlayer.py
A VectorProvider obtains Vectors from somewhere. This can be from a
directory on disk, a photo sharing program such as iPhoto, from an external
source such as donations from Flickr users, or pretty much anywhere else.
Typically, when developing a Collage application you use
DirectoryVectorProvider to fetch Vectors from a local repository of
vectors. However, when you deploy your application in the real world, you will
need to get your Vectors from somewhere else.
- Examples:
collage_apps/providers, specificallylocal.py. - What you must do: Provide a new
VectorProvidersubclass, with a_find_vectormethod. - For more information:
collage/vectorlayer.py
A Task uploads and downloads Vectors from a user-generated content host,
e.g., Flickr. Each Task performs a specific action on a content host that
yield some Vectors, for example:
- Search for "blue flowers" on Flickr and download the top 20 resulting photos.
- Download the videos from user "JohnDoe"'s on YouTube.
- Look at trending tweets on Twitter.
Each Task has two key methods: send and receive. send uploads a
Vector to a user-generated content host, while receive downloads a set of
Vectors from that content host, some of which might contain Collage data.
Important: For each Task, the Vectors uploaded by send must be
accessible by receive. For example, if send uploads pictures of "pink
dinosaurs" to Flickr, then receive should, e.g., perform a keyword search for
"pink dinosaurs".
To construct these tasks, we recommend you use
Selenium, a Web automation framework bundled with
Collage. The example in collage_apps/proxy/taskmodules/flickr_new.py
demonstrates how to do this.
- Examples:
collage_apps/tasks,collage_apps/proxy/taskmodules - What you must do: Provide new
Tasksubclasses, withsend,receiveandcan_embedmethods. - For more information:
collage/messagelayer.py(theTaskclass, near the bottom of the file.)