An Open Data Exchange For The Web of Things: Sun Microsystems Laboratories, Menlo Park, California, USA
Sensor.Network: an Open Data Exchange for the Web of Things
Abstract-An increasing number of embedded devices of all sorts (sensors, mobile phones, cameras, smart meters, traffic lights, home appliances etc.) are now capable of communicating and sharing data over the Internet. We have developed a web-based infrastructure called Sensor.Network for storing, sharing, searching, visualizing and analyzing data from heterogeneous devices and facilitating easy interaction amongst devices and with end users through an open, REST-based API. Such a data exchange can enhance our understanding of the world around us and offer valuable insights for tackling a wide range of issues, from global ones like sustainable resource management to local ones like improving rush-hour traffic flow. The design and implementation of a service like this raises several questions: What are the right data abstractions? How should one balance ease of sharing with privacy concerns? What are effective mechanisms for searching, visualizing and analyzing data? How can one facilitate data-centric collaboration and the composition of loosely-coupled "mashups" between sensors and actuators (e.g. a humidity sensor from one vendor controlling a sprinkler system from another)? This paper describes the design choices we made in addressing many of these questions and the rationale behind them. We also provide a brief survey of other comparable projects and evaluate them against a set of common criteria.

I. INTRODUCTION

[...] applications by students, researchers and hobbyists around the world. A large number of these applications require facilities for collecting, storing, searching, sharing and analyzing sensor data. These requirements aren't unique to Sun SPOTs. There are several commercial offerings that attempt to address this need for specific application domains. For example, Sentilla [2] and SynapSense [3] provide wireless instrumentation solutions for data center monitoring and energy management including data collection, storage, analysis, alerts and reporting tools. Similarly, Johnson Controls [4] (a maker of HVAC and building management systems), Echelon [5] (maker of the Meterus smart energy meter) and Fitbit [6] (which makes the fitbit fitness and sleep tracker) all offer custom software for post-processing data from their devices and (in some cases) for managing their equipment controllers. However, in all of these cases, there is a tight coupling between the devices and the back-end services and tools. It is not easy to interface devices from one manufacturer with the services and tools from another, thereby making them closed systems.
Authorized licensed use limited to: University of Canberra. Downloaded on February 18,2021 at 02:47:27 UTC from IEEE Xplore. Restrictions apply.
and allows the composition of new services incorporating heterogeneous sensors and actuators from different manufacturers and potentially owned by different entities. Such an open data-exchange is attractive for several additional reasons:

1) It enables investigation into correlations between sensor data from multiple disparate sources. For example, the RunKeeper iPhone application [7] provides runners with useful information such as distance, time, pace and path traveled using sensors built into the iPhone. If this data were managed using an open data exchange, a runner could potentially correlate these measurements with data from temperature and humidity sensors in the area for greater insight when evaluating her own performance.
2) In many scientific communities we've interacted with, lots of data still (sadly) sits on individual laptops as scattered text files or spreadsheets. In many cases, original data from related studies is hard or impossible to find and only papers with interpretations of the raw data are available. By facilitating data sharing, an open data exchange makes it possible for a scientist to access raw data from someone else's experiment and draw new conclusions.
3) It enables collaborative classification (e.g. with tagging), annotation, editing (e.g. to discard data from a miscalibrated sensor), analysis and visualization of data using the web as a common platform. In place of static images published in technical journals, one can enable online forums where teams of scientists can experiment with and discuss different types of visualizations for that same data set. Furthermore, for long running experiments, we can create live plots, i.e. plots that are redrawn by retrieving the latest data from the open data exchange whenever the web page containing the visualization is rendered in a user's browser.
4) It spares domain experts the effort of setting up the IT environment required to store their sensor data reliably, managing access controls and performance tuning a compute infrastructure. Instead, the barrier is reduced to learning the API and tools exposed by the open data exchange.

The design of an open data exchange must address several critical issues. Data formats, abstractions, APIs and data insertion/retrieval models (e.g. push vs. pull, polling vs. alerts) need to be flexible enough to meet the requirements of many varied domains yet simple enough for users not trained in computer science. Users must be able to exercise flexible and fine-grained control over how their sensor data is shared (e.g., read-only, time-delayed, low-fidelity) and with whom (owner-only, specific individuals) and when and where alerts are sent. This requires support for authentication, authorization and encryption. Search and organization are critical when dealing with large amounts of data. We also need a rich set of tools for visualizing, analyzing and collaborating around data.

We are working closely with several customers and have used their feedback to fine tune the design of Sensor.Network over multiple iterations. These customers include: (i) the United States Geological Survey (USGS) which has launched an initiative [8] to convert 15,100 acres of commercial salt ponds at the south end of San Francisco Bay to a mix of tidal marsh, mudflat and other wetland habitats, (ii) Vodafone which is using our service for storing, analyzing and visualizing energy consumption data from a SmartMetering pilot project, (iii) Conservation through Research Education and Action (CREA) [9], a non-profit organization, which is using Sun SPOTs to monitor abiotic environmental variables in the Cocobolo Nature Reserve, Panama, and (iv) a host of hobbyists with data from experiments ranging from bicycle rides to environmental monitoring of tomato plants to a high altitude weather balloon launch into near space [10]. We feel that our experience could be useful to other efforts [11]-[15] we are aware of that share similar goals.

The rest of this paper is organized as follows: Section II delves deeper into the design choices we faced and the decisions that guided our implementation. Section III is a brief survey of comparable efforts to build open data exchanges and Section IV summarizes our contributions and future work we plan to undertake.

II. SENSOR.NETWORK DESIGN

This section discusses the key issues we faced in architecting Sensor.Network and the rationale behind the design choices we made. These issues include choosing good data abstractions and Application Programming Interfaces (APIs), building effective security mechanisms and tools for data analysis and visualization.

A. The Datastream Abstraction

At the core of the Sensor.Network architecture is the notion of a datastream. A datastream refers to a time-series of sensor values that are sampled together. Each value has a name, a type, units, and an optional valid range associated with it. For example, the datastream "My location" may have three sensor values: latitude, longitude and altitude, all of type float, with the first two measured in degrees, with a valid range of -90 to +90 and -180 to +180, respectively, and the last in meters.

The datastream abstraction decouples the physical sensor from the high-level phenomenon of interest to the end user. Consider a user interested in measuring the light exposure of an outdoor plant using Sun SPOTs. Let's say the user defines a "My plant's light exposure" datastream and starts inserting light readings (as described in Section II-B2) from a
[...] the tomato plant."
• A media URI (e.g. image or video) with additional information about the datastream.
• A primary category, e.g. energy, health and fitness, environment, etc. Such categorization helps with organization and in building communities around particular interests.
• Tags which have proven to be an effective search aid on popular services like Flickr and YouTube.
• Location for use in a geographical view or search of a datastream, if the datastream is not 'mobile'.
• Sampling period for periodic datastreams. This information is used by the system to recognize when sensor data collection is experiencing unexpected interruptions.
• Access permissions which determine what operations are allowed for different classes of users.
• A numeric identifier which is unique across the Sensor.Network system. While other pieces of information are editable and specified by a user, this identifier is immutable and assigned by the system when the datastream is first registered.

[Figure 2: screenshot of the "Edit a datastream" form with fields for Name, Description, Tags, Media URI, Media type, Primary category, Location, Sampling, Privacy (who can read data, who can insert data) and Values.]
Figure 2. Metadata associated with a datastream includes: name, description, tags, media type and URI, category, location, sampling period, access permissions and the name, type, units for each sensor value.
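As an illustration only, the datastream abstraction of Section II-A and part of the metadata in Figure 2 can be modelled in a few lines of Python. This is a minimal sketch and not the Sensor.Network implementation: the class names `ValueSpec` and `Datastream`, the helper `find_gaps` and its `tolerance` parameter are hypothetical, and the gap rule (flag any interval longer than 1.5 sampling periods) is an assumption, since the paper says only that the sampling period is used to recognize interruptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple, Union

Number = Union[int, float]


@dataclass
class ValueSpec:
    """One named sensor value: name, type, units and an optional valid range."""
    name: str
    kind: type                       # int or float in this sketch
    units: str
    minimum: Optional[Number] = None
    maximum: Optional[Number] = None

    def accepts(self, reading: Number) -> bool:
        """Check the reading's type and, if a range is declared, its bounds."""
        if isinstance(reading, bool) or not isinstance(reading, (int, float)):
            return False
        if self.kind is int and not isinstance(reading, int):
            return False
        if self.minimum is not None and reading < self.minimum:
            return False
        if self.maximum is not None and reading > self.maximum:
            return False
        return True


@dataclass
class Datastream:
    """A time-series of sensor values that are sampled together."""
    name: str
    values: List[ValueSpec]
    tags: List[str] = field(default_factory=list)
    sampling_period_ms: Optional[int] = None   # None means aperiodic

    def validate(self, sample: Sequence[Number]) -> bool:
        """A sample is valid if it has one in-range reading per declared value."""
        if len(sample) != len(self.values):
            return False
        return all(spec.accepts(r) for spec, r in zip(self.values, sample))


def find_gaps(timestamps_ms: Sequence[int], period_ms: int,
              tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (previous, next) timestamp pairs separated by more than
    tolerance * period_ms. The threshold is an assumption of this sketch."""
    limit = period_ms * tolerance
    return [(a, b) for a, b in zip(timestamps_ms, timestamps_ms[1:]) if b - a > limit]


# The "My location" example from Section II-A.
my_location = Datastream(
    name="My location",
    values=[
        ValueSpec("latitude", float, "degrees", -90.0, 90.0),
        ValueSpec("longitude", float, "degrees", -180.0, 180.0),
        ValueSpec("altitude", float, "meters"),
    ],
)
```

Keeping the range on each value rather than on the datastream mirrors the text: each value carries its own name, type, units and optional valid range, so a sample can be checked position by position.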
B. REST-based API
1) Creating/Editing a datastream: Datastreams are defined using an XML descriptor that includes name, tags, location (if any), values etc. as shown in Figure 3. To create [...]

<?xml version="1.0"?>
<datastream>
  <name>My Pothos Plant</name>
  <description>
    light and temperature readings from my office plant.
  </description>
  <tag>light</tag>
  <tag>temperature</tag>
  <value>
    <name>light</name>
    <type>int</type>
    <units>lumen</units>
  </value>
  <value>
    <name>temperature</name>
    <type>float</type>
    <units>celsius</units>
    <min>-50.0</min>
    <max>50.0</max>
  </value>
</datastream>

Figure 3. Datastream.xml: An example of a minimal datastream descriptor.

(a) Data.xml

<?xml version="1.0"?>
<sampleData>
  <sensorNodeId>0014.4F01.0000.01AB</sensorNodeId>
  <timestamp>2009-07-30T13:31:37.459Z</timestamp>
  <value>700</value>
  <value>29.4</value>
</sampleData>

(b) Data.csv

2009-10-02T09:30:00.000Z,0014.4F01.0000.01AB,250,19.5
2009-10-02T09:45:00.000Z,0014.4F01.0000.01AB,285,20.6

Figure 4. Example data descriptor in (a) XML, and (b) CSV format.

Many of our current and potential users have legacy data in the form of CSV files and we provide a mechanism to bulk-upload multiple samples in a single operation. We do require that users pre-format their CSV file to conform to a specific order (timestamp, sensor node identifier, sensor values) so the system can parse the samples correctly.

curl --dump-header hdrs.txt --header "X-SensorNetworkAPIKey: apiKey" --request POST --header "Content-type: text/plain" --data "@Data.csv"
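The required CSV column order (timestamp, sensor node identifier, sensor values) can be checked on the client before a bulk upload. The following is a sketch under stated assumptions, not the service's own parser: the element names are taken from Figure 3 and the rows from Figure 4(b), while the function names `value_specs` and `parse_samples`, the `CASTS` table and the returned dictionary layout are hypothetical. It uses only the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Abbreviated version of the Figure 3 descriptor.
DESCRIPTOR = """<?xml version="1.0"?>
<datastream>
  <name>My Pothos Plant</name>
  <tag>light</tag>
  <tag>temperature</tag>
  <value><name>light</name><type>int</type><units>lumen</units></value>
  <value><name>temperature</name><type>float</type><units>celsius</units>
    <min>-50.0</min><max>50.0</max></value>
</datastream>"""

# Rows in the order shown in Figure 4(b): timestamp, node id, then values.
CSV_DATA = """2009-10-02T09:30:00.000Z,0014.4F01.0000.01AB,250,19.5
2009-10-02T09:45:00.000Z,0014.4F01.0000.01AB,285,20.6
"""

# Assumption: only the two types that appear in Figure 3 are handled.
CASTS = {"int": int, "float": float}


def value_specs(descriptor_xml):
    """Return [(value name, cast function), ...] in descriptor order."""
    root = ET.fromstring(descriptor_xml)
    return [(v.findtext("name"), CASTS[v.findtext("type")])
            for v in root.findall("value")]


def parse_samples(csv_text, specs):
    """Parse CSV rows into dicts, rejecting rows with the wrong field count."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2 + len(specs):
            raise ValueError(f"expected {2 + len(specs)} fields, got {len(row)}: {row}")
        stamp = datetime.strptime(row[0], "%Y-%m-%dT%H:%M:%S.%fZ")
        stamp = stamp.replace(tzinfo=timezone.utc)  # trailing Z means UTC
        readings = {name: cast(raw) for (name, cast), raw in zip(specs, row[2:])}
        samples.append({"timestamp": stamp, "node": row[1], "values": readings})
    return samples


specs = value_specs(DESCRIPTOR)
samples = parse_samples(CSV_DATA, specs)
```

Deriving the casts from the descriptor, rather than hard-coding them, keeps the CSV check in step with whatever values the datastream declares.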
[...] and the runner looking for temperature and humidity readings for her run need the ability to find datastreams meeting specific criteria (e.g. based on sensor type and location). We provide a mechanism for organizing datastreams into some broad categories (described in Section II-A) and this list is still evolving. However, we strongly feel that a flexible search mechanism is much more important and have designed the system to support searching based on datastream name, description, tags, owner, location and even value names and units.

D. Security Mechanisms

Sharing is a central idea for our service and some might see security mechanisms for access control and privacy as being at odds with it. However, our experience suggests that users are more willing to store their data on Sensor.Network if they can control how it is shared and with whom.

1) Authentication: As illustrated in Section II-B, most operations on Sensor.Network require the invoking entity to authenticate itself. Operations initiated in a browser use username/password and those initiated programmatically [...] "X-SensorNetworkAPIKey" field. Each user is assigned a unique system-generated API key that identifies them. A user may request the generation of additional API keys to delegate authority for limited operations or for specific time periods to other entities without having to share their password.

[...] control is implemented using a UNIX-like model of permissions based on user classes and access types. There are three built-in user classes for each datastream: (i) owner, (ii) registered users of Sensor.Network, and (iii) everyone. In addition, each user can create arbitrary groups of users and authorize different groups for different operations. Besides data insertion and retrieval, we are investigating other forms of access to sample data, e.g. low-fidelity or time-delayed access.

3) Confidentiality: Encryption is essential to guarantee that authentication credentials like passwords and API keys or sensor data are not exposed to eavesdropping by unauthorized entities. The use of a REST-based API makes it easy to layer the entire protocol interaction over HTTPS.

E. Tools for Visualization and Analysis

[Figure 5: a plot with "Cadence (rpm)" on the horizontal axis and a map view (Map / Satellite / Hybrid); caption not recovered.]
Table I
A BRIEF COMPARISON OF OPEN DATA EXCHANGE DESIGNS

| | Sensor.Network | SensorBase [11], [12] | Sensorpedia [13] | Pachube [14] | SenseWeb [15] |
|---|---|---|---|---|---|
| Data abstraction | Datastreams | Database table | Atom feeds | I/O feeds (similar to datastream) | Web service representing sensor node |
| Open API | RESTful | SOAP | RESTful | RESTful | SOAP |
| Data archival | Yes | Yes | No | Yes | Unknown |
| Data formats | XML, CSV, JSON | XML, CSV, JSON | Atom, GeoRSS | EEML, CSV, JSON, RSS, Atom | Text, Excel |
| Data uploading | Push | Push | Not applicable | Push, Pull | Unknown |
| Authentication | Password & API key | Password | Unknown | Password & API key | Password |
| Authorization | Rich, fine-grained | Fine-grained | Unknown | Write by owner only, read by everyone | Coarse-grained |
| Search by | Name, description, tags, owner, value names, value units | Project name, table name, tags | Feed name, feed description, tags | Feed name, feed description, tags | Sensor name, owner |
| Graphical visualization | Rich, interactive, embeddable plots, dashboard & map views | Static line plots, map view | Map view, others unknown | Line plots, map view | Application provided |
| Alerts/Notifications | Compound expressions | Compound expressions | Unknown | Simple expressions (insecure*) | None |

*Notifications can be set on any target URL without proving ownership or control over the URL.
[...] support pan-and-zoom, location traces support a time slider as in Figure 5) and can be embedded in web-pages, blogs etc. not hosted at Sensor.Network (see [10]). In addition, we also offer views that capture essential information about groups of datastreams. For example, a dashboard view is useful for conveying recent activity and access permissions while a map view is useful for geographical search. The ability to dynamically generate tag clouds for a specific set of datastreams (e.g., recently active or belonging to a specific geographical area) is an interesting means for studying popular categories and spotting trends.

The ability to plug in different analysis engines, e.g. one that processes a time-series of GPS and accelerometer readings to deduce activity (walking, running, driving), is another important requirement. We expect many of these modules to be domain specific and our users are excited about the possibility of sharing and reusing such modules within their scientific communities.

III. RELATED WORK

This section briefly surveys other projects similar to ours and the comparison is summarized in Table I. Our evaluation is based on publicly available information. These projects may very well have the vision of implementing additional functionality that isn't currently exposed to their users.

Open data exchanges came about as a result of Wireless Sensor Network (WSN) research groups investigating ways to collect sensor data and manage it easily via ubiquitous clients like web browsers. One of the earliest such initiatives is SensorBase [11], [12] by the Center For Embedded Networked Sensing (CENS) at UCLA. It uses a relational database table as its data abstraction and an SQL-centric API. Users can group related tables into "projects". This has the benefit of being very general but is too low-level for domain experts without a computer science background. For example, a scientist trying to correlate sensor readings from two different projects would need to understand and implement SQL joins. SensorBase seems to have some of the most advanced notification mechanisms, which the user can set on a per-table basis based on the satisfaction of multiple conditions.

The Sensorpedia [13] project at Oak Ridge National Laboratory (ORNL), the SenseWeb [15] project at Microsoft and Nokia's SensorPlanet [23] are all examples of similar initiatives within industrial or government, i.e. non-academic, research organizations. There is very little public information available for SensorPlanet. SenseWeb is the most mature project amongst these three. It offers a platform on which other applications can be built. One such application is SensorMap [24] which mashes up sensor data from SenseWeb on a geographical map interface. However, SenseWeb's use of SOAP [25] makes it more heavy-weight compared to a REST-based service and this complexity raises the barrier to entry for potential users. Sensorpedia is the newest of these three services with the stated aim of utilizing Web 2.0 social networking principles for organizing and providing access to sensor network related data sets. Users can publish and subscribe to sensor feeds using Atom. The service does not appear to offer any data archiving which makes it unsuitable for users (e.g. hobbyists) that, for lack of ability or resources, do not wish to set up their own data store. They appear to have under development an interesting spreadsheet-inspired visual programming tool for mixing sensor data from multiple sources but this functionality is currently not publicly available nor is there any information on the security model or visualization support.

Pachube [14] stands out among all the services we looked at because it is the only one backed by venture capital funding. Its vision of facilitating different sensor and actuator devices to connect easily with each other is very similar to
ours. When we were first starting out, we evaluated Pachube in its early days and noticed several shortcomings that prompted us to pursue an independent effort. While some of those shortcomings have been addressed subsequently, such as support for "pushing" data into Pachube (the pull model doesn't work for sensor nodes behind firewalls), others still remain. Pachube's EEML [26], the data format used to insert and retrieve sensor data, mixes sensor data and metadata, causing metadata to be repeated unnecessarily with individual data samples. The security model doesn't appear to be well developed: all data is viewable by anyone, even without an account on Pachube, and the alert mechanism is prone to "spamming" as described in Section II-D.

IV. CONCLUSION AND FUTURE WORK

[...] visualization tools that support collaborative annotation and editing of data. We also plan to study other data storage and processing models (e.g., Hadoop [31]) that may offer a scalability advantage over relational databases by moving computation closer to data storage.

[...] the anonymous referees for their feedback on an earlier draft of this paper.

[1] Project Sun SPOT. http://www.sunspotworld.com/
[4] Johnson Controls. http://www.johnsoncontrols.com/
[5] Echelon Corp. http://www.echelon.com/
[6] Fitbit. http://www.fitbit.com/
[7] RunKeeper. http://www.runkeeper.com/
[8] USGS. South Bay Salt Pond Restoration Project. http://www.southbayrestoration.org/
[9] Conservation through Research Education and Action (CREA). http://www.crea-panama.org/
[10] G. Klein et al. High altitude weather balloon launch into near space. http://hibal.org/missions/apteryx/
[11] Sensorbase. http://www.sensorbase.org/
[24] S. Nath, J. Liu, and F. Zhao, "SensorMap for wide-area sensor webs," IEEE Computer Magazine, vol. 40, no. 7, pp. 90-93, July 2007.
[25] W3C. Simple Object Access Protocol. http://www.w3c.org/TR/soap/
[27] V. Gupta. SPOTWeb. http://blogs.sun.com/vipul/entry/sun_spots_spotweb_and_sensor
[29] R project for statistical computing. http://www.r-project.org/