Information Privacy and Security
Context of this talk…
• Do we sacrifice privacy by using various
network services (Internet, online social
networks, mobile phones)?
• How does the structure/topology of a network
affect its privacy properties?
• Techniques for enhancing privacy?
• Privacy is hard!
Information security
• Information security typically focuses on
computer-based data systems. Within that
category, it concentrates on shared, networked
systems that provide access to large, sensitive
data collections, because those are generally
the most important information assets to
protect.
Goals and Measures: Three times
Three
Information security’s goals and the measures to achieve those goals are
often described in threes
Confidentiality, Integrity, and
Availability (CIA)
• In order to fulfill privacy requirements, security measures first
and foremost aim to assure confidentiality: that is, that
sensitive information is accessed only by appropriate persons
for appropriate reasons. It is also important to take steps to
assure the integrity (which refers to the accuracy of the data
for its intended use) and availability of data for its legitimate
users. Integrity is the security equivalent of
terms like data validity and reliability. It includes concerns
with verifying data’s sources and subsequent non-corruption
on the way to the user. Data integrity is lost if the
data are corrupted (so no longer valid or reliable for their
intended uses) or cannot be relied upon as accurately
sourced.
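The integrity idea above (verifying a source and detecting corruption on the way to the user) can be sketched with a keyed hash. This is a minimal illustration, not a complete protocol: the key, the record contents, and the helper names make_tag and verify are assumptions made up for this example.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Return a hex HMAC-SHA256 tag computed over data with the shared key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, tag: str) -> bool:
    """True only if data is unchanged and the tag was made with the same key."""
    return hmac.compare_digest(make_tag(key, data), tag)


key = b"demo-shared-secret"              # hypothetical key, for illustration only
record = b"patient=42;result=negative"   # made-up record
tag = make_tag(key, record)              # sent along with the record
```

A receiver holding the same key recomputes the tag. If the record was altered in transit, or was tagged by someone without the key, verify returns False.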
Physical, Technical, and Administrative
(PTA)
• Physical safeguards include everything from locks to proper
lighting. Simple material barriers are critical, and too often
forgotten, for protecting everything from USB flash drives to
corporate datacenters. They are the first “line of defense.”
• Technical safeguards include a broad spectrum of measures,
such as device data encryption (which is unlocked with strong
passwords / biometrics), anti-malware software, and
encrypted communications.
• Administrative safeguards can be summarized as “the rules,”
such as policies about who is granted access to what types of
data. They also include rules like “use a strong password” and
“do not share it.”
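An administrative rule about who is granted access to what types of data is often enforced by a technical check. The sketch below assumes a made-up policy table; the role names, data categories, and the function may_access are hypothetical.

```python
# Hypothetical policy: each role maps to the data categories it may read.
POLICY = {
    "clinician": {"health_records", "contact_info"},
    "billing": {"contact_info", "payment_info"},
    "visitor": set(),
}


def may_access(role: str, category: str) -> bool:
    """Deny by default: unknown roles and unlisted categories are refused."""
    return category in POLICY.get(role, set())
```

Denying by default means a role that is missing from the table gets no access at all, rather than accidental access.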
Prevention, Detection, and Response
(PDR)
• It is generally best to prevent bad things from occurring,
including information security violations. Exposure or loss of
data can be catastrophic for both the organization and the
individuals who work for it. Moreover, such exposures are
potentially even more damaging for the persons whose data
are exposed.
• A security system must also have measures to detect what
was not prevented.
• It should also identify potential and actual security issues, and
respond with appropriate speed to limit the damage.
• The elements of each three are complementary, yet involve
inherent trade-offs. For example, with PDR, unless preventive
measures can be perfect (which they almost never can be), some
attention to detection and response is essential.
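As a small illustration of detection, the sketch below flags accounts with repeated failed logins so that someone can respond. The event format (user, success flag), the threshold of 3, and the function name flag_suspicious are assumptions for this example, not part of any particular product.

```python
from collections import Counter


def flag_suspicious(events, threshold=3):
    """Return, sorted, the users with at least `threshold` failed logins."""
    failures = Counter(user for user, ok in events if not ok)
    return sorted(user for user, count in failures.items() if count >= threshold)


# Made-up login log: (user, login_succeeded)
events = [
    ("alice", True),
    ("bob", False),
    ("bob", False),
    ("carol", False),
    ("bob", False),
]
suspicious = flag_suspicious(events)
```

Here only bob reaches the threshold, so only bob is flagged for follow-up.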
Rules for Physical Security
• Door locks, alarms, and other physical security devices are used to keep areas
secure when not open for business.
• Unattended areas are kept secure with door locks and other devices whenever
possible, even during business hours.
• Access to sensitive equipment and data is controlled -- that includes access to
printers, fax machines, computers, and paper files.
• Visitors are appropriately monitored and escorted (as necessary). Persons in
restricted areas are politely challenged for ID.
• Keys, ID badges, and anything else that controls physical access are kept secure.
• Theft or loss of such items is reported to appropriate organizational authorities
immediately.
What do we mean by privacy?
• Louis Brandeis (1890)
– “right to be let alone”
– protection from institutional threat:
government, press
• Alan Westin (1967)
– “right to control, edit, manage, and
delete information about themselves
and decide when, how, and to what
extent information is communicated
to others”
Privacy vs. security
Privacy: what information goes where?
Security: protection against unauthorized
access
• Security helps enforce privacy policies
• Can be at odds with each other
– e.g., invasive screening to make us
more “secure” against terrorism
Privacy-sensitive information
• Identity
– name, address, NIN
• Location
• Activity
– web history, contact history, online purchases
• Health records
• …and more
Tracking on the web
• IP address
– Number identifying your computer on the Internet
– Visible to site you are visiting
– Not always permanent
72.21.214.128
• Cookies
– Text stored on your computer by site
– Sent back to site by your browser
– Used to save prefs, shopping cart, etc.
– Can track you even if IP changes
152.3.136.66
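Why a cookie can track you even if your IP changes can be seen in a toy model of a site. This is a sketch only: the class TrackingSite, its method names, and the example addresses (taken from ranges reserved for documentation) are assumptions, and real cookies travel in HTTP headers rather than function arguments.

```python
import uuid


class TrackingSite:
    """Toy site that recognizes a browser by its cookie, not its IP address."""

    def __init__(self):
        self.visits = {}  # cookie id -> list of IP addresses seen with it

    def handle_request(self, ip, cookie=None):
        # First visit: no cookie yet, so the site issues a fresh random one.
        if cookie is None:
            cookie = uuid.uuid4().hex
        self.visits.setdefault(cookie, []).append(ip)
        return cookie  # the browser stores this and sends it back next time


site = TrackingSite()
cookie = site.handle_request("192.0.2.10")          # first visit
site.handle_request("198.51.100.23", cookie=cookie)  # same browser, new IP
```

Both requests end up under the same cookie id, so the site links them to one browser even though the addresses differ.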
OSNs: State-of-the-Art
• Fun
• Popular
• Platform
“Facebook Wants You To Be Less Private”
Online social networks
• Pros
– Simplifies data analysis
– High availability
• Cons
– Single point of attack
– No longer control access
to own data
Centralized structure
Personal
data
Alternatives?
• Anonymization
– Do not use real names
• Encryption
• Decentralization
– Tighter control over data
Anonymization
• Hide identity, remove
identifying info
• Proxy server: connect through
a third party to hide IP
• Health data released for
research purposes: remove
name, address, etc
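Removing identifying fields before release can be sketched as below. The field names, the sample record, and the function strip_identifiers are made up for illustration; this is not a complete de-identification procedure.

```python
# Assumed names of the fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address"}


def strip_identifiers(record):
    """Return a copy of record without the direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


patient = {  # made-up record
    "name": "Jane Doe",
    "address": "1 Main St",
    "birth_year": 1980,
    "diagnosis": "asthma",
}
released = strip_identifiers(patient)
```

The released record keeps birth_year and diagnosis. Fields like these are not identifiers on their own, but they can still help narrow down who a record belongs to when combined with other data.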
Threat: deanonymization
• Netflix Prize dataset, released 2006
• 100,000,000 (private) ratings from 500,000 users
• Competition to improve recommendations
– i.e., if user X likes movies A, B, C, will X also like D?
• Anonymized: user name replaced by a number
Threat: deanonymization
• Problem: can combine “private” ratings from Netflix
with public reviews from IMDb to identify users in
the dataset
• May expose embarrassing info about members…
Threat: deanonymization
Netflix (anonymized)
User      Movie             Rating
1234      Rocky II          3/5
1234      The Wizard        4/5
1234      The Dark Knight   5/5
…
1234      Girls Gone Wild   5/5
IMDb (public)
User      Movie             Rating
dukefan   The Wizard        8/10
dukefan   The Dark Knight   10/10
dukefan   Rocky II          6/10
…
User 1234 is dukefan!
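The matching in this example can be sketched as a nearest-match search over rating patterns. This is a toy illustration, not the method used against the real dataset: the second anonymous user "5678" is made up, and the function names distance and best_match are assumptions.

```python
# Anonymized ratings, out of 5. User "5678" is a made-up decoy.
anonymous = {
    "1234": {"Rocky II": 3, "The Wizard": 4, "The Dark Knight": 5},
    "5678": {"Rocky II": 5, "The Wizard": 1, "The Dark Knight": 2},
}
# Public reviews, out of 10.
public = {
    "dukefan": {"The Wizard": 8, "The Dark Knight": 10, "Rocky II": 6},
}


def distance(private, reviews):
    """Mean absolute difference of ratings, rescaled to 0..1, on shared movies."""
    shared = set(private) & set(reviews)
    if not shared:
        return float("inf")
    return sum(abs(private[m] / 5 - reviews[m] / 10) for m in shared) / len(shared)


def best_match(reviews, candidates):
    """Return the anonymous id whose ratings are closest to the public reviews."""
    return min(candidates, key=lambda uid: distance(candidates[uid], reviews))


match = best_match(public["dukefan"], anonymous)
```

Once the public reviewer is linked to an anonymous id, every other rating under that id is exposed as well.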
Threat: deanonymization
• Lesson: cannot always anonymize data simply by
removing identifiers
• Vulnerable to aggregating data from multiple
sources/networks
• Humans are predictable
– E.g., try Rock-paper-scissors vs AI
P2P Architecture
Personal
data
Decentralization: pros and cons
• True ownership of data
• Maintenance burden
• Cost
• Business model
• User experience
Location privacy
• Mobile phones:
– Always in your pocket
– Always connected
– Always know where they are: GPS
• Location-based services
• Location-based ads
• What are we giving up?
Mobile phones
Why, when and what to disclose?
• It is not a simple question!
• Trade-off between functionality and privacy
• Also important: whom to disclose it to?
– Relatives
– Co-workers
– Friends
• There have been studies about this
– Not easy to classify
– People want to disclose only what is useful
How is your data used by apps?
• Many “free” apps supported by ads
• Analytics: profiling users
• Research has found that popular free apps
commonly send location + device ID to advertising
and analytics servers
• What can we do?
– More visibility into what an app does with data once
it reads it
AppScope
• Monitors app behavior to determine when
privacy-sensitive information leaves the phone
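One way to picture this kind of monitoring is to label sensitive values at their source and check for the label wherever data leaves the device. The sketch below is a toy illustration of that idea only and is not AppScope's actual implementation; the class Tainted, the function send_to_network, the host name, and the coordinates are all made up.

```python
class Tainted(str):
    """A string that carries a label naming the sensitive source it came from."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj


leaks = []  # (destination host, source label) pairs observed at the sink


def send_to_network(host, payload):
    """Hypothetical network sink: records any labeled value it is asked to send."""
    for item in payload:
        if isinstance(item, Tainted):
            leaks.append((host, item.source))


location = Tainted("35.0,-79.0", source="gps")  # made-up coordinates
send_to_network("ads.example.com", [location, "banner_request"])
```

In this toy version the label is lost as soon as the string is modified (for example by concatenation), so a real monitor would have to propagate labels through every operation on the data.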
Takeaways
• Decentralized network structure can enhance
privacy
• Difficult to achieve true anonymity
• Fine-grained control over data can help
– Tension with usability
Resources
• Duke “Office Hours” on Privacy in Social Media
– http://ondemand.duke.edu/video/23686/landon-cox-on-privacy-and-soci
• “Someone Is Watching Us” on WUNC
– http://wunc.org/tsot/archive/Someone_Is_Watching_Us.mp3/view
Acknowledgments
• Thanks to Peter Gilbert, who prepared a
significant amount of this material for us.