What is OnionScan?
OnionScan is a free and open source tool
for investigating the Dark Web. For all the
amazing technological innovations in the
anonymity and privacy space, there is
always a constant threat that has no
effective technological patch - human
error.
Whether it is operational security leaks or
software misconfiguration - most often
the attacks on anonymity don't
come from breaking the underlying
systems, but from ourselves.
OnionScan has two primary goals:
    First, we want to help operators of hidden services find and fix
    operational security issues with their services. We want to help them
    detect misconfigurations, and we want to inspire a new generation of
    anonymity engineering projects to help make the world a more private
    place.
    Secondly, we want to help researchers and investigators monitor and
    track Dark Web sites. In fact, we want to make this as easy as possible.
    Not because we agree with the goals and motives of every investigation
    force out there - most often we don't. But by making these kinds of
    investigations easy, we hope to create a powerful incentive for new
    anonymity technology (see goal #1).
Installing
A Note on Dependencies
OnionScan requires either Go 1.6 or 1.7.
In order to install OnionScan you will need the following dependencies,
which are not provided by the core Go standard library:
    golang.org/x/net/proxy - For the Tor SOCKS Proxy connection.
    golang.org/x/crypto/openpgp - For PGP parsing.
    golang.org/x/net/html - For HTML parsing.
    github.com/rwcarlsen/goexif - For EXIF data extraction.
    github.com/HouzuoGuo/tiedot/db - For crawl database.
See the wiki for guidance.
Grab with go get
    go get github.com/s-rah/onionscan
Compile/Run from git cloned source
Once you have cloned the repository into somewhere that Go can find it,
you can run
    go install github.com/s-rah/onionscan
and then run the binary in $GOPATH/bin/onionscan.
Alternatively, you can just do
    go run github.com/s-rah/onionscan.go
to run without compiling.
Quick Start
For a simple report detailing the high, medium and low risk areas found
with a hidden service:
    onionscan notarealhiddenservice.onion
The most interesting output comes from the verbose option:
    onionscan --verbose notarealhiddenservice.onion
There is also a JSON output, if you want to integrate with another
program or application:
    onionscan --jsonReport notarealhiddenservice.onion
If you would like to use a proxy server listening on something other than
127.0.0.1:9050, then you can use the --torProxyAddress flag:
    onionscan --torProxyAddress=127.0.0.1:9150 notarealhiddenservice.onion
More detailed documentation on usage can be found in doc.
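As an illustration of the integration case, the JSON output can be consumed from a small wrapper program. The following is a minimal Go sketch, assuming the onionscan binary is on the PATH and that --jsonReport writes a single JSON object to standard output; the helper names buildArgs and runScan are hypothetical, and no particular report field is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// buildArgs assembles the command-line arguments shown in the Quick Start.
// proxyAddr may be empty, in which case no --torProxyAddress flag is passed.
func buildArgs(onion, proxyAddr string) []string {
	args := []string{"--jsonReport"}
	if proxyAddr != "" {
		args = append(args, "--torProxyAddress="+proxyAddr)
	}
	return append(args, onion)
}

// runScan runs the onionscan binary and decodes its standard output as
// generic JSON. Assumption: stdout holds exactly one JSON object.
func runScan(onion, proxyAddr string) (map[string]interface{}, error) {
	out, err := exec.Command("onionscan", buildArgs(onion, proxyAddr)...).Output()
	if err != nil {
		return nil, err
	}
	var report map[string]interface{}
	if err := json.Unmarshal(out, &report); err != nil {
		return nil, err
	}
	return report, nil
}

func main() {
	report, err := runScan("notarealhiddenservice.onion", "127.0.0.1:9150")
	if err != nil {
		fmt.Fprintln(os.Stderr, "scan failed:", err)
		return
	}
	// Print only the top-level keys; the report schema is not assumed here.
	for key := range report {
		fmt.Println(key)
	}
}
```

Decoding into a generic map keeps the sketch independent of the report layout, which may differ between OnionScan versions.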
What is scanned for?
A list of privacy and security problems
which are detected by OnionScan can be
found here.
You can also directly configure the types of scanning that OnionScan does
using the --scans parameter:
    ./bin/onionscan --scans web notarealhiddenservice.onion
Running the OnionScan Correlation Lab
If you are a researcher monitoring multiple
sites you will definitely want to use the
OnionScan Correlation Lab - a web
interface hosted by OnionScan that allows
you to discover, search and tag different
identity correlations.
What is scanned for?
Below is an incomplete list of the kinds of
scans and correlations that OnionScan
supports.
Web sites
When OnionScan detects a web server, it is scanned for the issues
described in this section.
Apache mod_status Leak
This should not be news: you should not have it enabled. If you do have
it enabled, attackers can:
    Build a better fingerprint of your server, including PHP and other software versions.
    Determine client IP addresses if you are co-hosting a clearnet site.
    Determine your IP address if your setup allows.
    Determine other sites you are co-hosting.
    Determine how active your site is.
    Find secret or hidden areas of your site,
    and much, much more.
Seriously, don't even run the tool, go to your site and check if you have
/server-status reachable. If you do, turn it off!
Open Directories
Basic web security 101: if you leave directories open then people are
going to scan them, and find interesting things - old versions of images,
temp files etc.
Many sites use common structures such as style/ and images/. The tool
checks for common variations, and allows the user to submit others for
testing.
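The idea of probing common directory names can be sketched as follows. This is an illustration only, not OnionScan's own list or logic: the names probeURLs and looksLikeListing are hypothetical, the entries "css" and "js" are illustrative additions to the two directories named above, and the "Index of /" title check is a heuristic for auto-generated index pages.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateDirs lists directory names to probe. "style" and "images" come
// from the text; "css" and "js" are illustrative additions.
var candidateDirs = []string{"style", "images", "css", "js"}

// probeURLs builds the URLs to request for a site, including any extra
// directory names supplied by the user.
func probeURLs(base string, extra ...string) []string {
	base = strings.TrimRight(base, "/")
	var urls []string
	for _, dir := range append(append([]string{}, candidateDirs...), extra...) {
		urls = append(urls, base+"/"+strings.Trim(dir, "/")+"/")
	}
	return urls
}

// looksLikeListing applies a simple heuristic: auto-generated index pages
// usually carry a title that starts with "Index of /".
func looksLikeListing(body string) bool {
	return strings.Contains(body, "<title>Index of /")
}

func main() {
	for _, u := range probeURLs("http://notarealhiddenservice.onion", "uploads") {
		fmt.Println(u)
	}
	fmt.Println(looksLikeListing("<html><head><title>Index of /images</title></head></html>"))
}
```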
EXIF Tags
Whether you create them yourself or allow
users to upload images, you need to
ensure the metadata associated with the
image is stripped.
Many, many websites still do not properly
sanitise image data, leaving themselves or
their users at risk of deanonymization.
Server Fingerprint
Sometimes, even without mod_status, we can determine if two sites are
hosted on the same infrastructure. We can use the following attributes to
make this determination:
    Server HTTP header
    Technology stack (e.g. PHP, jQuery version etc.)
    Website folder layout, e.g. do you use /style or /css, or do you use WordPress
    Fingerprints of images
Analytics IDs
Some onion services use 3rd party
analytics providers to track usage of their
site. These providers often require a
unique code to be embedded within the
site - this code can be used to determine if
two sites share a common operator or to
find clearnet sites using the same code.
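The correlation described above amounts to extracting the embedded code from two pages and intersecting the results. The following is a minimal Go sketch, assuming the classic Google Analytics property ID format (UA-NNNN-N) as one example of such a code; other providers use other formats, and the function names and sample pages are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// analyticsID matches classic Google Analytics property IDs such as
// UA-1234567-1. Other providers use other formats and are not covered here.
var analyticsID = regexp.MustCompile(`\bUA-\d{4,10}-\d{1,4}\b`)

// extractIDs returns the set of analytics IDs found in a page.
func extractIDs(html string) map[string]bool {
	ids := make(map[string]bool)
	for _, id := range analyticsID.FindAllString(html, -1) {
		ids[id] = true
	}
	return ids
}

// sharedIDs returns, in sorted order, the IDs that appear on both pages.
func sharedIDs(pageA, pageB string) []string {
	inA := extractIDs(pageA)
	var shared []string
	for id := range extractIDs(pageB) {
		if inA[id] {
			shared = append(shared, id)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	// Both pages are made-up examples that embed the same placeholder ID.
	onionPage := `<script>ga('create', 'UA-1234567-1', 'auto');</script>`
	clearnetPage := `<script>ga('create', 'UA-1234567-1', 'auto');</script>`
	fmt.Println(sharedIDs(onionPage, clearnetPage))
}
```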
PGP Identities
OnionScan extracts PGP identities from webpages in order to grab
identifiers like email addresses, identities and GPG versions.
SSH
OnionScan collects information about SSH endpoints including software
versions and the SSH public key fingerprint. These can be correlated
against other onion services or clearnet servers in order to try and
identify the actual server location.
FTP & SMTP
OnionScan collects information from other non-web servers, most notably
software banners. These banners are often misconfigured to reveal
information about the target server - including OS version, and sometimes
hostnames and IP addresses.
The software version itself can also be a correlation vector.
Cryptocurrency Clients
OnionScan scans for common cryptocurrency clients including Bitcoin and
Litecoin.
From these it extracts other connected onion services as well as the user
agent.
Protocol Detection
OnionScan also detects the presence of many other protocols including IRC,
XMPP, VNC & Ricochet.
Crawl configuration
Providing crawl configuration
The directory from which crawl configurations are fetched is specified
using the command-line option -crawlconfigdir <path>.
In this directory there should be one file per hidden service that needs
specific configuration options, for example ab23cd45ef67gh76.onion.json,
though the name of the file is not parsed so any naming convention can be
used.
Configuring the scan for a service does not automatically cause it to be
scanned. Services still need to be specified explicitly, either on the
command line or in a -list file.
Configuration structure
    {
        "onion": "aabbccddeeffgghh.onion",
        "base": "/forums",
        "exclude": [
            "/profile",
            "/settings"
        ],
        "relationships": [
            {
                "name": "User",
                "triggeridentifierregex": "index\\.php\\?action=profile;u=([0-9]*)",
                "extrarelationships": [
                    {
                        "name": "Name",
                        "regex": "<div class=\"username\"><h4>(.*) <span class=\"position\">"
                    },
                    {
                        "name": "Position",
                        "regex": "<span class=\"position\">(.*)</span></h4>"
                    }
                ]
            },
            {
                "name": "Post",
                "triggeridentifierregex": "index\\.php\\?topic=([0-9]*)",
                "extrarelationships": [
                    {
                        "name": "Topic",
                        "regex": "Topic: (.*) \\(Read"
                    }
                ]
            }
        ]
    }
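A configuration file of this shape can be validated before it is handed to a scan. The following is a minimal Go sketch, not OnionScan's own loader: the Go type and function names are hypothetical and only the JSON keys are taken from the structure above. It decodes the document, checks that every regular expression compiles, and checks that each trigger expression has exactly one capture group.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// The Go type names are illustrative; only the JSON keys come from the
// configuration structure shown above.
type ExtraRelationship struct {
	Name  string `json:"name"`
	Regex string `json:"regex"`
}

type Relationship struct {
	Name                   string              `json:"name"`
	TriggerIdentifierRegex string              `json:"triggeridentifierregex"`
	ExtraRelationships     []ExtraRelationship `json:"extrarelationships"`
}

type CrawlConfig struct {
	Onion         string         `json:"onion"`
	Base          string         `json:"base"`
	Exclude       []string       `json:"exclude"`
	Relationships []Relationship `json:"relationships"`
}

// parseConfig decodes one configuration document and checks that every
// regular expression compiles and that each trigger has one capture group.
func parseConfig(data []byte) (*CrawlConfig, error) {
	var cfg CrawlConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	for _, rel := range cfg.Relationships {
		trigger, err := regexp.Compile(rel.TriggerIdentifierRegex)
		if err != nil {
			return nil, fmt.Errorf("relationship %q: %v", rel.Name, err)
		}
		if trigger.NumSubexp() != 1 {
			return nil, fmt.Errorf("relationship %q: trigger needs exactly 1 group", rel.Name)
		}
		for _, extra := range rel.ExtraRelationships {
			if _, err := regexp.Compile(extra.Regex); err != nil {
				return nil, fmt.Errorf("relationship %q/%q: %v", rel.Name, extra.Name, err)
			}
		}
	}
	return &cfg, nil
}

// sample is a shortened copy of the structure above.
const sample = `{
  "onion": "aabbccddeeffgghh.onion",
  "base": "/forums",
  "exclude": ["/profile", "/settings"],
  "relationships": [
    {
      "name": "User",
      "triggeridentifierregex": "index\\.php\\?action=profile;u=([0-9]*)",
      "extrarelationships": [
        {"name": "Name", "regex": "<div class=\"username\"><h4>(.*) <span class=\"position\">"}
      ]
    }
  ]
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println(cfg.Onion, cfg.Base, len(cfg.Relationships))
}
```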
The following configuration parameters can currently be specified:
    onion: the hostname of the service to configure scanning for. This
        should be just the hostname, with no http:// prefix or path
        components.
    base: configures the base path, relative to the root of the site (to
        ignore all other parts of a site and focus on a specific set of
        URLs, e.g. /forums).
    exclude: tells the scanner to ignore URLs which contain one or more of
        the given strings - this allows explicitly ignoring uninteresting
        URLs (e.g. /profile or /settings) and avoiding URLs which might
        mess up the scan (e.g. /logout).
    relationships: configures OnionScan with custom relationships. Like
        many preconfigured relationships, these are specified by a trigger
        URL regular expression. The triggeridentifierregex must specify 1
        group that contains the identifier of the relationship. This will
        be stored in OnionScan as a relationship mapping Onion->Identifier
        and given the from attribute of the relationship name.
        For example, in the above structure, two relationships are
        defined: User and Post.
        For User the trigger regex is
        index\.php\?action=profile;u=([0-9]*)
        which, when matched, stores the user's profile ID as the
        identifier.
        User also specifies two sub-relationships, Name and Position, which
        are specified by different regular expressions that should be
        found on the same page as the one identified by the trigger URL.
        These will be stored by OnionScan as Onion->Name and
        Onion->Position with the from attribute set to the originally
        captured ID.
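The parameters described above can be walked through on a single made-up page. The following is a minimal Go sketch of one reading of those rules, not OnionScan's own code: the Record type, the function names and the sample URL and HTML are hypothetical, while the regular expressions are the ones from the User relationship in the configuration structure.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// shouldCrawl applies the base and exclude rules to a URL path.
func shouldCrawl(path, base string, exclude []string) bool {
	if !strings.HasPrefix(path, base) {
		return false
	}
	for _, fragment := range exclude {
		if strings.Contains(path, fragment) {
			return false
		}
	}
	return true
}

// Record is an illustrative shape for one stored relationship: Type names
// the relationship, Identifier holds the captured value, and From carries
// the relationship name (trigger match) or the captured ID (sub-relationship).
type Record struct {
	Onion, Type, Identifier, From string
}

var (
	userTrigger   = regexp.MustCompile(`index\.php\?action=profile;u=([0-9]*)`)
	nameRegex     = regexp.MustCompile(`<div class="username"><h4>(.*) <span class="position">`)
	positionRegex = regexp.MustCompile(`<span class="position">(.*)</span></h4>`)
)

// extractUser applies the User trigger to the URL and, when it matches,
// applies the two sub-relationship expressions to the page body.
func extractUser(onion, url, page string) []Record {
	m := userTrigger.FindStringSubmatch(url)
	if m == nil {
		return nil
	}
	id := m[1]
	records := []Record{{Onion: onion, Type: "User", Identifier: id, From: "User"}}
	if n := nameRegex.FindStringSubmatch(page); n != nil {
		records = append(records, Record{Onion: onion, Type: "Name", Identifier: n[1], From: id})
	}
	if p := positionRegex.FindStringSubmatch(page); p != nil {
		records = append(records, Record{Onion: onion, Type: "Position", Identifier: p[1], From: id})
	}
	return records
}

func main() {
	url := "/forums/index.php?action=profile;u=42"
	page := `<div class="username"><h4>alice <span class="position">Administrator</span></h4></div>`

	fmt.Println(shouldCrawl(url, "/forums", []string{"/profile", "/settings"}))
	for _, r := range extractUser("aabbccddeeffgghh.onion", url, page) {
		fmt.Printf("%s -> %s (type %s, from %s)\n", r.Onion, r.Identifier, r.Type, r.From)
	}
}
```

Note that the exclude entry /profile does not match the profile URL in this example, because the URL contains action=profile rather than the path fragment /profile.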