<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Jason Wilder&#39;s Blog</title>
        <generator uri="https://gohugo.io">Hugo</generator>
        <link>http://jasonwilder.com/</link>
        
        <language>en-us</language>  
        <updated>Mon, 13 Oct 2014 00:00:00 UTC</updated>
        
        <item>
            <title>A Simple Way To Dockerize Applications</title>
            <link>http://jasonwilder.com/blog/2014/10/13/a-simple-way-to-dockerize-applications/</link>
            <pubDate>Mon, 13 Oct 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/10/13/a-simple-way-to-dockerize-applications/</guid>
            <description>

&lt;p&gt;Dockerizing an application is the process of converting an application to run within a Docker
container.  While dockerizing most applications is straightforward, there are a few problems that
need to be worked around each time.&lt;/p&gt;

&lt;p&gt;Two common problems that occur during dockerization are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Making an application use environment variables when it relies on configuration files&lt;/li&gt;
&lt;li&gt;Sending application logs to STDOUT/STDERR when it defaults to files in the container&amp;rsquo;s file system&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post introduces a new tool, &lt;code&gt;dockerize&lt;/code&gt;, that simplifies these two common dockerization issues.&lt;/p&gt;

&lt;h2 id=&#34;the-problems:fd44463203ad86373665cf224be60a58&#34;&gt;The Problems&lt;/h2&gt;

&lt;h3 id=&#34;configuration:fd44463203ad86373665cf224be60a58&#34;&gt;Configuration&lt;/h3&gt;

&lt;p&gt;Many applications use configuration files to control how they work.  Different runtime environments
have different values for various sections of a file.  For example, database connection details
for a development environment would differ from those for a production environment.  Similarly, API
keys and other sensitive details would be different across environments.&lt;/p&gt;

&lt;p&gt;There are a few ways to handle these environmental differences with docker containers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embed all environment details in the image and use a control environment variable to indicate
which file to use at run time. (e.g. &lt;code&gt;APP_CONFIG=/etc/dev.config&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Use volumes to bind mount the configuration data at run time&lt;/li&gt;
&lt;li&gt;Use wrapper scripts that modify configuration data with tools like &lt;code&gt;sed&lt;/code&gt;, using values from environment variables&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Embedding all environment details is not ideal because environmental changes should not require a
rebuild of an image.  It&amp;rsquo;s also less secure since sensitive data such as API keys, login credentials,
etc. for all environments are stored in the image.  Compromising a development environment could
leak production details.  Having these kinds of details in any image should really be avoided.&lt;/p&gt;

&lt;p&gt;Using volumes keeps these details out of the image but makes deployment more complicated since
you can no longer just deploy the image.  You must also coordinate configuration file changes
along with the image.&lt;/p&gt;

&lt;p&gt;Injecting environment variables into custom files is not always trivial either.  You can sometimes
craft a &lt;code&gt;sed&lt;/code&gt; command or write some custom scripts to do it, but it&amp;rsquo;s repetitive work.  This does
produce an image that works well in a docker ecosystem though.&lt;/p&gt;
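
&lt;p&gt;As a rough sketch of the wrapper approach, the substitution can even be inlined into the &lt;code&gt;CMD&lt;/code&gt;.  The
&lt;code&gt;DB_HOST&lt;/code&gt; variable, the &lt;code&gt;@DB_HOST@&lt;/code&gt; placeholder, the file paths and the &lt;code&gt;app&lt;/code&gt; binary are all
hypothetical names used only for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical config template containing a line such as: host = @DB_HOST@
ADD app.config.tmpl /etc/app/app.config.tmpl

# Fill in the placeholder from the environment at startup, then start the app.
# This breaks if the value contains a | or &amp;amp; character, one reason this gets tedious.
CMD sed -e &amp;quot;s|@DB_HOST@|${DB_HOST}|g&amp;quot; /etc/app/app.config.tmpl &amp;gt; /etc/app/app.config \
    &amp;amp;&amp;amp; exec /usr/local/bin/app
&lt;/code&gt;&lt;/pre&gt;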

&lt;h3 id=&#34;logging:fd44463203ad86373665cf224be60a58&#34;&gt;Logging&lt;/h3&gt;

&lt;p&gt;Docker containers that log to STDOUT and STDERR are easier to troubleshoot, monitor and integrate
into a &lt;a href=&#34;http://jasonwilder.com/blog/2012/01/03/centralized-logging/&#34;&gt;centralized logging system&lt;/a&gt;.
Logs can be accessed directly with the &lt;code&gt;docker logs&lt;/code&gt; command
and through the docker logs API calls.  There are also many tools that can automatically pull docker logs and
ship them off if they log to STDOUT and STDERR.&lt;/p&gt;

&lt;p&gt;Unfortunately, many applications log to one or more files on the file system by default.  While
this can usually be &lt;a href=&#34;http://jasonwilder.com/blog/2014/03/17/docker-log-management-using-fluentd/&#34;&gt;worked around&lt;/a&gt;, it&amp;rsquo;s tedious to figure out the nuances of each application&amp;rsquo;s
logging configuration.&lt;/p&gt;

&lt;h2 id=&#34;using-dockerize:fd44463203ad86373665cf224be60a58&#34;&gt;Using Dockerize&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;http://github.com/jwilder/dockerize&#34;&gt;dockerize&lt;/a&gt; is a small Golang application that simplifies
the dockerization process by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generating configuration files using templates and the container&amp;rsquo;s environment variables at startup&lt;/li&gt;
&lt;li&gt;Tailing arbitrary log files to STDOUT and STDERR&lt;/li&gt;
&lt;li&gt;Starting a process to run within the container&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&#34;an-example:fd44463203ad86373665cf224be60a58&#34;&gt;An Example&lt;/h3&gt;

&lt;p&gt;To demonstrate how it works, we&amp;rsquo;ll walk through dockerizing a generic nginx container with &lt;code&gt;dockerize&lt;/code&gt;.
We start with:&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;s&#34;&gt; ubuntu:14.04&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Install Nginx.&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;quot;deb http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot;&lt;/span&gt; &amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;quot;deb-src http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot;&lt;/span&gt; &amp;gt;&amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C300EE8C
&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; apt-get update
&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; apt-get install -y nginx

&lt;span class=&#34;k&#34;&gt;RUN&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;quot;daemon off;&amp;quot;&lt;/span&gt; &amp;gt;&amp;gt; /etc/nginx/nginx.conf

&lt;span class=&#34;k&#34;&gt;EXPOSE&lt;/span&gt;&lt;span class=&#34;s&#34;&gt; 80&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;CMD&lt;/span&gt;&lt;span class=&#34;s&#34;&gt; nginx&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;

&lt;p&gt;Next we&amp;rsquo;ll install &lt;code&gt;dockerize&lt;/code&gt; and run &lt;code&gt;nginx&lt;/code&gt; through it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FROM ubuntu:14.04

# Install Nginx.
RUN echo &amp;quot;deb http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot; &amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
RUN echo &amp;quot;deb-src http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot; &amp;gt;&amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C300EE8C
RUN apt-get update
RUN apt-get install -y wget nginx

RUN echo &amp;quot;daemon off;&amp;quot; &amp;gt;&amp;gt; /etc/nginx/nginx.conf

RUN wget https://github.com/jwilder/dockerize/releases/download/v0.0.1/dockerize-linux-amd64-v0.0.1.tar.gz
RUN tar -C /usr/local/bin -xvzf dockerize-linux-amd64-v0.0.1.tar.gz

EXPOSE 80

CMD dockerize nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Nginx logs to two different log files under &lt;code&gt;/var/log/nginx&lt;/code&gt; by default. It would be nice to have the nginx access
and error logs streamed to the console if you run this container interactively or if you run
&lt;code&gt;docker logs nginx&lt;/code&gt; so you can see what&amp;rsquo;s happening.&lt;/p&gt;

&lt;p&gt;We can fix that by passing &lt;code&gt;-stdout &amp;lt;file&amp;gt;&lt;/code&gt; and &lt;code&gt;-stderr &amp;lt;file&amp;gt;&lt;/code&gt; as command-line
options.  These can also be passed multiple times if there are several files to tail.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CMD dockerize -stdout /var/log/nginx/access.log -stderr /var/log/nginx/error.log nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now when you run the container, nginx logs are available via &lt;code&gt;docker logs nginx&lt;/code&gt;.&lt;/p&gt;
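
&lt;p&gt;For instance, assuming a second access log at the hypothetical path &lt;code&gt;/var/log/nginx/proxy_access.log&lt;/code&gt;,
repeating the option might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CMD dockerize -stdout /var/log/nginx/access.log -stdout /var/log/nginx/proxy_access.log -stderr /var/log/nginx/error.log nginx
&lt;/code&gt;&lt;/pre&gt;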

&lt;p&gt;To demonstrate the templating, we&amp;rsquo;ll make this into a more generic proxy server that can be
configured using environment variables.  We&amp;rsquo;ll define
the environment variable &lt;code&gt;PROXY_URL&lt;/code&gt; to be a URL of a site to proxy.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PROXY_URL=&amp;quot;http://jasonwilder.com&amp;quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When the container is started with this variable, &lt;code&gt;dockerize&lt;/code&gt; will use it to generate an nginx server
location path.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the template:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;

    root /usr/share/nginx/html;
    index index.html index.htm;

    # Make site accessible from http://localhost/
    server_name localhost;

    location / {
      access_log off;
      proxy_pass {{ .Env.PROXY_URL }};
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Then our final Dockerfile would look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FROM ubuntu:14.04

# Install Nginx.
RUN echo &amp;quot;deb http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot; &amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
RUN echo &amp;quot;deb-src http://ppa.launchpad.net/nginx/stable/ubuntu trusty main&amp;quot; &amp;gt;&amp;gt; /etc/apt/sources.list.d/nginx-stable-trusty.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C300EE8C
RUN apt-get update
RUN apt-get install -y wget nginx

RUN echo &amp;quot;daemon off;&amp;quot; &amp;gt;&amp;gt; /etc/nginx/nginx.conf

RUN wget https://github.com/jwilder/dockerize/releases/download/v0.0.1/dockerize-linux-amd64-v0.0.1.tar.gz
RUN tar -C /usr/local/bin -xvzf dockerize-linux-amd64-v0.0.1.tar.gz

ADD default.tmpl /etc/nginx/sites-available/default.tmpl

EXPOSE 80

CMD dockerize -template /etc/nginx/sites-available/default.tmpl:/etc/nginx/sites-available/default -stdout /var/log/nginx/access.log -stderr /var/log/nginx/error.log nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;-template &amp;lt;src&amp;gt;:&amp;lt;dest&amp;gt;&lt;/code&gt; option indicates that the template at &lt;code&gt;/etc/nginx/sites-available/default.tmpl&lt;/code&gt;
should be generated and written to &lt;code&gt;/etc/nginx/sites-available/default&lt;/code&gt;. Multiple templates can be
specified as well.&lt;/p&gt;

&lt;p&gt;Run this container with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -p 80:80 -e PROXY_URL=&amp;quot;http://jasonwilder.com&amp;quot; --name nginx -d nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can then access &lt;code&gt;http://localhost&lt;/code&gt; and it will proxy to this site.&lt;/p&gt;

&lt;p&gt;This is a simplistic example but it can easily be extended using the embedded &lt;code&gt;split&lt;/code&gt; function
and &lt;code&gt;range&lt;/code&gt; statement to handle multiple proxy values or other options.  There are also a
few other &lt;a href=&#34;https://github.com/jwilder/dockerize#using-templates&#34;&gt;template functions&lt;/a&gt; available.&lt;/p&gt;
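
&lt;p&gt;As a rough sketch of that idea, assuming a hypothetical comma-separated &lt;code&gt;PROXY_URLS&lt;/code&gt; variable and that
&lt;code&gt;split&lt;/code&gt; takes the string followed by the separator, the &lt;code&gt;location&lt;/code&gt; block in the template could become:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    # e.g. PROXY_URLS=&amp;quot;http://one.example.com,http://two.example.com&amp;quot; (illustrative values)
    {{ range $i, $url := split .Env.PROXY_URLS &amp;quot;,&amp;quot; }}
    location /site{{ $i }}/ {
      access_log off;
      proxy_pass {{ $url }};
    }
    {{ end }}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each URL would get its own path prefix (&lt;code&gt;/site0/&lt;/code&gt;, &lt;code&gt;/site1/&lt;/code&gt; and so on).&lt;/p&gt;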

&lt;h2 id=&#34;conclusion:fd44463203ad86373665cf224be60a58&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;While this example is somewhat simplistic, many applications need some shims to make them run well
within docker.  &lt;code&gt;dockerize&lt;/code&gt; is a generic utility to help with this process.&lt;/p&gt;

&lt;p&gt;You can find the code at &lt;a href=&#34;http://github.com/jwilder/dockerize&#34;&gt;jwilder/dockerize&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Projects</title>
            <link>http://jasonwilder.com/projects/</link>
            <pubDate>Fri, 10 Oct 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/projects/</guid>
            <description>&lt;p&gt;Some projects I develop and maintain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; - Generate configuration files, scripts and run commands when docker containers are started and stopped.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/nginx-proxy&#34;&gt;nginx-proxy&lt;/a&gt; - Automated virtual host proxying using nginx for Docker containers.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/docker-squash&#34;&gt;docker-squash&lt;/a&gt; - Squash docker images to make them smaller.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/docker-discover&#34;&gt;docker-discover&lt;/a&gt; &amp;amp; &lt;a href=&#34;http://github.com/jwilder/docker-register&#34;&gt;docker-register&lt;/a&gt; - Service registration and discovery containers for docker using Etcd and Haproxy.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/dockerize&#34;&gt;dockerize&lt;/a&gt; - A simple way to dockerize applications.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/gofana&#34;&gt;gofana&lt;/a&gt; - Self-contained grafana server supporting HTTPS, Auth, and dashboard storage.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/influxdb/influxdb&#34;&gt;influxdb&lt;/a&gt; - Distributed time-series database.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/litl/galaxy&#34;&gt;galaxy&lt;/a&gt; - Docker microservice PaaS.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/litl/shuttle&#34;&gt;shuttle&lt;/a&gt; - Dynamic HTTP(S)/TCP/UDP Proxy&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://github.com/jwilder/mongodb-tools&#34;&gt;mongodb-tools&lt;/a&gt; - Tools for analyzing mongo DB indexes and collections&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
        <item>
            <title>Squashing Docker Images</title>
            <link>http://jasonwilder.com/blog/2014/08/19/squashing-docker-images/</link>
            <pubDate>Tue, 19 Aug 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/08/19/squashing-docker-images/</guid>
            <description>

&lt;p&gt;A common problem when building docker images is that they can get big quickly.
A base image can be tens to hundreds of MB in size.  Installing a
few packages and running a build can easily create an image that is 1GB or larger.  If you
build an application in your container, build artifacts can stick around and end up getting deployed.&lt;/p&gt;

&lt;p&gt;Large images are problematic when you start publishing images to a registry.  More layers create
more requests and larger layers take longer to transfer.  Unfortunately, deleting things
in later layers does not actually remove them from the image due to the way AUFS layers work.&lt;/p&gt;
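
&lt;p&gt;A minimal illustration, using &lt;code&gt;build-essential&lt;/code&gt; purely as an example package: the second &lt;code&gt;RUN&lt;/code&gt;
below hides the files but the image still carries the layer that installed them.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FROM ubuntu:14.04

# This layer stores everything the package install writes to disk.
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y build-essential

# This layer only records the deletions; the layer above is still shipped.
RUN apt-get purge -y build-essential &amp;amp;&amp;amp; apt-get autoremove -y
&lt;/code&gt;&lt;/pre&gt;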

&lt;p&gt;There are a few options to address this problem but this post will show you
how you can squash your images to make them smaller without requiring big changes to your
development and deployment workflow.&lt;/p&gt;

&lt;h2 id=&#34;other-solutions:b12568c4a66b575663392dd8a145700e&#34;&gt;Other Solutions&lt;/h2&gt;

&lt;p&gt;Other people have written about this problem as well and have tried different solutions to the
problem.&lt;/p&gt;

&lt;h3 id=&#34;using-small-base-images:b12568c4a66b575663392dd8a145700e&#34;&gt;Using Small Base Images&lt;/h3&gt;

&lt;p&gt;A few strategies rely on starting with very small base images.  For example, you could use &lt;a href=&#34;http://blog.docker.com/2013/06/create-light-weight-docker-containers-buildroot/&#34;&gt;buildroot&lt;/a&gt;
and craft a barebones image.  There is also the very small &lt;a href=&#34;http://blog.xebia.com/2014/07/04/create-the-smallest-possible-docker-container/&#34;&gt;scratch base image&lt;/a&gt; that could be
a starting point.  Another strategy is to install a binary package
into your container.  If you&amp;rsquo;re using Go, building &lt;a href=&#34;https://medium.com/@kelseyhightower/optimizing-docker-images-for-static-binaries-b5696e26eb07&#34;&gt;static binaries&lt;/a&gt; might
be an option.&lt;/p&gt;

&lt;p&gt;These options are pretty sophisticated and might work for you but they may not fit your development
workflow easily.  While deploying static binaries would work well for Go projects, it can be complicated
if you have Python, Node or Ruby projects that may just wrap C libraries.&lt;/p&gt;

&lt;h3 id=&#34;publishing-tools:b12568c4a66b575663392dd8a145700e&#34;&gt;Publishing Tools&lt;/h3&gt;

&lt;p&gt;Another set of options out there are separate tools for modifying and creating new images from existing images.  There is a &lt;a href=&#34;https://gist.github.com/vieux/6156567&#34;&gt;python script&lt;/a&gt;, &lt;a href=&#34;https://github.com/dqminh/docker-flatten&#34;&gt;docker-flatten&lt;/a&gt;, &lt;a href=&#34;http://3ofcoins.net/2013/09/22/flat-docker-images/&#34;&gt;docker-compile.pl&lt;/a&gt; and &lt;a href=&#34;https://github.com/3ofcoins/docker-images/blob/master/script/docker-rebase.rb&#34;&gt;docker-rebase.rb&lt;/a&gt; as well as just running &lt;code&gt;docker export &amp;lt;id&amp;gt; | docker import -&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, I wasn&amp;rsquo;t able to get any of these tools to work and the &lt;code&gt;docker export&lt;/code&gt; trick loses Dockerfile attributes such as &lt;code&gt;EXPOSE&lt;/code&gt;, &lt;code&gt;VOLUME&lt;/code&gt;, etc. which causes other problems.&lt;/p&gt;

&lt;h3 id=&#34;official-support:b12568c4a66b575663392dd8a145700e&#34;&gt;Official Support&lt;/h3&gt;

&lt;p&gt;Fortunately, docker appears to be aware of the problem with large images so hopefully this problem won&amp;rsquo;t require custom tools to solve.  There is already a &lt;code&gt;docker squash&lt;/code&gt; pull request (&lt;a href=&#34;https://github.com/docker/docker/pull/4232&#34;&gt;4232&lt;/a&gt;), a Dockerfile syntax change proposal (&lt;a href=&#34;https://github.com/docker/docker/issues/7115&#34;&gt;7115&lt;/a&gt;), a squash build dependencies proposal (&lt;a href=&#34;https://github.com/docker/docker/issues/6906&#34;&gt;6906&lt;/a&gt;) as well as a flatten images proposal (&lt;a href=&#34;https://github.com/docker/docker/issues/332&#34;&gt;332&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Hopefully one or more of these will address the problem in the future.&lt;/p&gt;

&lt;h2 id=&#34;docker-squash:b12568c4a66b575663392dd8a145700e&#34;&gt;docker-squash&lt;/h2&gt;

&lt;p&gt;Since this is a problem that affects me currently, I created a tool to squash images before pushing them
to a registry.  &lt;code&gt;docker-squash&lt;/code&gt; is a standalone Go application that works similarly to the idea described in &lt;a href=&#34;https://github.com/docker/docker/issues/332&#34;&gt;332&lt;/a&gt;.  It&amp;rsquo;s intended to be used as a publishing tool in your workflow
and would be run before pushing to a registry.&lt;/p&gt;

&lt;p&gt;The way it works is that you save, squash and load an image with something like &lt;code&gt;docker save &amp;lt;ID&amp;gt; | docker-squash -t &amp;lt;TAG&amp;gt; [-from &amp;lt;ID&amp;gt;] | docker load&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The resulting image has all of the layers beneath the initial &lt;code&gt;FROM&lt;/code&gt; layer squashed into a single layer.  The other layers defining &lt;code&gt;EXPOSE&lt;/code&gt;, etc. are retained as well.&lt;/p&gt;

&lt;p&gt;The default options retain the base image layer so that it does not need to be repeatedly transferred
when pushing and pulling updates to the image.&lt;/p&gt;

&lt;h2 id=&#34;example:b12568c4a66b575663392dd8a145700e&#34;&gt;Example&lt;/h2&gt;

&lt;p&gt;I have a simple Go test image called &lt;a href=&#34;https://github.com/jwilder/whoami&#34;&gt;jwilder/whoami&lt;/a&gt; that I&amp;rsquo;ll
use as an example.  When you run it, it listens on port 8000 and returns the hostname of the
container over HTTP.&lt;/p&gt;

&lt;h3 id=&#34;starting-image:b12568c4a66b575663392dd8a145700e&#34;&gt;Starting Image&lt;/h3&gt;

&lt;p&gt;Listing the image shows that it&amp;rsquo;s pretty big (&lt;strong&gt;423.7MB&lt;/strong&gt;) for just a simple 20-line Go app.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker images jwilder/whoami
REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
jwilder/whoami      latest              63e174c2ca3d        29 minutes ago       423.7 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Viewing the history shows how the size is broken down between the layers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker history jwilder/whoami:latest
IMAGE               CREATED             CREATED BY                                      SIZE
63e174c2ca3d        14 seconds ago      /bin/sh -c #(nop) CMD [/app/http]               0 B
e4eea4411c00        14 seconds ago      /bin/sh -c #(nop) EXPOSE map[8000/tcp:{}]       0 B
c50f2b65cab3        14 seconds ago      /bin/sh -c #(nop) ENV PORT=8000                 0 B
589338fba5eb        15 seconds ago      /bin/sh -c go build -o http                     7.031 MB
651626d6e364        15 seconds ago      /bin/sh -c #(nop) WORKDIR /app                  0 B
8dfc0bb00563        16 seconds ago      /bin/sh -c #(nop) ADD dir:78239d85b32dd28e4cb   21.8 kB
fc294d2b22cb        17 seconds ago      /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get install    191.3 MB
c4ff7513909d        3 days ago          /bin/sh -c #(nop) CMD [/bin/bash]               0 B
cc58e55aa5a5        3 days ago          /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get dist-upg   32.67 MB
0ea0d582fd90        3 days ago          /bin/sh -c sed -i &#39;s/^#\s*\(deb.*universe\)$/   1.895 kB
d92c3c92fa73        3 days ago          /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
9942dd43ff21        3 days ago          /bin/sh -c echo &#39;#!/bin/sh&#39; &amp;gt; /usr/sbin/polic   194.5 kB
1c9383292a8f        3 days ago          /bin/sh -c #(nop) ADD file:c1472c26527df28498   192.5 MB
511136ea3c5a        14 months ago
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Breaking this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;225.4MB - &lt;code&gt;511136ea3c5a&lt;/code&gt;..&lt;code&gt;c4ff7513909d&lt;/code&gt; is the &lt;code&gt;ubuntu:14.04&lt;/code&gt; base image.&lt;/li&gt;
&lt;li&gt;191.3MB - &lt;code&gt;fc294d2b22cb&lt;/code&gt; is installing the Go SDK (&lt;code&gt;golang-go&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;7MB - &lt;code&gt;589338fba5eb&lt;/code&gt; builds my app&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&#34;clean-up:b12568c4a66b575663392dd8a145700e&#34;&gt;Clean Up&lt;/h4&gt;

&lt;p&gt;I don&amp;rsquo;t want to ship the Go SDK with my image.  There is also a bunch of left-over &lt;code&gt;apt-get&lt;/code&gt; cache
data, as well as some extra packages I don&amp;rsquo;t need.  I&amp;rsquo;ll remove them in a new container.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -it jwilder/whoami:latest /bin/bash
root@5fe8a50718c3:/app# apt-get purge -y man  perl-modules vim-common vim-tiny \
&amp;gt; libpython3.4-stdlib:amd64 python3.4-minimal xkb-data \
&amp;gt; libx11-data eject python3 locales golang-go
...
root@5fe8a50718c3:/app# apt-get clean autoclean
root@5fe8a50718c3:/app# apt-get autoremove -y
root@5fe8a50718c3:/app# rm -rf /var/lib/{apt,dpkg,cache,log}/
root@5fe8a50718c3:/app# exit
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&#34;squash-the-image:b12568c4a66b575663392dd8a145700e&#34;&gt;Squash The Image&lt;/h4&gt;

&lt;p&gt;Next I need to create an image from that container:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker commit 5fe8a50718c3
49b5a7a88d5353fe77204ad5591a3ef100fc2807a9d6dce979fd1b17a73a68d6
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I&amp;rsquo;ll save, squash and load it.  I&amp;rsquo;m tagging the new image with &lt;code&gt;-t jwilder/whoami:squash&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker save 49b5a7a88d5 | sudo docker-squash -t jwilder/whoami:squash | docker load
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you run &lt;code&gt;docker-squash&lt;/code&gt; with the &lt;code&gt;-verbose&lt;/code&gt; option, you can see what it&amp;rsquo;s actually doing
to the image.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker save 49b5a7a88d5 | sudo docker-squash -t jwilder/whoami:squash -verbose | docker load
Loading export from STDIN using /tmp/docker-squash683466637 for tempdir
Loaded image w/ 15 layers
Extracting layers...
  -  /tmp/docker-squash683466637/49b5a7a88d5353fe77204ad5591a3ef100fc2807a9d6dce979fd1b17a73a68d6/layer.tar
  -  /tmp/docker-squash683466637/651626d6e364ccc22ac990ba95cd0aab9256c56055087cc9a5a1790cea5250b9/layer.tar
  -  /tmp/docker-squash683466637/c50f2b65cab3b74f9bdb6f616b36f132b9a182ed883d03f11173e32fa39ab599/layer.tar
  -  /tmp/docker-squash683466637/d92c3c92fa73ba974eb409217bb86d8317b0727f42b73ef5a05153b729aaf96b/layer.tar
  -  /tmp/docker-squash683466637/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/layer.tar
  -  /tmp/docker-squash683466637/1c9383292a8ff4c4196ff4ffa36e5ff24cb217606a8d1f471f4ad27c4690e290/layer.tar
  -  /tmp/docker-squash683466637/589338fba5eb5cc32a25a036975a5e0938f12eff0dc70b661363c13ef1a192a5/layer.tar
  -  /tmp/docker-squash683466637/63e174c2ca3d53e2b7639a440940e16e15c1970e6ad16f740ffdcc60e59e0324/layer.tar
  -  /tmp/docker-squash683466637/9942dd43ff211ba917d03637006a83934e847c003bef900e4808be8021dca7bd/layer.tar
  -  /tmp/docker-squash683466637/0ea0d582fd9027540c1f50c7f0149b237ed483d2b95ac8d107f9db5a912b4240/layer.tar
  -  /tmp/docker-squash683466637/8dfc0bb00563dab615dfcc28ab3e338089f5b1d71d82d731c18cbe9f7667435f/layer.tar
  -  /tmp/docker-squash683466637/c4ff7513909dedf4ddf3a450aea68cd817c42e698ebccf54755973576525c416/layer.tar
  -  /tmp/docker-squash683466637/cc58e55aa5a53b572f3b9009eb07e50989553b95a1545a27dcec830939892dba/layer.tar
  -  /tmp/docker-squash683466637/e4eea4411c0065f8b0c7cf6be31dd58daa5ac04d8c64d54537cbfce2eb8e3413/layer.tar
  -  /tmp/docker-squash683466637/fc294d2b22cb53cb2440ff6fece18813ee7363f5198f5e20346abfcf7cce54fe/layer.tar
Inserted new layer 27935276f797 after 1c9383292a8f
  -  511136ea3c5a
  -  1c9383292a8f /bin/sh -c #(nop) ADD file:c1472c26527df28498744f9e9e8a8304c
  -&amp;gt; 27935276f797 /bin/sh -c #(squash) from 1c9383292a8f
  -  9942dd43ff21 /bin/sh -c echo &#39;#!/bin/sh&#39; &amp;gt; /usr/sbin/policy-rc.d  &amp;amp;&amp;amp; echo
  -  d92c3c92fa73 /bin/sh -c rm -rf /var/lib/apt/lists/*
  -  0ea0d582fd90 /bin/sh -c sed -i &#39;s/^#\s*\(deb.*universe\)$/\1/g&#39; /etc/apt/
  -  cc58e55aa5a5 /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get dist-upgrade -y &amp;amp;&amp;amp; rm -
  -  c4ff7513909d /bin/sh -c #(nop) CMD [/bin/bash]
  -  fc294d2b22cb /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get install -y golang-go
  -  8dfc0bb00563 /bin/sh -c #(nop) ADD dir:78239d85b32dd28e4cb1d81ace7ffd32b8
  -  651626d6e364 /bin/sh -c #(nop) WORKDIR /app
  -  589338fba5eb /bin/sh -c go build -o http
  -  c50f2b65cab3 /bin/sh -c #(nop) ENV PORT=8000
  -  e4eea4411c00 /bin/sh -c #(nop) EXPOSE map[8000/tcp:{}]
  -  63e174c2ca3d /bin/sh -c #(nop) CMD [/app/http]
  -  49b5a7a88d53 /bin/bash
Squashing from 27935276f797 into 27935276f797
  -  Deleting whiteouts
  -  Rewriting child history
  -  Removing 9942dd43ff21. Squashed. (/bin/sh -c echo &#39;#!/bin/sh&#39; &amp;gt; /usr/sbin/policy-...)
  -  Removing d92c3c92fa73. Squashed. (/bin/sh -c rm -rf /var/lib/apt/lists/*)
  -  Removing 0ea0d582fd90. Squashed. (/bin/sh -c sed -i &#39;s/^#\s*\(deb.*universe\)$/\1...)
  -  Removing cc58e55aa5a5. Squashed. (/bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get dist-upgra...)
  -  Replacing c4ff7513909d w/ new layer 72391e640b52 (/bin/sh -c #(nop) CMD [/bin/bash])
  -  Removing fc294d2b22cb. Squashed. (/bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get install -y...)
  -  Removing 8dfc0bb00563. Squashed. (/bin/sh -c #(nop) ADD dir:78239d85b32dd28e4cb1d...)
  -  Replacing 651626d6e364 w/ new layer bd7b4b11874a (/bin/sh -c #(nop) WORKDIR /app)
  -  Removing 589338fba5eb. Squashed. (/bin/sh -c go build -o http)
  -  Replacing c50f2b65cab3 w/ new layer e4af8871b961 (/bin/sh -c #(nop) ENV PORT=8000)
  -  Replacing e4eea4411c00 w/ new layer 6803497b6a61 (/bin/sh -c #(nop) EXPOSE map[8000/tcp:{}])
  -  Replacing 63e174c2ca3d w/ new layer 40b8c7c33bba (/bin/sh -c #(nop) CMD [/app/http])
  -  Removing 49b5a7a88d53. Squashed. (/bin/bash)
Tarring up squashed layer 27935276f797
Removing extracted layers
Tagging 40b8c7c33bba as jwilder/whoami:squash
Tarring new image to STDOUT
Done. New image created.
  -  40b8c7c33bba Less than a second /bin/sh -c #(nop) CMD [/app/http] 3.072 kB
  -  6803497b6a61 Less than a second /bin/sh -c #(nop) EXPOSE map[8000/tcp:{}] 3.072 kB
  -  e4af8871b961 Less than a second /bin/sh -c #(nop) ENV PORT=8000 3.072 kB
  -  bd7b4b11874a Less than a second /bin/sh -c #(nop) WORKDIR /app 3.072 kB
  -  72391e640b52 Less than a second /bin/sh -c #(nop) CMD [/bin/bash] 3.072 kB
  -  27935276f797 1 seconds /bin/sh -c #(squash) from 1c9383292a8f 39.49 MB
  -  1c9383292a8f 3 days /bin/sh -c #(nop) ADD file:c1472c26527df28498744f9e9e8a83... 201.6 MB
  -  511136ea3c5a 14 months  1.536 kB
Removing tempdir /tmp/docker-squash683466637
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;My squashed layer is down from &lt;strong&gt;~198MB&lt;/strong&gt; to &lt;strong&gt;39.5MB&lt;/strong&gt;. Roughly &lt;strong&gt;80%&lt;/strong&gt; smaller.  I should be able to get
it down to &lt;strong&gt;~7MB&lt;/strong&gt; if I squash some of the apt-get updates my build pulled in with the upstream
ubuntu:14.04 base image and use a custom base image.&lt;/p&gt;

&lt;p&gt;If I were to create a custom base image, I would squash that entire image down to a single layer using
&lt;code&gt;-from root&lt;/code&gt; and update my Dockerfile to use it as the &lt;code&gt;FROM&lt;/code&gt; image.&lt;/p&gt;

&lt;p&gt;This is what &lt;code&gt;-from root&lt;/code&gt; looks like with my example images:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker save 49b5a7a88d5 | sudo docker-squash -t jwilder/whoami:squash -verbose -from root | docker load
Loading export from STDIN using /tmp/docker-squash627981871 for tempdir
Loaded image w/ 15 layers
Extracting layers...
  -  /tmp/docker-squash627981871/d92c3c92fa73ba974eb409217bb86d8317b0727f42b73ef5a05153b729aaf96b/layer.tar
  -  /tmp/docker-squash627981871/cc58e55aa5a53b572f3b9009eb07e50989553b95a1545a27dcec830939892dba/layer.tar
  -  /tmp/docker-squash627981871/1c9383292a8ff4c4196ff4ffa36e5ff24cb217606a8d1f471f4ad27c4690e290/layer.tar
  -  /tmp/docker-squash627981871/63e174c2ca3d53e2b7639a440940e16e15c1970e6ad16f740ffdcc60e59e0324/layer.tar
  -  /tmp/docker-squash627981871/8dfc0bb00563dab615dfcc28ab3e338089f5b1d71d82d731c18cbe9f7667435f/layer.tar
  -  /tmp/docker-squash627981871/c4ff7513909dedf4ddf3a450aea68cd817c42e698ebccf54755973576525c416/layer.tar
  -  /tmp/docker-squash627981871/0ea0d582fd9027540c1f50c7f0149b237ed483d2b95ac8d107f9db5a912b4240/layer.tar
  -  /tmp/docker-squash627981871/9942dd43ff211ba917d03637006a83934e847c003bef900e4808be8021dca7bd/layer.tar
  -  /tmp/docker-squash627981871/c50f2b65cab3b74f9bdb6f616b36f132b9a182ed883d03f11173e32fa39ab599/layer.tar
  -  /tmp/docker-squash627981871/49b5a7a88d5353fe77204ad5591a3ef100fc2807a9d6dce979fd1b17a73a68d6/layer.tar
  -  /tmp/docker-squash627981871/589338fba5eb5cc32a25a036975a5e0938f12eff0dc70b661363c13ef1a192a5/layer.tar
  -  /tmp/docker-squash627981871/651626d6e364ccc22ac990ba95cd0aab9256c56055087cc9a5a1790cea5250b9/layer.tar
  -  /tmp/docker-squash627981871/e4eea4411c0065f8b0c7cf6be31dd58daa5ac04d8c64d54537cbfce2eb8e3413/layer.tar
  -  /tmp/docker-squash627981871/fc294d2b22cb53cb2440ff6fece18813ee7363f5198f5e20346abfcf7cce54fe/layer.tar
  -  /tmp/docker-squash627981871/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/layer.tar
Inserted new layer 6996f41c1688 after 511136ea3c5a
  -  511136ea3c5a
  -&amp;gt; 6996f41c1688 /bin/sh -c #(squash) from 511136ea3c5a
  -  1c9383292a8f /bin/sh -c #(nop) ADD file:c1472c26527df28498744f9e9e8a8304c
  -  9942dd43ff21 /bin/sh -c echo &#39;#!/bin/sh&#39; &amp;gt; /usr/sbin/policy-rc.d  &amp;amp;&amp;amp; echo
  -  d92c3c92fa73 /bin/sh -c rm -rf /var/lib/apt/lists/*
  -  0ea0d582fd90 /bin/sh -c sed -i &#39;s/^#\s*\(deb.*universe\)$/\1/g&#39; /etc/apt/
  -  cc58e55aa5a5 /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get dist-upgrade -y &amp;amp;&amp;amp; rm -
  -  c4ff7513909d /bin/sh -c #(nop) CMD [/bin/bash]
  -  fc294d2b22cb /bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get install -y golang-go
  -  8dfc0bb00563 /bin/sh -c #(nop) ADD dir:78239d85b32dd28e4cb1d81ace7ffd32b8
  -  651626d6e364 /bin/sh -c #(nop) WORKDIR /app
  -  589338fba5eb /bin/sh -c go build -o http
  -  c50f2b65cab3 /bin/sh -c #(nop) ENV PORT=8000
  -  e4eea4411c00 /bin/sh -c #(nop) EXPOSE map[8000/tcp:{}]
  -  63e174c2ca3d /bin/sh -c #(nop) CMD [/app/http]
  -  49b5a7a88d53 /bin/bash
Squashing from 6996f41c1688 into 6996f41c1688
  -  Deleting whiteouts
  -  Rewriting child history
  -  Removing 1c9383292a8f. Squashed. (/bin/sh -c #(nop) ADD file:c1472c26527df2849874...)
  -  Removing 9942dd43ff21. Squashed. (/bin/sh -c echo &#39;#!/bin/sh&#39; &amp;gt; /usr/sbin/policy-...)
  -  Removing d92c3c92fa73. Squashed. (/bin/sh -c rm -rf /var/lib/apt/lists/*)
  -  Removing 0ea0d582fd90. Squashed. (/bin/sh -c sed -i &#39;s/^#\s*\(deb.*universe\)$/\1...)
  -  Removing cc58e55aa5a5. Squashed. (/bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get dist-upgra...)
  -  Replacing c4ff7513909d w/ new layer 09a62007c3f3 (/bin/sh -c #(nop) CMD [/bin/bash])
  -  Removing fc294d2b22cb. Squashed. (/bin/sh -c apt-get update &amp;amp;&amp;amp; apt-get install -y...)
  -  Removing 8dfc0bb00563. Squashed. (/bin/sh -c #(nop) ADD dir:78239d85b32dd28e4cb1d...)
  -  Replacing 651626d6e364 w/ new layer b4f0dec85412 (/bin/sh -c #(nop) WORKDIR /app)
  -  Removing 589338fba5eb. Squashed. (/bin/sh -c go build -o http)
  -  Replacing c50f2b65cab3 w/ new layer cd499c2d09ef (/bin/sh -c #(nop) ENV PORT=8000)
  -  Replacing e4eea4411c00 w/ new layer 653dfab45562 (/bin/sh -c #(nop) EXPOSE map[8000/tcp:{}])
  -  Replacing 63e174c2ca3d w/ new layer f7f7eb6aae54 (/bin/sh -c #(nop) CMD [/app/http])
  -  Removing 49b5a7a88d53. Squashed. (/bin/bash)
Tarring up squashed layer 6996f41c1688
Removing extracted layers
Tagging f7f7eb6aae54 as jwilder/whoami:squash
Tarring new image to STDOUT
Done. New image created.
  -  f7f7eb6aae54 Less than a second /bin/sh -c #(nop) CMD [/app/http] 3.072 kB
  -  653dfab45562 Less than a second /bin/sh -c #(nop) EXPOSE map[8000/tcp:{}] 3.072 kB
  -  cd499c2d09ef Less than a second /bin/sh -c #(nop) ENV PORT=8000 3.072 kB
  -  b4f0dec85412 Less than a second /bin/sh -c #(nop) WORKDIR /app 3.072 kB
  -  09a62007c3f3 Less than a second /bin/sh -c #(nop) CMD [/bin/bash] 3.072 kB
  -  6996f41c1688 2 seconds /bin/sh -c #(squash) from 511136ea3c5a 111.9 MB
  -  511136ea3c5a 14 months  1.536 kB
Removing tempdir /tmp/docker-squash627981871
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That gets my full build down to a single layer of &lt;strong&gt;106.2MB&lt;/strong&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
jwilder/whoami      squash              f7f7eb6aae54        About a minute ago   106.2 MB
jwilder/whoami      latest              63e174c2ca3d        29 minutes ago       423.7 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since I typically have base images already loaded onto my docker hosts, I would continue to use the
default squash settings and retain my parent base image so that I&amp;rsquo;m only transferring the changes
for each image.&lt;/p&gt;

&lt;h2 id=&#34;conclusion:b12568c4a66b575663392dd8a145700e&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Squashing images with &lt;code&gt;docker-squash&lt;/code&gt; can reduce image sizes significantly. Since it only needs to be run before
publishing to a registry, the regular docker build caching is not changed and you don&amp;rsquo;t lose any
of the benefits of using Docker.  Similarly, it does not require complex Dockerfile setups to get
good results so you can start using it on existing projects with little effort.&lt;/p&gt;

&lt;p&gt;If you want to try it out or learn more about how it works, you can get it from
&lt;a href=&#34;https://github.com/jwilder/docker-squash&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Docker Service Discovery Using Etcd and Haproxy</title>
            <link>http://jasonwilder.com/blog/2014/07/15/docker-service-discovery/</link>
            <pubDate>Tue, 15 Jul 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/07/15/docker-service-discovery/</guid>
            <description>

&lt;p&gt;In a previous post, I showed a way to create an &lt;a href=&#34;http://jasonwilder.com/blog/2014/03/25/automated-nginx-reverse-proxy-for-docker/&#34;&gt;automated nginx reverse proxy&lt;/a&gt; for docker containers running on the same host.  That setup works fine for front-end web apps, but is not ideal for backend services since they are typically spread across multiple hosts.&lt;/p&gt;

&lt;p&gt;This post describes a solution to the backend service problem using service discovery for docker containers.&lt;/p&gt;

&lt;p&gt;The architecture we&amp;rsquo;ll build is modelled after &lt;a href=&#34;http://nerds.airbnb.com/smartstack-service-discovery-cloud/&#34;&gt;SmartStack&lt;/a&gt;, but uses &lt;a href=&#34;http://coreos.com/using-coreos/etcd/&#34;&gt;etcd&lt;/a&gt; instead of &lt;a href=&#34;http://zookeeper.apache.org/&#34;&gt;ZooKeeper&lt;/a&gt; and two docker containers running &lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; and &lt;a href=&#34;http://www.haproxy.org/&#34;&gt;haproxy&lt;/a&gt; instead of &lt;a href=&#34;https://github.com/airbnb/nerve&#34;&gt;nerve&lt;/a&gt; and
&lt;a href=&#34;https://github.com/airbnb/synapse&#34;&gt;synapse&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&#34;how-it-works:5152b7114a679e7779eb00e9843508ec&#34;&gt;How It Works&lt;/h2&gt;

&lt;p&gt;&lt;img src=&#34;http://jasonwilder.com/images/docker-service-discovery.png&#34; alt=&#34;Docker Service Discovery&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Similar to SmartStack, we have components to serve as a registry (etcd), a registration side-kick process (docker-register), a discovery side-kick process (docker-discover), some backend services (whoami) and finally a consumer (ubuntu/curl).&lt;/p&gt;

&lt;p&gt;The registration and discovery components work as appliances alongside the application containers so there is no embedded registration or discovery code in the backend or consumer containers.  They just listen on ports or connect to other local ports.&lt;/p&gt;

&lt;h2 id=&#34;service-registry-etcd:5152b7114a679e7779eb00e9843508ec&#34;&gt;Service Registry - Etcd&lt;/h2&gt;

&lt;p&gt;Before anything can be registered, we need some place to track registration entries (i.e. IP and ports of services).  We&amp;rsquo;re using etcd because it has a simple programming model for service registration and supports TTLs for keys and directories.&lt;/p&gt;

&lt;p&gt;Usually, you would run 3 or 5 etcd nodes but I&amp;rsquo;m just using one to keep things simple.&lt;/p&gt;

&lt;p&gt;There is no reason why we could not use &lt;a href=&#34;http://consul.io&#34;&gt;Consul&lt;/a&gt; or any other storage option that supports TTL expiration.&lt;/p&gt;

&lt;p&gt;To start etcd:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d --name etcd -p 4001:4001 -p 7001:7001 coreos/etcd
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;service-registration-docker-register:5152b7114a679e7779eb00e9843508ec&#34;&gt;Service Registration - docker-register&lt;/h2&gt;

&lt;p&gt;Registering service containers is handled by the &lt;a href=&#34;https://registry.hub.docker.com/u/jwilder/docker-register/&#34;&gt;jwilder/docker-register&lt;/a&gt; container.  This container registers other containers running on the same host in etcd.
Containers we want registered must expose a port.  Containers running the same image on different hosts are grouped together in etcd and will form
a load-balanced cluster.  How containers are grouped is somewhat arbitrary and I&amp;rsquo;ve chosen the container image name for this walkthrough.  In
a real deployment, you would likely want to group things by environment, service version, or other meta-data.&lt;/p&gt;

&lt;p&gt;(&lt;em&gt;The current implementation supports only one port per container and assumes it is TCP. There is no reason why multiple ports and types could not be supported, as well as different grouping attributes.&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;docker-register uses &lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; along with a &lt;a href=&#34;https://github.com/jwilder/docker-register/blob/master/etcd.tmpl&#34;&gt;python script&lt;/a&gt; as a template.
It dynamically generates a script that, when run, will register each container&amp;rsquo;s IP and PORT under a &lt;code&gt;/backends&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;docker-gen takes care of monitoring docker events and calling the generated script on an interval to ensure TTLs are kept up to date.
If docker-register is stopped, the registrations expire.&lt;/p&gt;
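
&lt;p&gt;As a rough illustration of what one registration amounts to, here is a minimal Python sketch that builds a TTL-bound key write against etcd&amp;rsquo;s v2 keys API.  It is not the generated script itself: the function name, the 30 second TTL and the sample backend IP are assumptions made for the example, and the request is only built, not sent.&lt;/p&gt;

```python
from urllib.parse import urlencode


def registration_request(etcd_host, service, container_id, host_ip, host_port, ttl=30):
    """Build the URL and form body for an etcd v2 key write with a TTL.

    The key layout follows the /backends directory described above.
    The ttl default is an assumption for this sketch, not a value
    taken from docker-register.
    """
    url = "http://{}/v2/keys/backends/{}/{}".format(etcd_host, service, container_id)
    body = urlencode({"value": "{}:{}".format(host_ip, host_port), "ttl": ttl})
    return url, body


if __name__ == "__main__":
    # 10.0.0.5 is a hypothetical backend host IP.
    url, body = registration_request(
        "10.170.71.226:4001", "whoami", "736ab83847bb", "10.0.0.5", 49153
    )
    print("PUT", url)
    print(body)
```

&lt;p&gt;Because every write carries a TTL, the entry has to be refreshed on an interval; if the refresh stops, the key simply expires.&lt;/p&gt;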

&lt;p&gt;To start docker-register, we need to pass in the host&amp;rsquo;s external IP where other hosts can reach its containers as well as the address of the etcd host.  docker-gen requires access to the docker daemon in order to call its API so we bind mount the docker unix socket into the container as well.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ HOST_IP=$(hostname --all-ip-addresses | awk &#39;{print $1}&#39;)
$ ETCD_HOST=w.x.y.z:4001
$ docker run --name docker-register -d -e HOST_IP=$HOST_IP -e ETCD_HOST=$ETCD_HOST -v /var/run/docker.sock:/var/run/docker.sock -t jwilder/docker-register
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;service-discovery-docker-discover:5152b7114a679e7779eb00e9843508ec&#34;&gt;Service Discovery - docker-discover&lt;/h2&gt;

&lt;p&gt;Discovering services is handled by the &lt;a href=&#34;https://registry.hub.docker.com/u/jwilder/docker-discover/&#34;&gt;jwilder/docker-discover&lt;/a&gt; container.
docker-discover polls etcd periodically and generates an haproxy config with listeners for each type of registered service.&lt;/p&gt;

&lt;p&gt;For example, containers running &lt;a href=&#34;https://registry.hub.docker.com/u/jwilder/whoami/&#34;&gt;jwilder/whoami&lt;/a&gt; are registered under &lt;code&gt;/backends/whoami/&amp;lt;id&amp;gt;&lt;/code&gt; and are exposed on host port 8000.&lt;/p&gt;

&lt;p&gt;Other containers that need to call the &lt;a href=&#34;https://registry.hub.docker.com/u/jwilder/whoami/&#34;&gt;jwilder/whoami&lt;/a&gt; service can send requests to docker bridge IP:8000 or host IP:8000.&lt;/p&gt;
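
&lt;p&gt;To make the generated config more concrete, here is a minimal Python sketch that renders one haproxy &lt;code&gt;listen&lt;/code&gt; section per registered service.  It is not docker-discover&amp;rsquo;s actual template: the function name, the input shape and the backend IPs are assumptions for illustration, and only common haproxy directives are used.&lt;/p&gt;

```python
def render_haproxy_listeners(services):
    """Render one haproxy listen section per registered service.

    services maps a service name to a dict holding the host port to
    bind and a mapping of container ID to "ip:port" backend address.
    This input shape is an assumption made for the sketch.
    """
    lines = []
    for name in sorted(services):
        service = services[name]
        lines.append("listen {}".format(name))
        lines.append("    bind *:{}".format(service["port"]))
        lines.append("    mode http")
        lines.append("    balance roundrobin")
        for container_id in sorted(service["backends"]):
            address = service["backends"][container_id]
            # "check" turns on haproxy health checks for this backend.
            lines.append("    server {} {} check".format(container_id, address))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical registry contents; the IPs are placeholders.
    registry = {
        "whoami": {
            "port": 8000,
            "backends": {
                "736ab83847bb": "10.0.0.5:49153",
                "4eb0498e5207": "10.0.0.6:49154",
            },
        }
    }
    print(render_haproxy_listeners(registry))
```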

&lt;p&gt;If any of the backend services goes down, haproxy health checks remove it from the pool and will retry the request on a healthy host.
This ensures that backend services can be started and stopped as needed, and that inconsistencies in the registration information are handled with minimal client impact.&lt;/p&gt;

&lt;p&gt;Finally, stats can be viewed by accessing port 1936 on the docker-discover container.&lt;/p&gt;

&lt;p&gt;To run docker-discover:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ETCD_HOST=w.x.y.z:4001
$ docker run -d --net host --name docker-discover -e ETCD_HOST=$ETCD_HOST -p 127.0.0.1:1936:1936 -t jwilder/docker-discover
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&amp;rsquo;re using &lt;code&gt;--net host&lt;/code&gt; so that the container uses the host&amp;rsquo;s network stack.  When this container binds port 8000, it&amp;rsquo;s actually binding
on the host&amp;rsquo;s network.  This simplifies the proxy setup.&lt;/p&gt;

&lt;h2 id=&#34;aws-demo:5152b7114a679e7779eb00e9843508ec&#34;&gt;AWS Demo&lt;/h2&gt;

&lt;p&gt;We&amp;rsquo;ll run the full thing on four AWS hosts: an etcd host, a client host and two backend hosts.  The &lt;a href=&#34;https://registry.hub.docker.com/u/jwilder/whoami/&#34;&gt;backend service&lt;/a&gt; is a simple
Golang HTTP server that returns its hostname (container ID).&lt;/p&gt;

&lt;h3 id=&#34;etcd-host:5152b7114a679e7779eb00e9843508ec&#34;&gt;Etcd Host&lt;/h3&gt;

&lt;p&gt;First we start our etcd registry:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ hostname --all-ip-addresses | awk &#39;{print $1}&#39;
10.170.71.226

$ docker run -d --name etcd -p 4001:4001 -p 7001:7001 coreos/etcd
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our etcd address is &lt;code&gt;10.170.71.226&lt;/code&gt;.  We&amp;rsquo;ll use that on the other hosts.  If we were running this in a live environment, we could assign an EIP and
DNS address to it to make it easier to configure.&lt;/p&gt;

&lt;h3 id=&#34;backend-hosts:5152b7114a679e7779eb00e9843508ec&#34;&gt;Backend Hosts&lt;/h3&gt;

&lt;p&gt;Next, we start the services and docker-register on each host.  The service is configured to listen
on port 8000 in the container and we let docker publish it on a random host port.&lt;/p&gt;

&lt;p&gt;On backend host 1:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d -p :8000 --name whoami -t jwilder/whoami
736ab83847bb12dddd8b09969433f3a02d64d5b0be48f7a5c59a594e3a6a3541
$ docker run --name docker-register -d -e HOST_IP=$(hostname --all-ip-addresses | awk &#39;{print $1}&#39;) -e ETCD_HOST=10.170.71.226:4001 -v /var/run/docker.sock:/var/run/docker.sock -t jwilder/docker-register
77a49f732797333ca0c7695c6b590a64a7d75c14b5ffa0f89f8e0e21ae47ae3e

$ docker ps
CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS                     NAMES
736ab83847bb        jwilder/whoami:latest            /app/http              48 seconds ago      Up 47 seconds       0.0.0.0:49153-&amp;gt;8000/tcp   whoami
77a49f732797        jwilder/docker-register:latest   &amp;quot;/bin/sh -c &#39;docker-   28 minutes ago      Up 28 minutes                                 docker-register
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On backend host 2:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d -p :8000 --name whoami -t jwilder/whoami
4eb0498e52076275ee0702d80c0d8297813e89d492cdecbd6df9b263a3df1c28
$ docker run --name docker-register -d -e HOST_IP=$(hostname --all-ip-addresses | awk &#39;{print $1}&#39;) -e ETCD_HOST=10.170.71.226:4001 -v /var/run/docker.sock:/var/run/docker.sock -t jwilder/docker-register
832e77c83591cb33bba53859153eb91d897f5a278a74d4ec1f66bc9b97deb221

$ docker ps
CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS                     NAMES
4eb0498e5207        jwilder/whoami:latest            /app/http              59 seconds ago      Up 58 seconds       0.0.0.0:49154-&amp;gt;8000/tcp   whoami
832e77c83591        jwilder/docker-register:latest   &amp;quot;/bin/sh -c &#39;docker-   34 minutes ago      Up 34 minutes                                 docker-register
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;client-host:5152b7114a679e7779eb00e9843508ec&#34;&gt;Client Host&lt;/h3&gt;

&lt;p&gt;On the client host, we need to start docker-discover and a client container.  For the client container,
I&amp;rsquo;m using Ubuntu Trusty and will make some &lt;code&gt;curl&lt;/code&gt; requests.&lt;/p&gt;

&lt;p&gt;First start docker-discover:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d --net host --name docker-discover -e ETCD_HOST=10.170.71.226:4001 -p 127.0.0.1:1936:1936 -t jwilder/docker-discover
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, start a sample client container and pass in a HOST_IP.  We&amp;rsquo;re using the eth0 address but could also use docker0 IP.  We&amp;rsquo;re passing
this in as an environment variable since it is &lt;a href=&#34;http://12factor.net/config&#34;&gt;configuration that will vary between deploys&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -e HOST_IP=$(hostname --all-ip-addresses | awk &#39;{print $1}&#39;) -i -t ubuntu:14.04 /bin/bash
$ root@2af5f52de069:/# apt-get update &amp;amp;&amp;amp; apt-get -y install curl
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, make some requests to the whoami service port 8000 to see them load-balanced.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 4eb0498e5207
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 4eb0498e5207
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can start some more instances on the backends:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d -p :8000 --name whoami-2 -t jwilder/whoami
$ docker run -d -p :8000 --name whoami-3 -t jwilder/whoami

$ docker ps
CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS                     NAMES
5d5c12c96192        jwilder/whoami:latest            /app/http              3 seconds ago       Up 1 seconds        0.0.0.0:49156-&amp;gt;8000/tcp   whoami-2
bb2a408b8ec5        jwilder/whoami:latest            /app/http              21 seconds ago      Up 20 seconds       0.0.0.0:49155-&amp;gt;8000/tcp   whoami-3
4eb0498e5207        jwilder/whoami:latest            /app/http              2 minutes ago       Up 2 minutes        0.0.0.0:49154-&amp;gt;8000/tcp   whoami
832e77c83591        jwilder/docker-register:latest   &amp;quot;/bin/sh -c &#39;docker-   36 minutes ago      Up 36 minutes                                 docker-register
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And make some requests again on the client host:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 4eb0498e5207
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m bb2a408b8ec5
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 5d5c12c96192
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, we can shut down some containers and routes will be updated.  This kills everything on backend host 2.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker kill 5d5c12c96192 bb2a408b8ec5 4eb0498e5207

$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 67c3cccbb8ba
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 736ab83847bb
$ root@2af5f52de069:/# curl $HOST_IP:8000
I&#39;m 67c3cccbb8ba
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we want to see how haproxy is balancing traffic or monitor for errors, we can access the client&amp;rsquo;s
host port 1936 in a web browser.&lt;/p&gt;

&lt;h2 id=&#34;wrapping-up:5152b7114a679e7779eb00e9843508ec&#34;&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;While there are a lot of different ways to implement &lt;a href=&#34;http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/&#34;&gt;service discovery&lt;/a&gt;, SmartStack&amp;rsquo;s sidekick style
of registration and proxying keeps application code simple and easy to integrate in a distributed
environment and really fits well with Docker containers.&lt;/p&gt;

&lt;p&gt;Similarly, Docker&amp;rsquo;s event and container APIs facilitate service registration and discovery with registration services such as etcd.&lt;/p&gt;

&lt;p&gt;The code for &lt;a href=&#34;https://github.com/jwilder/docker-register&#34;&gt;docker-register&lt;/a&gt; and
&lt;a href=&#34;https://github.com/jwilder/docker-discover&#34;&gt;docker-discover&lt;/a&gt; is on GitHub.  While both are functional,
there is a lot that can be improved.  Please feel free to submit or suggest improvements.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Automated Nginx Reverse Proxy for Docker</title>
            <link>http://jasonwilder.com/blog/2014/03/25/automated-nginx-reverse-proxy-for-docker/</link>
            <pubDate>Tue, 25 Mar 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/03/25/automated-nginx-reverse-proxy-for-docker/</guid>
            <description>

&lt;p&gt;A reverse proxy server is a server that typically sits in front of other web servers in order to provide additional functionality that the web servers may not provide themselves.&lt;/p&gt;

&lt;p&gt;For example, a reverse proxy can provide SSL termination, load balancing, request routing, caching, compression or even A/B testing.&lt;/p&gt;

&lt;p&gt;When running web services in docker containers, it can be useful to run a reverse proxy in front of the containers to simplify deployment.&lt;/p&gt;

&lt;h2 id=&#34;why-use-a-reverse-proxy-with-docker:068079669506d19d9ac0100aa8bfeb4c&#34;&gt;Why Use A Reverse Proxy With Docker&lt;/h2&gt;

&lt;p&gt;Docker containers are assigned random IPs and ports, which makes
addressing them much more complicated from a client perspective. By default, the IPs and ports are private to the host and cannot be accessed externally unless they are bound to the host.&lt;/p&gt;

&lt;p&gt;Binding the container to the host&amp;rsquo;s port can prevent multiple containers from running on the same host.  For example, only one container can bind to port 80 at a time.  This also complicates rolling out new versions of the container without downtime since the old container must be stopped before the new one is started.&lt;/p&gt;

&lt;p&gt;A reverse proxy can help with these issues as well as improve availability by facilitating zero-downtime deployments.&lt;/p&gt;

&lt;h2 id=&#34;generating-reverse-proxy-configs:068079669506d19d9ac0100aa8bfeb4c&#34;&gt;Generating Reverse Proxy Configs&lt;/h2&gt;

&lt;p&gt;Setting up a reverse proxy configuration can be complicated when containers are started and stopped.  Typically the configuration needs to be updated manually which is error prone and time consuming.&lt;/p&gt;

&lt;p&gt;Fortunately, Docker provides a remote API to &lt;a href=&#34;http://docs.docker.io/en/latest/reference/api/docker_remote_api_v1.10/#inspect-a-container&#34;&gt;inspect containers&lt;/a&gt; and access their IP, Ports and other configuration meta-data.  In addition, it also provides a &lt;a href=&#34;http://docs.docker.io/en/latest/reference/api/docker_remote_api_v1.10/#monitor-docker-s-events&#34;&gt;real-time events API&lt;/a&gt; that can be used for notifications when containers are started and stopped.  These APIs can be used to generate a reverse proxy config automatically.&lt;/p&gt;
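
&lt;p&gt;As a small illustration of how the events stream can drive config generation, here is a Python sketch that decides whether a single event should trigger a regeneration.  It is not docker-gen&amp;rsquo;s code: the function name and the set of watched statuses are assumptions, and the sample event is a hand-written JSON line in the general shape the events API emits.&lt;/p&gt;

```python
import json

# Assumption: the event statuses that should trigger a regeneration.
WATCHED_STATUSES = {"start", "stop", "die"}


def should_regenerate(raw_event):
    """Return True when one JSON event line reports a watched status."""
    event = json.loads(raw_event)
    return event.get("status") in WATCHED_STATUSES


if __name__ == "__main__":
    # Hand-written sample event; the ID and image name are placeholders.
    sample = '{"status": "start", "id": "dfdf82bd3881", "from": "base:latest", "time": 1374067924}'
    if should_regenerate(sample):
        print("container set changed, regenerate the proxy config")
```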

&lt;p&gt;&lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; is a small utility that uses these APIs and exposes container meta-data to templates.  Templates are rendered and an optional notification command can be run to restart the service.&lt;/p&gt;

&lt;p&gt;Using &lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt;, we can generate Nginx config files automatically and reload nginx when they change.  The same approach can also be used for &lt;a href=&#34;http://jasonwilder.com/blog/2014/03/17/docker-log-management-using-fluentd/&#34;&gt;docker log management&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&#34;nginx-reverse-proxy-for-docker:068079669506d19d9ac0100aa8bfeb4c&#34;&gt;Nginx Reverse Proxy for Docker&lt;/h2&gt;

&lt;p&gt;This example nginx template can be used to generate a reverse proxy configuration for docker containers using virtual hosts for
routing.  The template is implemented using the &lt;a href=&#34;http://golang.org/pkg/text/template/&#34;&gt;golang text/template package&lt;/a&gt;. It uses a custom &lt;code&gt;groupBy&lt;/code&gt; template function to group the running containers by their &lt;code&gt;VIRTUAL_HOST&lt;/code&gt; environment variable.  This simplifies iterating over the containers to generate a load-balanced backend and also enables zero-downtime deployments.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;k&#34;&gt;{{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;range&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$host,&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$containers&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;:=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;groupBy&lt;/span&gt; $ &lt;span class=&#34;s&#34;&gt;&amp;quot;Env.VIRTUAL_HOST&amp;quot;&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;
&lt;span class=&#34;s&#34;&gt;upstream&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$host&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;

&lt;span class=&#34;kn&#34;&gt;{{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;range&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$index,&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$value&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;:=&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$containers&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$address&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;:=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;index&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$value.Addresses&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;
    &lt;span class=&#34;s&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$address.IP&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$address.Port&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;kn&#34;&gt;{{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;end&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;end&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;

&lt;span class=&#34;err&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;s&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate /etc/nginx/certs/demo.pem;&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate_key /etc/nginx/certs/demo.key;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;gzip_types&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/plain&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/css&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/json&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/x-javascript&lt;/span&gt;
               &lt;span class=&#34;s&#34;&gt;text/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml+rss&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/javascript&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;server_name&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$host&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;location&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;proxy_pass&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http://&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;kn&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$host&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;include&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/etc/nginx/proxy_params&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;{{&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;end&lt;/span&gt; &lt;span class=&#34;err&#34;&gt;}}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;
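
&lt;p&gt;The grouping step the template relies on can be sketched in a few lines of Python.  This is only an approximation of what a &lt;code&gt;groupBy&lt;/code&gt; on &lt;code&gt;VIRTUAL_HOST&lt;/code&gt; does, not docker-gen&amp;rsquo;s implementation: the function name and the container record layout are assumptions, as is the choice to skip containers that do not set the variable.&lt;/p&gt;

```python
from collections import defaultdict


def group_by_env(containers, key):
    """Group container records by the value of one environment variable.

    Containers that do not set the variable are skipped, which is an
    assumption of this sketch.
    """
    groups = defaultdict(list)
    for container in containers:
        value = container.get("Env", {}).get(key)
        if value:
            groups[value].append(container)
    return dict(groups)


# Hypothetical container records using private bridge addresses.
SAMPLE_CONTAINERS = [
    {"Env": {"VIRTUAL_HOST": "demo1.localhost"}, "Addresses": [{"IP": "172.17.0.4", "Port": "5000"}]},
    {"Env": {"VIRTUAL_HOST": "demo1.localhost"}, "Addresses": [{"IP": "172.17.0.3", "Port": "5000"}]},
    {"Env": {"VIRTUAL_HOST": "demo2.localhost"}, "Addresses": [{"IP": "172.17.0.5", "Port": "5000"}]},
]

if __name__ == "__main__":
    grouped = group_by_env(SAMPLE_CONTAINERS, "VIRTUAL_HOST")
    for host in sorted(grouped):
        # Mirror the template: one upstream per host, one server per container.
        print("upstream {}".format(host))
        for member in grouped[host]:
            address = member["Addresses"][0]
            print("    server {}:{};".format(address["IP"], address["Port"]))
```

&lt;p&gt;Each group becomes one upstream, so starting a second container with the same &lt;code&gt;VIRTUAL_HOST&lt;/code&gt; adds a server to the existing pool rather than creating a new one.&lt;/p&gt;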

&lt;p&gt;The template can be run with &lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker-gen -only-exposed -watch -notify &amp;quot;/etc/init.d/nginx reload&amp;quot; templates/nginx.tmpl /etc/nginx/sites-enabled/default&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-only-exposed&lt;/code&gt; - Only use containers that have exposed ports.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-watch&lt;/code&gt; - After starting up, watch for docker container events and regenerate the template.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-notify &amp;quot;/etc/init.d/nginx reload&amp;quot;&lt;/code&gt; - Reload the nginx config after the template is generated.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;templates/nginx.tmpl&lt;/code&gt; - The nginx template.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/etc/nginx/sites-enabled/default&lt;/code&gt; - Destination file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the rendered template with two containers configured with &lt;code&gt;VIRTUAL_HOST=demo1.localhost&lt;/code&gt; and one with &lt;code&gt;VIRTUAL_HOST=demo2.localhost&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;k&#34;&gt;upstream&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;demo1.localhost&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;kn&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;172.17.0.4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;5000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;kn&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;172.17.0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;5000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate /etc/nginx/certs/demo.pem;&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate_key /etc/nginx/certs/demo.key;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;gzip_types&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/plain&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/css&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/json&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/x-javascript&lt;/span&gt;
               &lt;span class=&#34;s&#34;&gt;text/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml+rss&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/javascript&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;server_name&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;demo1.localhost&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;location&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;proxy_pass&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http://demo1.localhost&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;include&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/etc/nginx/proxy_params&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;upstream&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;demo2.localhost&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;kn&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;172.17.0.5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;5000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;server&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate /etc/nginx/certs/demo.pem;&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;#ssl_certificate_key /etc/nginx/certs/demo.key;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;gzip_types&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/plain&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/css&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/json&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/x-javascript&lt;/span&gt;
               &lt;span class=&#34;s&#34;&gt;text/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;application/xml+rss&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;text/javascript&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;server_name&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;demo2.localhost&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

    &lt;span class=&#34;kn&#34;&gt;location&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;proxy_pass&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http://demo2.localhost&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
        &lt;span class=&#34;kn&#34;&gt;include&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;/etc/nginx/proxy_params&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;
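
&lt;p&gt;The grouping behind this output can be sketched in a few lines of Python.  This is an illustration of the idea only, not how &lt;code&gt;docker-gen&lt;/code&gt; is implemented: the &lt;code&gt;containers&lt;/code&gt; list, its field names and the &lt;code&gt;render_upstreams&lt;/code&gt; helper are assumptions made for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

# Hypothetical container data; in practice this would come from the Docker API.
containers = [
    {"ip": "172.17.0.4", "port": 5000, "env": {"VIRTUAL_HOST": "demo1.localhost"}},
    {"ip": "172.17.0.3", "port": 5000, "env": {"VIRTUAL_HOST": "demo1.localhost"}},
    {"ip": "172.17.0.5", "port": 5000, "env": {"VIRTUAL_HOST": "demo2.localhost"}},
    {"ip": "172.17.0.6", "port": 5000, "env": {}},  # no VIRTUAL_HOST, so it is skipped
]


def render_upstreams(containers):
    """Group containers by VIRTUAL_HOST and emit one upstream block per host."""
    by_host = defaultdict(list)
    for c in containers:
        host = c["env"].get("VIRTUAL_HOST")
        if host:
            by_host[host].append(c)

    blocks = []
    for host in sorted(by_host):
        servers = "\n".join(
            "    server {}:{};".format(c["ip"], c["port"]) for c in by_host[host]
        )
        blocks.append("upstream {} {{\n{}\n}}".format(host, servers))
    return "\n\n".join(blocks)


print(render_upstreams(containers))
&lt;/code&gt;&lt;/pre&gt;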

&lt;h2 id=&#34;try-it-out:068079669506d19d9ac0100aa8bfeb4c&#34;&gt;Try It Out&lt;/h2&gt;

&lt;p&gt;I created a &lt;a href=&#34;https://index.docker.io/u/jwilder/nginx-proxy/&#34;&gt;trusted build&lt;/a&gt; with this setup to make
it easier to try it out:&lt;/p&gt;

&lt;p&gt;Run nginx-proxy container:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -d -p 80:80 -v /var/run/docker.sock:/tmp/docker.sock -t jwilder/nginx-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Start your containers with a &lt;code&gt;VIRTUAL_HOST&lt;/code&gt; environment variable:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ docker run -e VIRTUAL_HOST=foo.bar.com -t ...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you need HTTPS, Websocket support, the option to run &lt;code&gt;docker-gen&lt;/code&gt; in a separate container from nginx,
or other features, take a look at the
&lt;a href=&#34;https://github.com/jwilder/nginx-proxy&#34;&gt;GitHub&lt;/a&gt; project for more information.&lt;/p&gt;

&lt;h2 id=&#34;conclusion:068079669506d19d9ac0100aa8bfeb4c&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Generating nginx reverse proxy configs for docker containers can be automated using the Docker APIs and some basic templating. This can simplify deployments as well as improve availability.&lt;/p&gt;

&lt;p&gt;While this works well for containers running on a single host, generating configs for remote hosts requires &lt;a href=&#34;http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/&#34;&gt;service discovery&lt;/a&gt;.  Take a look at &lt;a href=&#34;http://jasonwilder.com/blog/2014/07/15/docker-service-discovery&#34;&gt;docker service discovery&lt;/a&gt; for a solution to that problem.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: There are a few other posts with similar ideas and variations that are worth checking out:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://brianketelsen.com/2014/02/25/using-nginx-confd-and-docker-for-zero-downtime-web-updates/&#34;&gt;Using Nginx, Confd, and Docker for Zero-Downtime Web Update&lt;/a&gt; - Brian Ketelsen&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://crosbymichael.com/docker-events.html&#34;&gt;Docker Events&lt;/a&gt; - Michael Crosby&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://oskarhane.com/haproxy-as-a-static-reverse-proxy-for-docker-containers/&#34;&gt;Haproxy As A Static Reverse Proxy for Docker Containers&lt;/a&gt; - Oskar Hane&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
        <item>
            <title>Docker Log Management Using Fluentd</title>
            <link>http://jasonwilder.com/blog/2014/03/17/docker-log-management-using-fluentd/</link>
            <pubDate>Mon, 17 Mar 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/03/17/docker-log-management-using-fluentd/</guid>
            <description>

&lt;p&gt;Docker is an open-source project to easily create lightweight, portable and self-sufficient containers
for applications.  Docker allows you to run many isolated applications on a single host without
the weight of running virtual machines.&lt;/p&gt;

&lt;p&gt;One of the problems with the current versions of docker is managing logs.  Each container runs
a single process and the output of that process is saved by docker to a location on the host.&lt;/p&gt;

&lt;p&gt;There are a few operational issues with this currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This log file grows indefinitely.  Docker logs each line as a JSON message which can cause
this file to grow quickly and exceed the disk space on the host since it&amp;rsquo;s not rotated automatically.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;docker logs&lt;/code&gt; command returns all recorded logs each time it&amp;rsquo;s run.  Any long running process
that is a little verbose can be difficult to examine.&lt;/li&gt;
&lt;li&gt;Logs under the container&amp;rsquo;s &lt;code&gt;/var/log&lt;/code&gt; or other locations are not easily visible or accessible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;docker-logging-options:cefd831a6c1c5225510fea1755d80996&#34;&gt;Docker Logging Options&lt;/h2&gt;

&lt;p&gt;While logging in docker is evolving, there are several approaches to handling logs with docker
currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Collection Inside Container&lt;/strong&gt; - Each container starts up a log collection
process in addition to the application that will be running.  &lt;a href=&#34;https://github.com/phusion/baseimage-docker&#34;&gt;baseimage-docker&lt;/a&gt;
uses &lt;a href=&#34;http://smarden.org/runit/&#34;&gt;runit&lt;/a&gt; along with syslog as an example.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collection Outside Container&lt;/strong&gt; - A single collection agent runs on the host and containers have a volume
mounted from the host where they write their logs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collection In Separate Container&lt;/strong&gt; - This is a slight variation of running the collection agent on the
host. The collection agent is also run in a container and volumes from that container are bound to
any application containers using the &lt;code&gt;volumes-from&lt;/code&gt; docker run option.  This &lt;a href=&#34;http://denibertovic.com/post/docker-and-logstash-smarter-log-management-for-your-containers/&#34;&gt;Docker and logstash&lt;/a&gt; article has an example of this approach.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These approaches work but also have some drawbacks.  If collection is performed inside the container,
then each container is running duplicate processes that can waste resources.
(&lt;em&gt;&lt;a href=&#34;http://phusion.github.io/baseimage-docker/&#34;&gt;Running multiple processes&lt;/a&gt; in a container seems to be a
debated subject even though the docker docs use &lt;a href=&#34;http://docs.docker.io/en/latest/examples/using_supervisord/&#34;&gt;supervisor&lt;/a&gt;
as an example.&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;If collection is run outside the container using volumes, you still need to make sure your application logs to those
volumes and not stdout/stderr.  This might not be possible with all applications.  Finally, the containers
running still have the container JSON log file that will grow unbounded too.&lt;/p&gt;

&lt;h2 id=&#34;using-fluentd-with-docker:cefd831a6c1c5225510fea1755d80996&#34;&gt;Using Fluentd With Docker&lt;/h2&gt;

&lt;p&gt;Another variation of collection outside the container can be done with a centralized logging agent
and without binding volumes to the containers.  This method works directly against the container&amp;rsquo;s JSON
log file on the host.&lt;/p&gt;

&lt;p&gt;When you run a container, the state of the container lives under &lt;code&gt;/var/lib/docker/containers/&amp;lt;id&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;root@precise64:/var/lib/docker/containers/fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374# ls -la
total 44
drwx------  &lt;span class=&#34;m&#34;&gt;3&lt;/span&gt; root root  &lt;span class=&#34;m&#34;&gt;4096&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 .
drwx------ &lt;span class=&#34;m&#34;&gt;83&lt;/span&gt; root root &lt;span class=&#34;m&#34;&gt;12288&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 21:53 ..
-rw-r--r--  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root   &lt;span class=&#34;m&#34;&gt;106&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 config.env
-rw-r--r--  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root  &lt;span class=&#34;m&#34;&gt;1522&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 config.json
-rw-------  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root   &lt;span class=&#34;m&#34;&gt;241&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374-json.log
-rw-r--r--  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root   &lt;span class=&#34;m&#34;&gt;126&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 hostconfig.json
-rw-r--r--  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root    &lt;span class=&#34;m&#34;&gt;13&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 hostname
-rw-r--r--  &lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; root root   &lt;span class=&#34;m&#34;&gt;181&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 hosts
drwxr-xr-x  &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt; root root  &lt;span class=&#34;m&#34;&gt;4096&lt;/span&gt; Mar &lt;span class=&#34;m&#34;&gt;14&lt;/span&gt; 19:56 root
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;

&lt;p&gt;The file &lt;code&gt;fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374-json.log&lt;/code&gt; is the container
log file.  Each line is a JSON object and there is a line for every line of input and output from the
container.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{&amp;quot;log&amp;quot;:&amp;quot;root@c835298de6dd:/# ls\r\n&amp;quot;,&amp;quot;stream&amp;quot;:&amp;quot;stdout&amp;quot;,&amp;quot;time&amp;quot;:&amp;quot;2014-03-14T22:15:15.155863426Z&amp;quot;}
{&amp;quot;log&amp;quot;:&amp;quot;bin  boot  dev\u0009etc  home  lib\u0009lib64  media  mnt  opt\u0009proc  root  run  sbin  selinux\u0009srv  sys  tmp  usr  var\r\n&amp;quot;,&amp;quot;stream&amp;quot;:&amp;quot;stdout&amp;quot;,&amp;quot;time&amp;quot;:&amp;quot;2014-03-14T22:15:15.194869963Z&amp;quot;}
&lt;/code&gt;&lt;/pre&gt;
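
&lt;p&gt;Because every line is a self-contained JSON object, it can be parsed with any JSON library.  As a small sketch, this Python snippet decodes the first sample line above; the variable names are only for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# First line of the sample log above, kept as a raw string so the JSON
# escapes (\r\n) are passed to the parser unchanged.
line = r'{"log":"root@c835298de6dd:/# ls\r\n","stream":"stdout","time":"2014-03-14T22:15:15.155863426Z"}'

record = json.loads(line)
print(record["stream"])      # which stream the line was written to
print(record["time"])        # timestamp recorded by docker
print(repr(record["log"]))   # the captured text, including the trailing \r\n
&lt;/code&gt;&lt;/pre&gt;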

&lt;p&gt;&lt;a href=&#34;http://fluentd.org&#34;&gt;fluentd&lt;/a&gt; is an open-source data collector that works natively with lines of JSON so you can run a single
fluentd instance on the host and configure it to tail each container&amp;rsquo;s JSON file.&lt;/p&gt;

&lt;p&gt;If you need to tail a log file somewhere on the container&amp;rsquo;s file system, you can use the &lt;code&gt;root&lt;/code&gt; subdirectory as well.
All of the tailed files can then be forwarded to a &lt;a href=&#34;http://jasonwilder.com/blog/2012/01/03/centralized-logging/&#34;&gt;centralized logging system&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is a sample fluentd.conf file that tails each container&amp;rsquo;s logs and sends them to stdout.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;## File input
## read docker logs with tag=docker.container

&amp;lt;source&amp;gt;
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log
  pos_file /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log.pos
  tag docker.container.c835298de6dd
  rotate_wait 5
&amp;lt;/source&amp;gt;

&amp;lt;source&amp;gt;
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347-json.log
  pos_file /var/lib/docker/containers/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347-json.log.pos
  tag docker.container.965c22a2ad1e
  rotate_wait 5
&amp;lt;/source&amp;gt;

&amp;lt;source&amp;gt;
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953-json.log
  pos_file /var/lib/docker/containers/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953-json.log.pos
  tag docker.container.889fe291f590
  rotate_wait 5
&amp;lt;/source&amp;gt;

&amp;lt;match docker.**&amp;gt;
  type stdout
&amp;lt;/match&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In a live environment, the actual log contents could be sent to an &lt;a href=&#34;http://elasticsearch.org&#34;&gt;elasticsearch&lt;/a&gt; cluster and
viewed with &lt;a href=&#34;http://www.elasticsearch.org/overview/kibana/&#34;&gt;kibana&lt;/a&gt; or &lt;a href=&#34;http://graylog2.org/&#34;&gt;graylog2&lt;/a&gt;.
Alternatively, there are hosted services that can work with JSON as well.&lt;/p&gt;

&lt;p&gt;Since container IDs are unwieldy to work with, I created a simple &lt;a href=&#34;http://golang.org&#34;&gt;golang&lt;/a&gt; project called
&lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; that can generate arbitrary files using a template
from the running docker container data.  The example &lt;a href=&#34;https://github.com/jwilder/docker-gen/blob/master/templates/fluentd.conf.tmpl&#34;&gt;fluentd template&lt;/a&gt; in the project was used to
generate the sample above.&lt;/p&gt;
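
&lt;p&gt;The values that change between the source blocks in the sample are all derived from the container ID.  A rough Python sketch of that derivation is shown here; the &lt;code&gt;tail_settings&lt;/code&gt; helper is hypothetical and is not part of &lt;code&gt;docker-gen&lt;/code&gt;, which uses a template instead.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def tail_settings(container_id, base="/var/lib/docker/containers"):
    """Derive the per-container fluentd tail settings from a full container ID."""
    log_path = "{0}/{1}/{1}-json.log".format(base, container_id)
    return {
        "path": log_path,
        "pos_file": log_path + ".pos",
        # The sample config tags each source with the 12 character short ID.
        "tag": "docker.container." + container_id[:12],
    }


settings = tail_settings(
    "c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898"
)
for key in ("path", "pos_file", "tag"):
    print(key, settings[key])
&lt;/code&gt;&lt;/pre&gt;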

&lt;p&gt;Although not shown, &lt;a href=&#34;https://github.com/jwilder/docker-gen&#34;&gt;docker-gen&lt;/a&gt; could also generate logrotate
config files to rotate the container JSON files to avoid running out of disk space on the host.
Hopefully, the docker project will address this in a future release.&lt;/p&gt;

&lt;h2 id=&#34;conclusion:cefd831a6c1c5225510fea1755d80996&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This approach provides the following benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The host is able to forward any container&amp;rsquo;s logs to a central log server using
a single collection agent.&lt;/li&gt;
&lt;li&gt;It does not require the applications to use syslog or write to a certain volume.&lt;/li&gt;
&lt;li&gt;The host can access the container logs as well as any log files on the container&amp;rsquo;s filesystem.&lt;/li&gt;
&lt;li&gt;The host can rotate logs for the containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The one drawback to this approach is that it accesses the docker file system directly without using
the API which means it could break in the future if a future docker release changes how it
stores container logs on the host&amp;rsquo;s file system.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Open-Source Service Discovery</title>
            <link>http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/</link>
            <pubDate>Tue, 04 Feb 2014 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/</guid>
            <description>

&lt;p&gt;Service discovery is a key component of most distributed systems and service oriented architectures.
The problem seems simple at first: &lt;em&gt;How do clients determine the IP and port for a service that
exists on multiple hosts?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Usually, you start off with some static configuration which gets you pretty far.
Things get more complicated as you start deploying more services.  With a live
system, service locations can change quite frequently due to auto or manual scaling,
new deployments of services, as well as hosts failing or being replaced.&lt;/p&gt;

&lt;p&gt;Dynamic service registration and discovery becomes much more important in these scenarios in
order to avoid service interruption.&lt;/p&gt;

&lt;p&gt;This problem has been addressed in many different ways and is continuing to evolve.  We&amp;rsquo;re going to look at some open-source or openly-discussed solutions to this problem to understand how they work.  Specifically,
we&amp;rsquo;ll look at how each solution uses strong or weakly consistent storage, runtime dependencies, client
integration options and what the tradeoffs of those features might be.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll start with some strongly consistent projects such as &lt;a href=&#34;http://zookeeper.apache.org/&#34;&gt;Zookeeper&lt;/a&gt;,
&lt;a href=&#34;https://github.com/ha/doozer&#34;&gt;Doozer&lt;/a&gt; and &lt;a href=&#34;https://github.com/coreos/etcd&#34;&gt;Etcd&lt;/a&gt; which are typically
used as coordination services but can also serve as service registries.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll then look at some interesting solutions specifically designed for service registration and discovery.
We&amp;rsquo;ll examine &lt;a href=&#34;http://nerds.airbnb.com/smartstack-service-discovery-cloud/&#34;&gt;Airbnb&amp;rsquo;s SmartStack&lt;/a&gt;,
&lt;a href=&#34;https://github.com/Netflix/eureka&#34;&gt;Netflix&amp;rsquo;s Eureka&lt;/a&gt;, &lt;a href=&#34;http://bitly.github.io/nsq/&#34;&gt;Bitly&amp;rsquo;s NSQ&lt;/a&gt;,
&lt;a href=&#34;http://serfdom.io&#34;&gt;Serf&lt;/a&gt;,
&lt;a href=&#34;http://labs.spotify.com/tag/dns/&#34;&gt;Spotify and DNS&lt;/a&gt; and finally &lt;a href=&#34;https://github.com/skynetservices/skydns&#34;&gt;SkyDNS&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&#34;the-problem:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;The Problem&lt;/h1&gt;

&lt;p&gt;There are two sides to the problem of locating services.  &lt;em&gt;Service Registration&lt;/em&gt; and &lt;em&gt;Service Discovery&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Service Registration&lt;/strong&gt; - The process of a service registering its location in a central registry.
It usually registers its host and port and sometimes authentication credentials, protocols, version numbers,
and/or environment details.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service Discovery&lt;/strong&gt; - The process of a client application querying the central registry to learn
of the location of services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any service registration and discovery solution also has other development and operational aspects to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Monitoring&lt;/strong&gt; - What happens when a registered service fails?  Sometimes it is unregistered immediately,
after a timeout, or by another process.  Services are usually required to implement a heartbeating mechanism
to ensure liveness and clients typically need to be able to handle failed services reliably.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Load Balancing&lt;/strong&gt; - If multiple services are registered, how do all the clients balance
the load across the services?  If there is a master, can it be determined by a client correctly?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration Style&lt;/strong&gt; - Does the registry only provide a few language bindings, for example, only Java?
Does integrating require embedding registration and discovery code into your application or is a
&lt;em&gt;sidekick&lt;/em&gt; process an option?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime Dependencies&lt;/strong&gt; - Does it require the JVM, Ruby or something that is not compatible
with your environment?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Availability Concerns&lt;/strong&gt; - Can you lose a node and still function?  Can it be upgraded without
incurring an outage?  The registry will grow to be a central part of your architecture and could
be a single point of failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&#34;general-purpose-registries:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;General Purpose Registries&lt;/h1&gt;

&lt;p&gt;These first three registries use strongly consistent protocols and are actually general purpose, consistent
datastores.  Although we&amp;rsquo;re looking at them as service registries, they are typically used for coordination
services to aid in leader election or centralized locking with a distributed set of clients.&lt;/p&gt;

&lt;h2 id=&#34;zookeeper:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Zookeeper&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;http://zookeeper.apache.org/&#34;&gt;Zookeeper&lt;/a&gt; is a centralized service for maintaining configuration
information, naming, providing distributed synchronization, and providing group services.  It&amp;rsquo;s written in Java, is strongly consistent (CP) and uses the &lt;a href=&#34;http://www.stanford.edu/class/cs347/reading/zab.pdf&#34;&gt;Zab&lt;/a&gt; protocol to coordinate changes across the ensemble (cluster).&lt;/p&gt;

&lt;p&gt;Zookeeper is typically run with three, five or seven members in the ensemble.  Clients use language
specific &lt;a href=&#34;https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings&#34;&gt;bindings&lt;/a&gt; in order to access the ensemble.
Access is typically embedded into the client applications and services.&lt;/p&gt;

&lt;p&gt;Service registration is implemented with &lt;a href=&#34;http://zookeeper.apache.org/doc/r3.2.1/zookeeperProgrammers.html#Ephemeral+Nodes&#34;&gt;ephemeral nodes&lt;/a&gt;
under a namespace.  Ephemeral nodes only exist while the client is connected so typically a backend service registers itself, after startup, with its
location information.  If it fails or disconnects, the node disappears from the tree.&lt;/p&gt;
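
&lt;p&gt;The lifecycle of an ephemeral registration can be illustrated with a toy, in-memory model.  This is not the Zookeeper client API; the &lt;code&gt;ToyRegistry&lt;/code&gt; class, its methods and the &lt;code&gt;/services/web&lt;/code&gt; paths are invented for the example and only mimic the behavior described above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class ToyRegistry:
    """In-memory stand-in for a tree of ephemeral nodes (illustration only)."""

    def __init__(self):
        self._nodes = {}  # path -&amp;gt; (session_id, data)

    def create_ephemeral(self, session_id, path, data):
        """Register a node that lives only as long as its session."""
        self._nodes[path] = (session_id, data)

    def close_session(self, session_id):
        """Drop every node owned by the session, as happens on disconnect."""
        self._nodes = {
            path: node for path, node in self._nodes.items() if node[0] != session_id
        }

    def children(self, prefix):
        """List the registered paths under a namespace."""
        return sorted(p for p in self._nodes if p.startswith(prefix + "/"))


registry = ToyRegistry()
registry.create_ephemeral("session-1", "/services/web/host1", "10.0.0.1:80")
registry.create_ephemeral("session-2", "/services/web/host2", "10.0.0.2:80")
registry.close_session("session-1")       # host1 fails or disconnects
print(registry.children("/services/web"))  # only host2 is still listed
&lt;/code&gt;&lt;/pre&gt;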

&lt;p&gt;Service discovery is implemented by listing and watching the namespace for the service.  Clients
receive all the currently registered services as well as notifications when a service becomes unavailable
or new ones register.  Clients also need to handle any load balancing or failover themselves.&lt;/p&gt;

&lt;p&gt;The Zookeeper API can be difficult to use properly and language bindings might have subtle differences
that could cause problems.
If you&amp;rsquo;re using a JVM based language, the &lt;a href=&#34;http://curator.apache.org/curator-x-discovery/index.html&#34;&gt;Curator Service Discovery Extension&lt;/a&gt; might be of some use.&lt;/p&gt;

&lt;p&gt;Since Zookeeper is a CP system, when a &lt;a href=&#34;http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios&#34;&gt;partition&lt;/a&gt; occurs,
some of your system will not be able to register or find existing registrations even if they could function properly during the
partition.  Specifically, on any non-quorum side, reads and writes will return an error.&lt;/p&gt;
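
&lt;p&gt;Assuming a simple majority quorum, the arithmetic behind the recommended ensemble sizes can be sketched in Python as follows.  The helper names are made up for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def majority(ensemble_size):
    """Smallest number of members that still forms a quorum."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Members that can be lost (or partitioned away) while a quorum remains."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 5, 7):
    print(size, majority(size), tolerated_failures(size))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With five members, for example, a partition that leaves two members on one side puts those two on the non-quorum side, since three are needed for a majority.&lt;/p&gt;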

&lt;h2 id=&#34;doozer:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Doozer&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/ha/doozerd&#34;&gt;Doozer&lt;/a&gt; is a consistent, distributed data store.  It&amp;rsquo;s written in Go,
is strongly consistent and uses &lt;a href=&#34;http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf&#34;&gt;Paxos&lt;/a&gt;
to maintain consensus.  The project has been around for a number of years but has stagnated for
a while and now has close to 160 forks.  Unfortunately, this makes it
difficult to know what the actual state of the project is and whether it is suitable for production
use.&lt;/p&gt;

&lt;p&gt;Doozer is typically run with three, five or seven nodes in the cluster.  Clients use language
specific bindings to access the cluster and, similar to Zookeeper, integration is embedded
into the client and services.&lt;/p&gt;

&lt;p&gt;Service registration is not as straightforward as with Zookeeper because Doozer does not have any
concept of ephemeral nodes.  A service can register itself under a path but if the service becomes
unavailable, it won&amp;rsquo;t be removed automatically.&lt;/p&gt;

&lt;p&gt;There are a number of ways to address this issue. One option might be to add a timestamp and
heartbeating mechanism to the registration process and
handle expired entries during the discovery process or with another cleanup process.&lt;/p&gt;

&lt;p&gt;Service discovery is similar to Zookeeper in that you can list all the entries under a path and
then wait for any changes to that path.  If you use a timestamp and heartbeat during registration, you
would ignore or delete any expired entries during discovery.&lt;/p&gt;
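
&lt;p&gt;A minimal Python sketch of the timestamp and heartbeat idea is shown below.  It uses a plain dictionary in place of Doozer, and the &lt;code&gt;HEARTBEAT_TTL&lt;/code&gt; value and function names are assumptions chosen for the example, not anything Doozer provides.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import time

HEARTBEAT_TTL = 30  # seconds; an assumed value for the example


def register(registry, service, address, now=None):
    """Store or refresh an entry along with the time of its last heartbeat."""
    registry[(service, address)] = time.time() if now is None else now


def live_instances(registry, service, now=None):
    """Return addresses whose last heartbeat is no older than HEARTBEAT_TTL."""
    now = time.time() if now is None else now
    return sorted(
        address
        for (name, address), seen in registry.items()
        if name == service and seen + HEARTBEAT_TTL &amp;gt;= now
    )


registry = {}
register(registry, "web", "10.0.0.1:80", now=100)
register(registry, "web", "10.0.0.2:80", now=50)   # stopped heartbeating earlier
print(live_instances(registry, "web", now=110))    # the stale entry is ignored
&lt;/code&gt;&lt;/pre&gt;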

&lt;p&gt;Like Zookeeper, Doozer is also a CP system and has the same consequences when a partition occurs.&lt;/p&gt;

&lt;h2 id=&#34;etcd:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Etcd&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/coreos/etcd&#34;&gt;Etcd&lt;/a&gt; is a highly-available, key-value store for shared configuration and service discovery.  Etcd
was inspired by Zookeeper and Doozer.  It&amp;rsquo;s written in Go, uses &lt;a href=&#34;https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf&#34;&gt;Raft&lt;/a&gt;
for consensus and has a HTTP+JSON based API.&lt;/p&gt;

&lt;p&gt;Etcd, similar to Doozer and Zookeeper, is usually run with three, five or seven nodes in the cluster.
Clients use a language specific binding or implement one using an HTTP client.&lt;/p&gt;

&lt;p&gt;Service registration relies on &lt;a href=&#34;https://github.com/coreos/etcd#using-key-ttl&#34;&gt;using a key TTL&lt;/a&gt; along
with heartbeating from the service to ensure the key remains available.  If a service fails to
update the key&amp;rsquo;s TTL, Etcd will expire it.  If a service becomes unavailable,
clients will need to handle the connection failure and try another service instance.&lt;/p&gt;
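
&lt;p&gt;As a rough sketch, a registration heartbeat against the HTTP API could look like the following Python snippet.  It only builds the request and does not send it.  The node address, the &lt;code&gt;/services/...&lt;/code&gt; key layout and the 30 second TTL are assumptions for the example, and the &lt;code&gt;/v2/keys&lt;/code&gt; path should be checked against the Etcd version you run.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import urlencode
from urllib.request import Request

ETCD = "http://127.0.0.1:4001"  # assumed address of a local etcd node


def registration_request(service, instance, address, ttl=30):
    """Build the PUT that sets a key for this instance with a TTL."""
    url = "{}/v2/keys/services/{}/{}".format(ETCD, service, instance)
    body = urlencode({"value": address, "ttl": ttl}).encode()
    return Request(url, data=body, method="PUT")


req = registration_request("web", "host1", "10.0.0.1:8080")
print(req.get_method(), req.full_url)
print(req.data.decode())
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A service would send this request with &lt;code&gt;urllib.request.urlopen(req)&lt;/code&gt; on a timer shorter than the TTL so the key is refreshed before it expires.&lt;/p&gt;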

&lt;p&gt;Service discovery involves listing the keys under a directory and then waiting for changes on the
directory.  Since the API is HTTP based, the client application keeps a long-polling connection
open with the Etcd cluster.&lt;/p&gt;

&lt;p&gt;Since Etcd uses &lt;a href=&#34;https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf&#34;&gt;Raft&lt;/a&gt;, it
should be a strongly-consistent system.  Raft requires a leader to be elected and all client requests are handled by
the leader. However, Etcd also seems to support reads from non-leaders using this &lt;a href=&#34;https://github.com/coreos/etcd/blob/master/server/v2/get_handler.go#L25&#34;&gt;undocumented consistent parameter&lt;/a&gt; which would improve
availability in the read case.  Writes would still need to be handled by the leader during a partition and could
fail.&lt;/p&gt;

&lt;h1 id=&#34;single-purpose-registries:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Single Purpose Registries&lt;/h1&gt;

&lt;p&gt;These next few registration services and approaches are specifically tailored to service registration
and discovery.  Most have come about from actual production use cases while others are interesting
and different approaches to the problem.  Whereas Zookeeper, Doozer and Etcd could also be used for
distributed coordination, these solutions don&amp;rsquo;t have that capability.&lt;/p&gt;

&lt;h2 id=&#34;airbnb-s-smartstack:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Airbnb&amp;rsquo;s SmartStack&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;http://nerds.airbnb.com/smartstack-service-discovery-cloud/&#34;&gt;Airbnb&amp;rsquo;s SmartStack&lt;/a&gt; is a combination
of two custom tools, &lt;a href=&#34;https://github.com/airbnb/nerve&#34;&gt;Nerve&lt;/a&gt; and &lt;a href=&#34;https://github.com/airbnb/synapse&#34;&gt;Synapse&lt;/a&gt;
that leverage &lt;a href=&#34;http://haproxy.1wt.eu/&#34;&gt;haproxy&lt;/a&gt; and &lt;a href=&#34;http://zookeeper.apache.org/&#34;&gt;Zookeeper&lt;/a&gt; to handle
service registration and discovery.  Both Nerve and Synapse are written in Ruby.&lt;/p&gt;

&lt;p&gt;Nerve is a &lt;em&gt;sidekick&lt;/em&gt; style process that runs as a separate process alongside the application service.
Nerve is responsible for registering services in Zookeeper.  Applications expose a &lt;code&gt;/health&lt;/code&gt; endpoint,
for HTTP services, that Nerve continuously monitors.  Provided the service is available, it will be
registered in Zookeeper.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;sidekick&lt;/em&gt; model eliminates the need for a service to interact with Zookeeper. It simply needs
a monitoring endpoint in order to be registered.  This makes it much easier to support services in different
languages where robust Zookeeper bindings might not exist.  This also provides many of the benefits of the
&lt;a href=&#34;http://en.wikipedia.org/wiki/Hollywood_principle&#34;&gt;Hollywood principle&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Synapse is also a &lt;em&gt;sidekick&lt;/em&gt; style process that runs as a separate process alongside the service.
Synapse is responsible for service discovery.  It does this by querying Zookeeper for currently
registered services and reconfigures a locally running haproxy instance.  Any clients on the host
that need to access another service always access the local haproxy instance which will route the
request to an available service.&lt;/p&gt;

&lt;p&gt;Synapse&amp;rsquo;s design simplifies service implementations in that they do not need to implement any client
side load balancing or failover and they do not need to depend on Zookeeper or its language bindings.&lt;/p&gt;

&lt;p&gt;Since SmartStack relies on Zookeeper, some registrations and discovery may fail during a partition.
They point out that Zookeeper is their &amp;ldquo;Achilles heel&amp;rdquo; in this setup.
Provided a service has been able to discover the other services, at least once, before a partition, it should still
have a snapshot of the services after the partition and may be able to continue operating during the
partition.  This aspect improves the availability and reliability of the overall system.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: If you&amp;rsquo;re interested in a SmartStack style solution for docker containers, check out &lt;a href=&#34;http://jasonwilder.com/blog/2014/07/15/docker-service-discovery&#34;&gt;docker service discovery&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&#34;netflix-s-eureka:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Netflix&amp;rsquo;s Eureka&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/Netflix/eureka&#34;&gt;Eureka&lt;/a&gt; is Netflix&amp;rsquo;s middle-tier, load balancing and discovery
service.  There is a server component as well as a smart-client that is used within application
services.  The server and client are written in Java which means the ideal use case would be for
the services to also be implemented in Java or another JVM compatible language.&lt;/p&gt;

&lt;p&gt;The Eureka server is the registry for services.  They recommend running one Eureka server in each
availability zone in AWS to form a cluster.  The servers replicate their state to each
other through an asynchronous model which means each instance may have a slightly different picture
of all the services at any given time.&lt;/p&gt;

&lt;p&gt;Service registration is handled by the client component.  Services embed the client in their application
code. At runtime, the client registers the service and periodically sends heartbeats to renew its leases.&lt;/p&gt;

&lt;p&gt;Service discovery is handled by the smart-client as well.  It retrieves the current registrations from the
server and caches them locally.  The client periodically refreshes its state and also handles load
balancing and failovers.&lt;/p&gt;

&lt;p&gt;Eureka was designed to be very resilient during failures. It favors availability over
strong consistency and can operate under a number of different failure modes.  If there is a partition within the cluster,
Eureka transitions to a self-preservation state.  It will allow services to be discovered and registered
during a partition and when it heals, the members will merge their state again.&lt;/p&gt;

&lt;h2 id=&#34;bitly-s-nsq-lookupd:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Bitly&amp;rsquo;s NSQ lookupd&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/bitly/nsq&#34;&gt;NSQ&lt;/a&gt; is a realtime, distributed messaging platform. It&amp;rsquo;s written in Go
and provides an HTTP based API.  While it&amp;rsquo;s not a general purpose service registration and discovery tool,
they have implemented a novel model of service discovery in their
&lt;a href=&#34;http://bitly.github.io/nsq/components/nsqlookupd.html&#34;&gt;nsqlookupd&lt;/a&gt; agent
in order for clients to find &lt;a href=&#34;http://bitly.github.io/nsq/components/nsqd.html&#34;&gt;nsqd&lt;/a&gt; instances at
runtime.&lt;/p&gt;

&lt;p&gt;In an NSQ deployment, the nsqd instances are essentially the service.  These are the message stores.
nsqlookupd is the service registry.  Clients connect directly to nsqd instances but since these may
change at runtime, clients can discover the available instances by querying nsqlookupd instances.&lt;/p&gt;

&lt;p&gt;For service registration, each nsqd instance periodically sends a heartbeat of its state to each nsqlookupd
instance.  Their state includes their address and any queues or topics they have.&lt;/p&gt;

&lt;p&gt;For discovery, clients query each nsqlookupd instance and merge the results.&lt;/p&gt;

&lt;p&gt;What is interesting about this model is that the nsqlookupd instances &lt;em&gt;do not know about each other&lt;/em&gt;.
It&amp;rsquo;s the responsibility of the clients to merge the state returned from each stand-alone nsqlookupd instance to
determine the overall state.  Because each nsqd instance heartbeats its state, each nsqlookupd eventually
has the same information provided each nsqd instance can contact all available nsqlookupd instances.&lt;/p&gt;
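&lt;p&gt;As a rough illustration of that client-side merge, here is a minimal Python sketch.  The response shape (a list of producers, each with a &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;port&lt;/code&gt;) and the name &lt;code&gt;merge_producers&lt;/code&gt; are assumptions made for this example, not the actual nsqlookupd API.&lt;/p&gt;

&lt;pre&gt;
```python
def merge_producers(responses):
    """Union the producers reported by several independent lookupd instances.

    `responses` holds one list per lookupd instance; each list contains dicts
    with hypothetical "host" and "port" keys.
    """
    merged = {}
    for producers in responses:
        for producer in producers:
            # The (host, port) pair identifies one nsqd instance, so an
            # instance reported by several lookupds collapses to one entry.
            key = (producer["host"], producer["port"])
            merged[key] = producer
    return sorted(merged)


# Two lookupd instances with overlapping, but not identical, views.
lookupd_a = [{"host": "10.0.0.1", "port": 4150}, {"host": "10.0.0.2", "port": 4150}]
lookupd_b = [{"host": "10.0.0.2", "port": 4150}, {"host": "10.0.0.3", "port": 4150}]
print(merge_producers([lookupd_a, lookupd_b]))
```
&lt;/pre&gt;

&lt;p&gt;Because the merge is a plain union, a client that can reach only one nsqlookupd still gets a usable, if possibly incomplete, list.&lt;/p&gt;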

&lt;p&gt;The previously discussed registry components all form a cluster and use strongly or weakly consistent
consensus protocols to maintain their state. The NSQ design is inherently weakly consistent but
very tolerant to partitions.&lt;/p&gt;

&lt;h2 id=&#34;serf:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Serf&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;http://serfdom.io&#34;&gt;Serf&lt;/a&gt; is a decentralized solution for service discovery and orchestration.  It is also
written in Go and is unique in that it uses a gossip based protocol, &lt;a href=&#34;http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf&#34;&gt;SWIM&lt;/a&gt;, for membership, failure detection and custom event propagation.  SWIM was designed to address the unscalability of traditional heart-beating protocols.&lt;/p&gt;

&lt;p&gt;Serf consists of a single binary that is installed on all hosts.  It can be run as an agent, where it
joins or creates a cluster, or as a client where it can discover the members in the cluster.&lt;/p&gt;

&lt;p&gt;For service registration, a serf agent is run that joins an existing cluster.  The agent is started
with custom tags that can identify the host&amp;rsquo;s role, env, ip, ports, etc.  Once joined to the cluster, other
members will be able to see this host and its metadata.&lt;/p&gt;

&lt;p&gt;For discovery, serf is run with the &lt;code&gt;members&lt;/code&gt; command which returns the current members of the
cluster. Using
the members output, you can discover all the hosts for a service based on the tags their agent is
running.&lt;/p&gt;
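&lt;p&gt;A small Python sketch of that tag-based lookup follows.  The column layout assumed here (name, address, status, comma-separated tags), the sample hosts and the helper names are assumptions for illustration; check the output of your own serf version before relying on it.&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical `serf members` output; the layout is assumed, not authoritative.
SAMPLE = """
web-1  10.0.0.1:7946  alive   role=web,env=prod
web-2  10.0.0.2:7946  failed  role=web,env=prod
db-1   10.0.0.3:7946  alive   role=db,env=prod
"""


def parse_members(output):
    """Parse lines of `name address status tags` into a list of dicts."""
    members = []
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip lines that do not match the assumed layout
        name, address, status, raw_tags = fields
        tags = dict(pair.split("=", 1) for pair in raw_tags.split(",") if "=" in pair)
        members.append({"name": name, "address": address, "status": status, "tags": tags})
    return members


def find_hosts(members, **wanted):
    """Return addresses of alive members whose tags match every wanted value."""
    return [
        m["address"]
        for m in members
        if m["status"] == "alive"
        and all(m["tags"].get(key) == value for key, value in wanted.items())
    ]


print(find_hosts(parse_members(SAMPLE), role="web"))
```
&lt;/pre&gt;

&lt;p&gt;Filtering on status as well as tags keeps failed members out of the result.&lt;/p&gt;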

&lt;p&gt;Serf is a relatively new project and is evolving quickly. It is the only project in this post that
does not have a central registry architectural style which makes it unique.  Since it uses an asynchronous, gossip based protocol, it is inherently weakly-consistent but more fault tolerant and available.&lt;/p&gt;

&lt;h2 id=&#34;spotify-and-dns:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Spotify and DNS&lt;/h2&gt;

&lt;p&gt;Spotify described their use of DNS for service discovery in their post
&lt;a href=&#34;http://labs.spotify.com/tag/dns/&#34;&gt;In praise of &amp;ldquo;boring&amp;rdquo; technology&lt;/a&gt;.  Instead of using a newer, less
mature technology they opted to build on top of DNS.  Spotify views DNS as a
&amp;ldquo;distributed, replicated database tailored for read-heavy loads.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Spotify uses the relatively unknown &lt;a href=&#34;http://en.wikipedia.org/wiki/SRV_record&#34;&gt;SRV record&lt;/a&gt; which is intended
for service discovery.  SRV records can be thought of as a more generalized MX record.  They allow you
to define a service name, protocol, TTL, priority, weight, port and target host.  Basically, everything
a client would need to find all available services and load balance against them if necessary.&lt;/p&gt;
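&lt;p&gt;To make the priority and weight fields concrete, here is a simplified Python sketch of how a client might choose a target from a set of SRV records.  It is a stand-in for the fuller selection procedure in the SRV specification, and the record tuples, hostnames and the name &lt;code&gt;pick_target&lt;/code&gt; are all hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
import random

# Hypothetical records as (priority, weight, port, target) tuples.
RECORDS = [
    (10, 60, 8080, "a.example.com."),
    (10, 40, 8080, "b.example.com."),
    (20, 100, 8080, "backup.example.com."),
]


def pick_target(records, rng=random):
    """Pick one target: lowest priority wins, then a weighted random choice."""
    if not records:
        raise ValueError("no SRV records to choose from")
    lowest = min(record[0] for record in records)
    candidates = [record for record in records if record[0] == lowest]
    weights = [record[1] for record in candidates]
    if sum(weights) == 0:
        # Every weight is zero, so fall back to a uniform choice.
        chosen = rng.choice(candidates)
    else:
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen[3], chosen[2]


print(pick_target(RECORDS))
```
&lt;/pre&gt;

&lt;p&gt;With these sample records the backup host is only chosen once both priority 10 entries have been removed from the record set.&lt;/p&gt;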

&lt;p&gt;Service registration is complicated and fairly static in their setup since they manage all zone files
under source control.  Discovery uses a number of different DNS client libraries and custom tools.  They
also run DNS caches on their services to minimize load on the root DNS server.&lt;/p&gt;

&lt;p&gt;They mention at the end of their post that this model has worked well for them but they are starting
to outgrow it and are investigating Zookeeper to support both static and dynamic registration.&lt;/p&gt;

&lt;h2 id=&#34;skydns:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;SkyDNS&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/skynetservices/skydns&#34;&gt;SkyDNS&lt;/a&gt; is a relatively new project that is written in Go,
uses RAFT for consensus and also provides a client API over HTTP and DNS.  It has some
similarities to Etcd and Spotify&amp;rsquo;s DNS model and actually uses the same RAFT implementation as Etcd,
&lt;a href=&#34;https://github.com/goraft/raft&#34;&gt;go-raft&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;SkyDNS servers are clustered together, and using the RAFT protocol, elect a leader.  The SkyDNS servers
expose different endpoints for registration and discovery.&lt;/p&gt;

&lt;p&gt;For service registration, services use an HTTP based API to create an entry with a TTL.  Services must
heartbeat their state periodically.  SkyDNS also uses SRV records but extends them to also support
service version, environment, and region.&lt;/p&gt;
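&lt;p&gt;The TTL plus heartbeat pattern can be sketched in a few lines of Python.  This is a toy in-memory model of the idea, not SkyDNS code; the class name &lt;code&gt;TTLRegistry&lt;/code&gt; and its methods are invented for the example.&lt;/p&gt;

&lt;pre&gt;
```python
import time


class TTLRegistry:
    """Toy in-memory registry where entries expire unless they are refreshed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # name -> (address, expiry time)

    def register(self, name, address, ttl):
        """Create or refresh an entry; calling this again acts as a heartbeat."""
        self._entries[name] = (address, self._clock() + ttl)

    def lookup(self, name):
        """Return the address if the entry is still live, otherwise None."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The service stopped heartbeating, so drop the stale entry.
            del self._entries[name]
            return None
        return address
```
&lt;/pre&gt;

&lt;p&gt;A service that crashes simply stops calling &lt;code&gt;register&lt;/code&gt;, and its entry disappears once the TTL has elapsed without any explicit deregistration.&lt;/p&gt;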

&lt;p&gt;For discovery, clients use DNS and retrieve the SRV records for the services they need to contact.
Clients need to implement any load balancing or failover and will likely cache and refresh service
location data periodically.&lt;/p&gt;

&lt;p&gt;Unlike Spotify&amp;rsquo;s use of DNS, SkyDNS does support dynamic service registration and is able to do this
without depending on another external service such as Zookeeper or Etcd.&lt;/p&gt;

&lt;p&gt;If you are using &lt;a href=&#34;http://docker.io&#34;&gt;docker&lt;/a&gt;, &lt;a href=&#34;https://github.com/crosbymichael/skydock&#34;&gt;skydock&lt;/a&gt; might be worth checking out to integrate your containers with SkyDNS automatically.&lt;/p&gt;

&lt;p&gt;Overall, this is an interesting mix of old (DNS) and new (Go, RAFT) technology and it will be interesting
to see how the project evolves.&lt;/p&gt;

&lt;h1 id=&#34;summary:eabab4b56ce615c8817b5fecb3a27fdf&#34;&gt;Summary&lt;/h1&gt;

&lt;p&gt;We&amp;rsquo;ve looked at a number of general purpose, strongly consistent registries (Zookeeper, Doozer, Etcd)
as well as many custom built, eventually consistent ones (SmartStack, Eureka, NSQ, Serf, Spotify&amp;rsquo;s DNS, SkyDNS).&lt;/p&gt;

&lt;p&gt;Many use embedded client libraries (Eureka, NSQ, etc.) and some use separate sidekick processes
(SmartStack, Serf).&lt;/p&gt;

&lt;p&gt;Interestingly, of the dedicated solutions, all of them have adopted a design that
prefers availability over consistency.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;AP or CP&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Dependencies&lt;/th&gt;
&lt;th&gt;Integration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Zookeeper&lt;/td&gt;
&lt;td&gt;General&lt;/td&gt;
&lt;td&gt;CP&lt;/td&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;JVM&lt;/td&gt;
&lt;td&gt;Client Binding&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Doozer&lt;/td&gt;
&lt;td&gt;General&lt;/td&gt;
&lt;td&gt;CP&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Client Binding&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Etcd&lt;/td&gt;
&lt;td&gt;General&lt;/td&gt;
&lt;td&gt;Mixed (1)&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Client Binding/HTTP&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;SmartStack&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;AP&lt;/td&gt;
&lt;td&gt;Ruby&lt;/td&gt;
&lt;td&gt;haproxy/Zookeeper&lt;/td&gt;
&lt;td&gt;Sidekick (nerve/synapse)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Eureka&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;AP&lt;/td&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;JVM&lt;/td&gt;
&lt;td&gt;Java Client&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;NSQ (lookupd)&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;AP&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Client Binding&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Serf&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;AP&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Local CLI&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Spotify (DNS)&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;AP&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Bind&lt;/td&gt;
&lt;td&gt;DNS Library&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;SkyDNS&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;Mixed (2)&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;HTTP/DNS Library&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;(1) Unless the &lt;code&gt;consistent&lt;/code&gt; parameter is used, inconsistent reads are possible&lt;/p&gt;

&lt;p&gt;(2) If using a caching DNS client in front of SkyDNS, reads could be inconsistent&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Fluentd vs Logstash</title>
            <link>http://jasonwilder.com/blog/2013/11/19/fluentd-vs-logstash/</link>
            <pubDate>Tue, 19 Nov 2013 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2013/11/19/fluentd-vs-logstash/</guid>
            <description>

&lt;p&gt;&lt;a href=&#34;http://fluentd.org&#34;&gt;Fluentd&lt;/a&gt; and &lt;a href=&#34;http://logstash.net&#34;&gt;Logstash&lt;/a&gt; are two open-source projects that
focus on the problem of centralized logging.  Both projects address the &lt;a href=&#34;http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/&#34;&gt;collection and transport&lt;/a&gt;
aspect of centralized logging using different approaches.&lt;/p&gt;

&lt;p&gt;This post will walk through a sample deployment to see how each differs from the other.  We&amp;rsquo;ll look
at the dependencies, features, deployment architecture and potential issues.  The point is not to figure out
which one is the best, but rather to see which one would be a better fit for your environment.&lt;/p&gt;

&lt;p&gt;The example setup we&amp;rsquo;ll walk through is collecting web server logs on multiple hosts and archiving
them to S3:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;http://jasonwilder.com/images/centralized-logging-s3.png&#34; alt=&#34;Centralized Logs With S3&#34; /&gt;&lt;/p&gt;

&lt;p&gt;This type of architecture would be suitable for archival or processing with
&lt;a href=&#34;http://hive.apache.org/&#34;&gt;Hive&lt;/a&gt; or &lt;a href=&#34;http://pig.apache.org/&#34;&gt;Pig&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another common architecture is storing logs in &lt;a href=&#34;http://www.elasticsearch.org/&#34;&gt;ElasticSearch&lt;/a&gt; to
make them searchable with &lt;a href=&#34;http://www.elasticsearch.org/overview/kibana/&#34;&gt;Kibana&lt;/a&gt;
or &lt;a href=&#34;http://graylog2.org/&#34;&gt;Graylog2&lt;/a&gt;. Setting that up is somewhat independent of using Logstash
or Fluentd so I&amp;rsquo;ve left that out to keep things simple.&lt;/p&gt;

&lt;h2 id=&#34;installation-requirements:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Installation Requirements&lt;/h2&gt;

&lt;h3 id=&#34;logstash:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Logstash&lt;/h3&gt;

&lt;p&gt;Logstash is a &lt;a href=&#34;http://jruby.org/&#34;&gt;JRuby&lt;/a&gt; based application which requires the JVM to run.  Since it runs on the JVM, it
can run anywhere the JVM does, which usually means Linux, Mac OSX, and Windows.  The package is shipped
as a single executable jar file which makes it very easy to install.&lt;/p&gt;

&lt;p&gt;One of the downsides of depending on the JVM is that its memory footprint can be higher than you
would want for transporting logs.  Fortunately, &lt;a href=&#34;https://github.com/elasticsearch/logstash-forwarder&#34;&gt;Lumberjack&lt;/a&gt;
can be run on individual hosts to collect and ship logs and Logstash can be run on the
centralized log hosts.&lt;/p&gt;

&lt;p&gt;Lumberjack is a Go based project with a much smaller memory
and CPU footprint. Deployment is as straightforward as with Logstash and basically consists of installing a single
binary.  The project provides &lt;code&gt;deb&lt;/code&gt; and &lt;code&gt;rpm&lt;/code&gt; packages to make it easier to deploy.
An SSL certificate is required to set up authentication between Lumberjack and Logstash which is
a little more complicated, but a nice benefit is that encrypted transport is the default.&lt;/p&gt;

&lt;h3 id=&#34;fluentd:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Fluentd&lt;/h3&gt;

&lt;p&gt;Fluentd is a &lt;a href=&#34;http://en.wikipedia.org/wiki/Ruby_MRI&#34;&gt;CRuby&lt;/a&gt; application which requires Ruby 1.9.2 or later.  There is an open-source
version, fluentd, as well as a commercial version, td-agent.  Fluentd runs on Linux and Mac OSX,
but &lt;a href=&#34;http://docs.fluentd.org/articles/faq#does-fluentd-run-on-windows&#34;&gt;does not run on Windows&lt;/a&gt; currently.&lt;/p&gt;

&lt;p&gt;For larger installs, they recommend using &lt;a href=&#34;http://www.canonware.com/jemalloc/&#34;&gt;jemalloc&lt;/a&gt; to
avoid memory fragmentation.  This is included in the &lt;code&gt;deb&lt;/code&gt; and &lt;code&gt;rpm&lt;/code&gt; packages but needs to be installed
manually if using the open-source version.&lt;/p&gt;

&lt;p&gt;If you use the open-source version, you&amp;rsquo;ll need to install Fluentd from source or via &lt;code&gt;gem install&lt;/code&gt;.
Since Fluentd is primarily developed by a commercial company, their &lt;code&gt;deb&lt;/code&gt; and &lt;code&gt;rpm&lt;/code&gt; packages are
configured to send data to their hosted centralized logging platform.&lt;/p&gt;

&lt;p&gt;Apart from Ruby, they also recommend running &lt;code&gt;ntpd&lt;/code&gt;, which you should be running anyway.&lt;/p&gt;

&lt;h2 id=&#34;feature-comparison:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Feature Comparison&lt;/h2&gt;

&lt;p&gt;Logstash supports a number of inputs, codecs, filters and outputs.  Inputs are sources of data.
Codecs essentially convert an incoming format into an internal Logstash representation as well
as convert back out to an output format.  These are usually used if the incoming message is not
just a single line of text.  Filters are processing actions on events and
allow you to modify events or drop events as they are processed.  Finally, outputs are destinations
where events can be routed.&lt;/p&gt;

&lt;p&gt;Fluentd is similar in that it has inputs and outputs and a matching mechanism to route
log data between destinations.  Internally,
log messages are converted to JSON which provides structure to an unstructured log message.
Messages can be tagged and then routed to different outputs.&lt;/p&gt;

&lt;p&gt;Both projects have very similar capabilities and highlighting the difference between them from a
feature standpoint is difficult.  They both have plugin models that allow you to extend their functionality
if needed.  They also have rich repositories of plugins already available.&lt;/p&gt;

&lt;p&gt;Probably the most significant difference between Fluentd and Logstash is their design focus.
&lt;em&gt;Logstash emphasizes flexibility and interoperability&lt;/em&gt; whereas
&lt;em&gt;Fluentd prioritizes simplicity and robustness&lt;/em&gt;.  This does not mean that
Logstash is not robust or Fluentd is not flexible, rather each has prioritized
features differently.&lt;/p&gt;

&lt;p&gt;Fluentd has fewer inputs than Logstash, out of the box, but many of the inputs
and outputs have built-in support for buffering, load-balancing, timeouts and retries.  These
types of features are necessary for ensuring data is reliably delivered.&lt;/p&gt;

&lt;p&gt;For example, the
&lt;a href=&#34;http://docs.fluentd.org/articles/out_forward&#34;&gt;out_forward&lt;/a&gt; plugin used to transfer logs from one
fluentd instance to another has many robustness options that can be configured to ensure messages
are delivered reliably.&lt;/p&gt;

&lt;h2 id=&#34;architecture-comparison:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Architecture comparison&lt;/h2&gt;

&lt;p&gt;From a deployment architecture standpoint, both frameworks are very similar.  With Logstash, each
web server would be configured to run Lumberjack and tail the web server logs.  Lumberjack would forward the logs
to a server running Logstash with a Lumberjack input.  The Logstash server would also have an
output configured using the &lt;a href=&#34;http://logstash.net/docs/1.2.2/outputs/s3&#34;&gt;S3 output&lt;/a&gt;.  Since Lumberjack
requires SSL certs, the log transfers would be encrypted from the web server to the log server.&lt;/p&gt;

&lt;p&gt;With fluentd, each web server would run fluentd and tail the web server logs and forward them to
another server running fluentd as well.  This server would be configured to receive logs and write
them to S3 using the &lt;a href=&#34;https://github.com/fluent/fluent-plugin-s3&#34;&gt;S3 plugin&lt;/a&gt;.  Fluentd does not
support encryption so logs would be transferred unencrypted.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: &lt;a href=&#34;https://twitter.com/repeatedly/status/402869393148760064&#34;&gt;@repeatedly&lt;/a&gt; pointed me to the
&lt;a href=&#34;https://github.com/tagomoris/fluent-plugin-secure-forward&#34;&gt;fluent-plugin-secure-forward&lt;/a&gt;
that some companies are using for encrypted transport.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&#34;improving-availability:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Improving Availability&lt;/h3&gt;

&lt;p&gt;One central log server is a single point of failure.  What happens if we wanted to have more than
one central log server?&lt;/p&gt;

&lt;p&gt;Lumberjack can be configured to use &lt;a href=&#34;https://github.com/elasticsearch/logstash-forwarder#configuring&#34;&gt;multiple servers&lt;/a&gt;
but will only send logs to one until that one fails.  If that happens, previously collected log data
won&amp;rsquo;t be accessible until that host is resurrected.  Essentially, it supports a master with hot-standby
servers.&lt;/p&gt;

&lt;p&gt;Fluentd on the other hand can forward a copy of the logs to each server if needed, load-balance
between multiple hosts or have a master with a hot-standby in case of failure.  There are a lot of
options for not only improving availability but also scalability if your log volume increases
substantially. Keep in mind that if you forward multiple copies, this could create duplicate logs
in S3 which might need to be handled when you analyze them.&lt;/p&gt;

&lt;h2 id=&#34;potential-issues:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Potential Issues&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&#34;http://logstash.net/docs/1.2.2/tutorials/getting-started-centralized&#34;&gt;Logstash docs&lt;/a&gt; suggest
using &lt;a href=&#34;http://redis.io&#34;&gt;Redis&lt;/a&gt; as the receiving output if you run Logstash (not Lumberjack) on each
host.  This setup is based on Redis Lists and/or Pub/Sub which can lose messages if
the receiver dies after removing the message from Redis and before it has had a chance to forward it
along.  Additionally, Redis would need to be configured with &lt;a href=&#34;http://redis.io/topics/persistence&#34;&gt;AOF&lt;/a&gt;
to minimize the chance of lost messages if Redis were to fail.&lt;/p&gt;

&lt;p&gt;There is a document describing &lt;a href=&#34;http://logstash.net/docs/1.2.2/life-of-an-event&#34;&gt;the life of an event&lt;/a&gt;
that discusses some of the failure modes and how Logstash addresses them.  One important point is
that outputs are responsible for retrying events in the case of errors.  There are also internal,
ephemeral queues within Logstash that can hold up to 20 events.  Depending on the failure, there is
a window for messages to be lost.&lt;/p&gt;

&lt;p&gt;If you absolutely cannot risk losing messages, make sure you investigate all the failure modes and
whether the plugins you are using are implemented correctly to handle them.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: &lt;a href=&#34;https://logstash.jira.com/browse/LOGSTASH-1631&#34;&gt;LOGSTASH-1631&lt;/a&gt; is a bug that
demonstrates one way messages can be lost. It appears the internal messaging is going to be
replaced with a more reliable implementation in the future.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&#34;conclusion:d94954b9a471ac6eb86d0c66c558309a&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Both Logstash and Fluentd are viable centralized logging frameworks that can transfer logs from
multiple hosts to a central location.  Logstash is incredibly flexible with many input and output
plugins whereas fluentd provides fewer input and output sources but provides multiple options
for reliable and robust transport.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Realtime Web Server Log Metrics</title>
            <link>http://jasonwilder.com/blog/2013/07/22/realtime-web-server-log-metrics/</link>
            <pubDate>Mon, 22 Jul 2013 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2013/07/22/realtime-web-server-log-metrics/</guid>
            <description>&lt;p&gt;This is a sample config that uses &lt;a href=&#34;http://nxlog-ce.sourceforge.net&#34;&gt;nxlog&lt;/a&gt; to tail web access logs in
Combined Log Format, pull out the status code and bytes sent and send them to statsd so they can be graphed
using Graphite.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s a simple way to see if your web server is returning errors over time or how much data it&amp;rsquo;s sending.  The same
concept could be used for other log files.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/etsy/logster&#34;&gt;Logster&lt;/a&gt; lets you do similar things but custom parsing
is accomplished by writing Python plugins which can be a little more complicated than using configuration
files.&lt;/p&gt;

&lt;p&gt;nxlog works by defining inputs, processors, outputs and routes that tie the inputs to processors and finally to outputs.&lt;/p&gt;

&lt;p&gt;For this example, I use the &lt;a href=&#34;http://nxlog-ce.sourceforge.net/nxlog-docs/en/nxlog-reference-manual.html#im_file&#34;&gt;im_file&lt;/a&gt;
module which will tail one or more files and send each line along for processing.  nxlog will remember the last line read
by default so restarts or connectivity issues are handled gracefully.  After each line is read, the
line is passed to an &lt;em&gt;Exec&lt;/em&gt; statement that parses each field into variables.&lt;/p&gt;

&lt;p&gt;The parsing could be done in the processor but because of how the routes are set up, I chose to do it during input so
it only happens once.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;c&#34;&gt;# Tail access log in Combined Log Format and parse out the fields.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Input&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;in_nginx&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;  im_file
    &lt;span class=&#34;nb&#34;&gt;File&lt;/span&gt;	&lt;span class=&#34;s2&#34;&gt;&amp;quot;/var/log/nginx/access.log&amp;quot;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;    if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;(\S+) (.+) HTTP.\d\.\d\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt; (\d+) (\d+) \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;([^\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;]+)\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt; \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;([^\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;]+)\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;/\
                { \
                  $Hostname = $1; \
                  if $3 != &amp;#39;-&amp;#39; $AccountName = $3; \
                  $EventTime = parsedate($4); \
                  $HTTPMethod = $5; \
                  $HTTPURL = $6; \
                  $HTTPResponseStatus = $7; \
                  $HTTPBytesSent = $8; \
                  $HTTPReferer = $9; \
                  $HTTPUserAgent = $10; \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Input&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;
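&lt;p&gt;If you want to check the pattern outside of nxlog, the same regular expression can be exercised in Python.  This is only a sketch for experimenting with the field layout: the sample log line and the name &lt;code&gt;parse_line&lt;/code&gt; are made up, and the timestamp is left as a string rather than parsed.&lt;/p&gt;

&lt;pre&gt;
```python
import re

# Same field layout as the Exec regex above: host, ident, user, timestamp,
# request line, status, bytes sent, referer and user agent.
COMBINED = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP.\d\.\d" '
    r'(\d+) (\d+) "([^"]+)" "([^"]+)"'
)

# Hypothetical access log line in Combined Log Format.
SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)


def parse_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    host, _ident, user, when, method, url, status, sent, referer, agent = match.groups()
    return {
        "Hostname": host,
        "AccountName": None if user == "-" else user,
        "EventTime": when,
        "HTTPMethod": method,
        "HTTPURL": url,
        "HTTPResponseStatus": int(status),
        "HTTPBytesSent": int(sent),
        "HTTPReferer": referer,
        "HTTPUserAgent": agent,
    }


print(parse_line(SAMPLE_LINE))
```
&lt;/pre&gt;

&lt;p&gt;Note that a line whose bytes field is &lt;code&gt;-&lt;/code&gt; will not match, since the pattern expects digits there.&lt;/p&gt;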

&lt;p&gt;There are two processors configured: &lt;em&gt;http_status&lt;/em&gt; and &lt;em&gt;http_bytes&lt;/em&gt;.  Each one checks to make sure the required
data has been parsed and then rewrites the event into a &lt;a href=&#34;https://github.com/b/statsd_spec&#34;&gt;statsd protocol&lt;/a&gt;
metric.  Both processors create a counter metric that statsd will sum up and roll-over at every interval.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;c&#34;&gt;# Rewrite the log message to a statsd counter event using the HTTP status code.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Processor&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http_status&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;      pm_null
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;        if defined($HTTPResponseStatus) { \
		          $raw_event = &lt;span class=&#34;s2&#34;&gt;&amp;quot;http.status.&amp;quot;&lt;/span&gt; + $HTTPResponseStatus + &lt;span class=&#34;s2&#34;&gt;&amp;quot;:1|c&amp;quot;&lt;/span&gt;; \
                } else { \
                  drop(); \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Processor&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Rewrite the log message to a statsd counter event using the bytes sent.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Processor&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http_bytes&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;      pm_null
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;        if defined($HTTPBytesSent) { \
		          $raw_event = &lt;span class=&#34;s2&#34;&gt;&amp;quot;http.bytes:&amp;quot;&lt;/span&gt; + $HTTPBytesSent + &lt;span class=&#34;s2&#34;&gt;&amp;quot;|c&amp;quot;&lt;/span&gt;; \
                } else { \
                  drop(); \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Processor&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;
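&lt;p&gt;For reference, the counter lines these processors emit have the form &lt;code&gt;name:value|c&lt;/code&gt;.  The Python sketch below builds the same strings and sends them over UDP; the helper names are invented for the example, and the default host and port simply mirror the local statsd address used in this post.&lt;/p&gt;

&lt;pre&gt;
```python
import socket


def counter(name, value=1):
    """Format a statsd counter datagram such as `http.status.200:1|c`."""
    return "{}:{}|c".format(name, value)


def send(metric, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric over UDP; no reply is expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(metric.encode("ascii"), (host, port))
    finally:
        sock.close()


print(counter("http.status.200"))
print(counter("http.bytes", 2326))
```
&lt;/pre&gt;

&lt;p&gt;Calling &lt;code&gt;send(counter(&amp;quot;http.status.200&amp;quot;))&lt;/code&gt; by hand is a quick way to confirm that statsd and Graphite are wired up before pointing nxlog at them.&lt;/p&gt;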

&lt;p&gt;The output is an &lt;a href=&#34;http://nxlog-ce.sourceforge.net/nxlog-docs/en/nxlog-reference-manual.html#om_udp&#34;&gt;om_udp&lt;/a&gt;
module that forwards the re-written log event to a statsd instance.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;c&#34;&gt;# Statsd uses UDP&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Output&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;out_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;	om_udp
    &lt;span class=&#34;nb&#34;&gt;Host&lt;/span&gt;	&lt;span class=&#34;m&#34;&gt;127.0.0.1&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Port&lt;/span&gt;	&lt;span class=&#34;m&#34;&gt;8125&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Output&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;

&lt;p&gt;Finally, two routes are created that originate from the same input file but are sent to separate processors and back
to the same output.  Since processors are normally chained together and I&amp;rsquo;m re-writing the event, chaining them
together doesn&amp;rsquo;t work.  Instead I need a new copy of the log message for each processor.&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;c&#34;&gt;# Route nginx access log through status processor and out to statsd&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Route&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;web_status_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Path&lt;/span&gt;        in_nginx =&amp;gt; http_status =&amp;gt; out_statsd
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Route&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Since we re-wrote the log event, define a new route for bytes sent&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Route&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;web_bytes_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Path&lt;/span&gt;        in_nginx =&amp;gt; http_bytes =&amp;gt; out_statsd
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Route&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;

&lt;p&gt;And the whole thing put together:&lt;/p&gt;

&lt;p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre&gt;&lt;span class=&#34;c&#34;&gt;# Tail access log in Combined Log Format and parse out the fields.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Input&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;in_nginx&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;  im_file
    &lt;span class=&#34;nb&#34;&gt;File&lt;/span&gt;	&lt;span class=&#34;s2&#34;&gt;&amp;quot;/var/log/nginx/access.log&amp;quot;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;    if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;(\S+) (.+) HTTP.\d\.\d\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt; (\d+) (\d+) \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;([^\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;]+)\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt; \&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;([^\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;]+)\&lt;span class=&#34;err&#34;&gt;&amp;quot;&lt;/span&gt;/\
                { \
                  $Hostname = $1; \
                  if $3 != &amp;#39;-&amp;#39; $AccountName = $3; \
                  $EventTime = parsedate($4); \
                  $HTTPMethod = $5; \
                  $HTTPURL = $6; \
                  $HTTPResponseStatus = $7; \
                  $HTTPBytesSent = $8; \
                  $HTTPReferer = $9; \
                  $HTTPUserAgent = $10; \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Input&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Rewrite the log message to a statsd counter event using the HTTP status code.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Processor&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http_status&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;      pm_null
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;        if defined($HTTPResponseStatus) { \
		          $raw_event = &lt;span class=&#34;s2&#34;&gt;&amp;quot;http.status.&amp;quot;&lt;/span&gt; + $HTTPResponseStatus + &lt;span class=&#34;s2&#34;&gt;&amp;quot;:1|c&amp;quot;&lt;/span&gt;; \
                } else { \
                  drop(); \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Processor&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Rewrite the log message to a statsd counter event using the bytes sent.&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Processor&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;http_bytes&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;      pm_null
    &lt;span class=&#34;nb&#34;&gt;Exec&lt;/span&gt;        if defined($HTTPBytesSent) { \
		          $raw_event = &lt;span class=&#34;s2&#34;&gt;&amp;quot;http.bytes:&amp;quot;&lt;/span&gt; + $HTTPBytesSent + &lt;span class=&#34;s2&#34;&gt;&amp;quot;|c&amp;quot;&lt;/span&gt;; \
                } else { \
                  drop(); \
                }
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Processor&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Statsd uses UDP&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Output&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;out_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Module&lt;/span&gt;	om_udp
    &lt;span class=&#34;nb&#34;&gt;Host&lt;/span&gt;	&lt;span class=&#34;m&#34;&gt;127.0.0.1&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Port&lt;/span&gt;	&lt;span class=&#34;m&#34;&gt;8125&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Output&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Route nginx access log through status processor and out to statsd&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Route&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;web_status_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Path&lt;/span&gt;        in_nginx =&amp;gt; http_status =&amp;gt; out_statsd
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Route&amp;gt;&lt;/span&gt;

&lt;span class=&#34;c&#34;&gt;# Since we re-wrote the log event, define a new route for bytes sent&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Route&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;web_bytes_statsd&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;Path&lt;/span&gt;        in_nginx =&amp;gt; http_bytes =&amp;gt; out_statsd
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Route&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/p&gt;
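
&lt;p&gt;To make the data flow concrete, the following Python sketch shows roughly what the configuration above does to a single
access log line.  It is only an illustration and not part of nxlog: the function name to_statsd and the sample log line are
made up, the counter name http.bytes is an assumption, and the regular expression simply mirrors the one in the Input block.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

# Same capture groups as the Exec regex in the nxlog Input block.
LOG_RE = re.compile(
    r&#39;^(\S+) (\S+) (\S+) \[([^\]]+)\] &amp;quot;(\S+) (.+) HTTP.\d\.\d&amp;quot; &#39;
    r&#39;(\d+) (\d+) &amp;quot;([^&amp;quot;]+)&amp;quot; &amp;quot;([^&amp;quot;]+)&amp;quot;&#39;
)

def to_statsd(line):
    &amp;quot;&amp;quot;&amp;quot;Return the two statsd counter events for one access log line.&amp;quot;&amp;quot;&amp;quot;
    match = LOG_RE.match(line)
    if match is None:
        return []  # like drop(): lines that do not parse produce no metrics
    status, bytes_sent = match.group(7), match.group(8)
    return [&#39;http.status.%s:1|c&#39; % status, &#39;http.bytes:%s|c&#39; % bytes_sent]

# Hypothetical log line in Combined Log Format.
line = (&#39;10.0.0.1 - - [13/Oct/2014:10:00:00 +0000] &amp;quot;GET / HTTP/1.1&amp;quot; &#39;
        &#39;200 612 &amp;quot;-&amp;quot; &amp;quot;curl/7.35.0&amp;quot;&#39;)
print(to_statsd(line))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each returned string corresponds to one UDP datagram that the out_statsd output would send.&lt;/p&gt;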
</description>
        </item>
        
        <item>
            <title>Centralized Logging Architecture</title>
            <link>http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/</link>
            <pubDate>Tue, 16 Jul 2013 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/</guid>
            <description>

&lt;p&gt;In &lt;a href=&#34;http://jasonwilder.com/blog/2012/01/03/centralized-logging/&#34;&gt;Centralized Logging&lt;/a&gt;, I covered a few tools that help
with the problem of centralized logging.  Many of these tools address only a portion of the problem
which means you need to use several of them together to build a robust solution.&lt;/p&gt;

&lt;p&gt;The main aspects you will need to address are &lt;em&gt;collection, transport, storage&lt;/em&gt;, and &lt;em&gt;analysis&lt;/em&gt;.  In some cases, you may
also want an &lt;em&gt;alerting&lt;/em&gt; capability.&lt;/p&gt;

&lt;h3 id=&#34;collection:a45c0fcecce9d83887cd906ffc8b7137&#34;&gt;Collection&lt;/h3&gt;

&lt;p&gt;Applications create logs in different ways: some log through syslog, while others log directly to files.  If you
consider a typical web application running on a Linux host, there will be a dozen or more log files
in /var/log as well as a few application-specific logs in home directories or other locations.&lt;/p&gt;

&lt;p&gt;If you are supporting a web-based application and your developers or operations staff need access to log data
quickly in order to troubleshoot live issues, you need a solution that is able to monitor changes to log files in
near real-time.  If you are using a file replication approach where files are replicated to a central server
on a fixed schedule, then you can only inspect logs as frequently as the replication runs.  Even a one-minute rsync cron
job might not be fast enough when your site is down and you are waiting for the relevant log data to be replicated.&lt;/p&gt;

&lt;p&gt;On the other hand, if you need to analyze log data offline for calculating metrics or other batch related work,
a file replication strategy might be a good fit.&lt;/p&gt;

&lt;h3 id=&#34;transport:a45c0fcecce9d83887cd906ffc8b7137&#34;&gt;Transport&lt;/h3&gt;

&lt;p&gt;Log data can accumulate quickly on multiple hosts.  Transporting it reliably and quickly to your centralized
location may need additional tooling in order to effectively transmit it and ensure data is not lost.&lt;/p&gt;

&lt;p&gt;Frameworks such as &lt;a href=&#34;https://github.com/facebook/scribe&#34;&gt;Scribe&lt;/a&gt;, &lt;a href=&#34;http://flume.apache.org/&#34;&gt;Flume&lt;/a&gt;,
&lt;a href=&#34;https://github.com/mozilla-services/heka&#34;&gt;Heka&lt;/a&gt;, &lt;a href=&#34;http://logstash.net/&#34;&gt;Logstash&lt;/a&gt;,
&lt;a href=&#34;http://incubator.apache.org/chukwa/&#34;&gt;Chukwa&lt;/a&gt;, &lt;a href=&#34;http://fluentd.org/&#34;&gt;fluentd&lt;/a&gt;,
&lt;a href=&#34;https://github.com/bitly/nsq&#34;&gt;nsq&lt;/a&gt; and &lt;a href=&#34;http://kafka.apache.org/&#34;&gt;Kafka&lt;/a&gt; are designed for transporting large
volumes of data from one host to another reliably.  Although each of these frameworks addresses the transport
problem, they do so quite differently.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href=&#34;https://github.com/facebook/scribe&#34;&gt;Scribe&lt;/a&gt;, &lt;a href=&#34;https://github.com/bitly/nsq&#34;&gt;nsq&lt;/a&gt;
and &lt;a href=&#34;http://kafka.apache.org/&#34;&gt;Kafka&lt;/a&gt; require clients to log data via their APIs.  Typically, application
code is written to log directly to these systems, which reduces latency and improves reliability.  If
you want to centralize typical log file data, you would need something to tail the files and stream the logs via the respective APIs.
If you control the app that is logging the data you want to collect, these can be much more efficient.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://logstash.net/&#34;&gt;Logstash&lt;/a&gt;, &lt;a href=&#34;https://github.com/mozilla-services/heka&#34;&gt;Heka&lt;/a&gt;,
&lt;a href=&#34;http://fluentd.org/&#34;&gt;fluentd&lt;/a&gt; and &lt;a href=&#34;http://flume.apache.org/&#34;&gt;Flume&lt;/a&gt; provide a number of input
sources but also support natively tailing files and transporting them reliably.  These are a better fit for
more general log collection.&lt;/p&gt;

&lt;p&gt;While &lt;a href=&#34;http://rsyslog.com/&#34;&gt;rsyslog&lt;/a&gt; and &lt;a href=&#34;http://www.balabit.com/network-security/syslog-ng&#34;&gt;Syslog-ng&lt;/a&gt;
are typically thought of as the de facto log collectors, not all applications use syslog.&lt;/p&gt;

&lt;h3 id=&#34;storage:a45c0fcecce9d83887cd906ffc8b7137&#34;&gt;Storage&lt;/h3&gt;

&lt;p&gt;Now that your log data is being transferred, it needs a destination.  Your centralized storage system needs
to be able to handle the growth in data over time.  Each
day will add an amount of data that is roughly proportional to the number of hosts and processes that are generating
log data.&lt;/p&gt;

&lt;p&gt;How you store things depends on a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;How long should it be stored&lt;/em&gt; - If the logs are for long-term, archival purposes and do not require immediate
analysis, &lt;a href=&#34;http://aws.amazon.com/s3/&#34;&gt;S3&lt;/a&gt;, &lt;a href=&#34;http://aws.amazon.com/glacier/&#34;&gt;AWS Glacier&lt;/a&gt;,
or tape backup might be a suitable option since they provide relatively low cost for
large volumes of data.  If you only need a few days&amp;rsquo; or months&amp;rsquo; worth of logs, storing them on some form of distributed
storage system such as &lt;a href=&#34;http://hadoop.apache.org/docs/stable/hdfs_design.html&#34;&gt;HDFS&lt;/a&gt;,
&lt;a href=&#34;http://cassandra.apache.org/&#34;&gt;Cassandra&lt;/a&gt;, &lt;a href=&#34;http://www.mongodb.org/&#34;&gt;MongoDB&lt;/a&gt; or
&lt;a href=&#34;http://elasticsearch.org&#34;&gt;ElasticSearch&lt;/a&gt; also works well.  If you only need a few
hours&amp;rsquo; worth of retention for real-time analysis, &lt;a href=&#34;http://redis.io&#34;&gt;Redis&lt;/a&gt; might work as well.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;em&gt;Your environment&amp;rsquo;s data volume&lt;/em&gt; - A day&amp;rsquo;s worth of logs for Google is much different than a day&amp;rsquo;s worth of logs for
ACME Fishing Supplies.  The storage system you choose should allow you to scale out horizontally if your
data volume will be large.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;em&gt;How will you need to access the logs&lt;/em&gt; - Some storage is not suitable for real-time or even batch analysis.
AWS Glacier or tape backup can take hours to load a file.  These don&amp;rsquo;t work if you need log access for production
troubleshooting.  If you plan to do more interactive data analysis,
storing log data in &lt;a href=&#34;http://elasticsearch.org&#34;&gt;ElasticSearch&lt;/a&gt; or &lt;a href=&#34;http://hadoop.apache.org/docs/stable/hdfs_design.html&#34;&gt;HDFS&lt;/a&gt;
may allow you to work with the raw data more effectively.  Some log data is
so large that it can only be analyzed in more batch-oriented frameworks.  The de facto standard in this case is &lt;a href=&#34;http://hadoop.apache.org/&#34;&gt;Apache Hadoop&lt;/a&gt; along
with &lt;a href=&#34;http://hadoop.apache.org/docs/stable/hdfs_design.html&#34;&gt;HDFS&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;analysis:a45c0fcecce9d83887cd906ffc8b7137&#34;&gt;Analysis&lt;/h3&gt;

&lt;p&gt;Once your logs are stored on a centralized storage platform, you need a way to analyze them.  The most common approach
is a batch-oriented process that runs periodically.  If you are storing log data in
&lt;a href=&#34;http://hadoop.apache.org/docs/stable/hdfs_design.html&#34;&gt;HDFS&lt;/a&gt;, &lt;a href=&#34;http://hive.apache.org/&#34;&gt;Hive&lt;/a&gt;
or &lt;a href=&#34;http://pig.apache.org/&#34;&gt;Pig&lt;/a&gt; might
make analyzing the data easier than writing native MapReduce jobs.&lt;/p&gt;

&lt;p&gt;If you need a UI for analysis, you can store parsed log data in &lt;a href=&#34;http://elasticsearch.org&#34;&gt;ElasticSearch&lt;/a&gt;
and use a front-end such as &lt;a href=&#34;http://kibana.org/&#34;&gt;Kibana&lt;/a&gt; or &lt;a href=&#34;http://graylog2.org/&#34;&gt;Graylog2&lt;/a&gt;
to query and inspect the data.  The log parsing can be handled by &lt;a href=&#34;http://logstash.net/&#34;&gt;Logstash&lt;/a&gt;,
&lt;a href=&#34;https://github.com/mozilla-services/heka&#34;&gt;Heka&lt;/a&gt; or applications logging with JSON
directly.  This approach allows more real-time, interactive access to the data but is not really suited for large-scale
batch processing.&lt;/p&gt;

&lt;h3 id=&#34;alerting:a45c0fcecce9d83887cd906ffc8b7137&#34;&gt;Alerting&lt;/h3&gt;

&lt;p&gt;The last component that is sometimes nice to have is the ability to alert on log patterns or calculated metrics
based on log data.  Two common uses for this are error reporting and monitoring.&lt;/p&gt;

&lt;p&gt;Most log data is not interesting, but errors almost always indicate a problem.  It&amp;rsquo;s much more effective
to have the logging system email or notify the responsible parties when errors occur instead of having someone repeatedly
watch for the events.  There are several services that focus solely on application error logging, such as
&lt;a href=&#34;https://www.getsentry.com/&#34;&gt;Sentry&lt;/a&gt; or &lt;a href=&#34;https://www.honeybadger.io/&#34;&gt;HoneyBadger&lt;/a&gt;.  These can also
aggregate repetitive exceptions, which can give you an idea of how frequently an error is occurring.&lt;/p&gt;

&lt;p&gt;Another use case is monitoring.  For example, you may have hundreds of web servers and want to know if they start
returning 500 status codes.  If you can parse your web log files and record a metric on the status code, you can then trigger
alerts when that metric crosses a certain threshold.   &lt;a href=&#34;http://riemann.io&#34;&gt;Riemann&lt;/a&gt; is designed for detecting
scenarios just like this.&lt;/p&gt;
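
&lt;p&gt;As a rough illustration of threshold-based alerting, here is a minimal Python sketch that tracks the share of 5xx responses
over the most recent requests.  It is not how Riemann works internally; the class name ErrorRateAlert, the window size and the
threshold are all hypothetical choices made for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import deque

class ErrorRateAlert:
    &amp;quot;&amp;quot;&amp;quot;Track the share of 5xx responses over the last `window` requests.&amp;quot;&amp;quot;&amp;quot;

    def __init__(self, window=100, threshold=0.05):
        self.recent = deque(maxlen=window)  # 1 for a 5xx response, else 0
        self.threshold = threshold

    def record(self, status_code):
        &amp;quot;&amp;quot;&amp;quot;Record one response; return True if the error rate exceeds the threshold.&amp;quot;&amp;quot;&amp;quot;
        self.recent.append(1 if status_code // 100 == 5 else 0)
        error_rate = sum(self.recent) / len(self.recent)
        return error_rate &amp;gt; self.threshold

# Status codes as they might come out of a parsed web log.
alert = ErrorRateAlert(window=10, threshold=0.2)
for code in [200, 200, 500, 200, 502, 503]:
    if alert.record(code):
        print(&#39;alert: too many 5xx responses&#39;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A sliding window keeps the check cheap and lets the alert clear on its own once the error rate drops again.&lt;/p&gt;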

&lt;p&gt;Hopefully this helps provide a basic model for designing a centralized logging solution for your environment.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Optimizing MongoDB Indexes</title>
            <link>http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/</link>
            <pubDate>Wed, 08 Feb 2012 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/</guid>
            <description>

&lt;p&gt;Good indexes are an important part of running a well-performing application on MongoDB.  MongoDB performs best
when it can keep your indexes in RAM.  Reducing the size of your indexes also leads to faster queries and the
ability to manage more data with less RAM.&lt;/p&gt;

&lt;p&gt;These are a few tips to reduce the size of your MongoDB indexes:&lt;/p&gt;

&lt;h3 id=&#34;1-determining-indexes-sizes:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;1) Determining Index Sizes&lt;/h3&gt;

&lt;p&gt;The first thing you should do is to understand the size of your indexes.  You want to know the sizes before
you make changes to confirm that the changes have actually reduced the size.  Ideally, you are graphing
your indexes over time with your monitoring tools.&lt;/p&gt;

&lt;p&gt;Using the mongo shell, you can run db.stats() to get index stats for the whole database:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; db.stats()
{
	&amp;quot;db&amp;quot; : &amp;quot;examples1&amp;quot;,
	&amp;quot;collections&amp;quot; : 6,
	&amp;quot;objects&amp;quot; : 403787,
	&amp;quot;avgObjSize&amp;quot; : 121.9966467469235,
	&amp;quot;dataSize&amp;quot; : 49260660,
	&amp;quot;storageSize&amp;quot; : 66695168,
	&amp;quot;numExtents&amp;quot; : 20,
	&amp;quot;indexes&amp;quot; : 9,
	&amp;quot;indexSize&amp;quot; : 48524560,
	&amp;quot;fileSize&amp;quot; : 520093696,
	&amp;quot;nsSizeMB&amp;quot; : 16,
	&amp;quot;ok&amp;quot; : 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;indexes - The number of indexes in the examples1 DB&lt;/li&gt;
&lt;li&gt;indexSize - The total size of the indexes in the examples1 DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since each collection has indexes, you can run db.collection.stats() to see them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; db.address.stats()
{
	&amp;quot;ns&amp;quot; : &amp;quot;examples1.address&amp;quot;,
	&amp;quot;count&amp;quot; : 3,
	&amp;quot;size&amp;quot; : 276,
	&amp;quot;avgObjSize&amp;quot; : 92,
	&amp;quot;storageSize&amp;quot; : 8192,
	&amp;quot;numExtents&amp;quot; : 1,
	&amp;quot;nindexes&amp;quot; : 2,
	&amp;quot;lastExtentSize&amp;quot; : 8192,
	&amp;quot;paddingFactor&amp;quot; : 1,
	&amp;quot;flags&amp;quot; : 1,
	&amp;quot;totalIndexSize&amp;quot; : 16352,
	&amp;quot;indexSizes&amp;quot; : {
		&amp;quot;_id_&amp;quot; : 8176,
		&amp;quot;_types_1&amp;quot; : 8176
	},
	&amp;quot;ok&amp;quot; : 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;totalIndexSize - The size of all indexes in the collection&lt;/li&gt;
&lt;li&gt;indexSizes - A dictionary of index name and size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;NOTE: all sizes returned by these commands are in bytes.&lt;/em&gt;&lt;/p&gt;
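
&lt;p&gt;If you only want a quick look at one collection, the stats document is easy to summarize yourself.  The sketch below is a
minimal Python example that takes a dictionary shaped like the db.collection.stats() output and reports each index&amp;rsquo;s share
of the total.  The function name index_shares is made up for illustration, and the input values are copied from the output above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def index_shares(coll_stats):
    &amp;quot;&amp;quot;&amp;quot;Return (index name, size in bytes, percent of total) tuples, largest first.&amp;quot;&amp;quot;&amp;quot;
    total = coll_stats[&#39;totalIndexSize&#39;]
    rows = [(name, size, 100.0 * size / total)
            for name, size in coll_stats[&#39;indexSizes&#39;].items()]
    # Sort by size descending, then by name so ties have a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))

# Values copied from the db.address.stats() output above.
stats = {&#39;totalIndexSize&#39;: 16352,
         &#39;indexSizes&#39;: {&#39;_id_&#39;: 8176, &#39;_types_1&#39;: 8176}}
for name, size, percent in index_shares(stats):
    print(&#39;%-10s %6d bytes %5.1f%%&#39; % (name, size, percent))
&lt;/code&gt;&lt;/pre&gt;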

&lt;p&gt;These commands are useful, but they are tedious to use manually.  To make reporting on index stats easier, I wrote a
utility, index-stats.py, that can be found in the
&lt;a href=&#34;https://github.com/jwilder/mongodb-tools&#34;&gt;mongodb-tools&lt;/a&gt; project on GitHub.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(virtualenv) mongodb-tools$ ./index-stats.py
Checking DB: examples2.system.indexes
Checking DB: examples2.things
Checking DB: examples1.system.indexes
Checking DB: examples1.address
Checking DB: examples1.typeless_address
Checking DB: examples1.user
Checking DB: examples1.typeless_user

Index Overview
+----------------------------+------------------------+--------+------------+
|         Collection         |         Index          | % Size | Index Size |
+----------------------------+------------------------+--------+------------+
| examples1.address          | _id_                   |   0.0% |      7.98K |
| examples1.address          | _types_1               |   0.0% |      7.98K |
| examples1.typeless_address | _id_                   |   0.0% |      7.98K |
| examples1.typeless_user    | _id_                   |  10.1% |      6.21M |
| examples1.typeless_user    | address_id_1           |  10.1% |      6.21M |
| examples1.typeless_user    | typeless_address_ref_1 |   5.9% |      3.62M |
| examples1.user             | _id_                   |  10.1% |      6.21M |
| examples1.user             | _types_1               |   6.9% |      4.24M |
| examples1.user             | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples2.things           | _id_                   |  10.1% |      6.21M |
| examples2.things           | _types_1               |   8.4% |      5.13M |
+----------------------------+------------------------+--------+------------+

Top 5 Largest Indexes
+-------------------------+------------------------+--------+------------+
|        Collection       |         Index          | % Size | Index Size |
+-------------------------+------------------------+--------+------------+
| examples1.user          | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples1.user          | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.typeless_user | _id_                   |  10.1% |      6.21M |
| examples2.things        | _types_1               |   8.4% |      5.13M |
| examples1.user          | _types_1               |   6.9% |      4.24M |
+-------------------------+------------------------+--------+------------+

Total Documents: 600016
Total Data Size: 74.77M
Total Index Size: 61.43M
RAM Headroom: 2.84G
Available RAM Headroom: 1.04G
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The output shows the total index size, the size of each index, and their sizes relative to each other.
In addition, the five largest indexes across all your collections are reported.
This makes it easy to determine your largest indexes and the ones where reducing
their size will provide the most benefit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAM Headroom is your physical memory minus the total index size.  A positive value means there is enough RAM for the indexes to fit
in memory.&lt;/li&gt;
&lt;li&gt;Available RAM Headroom is free memory minus the total index size.  Since other processes consume memory on this system, I don&amp;rsquo;t
have the full RAM Headroom available.&lt;/li&gt;
&lt;/ul&gt;
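
&lt;p&gt;The two headroom figures are simple subtractions.  The Python sketch below spells them out; the function name ram_headroom
and the memory figures are hypothetical and are not taken from index-stats.py.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def ram_headroom(memory_bytes, index_bytes):
    &amp;quot;&amp;quot;&amp;quot;Memory left after the indexes are loaded; negative means they do not fit.&amp;quot;&amp;quot;&amp;quot;
    return memory_bytes - index_bytes

GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical host: 4 GiB of RAM, 1.5 GiB of it free, and 512 MiB of indexes.
print(ram_headroom(4 * GIB, 512 * MIB) / GIB)    # physical memory - index size
print(ram_headroom(1.5 * GIB, 512 * MIB) / GIB)  # free memory - index size
&lt;/code&gt;&lt;/pre&gt;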

&lt;p&gt;The RAM Headroom stat idea comes from the
&lt;a href=&#34;http://blog.boxedice.com/2011/02/15/mongodb-monitoring-dashboard/&#34;&gt;MongoDB monitoring service&lt;/a&gt; I use,
&lt;a href=&#34;http://serverdensity.com&#34;&gt;ServerDensity&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From this output, I would focus on the examples1.user collection and the &lt;em&gt;_types_1_address_ref_1&lt;/em&gt; and
&lt;em&gt;_types_1_address_id_1&lt;/em&gt; indexes first.&lt;/p&gt;

&lt;h3 id=&#34;2-remove-redundant-indexes:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;2) Remove Redundant Indexes&lt;/h3&gt;

&lt;p&gt;If you have been releasing code changes over a period of time, you&amp;rsquo;ll likely end up with redundant indexes.  MongoDB
can use a prefix of a compound index when a query does not include all of the indexed fields.  In the previous output,&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| examples1.user          | _types_1               |   6.9% |      4.24M |
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;is redundant with&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| examples1.user          | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples1.user          | _types_1_address_id_1  |  12.2% |      7.51M |
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;because &lt;em&gt;_types_1&lt;/em&gt; is a prefix of these two indexes.  Dropping it would save 4.2M on the total index
size and leave one less index to update when user documents change.&lt;/p&gt;

&lt;p&gt;To make it easier to find these indexes, you can run redundant-indexes.py from
&lt;a href=&#34;https://github.com/jwilder/mongodb-tools&#34;&gt;mongodb-tools&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(virtualenv)mongodb-tools$ ./redundant-indexes.py
Checking DB: examples2
Checking DB: examples1
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_ref_1]
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_id_1]
Checking DB: local
&lt;/code&gt;&lt;/pre&gt;
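
&lt;p&gt;The prefix rule itself is small enough to sketch in a few lines of Python.  This is not the actual implementation of
redundant-indexes.py; the function name redundant_indexes and the shape of the input dictionary are assumptions made for
the example, and the field lists are inferred from the index names in the output above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def redundant_indexes(indexes):
    &amp;quot;&amp;quot;&amp;quot;Yield (shorter, longer) index names where shorter is a key prefix of longer.

    `indexes` maps an index name to its ordered list of (field, direction) keys.
    &amp;quot;&amp;quot;&amp;quot;
    for name, keys in indexes.items():
        for other, other_keys in indexes.items():
            is_prefix = other_keys[:len(keys)] == keys
            if name != other and is_prefix and len(keys) != len(other_keys):
                yield name, other

# Indexes on examples1.user, with keys inferred from the index names.
user_indexes = {
    &#39;_id_&#39;: [(&#39;_id&#39;, 1)],
    &#39;_types_1&#39;: [(&#39;_types&#39;, 1)],
    &#39;_types_1_address_ref_1&#39;: [(&#39;_types&#39;, 1), (&#39;address_ref&#39;, 1)],
    &#39;_types_1_address_id_1&#39;: [(&#39;_types&#39;, 1), (&#39;address_id&#39;, 1)],
}
for shorter, longer in redundant_indexes(user_indexes):
    print(&#39;%s may be redundant with %s&#39; % (shorter, longer))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A real check would also need to look at index options: a unique index, for example, enforces a constraint and is not
redundant just because its keys are a prefix of another index.&lt;/p&gt;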

&lt;h3 id=&#34;3-compact-command:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;3) Compact Command&lt;/h3&gt;

&lt;p&gt;If you are running MongoDB 2.0+, you can run the compact command to defragment your collections and rebuild
the indexes.  The compact command locks the database, so make sure you know which node you are running it on beforehand.
If you are running with replica sets, the easiest thing to do is to run it on your secondaries one at a time, fail over
the primary to one of the compacted secondaries, and then run compact on the old primary.&lt;/p&gt;

&lt;h3 id=&#34;4-mongodb-2-0-index-improvements:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;4) MongoDB 2.0 Index Improvements&lt;/h3&gt;

&lt;p&gt;If you are not running MongoDB 2.0 or later, upgrading and rebuilding your indexes should provide about a
25% savings.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&#34;http://www.mongodb.org/display/DOCS/2.0+Release+Notes#2.0ReleaseNotes-IndexPerformanceEnhancements&#34;&gt;Index Performance Enhancements&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&#34;5-check-index-criteria:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;5) Check Index Criteria&lt;/h3&gt;

&lt;p&gt;Another thing to check is your index criteria.  You want the values that are indexed to be small and as selective
as possible.  Indexing values that do not help MongoDB find
your data faster slows queries down and increases the index size.  If you are using a mapping framework for your
application, and it supports defining indexes in the code, you should
check to see what it&amp;rsquo;s actually indexing.  For example, &lt;a href=&#34;http://mongoengine.org/&#34;&gt;MongoEngine&lt;/a&gt; for Python
uses a &amp;ldquo;_types&amp;rdquo; field to identify subclasses in the same collection.  This can add a lot of space and may not add
to the selectivity of your indexes.&lt;/p&gt;

&lt;p&gt;In my test data, my largest index is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Looking at the data for it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; db.user.findOne()
{
	&amp;quot;_id&amp;quot; : ObjectId(&amp;quot;4f2ef95c89a40a11c5000002&amp;quot;),
	&amp;quot;_types&amp;quot; : [
		&amp;quot;User&amp;quot;
	],
	&amp;quot;address_id&amp;quot; : ObjectId(&amp;quot;4f2ef95c89a40a11c5000000&amp;quot;),
	&amp;quot;address_ref&amp;quot; : {
		&amp;quot;$ref&amp;quot; : &amp;quot;address&amp;quot;,
		&amp;quot;$id&amp;quot; : ObjectId(&amp;quot;4f2ef95c89a40a11c5000000&amp;quot;)
	},
	&amp;quot;_cls&amp;quot; : &amp;quot;User&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can see that &lt;em&gt;_types&lt;/em&gt; is an array with a value of &lt;em&gt;User&lt;/em&gt;, the class name.  Since I don&amp;rsquo;t have any subclasses of &lt;em&gt;User&lt;/em&gt;
in my code, indexing this value does not help the index selectivity.  Another way of thinking about this is that each
value in the index is going to have &amp;ldquo;User&amp;rdquo; as a prefix, which adds a few extra bytes to each value and does not increase
the selectivity of the index.&lt;/p&gt;

&lt;p&gt;Removing it in the code with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from mongoengine import Document

class User(Document):
    meta = {&#39;index_types&#39;: False}  # leave _types out of the indexes
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Changes the index to:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| examples1.user             | address_ref_1          |  16.8% |     12.39M |
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;About a 23% savings.&lt;/p&gt;

&lt;p&gt;Digging in further, &lt;em&gt;address_ref&lt;/em&gt; is a &lt;em&gt;ReferenceProperty&lt;/em&gt; to an &lt;em&gt;Address&lt;/em&gt; object.  The data above shows that it is stored as a
dictionary that contains the id of the referenced document as well as the collection that it points to.  If you
change this &lt;em&gt;ReferenceProperty&lt;/em&gt; to an &lt;em&gt;ObjectIdProperty&lt;/em&gt;, which is what &lt;em&gt;address_id&lt;/em&gt; is, you can get additional savings:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| examples1.user             | address_id_1           |   9.5% |      6.21M |
| examples1.user             | address_ref_1          |  20.9% |     13.70M |

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;About a 53% savings.  This changes the index value from being stored as a serialized dictionary to just an ObjectId,
which is likely highly optimized within MongoDB.  Changing the property type does require code changes, though, and
you also lose the automatic de-referencing capability provided by &lt;em&gt;ReferenceProperties&lt;/em&gt;.  It can still produce significant
savings.&lt;/p&gt;

&lt;p&gt;In total, we&amp;rsquo;ve reduced the original index by 61% by adjusting some index criteria and making some small code changes.&lt;/p&gt;
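
&lt;p&gt;For reference, the percentages are plain relative reductions.  The short Python sketch below recomputes two of them from the
index sizes reported earlier; the helper name savings is made up for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def savings(before_mb, after_mb):
    &amp;quot;&amp;quot;&amp;quot;Percent reduction when an index shrinks from before_mb to after_mb.&amp;quot;&amp;quot;&amp;quot;
    return 100.0 * (before_mb - after_mb) / before_mb

# Sizes in MB taken from the index-stats.py output in this post.
print(round(savings(16.09, 12.39)))  # _types_1_address_ref_1 vs. address_ref_1
print(round(savings(16.09, 6.21)))   # _types_1_address_ref_1 vs. address_id_1
&lt;/code&gt;&lt;/pre&gt;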

&lt;h3 id=&#34;6-delete-move-old-data:6767c783482562a5eb48fdd85b43cdd8&#34;&gt;6) Delete/Move Old Data&lt;/h3&gt;

&lt;p&gt;In most applications, some data is accessed more frequently than other data.  If you have old data that won&amp;rsquo;t be accessed
by your users, you may be able to purge it, move it to another un-indexed collection, or archive it somewhere outside
of the DB.  Ideally, your database contains and indexes only the working set of your data.&lt;/p&gt;

&lt;p&gt;There are some other good optimization ideas that can be found here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.slideshare.net/andrew311/optimizing-mongodb-lessons-learned-at-localytics&#34;&gt;Optimizing MongoDB: Lessons Learned at Localytics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning&#34;&gt;MongoDB Performance Tuning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How do you tune your indexes?&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Centralized Logging</title>
            <link>http://jasonwilder.com/blog/2012/01/03/centralized-logging/</link>
            <pubDate>Tue, 03 Jan 2012 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2012/01/03/centralized-logging/</guid>
            <description>

&lt;p&gt;Logs are a critical part of any system.  They give you insight into what a system is doing as well as what
has happened.  Virtually every process running on a system generates logs in some form or another.
Usually, these logs are written to files on local disks.  When your system grows to multiple hosts,
managing the logs and accessing them can get complicated.  Searching for a particular error across
hundreds of log files on hundreds of servers is difficult without good tools.  A common approach to
this problem is to set up a centralized logging solution so that multiple logs can be aggregated in
a central location.&lt;/p&gt;

&lt;p&gt;So what are your options?&lt;/p&gt;

&lt;h3 id=&#34;file-replication:f034079baf6de821e361a928dc85da81&#34;&gt;File Replication&lt;/h3&gt;

&lt;p&gt;A simple approach is to set up file replication of your logs to a central server on a cron schedule.  Usually rsync and
cron are used since they are simple and straightforward to set up.  This solution can work for a while, but it doesn&amp;rsquo;t
provide timely access to log data.  It also only co-locates the logs rather than aggregating them.&lt;/p&gt;

&lt;h3 id=&#34;syslog:f034079baf6de821e361a928dc85da81&#34;&gt;Syslog&lt;/h3&gt;

&lt;p&gt;Another option that you probably already have installed is &lt;a href=&#34;http://en.wikipedia.org/wiki/Syslog&#34;&gt;syslog&lt;/a&gt;.
Most people use &lt;a href=&#34;http://rsyslog.com/&#34;&gt;rsyslog&lt;/a&gt; or &lt;a href=&#34;http://www.balabit.com/network-security/syslog-ng&#34;&gt;syslog-ng&lt;/a&gt;,
which are two syslog implementations.  These daemons allow processes to send log messages to them, and the
syslog configuration determines how they are stored.  In a centralized logging setup, a central syslog daemon is set up
on your network and the client logging daemons are set up to forward messages to the central daemon.  A good write-up
of this kind of setup can be found at:
&lt;a href=&#34;http://urbanairship.com/blog/2010/10/05/centralized-logging-using-rsyslog/&#34;&gt;Centralized Logging Using Rsyslog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Syslog is great because just about everything uses it and you likely already have it installed on your system.  With a
central syslog server, you will likely need to figure out how to scale the server and make it highly-available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.balabit.com/network-security/syslog-ng&#34;&gt;syslog-ng&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://urbanairship.com/blog/2010/10/05/centralized-logging-using-rsyslog/&#34;&gt;rsyslog&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;distributed-log-collectors:f034079baf6de821e361a928dc85da81&#34;&gt;Distributed Log Collectors&lt;/h3&gt;

&lt;p&gt;A new class of solutions has emerged that is designed for
high-volume and high-throughput log and event collection.  Most of these solutions are more general-purpose
event streaming and processing systems, and logging is just one use case that can be solved using them.
All of these have their specific features and differences, but their architectures are fairly similar.
They generally consist of logging clients and/or agents on each specific host.  The agents forward logs to a
cluster of collectors, which in turn forward the messages to a scalable storage tier.  The idea is that the
collection tier is horizontally scalable to grow with the increasing number of logging hosts and messages.  Similarly,
the storage tier is also intended to scale horizontally to grow with increased volume.  This is a gross simplification
of all of these tools, but they are a step beyond traditional syslog options.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/facebook/Scribe&#34;&gt;Scribe&lt;/a&gt;&lt;/strong&gt; - Scribe is a scalable and reliable log aggregation server
used by Facebook and released as open source.  Scribe is written in C++ and uses &lt;a href=&#34;http://thrift.apache.org/&#34;&gt;Thrift&lt;/a&gt;
for the protocol encoding.  Since it uses Thrift, virtually any language can work with it.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://cwiki.apache.org/FLUME/&#34;&gt;Flume&lt;/a&gt;&lt;/strong&gt; - Flume is an Apache project for collecting,
aggregating, and moving large amounts of log data.  It stores all this data on
&lt;a href=&#34;http://hadoop.apache.org/hdfs/&#34;&gt;HDFS&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://logstash.net&#34;&gt;logstash&lt;/a&gt;&lt;/strong&gt; - logstash lets you ship, parse and index logs from any source.  It
works by defining inputs (files, syslog, etc.), filters (grep, split, multiline, etc.) and outputs (elasticsearch,
mongodb, etc.).  It also provides a UI for accessing and searching your logs.  See
&lt;a href=&#34;http://logstash.net/docs/1.0.17/getting-started-centralized&#34;&gt;Getting Started&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://wiki.apache.org/hadoop/Chukwa&#34;&gt;Chukwa&lt;/a&gt;&lt;/strong&gt; - Chukwa is another Apache project that collects
logs onto &lt;a href=&#34;http://hadoop.apache.org/hdfs/&#34;&gt;HDFS&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://fluentd.org/doc/&#34;&gt;fluentd&lt;/a&gt;&lt;/strong&gt; - Fluentd is similar to logstash in that there are inputs and
outputs for a large variety of sources and destinations.  Some of its design tenets are easy installation and a small
footprint.  It doesn&amp;rsquo;t provide any storage tier itself but allows you to easily configure where your logs should be
collected.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://incubator.apache.org/kafka/&#34;&gt;kafka&lt;/a&gt;&lt;/strong&gt; - Kafka was developed at LinkedIn for their activity stream
processing and is now an Apache incubator project.  Although Kafka could be used for log collection, this is not its
primary use case.  Setup requires &lt;a href=&#34;http://zookeeper.apache.org/&#34;&gt;Zookeeper&lt;/a&gt;
to manage the cluster state.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://graylog2.org/&#34;&gt;Graylog2&lt;/a&gt;&lt;/strong&gt; - Graylog2 provides a UI for searching and analyzing logs.  Logs are
stored in &lt;a href=&#34;http://www.mongodb.org&#34;&gt;MongoDB&lt;/a&gt; and/or &lt;a href=&#34;http://elasticsearch.org&#34;&gt;elasticsearch&lt;/a&gt;.
Graylog2 also provides the &lt;a href=&#34;http://graylog2.org/about/gelf&#34;&gt;GELF&lt;/a&gt; logging format to overcome two issues
with syslog messages: the 1024-byte limit and the lack of structure.  If you are logging long stack traces,
you may want to look into GELF.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://www.splunk.com&#34;&gt;splunk&lt;/a&gt;&lt;/strong&gt; - Splunk is a commercial product that has been around for several years.
It provides a whole host of features for not only collecting logs but also analyzing and viewing them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
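
&lt;p&gt;The input, filter and output model described for logstash above can be illustrated with a few Python generators.
This is only a conceptual sketch, not logstash configuration or its API: every function name here is hypothetical, and a
plain list stands in for an output such as elasticsearch.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import re


def file_input(lines):
    """Input stage: turn raw lines into event dicts."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def grep_filter(events, pattern):
    """Filter stage: keep only events whose message matches the pattern."""
    regex = re.compile(pattern)
    for event in events:
        if regex.search(event["message"]):
            yield event


def split_filter(events, separator):
    """Filter stage: emit one event per separator-delimited part."""
    for event in events:
        for part in event["message"].split(separator):
            yield {"message": part.strip()}


def list_output(events, sink):
    """Output stage: a list stands in for a real store."""
    for event in events:
        sink.append(event)


raw = ["ERROR disk full; ERROR retry failed\n", "INFO started\n"]
indexed = []
list_output(split_filter(grep_filter(file_input(raw), "ERROR"), ";"), indexed)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only the line containing ERROR survives the grep stage, and the split stage turns it into two separate events
before they reach the output.&lt;/p&gt;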

&lt;p&gt;&lt;em&gt;Update: I wrote a post comparing &lt;a href=&#34;http://jasonwilder.com/blog/2013/11/19/fluentd-vs-logstash/&#34;&gt;Fluentd vs Logstash&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&#34;hosted-logging-services:f034079baf6de821e361a928dc85da81&#34;&gt;Hosted Logging Services&lt;/h3&gt;

&lt;p&gt;There are also several hosted &amp;ldquo;logging as a service&amp;rdquo; providers.  Their benefit is that you only need
to configure your syslog forwarders or agents, and they manage the collection, storage and access to the logs.  All of
the infrastructure that you would otherwise have to set up and maintain is handled by them, freeing you up to focus on your application.
Each service provides a simple setup (usually based on syslog forwarding), an API and a UI to support search and analysis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://loggly.com/&#34;&gt;loggly&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://papertrailapp.com&#34;&gt;papertrail&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://logentries.com/&#34;&gt;logentries&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I go into more detail about how all of these fit together in
&lt;a href=&#34;http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/&#34;&gt;Centralized Logging Architecture&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        
        <item>
            <title>Octopress Blogging System</title>
            <link>http://jasonwilder.com/blog/2011/12/15/octopress-blogging-system/</link>
            <pubDate>Thu, 15 Dec 2011 00:00:00 UTC</pubDate>
            
            <guid>http://jasonwilder.com/blog/2011/12/15/octopress-blogging-system/</guid>
            <description>&lt;p&gt;After several years of maintaining a Wordpress blog, I&amp;rsquo;ve decided to switch to
&lt;a href=&#34;http://octopress.org&#34;&gt;Octopress&lt;/a&gt;.  Wordpress worked well for me at first
but it seemed to have more functionality than I really needed or wanted.  The breaking
point happened last night when I tried to upgrade it and the templates I was using no
longer worked.  Digging into the code was going to be more effort than it was worth.  So
now I&amp;rsquo;ve moved on to something simpler.&lt;/p&gt;

&lt;p&gt;Octopress uses a different approach than Wordpress for generating content.  Octopress generates
a static site whereas Wordpress needs MySQL and PHP.  I like the simplicity of Octopress,
and it also makes it easy to version control everything with git.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
