Gaza

This blog doesn’t see much use, but if there’s one thing I want to make clear for anyone who stumbles upon it, now or in the future, it is my unconditional condemnation of Israel’s genocide in Gaza. I’m no subject matter expert on the conflict, but killing children and journalists like Israel is doing is wrong, full stop. My views on this have been shaped by much better writers whom I’d recommend you read, particularly Ed Yong and his compilations on the subject, and Sarah Kendzior with her numerous writings from before the latest escalation in the conflict until the present.

Don’t give up, never forget.

Decoding InnoDB foreign key errors

When debugging InnoDB foreign key errors, the most detailed error message is found in the LATEST FOREIGN KEY ERROR section of show engine innodb status. This will helpfully include a binary dump of the index tuple that failed, something like this:

Foreign key constraint fails for table `foo`.`#sql-7_146`:
,
CONSTRAINT `some_fk` FOREIGN KEY (`bar_id`) REFERENCES `bar` (`id`)
Trying to add in child table, in index some_fk tuple:
DATA TUPLE: 2 fields;
0: len 8; hex 00000000250763b5; asc % c ;;
1: len 4; hex 84d43d99; asc = ;

In this case the IDs in the tuple were an unsigned bigint and a signed int, hence the 8 and 4 byte representations. While this is certainly a step on the way to identifying the row that violated the foreign key constraint, having it printed in decimal would have saved some headache. You’d think these would be fairly trivial to convert from hex to decimal, and you’d be correct for the unsigned bigint: tack on a 0x prefix and use python, f. ex. python -c 'print(0x00000000250763b5)', and it prints the correct ID of 621,241,269. (You can also use mysql itself: select conv('00000000250763b5', 16, 10) returns the same thing.) However if you try this with the signed int, you’ll get 2,228,501,913 back, which to a sharp eye immediately seems a bit sus since it’s larger than the signed int max (~2.1B).

jeremycole on Stack Overflow has luckily shared the crucial bit of information on what is going on here:

80000003 is the hex representation of the bytes stored for the integer 3 (InnoDB internally flips the high bit)

This nugget of wisdom is all you need to identify your offending row: in my case, flip the leading 8 to a 0 and the value converts easily.
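To spare you the manual bit-flipping, here’s a small helper (my own sketch, not part of MySQL) that decodes these hex dumps, undoing InnoDB’s sign-bit flip for signed columns:

```python
# Decode a hex value from a LATEST FOREIGN KEY ERROR tuple dump.
# InnoDB stores signed integers with the sign bit flipped so the bytes
# sort correctly as unsigned; flip it back before interpreting the
# bytes as two's complement.
def decode_innodb_int(hex_value, signed=False):
    num_bytes = len(hex_value) // 2
    raw = int(hex_value, 16)
    if not signed:
        return raw
    raw ^= 1 << (num_bytes * 8 - 1)  # undo the sign-bit flip
    return int.from_bytes(raw.to_bytes(num_bytes, 'big'), 'big', signed=True)

print(decode_innodb_int('00000000250763b5'))       # unsigned bigint: 621241269
print(decode_innodb_int('84d43d99', signed=True))  # signed int: 81018265
print(decode_innodb_int('80000003', signed=True))  # jeremycole's example: 3
```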

Inline feedback from checkov on Github

checkov is a pretty neat tool to verify that your Infrastructure-as-Code (IaC) repo doesn’t do or omit anything that unintentionally impacts your security posture. The best kind of feedback is early and localized, so rather than just a failed test run it’s better to get a message directly in the PR diff about where something went wrong. Luckily GitHub has decent support for letting Actions provide localized feedback by using the magic format ::error file=$file,line=$line,col=$col::$message (documented here), which we can fairly easily combine with checkov for a pretty good developer experience.
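For example, a single annotation line looks like this (the file name and message are invented for illustration; GitHub picks these lines up from stdout and renders them inline on the diff):

```python
# Emit one GitHub Actions error annotation for a hypothetical checkov finding
file, line, col = 'terraform/s3.tf', 12, 1
message = 'Ensure the S3 bucket has access logging enabled (CKV_AWS_18)'
print(f'::error file={file},line={line},col={col}::{message}')
```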

We can write a quick script to bridge these two, which can be run without any arguments to run checks against the entire repo, or by giving it a list of files to check:

#!/usr/bin/env python3
import argparse
import json
import subprocess
import sys

SKIPPED_CHECKS = [
    # I don't like this check
    'CKV_AWS_40',
]

def main():
    args = get_args()
    result, output = get_checkov_output(args.files)
    print_github_errors(output)
    sys.exit(result)


def get_checkov_output(files):
    cmd = ['checkov']

    # If any input files are given, run on only those, otherwise run across everything
    if files:
        for file in files:
            cmd.extend(['--file', file])
    else:
        cmd.extend(['--directory', 'terraform/'])

    cmd.extend([
        '--quiet',
        '--framework', 'terraform',
        '--output', 'json',
        '--skip-check', ','.join(SKIPPED_CHECKS),
    ])
    proc = subprocess.run(cmd, stdout=subprocess.PIPE)
    return proc.returncode, json.loads(proc.stdout.decode('utf-8'))


def print_github_errors(checkov_output):
    for failure in checkov_output['results']['failed_checks']:
        details = ''
        if 'guideline' in failure:
            details = ' Details: %s' % failure['guideline']
        print('::error file=%s,line=%s,col=1::%s (%s).%s' % (
            failure['repo_file_path'][1:],
            failure['file_line_range'][0],
            failure['check_name'],
            failure['check_id'],
            details),
        )


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('files', nargs='*')
    return parser.parse_args()


if __name__ == '__main__':
    main()

In a GitHub Action we can run this against only the files changed in a PR:

name: Checkov

on:
    pull_request:
        paths:
            - .github/workflows/checkov.yml
            - terraform/**
            - tools/terraform_security_check.py

jobs:
    terraform-security-check:
        runs-on: ubuntu-20.04
        steps:
            - uses: actions/checkout@v2
              with:
                  fetch-depth: ${{ github.event.pull_request.commits }}

            - name: Fetch base branch
              run: |
                git fetch --no-tags --prune --depth=1 origin +refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}

            - name: Configure
              run: pip install checkov

            - name: Run checkov
              run: ./tools/terraform_security_check.py $(git diff --name-only "origin/${{ github.base_ref }}.." ./terraform/)

Which lets us get inline feedback like this if a file touched by a PR violates any of the security policies:

Avoiding rebuild on git revert

Do you use a git commit hash to identify artifacts from your build process? And if something goes wrong on a deploy, do you use git revert or similar to revert the change?

I do; it’s a pretty easy process to automate. The main problem is that if your builds are a bit slow (i.e. a couple of minutes or more), a revert takes about the same time as a regular change, and thus you have broken code live for however long it takes to rebuild your stuff.

Fear not, our lord and savior is close by, just replace your git rev-parse HEAD or your equivalent with git rev-parse HEAD:./. The only difference is that the latter is a consistent identifier (the tree hash) for the files under the given directory, and will thus return the same identifier after a revert. Use this to check if the artifact already exists, skip the build if it does, and jump straight to redeploying the existing artifact.
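A quick demonstration of the difference, using a throwaway repo (a sketch only; the file name and commit messages are made up):

```python
# Show that the tree hash (git rev-parse HEAD:./) is restored by a revert,
# while the commit hash (git rev-parse HEAD) is not.
import subprocess
import tempfile
from pathlib import Path

def git(*args, repo):
    return subprocess.run(['git', *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout.strip()

repo = tempfile.mkdtemp()
git('init', '-q', repo=repo)
git('config', 'user.email', 'demo@example.com', repo=repo)
git('config', 'user.name', 'Demo', repo=repo)

Path(repo, 'app.txt').write_text('v1\n')
git('add', 'app.txt', repo=repo)
git('commit', '-qm', 'v1', repo=repo)
good_tree = git('rev-parse', 'HEAD:./', repo=repo)

Path(repo, 'app.txt').write_text('v2\n')  # the bad change
git('add', 'app.txt', repo=repo)
git('commit', '-qm', 'v2', repo=repo)

git('revert', '--no-edit', 'HEAD', repo=repo)
# The tree hash matches the pre-change build, so the old artifact can be reused
assert git('rev-parse', 'HEAD:./', repo=repo) == good_tree
```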

Branch-specific variables for GitHub Actions

When setting up a CI/CD pipeline you might find that you need to set some variables depending on which branch you’re on. With GitHub Actions there’s good support for setting global variables which can be overridden on a per-job or per-step basis, but setting branch-specific variables is a bit less straightforward. This post outlines the cleanest approach I’ve found for this so far.

First of all you want to declare the variables you want in a global lookup table as json.

env:
  lookup: |
    {
      "master": {
        "TARGET_ENV": "staging",
        "BUCKET": "example-staging"
      },
      "prod": {
        "TARGET_ENV": "prod",
        "BUCKET": "example-prod"
      }
    }

Then as one of the first steps in your job you want to utilize workflow commands to dynamically set environment variables based on the lookup table you defined. You can use any programming language you prefer to do this; you just need to load the lookup table, determine the current branch, and output a line with ::set-env name=KEY::VALUE for each variable to set. I used node for this since JSON parsing is available without any extra imports, which keeps the code small. There’s no existing environment variable for the current branch, but you can extract it from GITHUB_REF, which looks like refs/heads/<branch-name>. The code to do this:

const branch = process.env.GITHUB_REF.slice("refs/heads/".length);
const data = JSON.parse(process.env.lookup)[branch];
for (const key in data) {
    console.log(`::set-env name=${key}::${data[key]}`)
}

Should be pretty straightforward. Compacting this a bit, we get the following one-liner we can make a job step out of:

- name: Put branch-specific variables into env
  run: node -e 'let data = JSON.parse(process.env.lookup)[process.env.GITHUB_REF.slice("refs/heads/".length)]; for (let key in data) {console.log(`::set-env name=${key}::${data[key]}`)}'

Obviously this can be golfed much further if you want to. I think this preserves a decent balance between compactness and legibility, but if you really want to keep it short, we can remove the lets as they’re not strictly required, trim some whitespace, use single-letter variable names, and inline the branch name with a bash substring expansion. (The lookup table has to stay in process.env: it spans multiple lines, and a literal newline inside a JavaScript string literal is a syntax error.) That yields this fairly terse variant:

- name: Put branch-specific variables into env (compact version)
  run: node -e "d=JSON.parse(process.env.lookup)['${GITHUB_REF:11}'];for(k in d){console.log('::set-env name='+k+'::'+d[k])}"

Whichever version you choose, you can now use the variables from your lookup table as usual in later steps:

- name: Some deploy step
  run: ./deploy.sh ${{ env.BUCKET }}

Putting it all together with a check to only deploy the target branches:

env:
  lookup: |
    {
      "master": {
        "BUCKET": "example-staging"
      },
      "prod": {
        "BUCKET": "example-prod"
      }
    }

jobs:
  deploy:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/master' || github.ref == 'refs/heads/prod'
    steps:
        - name: Put branch-specific variables into env (compact version)
          run: node -e "d=JSON.parse(process.env.lookup)['${GITHUB_REF:11}'];for(k in d){console.log('::set-env name='+k+'::'+d[k])}"

        - name: Deploy
          run: ./deploy.sh ${{ env.BUCKET }}

Bread

Since the entire world seems to have picked up baking bread now I figured I’d share the recipe that literally carried Megacool through four years in San Francisco. We all took turns to bring loaves to the office for lunch, and it’s been my staple breakfast for the last three years. This is a low hassle, cheap and super quick recipe for extremely tasty bread. No kneading necessary! Prep time is ~25 minutes, you’ll be eating fresh bread ~3 hours after starting.

  • 1 L whole wheat flour
  • 6 dl regular flour
  • 1 pack rapid-rise yeast
  • 1 tsp salt
  • 3 tbsp olive oil (canola also works, use whatever you prefer)
  • 1 L lukewarm water
  • 4 dl “magic mix”

The magic mix is where your preferences come into play. Feel free to experiment with this part to make the bread your own. I use:

  • 2 dl rough chopped walnuts (roughly a handful of walnuts)
  • 2 dl oats
  • 1 dl flaxseeds

This adds up to ~4 dl when combined since there’s a lot of space between walnuts for the flaxseeds to fill up. If you need to tailor it for someone who can’t have nuts, do a bit more oats and maybe add some other grains you like.

Put the whole wheat flour and yeast in a bowl. Fill a measuring cup with your magic mix, then top it up with regular flour until the total is 1 L. Put this into the bowl as well, and add the oil, salt and skin-temperature water. Mix it until it becomes fairly slimy and homogeneous. Divide into two bread forms, and let it rise somewhere warm for 1.5 hours. I usually turn on the oven, which makes a good temperature on top of the stove for rising the bread. When they are big enough according to your standards and expectations, put them in the oven for 55 minutes at 200°C (395°F). At this point, take them out of the forms and put them back in the oven for 10 minutes to give them a good crispy crust. Let them cool for a bit on a wire cooling rack, then dig in.

What you don’t eat I recommend freezing sliced; then it’s easy to just take out what you want and toast it later. Enjoy!

Commit of the week - 2020 week 9

I had an idea recently of presenting a commit of the week, as a way to showcase good software engineering in a bite-sized show and tell fashion. The thinking is that a lot of good practices, principles and computer science fundamentals can be easily taught with actual code from actual projects, which ensures that it’s relevant and applicable. I want to showcase work from a variety of people and projects, be that something I encounter at work or in open source, or, if I’m being lazy and haven’t been able to think of something for a given week, some of my own work. This is one of those latter types of weeks.

The format I’ll follow is presenting the commit, explaining a bit of the background and what is happening, and the key takeaways as to why this is a good thing. That’s it for the formalities, let’s jump right in!

Author: Tarjei Husøy <git@thusoy.com>
Date:   Thu Feb 20 11:46:18 2020 -0800

    Remove postgres version overrides on travis

    The default seems to be 9.5.20 now, so this is moot.

diff --git a/.travis.yml b/.travis.yml
--- a/.travis.yml
+++ b/.travis.yml
 services:
     - postgresql
 addons:
     postgresql: "9.5"
 install:
-    # There's a bug in postgres 9.5.3 in "on conflict" clause handling that's
-    # fixed in 9.5.4 that we run everywhere else, but is not the default on
-    # travis yet. Thus manually upgrade postgres.
-    - sudo apt-get install postgresql-9.5
-    - sudo service postgresql restart
     - SKIP_VAGRANT=1 SKIP_BOOTSTRAP=1 ./configure
 script:
     - QUIET=false DO_NETWORK_TESTS=true ./tools/docker-test.sh

What is happening here is that the Travis CI build image we were using shipped with Postgres 9.5.3, which had a bug that impacted our code. Some time ago I had added a workaround to the install phase to upgrade this to a newer version of Postgres where the bug was fixed to make sure our test suite could run to completion. Since the default Postgres version had been upgraded, this override was no longer needed and could be removed.

What do I think is good about this? Firstly, we’re removing code. Less code is almost always better. In this case it also made our builds 20-30 seconds faster, because we’re doing less work for every build, which means shorter feedback loops. It’s also more correct: our builds had started failing because of these lines, as after the default was upgraded they somehow made 9.3 the installed version, which also failed our tests. This illustrates how code rots; something that was at some point a good workaround is now causing problems, without any intentional change on our part. Code needs maintenance to stay functional.

The other thing I think is good here is that the initial override was well commented. The way the comment is written made it easy to understand the purpose of something that would otherwise be very confusing: there’s an earlier statement in the file that says 9.5 should be installed, so why is that repeated here? The other good thing about this comment is that it explains not only what is being done, but what the original symptoms were (the wrong version being installed by default). This makes it easy when reviewing later to check if that is still the case, by simply checking what the default version is now. If it is not, the workaround can be removed.

This ensures that when a funky thing is added to the codebase, it has a limited lifespan, because it’s easy to evaluate when it’s no longer necessary. Codebases that don’t document similar workarounds will keep accumulating them, and they can’t be removed safely because nobody remembers why they were added in the first place. This makes such codebases gradually harder to work with until they’re basically unmaintainable, because people have to accommodate all these weird things being done without understanding why they’re done in the first place. When the engineers working on something get used to not knowing why things are done in a certain way, anything they do will come from a place of partial understanding, which leads to partial solutions.

Well-documented code does not fall victim to Chesterton’s Fence, as the reasoning is right there with the weird obstruction. The alternative (or complementary) action to adding a comment is to make sure there’s a test in place that validates the fix. If the workaround is removed, the test will fail and can draw attention to the underlying problem. But tests are also code and will rot over time, thus you should also consider the impact of the test over time, as every single test added makes a test run a little bit slower. If the tests are no longer relevant, they should also be removed.

Key takeaways

  • Remove code that is no longer relevant.
  • When you do anything that is not obvious from the context, explain what the conditions were at the time and why the fix is necessary. Write like you’re writing to a future maintainer that is evaluating if a thing can be deleted.

Introducing laim: A mail transformer

Many *nix utilities default to sending mail for status reports. For example cron will send output from jobs to the email specified in $MAILTO, or to the owner of the crontab. Unattended-upgrades can send email reports when packages are updated, and sudo can send reports whenever someone fails to enter a correct password, or whenever anyone authenticates at all. For several of these use cases it makes sense to use a centralized message delivery system like email, since the task might run on behalf of a user who might not have write access to a logfile which could otherwise be used.

Operating a secure mail infrastructure is non-trivial, and not fun. If you want to send mail externally you need to set up and maintain DKIM and SPF and maintain IP reputation, and if you have more than one server you’re probably looking at setting up some internal forwarding, which means servers also need a way to authenticate themselves to the forwarder. Having done this for my own infrastructure, I did not want to repeat it when setting up the infrastructure for Megacool.

Thus the problem is that we want to listen to mail from our servers, but we don’t want to treat it as mail. We might want to forward it to some centralized logging, or forward to Slack, or instrument the event to a service like Honeycomb. Or all the above. What I wanted was a uniform and simple way to handle mail in whatever flexible way is needed for a given server.

This led to the creation of laim, a tool to easily transform mail and handle it however you like. The laim package provides SMTP service on localhost port 25 to receive internal mail, and a sendmail compatibility binary. After installation you need to modify the handler, which is a python script that receives the mail and does whatever it wants with it.

Here’s a handler we currently use to forward to a Slack channel dedicated to server reports:

import os
import socket
import textwrap

import requests
from laim import Laim


class SlackHandler(Laim):

    def __init__(self):
        super().__init__()
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': 'Bearer %s' % self.config['slack-token']
        })
        self.channel_id = self.config['slack-channel-id']
        self.hostname = socket.gethostname()


    def handle_message(self, sender, recipients, message):
        self.session.post('https://slack.com/api/chat.postMessage', json={
            'channel': self.channel_id,
            'text': textwrap.dedent('''\
                `%s` received mail for %s
                *From*: %s
                *To*: %s
                *Subject*: %s

                %s
            ''') % (
                self.hostname,
                ', '.join(recipients),
                message.get('From'),
                message.get('To'),
                message.get('Subject'),
                message.get_payload(),
            ),
        })


if __name__ == '__main__':
    handler = SlackHandler()
    handler.run()

Quite simple. Configuration beyond deploying the handler is fairly sparse: there’s a config file that is read on startup before dropping privileges, whose contents are available to the handler through self.config. The service doesn’t listen on any external interfaces, and drops privileges before handling any incoming messages.

Laim was written to be easy to operate, and to not serve as an escalation vector for someone who has compromised a host. It depends on the python standard library and aiosmtpd to do the heavy lifting, thus building on well-tested code in a memory-safe language to handle the input. While high throughput is not a goal of the library, it’s written around bounded queues to prevent arbitrary memory growth in case of delays or bugs in the handler. Configuration was largely avoided by taking a library approach instead of trying to cram a lot of logic into configuration files, making the handlers easy to understand, debug and customize for anyone familiar with python.

At Megacool we now receive the following messages from sudo on failed passwords:

And the following from unattended-upgrades:

For unattended-upgrades it’d probably be nice to write a log parser that could forward the event to Honeycomb or similar so that we could see when a security update is rolled out across all hosts. Currently it’s a bit noisy when all hosts update a given package within a day, but at least we’re now receiving an event we can see and handle, compared to unattended-upgrades sadly logging that it doesn’t have a mailer available to send reports to.

If you’re curious and want to play around with it, it’s available for Debian stretch and buster through my apt repo at https://repo.thusoy.com/apt/debian main (release key here) or through a salt state here, or you can play around with the source here. The source repo includes helpers to easily build your own .debs if you want to host it from your own repo; all dependencies are bundled with the deb through dh-virtualenv.

An observant eye might have noticed that our Slack handler depends on requests for outgoing http, which is not available on Debian without either installing python3-requests globally or sidestepping apt and installing it with pip (which is strongly discouraged). After the laim package is installed it has its own virtualenv at /opt/venvs/laim, with its own copy of pip, so what we do is use that pip to install anything the handler needs, without interfering with anything else installed on the system. This is an excerpt from our internal laim saltstack state that deploys the handler:

include:
    - laim


laim-slack:
    cmd.run:
        - name: /opt/venvs/laim/bin/pip install requests==2.22.0
        - unless: /opt/venvs/laim/bin/pip freeze | grep requests==2.22.0
        - require:
            - pkg: laim
        - watch_in:
            - service: laim

    file.managed:
        - name: /etc/laim/handler.py
        - source: salt://laim-slack/handler.py
        - watch_in:
            - service: laim

If it sounds useful, take it for a spin and let me know if you run into any issues.

Initial connection security in the cloud

So you’ve started your first virtual machine in the cloud somewhere, and you’re ready to make something people will love. You just need to put it out there, so you open up your terminal, type in the magic letters ssh and some IP address, and suddenly you’re in some cypherpunk dystopia, wrangling with the moral dilemma of whether to trust an arbitrary sequence of hexadecimal integers, wondering if your dog will ever look at you the same if you just close your eyes and type yes, just this one time.

What did you just risk? Does using modern technology always come with such pain in the beginning, pain that eases as you learn that this is just the way it’s done? Are there any tools that could help you use the cloud without ending up questioning whether you are indeed a good, trustworthy person, deep down? I’m no therapist, so while I might not be able to provide solace, I hope I can provide some understanding, and a bumpy path forward.

Risk

What risk do we run when we accept untrusted ssh connections to cloud servers? If you are using password authentication over untrusted connections with ssh, it is possible for a man-in-the-middle (MitM) to snoop on everything sent over the connection to your server. They’ll answer your connection as if they were the server, just presenting their own ssh public key, and you’ll accept it since you don’t know the actual public key of your server. The attacker will receive your password, use that to authenticate to your actual server, and transparently connect the two of you, listening to everything that is sent. They also now probably have root access to your server and can install whatever backdoor they desire.

However, if you are using public key authentication, you are no longer sending any secrets over the wire, and this attack no longer works, but the attacker is left with one more trick. Instead of snooping on the connection to the actual server, the attacker can now just claim they are the actual server. Since for an initial connection you probably just created the server, the attacker only needs to present you with a seemingly fresh box and you’ll be none the wiser that this isn’t the server you just created. Chances are that you’ll have sent something of value, like TLS certificates, or database credentials, or maybe you configured a new user with your password, before you notice that something is wrong. Maybe you’ll just write it off as a somewhat expected failure of cloud services (networks are best-effort anyway), create a new box, and this time everything works. Oh well, probably all right then. Depending on what the attacker was able to get from you with their fake server, you might have fully compromised your organization. Maybe the attacker’s server was given access to your puppet/salt/chef master, and now has access to all your configuration data and secrets. Maybe the attacker now has access to your intranet and your less-than-perfectly-protected internal services.

Two things should be clear by now: we need a way to tell that the server we’re connecting to is the server we expected, and we should really not use password authentication, especially to untrusted servers. The latter is usually easy, since most cloud services will not enable password authentication for ssh and require you to configure a public ssh key on the box before you start it. Thus our primary threat is that the server we’re connecting to is not the server we just created. If you use tools like terraform or salt-cloud or something similar to automatically provision your cloud services, you might be at even greater risk, since you’ll be creating resources more often, and every new connection is another chance for an attacker to impersonate an actual server. And if you haven’t guessed it yet, none of the common cross-provider cloud tools (terraform, libcloud, salt-cloud, and probably more) verify the initial connection. They mostly trust that the first connection is not tampered with, and will save the public key presented for later connections. This is the common policy for ssh, known as TOFU (trust on first use).

Quoting Raymond Hettinger, “there must be a better way!”.

A way forward

In an ideal scenario we’ll get the public key from our cloud provider before connecting to the server, enabling us to validate that the connection is indeed solid before we hand over the keys to our kingdoms. As a backup plan, if we can put something only we know on the server before we connect to it, and verify that it’s there before we send anything else, we can protect ourselves from sending anything of value to the attacker in the event that we connected to the wrong server. The latter can be accomplished by using cloud-init, a common tool for running a script or adding services when a server first boots. If we create a nonce locally before creating the new server, and add a cloud-init configuration to place that nonce in a known location on the filesystem, we can verify after connecting that the file does indeed contain our nonce before we do anything else. If it’s there, we reached the correct server, and future connections can trust the public key that was presented. If it’s not, odds are you have an attacker on your network and you have just connected to their server. Abort the connection, remove the public key you accepted from your ssh known hosts, and talk to your security team.
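To make the idea concrete, here’s a rough sketch of generating the nonce and the cloud-init user-data that plants it (the file path and the verification command are illustrative assumptions, not a standard):

```python
# Generate a nonce locally and embed it in cloud-init user-data so the
# first ssh connection can be verified before anything sensitive is sent.
import secrets

nonce = secrets.token_hex(32)

# Pass this as the user-data when creating the server:
user_data = f"""#cloud-config
write_files:
  - path: /run/provision-nonce
    permissions: '0600'
    content: {nonce}
"""

# After accepting the host key on first connect, verify before doing anything else:
#   ssh new-server cat /run/provision-nonce
# If the output doesn't match the locally generated nonce, abort the
# connection and drop the host key from known_hosts.
print(user_data)
```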

So where do the common cloud providers stand when it comes to these capabilities?

Provider capabilities

To break down what we’ve learned so far, we can establish a trusted connection to a cloud server only if we can either get the ssh public key from the provider, OR if they provide cloud-init or similar tools for the initial connection and only accept connections using public keys. So what are our options?

Checking the docs for a couple of common providers yields the following capability table:

Provider          | Get public key? | cloud-init | Requires pubkey auth
Amazon EC2        | Yes [1]         | Yes        | Yes
Digital Ocean     | No              | Yes        | No [2]
Google Cloud      | No              | Yes [3]    | Yes
Microsoft Azure   | No              | Yes        | Yes
Linode            | No              | Yes [4]    | No

[1] Using the get-console-output API (docs)

[2] Unless you’ve added one of your ssh keys to the instance when creating it, Digital Ocean will email you a password you can use to connect as root. Never use this feature.

[3] Called startup scripts

[4] Called StackScripts. Since there’s no way to upload your ssh public key to the instance before it starts, you also have to use StackScripts to add your public key to the instance.

In other words, there is hope! While AWS is the only provider that enables getting the server public key programmatically (as far as I’m able to tell), all the others provide a way to run something similar to cloud-init. Linode doesn’t make secure connections easy; you can also ssh into the box through their ssh gateways, for which you can get the key fingerprints, but that requires authenticating with your account password instead of ssh public keys, and then connecting with the root password for your box from the gateway.

From this small bit of research it seems it is indeed possible to securely connect to cloud servers, but as mentioned earlier, most of the tools that automate cloud provisioning do not utilize these capabilities. We now know what has to be done though, so go make some pull requests!

Resistance-era password storage

This post comes to you in three parts: An introduction to the most underutilized feature of argon2, introducing a python project to utilize that feature, and a rant about the current terminology on password storage.

argon-wat?

argon2 is the algorithm that in 2015 came out as the winner of the Password Hashing Competition (PHC). The goal of the PHC was to find an algorithm, thoroughly analyzed by experts, that we could standardize on for storing passwords, since there was an abundance of different schemes at the time.

The key tenet of modern password storage is assuming that your database will leak at some point, and that the storage scheme should ensure it’s as expensive as possible for an attacker to reverse your users’ passwords. While we as defenders run our chosen scheme on servers, an attacker can pick a setup optimized purely for crunching passwords, i.e. a custom circuit built for the purpose (an ASIC), FPGAs, or a multi-GPU beast. “As expensive as possible” in this world means trying to ensure our scheme runs optimally on our setup, not on the attacker’s.

And on our setup, we don’t need an answer immediately. While an attacker probably has millions of hashes to try to reverse, we’re only processing one at a time. Granted, we have a user on the other end who’s waiting for us so they can get on with their life, but we can afford to let them wait a couple of milliseconds. Bumping the processing time from microseconds to milliseconds doesn’t really impact our user in any way, but we literally just forced the attacker to bump their budget 1000x to reverse the same number of passwords they did before our speed bump.

Something else that’s available to us on our servers is memory. Logins are fairly infrequent compared to return visits, thus we’re not overly concerned about making the process require, say, 512kB of memory instead of just a hundred bytes or so. In that swift stroke we just bumped our attacker’s budget by another 5000x.

Servers also have multi-core CPUs available to them, thus we can also ensure the process uses, say, 20 threads, and that’s another 20x budget bump for our attacker, who’s at the moment crying after having invested in a $1000 password crunching setup before realizing they need another hundred billion dollars to fund a $104,840,000,000 setup to reverse our new hashes.
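For the curious, the attacker’s budget bump is just the product of the three factors above; a quick back-of-the-envelope check (the rounding here is mine, chosen to match the figure in the text):

```python
time_factor = 1_000                # microseconds -> milliseconds
memory_factor = 512 * 1024 // 100  # ~100 bytes -> 512 kB
parallelism_factor = 20            # 20 threads

multiplier = time_factor * memory_factor * parallelism_factor
print(multiplier)  # 104840000 -- times the $1000 rig: $104,840,000,000
```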

While we can pat ourselves on the back and be quite confident attackers won’t be able to reverse all of our hashes, an attacker might not be interested in all of them. In a leaked database every hash will probably be associated with an email address, and to most attackers a .gov or .mil email is probably significantly more interesting than most others. For many setups, the attacker only needs one password to fully compromise an organization, as that password might open up an email account that grants all the other access needed.

Also aiding our attackers is our fallibility as human beings at remembering 20+ character pseudo-random strings for every website we use. Password managers are the bane of every hash-reversing attacker, but they are far from ubiquitous yet. No matter how expensive we make it, it’s never going to be hard for an attacker to reverse “password123” and “” passwords. Oh, you don’t think an attacker knows your address? Why, they have your email address; odds are it’s possible to figure out who you are and where you live from that. Thus assume that if you’re an interesting target, they’ll be guessing your password with every combination of words from a dictionary built on words from your language, your address, your family members’ names, your pets, the place you work, your interests, and so on.

But we can end this. All of our approaches so far to making it expensive have been about leveraging the differences between our setup and our attacker’s. And in 2017, the vast majority of passwords we have are for a web service. For most setups that means some application code that manages access to a database, which means that we have a logical layer in front of the storage layer. By utilizing this we can ensure that even if our storage layer is compromised (potentially through a bug in our logic layer), not even a single password hash is reversible.

argon2 enables us to do this, by folding in an arbitrary secret into the process, ensuring that an attacker without that secret cannot reverse the hash. Practically you can think of it as instead of reversing one password, an attacker now has to reverse two: both our user’s password and one we choose. And we don’t have to remember ours, thus we can generate a 30 character random string, and as long as we keep this value out of the database, our attacker is stopped dead in their tracks.

That is of course assuming that an attacker has only compromised your database. If your entire infrastructure is compromised, the attacker will be back to the $100B setup and can reverse hashes, albeit slowly; we’re no worse off than we were before introducing the secret. But if only the database is compromised, like through sadly-still-the-most-common-vulnerability-on-the-web SQL injections or a leaked database backup, not a single password will be reversed from your hashes.

We can also ensure that even if our secret is leaked, it doesn’t impact all of our hashes. In a naive setup, rotating the secret would invalidate all of your passwords, forcing your users through the password reset mechanism to set a new one. argon2 however enables storing a secret identifier in the hash, so that we can know for each password which secret was used. Thus if we rotate our secret regularly and the secret is compromised in July, and our database is compromised in September, only users who didn’t log in to the site at least once in the period between the two events can have their hashes reversed.

This is possible since whenever a user logs in, we can re-store their hash if the secret currently used for them is not the latest. We have to keep all old secrets around to enable verifying passwords with old secrets, but whenever we store a new one we’ll use the latest we have.

Introducing: The solution to all of your problems

*Sniffs* Do you smell cinnamon?

The only sad part about it all is that the secret argument you need to pass to argon2 is not exposed through the high-level APIs in the reference implementation, which has led to the most common argon2 libraries in most languages not enabling this functionality at all. This is where kittens and puppies cry. However, with an aptly named package I’ve recently published to PyPI this is now available to you if you use python:

from porridge import Porridge

porridge = Porridge('myfirstkey:myfirstsecret')

def create_user_password(user, password):
    user.boiled_password = porridge.boil(password)


def validate_user_password(user, password):
    valid = porridge.verify(password, user.boiled_password)
    if valid and porridge.needs_update(user.boiled_password):
        create_user_password(user, password)
    return valid

Yeah yeah, I heard your “Woooo… dafuq?!?” Skip to part three if you want to read about my motivation for the naming, but in the meantime, assuming you have a fairly normal setup, the code above does everything we’ve talked about. It wraps argon2 with strong parameters for memory usage, time and parallelism, and uses ‘myfirstsecret’ as the secret. When it’s time to rotate the secret, initialize Porridge with mysecondkey:mysecondsecret,myfirstkey:myfirstsecret instead. This also re-stores the hash on login if either the secret has been rotated or the parameters for memory, time or parallelism have changed. Currently available in your favorite cheese shop.

If you’re not on python, tough luck. I’m hoping someone will make similar packages available on other platforms as well, you only need a class with three methods, so it’s not a big effort, but you do need to build and interface with native code, which is most of the hassle. In my project, there’s more setup and build code (~1000 lines) than actual python code (~700 lines), but that includes building wheels for all major platforms and all recent versions of Python to ensure it installs cleanly on Linux, macOS and Windows, even without a C compiler. Bon appétit.

Now to part three, in which I’ll be explaining the reasoning behind the naming seen in the module. Opinions ahead.

There are only two hard problems in computer science…

…naming things, cache invalidation and off-by-one errors. This is about the first one.

I’ve always hated hearing the word “hash” when we’re talking about passwords. A hash is a very well-known concept used often when writing software, but when we’re storing passwords, hashing them is the last thing you want to be doing. The process does indeed utilize a hash function somewhere, and it does have the same one-way properties as hashing does, but all the other functional requirements are stark opposites. Hash functions should be as fast as possible and require as little memory as possible, password storage should use as much of both as feasible in a given environment. Passwords should also be salted, use server-side secrets, and a lot of other stuff you’d never think of yourself if you had no information security background and saw the two words “password hashing”.

There is the concept of “key stretching”, but it largely only covers the time aspect, and has to be combined with salting and memory-hardness to be worth anything. Since I haven’t been able to find any better terminology for password storage, I made something up.

“Boiling” was what I came up with, which is a close enough analogy to be useful. It’s a slow process, requires several ingredients (often salt), and is usually a one-way process: once the ingredients have been boiled, they’re hard to separate again. That is decently close to slow, salted hashing for a well-known concept. Naming conflicts should also be few; there’s no known project that combines “boiling” and “passwords”, so it should be hard to confuse with something else, and easy to find with search engines.

This, combined with my agreeing with Eran Hammer on module naming, led to the porridge module, in which I boil passwords. If anyone comes across a table “boiled_password” in my database or overhears anyone talking about “boiling passwords” at a conference, googling the term makes it impossible to confuse with plain hashing of passwords.


One final note: remember that security is never a binary thing, and authentication on the web is a complex topic. Even if you do everything this post recommends, storage is only a small component in the field of passwords. You’re still going to need rate limiting, password policies, U2F/2FA, secure resets, etc. Hire someone who knows how this works to design your system.

tl;dr

Calling password storage “hashing” is dumb because it’s too easy to confuse with plain cryptographic hashing, call it “password boiling” instead; use porridge to boil passwords to also include a server-side-only secret to prevent reversal if your database is leaked.

Thanks to Nicolay Broby Petersen and Kyle Lady for reading drafts of this post and providing valuable feedback.

Dualities

A lot of jobs exist in a field where you get to combine several interests. Take photography. It’s impossible for photography to exist in a vacuum; something has to be depicted, and what that is is determined by the photographer’s interests. Portrait photographers are drawn to people, and are great communicators, able to get the people they portray to relax in front of the camera and tell their stories. Nature photographers love showing people the beauty that surrounds us if we only go look for it, and love spending time outdoors.

Software engineering is the same. Software is always applied to solve a problem that depends on the engineer; maybe it’s better tools for communicating, creating music, or making society more effective. Understanding both the craft and the industry it’s applied to is crucial for success.

I think this duality between the craft and what it’s applied to exists in a lot of fields. If you study law you can apply that to anything from civil rights to litigation to copyright enforcement. Civil engineers can work on anything from infrastructure like roads and bridges to buildings to dams. Electrical engineers can choose to design for drones, for medical implants, for spacecraft and power plants.

Most companies gather all sorts of people with competency in different crafts to work on the same problem in a given industry. For example a company making drones will collect software engineers, electrical engineers, optical engineers, mechanical engineers, designers, radio engineers, salespeople and marketers who all specialize in aerospace. Other companies, like consultancies, specialize in the craft, supplying for example software engineers, who might or might not carry existing experience in an assortment of industries.

The craft and the industry might change during the course of a career, and this is a great way to expand your skillset and gain a fuller understanding of the different nuances of both. If you start as a hobbyist musician, maybe you’ll become obsessed with the mixing stage and become an electronic music producer. You’ll get a lot of experience with the software tools in that industry from the consumer side. If you later transition into a software developer making music production tools, you’ll create better software than an equivalently skilled engineer without that background. Maybe you’ll later do software engineering outside the music industry, transitioning to movie production tools. You’re now in a position to merge the best elements of three crafts over two industries in your work.

I think it’s great to embrace diversity in teams, but also look at how you can diversify yourself. I find that it strengthens your empathy skills, but also enhances your sense of self-accomplishment and fulfilment.

The smaller the company you work in, the more crafts related to the relevant industry you might have to perform. While small companies (which can be only a single person) will inherently only be able to work on “smaller” problems than big companies, they’ll often be way more agile. Not only because of the fewer communication links between the people solving the given problem, but also because those people will likely cover multiple relevant crafts each, enabling them to solve more problems without communicating at all. Communication will also be more efficient due to the higher empathy on both sides.

Personally this is the primary reason I prefer smaller companies. Empathy is often greater, people are more efficient, problems are solved faster, and you grow personally in several crafts. I also love the higher influence you have in such companies, enabling them to have a stronger voice and character outwards, compared to companies that always have to compromise and often become committee-driven and bland.

I do think however that larger companies can often replicate many of these effects in their teams, and that they would benefit from doing so. One such technique would be seating people of different crafts together. Instead of grouping all the software engineers in one place, the designers in another, and the salespeople in a third, seat people together based on what they’re working on, creating a mini-company inside your company. Basically, structure departments by product (or industry) instead of craft. The engineers will thus have a shorter path to the designers, and hear more from support about what problems the users are facing. Designers will gain greater influence as they can be part of more informal discussions that affect the user experience.

Always focus on people.


Much of this has been said in other forms earlier, I liked Effective DevOps by Jennifer Davis and Katherine Daniels on building cultures for collaboration; Daniel Burka’s take on “Everyone is a designer. Get over it.”; for a good take on other ways bigger companies can preserve a unique voice, read John Saito on microcopy; John Green always talks about reading broadly; Jeff Bezos’ two pizza rule is a good technique to reduce the consequences of Brook’s Law.

MitM-ing Postgres

I recently had to set up a new postgres instance at work. While we’re already using Heroku Postgres for production loads, we had some requirements for this one that couldn’t be solved without being database admin, and thus had to go self-hosted. That process in itself is fairly straightforward and not worthy of a post, but this instance was one we’d use to run ad-hoc analytics queries on, and thus had to be reachable from anywhere. Also not a problem in itself: Postgres can run all connections over TLS, thus we’d be able to both securely authenticate ourselves to the server and verify that we’re talking to the correct server. Just a matter of creating a self-signed certificate, configuring the database with the certificate and key, putting the certificate in ~/.postgresql/root.crt, and using ?sslmode=verify-ca when connecting. But as with everything, this is easy in retrospect – as I started on this I had no idea this was how Postgres did certificate validation. But now I know a bit more, so indulge me for a minute and let me try to explain how this all fits together. Then we’ll see how Amazon RDS and Heroku Postgres compare in terms of authenticating their connections, before we try to break stuff.

As with most TLS-based protocols, the onus is on the connecting party to authenticate the server. The server dictates whether it accepts authentication over unencrypted connections or not, but how the connection is established and verified is up to the client. When the client connects it picks a mode to use when verifying the connection, specified with the connection parameter sslmode, which defaults to prefer. From the excellent documentation on TLS modes we see that this simply first attempts to establish a TLS connection, and if that fails falls back to connecting in the clear. We can specify looser modes: allow first tries plaintext and then TLS, while disable disables TLS completely. But simply connecting over TLS doesn’t give us any indication about the trustworthiness of the connection, because we haven’t specified a trust root to use. The most familiar TLS-tunneled protocol people know of is HTTPS, where the trust root is usually pre-installed on your operating system or web browser, and includes hundreds of certificate authorities (CAs) which can sign certificate requests for any domain on the internet. For Postgres there is no such collection of certificate authorities, thus you have to manually specify your trust root. If you don’t put anything in ~/.postgresql/root.crt or point the connection to a different root through the sslrootcert parameter, any certificate presented will be considered valid (if using any mode apart from verify-ca or verify-full).

This means that in the face of an active attacker, the default settings provide no guarantees of either confidentiality, integrity or authenticity (the CIA triad). And to be clear, your threat model doesn’t need to include three-letter agencies before active network attackers become a problem; anyone with a $99 Pineapple can probably position themselves in the middle of your traffic if you ever use WiFi without a VPN. I’m guessing that includes quite a lot of developers who like to work out of coffee shops, co-working spaces, and any other location where it’s reasonably easy for anyone to get access. Thus we must do something to authenticate our connection, which luckily is as easy as putting the database’s certificate in ~/.postgresql/root.crt. This will authenticate all outgoing connections from your machine against this trust root, so if you only have an established trust root for some of your databases (and for some reason can’t establish a trust root for the rest), put the certificate somewhere else and use the sslrootcert parameter to indicate to psql where your trust root is. This illustrates the difference between the modes prefer, require and verify-ca: the former two modes will authenticate the connection if a trust root is present, but will accept anything if one is absent; verify-ca will fail without a specified trust root.

The difference between the modes verify-ca and verify-full is less obvious, and only matters if you have multiple databases. If you only have one database you’d just generate a self-signed certificate for it and specify that as your trust root; verify-ca would then ensure that connections are only accepted to that database (or potentially to multiple databases sharing the same certificate and key). But if you have multiple databases you might find that managing the trust root becomes burdensome, since it must be updated every time a database is added or removed, and thus you should probably create your own certificate authority to sign database certificates. Anyone can be their own certificate authority; it’s just a matter of creating a self-signed certificate with privileges to sign other certificates and putting that as your trust root, after which any database that can authenticate itself with a certificate signed by your CA will be accepted.

However, with mode verify-ca this does not guarantee that you’re talking to the database you tried to connect to, only that the database has a certificate issued by the CA in your trust root. If you tried to connect to db1.example.com, a certificate issued to db2.example.com would be considered valid for that connection. If these provide the same service you’d probably trust them both equally, but if db2 has to be publicly accessible while db1 is internal only, db2 has a much higher risk of being hacked and thus having the private key for its certificate compromised. In that case you would not trust them equally, and you’d either split the trust so that the two are issued by different authorities and thus have different trust roots, or you need domain validation as well. That is what verify-full provides: it ensures that the hostname you’re trying to connect to matches the Common Name (CN) field in the database’s certificate.
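In practice the choice boils down to two connection parameters. A sketch of how you might centralize this in application code — the `make_dsn` helper, hostnames and credentials are illustrative, and any libpq-based driver (psycopg2 et al.) accepts the resulting string:

```python
def make_dsn(host, dbname, user, rootcert="~/.postgresql/root.crt"):
    # verify-full checks both that the server certificate chains to
    # the trust root in sslrootcert and that its CN matches `host`;
    # drop to verify-ca if you only care about the chain
    return (f"host={host} dbname={dbname} user={user} "
            f"sslmode=verify-full sslrootcert={rootcert}")

# e.g. psycopg2.connect(make_dsn("db1.example.com", "analytics", "me"))
```

Keeping the sslmode in one place makes it harder for a one-off script to accidentally fall back to the insecure prefer default.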

Comparing Amazon RDS and Heroku Postgres

While there are lots of companies offering hosted Postgres, I’m only going to compare two popular choices (because I’m lazy), Amazon RDS and Heroku Postgres. RDS makes it fairly easy to do things correctly, on the instance details page there’s a group dedicated to security options, where you can see the CA that signed the instance’s certificate.

Unfortunately there’s no link to download the CA certificate, but a quick search will lead you to this page, which explains how to download the certificate and configure the connection. I wrote a small script to dump the certificate presented by a database, available here; let’s use it to take a look at the certificate our RDS instance presents:

$ ./postgres_get_server_cert.py mitm.chjg7xaetv8u.eu-central-1.rds.amazonaws.com
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            ab:eb:fd:ef:19:b0:83:df:bc:44:b1:3e:52:9f:71
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=Washington, L=Seattle, O=Amazon Web Services, Inc., OU=Amazon RDS, CN=Amazon RDS eu-central-1 CA
        Validity
            Not Before: Jun  5 04:50:59 2016 GMT
            Not After : Mar  5 22:03:31 2020 GMT
        Subject: CN=mitm.chjg7xaetv8u.eu-central-1.rds.amazonaws.com, OU=RDS, O=Amazon.com, L=Seattle, ST=Washington, C=US
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:8d:29:7c:de:a2:bf:69:28:61:63:70:e2:19:c9:
                    da:d3:66:14:0a:50:e6:fc:8c:42:21:e8:9b:74:dd:
                    f8:42:31:1c:4d:b4:37:89:30:91:eb:db:1c:2d:7d:
                    bc:12:01:be:0f:04:8d:fe:11:69:ee:b3:e1:3f:cb:
                    <...>

The database certificate has the correct hostname in the Common Name, and the docs encourage you to use sslmode=verify-full. Apart from using sha1 and 1024-bit RSA in 2016, everything is A-OK for RDS in terms of configuring their Postgres connection.

Now let’s turn to Heroku. Heroku runs on AWS, and thus should provide equally good security, right? Not entirely. Well, not at all, actually. Let’s take a look at the certificate for our Heroku instance:

$ database_url=$(heroku config:get DATABASE_URL)
$ ./postgres_get_server_cert.py "$database_url" \
> | openssl x509 -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 11658775476923737329 (0xa1cc4a86821324f1)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=ip-10-99-YY-XX.ec2.internal
        Validity
            Not Before: May 12 22:27:13 2016 GMT
            Not After : May 10 22:27:13 2026 GMT
        Subject: CN=ip-10-99-YY-XX.ec2.internal
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:cc:10:26:b4:b5:c7:e4:45:08:ce:c3:0c:23:25:
                    0d:cd:40:9b:05:ed:e2:fb:df:69:89:8d:35:1c:38:
                    47:98:72:a8:44:47:a1:47:95:ac:64:19:66:69:97:
                    d2:5f:03:46:17:00:61:d6:38:77:2d:31:33:ca:2d:
                    b0:53:82:cf:09:02:37:87:f6:73:53:ab:1a:f2:04:
                    <..>

Well, at least they do sha-256 and 2048-bit RSA, right? Apart from that we see that the database has a self-signed certificate (issuer is the same as subject), which practically means that there’s no reliable trust root to use when connecting, as Heroku moves the databases around between hosts as part of maintenance. Thus adding the above certificate to your trust root would break your app once the database is moved. We can see this from the Heroku Toolbelt too: here they set the PGSSLMODE environment variable (equivalent to the sslmode connection parameter) to require, before forking out to psql, without any specified trust root. This is particularly bad for Heroku, because unlike Amazon, they provide tools like heroku pg:psql to easily get a database prompt from basically anywhere, which is very convenient for doing ad-hoc stuff on the database. Problem is, since this is a CLI tool it’s very likely it’ll be run from developer laptops and not only from the dynos, and thus is very susceptible to a Man-in-the-Middle (MitM) attack on the connection. This is fortunately fixable: if Heroku creates their own CA like Amazon has done, they can sign their databases’ certificates and ship the CA certificate in an update to the Toolbelt, after which all connections through pg:psql can be authenticated with full verification of both CA and hostname. If they also provide the CA certificate through the database details page on the dashboard, other external applications could also authenticate their connections to the databases, but until then they recommend using heroku run bash -c 'psql $DATABASE_URL'. Running psql through bash on a dyno will ensure that the TLS connection is established on their network, which means that an attacker would have to be between the dyno and the database, a harder challenge than being between your laptop and the database.

Proof of concept

But while words are nice and well, let’s see if we can demonstrate the issue further and develop a proof-of-concept of an attack on a Postgres connection that doesn’t authenticate towards a trust root. Since we’ll be hacking ourselves for demonstration purposes, we can avoid the step of trying to position a device in between ourselves and the database, and rather instruct our app to connect through our MitM-service and see if we can establish a connection to the database. If the MitM-service works and we don’t provide a trust root for the connection this should be entirely transparent to the application, but the MitM-service should have captured the credentials needed to connect to the database. This means that we need to know a couple of things; firstly how the server authenticates a client, and secondly how the wire protocol looks to be able to parse messages from both the client and the database.

While postgres supports many ways of authenticating clients, including Kerberos, client TLS certificates, LDAP and PAM, the two I’m assuming are most popular are password and MD5. Both of these require the client to prove their identity through a password, but while the former method sends the password in plaintext on the wire, the latter performs a double-hashing of the password with a connection-specific salt, preventing both sending the password in the clear and replay attacks.

While MD5 is not the strongest hash algorithm available, I have no desire to try to brute-force Heroku’s 26-character passwords from a 64-character alphabet, since that equals roughly 2^156 candidates (156 bits of entropy). Luckily you’d on average only have to try half of the candidates before you find the correct one, but that’s still nothing I can complete before the next episode of Game of Thrones airs tonight. Thus, since we’re going to impersonate a server anyway, let’s just ask the client for the plaintext password, and then compute the hash ourselves if the database asks for it. The setup will look like this, with TLS on all connections:

+-----+                +------+                     +-----------------+
| App | -- Password -> | MitM | -- MD5(password) -> | Heroku Postgres |
+-----+                +------+                     +-----------------+

So how does the Postgres wire protocol look? A presentation by Jan Urbański at PGCon 2014 describes the relevant details: all messages exchanged have the format |char tag|int32 len|payload|, except for a special startup message which determines which version of the protocol will be used and sends connection details like username and target database, and which has the format |int32 len|int32 version|[payload]|. There are three major protocol versions supported by the database, but we’re only going to bother with version 3.0, which is currently the latest version, introduced in 2003. The protocol version is encoded with the major version in the upper 16 bits of the int, and the minor version in the lower 16, meaning the format allows 65536 major and minor versions. Should probably be enough for a couple of years still, given that Postgres has only gone through three major protocol versions in its 19 years[1] of existence. There’s also the magical version 1234.5679, which is used to signal that the client wants to connect over TLS.
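These encodings are easy to reproduce; a small sketch of the framing and version numbers described above (function names are mine):

```python
import struct

def message(tag, payload):
    # Regular message: |char tag|int32 len|payload|, where len counts
    # itself plus the payload, but not the tag byte
    return struct.pack('!cI', tag, 4 + len(payload)) + payload

def startup_version(major, minor):
    # Major version in the upper 16 bits, minor in the lower 16
    return (major << 16) | minor

assert startup_version(3, 0) == 196608          # protocol 3.0
assert startup_version(1234, 5679) == 80877103  # the magic TLS request
```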

The payload of the startup message is null-separated key-value pairs, with a trailing null. If the database is configured to require authentication for the given database it’ll respond with an authentication request (tag 'R'), specifying one of the possible methods mentioned earlier. Heroku’s Postgres only asks for MD5, so that’s the one we’re going to implement. The client responds with a password message with tag 'p', and a payload which in the case of MD5 will be the hash md5(md5(password + username) + salt). These three messages conclude what we have to understand to MitM the connection: after we’ve intercepted the 'p' from the client and re-encoded it as an MD5 'p' to the server, we simply forward the server’s response to the client, which tells it whether the password was accepted or not. We could inspect this message to determine whether the password was valid, but that’s left as an exercise for the reader.
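The double hashing the MitM has to perform takes only a few lines; a sketch with illustrative credentials:

```python
import hashlib

def postgres_md5(password, username, salt):
    # Inner: md5(password + username), hex-encoded
    inner = hashlib.md5(password.encode() + username.encode()).hexdigest()
    # Outer: md5(inner_hex + the 4-byte salt from the 'R' message),
    # prefixed with "md5" so the server knows which scheme was used
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()
```

The connection-specific salt is what prevents replaying a captured 'p' message on a new connection, but a MitM that has learned the plaintext password (as ours does) can of course compute this for any salt the database hands out.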

We’re thus playing both sides of the initial phases of the protocol, we’re performing the server side with the client until it has sent us the password, and then we perform the client part of the protocol with the submitted password to the actual database. After that we don’t care much, but there’s lots of fun that could be had at this stage, like replacing all urls in a response with lolcat pictures, subtracting 1 from all numbers or whatever you find fun on a Sunday evening.

The script is provided here. To MitM our own Heroku connection we’ll give the script the IP of the database, and then route all local DNS for the database to localhost. If you want to test this with other applications simply give the script the hostname of your database, and configure your app with localhost as the database host.

The example database I’ve configured for the project has hostname ec2-54-163-238-215.compute-1.amazonaws.com, which conveniently encodes the IP as part of the hostname, thus I’ll start the script like this: ./postgres_mitm.py 54.163.238.215, and I’ll put the following line in /etc/hosts: 127.0.0.1 ec2-54-163-238-215.compute-1.amazonaws.com. This will ensure that when psql tries to look up the IP address of the database it will be directed to 127.0.0.1, but since our script is started with the target IP it will be able to connect.

How does this fare?

# Terminal 1
$ heroku pg:psql
---> Connecting to DATABASE_URL
psql (9.4.6)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-ECDSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

mega-mitm::DATABASE=> select user;
  current_user
----------------
 fxutzshsavlfyj
(1 row)

# Terminal 2
$ ./postgres_mitm.py 54.163.238.215
2016-06-05 01:16:15,740 [INFO] Listening for connections
2016-06-05 01:16:40,809 [INFO] Intercepted auth: postgres://fxutzshsavlfyj:Yp1DA<..>eMOz7JfAtu@54.163.238.215:5432/d4ajdorhb758hq

Success! Let’s see how it turns out if we add the database’s certificate as the trust root.

# Terminal 1
$ ./postgres_get_server_cert.py 54.163.238.215 > ~/.postgresql/root.crt
$ heroku pg:psql
---> Connecting to DATABASE_URL
psql: SSL error: certificate verify failed

# Terminal 2
$ ./postgres_mitm.py 54.163.238.215
2016-06-05 08:32:36,623 [INFO] Listening for connections
2016-06-05 08:32:46,759 [INFO] Client had an established trust root, could not intercept details.

Excellent, we can prevent the MitM by pinning our database’s certificate, since the random certificate presented by the MitM will no longer be accepted as valid. Note that doing this for Heroku means your app will suddenly break when your database changes host as part of Heroku maintenance; do not rely on this in production!

Heroku has been made aware of these issues but has not given any indication as to when a solution will be available. Until then, either use a VPN whenever you’re running heroku pg:psql over WiFi, or use `heroku run bash -c 'psql $DATABASE_URL'`.

[1]: Woot, Postgres turns 20 on July 8th.

Secrets and the cron environment

Don’t use cron’s environment variables for storing secrets. Like I did.

MAILTO=notactuallymyemail@thusoy.com
PASSPHRASE=K5mzQ6VMk1NpCQEGjakbgq80H678fsxpKeErO8aV # uh-oh
0 5 * * * duplicity --verbosity warning /home s3+http://mybucket/mybackupdir

It’s certainly often convenient and the quickest solution, but it leads to a risk of the secrets being compromised if you’re using email reports. Email reports are very practical for being notified of what is going on with jobs on your servers, but cron is a bit too helpful in this regard and will dump every environment variable into the headers of the email. You can (and should) configure your email to only be sent over a secure transport, but this will not prevent headers from being stored on disk on the sending and receiving mail servers, as well as any intermediaries. You can (and should, if you’re paranoid) add end-to-end encryption of your cron emails, but GPG will only protect the body of the message, and will thus, without any extra measures to scrub the headers, leave the secrets in plaintext.

A better solution is to put the job in question into a script, and set the variables there, which will keep them out of the cron report. #hackyourselffirst
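For the crontab above, that could look something like this (the script path is hypothetical, the passphrase and bucket are the ones from the snippet):

```shell
#!/bin/sh
# /usr/local/bin/nightly-backup - set secrets here instead of in the crontab,
# keeping them out of the environment that cron dumps into report headers
PASSPHRASE=K5mzQ6VMk1NpCQEGjakbgq80H678fsxpKeErO8aV
export PASSPHRASE
exec duplicity --verbosity warning /home s3+http://mybucket/mybackupdir
```

The crontab entry then shrinks to `0 5 * * * /usr/local/bin/nightly-backup`, with only MAILTO left in the environment.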

Easily compute DANE TLSA records

I’m not going to rant about CAs vs. DANE here, there’s tons of reading on that on the web already, but for a primer the DANE RFC is quite good. Here I’ll merely point out two useful tools for starting to migrate away from CAs.

The first is a short snippet for creating the TLSA DNS record you’ll need:

$ echo 3 0 1 $(echo | openssl s_client -connect thusoy.com:443 | sed -n '/-B/,/-E/p' | openssl x509 -outform DER | sha256sum | cut -d' ' -f1)
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Certification Authority
verify error:num=20:unable to get local issuer certificate
verify return:0
DONE
3 0 1 debe8c60a067a7a648b07c96708d0e224cf7c06fd58840d2083c305e2d70e9fa

Replace sha256sum with shasum -a 256 if you’re on OSX.

Replace thusoy.com with whatever domain you want to compute the DNS record for. The important part here is seeing verify return:0 before the DONE; whatever follows you put as a TLSA record under _443._tcp. and you’re good to go!
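Put together as a full zone file entry, using the hash computed above, the record would look something like this:

```
_443._tcp.thusoy.com. IN TLSA 3 0 1 debe8c60a067a7a648b07c96708d0e224cf7c06fd58840d2083c305e2d70e9fa
```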

Note that this grabs the certificate over the web and verifies it with the CA system, if you don’t have a CA-signed certificate or don’t want to rely on it to compute your TLSA record, it gets even simpler:

$ echo 3 0 1 $(openssl x509 -in your-cert.pem -outform DER | sha256sum | cut -d' ' -f1)
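The same computation in Python, for reference (hypothetical helper name; a "3 0 1" record is just the SHA-256 digest of the DER-encoded certificate):

```python
import hashlib

def tlsa_3_0_1(der_cert):
    """Compute a DANE-EE (3 0 1) TLSA record payload: the hex-encoded
    SHA-256 digest of the DER-encoded certificate."""
    return '3 0 1 ' + hashlib.sha256(der_cert).hexdigest()
```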

The other thing I wanted to share is this DNSSEC/TLSA Validator plugin, which gives you the status of DNSSEC and TLSA records directly in the address bar of your browser.

Note that it’s merely a status display, the connection is still validated with the CA system and will not be terminated if the TLSA record doesn’t match, but at least it enables developers to deploy DANE in preparation for the time when browsers finally enable it in their core. Mozilla has down-prioritized this since 2011, but if a larger percentage of sites can properly authenticate themselves using DANE it’s harder for browsers to ignore the issue.

Pure python crypt(3) hashes

Ever been frustrated that you can only create crypt(3) compatible password hashes on *nix machines? Me too! But I figured, the crypt(3) implementation in glibc can’t be that hard to port to python, so I gave it a go, and can now present the Python package pcrypt. pcrypt uses a pure-python implementation of the crypt(3) algorithm, and is nearly 5 orders of magnitude slower than the glibc variant. However, it runs on all platforms and features a nifty CLI to quickly get compatible hashes, so it solves my immediate needs.

As a small background, crypt(3) (that is, the crypt C function found in section 3 of the man pages) is the function that hashes user passwords and stores them in /etc/shadow (and /etc/passwd before that) on most *nixes. The exact algorithm has changed over the years, from initially being the password encrypted with DES using the password itself as key and a two-character salt, through MD5 and Blowfish to the SHA2-based variants most commonly used today. The modern spec hashes the password with SHA512 and a 16-character salt, using a configurable number of rounds that defaults to 5000. The exact algorithm we use today was first designed by Poul-Henning Kamp in the nineties, using MD5 and 1000 iterations. This was very good for its time, but after a decade of Moore’s Law it didn’t hold up anymore. Red Hat started using a spec where the iteration count was customizable as part of the salt, and Ulrich Drepper combined this with PHK’s algorithm to produce the spec crypt(3) uses today.
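The resulting hash strings are self-describing; a small sketch of pulling one apart (hypothetical helper, following the SHA-crypt format where an omitted rounds field means the default 5000):

```python
def parse_sha512_crypt(hashed):
    """Parse a $6$ (SHA-512) crypt(3) string into (rounds, salt, digest).

    Handles both $6$salt$digest and $6$rounds=N$salt$digest forms.
    """
    fields = hashed.split('$')
    assert fields[1] == '6', 'not a SHA-512 crypt hash'
    if fields[2].startswith('rounds='):
        rounds = int(fields[2][len('rounds='):])
        salt, digest = fields[3], fields[4]
    else:
        rounds = 5000  # the spec's default when no rounds field is present
        salt, digest = fields[2], fields[3]
    return rounds, salt, digest
```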

I noticed that there have been a couple of requests for a similar pure JavaScript implementation of the algorithm, which should be doable. Being able to create hashes entirely client-side in the browser is enticing, so I hope someone will give it a go. If you do, the Python version is probably close to the level of abstraction a JavaScript version would use, which might make porting easier.

Privilege escalation through .bashrc

Privilege escalation is the art of going from having access to an unprivileged user on a system, basically everything non-root, to having access to root, and thus the keys to the kingdom. These unprivileged accounts might be the accounts running your web server, mail server, application, or normal user accounts like the one you ssh in to. The latter is attractive to attackers, as you have a fair chance of being on the sudoers list, and thus capable of getting root if you can provide a valid password.

If an attacker has managed to get access to your user but doesn’t know your password yet, one vector to attaining your password is to trick you into giving it away, by having you type your password into an application you think is sudo, but is actually a malicious one planted by the attacker. This is easily accomplished since bash allows you to provide your own personal command aliases, which the attacker can set for you. sudo might be set as an alias to capture_pw.sh, which will emulate the sudo prompt and capture your password. Your password might then be saved to a file for later retrieval or automatically submitted to the attacker. You will not notice anything, since the script, after having captured your password, will pass it along to the actual sudo application and probably delete all traces of itself. Proofs of concept of this are easily available on the Internet, like this one.

The vector is easily accessible to attackers since each user can define their own aliases in .bashrc or .bash_aliases. As with most things security, the solution is a tradeoff between usability and security. You cannot allow an unprivileged user to modify your environment, thus you need to revoke your own rights to do so. All files sourced on login (/etc/profile, ~/.bashrc, ~/.bash_aliases and potentially more) need to be owned by root and only grant yourself read access. This in itself is not enough however, as it merely prevents modifying the existing file. Deleting a file on Linux is an operation on the parent directory, as deleting a file is equivalent to removing it from the directory’s index. Thus a user with write access to the parent directory might find your root-owned .bashrc, delete it, and create a new one in its place, granting the attacker write access. In conclusion, you cannot have write access to your own home directory if you want to prevent this attack vector.*
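In practice that means something along these lines (a sketch; adapt the list of files to whatever your shell actually sources on login, and note the last two lines also stop you from creating files in your own home directory):

```shell
sudo chown root:root ~/.bashrc ~/.bash_aliases ~/.profile
sudo chmod 0644 ~/.bashrc ~/.bash_aliases ~/.profile
sudo chown root:root "$HOME"
sudo chmod 0755 "$HOME"
```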

Does this sound implausible? It shouldn’t be. Getting access to a system without the password is possible every time you leave your computer without locking the screen, lose an unencrypted ssh key due to theft or loss of laptop or phone, have your ssh password brute-forced but have a different one for sudo, and a lot of other options. This fix will not protect you from all privilege escalation, a dedicated attacker that has gotten access to your system will have a ton of other ways to try to get root, but this is a relatively easy fix to a very easy attack. Always think of security in depth, put as many layers between an attacker and full compromise as you find convenient. To prevent a breach you need to have a higher tolerance for inconvenience than your attacker has patience. Evaluating your attacker’s patience is one of the hard tasks you’ll have to complete yourself.

  • You could also achieve the same effect through an SELinux policy, which could prevent deletion of only the files sourced on login without affecting the rest of the directory.

Signing mail

If you’ve been on the Internet long enough, you might have been unlucky enough to be exposed to this UI decision from the 90s, the notorious PGP chunk:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Howdy,


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13

iQIcBAEBAgAGBQJVZdlnAAoJEJyPPH5ga5Kwsi8P/R/dsUWld8oe6yc+Cnfdkufc
y0fIlLhyWLqaZVEv3zye4O7gw3s3LO91PWSbNqIax7aUZee+NmamZEKFSk7dKO+g
Lu6B4u3SxzNgeCfBPcexKS8S9ARBUuRr34Ow7U+l4rI6vNv85YBk8nVHgVcZL6vS
k4I0cc/nKRXV3m8HPlGWxUiaOdyMaMPYtQaxw7WcyiC3H/uuL58H1j2o2xoEtDBI
KghkjdI/N0fuYEpVTVQnd8+jDGVH+FwAjWJtGt/3RVzlH5APqkdHzlOwzerH0ORL
XUiEMFOsd6eez0zv4F+bfTBDBHsl4ih5Y7bwUH+A1fNLwP+SEbf0O+QO+mTHyTSf
e7H4MwsRyJfna4yCcnTH0s3yQ3DotFtRGgc32MrBBTIC0t2sJHtdL0yzAWCYZ3FU
6F3Yi6uAT/8V32FGiRB+NeFyaXPiZByg03Tv2yMkEUGanOXmDOKI9fTSn5MGhBPE
kLbYQmfSKxLT/iN/WGHtPAFNuQRFQ3hTkQ9NttSP+LMkuNOHx5ppEKho/bNTmMN3
ltd2c1XZN2VLoxtXAHSiUQ7yWxrh5aFHEaoSohufaAuilAo5rKs6/gQa9dmhbfUt
Kqr1zKqK2gMwjDE5JkV045tfTnfP48YwXTRxkFjPX+PqjwAQ15XytQVIigO5uQgw
Ak2Wt6X8KxBcNYA5JC1t
=ZKyE
-----END PGP SIGNATURE-----

Which is a good candidate for the worst UI decision of all time, and its utility is best summarized (as often is the case) by xkcd. Granted, PGP is one of few viable options for end-to-end encryption without inventing any new infrastructure, but you need to be aware that PGP offers no protection of metadata like sender, receiver, time sent, and subject of the mail.

So let’s consider: why do people, typically the privacy-minded and tech-savvy, sign their mail in this way?

Disclaimer: This post does not concern using PGP for end-to-end encryption, using PGP to sign messages in other channels than mail, or any other use of PGP. This is only about PGP and signed mail. Also note that when I say PGP, I’m talking about all OpenPGP-compatible software, including GPG.

Tamper-proof mail

Mail that has been cryptographically signed will resist modification, as any single byte that has been modified will cause the signature to be invalid. This does however come with two heavy assumptions:

  • The receiver will actually verify the integrity of the message
  • A Man-in-the-middle capable of modifying the message during transmission will not simply remove the entire signature, rendering it as normal mail.

The first point includes having to first (securely!) exchange keys with the person you’re receiving the mail from. There’s the appalling option of just publishing your public key to a keyserver, but the fact is that even though most services have realized that you need to verify email addresses before using them for anything valuable, key servers – a service developed exclusively for security-minded users – do not do this. Anyone can create a key claiming to be me, on any of the fifteen or so different email addresses I’ll reply on. If you trust Twitter or Facebook or any such site (odds are that if you’re this paranoid about the integrity of your mail, you don’t), you could publish your fingerprint there, which could confirm that the key gotten from the keyservers is actually yours. Alternatively, if you have a web page (most people don’t), you could publish it there, granted that you serve your site over HTTPS, use DNSSEC to protect your domain, trust that your registrar properly secures their DNS signing keys (or do so yourself if you self-host DNS), properly secure the private key associated with your certificate, have complete trust in every single CA in the trust store, and that there’s no malware on the sender’s or receiver’s computer that could snoop private keys or modify public keys.

A better option, key parties! Gather all your friends, drink beer and exchange public keys face to face, with no intermediaries or unreliable channels in between. Have you been to many of those? Me neither. I’ve studied information security for five years now, and I don’t know of anyone else in my class that even has a GPG key. I’m only using mine to sign uploads to PyPI.

As for the MitM threat, if I send you a message like the one above, it’s depressingly easy for an attacker to simply strip anything related to the signature, and the receiver would have no way of knowing that the message was originally signed unless she already knows that all messages from the sender are supposed to be signed. For new parties communicating, that’s not feasible in the PGP system.

Authenticity of sender

Another worthwhile goal of signing mail is to ensure that the sender is actually the person she claims to be. If you have the public key of the person, no one else can send correctly signed mail as that person. Many of the same concerns as for preventing tampering still apply, but this is a bit harder to subvert, since if someone is actually trying to confirm someone’s identity, they’d simply ask them to prove it. This could be the case if, for example, you start using a new email address: you could sign the first mail from the new address to let people know it’s actually you, which means that signatures will hopefully be checked and the system actually works like it should, assuming the parties have already exchanged keys.

A better way

New services are trying to improve on the key-exchange related troubles, which I applaud, but if we consider the mail case, the problem is, for most practical purposes, solved. There are standardized, widely adopted and supported solutions which solve the issues mentioned above (with some considerations). Thing is, we have a common name for incoming mail coming from forged addresses: it’s called spam. And spam has been around for long enough for us to establish a very solid grasp of how to prevent it.

SMTP, the protocol that actually exchanges mail between your mail server and mine, doesn’t have any way to authenticate that the claimed From: address is valid. Anything goes, as far as SMTP cares – anyone can claim to be anyone. Which is why spam started happening, and which is why we have other solid ways to verify this outside SMTP.

SPF

SPF (Sender Policy Framework) is a rather obvious way to prevent forged mail addresses – simply ask the sending domain whether the IP is allowed to send on its behalf. SPF is deployed by pretty much anyone who sends mail, as without it you’re destined for the spam folder, or to be rejected outright. If someone that checks SPF receives a mail claiming to be from someone@example.com, the receiver will issue a DNS TXT query to example.com and see if the IP of the sender is listed as an allowed sender for that domain in the SPF record. Not the case? No mail for you. This ensures that no-one can claim to be a sender on that domain, without having access to a mail service on that domain.
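A hypothetical SPF record for example.com, allowing one netblock plus Google’s mail servers to send on its behalf, could look like this:

```
example.com. IN TXT "v=spf1 ip4:203.0.113.0/24 include:_spf.google.com ~all"
```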

How about authenticity? Enter DKIM.

DKIM

DKIM (DomainKeys Identified Mail) works somewhat similarly to SPF, in that it uses an already trusted channel (DNS) to exchange out-of-band information about mail. With DKIM, a domain publishes its public keys through DNS, so that any receiver can verify that the mail has not been tampered with since it left the sender’s mail server. And, in a startling contrast to PGP, DKIM includes the signature in the mail headers and not in the body, keeping it completely transparent to users, but present if they want to check it manually.

But how about the risk we talked about earlier, where an attacker could simply strip the signature and the receiver would be none the wiser? We can handle that as well.

DMARC

DMARC is a policy and reporting tool, which you use to say stuff like “all mail from my domain should carry a valid signature, or be considered spam”. There are three different policies you can apply to your domain:

  • Ignore (p=none): No special action will be taken if unsigned mail is received
  • Spam (p=quarantine): Unsigned mail from me is spam
  • Reject (p=reject): Unsigned mail should be dropped, bypassing the spam folder entirely

This policy is distributed through – you might see a pattern here – DNS.

DMARC is also about reporting, as the record allows you to specify an email address where you’ll get daily reports from receivers of your mail about how your messages were treated. Thus, if anyone tries to spoof mail from your domain with a bad signature or lacking a signature, you’d be notified and would not be kept in the dark about how people try to impersonate you.
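A hypothetical DMARC record combining a quarantine policy with aggregate reporting could look like this:

```
_dmarc.example.com. IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
```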

As all of these technologies rely on DNS, it’s quite essential for their secure operation that your DNS records are protected by DNSSEC. I’m not going to go into detail about how to configure DNSSEC, but most modern registrars will either provide it for you automatically or have some way for you to enable DNSSEC. Make sure you do.

Confidentiality

Contrary to popular belief, most mail sent between people today is encrypted in transit, at least if you’re using GMail or any of the other large providers. On my own server, which I haven’t migrated all personal mail to yet, but which is used for most of my accounts, 72% of my mail was received over TLS/SSL. If you’re paranoid and running your own mail server you can configure it to reject non-TLS transmissions, but that will probably prevent you from receiving quite a lot of mail. You’d also not be RFC-compliant, if that’s important to you.

If you have transit encryption, that encryption will also cover metadata of the message, keeping everything except the sending and receiving domain confidential from anyone listening in.

Mail configured with SPF and DKIM will look like this, where you can see that postfix has verified that the sender was valid according to SPF and opendkim has verified that the DKIM signature matches:

<...>
Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=209.85.215.54;
        helo=mail-la0-f54.google.com; envelope-from=<snip>@gmail.com; receiver=<snip>
Authentication-Results: thusoy.com; dkim=pass
        reason="2048-bit key; unprotected key"
        header.d=gmail.com header.i=@gmail.com header.b=niwa/5j1;
        dkim-adsp=none (unprotected policy); dkim-atps=neutral
Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54])
        (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
        (No client certificate requested)
        by thusoy.com (Postfix) with ESMTPS id C3EF9A0745
        for <snip>; Wed, 27 May 2015 18:32:01 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:sender:date:message-id:subject:from:to:content-type;
        bh=JF1P0cVaOod0HI+SRSwmWbbutO+SFZzQejSnBlTt9UE=;
        b=niwa/5j1vbY1rp5D/w8EezUZCKAPlT0bJLZFCfEFwwBkP+8A8Pw/urwZQTG64w2ZUj
        KQH3fD/seNhTf3fPDuTS0O0nB24TTZkjJplaPE6g2wPP7k8pXe1O1c5fNFWRkuywTKcJ
        nBXjj97i7n79QaSwLH4vLI0ySlOyr5pRAhJeo9+6vuTOJJNLMpCVPqP10BVJ69r+39KA
        p2Au0Up+y2ZngArI0dFYtYqQFSgEHI2Yqjjm61jRBEbjEgWJkagLkfZY6R3yQsfFloff
        Zk3rDreMVr5ronX7qPxIvpAaZTMhSg41hf+2NU38tpvoaBRBBOIyBz05JcI1jdPIEktp
        oujg==
<...>

The Received: header tells us that the message was encrypted in transit (look for with ESMTPS), and I’ve configured postfix to also record the ciphers used, out of curiosity.

Issues with SPF, DKIM, DMARC and opportunistic encryption

Nothing is perfect. None of the technologies mentioned above help in verifying that someone on a given domain is not being impersonated by someone else on that domain, like if I claim to be sergey.brin@gmail.com or similar. To prevent this, you need to trust that the provider ensures proper access control and validation of its senders. But this is not unreasonable; trusting your mail provider is a natural thing to do, and if you don’t, why would you use that mail provider? Granted, PRISM changed the rules of the game, as you’d also have to trust that your provider has sufficient opsec and legal power to keep governments out of their servers, which it seems has not been the case for any of the large US-based service providers. There are alternatives.

TLS certificates today cost money, which does not help foster adoption of encryption around the web. Granted, if you’re brave you can try to navigate the UI of StartCom’s StartSSL service, which will provide you domain-validated TLS certificates for free, but better things are coming. The EFF is working on Let’s Encrypt together with Mozilla to automate this process and provide free certificates for anyone, which is sorely needed. Alternatively, Postfix supports using DANE for key exchange, which removes the need for a cumbersome and vulnerable CA structure, and moves the trust over to DNS providers. If you’re enough of an indie hipster you’ll claim that domain names are only for the weak, and that real enthusiasts manually keep their hosts files updated with their friends’ IPs, but for anyone else this is not practical. Thus letting your DNS provider publish your keys prevents certificates signed by a rogue CA from being accepted for your domain, without incurring any cost to stay secure.

UI for security services is essential, and in that regard many of these solutions have “failed”. The only way these technologies are presented to the user is whether mail ends up in your spam folder or your inbox. Greater transparency into why some mail is trusted and other mail is not is needed to establish trust in the system. If the UI showed that mail was verified with DKIM and SPF and was protected in transit, that would probably yield much more trust in the mail system. Or preferably, assume that all of these are mandatory, and notify the user if any of them fails. If this was actually visible, we’d reduce the number of self-signed and invalid certificates seen in SMTP today. It would also push providers to support opportunistic encryption, as ordinary users would be notified if their mail was sent in plaintext.

It should be noted that these solutions are not mutually exclusive with using PGP, the best is having both, which means that DKIM, SPF and opportunistic encryption validates your service provider, while your PGP key validates you. In some cases, like if you self-host email, the service provider and the sender is the same person and this might not be necessary, but self-hosting email is not a solution for everyone. In any case, if you want to use PGP we’d still need better user interfaces, as I’d stop using any service that actually exposed me to mail that looks anything like the one in the beginning of this post. I’d recommend sending detached signatures as attachments instead of clearsigned mail, which means that if a recipient receives PGP-signed mail they would see an attachment they don’t recognize, instead of the PGP-blob. Attaching your own public key as well means that you could run a trust-on-first-use scheme, where no one could impersonate the other in a conversation after they have been communicating for some time.

SMTP is not the best tool for secure communication, but it’s one of the few usable, distributed systems we have that can achieve pretty good security and also has near-universal adoption. There’s plenty of secure communication tools which I hope will become only more and more widespread, but until then, or in addition to them, mail can actually work pretty well.

Don't split interdependent values

It’s not uncommon to encounter values that are intrinsically linked together. Usernames and passwords come to mind as an obvious example, and when working with salt states it’s quite common to encounter program versions and tarball checksums. If you have several values where it doesn’t make sense specifying only one, or changing one of them in isolation, don’t design your system to allow them to be specified independently. requests is a great example of how to do this properly:

response = requests.get('https://api.github.com', auth=('user', 'pass'))

Notice how the username and password are not two separate arguments to the function, but always specified as a tuple containing both. Contrast this to common practice in the salt state world, like this in the saltstack-formulas nginx.source state:

{% set nginx = pillar.get('nginx', {}) -%}
{% set version = nginx.get('version', '1.6.2') -%}
{% set checksum = nginx.get('checksum', 'sha256=b5608...1a18') -%}

The state would fail if you were to override the version but forgot to also override the checksum. Why make it possible to make these sorts of mistakes? I’d rather use a pillar value nginx.version_specifier = '1.6.2 sha256=b5608...1a18' instead, and then just split the two in the state.

{% set nginx = pillar.get('nginx', {}) %}
{% set version_specifier = nginx.get('version_specifier', '1.6.2 sha256=b5608...1a18') %}
{% set version, checksum = version_specifier.split() %}

Now there’s no invalid combinations of versions and checksums, which is much easier to relate to as a consumer of the state.

Server-side assets: File revisioning

Asset management is mostly associated with client-side resources, like stylesheets, scripts, graphics and the like. But if you have a decent build process to manage your assets, you might benefit from utilizing these assets server-side as well. I’ve been doing this for some time on some of my projects, and have found at least two very promising use cases for it, one of which we’ll take a closer look at today.

To keep stuff snappy on the net these days, the best way to go is usually to minimize the number of requests necessary to finish loading. This is easily said, but often a bit harder to implement. To keep requests to a minimum, you let the client cache static assets for months or longer, to avoid having to fetch them again. This is also easy, until you want to introduce a change to any of your assets which is cached on thousands of client machines. To force a reload of the assets you employ cache busting, usually done by simply adding a version specifier like style.css?v=123 to your hrefs.

There’s a couple of downsides to this: some proxies don’t handle query strings too well, and if you’re using the same version for all of your assets (which is often the easiest to implement, just appending a git hash or similar), you’d be invalidating a lot of cached assets for changes unrelated to that asset. A more clever approach then is to add a hash of the file contents to the filename, like styles.96438d44.css, which ensures that the file will be fetched again once its contents change, and if you revert whatever change you did, there’s no need for clients to fetch the new assets, since they already have the old one cached. Great!
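The renaming itself is straightforward; a sketch of what tools like grunt-filerev do under the hood (the 8-character md5 prefix is an assumption matching the styles.96438d44.css example):

```python
import hashlib
from os import path

def revved_filename(filename, contents):
    """Insert a short content hash before the extension,
    e.g. styles.css -> styles.96438d44.css."""
    digest = hashlib.md5(contents).hexdigest()[:8]
    base, ext = path.splitext(filename)
    return '%s.%s%s' % (base, digest, ext)
```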

If you use tools like grunt or gulp, getting them to do the filename rewriting is fairly trivial, thanks to plugins like grunt-filerev and gulp-rev. The intention for both of these is to use some other plugin to rewrite the hrefs in the HTML to use the revved version, but I’ve found that to be a slightly harder step to integrate with my build flow. What I do instead is dump the plain-to-revved mapping to disk (this grunt plugin is helpful) and include this file in the built app. Since I’m mostly using Flask or Django, which both manage static files by symbolic names anyway, it’s trivial to write a little shim that takes this symbolic name and uses the revved version instead.

Let’s take a look at how to perform this, for Flask first, and then we’ll look at Django later. For both of them, we read the revision mapping on load, and do the actual mapping during template rendering.

Flask

For Flask, assuming you dumped the filerevisions to a directory server-assets under your package and called the file filerevs.json, you can do like this:

from os import path
import json

# Create your app etc...

filerevs_path = path.join('server-assets', 'filerevs.json')
try:
    with app.open_resource(filerevs_path) as filerevs_fh:
        filerevs = json.load(filerevs_fh)
        app.config['FILEREVS'] = filerevs
except IOError:
    print('No filerevs found, continuing without')
    app.config['FILEREVS'] = {}

Now we can override the standard Flask url_for with a modified version, which first checks if the file exists in the mapping, before falling back to normal url_for behavior. You can do that by providing some context processors to Jinja that overrides the defaults, like this:

from flask import url_for, current_app

def revved_url_for():
    return {
        'url_for': _revved_url_for,
    }

def _revved_url_for(endpoint, **values):
    if endpoint == 'static':
        original_filename = values.get('filename')
        if original_filename:
            revved_filename = current_app.config['FILEREVS'].get(original_filename)
            if revved_filename:
                del values['filename']
                return url_for(endpoint, filename=revved_filename, **values)
    return url_for(endpoint, **values)

Save that next to where you create your app, or in a file context_processors.py or whatever, you know best how to structure your project.

Anyway, you also need to specify that you want to use this new shiny context processor with your app:

app.context_processor(revved_url_for)

Et voilà, if you’ve revved your ‘css/styles.css’, your calls to url_for('static', filename='css/styles.css') will now return revved filenames like /static/css/styles.cafed00d.css. Awesome. Remember to serve those files with long-lasting cache headers, usually a year is quite sufficient. For nginx, this is as simple as adding the following to your config:

location ~* \.(?:css|js)$ {
  expires 1y;
}

Should be self-explanatory. Ask Google for how to achieve the same thing with Apache.

Django

Loading the config in Django is very much the same as its Flask equivalent:

# in your main settings file:
import json
from os import path

filerevs_path = path.join(path.dirname(__file__), 'server-assets', 'filerevs.json')
try:
    with open(filerevs_path) as filerevs_fh:
        FILEREVS = json.load(filerevs_fh)
except IOError:
    print('No file revisions found, continuing without')
    FILEREVS = {}

However, with Django we have to create a new templatetags module and include this in our templates. So, in your favorite app, create a new package called templatetags, create an empty __init__.py and a revved_static.py file:

from django import template
from django.conf import settings
from django.contrib.staticfiles.storage import staticfiles_storage
from django.contrib.staticfiles.templatetags.staticfiles import StaticFilesNode

register = template.Library()

class FileRevNode(StaticFilesNode):
    """ Overrides normal static file handling by first checking for file revisions in
    settings.FILEREVS, before falling back to the actual requested filename. Otherwise
    identical to the normal static tag.
    """

    def url(self, context):
        path = self.path.resolve(context)
        revved_path = settings.FILEREVS.get(path)
        if revved_path is not None:
            return staticfiles_storage.url(revved_path)
        else:
            return super(FileRevNode, self).url(context)


@register.tag
def static(parser, token):
    return FileRevNode.handle_token(parser, token)

In your template you can now use it like this:

{% load revved_static %}

<link rel="stylesheet" href="{% static 'css/styles.css' %}">

And it will render you something like this:

<link rel="stylesheet" href="/static/css/styles.cafed00d.css">

Kung-fu! Every request after the first one should now be super-snappy; combine this with uglification and minification of the assets, and the first load can be quite manageable as well.

In a Couple Of Days™ we’ll take a look at another usage of server-side assets, where we use a subset of our CSS for progressively enhanced styles.

Privacy-conscious social plugins

Jeremy Keith recently blogged about only injecting Google Analytics code if the user's Do Not Track header is not set. This is very similar to some thoughts I've been having recently: the same thing should apply for social plugins as well! They track you at least as much as actual analytics code does, only to the social networks you're not just an IP; you're you, with full name, interests, friend lists and whatnot.

Simply including social plugins like a Facebook Like button on your site means that every time one of your visitors browses your site, their browser will gently notify Facebook about the visitor's behavior on your site. From the request times and Referer headers the browser sends to Facebook when loading the script, Facebook knows what they're reading, how long they read it for, how many articles they read while on your site, and all sorts of stuff the reader might not at all want to share with Facebook. Facebook may or may not actually be tracking this data, but it would be silly of them not to, as with it they can spot new, hot stuff on the web before anyone has even clicked the Like button.

So it's much the same as Jeremy's analytics problem, only slightly simpler: only include the social plugins when the user has actually indicated that they want to share your piece with their network, and not before. A placeholder image that looks like the Facebook Like button can be shown on load, and if it's clicked, load the code to complete the action. There's a double benefit to this: your site will load faster, and you'll protect the privacy of your users.

I don’t have a working example of this yet, but it’s something I want to have for this site, so maybe there’ll be another post on this later.

On another note, welcome to my blog! Hope I’ll see you again later, there’s more to come.