This is a repository for deploying Hatchet via sst.dev + Pulumi in AWS.
This is aimed at someone who is looking to integrate Hatchet into their stack and needs self-hosting but is not an expert in AWS (or who is familiar with AWS/ECS but not EKS).
The Hatchet managed cloud offers a free tier; however, if you do the kind of embarrassingly parallel simulation work I do, the limitations on simultaneous worker counts will prevent it from being relevant to you. You would likely need to go with a custom plan for the kind of workload pattern I have - e.g. running experiments with a few million simulations over a few thousand workers, but only once or twice a month, if that. I recommend you get in touch with the team to discuss pricing, because (a) they are super helpful and (b) if you are an academic like me, you would probably prefer to use managed infra rather than worrying about your own. You can compare pricing at the end of the document.
In any case, it's relatively easy to self-deploy Hatchet (and inexpensive, assuming your needs are infrequent but highly bursty and you don't mind standing up and tearing down infra each time you run an experiment). It's only a few resources really - an AmazonMQ RabbitMQ broker, an Aurora/RDS Postgres database, and an ECS service with the actual Hatchet engine, API, and web UI dashboard. Having said that, there are also several conveniences pre-configured so that you can easily deploy in private subnets or with/without an internet-facing load balancer. If you have cloud infra experience, you may prefer to roll your own deployment; if not, this should be enough to get you off the ground with Hatchet relatively quickly.
Hatchet's official self-hosting docs include lots more information, including official support for Kubernetes w/ Helm charts or Glasskube, but as someone with no real experience with K8s, I found it easier to translate the Docker Compose deployment instructions into ECS.
You will need an AWS account with credentials, as well as Docker, Node and SST installed.
Visit Docker and install the relevant version for your system if you do not have it already. This will be necessary when deploying since containers are built on your machine before being pushed to ECR.
Follow the instructions here to get Node/npm installed on your machine if you do not have them already. I recommend using nvm regardless of whether you are on Windows or OSX/Linux.
You can follow the official AWS instructions or just use a pre-existing account with relevant permissions; however, sst.dev's instructions are actually pretty helpful in that they guide you through setting up an organization with different isolated accounts for specific environments, so if you are standing up a new project, following them is not a bad idea!
```shell
cd path/to/your/repos/
git clone https://github.com/szvsw/hatchet-sst.git
cd hatchet-sst
npm i
```

You can skip this step if you only want the engine available to worker nodes in the same VPC (or via tunneling). If you do not know what this means, then you should buy a domain!
In most cases, you will want to make the engine available over the open internet so that you can visit the Hatchet dashboard to check task progress and allow worker nodes on your local machine to easily connect to the engine.
The easiest way to do this is to purchase a domain through AWS Route53, and let sst.dev automatically configure all of the relevant DNS settings, certs for SSL, load balancer config etc. Depending on how luxurious you are feeling with your choice of domain, this is probably approx. $50/yr for the domain + the monthly LB costs (approx. $30/mo, but if you are just standing up the engine for infrequent experiment runs, e.g. once or twice a month and then tearing down, it's much less).
- Log in to your AWS console.
- Navigate to Route53.
- Purchase a domain and write down its name, e.g. `acmelab.com`.
If you have an externally managed domain, you will need to create a certificate in ACM and add it to the env vars - more documentation coming soon. It's pretty easy though! Essentially you just need to add one or two records to your DNS config via your DNS provider's console and wait 20 min. TODO: enable certificate referencing
sst lets you manage different stages (aka environments) when you deploy, including some cool functionality around dev deployments, but we will not worry about that for now. By default, when you run a command like `sst deploy`, it will deploy to a stage named after your current OS username - e.g. for me that's `szvsw` on my work computer but `sam` on my home computer. You can always override which stage you want to deploy to by passing the `--stage <stage-name>` flag to the CLI. By default, sst will also load any configuration variables you set in a corresponding `.env.<stage-name>` file.
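For example (the stage name `production` here is just an illustration):

```shell
# Deploys to a stage named after your OS username,
# loading .env.<your-os-username> if it exists
sst deploy

# Deploys to an explicit stage, loading .env.production if it exists
sst deploy --stage production
```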
- Copy `.env.example` to `.env.<stage-name>` (e.g. `<your-os-username>` or `production`)
- Update `ROOT_DOMAIN` (or delete it if the engine will not be accessible over the internet)
- Update any other configuration variables which might be relevant (e.g. cpu/mem size)
| EnvVar | Type | Description |
|---|---|---|
| `ROOT_DOMAIN` | `undefined` or valid domain in Route53 | The root domain which will be used for making Hatchet accessible. The dashboard will be available at `hatchet-<stage-name>.<root-domain>`, e.g. `hatchet-production.acmelab.com`. If omitted or `false`, the engine will only be accessible inside the same VPC. |
| `DB_STORAGE` | `[number]` GB | Size of the Postgres database storage. |
| `DB_INSTANCE_TYPE` | supported instances | What type of AWS instance to use for the Aurora Postgres database. nb: omit the `db.` prefix from the instance type name. |
| `BROKER_INSTANCE_TYPE` | supported instances | What type of AWS instance to use for the AmazonMQ RabbitMQ broker. nb: do NOT omit the `mq.` prefix from the instance type name. |
| `ENGINE_CPU` | supported vCPU count | How many vCPUs the Hatchet engine service should use. nb: the combination of cpu/mem must be valid. |
| `ENGINE_MEMORY` | supported memory amount | How much memory the Hatchet engine service should use. nb: the combination of cpu/mem must be valid. |
| `ENGINE_PRIVATE_SUBNET` | boolean | Whether or not to deploy the engine inside a private subnet. nb: if `true`, additional monthly costs will be incurred because either a NAT Gateway or PrivateLink VPC Endpoints will be added in order to pull containers from ECR. |
| `NAT_GATEWAY` | boolean | Whether to add a NAT Gateway to the VPC. If `false` and `ENGINE_PRIVATE_SUBNET=true`, then PrivateLink VPC Endpoints will be added so that containers can still be pulled. |
| `BASTION_ENABLED` | boolean | Whether to add a Bastion instance in your VPC which gives you remote access/tunneling capabilities. |
| `OVERWRITE_CONFIG` | boolean | Whether to regenerate the base Hatchet config before redeploying the engine. |
nb: the default instance sizes in .env.example are relatively large and sized for decent throughput. See the cost estimate at the end of the document. If you want to start cheaper, consider dropping down to something with 1 vCPU for the broker, 2 vCPU for the DB, and 2 vCPU for the engine.
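For concreteness, here is an illustrative `.env.production`. The variable names come from the table above, but the value formats are my assumptions - defer to `.env.example` for the authoritative formats:

```shell
# Illustrative .env.production - values are examples, not recommendations;
# check .env.example for the exact expected formats.
ROOT_DOMAIN=acmelab.com
DB_STORAGE=100                     # GB
DB_INSTANCE_TYPE=r6g.xlarge        # nb: no "db." prefix
BROKER_INSTANCE_TYPE=mq.m7g.large  # nb: "mq." prefix required
ENGINE_CPU=4                       # cpu/mem combination must be valid for Fargate
ENGINE_MEMORY=8192                 # units are a guess - confirm against .env.example
ENGINE_PRIVATE_SUBNET=false
NAT_GATEWAY=false
BASTION_ENABLED=false
OVERWRITE_CONFIG=false
```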
TODO: considerations when deploying workers in a private subnet
- `sst secret set DatabasePassword <your-password> --stage <stage-name>` (nb: must be 12+ chars)
- `sst secret set BrokerPassword <your-password> --stage <stage-name>` (nb: must be 12+ chars)
- `sst secret set AdminPassword <your-password> --stage <stage-name>` (nb: must be 12+ chars and contain an uppercase letter, a lowercase letter, and a number)
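If you don't want to invent passwords by hand, something like the following works (assuming `openssl` is available; note that `AdminPassword` additionally requires an uppercase letter, a lowercase letter, and a number - a 24-character base64 string almost always contains all three, but eyeball it before using it):

```shell
# 18 random bytes encode to exactly 24 base64 characters,
# comfortably above the 12-character minimum.
DB_PASS=$(openssl rand -base64 18)
echo "${#DB_PASS}"  # prints 24

# Then hand it to sst, e.g.:
# sst secret set DatabasePassword "$DB_PASS" --stage <stage-name>
```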
- `sst deploy --stage <stage-name>`
- Visit `hatchet-<your-stage-name>.<your-root-domain>`, e.g. `hatchet-production.acmelab.com`, and log into the default admin tenant with `hatchet@<your-root-domain>` and the specified password.
If you have set `ROOT_DOMAIN=your-domain.com`, a load balancer is automatically configured, and the Hatchet engine tells workers (via fields encoded in the JWT API token) to send HTTP(S) and gRPC traffic to `hatchet-<your-stage-name>.<root-domain>` and `hatchet-<your-stage-name>.<root-domain>:8443` respectively. These resolve to the load balancer, which then routes traffic to the appropriate containers.
There's a good chance you might be spinning up thousands of worker nodes, in which case you probably want to skip the load balancer altogether, which you can do by deploying the worker nodes in the same VPC as the engine (TODO: auto-deploy docs coming soon) and using the cloudmap namespace domains.
However, because the client JWTs you generate still have the load balancer URLs encoded in the relevant fields, you need to override some environment variables when deploying the worker.
In addition to setting `HATCHET_CLIENT_TOKEN`, you will also need to set:

```shell
HATCHET_CLIENT_SERVER_URL=http://Engine.<your-stage-name>.hatchet.sst
HATCHET_CLIENT_HOST_PORT=Engine.<your-stage-name>.hatchet.sst:7070
HATCHET_CLIENT_TLS_STRATEGY=none
```
You can find the relevant URLs in the results of `sst deploy` under `EngineAddresses` in `internalServerUrl` and `internalGrpcBroadcastAddress`.
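Putting that together, launching a worker container in the same VPC might look something like this sketch (the image name, stage name, and token are all placeholders, and I'm assuming your worker reads its Hatchet client config from environment variables):

```shell
# Sketch: run a worker against the internal cloudmap addresses.
# <token-from-dashboard>, <your-stage-name>, and your-worker-image are placeholders.
docker run \
  -e HATCHET_CLIENT_TOKEN="<token-from-dashboard>" \
  -e HATCHET_CLIENT_SERVER_URL="http://Engine.<your-stage-name>.hatchet.sst" \
  -e HATCHET_CLIENT_HOST_PORT="Engine.<your-stage-name>.hatchet.sst:7070" \
  -e HATCHET_CLIENT_TLS_STRATEGY="none" \
  your-worker-image:latest
```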
If you need to deploy without ingress from the internet, simply omit the `ROOT_DOMAIN` env var or set it to `false`. This will cause the deployment to skip configuring a load balancer for the Hatchet service. However, it also means that you will not be able to connect local workers to Hatchet or check the dashboard from your machine, at least not without some networking-fu. By default, the service will still be deployed in the public subnets of your VPC, but there will be no ingress pathway from your local machine to the service.
Fortunately, sst makes it relatively easy to get connected to the VPC.
nb: your choice of private/public subnets for the engine containers is irrelevant here, since the tunnel we establish in the VPC will already have ingress rules which allow traffic to reach the engine.
nb: if you are on windows, you will need to use WSL for this part
- First, you will need to set `BASTION_ENABLED=true` and redeploy (`sst deploy --stage <your-stage-name>`). Copy the Bastion Instance ID (something like `i-asdf1348`) to your clipboard for use later.
- Install tunneling via `sudo sst tunnel install` if you have not already.
- Open up a tunnel with `sst tunnel --stage <your-stage-name>`.
- Open Firefox, then open `Settings > Network Settings > Settings`.
- Select `Manual proxy configuration`.
- Configure the `SOCKS Proxy` host field as `localhost` and the port field as `1080`.
- Make sure that `SOCKS v5` is selected.
- Click `OK` to save settings.
- Open a shell on your Bastion instance: `aws ssm start-session --target <Bastion-instance-id>`.
- Run `dig +short engine.<your-stage-name>.hatchet.sst` to print out the IP address of the engine service within the VPC (you can also check this from the AWS console).
- Open your `hosts` file in a text editor (on Mac/Linux, this is at `/etc/hosts`; on Windows it's at `C:/Windows/System32/drivers/etc/hosts`) and add a record at the end which says `<ip-address> Engine.<your-stage-name>.hatchet.sst`, e.g. `10.0.10.136 Engine.szvsw.hatchet.sst`. This will tell your computer to route the url to the IP address, while the proxy we configured in Firefox will tell your computer to route the IP address through the tunnel into the VPC.
- You can now access the dashboard via the internal cloudmap namespace server url, which should be something like `Engine.<your-stage-name>.hatchet.sst`.
- The default log-in email will be `hatchet@example.com` with your specified password from `sst secret`.
nb: though it's not particularly problematic to leave it there, it's probably a good idea to remove the record you added to your `hosts` file as well as the proxy settings in Firefox when you are done, lest you confuse yourself in the future.
You can remotely access your Bastion instance by running:

```shell
aws ssm start-session --target <Bastion-instance-id>
```
You will of course need to deploy your workers in the same VPC. By default, a client token generated from the dashboard following the instructions above should work fine - it will use the internal cloudmap namespace correctly. However, you will need to set one additional env var on the worker:

```shell
HATCHET_CLIENT_TLS_STRATEGY=none
```
TODO: example of worker deployment
The cost estimate presented here is sized for moderately high throughput and DOES NOT include your worker node compute costs - just the engine, database, queue, etc.
- Aurora/RDS: r6g.xlarge, $0.2016/hr
- MQ: m7g.large, $0.0816/hr
- Fargate: 4vCPU/8GB, $0.19/hr
- ALB: ~$30/mo (depends on if Workers connect thru ALB or within VPC)
- NAT (optional), 2 AZs, ~$65/mo
- Not included: some negligible ECR costs, Domain registration cost (e.g. $50/yr)
About $13/day or $370/month without a NAT, or about $430/month with a NAT or PrivateLink VPC Endpoints.
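As a sanity check, the monthly figure follows directly from the hourly line items above (using 730 hours/month and adding the ~$30/mo ALB; NAT excluded):

```shell
# RDS + MQ + Fargate hourly rates from the list above, times 730 hrs/month, plus ALB
awk 'BEGIN {
  hourly = 0.2016 + 0.0816 + 0.19
  printf "$%.0f/month\n", hourly * 730 + 30
}'
# prints "$375/month" - the same ballpark as the ~$370 figure above
```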
Note that the managed Hatchet pricing for the Growth plan is currently $425/month, but it includes $100/month in worker node compute credits, meaning the effective infrastructure price is $325/month, which already beats this. Given that you can get set up with managed Hatchet Cloud in literally seconds AND you can very easily auto-deploy worker nodes via managed compute with auto-scaling and CI/CD already configured, I would say that pricing seems very attractive versus self-hosting for an actual persistent application (as compared to my typical use case, where I can just stand up and tear down the whole stack since I only need it once or twice a month).
Of course you can tune those instance sizes to your needs (and maybe even use spot capacity for the engine, though that seems risky), skip the load balancer entirely, and so on, so you might see costs anywhere in the $100-300/month range depending on your settings - but then you might be competing with the managed Hatchet Starter Plan @ $180/mo.
To me this suggests that you probably need a pretty strong argument to go the self-hosting route, which is probably just that you actually need to own your infra for one business/dev reason or another.
TODO: document credentials, Hatchet login/token generation, using pgAdmin through the tunnel, etc.