* Decode tok/s, versus a (cluster of) H100 GPUs with 8-bit quantisation and TensorRT-LLM, on Llama2 70B
This means running the biggest LLMs in the world faster than you can read, and a universe of completely new capabilities and ways of working unlocked by near-instant inference of models with superhuman intelligence.
When a trained language model is run for a user, over 99% of the total compute time is spent not on arithmetic but on moving model weights from memory to the processor.
* Llama2 70B with 8-bit quantisation, on an 80GB A100 GPU
The time taken to run the arithmetic for generating a single word
The time taken moving parameters from memory to the processor for each word
The total time a Fractile processor takes to generate the same word
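The memory-bandwidth claim above can be checked with back-of-envelope roofline arithmetic. The sketch below uses publicly quoted A100 80GB figures (~2 TB/s HBM bandwidth, ~624 TOPS dense INT8 tensor throughput) and the common ~2 operations-per-parameter-per-token approximation for decode; these are illustrative assumptions, not measured results.

```python
# Back-of-envelope roofline estimate: single-batch decode of Llama2 70B
# at 8-bit quantisation on one A100 80GB GPU (illustrative figures).

PARAMS = 70e9            # model parameters
BYTES_PER_PARAM = 1      # 8-bit quantisation: one byte per weight
MEM_BW = 2.0e12          # A100 80GB HBM bandwidth, ~2 TB/s
PEAK_OPS = 624e12        # A100 dense INT8 tensor throughput, ops/s

weight_bytes = PARAMS * BYTES_PER_PARAM
ops_per_token = 2 * PARAMS            # ~2 ops per parameter per token

t_mem = weight_bytes / MEM_BW         # time to stream every weight once
t_math = ops_per_token / PEAK_OPS     # time for the arithmetic alone

print(f"moving weights: {t_mem * 1e3:6.2f} ms per token")
print(f"arithmetic:     {t_math * 1e3:6.2f} ms per token")
print(f"memory share:   {t_mem / (t_mem + t_math):.1%}")
```

Under these assumptions, streaming 70 GB of weights takes roughly 35 ms per token while the arithmetic itself takes a fraction of a millisecond, so well over 99% of each decode step is spent on data movement rather than computation.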
We are a team of scientists, engineers and hardware designers who are committed to building the solutions that the AI revolution requires to keep scaling. We believe that the most important breakthroughs will come from trying solutions that others are not, to serious problems we actually face.