- It's based on another AI project of mine, Palm.
- Strawberry is an attempt to improve on Palm and build a good AI model that feels human-like.
- The first AI model under the name Strawberry is going to be Strawberry s1.
Since Strawberry is based on Palm, it is built on top of the same improvements.
- Reuse a shared stack of layers across recursion steps, inspired by Google's Mixture of Recursions [paper]
- Use a novel attention mechanism, which I call Attention On Detail
- Modernized architecture: Rotary embeddings, QK-Norm, and ReLU²
- Parallel layers proposed by Google's PaLM [paper]
- SwiGLU in the feed-forward network [paper]
- Untie head from embedding
- The Muon optimizer [writeup] [repo]
- Linear layer factorization
- Multi-Headed Causal Self-Attention (MHA)
- Attention Free Transformer (AFT)
- Linear Attention Mechanism (LAM)
- Key-Value Transformer
- Neural Attention
- SwiGLU
However, this version of Attention On Detail will be different from Palm's version.
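A few of the items listed above (ReLU², parallel layers, reusing a shared stack of layers across recursion steps, and linear layer factorization) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Strawberry's actual code: every function name here, the `toy_attn` and `toy_mlp` stand-ins, and the RMS-style normalization are hypothetical, and any routing of tokens to different recursion depths is left out.

```python
import math


def relu2(x):
    """ReLU squared: max(0, x) ** 2."""
    return max(0.0, x) ** 2


def rms_norm(x, eps=1e-6):
    """Stand-in normalization (assumption): divide by the root mean square."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]


def parallel_block(x, attn, mlp):
    """Parallel formulation: y = x + attn(norm(x)) + mlp(norm(x)).

    Both branches read the same normalized input instead of running
    one after the other.
    """
    h = rms_norm(x)
    a, m = attn(h), mlp(h)
    return [xi + ai + mi for xi, ai, mi in zip(x, a, m)]


def recursive_forward(x, shared_layers, n_recursions):
    """Apply the same stack of layers n_recursions times.

    The layers are reused on every step, so depth grows without
    adding parameters.
    """
    for _ in range(n_recursions):
        for layer in shared_layers:
            x = layer(x)
    return x


def factorized_params(d_in, d_out, rank):
    """Weights in a rank-r factorization W ~ A @ B, with
    A of shape (d_out, rank) and B of shape (rank, d_in)."""
    return rank * (d_in + d_out)


# Hypothetical toy branches; a real model would use learned projections.
def toy_attn(h):
    return [0.1 * v for v in h]


def toy_mlp(h):
    return [relu2(v) for v in h]


def block(x):
    return parallel_block(x, toy_attn, toy_mlp)


out = recursive_forward([1.0, -2.0, 0.5], [block], n_recursions=3)
print(out)
print(factorized_params(512, 512, 64), "vs", 512 * 512)
```

With `d_in = d_out = 512` and `rank = 64`, the factorized layer holds 65536 weights against 262144 for the dense one, which is the trade-off a low-rank factorization makes: fewer parameters in exchange for a rank constraint on the weight matrix.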
@software{Strawberry,
author={Srijan Srivastava},
title={Strawberry},
url={https://github.com/SrijanSriv211/Strawberry},
version={0.1.0},
year={2025}
}