A simple but complete full-attention transformer with a set of promising experimental features drawn from various papers.

Augmenting Self-attention with Persistent Memory proposes adding learned memory key/values that are attended to along with the regular keys and values. The authors were able to remove the feedforward layers altogether and attain performance similar to the original transformer; I have found that keeping the feedforwards and adding the memory key/values leads to even better performance.

Another line of work proposes adding learned tokens, akin to CLS tokens, named memory tokens, that are passed through the attention layers alongside the input tokens.

You can also use the l2-normalized embeddings proposed as part of fixnorm in Transformers Without Tears. I have found this leads to improved convergence when paired with a small embedding initialization (proposed by BlinkDL). The small initialization is taken care of as long as l2norm_embed is set to True.
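
A minimal sketch of how these options might be enabled together, assuming the keyword arguments documented in the project README (num_memory_tokens, l2norm_embed, attn_num_mem_kv); exact names and defaults may differ between versions, and the hyperparameter values are illustrative only.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# assumed keyword arguments from the project README; values are illustrative
model = TransformerWrapper(
    num_tokens = 20000,          # vocabulary size
    max_seq_len = 1024,
    num_memory_tokens = 20,      # learned memory tokens passed alongside the input tokens
    l2norm_embed = True,         # l2-normalized embeddings with small initialization
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8,
        attn_num_mem_kv = 16     # learned memory key / values added prior to attending
    )
)

tokens = torch.randint(0, 20000, (1, 1024))
logits = model(tokens)           # (1, 1024, 20000)
```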

Features

  • Decoder-only (GPT-like) (see the usage sketch after this list)
  • Encoder-only (BERT-like)
  • State of the art image classification
  • Augmenting Self-attention with Persistent Memory
  • Transformers Without Tears
  • Root Mean Square Layer Normalization
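
As a rough usage sketch for the first two items, the same TransformerWrapper can be paired with either Decoder (causal, GPT-like) or Encoder (bidirectional, BERT-like) attention layers; the hyperparameter values below are illustrative, not prescriptive.

```python
import torch
from x_transformers import TransformerWrapper, Decoder, Encoder

# decoder-only (GPT-like): causal self-attention
gpt_like = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(dim = 512, depth = 12, heads = 8)
)

# encoder-only (BERT-like): bidirectional self-attention
bert_like = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Encoder(dim = 512, depth = 12, heads = 8)
)

x = torch.randint(0, 20000, (1, 1024))
mask = torch.ones_like(x).bool()    # attention mask over valid positions
logits = bert_like(x, mask = mask)  # (1, 1024, 20000)
```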

Categories

Machine Learning

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Machine Learning Software

Registered

2022-08-11