zetaqubit/dl

dl

Implementations of various deep learning architectures and algorithms, built only from the basic ops provided by PyTorch.

Experiment Results

transformer

| Model | wikitext-103 ppl | Closest public model |
| --- | --- | --- |
| gpt2 12l | 26.7 | 26.37 (gpt2-medium) |
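
For readers unfamiliar with the "ppl" column: perplexity is usually defined as the exponential of the mean per-token negative log-likelihood. The sketch below assumes that definition with losses in nats. The helper name `perplexity` is hypothetical, and the README does not say whether its numbers are normalized per word or per subword token.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood), with NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns probability 1/4 to every target token
# has a perplexity of (approximately, up to float rounding) 4.
uniform_over_four = [-math.log(1 / 4)] * 10
print(perplexity(uniform_over_four))
```

Under this definition, a lower perplexity means the model spreads less probability mass over wrong tokens, which is why the two numbers in the table can be compared directly only if they use the same normalization.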

toy datasets

  • RNN vs LSTM vs GRU on toy dataset of "abcdef...": tensorboard
  • RNN vs LSTM vs GRU on toy dataset of "a...ab..bc..c...": tensorboard

TODOs

transformer

  • Implement transformer with self-attention
  • Implement sinusoidal position embeddings
  • Implement relative position bias a la T5
  • Implement RoPE embeddings
  • Add support for cross-attention, as used in NMT
  • Implement beam search decoding
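
As an illustration of the sinusoidal position embedding item above, here is a minimal sketch of the standard formulation (sin on even columns, cos on odd columns, frequencies decaying geometrically with base 10000). It is written in plain Python so it runs without dependencies; the function name and the interleaved column layout are assumptions, and the repo's own PyTorch version may differ.

```python
import math

def sinusoidal_position_embeddings(num_positions, dim, base=10000.0):
    """Return a num_positions x dim table: even columns hold sin, odd columns cos."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            # Frequency shrinks as the column index grows.
            angle = pos / (base ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

pe = sinusoidal_position_embeddings(num_positions=4, dim=6)
print(pe[0])  # position 0: every sin term is 0.0 and every cos term is 1.0
```

The table is fixed rather than learned, so it can be precomputed once for the maximum sequence length and added to the token embeddings.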

rnn

  • Implement RNN
  • Implement LSTM
  • Implement GRU
  • Implement RWKV
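
For reference, the simplest entry in the list above, a vanilla (Elman) RNN, reduces to one recurrence: `h_new = tanh(W_xh x + W_hh h + b)`. The sketch below spells that out with plain Python lists so it runs without dependencies. The names `rnn_step` and `run_rnn` and the weight layout (one row per hidden unit) are assumptions for illustration, not the repo's API.

```python
import math

def rnn_step(x, h, w_xh, w_hh, b):
    """One Elman RNN step: h_new = tanh(w_xh @ x + w_hh @ h + b)."""
    h_new = []
    for j in range(len(b)):
        total = b[j]
        total += sum(w * xi for w, xi in zip(w_xh[j], x))
        total += sum(w * hi for w, hi in zip(w_hh[j], h))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(inputs, h0, w_xh, w_hh, b):
    """Apply rnn_step over a sequence and return the hidden state at each step."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states

# Hidden unit 0 reads the input; hidden unit 1 copies unit 0 from the previous step.
w_xh = [[1.0], [0.0]]
w_hh = [[0.0, 0.0], [1.0, 0.0]]
b = [0.0, 0.0]
states = run_rnn([[1.0], [0.0]], [0.0, 0.0], w_xh, w_hh, b)
print(states)
```

LSTM and GRU keep the same loop structure and replace the single `tanh` update with gated updates.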

examples/wikitext

  • Load wikitext dataset
  • Implement training loop
  • Implement tool for generating text
  • Set up tensorboard metrics, text samples
  • Implement model checkpoint saving / resume
  • Init correctly and verify initial loss is -log(1/50000) ≈ 10.82 (the cross-entropy of a uniform prediction over 50,000 tokens)
  • Limit train set to 1 batch and verify train loss goes to 0
  • Try mixed precision
  • Scale to 1.5B param model
  • Scale to 1024 sequence length
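
The initial-loss check in the list above rests on a simple fact: a freshly initialized model should predict roughly uniformly over the vocabulary, and the cross-entropy of a uniform prediction over V tokens is log(V). A minimal sketch in plain Python, assuming a vocabulary of 50,000 as implied by the -log(1/50000) figure (the `cross_entropy` helper is hypothetical and stands in for whatever loss the training loop uses):

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy in nats for one example, via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

vocab_size = 50000  # assumption: taken from the -log(1/50000) figure in the checklist
uniform_logits = [0.0] * vocab_size  # equal logits give a uniform softmax
loss = cross_entropy(uniform_logits, target=123)
print(round(loss, 4), round(math.log(vocab_size), 4))  # both about 10.82
```

An initial loss well above this value usually points to an initialization that makes the model confidently wrong. The single-batch check in the same list is the complementary test: with enough capacity, the training loss on one fixed batch should approach 0.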

eval

  • Implement evaluation framework
  • Collect popular LM benchmarks and published metrics
