Stars
38 stars written in Python
Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
This repository contains the code for "Generating Datasets with Pretrained Language Models".
Implementation of ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks"
Implementation of ICLR 2017 paper "Loss-aware Binarization of Deep Networks"