- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh. [Paper] [Github]
- GPTQ-for-LLaMA: 4 bits quantization of LLaMA using GPTQ. [Github]
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han. [Paper] [Github]
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Song Han. [Paper] [Github]
- RPTQ: Reorder-based Post-training Quantization for Large Language Models. Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu. [Paper] [Github]
- QLoRA: Efficient Finetuning of Quantized LLMs. Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. [Paper] [Github]
- ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation. Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He. [Paper]
- SqueezeLLM: Dense-and-Sparse Quantization. Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer. [Paper] [Github]
- Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling. Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jinyang Guo, Xianglong Liu. [Paper]
- Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models. Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu. [Paper]
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models. Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra. [Paper]
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression. Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh. [Paper] [Github]
- OWQ: Lessons learned from activation outliers for weight quantization in large language models. Changhun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park. [Paper] [Github]
- Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study. Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen. [Paper] [Github]
- ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats. Xiaoxia Wu, Zhewei Yao, Yuxiong He. [Paper]
- Outlier Suppression: Pushing the Limit of Low-bit Transformer. Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, Xianglong Liu. [Paper] [Github]
- Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models. James O’Neill, Sourav Dutta. [Paper]
- Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases. Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He. [Paper]
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models. Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan. [Paper]
- Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization. Chong Yu, Tao Chen, Zhongxue Gan. [Paper]
- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort. [Paper]
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers. Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. [Paper]
- Boost Vision Transformer with GPU-Friendly Sparsity and Quantization. Chong Yu, Tao Chen, Zhongxue Gan, Jiayuan Fan. [Paper]
- Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction. Jemin Lee, Yongin Kwon, Jeman Park, Misun Yu, Hwanjun Song. [Paper]
- LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models. Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee. [Paper]
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt. Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava. [Paper]
- A Survey of Techniques for Optimizing Transformer Inference. Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram Vishwanath, Arun K. Somani. [Paper]
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks. Cuong Pham, Tuan Hoang, Thanh-Toan Do. [Paper]
- Genie: Show Me the Data for Quantization. Yongkweon Jeon, Chungman Lee, Ho-young Kim. [Paper] [Github]
- BiBench: Benchmarking and Analyzing Network Binarization. Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu. [Paper] [Github]
- Toward Accurate Post-Training Quantization for Image Super Resolution. Zhijun Tu, Jie Hu, Hanting Chen, Yunhe Wang. [Paper]
- One-Shot Model for Mixed-Precision Quantization. Ivan Koryakovskiy. [Paper]
- Adaptive Data-Free Quantization. Biao Qian, Yang Wang, Richang Hong, Meng Wang. [Paper] [Github]
- NIPQ: Noise proxy-based Integrated Pseudo-Quantization. Juncheol Shin, Junhyuk So, Sein Park, Seungyeop Kang, Sungjoo Yoo, Eunhyeok Park. [Paper] [Github]
- Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization. Chen Lin. [Paper]
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective. Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji. [Paper] [Github]
- ABCD: Arbitrary Bitwise Coefficient for De-quantization. Woo Kyoung Han. [Paper] [Github]
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance. Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Zejun Ma, Jiakai Wang, Jie Luo, Xianglong Liu. [Paper] [Github]
- Bayesian asymmetric quantized neural networks. Jen-Tzung Chien, Su-Ting Chang. [Paper]
- Binary Neural Network for Video Action Recognition. Hongfeng Han, Zhiwu Lu, Ji-Rong Wen. [Paper]
- Post-training Quantization for Neural Networks with Provable Guarantees. Jinjie Zhang, Yixuan Zhou, Rayan Saab. [Paper] [Github]
- EBSR: Enhanced Binary Neural Network for Image Super-Resolution. Renjie Wei, Shuwen Zhang, Zechun Liu, Meng Li, Yuchen Fan, Runsheng Wang, Ru Huang. [Paper]
- Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis. Xiuwei Xu, Ziwei Wang, Jie Zhou, Jiwen Lu. [Paper]
- Binary domain generalization for sparsifying binary neural networks. Riccardo Schiavone, Francesco Galati, Maria A. Zuluaga. [Paper]
- FP8 versus INT8 for efficient deep learning inference. Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort. [Paper] [Github]
- FP8 Quantization: The Power of the Exponent. Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort. [Paper]
- Unit Scaling: Out-of-the-Box Low-Precision Training. Charlie Blake, Douglas Orr, Carlo Luschi. [Paper] [Github]
- FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search. Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li. [Paper]
- Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings. Ulf A. Hamster, Ji-Ung Lee, Alexander Geyken, Iryna Gurevych. [Paper]
- Pruning vs Quantization: Which is Better? Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort. [Paper]
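
Many of the post-training weight-quantization papers above (e.g. GPTQ, AWQ, SpQR, OWQ) report results against a plain round-to-nearest (RTN) baseline. For readers new to the area, here is a minimal, illustrative sketch of that baseline: per-output-channel symmetric RTN quantization in NumPy. The function name, shapes, and default bit-width are assumptions for illustration only, not code taken from any of the linked repositories.

```python
import numpy as np

def quantize_per_channel_symmetric(w: np.ndarray, n_bits: int = 4):
    """Round-to-nearest, per-output-channel symmetric quantization.

    w: weight matrix of shape (out_features, in_features).
    Returns integer codes, per-channel scales, and the dequantized weights.
    """
    qmax = 2 ** (n_bits - 1) - 1                        # e.g. 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # one scale per output row
    scale = np.where(scale == 0, 1.0, scale)             # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)    # integer codes in [-2^(b-1), 2^(b-1)-1]
    w_hat = q * scale                                    # dequantized approximation
    return q.astype(np.int8), scale, w_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 16)).astype(np.float32)
    q, scale, w_hat = quantize_per_channel_symmetric(w, n_bits=4)
    print("mean squared quantization error:", float(np.mean((w - w_hat) ** 2)))
```

The listed methods improve on this baseline in different ways, for example by using second-order weight updates (GPTQ), activation-aware scaling (AWQ, SmoothQuant), or by keeping outlier weights in higher precision (SpQR, OWQ, SqueezeLLM).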