Blogs

2024
Generative Model Compression and Acceleration (July 31, 2024)
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression (June 25, 2024)
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (June 25, 2024)
DiTFastAttn: Attention Compression for Diffusion Transformer Models (June 7, 2024)

2023
An Introduction to Quantization of Large Language Models (August 30, 2023)
Model Compression Towards Efficient Deep Learning Inference (August 29, 2023)

2022
Neural Architecture Search and Architecture Encoding (December 12, 2022)