Blogs

2024
- NeurIPS 2024 Edge-Device LLM Competition: Team NICS-EffAlg Solutions (2nd Place) — December 18, 2024
- Generative Model Compression and Acceleration — July 31, 2024
- MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression — June 25, 2024
- ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation — June 25, 2024
- DiTFastAttn: Attention Compression for Diffusion Transformer Models — June 7, 2024

2023
- An Introduction to Quantization of Large Language Models — August 30, 2023
- Model Compression Towards Efficient Deep Learning Inference — August 29, 2023

2022
- Neural Architecture Search and Architecture Encoding — December 12, 2022