Blogs

2024

NeurIPS 2024 Edge-Device LLM Competition Team NICS-EffAlg Solutions (2nd Place)

December 18, 2024

Generative Model Compression and Acceleration

July 31, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

June 25, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

June 25, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models

June 7, 2024

2023

An Introduction to Quantization of Large Language Models

August 30, 2023

Model Compression Towards Efficient Deep Learning Inference

August 29, 2023

2022

Neural Architecture Search and Architecture Encoding

December 12, 2022
