Who We Are
The Nanoscale Integrated Circuits and System Lab, Energy Efficient Computing Group (NICS-EFC) in the Department of Electronic Engineering at Tsinghua University is led by Professor Yu Wang. The Efficient Algorithm Team (EffAlg) in the NICS-EFC group is led by Research Assistant Professor Xuefei Ning. Our team collaborates closely with Infinigence-AI and with researchers from many institutions, including SJTU, MSR, and HKU.
Our current research primarily focuses on efficient deep learning, including algorithm-level acceleration, model-level compression, model architecture design, system co-optimization, and other techniques. Our work targets several application domains, including language generative models (i.e., LLMs), vision generative models, vision understanding models, and more. Most of our projects are open-sourced under the thu-nics GitHub organization (most of our efficient DL projects) or the imagination-research GitHub organization (some efficient DL projects, as well as projects on broader topics; the latter are research efforts co-led with Dr. Zinan Lin from MSR).
Our group welcomes all kinds of collaborations and is continuously recruiting visiting students and engineers who are interested in efficient deep learning. If you are interested in collaboration or visiting-student opportunities, please email Xuefei or Prof. Yu Wang.
News
- 2024/09/26 Three of our papers are accepted by NeurIPS 2024! (1) "Can LLMs Learn by Teaching? A Preliminary Study" explores a novel pathway (learning by teaching) toward better and continually evolving reasoning abilities of LLMs. (2) "DiTFastAttn: Attention Compression for Diffusion Transformer Models" proposes post-training attention sparsification and sharing techniques to accelerate Diffusion Transformers. (3) "Rad-NeRF: Ray-decoupled Training of Neural Radiance Field" proposes a ray-decoupled soft ensemble of NeRF MLPs to better model complex scenes, offering a new and parameter-efficient scaling dimension.
- 2024/07/01 Our paper Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs is public on arXiv. We prune the experts of MoE models and use weight merging so that the retained experts inherit the knowledge of the discarded ones, a novel, gradient-free way to preserve knowledge during model compression; a toy sketch of the idea follows below. The code will be available soon.
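For intuition, here is a minimal, hypothetical sketch of expert pruning with weight merging, not the paper's exact algorithm: experts are ranked by routing frequency, and each discarded expert is folded into its most similar retained expert via a frequency-weighted average. The function name, scoring, and merging coefficients are all our own simplifications.

```python
# Toy sketch of MoE expert pruning + weight merging (illustrative, not the paper's method).
import torch
import torch.nn.functional as F

def prune_and_merge_experts(expert_weights, router_freq, keep_k):
    """expert_weights: (E, d_out, d_in) stacked expert matrices;
    router_freq: (E,) empirical routing frequency of each expert."""
    keep = torch.topk(router_freq, keep_k).indices            # retain the most-used experts
    dropped = [e for e in range(expert_weights.shape[0]) if e not in set(keep.tolist())]
    merged = expert_weights[keep].clone()
    for d in dropped:
        # fold the discarded expert into the most similar retained expert,
        # weighted by routing frequency -- a gradient-free knowledge transfer
        sim = F.cosine_similarity(merged.flatten(1),
                                  expert_weights[d].flatten().unsqueeze(0), dim=1)
        j = int(sim.argmax())
        alpha = router_freq[d] / (router_freq[d] + router_freq[keep[j]])
        merged[j] = (1 - alpha) * merged[j] + alpha * expert_weights[d]
    return keep, merged

keep, merged = prune_and_merge_experts(torch.randn(8, 16, 16), torch.rand(8), keep_k=4)
```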
- 2024/06/24 Our paper Can LLMs Learn by Teaching? A Preliminary Study is public on arXiv. We make a first attempt at adapting the "learning by teaching" strategy from human education to LLMs, examining whether contemporary LLMs can learn by teaching to improve their mathematical reasoning and code synthesis. The results show some promise, and we provide a roadmap for future research. Welcome to check the code.
- 2024/06/24 Our paper MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression is public on arXiv. MoA compresses attention in LLMs so that they compute only short attention spans yet remember long contexts. It achieves 5.5-6.7x higher throughput than dense FlashAttention2 and improves retrieval accuracy by 1.5-7.1x compared to uniform sparse attention. Welcome to check the code; a toy illustration of heterogeneous sparse attention follows below.
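As a rough illustration of the heterogeneity MoA exploits (the actual method searches the per-head spans automatically and runs on optimized kernels), here is a toy sliding-window attention where each head gets its own span; the shapes and spans below are made up for the example.

```python
# Toy heterogeneous sliding-window attention: each head gets its own span.
import torch

def banded_attention(q, k, v, window):
    """q, k, v: (T, d) for one head; window: how many past tokens this head can see."""
    T, d = q.shape
    scores = q @ k.T / d ** 0.5
    idx = torch.arange(T)
    causal = idx[None, :] <= idx[:, None]              # never attend to the future
    local = idx[:, None] - idx[None, :] < window       # head-specific local span
    scores = scores.masked_fill(~(causal & local), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(16, 8) for _ in range(3))
out_global = banded_attention(q, k, v, window=16)      # a "long-range" head
out_local = banded_attention(q, k, v, window=4)        # a "local" head
```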
- 2024/06/14 Our paper DiTFastAttn: Attention Compression for Diffusion Transformer Models is public on arXiv. We design three training-free techniques to compress the attention operation, which is the efficiency bottleneck when generating high-resolution images. For example, applying DiTFastAttn to PixArt-Sigma-XL reduces the FLOPs and latency of attention computation by 88% and 37%, respectively, when generating 2Kx2K images. Welcome to check the code; a toy sketch of the cross-step sharing idea follows below.
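DiTFastAttn combines three techniques; the snippet below illustrates only one of them, sharing attention outputs across neighboring diffusion steps, in a deliberately simplified form. The caching policy shown here is hypothetical, not the paper's.

```python
# Toy sketch of attention sharing across diffusion steps: recompute only at
# "anchor" steps and reuse the cached output in between (simplified).
import torch

def attention(q, k, v):
    return torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1) @ v

_cache = {}

def shared_attention(layer_id, step, q, k, v, share_every=2):
    if step % share_every == 0 or layer_id not in _cache:
        _cache[layer_id] = attention(q, k, v)          # full attention at anchor steps
    return _cache[layer_id]                            # neighbors reuse the cached output
```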
- 2024/06/02 Two of our papers are public on arXiv. In the first, we propose MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization, which successfully tackles the challenging quantization of few-step text-to-image diffusion models. With negligible visual-quality degradation and content change, MixDQ achieves W4A8, equivalent to 3.4x memory compression and 1.5x latency speedup. Check our code and project page. The second is ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation, a quantization method specialized for transformer-based video and image diffusion models. For popular large-scale models (e.g., Open-Sora, Latte, PixArt) on video and image generation tasks, ViDiT-Q achieves W8A8 quantization without metric degradation and W4A8 without notable visual-quality degradation. Check our code and project page; a minimal fake-quantization sketch follows below. Update 07/01: MixDQ is accepted by ECCV'24!
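To unpack the "W4A8" notation: weights are stored in 4 bits and activations in 8. The snippet below is a minimal per-tensor fake-quantization sketch of that idea only; the actual MixDQ/ViDiT-Q schemes are mixed-precision and metric-aware, which this does not reproduce.

```python
# Minimal per-tensor symmetric fake quantization illustrating "W4A8" (simplified).
import torch

def fake_quantize(x, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax() / qmax                      # one scale for the whole tensor
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(64, 64)                                # a layer's weight
a = torch.randn(8, 64)                                 # a batch of activations
y = fake_quantize(a, bits=8) @ fake_quantize(w, bits=4).T   # simulated W4A8 matmul
```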
- 2024/05/02 One paper, Evaluating Quantized Large Language Models, is accepted by ICML'24. This work evaluates 11 LLM families across diverse tasks (including emergent abilities, dialogue, long-context tasks, and so on) and different tensor types (weights, weight-activation, and the key-value cache). We provide quantitative suggestions and qualitative insights on quantization, giving practitioners a comprehensive reference for quantization decisions.
- 2024/04/23 Our survey A Survey on Efficient Inference for Large Language Models is public on arXiv. Any discussions and suggestions are welcome!
- 2024/04/05 Our paper Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better is public on arXiv. This work proposes Linear Combination of Saved Checkpoints (LCSC), which uses a gradient-free, search-based combination of checkpoints to obtain the final weights, achieving significant training speedups (23x on CIFAR-10 and 15x on ImageNet-64) compared to full gradient-based training. LCSC can also enhance pre-trained models at a small cost; a toy search sketch follows below. Check our code.
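The core of LCSC is easy to picture: keep the checkpoints saved along a training trajectory and search, without gradients, for combination coefficients that maximize some quality metric. Below is a toy random-mutation search; `evaluate` is a placeholder for a real metric (e.g., negative FID on a validation set), and the mutation scheme is our own simplification.

```python
# Toy gradient-free search over linear combinations of saved checkpoints (simplified).
import torch

def combine(checkpoints, coeffs):
    """checkpoints: list of state dicts with identical keys."""
    return {k: sum(c * ckpt[k] for c, ckpt in zip(coeffs, checkpoints))
            for k in checkpoints[0]}

def search(checkpoints, evaluate, iters=100):
    n = len(checkpoints)
    best_c = torch.full((n,), 1.0 / n)                 # start from the plain average
    best_score = evaluate(combine(checkpoints, best_c))
    for _ in range(iters):
        c = best_c + 0.1 * torch.randn(n)              # mutate the current best
        c = c / c.sum()                                # keep coefficients summing to 1
        score = evaluate(combine(checkpoints, c))
        if score > best_score:
            best_c, best_score = c, score
    return combine(checkpoints, best_c)
```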
- 2024/02/27 One paper, FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models, is accepted by CVPR'24. This work selects a compact data subset for evaluating text-to-image diffusion models.
- 2024/02/09 Our paper on long-context benchmarking, LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K, is public on arXiv. Check the code.
- 2024/01/17 Two papers are accepted by ICLR'24. One is Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation, which accelerates LLM generation by letting the LLM itself plan the answer skeleton and then generate the segments in parallel, achieving ~2x speedups; a toy sketch follows below. The other is A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models, which unifies diffusion sampling strategies and searches for the best one.
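The mechanism behind Skeleton-of-Thought is simple to sketch: first ask the model for a short outline, then expand every outline point in parallel so the expansions do not wait on one another. In the toy version below, `llm` is a placeholder for any text-in/text-out model call; the prompts are illustrative, not the paper's.

```python
# Toy Skeleton-of-Thought: outline first, then expand the points in parallel.
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(llm, question):
    skeleton = llm(f"Give 3-5 short bullet points outlining the answer to: {question}")
    points = [p.strip() for p in skeleton.splitlines() if p.strip()]
    with ThreadPoolExecutor() as pool:                 # expansions run concurrently
        bodies = list(pool.map(
            lambda p: llm(f"Question: {question}\nExpand this point briefly: {p}"),
            points))
    return "\n".join(bodies)
```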
Efficient DL Projects
| Technique | Target | Domain | Paper |
| --- | --- | --- | --- |
| Model-level | Efficient Inference | Vision Recognition, Vision Generation, Language | |
| Algorithm-level | Efficient Inference | Vision Generation | Paper |
| Algorithm-level | Efficient Inference | Vision Generation | Paper |
| | Efficient Inference | 3D Modeling | |
| | Efficient Inference | Language | Paper |
| System-level | Efficient Inference | Language | Paper |
| System-level, Model-level (Quantization), Model-level (Sparsification) | Efficient Inference | Language | Paper |
| System-level | Efficient Inference | Vision Recognition | Paper |
| Algorithm-level | Efficient Inference | Vision Generation | Paper |
| Model-level (Quantization) | Efficient Inference | Language | Paper |
| Model-level (Sparsification) | Efficient Inference | Vision Recognition | Paper |
| Model-level (Structure Optimization) | Efficient Inference | Vision Generation | Paper |
| | | Vision Recognition | Paper |
| Model-level (Structure Optimization), System-level | Efficient Optimization Process, Efficient Inference | Vision Recognition | Paper |
| Model-level (Quantization) | Efficient Training | Vision Recognition | Paper |
| Model-level (Structure Optimization) | Efficient Inference | Vision Recognition | Paper |
| Algorithm-level | Efficient Training | Vision Recognition | Paper |
| Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition | Paper |
| Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition | Paper |
| Model-level (Structure Optimization), Model-level (Quantization), System-level | Efficient Inference | Vision Recognition | Paper |
| Model-level (Structure Optimization) | Efficient Inference | Vision Recognition | Paper |