Who We Are

The Nanoscale Integrated Circuits and System Lab, Energy Efficient Computing Group (NICS-EFC), in the Department of Electronic Engineering at Tsinghua University, is led by Professor Yu Wang. The Efficient Algorithm Team (EffAlg) in the NICS-EFC group is led by Research Assistant Professor Xuefei Ning. Our team maintains in-depth academic collaborations with Infinigence-AI and with researchers from many institutions, including SJTU, MSR, and HKU.

Our current research focuses primarily on efficient deep learning, including algorithm-level acceleration, model-level compression, model architecture design, system co-optimization, and other techniques. Our work targets several application domains, including generative language models (LLMs), generative vision models, and vision understanding models. Most of our projects are open-sourced at the thu-nics GitHub organization (most efficient DL projects) or the imagination-research GitHub organization (some efficient DL projects and projects on broader topics; these projects are co-led with Dr. Zinan Lin from MSR).

Our group welcomes all kinds of collaboration and is continuously recruiting visiting students and engineers interested in efficient deep learning. If you are interested in collaborating or in visiting student opportunities, email Xuefei or Prof. Yu Wang.

News

  • 2025/01/08
    Will give an invited talk on trends in efficient AIGC at Zhiyuan's annual discussion on the 10 AI trends.
  • 2024/12/26
    Gave an invited talk on efficient AIGC research at SCUT.
  • 2024/12/19
    Gave an invited talk on efficient AIGC research at AMD China.
  • 2024/12/11
    At NeurIPS 2024 in Vancouver, Canada, to present our work! Schedule: (1) 12/11 11:00-14:00, DiTFastAttn poster @ East Exhibit Hall; (2) 12/12 11:00-14:00, Rad-NeRF poster @ East Exhibit Hall; (3) 12/12 16:30-19:30, "Can LLMs Learn by Teaching for Better Reasoning?" poster @ East Exhibit Hall; (4) 12/15, Workshop on Machine Learning and Compression; (5) 12/15 16:15-16:25, presentation of our solution at the Edge LLM Competition.
  • 2024/12/07
    Our tutorial proposal, "Efficient Inference for Large Language Models -- Algorithm, Model, and System", has been accepted to EMNLP 2025 in Suzhou, China. See you in a year.
  • 2024/12/05
    Won 2nd place in both the Model Compression Track and the Training From Scratch Track at the NeurIPS 2024 Edge-Device LLM Competition.

Competition Awards

  • 2024 NeurIPS Edge-Device Large Language Model Competition, Model Compression Track 2nd
  • 2024 NeurIPS Edge-Device Large Language Model Competition, Training From Scratch Track 2nd
  • 2020 CVPR Low-Power CV Challenges 3rd
  • 2018 NeurIPS Adversarial Robustness Competition, Model Track 2nd

Efficient DL Projects

Technique | Target | Domain

  • Publishing House of Electronics Industry 2024
    (Chinese Book) Efficient Deep Learning: Model Compression and Design. 《高效深度学习:模型压缩与设计》 (available on JD.com)
    Model-level | Efficient Inference | Vision Recognition, Vision Generation, Language
  • ArXiv 2024
    FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models
    Model-level (Sparsification) | Efficient Inference | Language Paper Code Website
  • ArXiv 2024
    E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper
  • ArXiv 2024
    MBQ: Modality-Balanced Quantization for Large Vision-Language Models
    Model-level (Quantization) | Efficient Inference | Language Paper Code
  • ArXiv 2024
    Distilling Auto-regressive Models into Few Steps 1: Image Generation
    Algorithm-level | Efficient Inference | Vision Generation Paper Code Website
  • ArXiv 2024
    GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
    | Better Application | Vision Generation Paper Code Website Video
  • ArXiv 2024
    Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
    Algorithm-level | Efficient Inference | Vision Generation Paper
  • AAAI 2025
    Training-Free and Hardware-Friendly Acceleration for Diffusion Models via Similarity-based Token Pruning
    Model-level | Efficient Inference | Vision Generation Paper
  • NeurIPS 2024
    Rad-NeRF: Ray-decoupled Training of Neural Radiance Field
    Algorithm-level | Better Application | 3D Modeling Paper Code Video
  • NeurIPS 2024
    Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
    Algorithm-level | Better Reasoning | Language Paper Code Website Video
  • ArXiv 2024
    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
    Model-level (Pruning) | Efficient Inference | Language Paper Code
  • ArXiv 2024
    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
    Model-level (Sparsification) | Efficient Inference | Language Paper Code Website
  • NeurIPS 2024
    DiTFastAttn: Attention Compression for Diffusion Transformer Models
    Model-level (Sparsification), Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code Website Video
  • ArXiv 2024
    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper Code Website
  • ArXiv 2024
    DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code
  • ArXiv 2024
    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
    Algorithm-level | Efficient Training, Efficient Inference | Vision Generation Paper Code
  • ECCV 2024
    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper Code Website Video
  • ArXiv 2024
    A Survey on Efficient Inference for Large Language Models (Survey)
    | Efficient Inference | Language Paper
  • ArXiv 2024
    LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K (Benchmark, Evaluation)
    | | Paper Code Website
  • ICCAD 2024
    Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications
    System-level | Efficient Inference | Language Paper
  • FPGA 2024
    FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
    System-level, Model-level (Quantization), Model-level (Sparsification) | Efficient Inference | Language Paper
  • DATE 2024
    DyPIM: Dynamic-inference-enabled Processing-In-Memory Accelerator
    System-level | Efficient Inference | Vision Recognition Paper
  • ICML 2024
    Evaluating Quantized Large Language Models (Evaluation)
    Model-level (Quantization) | Efficient Inference | Language Paper Code Video
  • CVPR 2024
    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
    Algorithm-level | Efficient Optimization Process | Vision Generation Paper Code Video
  • ICLR 2024
    A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models
    Algorithm-level | Efficient Inference | Vision Generation Paper Code Video
  • ICLR 2024
    Skeleton-of-Thought: Prompting Large Language Models for Efficient Parallel Generation
    Algorithm-level | Efficient Inference | Language Paper Code Video
  • WACV 2024
    TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning
    | | Paper
  • NeurIPS Workshop 2023
    LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
    Model-level (Quantization) | Efficient Inference | Language Paper
  • NeurIPS 2023
    Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
    | | Vision Recognition Paper Code
  • ICCV 2023
    Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
    Model-level (Sparsification) | Efficient Inference | Vision Recognition Paper Video
  • ICML 2023
    OMS-DPM: Deciding The Optimal Model Schedule for Diffusion Probabilistic Model
    Algorithm-level | Efficient Inference | Vision Generation Paper Code Website Video
  • AAAI 2023 (Oral)
    Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • AAAI 2023
    Memory-Oriented Structural Pruning for Efficient Image Restoration
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper
  • AAAI 2023
    Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness
    | | Vision Recognition Paper
  • TPAMI 2023
    A Generic Graph-based Neural Architecture Encoding Scheme with Multifaceted Information
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • DATE 2022 & TCAD 2023
    Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture
    Model-level (Structure Optimization), System-level | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper
  • TCAD 2022
    Exploring the Potential of Low-bit Training of Convolutional Neural Networks
    Model-level (Quantization) | Efficient Training | Vision Recognition Paper
  • CVPR 2022
    CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • CVPR 2022
    FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
    Algorithm-level | Efficient Training | Vision Recognition Paper
  • ECCV 2022
    CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • NeurIPS 2022 (Spotlight)
    TA-GATES: An Encoding Scheme for Neural Network Architectures
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • Low-Power CV 2022
    Hardware Design and Software Practices for Efficient Neural Network Inference
    Model-level (Structure Optimization), Model-level (Quantization), System-level | Efficient Inference | Vision Recognition Paper
  • TODAES 2021
    Machine learning for electronic design automation: A survey
    | | Other Paper Code
  • NeurIPS 2021
    Evaluating Efficient Performance Estimators of Neural Architectures (Evaluation)
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ASP-DAC 2020
    Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search
    Model-level (Structure Optimization) | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper Code
  • ECCV 2020
    A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ECCV 2020 (Spotlight)
    DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • ArXiv 2020
    aw_nas: A Modularized and Extensible NAS framework
    Model-level (Structure Optimization) | Efficient Inference, Efficient Optimization Process | Vision Recognition, Language Paper Code

Sponsors