Who We Are

The Nanoscale Integrated Circuits and System Lab, Energy Efficient Computing Group (NICS-EFC), in the Department of Electronic Engineering at Tsinghua University, is led by Professor Yu Wang. The Efficient Algorithm Team (EffAlg) within NICS-EFC is led by Research Assistant Professor Xuefei Ning. Our team maintains in-depth academic collaborations with Infinigence-AI and with researchers from many institutions, including SJTU, MSR, and HKU.

Our current research focuses primarily on efficient deep learning, including algorithm-level acceleration, model-level compression, model architecture design, system co-optimization, and other techniques. Our work targets several application domains, including generative language models (i.e., LLMs), generative vision models, and vision understanding models. Most of our projects are open-sourced under the thu-nics GitHub organization (most efficient DL projects) or the imagination-research GitHub organization (some efficient DL projects and projects on broader topics; these projects are co-led with Dr. Zinan Lin from MSR).

Our group welcomes all kinds of collaboration and is continuously recruiting visiting students and engineers interested in efficient deep learning. If you are interested in collaborating or in visiting student opportunities, please email Xuefei or Prof. Yu Wang.

Efficient DL Projects

Technique | Target | Domain

  • Publishing House of Electronics Industry 2024
    (Chinese Book) Efficient Deep Learning: Model Compression and Design. 《高效深度学习:模型压缩与设计》 (available on JD.com)
    Model-level | Efficient Inference | Vision Recognition, Vision Generation, Language
  • ArXiv 2024
    Distilling Auto-regressive Models into Few Steps 1: Image Generation
    Algorithm-level | Efficient Inference | Vision Generation Paper
  • ArXiv 2024
    Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
    Algorithm-level | Efficient Inference | Vision Generation Paper
  • NeurIPS 2024
    Rad-NeRF: Ray-decoupled Training of Neural Radiance Field
    | Efficient Inference | 3D Modeling
  • NeurIPS 2024
    Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
    | | Language Paper Code Website Video
  • ArXiv 2024
    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
    Model-level (Pruning) | Efficient Inference | Language Paper Code
  • ArXiv 2024
    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
    Model-level (Sparsification) | Efficient Inference | Language Paper Code Website
  • NeurIPS 2024
    DiTFastAttn: Attention Compression for Diffusion Transformer Models
    Model-level (Sparsification), Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code Website Video
  • ArXiv 2024
    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper Code Website
  • ArXiv 2024
    DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code
  • ArXiv 2024
    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
    Algorithm-level | Efficient Training, Efficient Inference | Vision Generation Paper Code
  • ECCV 2024
    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper Code Website
  • ArXiv 2024
    A Survey on Efficient Inference for Large Language Models (Survey)
    | Efficient Inference | Language Paper
  • ArXiv 2024
    LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K (Benchmark, Evaluation)
    | | Paper Code
  • ICCAD 2024
    Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications
    System-level | Efficient Inference | Language Paper
  • FPGA 2024
    FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
    System-level, Model-level (Quantization), Model-level (Sparsification) | Efficient Inference | Language Paper
  • DATE 2024
    DyPIM: Dynamic-inference-enabled Processing-In-Memory Accelerator
    System-level | Efficient Inference | Vision Recognition Paper
  • ICML 2024
    Evaluating Quantized Large Language Models (Evaluation)
    Model-level (Quantization) | Efficient Inference | Language Paper Code
  • CVPR 2024
    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
    Algorithm-level | Efficient Optimization Process | Vision Generation Paper Code
  • ICLR 2024
    A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models
    Algorithm-level | Efficient Inference | Vision Generation Paper
  • ICLR 2024
    Skeleton-of-Thought: Prompting Large Language Models for Efficient Parallel Generation
    Algorithm-level | Efficient Inference | Language Paper Code
  • WACV 2024
    TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning
    | | Paper
  • NeurIPS Workshop 2023
    LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
    Model-level (Quantization) | Efficient Inference | Language Paper
  • NeurIPS 2023
    Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
    | | Vision Recognition Paper Code
  • ICCV 2023
    Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
    Model-level (Sparsification) | Efficient Inference | Vision Recognition Paper
  • ICML 2023
    OMS-DPM: Deciding The Optimal Model Schedule for Diffusion Probabilistic Model
    Algorithm-level | Efficient Inference | Vision Generation Paper Code
  • AAAI 2023 (Oral)
    Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • AAAI 2023
    Memory-Oriented Structural Pruning for Efficient Image Restoration
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper
  • AAAI 2023
    Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness
    | | Vision Recognition Paper
  • TPAMI 2023
    A Generic Graph-based Neural Architecture Encoding Scheme with Multifaceted Information
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • DATE 2022 & TCAD 2023
    Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture
    Model-level (Structure Optimization), System-level | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper
  • TCAD 2022
    Exploring the Potential of Low-bit Training of Convolutional Neural Networks
    Model-level (Quantization) | Efficient Training | Vision Recognition Paper
  • CVPR 2022
    CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • CVPR 2022
    FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
    Algorithm-level | Efficient Training | Vision Recognition Paper
  • ECCV 2022
    CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • NeurIPS 2022 (Spotlight)
    TA-GATES: An Encoding Scheme for Neural Network Architectures
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • Low-Power CV 2022
    Hardware Design and Software Practices for Efficient Neural Network Inference
    Model-level (Structure Optimization), Model-level (Quantization), System-level | Efficient Inference | Vision Recognition Paper
  • TODAES 2021
    Machine learning for electronic design automation: A survey
    | | Other Paper Code
  • NeurIPS 2021
    Evaluating Efficient Performance Estimators of Neural Architectures (Evaluation)
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ASP-DAC 2020
    Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search
    Model-level (Structure Optimization) | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper Code
  • ECCV 2020
    A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ECCV 2020 (Spotlight)
    DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • ArXiv 2020
    aw_nas: A Modularized and Extensible NAS framework
    Model-level (Structure Optimization) | Efficient Inference, Efficient Optimization Process | Vision Recognition, Language Paper Code