Who We Are

The Nanoscale Integrated Circuits and System Lab, Energy Efficient Computing Group (NICS-EFC) in the Department of Electronic Engineering at Tsinghua University is led by Professor Yu Wang. The Efficient Algorithm Team (EffAlg) in the NICS-EFC group is led by Research Assistant Professor Xuefei Ning. Our team collaborates closely with Infinigence-AI and with researchers from many institutions, including SJTU, MSR, and HKU.

Our current research primarily focuses on efficient deep learning, including model compression, architecture design, system co-optimization, and other techniques. Our work targets several application domains, including language generative models (e.g., LLMs), vision generative models, and vision understanding models. Most of our projects are open-sourced in the thu-nics GitHub organization (most of our efficient DL projects) or the imagination-research GitHub organization (some efficient DL projects, as well as projects on broader topics; these are co-advised with Dr. Zinan Lin from MSR).

Our group welcomes collaborations of all kinds and is continuously recruiting visiting students and engineers interested in efficient deep learning. If you are interested in collaboration or visiting-student opportunities, please email Xuefei or Prof. Yu Wang.

Efficient DL Projects

  • ArXiv 2024
    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
    Model-level (Pruning) | Efficient Inference | Language Paper Code
  • ArXiv 2024
    Can LLMs Learn by Teaching? A Preliminary Study
– | – | Language Paper Code
  • ArXiv 2024
    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
    Model-level (Sparsification) | Efficient Inference | Language Paper Code
  • ArXiv 2024
    DiTFastAttn: Attention Compression for Diffusion Transformer Models
    Model-level (Sparsification), Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code
  • ArXiv 2024
    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper
  • ArXiv 2024
    DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper Code
  • ArXiv 2024
    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
    Algorithm-level | Efficient Training, Efficient Inference | Vision Generation Paper Code
  • ECCV 2024
    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
    Model-level (Quantization) | Efficient Inference | Vision Generation Paper Code
  • ArXiv 2024
    A Survey on Efficient Inference for Large Language Models Survey
– | Efficient Inference | Language Paper
  • ArXiv 2024
    LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K Benchmark, Evaluation
– | – | – Paper Code
  • FPGA 2024
    FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
    System-level, Model-level (Quantization), Model-level (Sparsification) | Efficient Inference | Language Paper
  • DATE 2024
    DyPIM: Dynamic-inference-enabled Processing-In-Memory Accelerator
    System-level | Efficient Inference | Vision Recognition Paper
  • ICML 2024
    Evaluating Quantized Large Language Models Evaluation
    Model-level (Quantization) | Efficient Inference | Language Paper Code
  • CVPR 2024
    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
    Algorithm-level | Efficient Optimization Process | Vision Generation Paper Code
  • ICLR 2024
    A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models
    Algorithm-level | Efficient Inference | Vision Generation Paper
  • ICLR 2024
    Skeleton-of-Thought: Prompting Large Language Models for Efficient Parallel Generation
    Algorithm-level | Efficient Inference | Language Paper Code
  • WACV 2024
    TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning
– | – | – Paper
  • NeurIPS Workshop 2023
    LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
    Model-level (Quantization) | Efficient Inference | Language Paper
  • ICCV 2023
    Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
    Model-level (Sparsification) | Efficient Inference | Vision Recognition Paper
  • ICML 2023
    OMS-DPM: Deciding The Optimal Model Schedule for Diffusion Probabilistic Model
    Algorithm-level | Efficient Inference | Vision Generation Paper Code
  • AAAI 2023 (Oral)
    Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • AAAI 2023
    Memory-Oriented Structural Pruning for Efficient Image Restoration
    Model-level (Structure Optimization) | Efficient Inference | Vision Generation Paper
  • AAAI 2023
    Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness
– | – | Vision Recognition Paper
  • TPAMI 2023
    A Generic Graph-based Neural Architecture Encoding Scheme with Multifaceted Information
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • DATE 2022 & TCAD 2023
    Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture
    Model-level (Structure Optimization), System-level | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper
  • TCAD 2022
    Exploring the Potential of Low-bit Training of Convolutional Neural Networks
    Model-level (Quantization) | Efficient Training | Vision Recognition Paper
  • CVPR 2022
    CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • CVPR 2022
    FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
    Algorithm-level | Efficient Training | Vision Recognition Paper
  • ECCV 2022
    CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • NeurIPS 2022 (Spotlight)
    TA-GATES: An Encoding Scheme for Neural Network Architectures
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper
  • Low-Power CV 2022
    Hardware Design and Software Practices for Efficient Neural Network Inference
    Model-level (Structure Optimization), Model-level (Quantization), System-level | Efficient Inference | Vision Recognition Paper
  • NeurIPS 2021
    Evaluating Efficient Performance Estimators of Neural Architectures Evaluation
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ASP-DAC 2020
    Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search
    Model-level (Structure Optimization) | Efficient Optimization Process, Efficient Inference | Vision Recognition Paper Code
  • ECCV 2020
    A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS
    Model-level (Structure Optimization) | Efficient Optimization Process | Vision Recognition Paper Code
  • ECCV 2020 (Spotlight)
    DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation
    Model-level (Structure Optimization) | Efficient Inference | Vision Recognition Paper
  • ArXiv 2020
aw_nas: A Modularized and Extensible NAS Framework
    Model-level (Structure Optimization) | Efficient Inference, Efficient Optimization Process | Vision Recognition, Language Paper Code