DiTFastAttn: Attention Compression for Diffusion Transformer Models
Diffusion Transformers (DiT) have emerged as a powerful tool for image and video generation tasks. However, the quadratic computational complexity of the self-attention mechanism poses a significant challenge, particularly for high-resolution and long-video tasks. This paper mitigates the computational bottleneck of DiT models by introducing a novel post-training model compression method. We identify three key redundancies in the attention computation during DiT inference and propose three corresponding techniques:
- Window Attention with Residual Caching - Reduces spatial redundancy (see the sketch after this list).
- Temporal Similarity Reduction - Exploits the similarity between neighboring timesteps.
- Conditional Redundancy Elimination - Skips redundant computations during conditional generation.
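As a concrete illustration of the first technique, below is a minimal sketch (not the official DiTFastAttn implementation) of window attention with residual caching in PyTorch: at a few selected timesteps, both full and windowed attention are computed and their difference is cached; at the remaining timesteps only the cheaper windowed attention runs and the cached residual is added back. The window size, the timestep schedule, and the dense-mask implementation of window attention are illustrative assumptions; a real implementation would use a block-sparse attention kernel.

```python
import torch


def full_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v


def window_attention(q, k, v, window):
    # Each query only attends to keys within +/- `window` positions.
    # (Dense mask for clarity; a real kernel would be block-sparse.)
    n, d = q.shape[-2], q.shape[-1]
    scale = d ** -0.5
    scores = q @ k.transpose(-2, -1) * scale                 # (b, h, n, n)
    idx = torch.arange(n, device=q.device)
    outside = (idx[None, :] - idx[:, None]).abs() > window   # True = masked out
    scores = scores.masked_fill(outside, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


class WindowAttnWithResidualCache:
    """Window attention that caches the (full - window) attention residual.

    At timesteps listed in `full_steps`, full attention is computed and its
    residual w.r.t. window attention is cached; at all other timesteps only
    window attention runs and the cached residual is added back.
    """

    def __init__(self, window, full_steps):
        self.window = window
        self.full_steps = set(full_steps)
        self.residual = None

    def __call__(self, q, k, v, step):
        win_out = window_attention(q, k, v, self.window)
        if step in self.full_steps or self.residual is None:
            full_out = full_attention(q, k, v)
            self.residual = full_out - win_out  # cache the long-range part
            return full_out
        return win_out + self.residual          # cheap path between full steps


if __name__ == "__main__":
    b, h, n, d = 1, 8, 256, 64
    attn = WindowAttnWithResidualCache(window=32, full_steps=[0, 10, 20])
    for step in range(30):
        q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
        out = attn(q, k, v, step)
    print(out.shape)  # torch.Size([1, 8, 256, 64])
```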
Generation Speed Comparison

Image Generation Results

Video Generation Results

FLOPs Reduction

The code for DiTFastAttn is available on GitHub at DiTFastAttention. Feel free to check out the repository for more details.