MDT
MDTv2 is a masked diffusion transformer for image synthesis that reports a state-of-the-art FID score of 1.58 on ImageNet. It learns more than 10 times faster than DiT by using a masked latent modeling scheme: during training, some of the latent tokens are masked and the model learns to reconstruct the complete image from the remaining ones, which strengthens its contextual learning across image parts. The result is higher training efficiency and better output quality.
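
To make the masking idea concrete, here is a minimal sketch of how latent tokens might be split into a visible set and a masked set before being passed to a model. It is an illustration only, not MDTv2's actual code: the function name `mask_latent_tokens`, the `mask_ratio` value, and the toy token layout are all assumptions made for this example.

```python
import random


def mask_latent_tokens(tokens, mask_ratio=0.3, seed=None):
    """Randomly split latent tokens into visible and masked subsets.

    Hypothetical helper for illustration; the ratio and the sampling
    strategy are assumptions, not values taken from MDTv2.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(len(tokens) * mask_ratio)
    # Pick which token positions are hidden from the model.
    masked_idx = sorted(rng.sample(range(len(tokens)), num_masked))
    masked_set = set(masked_idx)
    visible_idx = [i for i in range(len(tokens)) if i not in masked_set]
    # Only the visible tokens would be fed to the model, which is then
    # trained to predict the content at the masked positions.
    visible = [tokens[i] for i in visible_idx]
    return visible, visible_idx, masked_idx


# Toy example: 16 latent tokens (e.g. a 4x4 grid), each a 2-dim vector.
tokens = [[float(i), float(i) + 0.5] for i in range(16)]
visible, visible_idx, masked_idx = mask_latent_tokens(
    tokens, mask_ratio=0.25, seed=0
)
```

In this sketch the model would see only `visible` and would be asked to recover the tokens at `masked_idx`, so that it has to infer missing regions from their context.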