OmniTokenizer
OmniTokenizer is a joint image and video tokenizer: one model tokenizes both modalities and reconstructs them with high fidelity across a range of image and video datasets. It handles high-resolution inputs and long videos, and its outputs can be fed to language models or diffusion models for visual generation. Two variants are available, a VQVAE version and a VAE version, both with weights pretrained on large-scale datasets. The project also includes setup, training, and evaluation guides for researchers and developers working on visual generation.
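To make the difference between the VQVAE and VAE versions concrete, here is a minimal, generic sketch of the two kinds of latent they stand for: a VQVAE maps each latent vector to the index of its nearest codebook entry (a discrete token), while a VAE samples a continuous latent from a predicted Gaussian. This is not OmniTokenizer's code or API; the function names `vq_tokenize` and `vae_sample`, the toy codebook, and the 2-dimensional latents are all assumptions made for illustration.

```python
import math
import random


def vq_tokenize(latents, codebook):
    """VQVAE-style step (illustrative): return, for each latent vector,
    the index of the nearest codebook entry by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


def vae_sample(mean, log_var, rng):
    """VAE-style step (illustrative): draw a continuous latent with the
    reparameterization trick, z = mean + std * eps, eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


# Hypothetical toy data: a 3-entry codebook and three 2-D encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.1]]

tokens = vq_tokenize(latents, codebook)   # discrete ids: [1, 2, 0]
z = vae_sample([0.5, -0.5], [-2.0, -2.0], random.Random(0))  # continuous latent
```

Discrete ids of the first kind are what a language model would typically consume, whereas continuous latents of the second kind are the usual input space for a diffusion model.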