Chinese-CLIP
Chinese-CLIP is a Chinese adaptation of the CLIP model, trained on roughly 200 million Chinese image-text pairs for tasks such as image-text feature extraction, cross-modal retrieval, and zero-shot classification. Building on the open_clip project, it is tailored to Chinese data and adds features such as CoreML model conversion, fine-tuning via knowledge distillation, and deployment with ONNX and TensorRT. The model shows strong performance on benchmarks such as text-to-image retrieval and zero-shot classification.
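As a rough illustration of the zero-shot classification use case mentioned above, the sketch below scores an image against Chinese candidate labels using the Hugging Face transformers port of Chinese-CLIP; the ChineseCLIPModel/ChineseCLIPProcessor classes, the OFA-Sys/chinese-clip-vit-base-patch16 checkpoint, and the example image URL are assumptions not stated in the text above.

```python
import torch
import requests
from PIL import Image
from transformers import ChineseCLIPProcessor, ChineseCLIPModel

# Assumed checkpoint: ViT-B/16 image encoder paired with a Chinese RoBERTa text encoder.
model = ChineseCLIPModel.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")
processor = ChineseCLIPProcessor.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")

# Example image and Chinese candidate labels for zero-shot classification.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["一只猫", "一只狗", "一辆汽车"]  # "a cat", "a dog", "a car"

# Encode the image and all candidate texts in one batch; padding aligns label lengths.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, softmaxed over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```

The same model and processor also expose the image and text embeddings directly (via `get_image_features` and `get_text_features`), which covers the feature-extraction and cross-modal retrieval scenarios described above.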