Long-CLIP
Long-CLIP extends CLIP's text input limit from 77 to 248 tokens. It improves long-caption text-image retrieval by about 20% in R@5 and traditional text-image retrieval by about 6%. It is aimed at developers and researchers who need a CLIP-style model that can handle long text. The project has released model checkpoints and evaluation code, and the paper was accepted to ECCV 2024. It also covers use with SDXL and the Urban-1k dataset.
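
For readers unfamiliar with the R@5 figure quoted above, the following is a minimal sketch of how Recall@K is commonly computed for retrieval. It is not taken from the Long-CLIP evaluation code. The function name recall_at_k, the toy score matrix, and the assumption that query i's correct match sits at candidate index i are all illustrative.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 5) -> float:
    """Fraction of queries whose matching item ranks in the top k.

    similarity[i][j] is the score between query i (e.g. a caption) and
    candidate j (e.g. an image). Assumption for this sketch: the correct
    candidate for query i is at index i.
    """
    if not similarity:
        raise ValueError("similarity must contain at least one query")
    hits = 0
    for i, scores in enumerate(similarity):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 score matrix (made-up numbers): queries 0 and 2 rank their
# match first, while query 1 ranks its match last.
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.1, 0.4, 0.7],
]
print(recall_at_k(toy_scores, k=1))
print(recall_at_k(toy_scores, k=3))
```

With K=5, the same function gives R@5: the share of captions (or images) whose true counterpart appears among the five highest-scoring candidates.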