Introduction to PickScore
PickScore is a project for modeling user preferences in text-to-image generation. It provides datasets and tools for understanding and predicting which images users prefer when images are generated from text prompts. The project accompanies the paper "Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation."
Datasets and Tools
At the core of PickScore are open datasets and a model that let developers and researchers study user preferences in text-to-image tasks. Two datasets, Pick-a-Pic v1 and Pick-a-Pic v2, are available on Hugging Face. Pick-a-Pic v1 is the original dataset used in the paper; Pick-a-Pic v2 contains over a million examples for those who need more data. PickScore also provides a model trained on Pick-a-Pic v1 for preference prediction.
Web Application and Demo
The Pick-a-Pic web application lets users contribute to the dataset and try the technology firsthand. A simple PickScore demo is also available on Hugging Face Spaces.
Installation Instructions
To get started with PickScore, create a virtual environment and install PyTorch along with the other required packages. You can install all packages at once or only those needed for a specific task, such as training or evaluation. Conda and pip are the recommended tools for installing dependencies.
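As a quick sanity check after installation, a short script can report which packages are importable in the active environment. This is only a sketch: the package names below (torch, transformers, datasets) are assumptions chosen for illustration, and the project's own requirements remain the authoritative list.

```python
from importlib.util import find_spec

# Assumed package names for illustration only; consult the project's
# requirements for the authoritative list.
ASSUMED_PACKAGES = ("torch", "transformers", "datasets")


def check_packages(names=ASSUMED_PACKAGES):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


# Print a one-line status for each assumed package.
for name, available in check_packages().items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Running this inside the virtual environment shows at a glance whether anything still needs to be installed with Conda or pip.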
Inference with PickScore
PickScore can be used to predict which images users will prefer for a given text prompt. Using the model trained on Pick-a-Pic, users pass a prompt and a set of images to the provided Python script, which returns a preference probability for each image. These probabilities indicate which generated images users are more likely to prefer.
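The probability computation can be illustrated without loading the model. The sketch below assumes a CLIP-style scorer: the prompt and each image are represented as embedding vectors, each image's score is a scaled cosine similarity with the prompt, and a softmax over the scores yields preference probabilities. The toy embeddings, the function names, and the logit_scale value are hypothetical; in practice the embeddings would come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def preference_probs(text_emb, image_embs, logit_scale=10.0):
    """Turn prompt/image similarities into probabilities that sum to 1.

    logit_scale is a hypothetical temperature-like factor; a trained
    model would supply its own learned value.
    """
    scores = [logit_scale * cosine_similarity(text_emb, e) for e in image_embs]
    return softmax(scores)


# Toy 3-dimensional embeddings, made up for illustration.
prompt_emb = [1.0, 0.0, 0.0]
image_embs = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.1]]
probs = preference_probs(prompt_emb, image_embs)
print(probs)
```

The first toy image points in nearly the same direction as the prompt, so it receives the larger share of the probability mass.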
Downloading the Pick-a-Pic Dataset
The Pick-a-Pic datasets can be downloaded directly from Hugging Face. They are large: the v1 dataset alone occupies approximately 190 GB. To get started quickly and avoid a full download, streaming the data is recommended.
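Streaming can be sketched with the Hugging Face datasets library, which fetches examples lazily instead of downloading everything first. The dataset ID and split name below are assumptions and should be checked against the dataset page; the helper names open_stream, take, and preview are hypothetical.

```python
from itertools import islice


def open_stream(dataset_id="yuvalkirstain/pickapic_v1", split="train"):
    """Open a dataset in streaming mode (needs `datasets` and network access).

    The default dataset_id and split are assumptions; verify them on
    the dataset page before relying on them.
    """
    # Imported lazily so the helper below works without the library.
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split, streaming=True)


def take(examples, n):
    """Return the first n items of any iterable without reading the rest."""
    return list(islice(examples, n))


def preview(n=2):
    """Print the field names of the first n streamed examples."""
    for example in take(open_stream(), n):
        print(sorted(example.keys()))
```

Calling preview() pulls only the first few examples over the network, which is a cheap way to inspect the available fields before committing to the full download.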
Training PickScore
PickScore can be trained from scratch either locally or on a Slurm cluster. Training requires considerable computing power and is set up to run on multiple GPUs. Instructions are provided for both modes.
Testing PickScore on Pick-a-Pic
To evaluate a trained model, use the provided script, which tests the preference predictor on the Pick-a-Pic dataset and gives a measure of how well the model predicts user preferences.
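One simple way to summarize such an evaluation is pairwise accuracy: the fraction of image pairs for which the model scores the user-preferred image higher. The sketch below is a simplified illustration, not the project's script. It assumes each record holds two model scores and the index of the preferred image; this record layout is hypothetical, ties in user preference are not handled, and the project's own metric may be computed differently.

```python
def pairwise_accuracy(records):
    """Fraction of pairs where the higher-scored image is the preferred one.

    Each record is (score_0, score_1, preferred), with preferred in {0, 1}.
    Equal scores are counted as a prediction for image 1.
    """
    if not records:
        raise ValueError("records must not be empty")
    correct = 0
    for score_0, score_1, preferred in records:
        predicted = 0 if score_0 > score_1 else 1
        correct += int(predicted == preferred)
    return correct / len(records)


# Toy records, made up for illustration: the third pair is mispredicted.
toy_records = [
    (0.8, 0.2, 0),
    (0.3, 0.7, 1),
    (0.6, 0.4, 1),
    (0.1, 0.9, 1),
]
accuracy = pairwise_accuracy(toy_records)
print(accuracy)
```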
Contribution and Citation
The creators of PickScore invite users to explore, analyze, and contribute to the datasets to advance text-to-image generation research. Users who find the project useful in their work are encouraged to cite the original paper.
By offering open datasets and easy-to-use tools, PickScore helps researchers and developers study the relationship between text prompts and user image preferences.