Introducing clip-video-encode
`clip-video-encode` is a tool for computing embeddings from video frames with a CLIP (Contrastive Language–Image Pretraining) model, turning video content into compact, meaningful numerical representations.
Installation
There are two primary methods for installation:
- Using pip, run the following command in your terminal:

```bash
pip install clip-video-encode
```

- Building from source:

```bash
python setup.py install
```
Usage
`clip-video-encode` is operated through a command-line interface: users specify an input source, an output destination, and configuration flags.
- Input source options (`SRC`):
  - Local path to an mp4 file.
  - YouTube link.
  - Path to a text file listing multiple video sources.
  - A list of video links passed directly.
- Flags for customization (see the example invocation after this list):
  - `--dest`: location to save the generated embeddings. By default, `.npy` is appended to the source path.
  - `--output_format`: output format; options are "files" or "webdataset".
  - `--take_every_nth`: frame capture frequency (e.g., every 5th frame).
  - `--frame_workers`: number of processes assigned to reading video frames.
  - `--frame_memory_size`: memory allocation for frame reading, in gigabytes.
  - `--metadata_columns`: metadata columns to include for analysis.
  - `--use_dst_name`: use the destination name suggested by a companion tool, `video2numpy`.
  - `--distribute`: distribution strategy, such as "slurm" or none.
  - `--oc_model_name`: the OpenCLIP model architecture to use.
  - `--pretrained`: which pretrained weights to use.
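Putting the pieces together, here is a sketch of a typical invocation. It assumes the installed console command shares the package name (`clip-video-encode`); the paths and values are hypothetical:

```bash
# Hypothetical example: encode every 25th frame of a local mp4 and
# write the embeddings to a chosen directory as .npy files.
clip-video-encode "some/path/my_videos/vid.mp4" \
  --dest "some/path/my_embeddings" \
  --take_every_nth 25 \
  --output_format "files"
```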
API Access
In addition to command-line usage, `clip-video-encode` provides a convenient API. A single function, `clip_video_encode`, can be incorporated into Python scripts for more programmatic control.
Example of using the API:
```python
import glob

from clip_video_encode import clip_video_encode

# Gather all mp4 files to encode.
VIDS = glob.glob("some/path/my_videos/*.mp4")
EMBEDDING_DIR = "some/path/my_embeddings"
take_every_5 = 5  # keep every 5th frame

# Compute CLIP embeddings for each video and save them to EMBEDDING_DIR.
clip_video_encode(VIDS, EMBEDDING_DIR, take_every_5)
```
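The resulting embeddings are plain NumPy arrays, so downstream code can load them directly. A minimal sketch, assuming embeddings for each video are written as `.npy` files under the destination directory (the exact file name below is hypothetical):

```python
import numpy as np

# Load the embeddings computed for one video (hypothetical file name).
embeddings = np.load("some/path/my_embeddings/vid.npy")

# One row per encoded frame: shape (num_frames, embedding_dim).
print(embeddings.shape)
```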
Who is Using clip-video-encode?
A few prominent examples include:
- CLIP-Kinetics700: compresses the 700GB Kinetics700 dataset to around 8GB by encoding at 1 frame per second (FPS).
- CLIP-WebVid: handles the WebVid dataset of 10 million videos, encoding them into CLIP ViT-B/32 embeddings at 1 FPS.
Examples and Testing
`clip-video-encode` demonstrates its capabilities through practical examples:
- Thing Detector: uses the generated embeddings to identify objects in videos.
- Large Dataset Processing: a guide for transforming extensive datasets like WebVid into CLIP embeddings.
To keep the code correct and performant, contributors can set up a virtual environment, activate it, and run the test suite; a sketch of that workflow follows below. Code formatting and targeted test execution options are also available for developers interested in contributing or modifying the code.
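The exact commands are not spelled out here, so the following is a minimal sketch assuming a standard Python development workflow with an editable install and pytest:

```bash
# Create and activate a virtual environment (assumes python3 with venv).
python3 -m venv .venv
source .venv/bin/activate

# Install the package in editable mode (assumes a local clone of the repo).
pip install -e .

# Run the test suite (assumes tests are driven by pytest).
pip install pytest
pytest
```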
The `clip-video-encode` project is a practical solution for anyone looking to harness CLIP embeddings for video analysis, offering flexibility and ease of use across a range of applications.