CVPR2024-Papers-with-Code - Stay Informed with CVPR 2024's Innovations in Computer Vision

Welcome to the CVPR 2024 Papers with Code Collection!

The CVPR 2024 decisions have now been released on OpenReview. This project curates a collection of notable papers and their associated code repositories, making it easier for researchers, developers, and enthusiasts to access the latest advances in computer vision. The collection is organized into a wide range of categories that reflect the diverse subfields of the computer vision community.

Quick Navigation

Below is a look at some of the many categories in the CVPR 2024 Papers with Code collection. Each category lists papers with direct links to their code and highlights new developments in its field.

3DGS (Gaussian Splatting): Delve into advanced 3D rendering techniques. Check out "Scaffold-GS," which uses structured 3D Gaussians for view-adaptive rendering. For real-time applications, explore "GPS-Gaussian," which focuses on generalizable human novel view synthesis.
Avatars: If realistic human avatars pique your interest, the project "GaussianAvatar" offers impressive modeling from single videos using 3D Gaussians.
Backbone: This section covers new network architectures such as "RepViT," which revisits mobile CNN design from a vision transformer perspective.
CLIP: Explore extensions to the popular CLIP model that enhance focus and fairness in vision-language learning with projects like "Alpha-CLIP."
Embodied AI: Work that couples AI with physical or simulated agents is growing. Notable here are "EmbodiedScan" for holistic multi-modal 3D perception, and "MP5," an open-ended embodied system that operates in Minecraft.
NeRF (Neural Radiance Fields): Dive into "PIE-NeRF," which brings physics-based interactive elastodynamics to NeRF.
Diffusion Models: A booming category for image generation. "InstanceDiffusion" provides instance-level control while "DeepCache" proposes acceleration methods that are resource-efficient.
Vision Transformer: This part of the directory offers insights into advanced vision transformers, highlighting work like "TransNeXt" for robust visual perception.
Object Detection: This category showcases advanced detection algorithms. For instance, "DETRs Beat YOLOs" presents DETR-based detectors that outperform YOLO models on real-time object detection.

Beyond these categories, the collection covers many other topics, including multi-modal large language models (MLLM), re-identification (ReID), self-supervised learning, medical image processing, video understanding, and 3D reconstruction. Each section lists related projects with links to papers and source code.

Participate and Share

The project encourages the academic community to contribute by submitting issues or suggestions, sharing new CVPR 2024 papers, and linking open-source implementations. It supports an ongoing collaborative effort to advance and democratize research within the broader field of computer vision.

Join the Academic Conversation

To keep abreast of cutting-edge discussions and resources, you are invited to join the CVer academic chat community, a vibrant hub for computer vision and AI knowledge sharing.

Overall, the CVPR 2024 Papers with Code collection presents a vital resource for anyone looking to deepen their understanding, contribute to ongoing developments, or keep pace with cutting-edge advancements in the computer vision landscape.