#NeurIPS
ShareGPT4Video
ShareGPT4Video presents a comprehensive video-text dataset featuring 40K captions generated by GPT-4V. Built on these captions, it also provides the ShareGPT4Video-8B video-language model and the ShareCaptioner-Video captioning model, which benefit both video understanding and text-to-video applications. The project offers accessible demos and extensive resources, including the paper, project documentation, datasets, and source code, all contributing to the advancement of video comprehension in AI. Accepted at NeurIPS 2024, ShareGPT4Video is a notable contribution to video-language modeling.
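For orientation, here is a minimal sketch of how such a caption corpus could be inspected once downloaded; the file name `sharegpt4video_40k.jsonl` and the `video_id`/`caption` fields are assumptions for illustration, not the dataset's documented schema.

```python
import json

# Hypothetical file name and fields -- check the dataset card for the real schema.
with open("sharegpt4video_40k.jsonl", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} caption records loaded")
sample = records[0]
# Assumed keys; adjust to the actual JSON structure shipped with the dataset.
print(sample.get("video_id"), sample.get("caption", "")[:200])
```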
meta-dataset
Meta-Dataset offers a reliable benchmarking suite for few-shot learning, compatible with the TensorFlow Datasets API and supporting varied evaluation protocols. Its codebase includes models such as CrossTransformers, which exploits spatial correspondence, and FLUTE, which targets generalization to new datasets. Resources include installation guides, data processing tools, and training scripts, along with instructions for reproducing experiments and contributing to the leaderboard. This open-source project tackles challenges in few-shot classification, enabling model evaluation on a diverse, large-scale benchmark.
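As a rough illustration of the episodic evaluation protocol such benchmarks implement, the sketch below assembles an N-way K-shot episode from a labeled pool; it is a simplified stand-in under assumed parameter names, not Meta-Dataset's actual TFRecord/tf.data pipeline.

```python
import random
from collections import defaultdict

def sample_episode(examples, n_way=5, k_shot=1, query_per_class=10, seed=0):
    """Build one few-shot episode: a support set and a query set.

    `examples` is an iterable of (feature, label) pairs; this is a generic
    sketch of the N-way K-shot protocol, not Meta-Dataset's own sampler.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)

    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        pool = rng.sample(by_class[cls], k_shot + query_per_class)
        support += [(x, episode_label) for x in pool[:k_shot]]
        query += [(x, episode_label) for x in pool[k_shot:]]
    return support, query
```

A model is then adapted on the support set and scored on the query set; Meta-Dataset additionally varies the number of ways and shots per episode.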
awesome-model-based-RL
This repository provides a detailed collection of research papers on model-based reinforcement learning (MBRL), consistently updated with the latest developments. It includes papers from prestigious conferences such as NeurIPS and ICML and presents a taxonomy of MBRL algorithms categorized into 'Learn the Model' and 'Given the Model'. This resource is beneficial for studying foundational works as well as the latest innovations in the field, offering insights into various algorithmic strategies in model-based RL.
continual-learning
This repository offers a PyTorch implementation for studying continual learning on sequences of non-overlapping tasks. It covers the standard academic settings, with options for task-incremental and task-free continual learning, and implements methods such as Synaptic Intelligence and Elastic Weight Consolidation. Interactive demos and scripts for custom experiments are provided, and visdom integration enables real-time training visualization. The benchmarks follow the 'Three types of incremental learning' study in Nature Machine Intelligence.
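To make the regularization-based approach concrete, here is a minimal sketch of an Elastic Weight Consolidation (EWC) penalty in PyTorch; the variable names and the diagonal-Fisher inputs are illustrative and do not mirror this repository's implementation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC penalty: lam/2 * sum_i F_i * (theta_i - theta*_i)^2.

    `fisher` and `old_params` are dicts keyed by parameter name, computed
    after finishing the previous task (illustrative sketch, not the repo's API).
    """
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss
```

During training on the current task, the total objective would then be the task loss plus this penalty, discouraging drift in parameters that were important for earlier tasks.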
Awesome-Monocular-3D-detection
Browse a detailed, continuously updated collection of papers on monocular 3D object detection spanning 2016 to 2024. The repository tracks recent methods, including MonoCD (complementary depths) and UniMODE, and links each entry to its publication and, where available, its implementation, offering insight into improving detection accuracy for autonomous driving and other AI applications, and helping readers stay current with the field.
CogView
CogView is a 4-billion-parameter transformer for general-domain text-to-image generation. The repository includes code releases and demos, and uses the PB-relax and Sandwich-LN techniques to stabilize transformer training. CogView takes Chinese text as input, so prompts in other languages should first be translated into Chinese. It offers pretrained models, inference, and super-resolution features, along with detailed setup instructions for various environments, including both single-node and multi-node training.
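For intuition, Sandwich-LayerNorm wraps each residual branch in LayerNorms at both its input and its output; the block below is a simplified sketch of that stabilization idea, not CogView's training code, and PB-relax (which rescales attention logits to avoid overflow) is omitted. Dimensions and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class SandwichLNBlock(nn.Module):
    """Transformer block with Sandwich-LayerNorm: x + LN(sublayer(LN(x))).

    Simplified sketch of the idea described for CogView; sizes are illustrative.
    """
    def __init__(self, dim=512, heads=8, mlp_ratio=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim)
        )
        self.ln_in_attn, self.ln_out_attn = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.ln_in_mlp, self.ln_out_mlp = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.ln_in_attn(x)
        x = x + self.ln_out_attn(self.attn(h, h, h, need_weights=False)[0])
        h = self.ln_in_mlp(x)
        x = x + self.ln_out_mlp(self.mlp(h))
        return x
```

The extra output-side LayerNorm keeps the scale of each residual update bounded, which is the property credited with making very large text-to-image transformers trainable in practice.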
stark
STaRK provides a comprehensive benchmark for assessing the retrieval performance of large language models over semi-structured knowledge bases. It covers realistic applications such as product search, academic paper search, and biomedical queries, posing query challenges designed to spur advances in retrieval technology. With easy installation via pip, resources on Hugging Face, and a dedicated leaderboard, STaRK supports researchers in refining context-specific retrieval strategies.
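To clarify what "retrieval performance" is typically measured with on such benchmarks, the snippet below computes Hit@k and reciprocal rank for a ranked candidate list; it is a generic sketch and does not use STaRK's own evaluation API.

```python
def retrieval_metrics(ranked_ids, relevant_ids, k=5):
    """Compute Hit@k and reciprocal rank for one query.

    `ranked_ids` is the model's ranking of candidate entity ids;
    `relevant_ids` is the set of ground-truth answers (generic sketch).
    """
    hit_at_k = any(r in relevant_ids for r in ranked_ids[:k])
    rr = 0.0
    for rank, r in enumerate(ranked_ids, start=1):
        if r in relevant_ids:
            rr = 1.0 / rank
            break
    return {"hit@k": float(hit_at_k), "mrr": rr}

# Example: the ground-truth entity 42 is ranked third -> Hit@5 = 1.0, MRR ~ 0.33
print(retrieval_metrics([7, 13, 42, 8], {42}, k=5))
```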
CAGrad
CAGrad introduces conflict-averse gradient descent for multitask learning: it optimizes the average loss while explicitly limiting conflict among per-task gradients. Published at NeurIPS 2021, the method improves multitask optimization across varied applications. The repository also includes FAMO, a follow-up optimizer that balances tasks dynamically without computing all task gradients, reducing the cost of each update. Experiments on NYU-v2, CityScapes, and Metaworld demonstrate its effectiveness on dense image prediction and reinforcement learning. This resource aids researchers in optimizing multitask objectives with minimal resource overhead.
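The sketch below illustrates the conflict-averse idea: find simplex weights that minimize the dual objective and use them to tilt the average gradient. It is a rough reconstruction under assumed names (`cagrad_direction`, a softmax-parameterized inner solver); the paper's exact solver and scaling may differ.

```python
import torch

def cagrad_direction(task_grads, c=0.5, steps=25, lr=0.1):
    """Conflict-averse update direction (sketch of the CAGrad idea).

    `task_grads` is a detached (K, D) tensor of flattened per-task gradients.
    """
    g0 = task_grads.mean(dim=0)                # average gradient
    phi = (c ** 2) * g0.pow(2).sum()           # trust-region radius around g0
    logits = torch.zeros(task_grads.shape[0], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)
        gw = (w.unsqueeze(1) * task_grads).sum(dim=0)
        # Dual objective: <gw, g0> + sqrt(phi) * ||gw||, minimized over the simplex
        loss = gw @ g0 + torch.sqrt(phi) * gw.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        w = torch.softmax(logits, dim=0)
        gw = (w.unsqueeze(1) * task_grads).sum(dim=0)
        return g0 + torch.sqrt(phi) / (gw.norm() + 1e-8) * gw

# Example: two conflicting task gradients in 2-D
g = torch.tensor([[1.0, 0.0], [-0.8, 1.0]])
print(cagrad_direction(g, c=0.5))
```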
awesome-ebm
Awesome EBM offers a curated collection of works on energy-based learning, including papers, workshops, and software libraries. It serves as a useful resource for researchers and practitioners working with energy-based models, covering areas such as data generation, density estimation, and adversarial robustness. Research papers from venues such as NeurIPS and ICML are listed in reverse chronological order, giving a window into current advances. It also highlights applications in image and language modeling, acting as an informative guide for implementing energy-based techniques.
MobileAgent
Mobile-Agent leverages multi-agent collaboration for autonomous mobile device operation, integrating visual perception to streamline navigation. It offers versions with enhanced speed and reduced memory use, and extends to desktop platforms such as Mac and Windows. Accepted at NeurIPS 2024, it provides easy, configuration-free access through demos on platforms like Hugging Face.
UltraPixel
UltraPixel advances high-resolution image synthesis, producing detailed, high-quality images at multiple resolutions. Recent updates improve compatibility with newer PyTorch and Torchvision releases and speed up generation on RTX 4090 GPUs. It offers text-guided and personalized image generation through a user-friendly Gradio interface built on pre-trained models, and memory-efficient techniques support resolutions up to 4K.
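As a rough sketch of how a text-guided generation demo is typically wired up in Gradio, the snippet below wraps a placeholder `generate` function; it is not UltraPixel's actual demo code, and the function name, resolution range, and controls are assumptions.

```python
import gradio as gr
from PIL import Image

def generate(prompt, width, height):
    # Placeholder: a real demo would call the model's sampling pipeline here.
    return Image.new("RGB", (int(width), int(height)), color="gray")

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(512, 4096, value=1024, step=64, label="Width"),
        gr.Slider(512, 4096, value=1024, step=64, label="Height"),
    ],
    outputs=gr.Image(label="Generated image"),
    title="Text-guided generation (illustrative sketch)",
)

if __name__ == "__main__":
    demo.launch()
```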
sige
The Spatially Incremental Generative Engine (SIGE) speeds up image editing by restricting computation to the edited regions, reducing the computational cost of models such as DDPM, Stable Diffusion, and GauGAN without compromising image quality. It delivers significant speedups on NVIDIA RTX 3090 and Apple M1 hardware, can be combined with techniques such as GAN Compression, and supports the Mac MPS backend, which particularly benefits the M1 MacBook Pro. The project provides accessible resources for comprehensive experimentation and benchmarking on widely used platforms.
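To illustrate the core idea of spatially sparse inference, the sketch below marks which spatial tiles differ between the original and edited inputs, so that only those tiles would need recomputation; the tile size, threshold, and function name are illustrative assumptions, not SIGE's actual implementation.

```python
import torch

def changed_tile_mask(original, edited, tile=16, threshold=1e-3):
    """Find which spatial tiles changed between two inputs (generic sketch).

    Only "dirty" tiles would be recomputed; the rest reuse cached activations.
    """
    diff = (original - edited).abs().amax(dim=1, keepdim=True)    # (N, 1, H, W)
    # A tile is dirty if any pixel inside it changed noticeably.
    dirty = torch.nn.functional.max_pool2d(diff, kernel_size=tile, stride=tile) > threshold
    return dirty                                                  # (N, 1, H/tile, W/tile)

# Example: a 256x256 image where only a 32x32 patch was edited
x0 = torch.zeros(1, 3, 256, 256)
x1 = x0.clone()
x1[:, :, 64:96, 64:96] += 1.0
mask = changed_tile_mask(x0, x1)
print(mask.sum().item(), "of", mask.numel(), "tiles need recomputation")  # 4 of 256
```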
Feedback Email: [email protected]