# Benchmark

## deep-text-recognition-benchmark

This four-stage framework provides a neutral platform for evaluating scene text recognition (STR) models, analyzing the trade-offs between accuracy, speed, and memory usage. The project ships a PyTorch implementation, pretrained models, and comprehensive training and evaluation resources, enabling objective, consistent comparisons of STR modules across diverse datasets.
## OSWorld

OSWorld provides Docker support to simplify setting up virtual environments on platforms such as AWS and Azure. The environment code has been updated for broader platform integration beyond VMware, adding compatibility with VirtualBox. Users can configure VMware Workstation Pro, run Docker on non-bare-metal servers, or choose the lightweight desktop-env installation, making the project adaptable to different infrastructures. Detailed instructions, agent baselines, and experimental setups facilitate multimodal agent benchmarking in real computer environments.
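A minimal interaction sketch based on the project's quickstart; the module path, constructor arguments, and step signature here are assumptions that may differ across releases:

```python
# Sketch of the environment loop; identifiers follow the project's
# quickstart but are not guaranteed to match the current release.
from desktop_env.desktop_env import DesktopEnv

task = {}  # placeholder; real task configs ship with the benchmark

env = DesktopEnv(action_space="pyautogui")   # VM backend chosen via config
obs = env.reset(task_config=task)
obs, reward, done, info = env.step("pyautogui.rightClick()")
env.close()
```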
## meta-dataset

Meta-Dataset offers a reliable benchmarking suite for few-shot learning, built on the TensorFlow Datasets API and supporting varied evaluation protocols. Its codebase includes models such as CrossTransformers and FLUTE, which target spatial correspondence and generalization to new datasets. Resources cover installation guides, data processing tools, and model training, along with instructions for reproducing experiments and contributing to the leaderboard. This open-source project tackles the core challenges of few-shot classification, enabling model evaluation on comprehensive, large-scale tasks.
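Evaluation in such benchmarks is episodic. The following is a generic N-way K-shot episode sampler, a sketch of the protocol only; Meta-Dataset itself builds episodes with variable ways and shots through its TFDS pipeline:

```python
import numpy as np

def sample_episode(features, labels, n_way=5, k_shot=1, q_queries=15, rng=None):
    """Sample a generic N-way K-shot episode from a labeled pool."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_x.append(features[idx[:k_shot]])                    # K support examples
        query_x.append(features[idx[k_shot:k_shot + q_queries]])    # Q query examples
        support_y += [new_label] * k_shot
        query_y += [new_label] * q_queries
    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```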
## Parameter-Efficient-Transfer-Learning-Benchmark

Explore a benchmark for parameter-efficient transfer learning (PETL) in computer vision, assessing 25 leading algorithms on 30 varied datasets. The platform provides a modular codebase for analysis across image recognition, video action recognition, and dense prediction. Pre-trained backbones such as ViT and Swin attain high performance while tuning only a small fraction of their parameters. The benchmark supports straightforward evaluation and continuous updates for new PETL methods and applications.
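The idea all PETL methods share is freezing the pre-trained backbone and training only a small set of new parameters. Below is a minimal sketch of the simplest such baseline, linear probing, with a stand-in encoder where the benchmark would use a frozen ViT or Swin checkpoint:

```python
import torch
import torch.nn as nn

# Stand-in encoder; in the benchmark this would be a frozen pre-trained
# ViT or Swin backbone producing a feature vector per image.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
for p in backbone.parameters():
    p.requires_grad = False          # freeze all pre-trained weights

head = nn.Linear(768, 100)           # the only trainable parameters
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(8, 3, 224, 224)      # dummy batch of images
y = torch.randint(0, 100, (8,))      # dummy labels
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"tuning {trainable / total:.2%} of all parameters")
```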
## OpenSTL

OpenSTL offers a robust framework for spatio-temporal predictive learning, covering a variety of methods and use cases. Its modular design eases integration and customization for fields like video prediction, weather forecasting, and traffic analysis. Implementations are provided in both PyTorch Lightning and plain PyTorch to suit different technical preferences. Key features include adaptable code design, established benchmarks, a wide array of datasets, and clearly specified dependencies. Comprehensive documentation, a model zoo, and visualization tools help researchers engage with and extend the framework.
## KwaiAgents
KwaiAgents, developed by Kuaishou Technology, is an open-source initiative featuring the KAgentSys-Lite system and KAgentLMs models. It provides tools for agent planning, reflection, and tool usage. Key datasets include KAgentInstruct, consisting of over 200k agent-related instructions, and KAgentBench, offering comprehensive evaluation data. This project supports the development and testing of AI agent systems.
## awesome-model-quantization

This repository serves as an extensive resource on model quantization, collecting essential papers, documentation, and code for researchers in the field. It is continuously updated and covers topics such as network binarization and benchmarking with BiBench and MQBench, as well as comprehensive surveys of quantization methods and binary neural networks. Through the related 'Awesome Efficient AIGC' initiative, the project also tracks contemporary techniques for compressing and accelerating large language and diffusion models. Contributions are welcome to broaden the repository's coverage and utility.
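For readers new to the topic, the basic operation most surveyed methods build on is uniform quantization: mapping real-valued weights to low-bit integers through a scale factor. A minimal sketch of symmetric per-tensor int8 quantization:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor uniform quantization to int8."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0        # one scale for the tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```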
## model-vs-human

Discover 'modelvshuman,' a Python toolbox that benchmarks the gap between human and machine vision. It evaluates PyTorch and TensorFlow models on 17 out-of-distribution (OOD) datasets with accompanying human comparison data. Key features include an extensive model zoo with over 20 standard supervised models, self-supervised contrastive models, vision transformers, and adversarially robust models. Installation and model management are straightforward, making it well suited to researchers and developers assessing model generalization and determining whether models achieve human-like behavior and OOD robustness.
## ReVersion

Explore diffusion-based Relation Inversion: learning a specific relation from exemplar images and synthesizing new images that exhibit it in varied contexts. ReVersion provides tools for generating such relation-specific images, with optimized code, integration with platforms such as Hugging Face, and ongoing updates and benchmarks for improved accessibility.
## YOLOv6

YOLOv6 is a single-stage object detection framework built for industrial applications. It offers segmentation support, mobile-oriented variants, and optimized performance across hardware platforms, including low-power devices. The model is designed for versatile deployment and straightforward integration into existing systems, targeting real-time processing and large-scale tasks while prioritizing accuracy and reliability in demanding environments.
## code-act

CodeAct unifies LLM agents' actions as executable Python code, showing greater effectiveness than text- or JSON-based action formats. Integration with a Python interpreter lets the agent revise its actions dynamically based on execution results. Key components include the CodeActInstruct dataset for instruction tuning and the CodeActAgent model for out-of-domain tasks. Detailed setup covers Kubernetes deployment, Docker, and llama.cpp for serving models, with updates, datasets, and the accompanying paper available for further detail.
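A toy sketch of the execute-observe-revise loop behind code actions; the real system sandboxes execution, whereas this illustration calls exec() directly and the helper name is invented:

```python
import contextlib
import io

def run_code_action(code: str, state: dict) -> str:
    """Execute one code action and return its output as an observation.
    Toy version: real CodeAct sandboxes execution instead of raw exec()."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, state)                 # `state` persists across turns
    except Exception as e:
        return f"Error: {e!r}"                # errors become observations too
    return buf.getvalue()

state = {}
print(run_code_action("x = [1, 2, 3]\nprint(sum(x))", state))   # 6
print(run_code_action("print(x[10])", state))                   # Error: IndexError(...)
```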
## BLIVA

BLIVA offers a streamlined approach to answering text-rich visual questions, achieving strong rankings on both perception and cognition tasks. With both commercially usable and open-source model variants, BLIVA demonstrates high accuracy across multiple VQA benchmarks and varied datasets.
## MixEval

Discover a state-of-the-art evaluation suite for large language models that combines dynamic, ground-truth-based benchmarks for precise and economical model assessment. MixEval cuts evaluation time and cost to about 6% of standard evaluations while maintaining a strong correlation with real-world model rankings. Updated routinely, it employs both free-form and multiple-choice question formats for comprehensive, unbiased analysis, suiting researchers and developers who need dependable, reproducible evaluation.
## Baichuan-13B

Baichuan-13B is an open-source language model with 13 billion parameters that excels on Chinese and English benchmarks. It offers both a pre-trained base model and an aligned chat model for versatile applications. Built on ALiBi positional encoding and trained on bilingual text, it provides deployment options with int8 and int4 quantization to reduce resource requirements. Available for academic use, and for commercial use upon request, Baichuan-13B enables efficient inference in language modeling.
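A hedged loading sketch using the standard transformers API; the model id follows the Hugging Face release, and the repository additionally ships its own int8/int4 quantization helpers, whose exact interface may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "baichuan-inc/Baichuan-13B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,   # fp16 weights; quantize further to save memory
    trust_remote_code=True,      # custom modeling code (ALiBi attention)
    device_map="auto",
)

inputs = tokenizer("Hello, introduce yourself briefly.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```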
## SwiftInfer

SwiftInfer brings TensorRT acceleration to Streaming-LLM, enabling LLM inference over extended input lengths while mitigating quality collapse through the attention-sink technique. Built on TensorRT-LLM, it offers a flexible framework for deploying efficient, multi-turn conversational AI systems. The project provides detailed installation guidance, compatibility checks, and benchmark comparisons against the original PyTorch implementation, combining effective integration with computational efficiency for advanced AI inference.
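The attention-sink idea from the StreamingLLM work keeps the first few tokens plus a recent window in the KV cache and evicts everything in between, bounding memory during streaming. A toy illustration (names and sizes are illustrative, not SwiftInfer's API):

```python
def evict_kv_cache(cache, n_sink=4, window=1020):
    """Keep the first `n_sink` entries plus the most recent `window`
    entries; evict the middle. `cache` is a per-token list, oldest first."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

cache = list(range(2000))           # stand-in for 2000 cached tokens
cache = evict_kv_cache(cache)
assert cache[:4] == [0, 1, 2, 3]    # attention-sink tokens retained
assert len(cache) == 1024           # bounded memory while streaming
```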
## superpixel-benchmark

This repository provides a detailed evaluation of 28 superpixel algorithms on 5 datasets, assessing visual quality, performance, and robustness. It serves as supplemental material for a comparison published in Computer Vision and Image Understanding, 2018. Updates include Docker implementations and evaluations using average metrics. Fair benchmarking is ensured by optimizing each algorithm's parameters on separate training sets, with a focus on metrics such as Boundary Recall and Undersegmentation Error.
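Boundary Recall, one of the headline metrics, is the fraction of ground-truth boundary pixels that lie within a small tolerance of a superpixel boundary. A minimal NumPy/SciPy sketch of the standard definition; the repository's own C++ tools are authoritative on details such as the exact tolerance radius:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def boundary_recall(gt_boundary, sp_boundary, r=2):
    """Fraction of ground-truth boundary pixels within `r` pixels of a
    superpixel boundary. Both arguments are boolean 2D maps."""
    tolerant = binary_dilation(sp_boundary, iterations=r)   # widen by tolerance r
    hits = np.logical_and(gt_boundary, tolerant).sum()      # matched GT boundary pixels
    return hits / max(int(gt_boundary.sum()), 1)
```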
## FL-bench

This project provides a robust framework for evaluating advanced Federated Learning (FL) methods that preserve user data privacy. It covers traditional, personalized, and domain-generalization approaches, including FedAvg, FedProx, pFedSim, and FedSR. With detailed environment setup instructions for PyPI and Docker and a step-by-step experiment guide, the benchmark supports thorough exploration and application of FL strategies. It also offers parallel training through Ray and visualization with Visdom and TensorBoard, aiding precise tuning of FL models.
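At the heart of FedAvg, the baseline most of these methods extend, is a weighted average of client model weights. A minimal sketch of the aggregation step; FL-bench wraps this core step with client sampling, local training, and the personalized variants it benchmarks:

```python
import torch

def fedavg(client_states, client_sizes):
    """Weighted average of client state_dicts (the FedAvg aggregation step)."""
    total = sum(client_sizes)
    return {
        key: sum(state[key].float() * (n / total)        # weight by client data size
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Usage: global_state = fedavg([c1.state_dict(), c2.state_dict()], [600, 400])
```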
## Semi-supervised-learning

Discover USB, a PyTorch-based package for semi-supervised learning that offers a practical framework for building AI models in computer vision, natural language processing, and audio classification. The package includes 14 algorithms, largely based on consistency regularization, that exploit unlabeled data when labels are scarce, making advanced AI accessible to smaller teams. The library covers everything from data preparation to algorithm evaluation with comprehensive benchmarking, serving researchers and developers who want to improve their machine learning projects.
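Consistency regularization, the principle behind many of USB's algorithms, pseudo-labels confident predictions on weakly augmented inputs and enforces them on strongly augmented views of the same samples. A FixMatch-style sketch of the unlabeled-data loss:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, weak_batch, strong_batch, threshold=0.95):
    """FixMatch-style consistency regularization on unlabeled data."""
    with torch.no_grad():
        probs = F.softmax(model(weak_batch), dim=-1)   # predictions on weak views
        conf, pseudo = probs.max(dim=-1)               # pseudo-labels + confidence
        mask = (conf >= threshold).float()             # keep confident samples only
    loss = F.cross_entropy(model(strong_batch), pseudo, reduction="none")
    return (loss * mask).mean()
```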