#benchmarking
FlexGen
FlexLLMGen (formerly FlexGen) enables efficient large language model inference on a single GPU by offloading weights, attention cache, and activations across GPU memory, CPU memory, and disk, and by batching aggressively. Designed for throughput-oriented workloads such as benchmarking and batch data processing, it trades latency for cost: it is less suited to interactive, small-batch serving, but remains a practical option for scalable offline deployments.
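A minimal sketch of driving the generator from Python. The module path and flags follow the project's README; the `--percent` values (the GPU/CPU/disk placement of weights, cache, and activations) are an illustrative offloading policy for a memory-constrained GPU, not a tuned one, so verify them against the repo before relying on this.

```python
# Sketch: run FlexLLMGen's CLI entry point with an example offloading policy.
# Flag names follow the README; the exact values are illustrative.
import subprocess

subprocess.run(
    [
        "python", "-m", "flexgen.flex_opt",
        "--model", "facebook/opt-6.7b",                    # any OPT checkpoint
        "--percent", "0", "100", "100", "0", "100", "0",   # weights on CPU, cache/activations on GPU
    ],
    check=True,
)
```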
bench
Bench is a versatile toolkit for evaluating Large Language Models (LLMs) in production scenarios. It standardizes the comparison of different LLMs, prompt strategies, and generation parameters such as temperature and token count, making it easier to test whether open-source LLMs can stand in for leading closed-source APIs and to turn leaderboard rankings into scores grounded in your own data. Installable in any Python environment, it also offers optional local result serving and extensive documentation for setup and usage, with a Discord community for ongoing support.
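Bench's own API is documented in the repo; as a generic illustration of the kind of sweep it standardizes, the following sketch compares models across temperature settings on a fixed test set. The `call_llm` and `score_response` helpers are hypothetical placeholders, not Bench's API.

```python
# Hypothetical sketch of the comparison workflow Bench standardizes:
# sweep models and generation parameters, score each on a fixed test set.
from itertools import product

def call_llm(model: str, prompt: str, temperature: float) -> str:
    return "4"  # stub; replace with a real model client

def score_response(response: str, reference: str) -> float:
    """Toy exact-match scorer; Bench ships real scoring methods."""
    return float(response.strip() == reference.strip())

test_set = [("What is 2+2?", "4")]        # (prompt, reference) pairs
models = ["model-a", "model-b"]           # placeholder model names
temperatures = [0.0, 0.7]

for model, temp in product(models, temperatures):
    scores = [score_response(call_llm(model, p, temperature=temp), ref)
              for p, ref in test_set]
    print(model, temp, sum(scores) / len(scores))
```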
RGBD-semantic-segmentation
This repository curates an extensive, regularly updated collection of academic papers on RGBD semantic segmentation, with notes on datasets, performance metrics, and benchmark results to support focused research. It covers key datasets such as NYUDv2, SUN RGB-D, and Cityscapes, reports results under metrics like pixel accuracy and mIoU, and tracks the latest methods driving progress in image segmentation.
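For reference, mean IoU, the metric most of the listed benchmarks report alongside pixel accuracy, can be computed from a confusion matrix like this:

```python
# Mean IoU over classes, computed from a class confusion matrix.
import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    """conf[i, j] = number of pixels with ground-truth class i predicted as class j."""
    intersection = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0                      # skip classes absent from both GT and prediction
    return float((intersection[valid] / union[valid]).mean())

conf = np.array([[50, 2], [3, 45]])        # toy 2-class confusion matrix
print(f"mIoU = {mean_iou(conf):.3f}")
```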
python_audio_loading_benchmark
This project benchmarks the loading speed and format support of audio I/O libraries in Python, which matters for machine learning pipelines that decode audio on the fly. It compares libraries such as scipy, librosa, and torchaudio on how quickly they load audio into NumPy, PyTorch, and TensorFlow tensors, highlighting the fastest option for each output type. The framework also generates sample audio data and provides Docker and virtual-environment setups for straightforward replication and contribution.
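A minimal timing sketch in the spirit of the benchmark: load the same file with several libraries and compare average wall-clock time. The file path is a placeholder; the repo's own harness handles warmup, larger corpora, and more libraries.

```python
# Time repeated loads of one audio file across a few common libraries.
import time
import librosa
import soundfile as sf
from scipy.io import wavfile

PATH = "sample.wav"  # placeholder test file

def timed(label, fn, repeats=20):
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    print(f"{label}: {(time.perf_counter() - start) / repeats * 1e3:.2f} ms")

timed("scipy.io.wavfile", lambda: wavfile.read(PATH))
timed("soundfile", lambda: sf.read(PATH))
timed("librosa (native sr)", lambda: librosa.load(PATH, sr=None))
```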
LawBench
LawBench is a benchmark for evaluating large language models (LLMs) on the Chinese legal system. Its tasks, such as legal entity recognition and crime amount calculation, span three cognitive dimensions: memory, understanding, and application. Metrics such as the waiver rate, which captures how often a model declines to answer a legal query, complement accuracy scores; evaluations of 51 LLMs offer insights into how multilingual and Chinese-specific models perform across legal contexts.
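The official waiver-rate definition is specified in the LawBench paper; a plausible reading is the share of queries the model refuses to answer, sketched below with hypothetical refusal markers.

```python
# Hedged sketch of a waiver-style metric: the fraction of responses in which
# the model declines to answer. The refusal markers are illustrative only;
# LawBench's actual definition and detection logic live in its paper/repo.
REFUSAL_MARKERS = ("I cannot", "I am not able", "无法回答")  # hypothetical

def waiver_rate(responses: list[str]) -> float:
    waived = sum(any(m in r for m in REFUSAL_MARKERS) for r in responses)
    return waived / len(responses)

print(waiver_rate(["The statute provides...", "I cannot provide legal advice."]))
```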
awesome-low-light-image-enhancement
This compilation gathers resources for enhancing images captured in low-light conditions, relevant to fields such as night surveillance and autonomous driving. It includes a curated selection of datasets, enhancement methods spanning learning-based and Retinex-based approaches, and a range of evaluation metrics. Recently updated with ICCV 2023 papers, it is a useful reference for researchers and developers working on low-light image and video quality, and contributions are welcomed through the issue tracker or pull requests.
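As a taste of the classic Retinex family the list covers, here is a minimal single-scale Retinex sketch: estimate illumination with a Gaussian blur and keep the log-domain residual as reflectance.

```python
# Single-scale Retinex: log(image) minus log(Gaussian-blurred illumination estimate).
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 30.0) -> np.ndarray:
    """img: float image in (0, 1]; returns log-reflectance rescaled to [0, 1]."""
    img = np.clip(img, 1e-6, 1.0)                          # avoid log(0)
    retinex = np.log(img) - np.log(gaussian_filter(img, sigma))
    lo, hi = retinex.min(), retinex.max()
    return (retinex - lo) / (hi - lo + 1e-12)

dark = np.random.rand(64, 64) * 0.1                        # toy under-exposed image
enhanced = single_scale_retinex(dark)
```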
sbc-reviews
This repository collects detailed benchmark data and reviews of Single-Board Computers (SBCs) based on hands-on testing. Popular boards like the Raspberry Pi and Orange Pi are covered, with measurements of CPU, GPU, memory, disk, and network performance. Discussions are open for suggesting boards to test, making it a handy reference for tech enthusiasts and developers comparing hardware.
UnboundedNeRFPytorch
This project benchmarks cutting-edge unbounded Neural Radiance Fields (NeRF) algorithms, offering a streamlined, high-performance code repository. The results highlight comparisons with widely-used methods such as NeRF++, Plenoxels, and DVGO, showcasing notable PSNR improvements. With practical guidelines on installation, data processing, and training, this project is a valuable resource for researchers and developers aiming for optimized neural radiance field performance using public datasets. The project also provides ongoing updates and comprehensive documentation for building custom NeRFs.
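For context, the PSNR numbers in such comparison tables are computed from mean squared error; for images scaled to [0, 1], PSNR reduces to -10 * log10(MSE).

```python
# PSNR between a prediction and a reference image.
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    mse = float(np.mean((pred - target) ** 2))
    # mse == 0 would mean identical images (infinite PSNR); not guarded here.
    return 20 * np.log10(max_val) - 10 * np.log10(mse)

a = np.random.rand(32, 32, 3)
print(psnr(np.clip(a + 0.01 * np.random.randn(*a.shape), 0, 1), a))
```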
fashion-mnist
Fashion-MNIST is a modern drop-in replacement for the original MNIST dataset: 28x28 grayscale images of Zalando clothing articles across 10 categories, with the same 60,000-image training set and 10,000-image test set. Because the classification task is harder than handwritten digits, it gives a more meaningful signal when testing machine learning models. Loaders are built into popular libraries such as TensorFlow and PyTorch, so existing MNIST code usually needs only a one-line change.
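For example, with torchvision the dataset loads exactly like MNIST, only the class name changes:

```python
# Load Fashion-MNIST via torchvision; a drop-in swap for datasets.MNIST.
import torch
from torchvision import datasets, transforms

train = datasets.FashionMNIST(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)
loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
```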
hyrise
Hyrise is a versatile in-memory database designed for research in data management. With full SQL support and advanced query optimization, it runs standard benchmarks such as TPC-H and TPC-DS out of the box. Aimed at academics and professionals, it builds on Linux and macOS, is optimized for server-grade hardware, and can be set up natively or via Nix or Docker.
benchmark
This open-source benchmark suite evaluates PyTorch performance across a range of popular models and configurations. Supporting multiple Python versions and CUDA options, it offers a flexible benchmarking environment that includes standardized model tests, automation for tuning AWS machines, and several ways to run benchmarks, from test scripts to customizable harnesses. It provides performance insights for developers and researchers and backs continuous integration for PyTorch itself.
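The suite ships its own runners; as a small illustration of the kind of measurement involved, PyTorch's built-in benchmarking utility times an op with proper warmup and statistics:

```python
# Time a matrix multiply with PyTorch's built-in benchmark Timer.
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1024, 1024)
timer = benchmark.Timer(
    stmt="torch.mm(x, x)",
    globals={"torch": torch, "x": x},
)
print(timer.blocked_autorange())  # robust timing over auto-chosen run counts
```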
extension-cpp
Learn how to write custom C++ and CUDA extensions for PyTorch to improve computational efficiency. The project implements a 'mymuladd' operation with both CPU and CUDA backends, targeting PyTorch 2.4+, and includes straightforward build and test commands plus a benchmark comparing the Python, C++, and CUDA implementations. Developed by Peter Goldsborough and Richard Zou, it helps developers speed up performance-critical parts of PyTorch applications.
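The repo builds via its own setup script; for quick experiments, PyTorch also offers an inline JIT-compilation helper, sketched below. The source filename is a placeholder, not the repo's actual layout.

```python
# JIT-compile a C++ extension with PyTorch's built-in helper; requires a
# working C++ toolchain. "mymuladd.cpp" is a placeholder source path.
from torch.utils.cpp_extension import load

mymuladd = load(
    name="mymuladd",
    sources=["mymuladd.cpp"],  # placeholder path to the C++ source
    verbose=True,
)
```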
langchain-benchmarks
LangChain Benchmarks provides an organized framework for evaluating LLM-related tasks, focusing on end-to-end use cases and integrating with LangSmith. It documents how datasets were collected and how tasks are evaluated, and the team discusses results in accompanying blog posts. Users can consult detailed tool-usage documentation, learn how to recreate benchmarks, and explore archived benchmarks for historical context; community contributions to the benchmarking methodology are encouraged.
anomalib
Anomalib is a deep learning library for anomaly detection with a modular API and CLI. It focuses on visual anomaly detection and supports exporting models to OpenVINO for faster inference on Intel hardware. With built-in benchmarking tools and experiment-tracking integration, it streamlines training, evaluating, and comparing detection models across datasets.
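A hedged sketch of a training run, assuming the 1.x API with its data/model/engine split; check the docs for the version you have installed.

```python
# Train and test an anomaly detection model with anomalib (1.x-style API).
from anomalib.data import MVTec
from anomalib.models import Padim
from anomalib.engine import Engine

datamodule = MVTec(category="bottle")   # downloads MVTec AD on first use
model = Padim()
engine = Engine()
engine.fit(model=model, datamodule=datamodule)
engine.test(model=model, datamodule=datamodule)
```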
uncertainty-baselines
Uncertainty Baselines provides researchers with high-quality implementations of standard and state-of-the-art methods for uncertainty and robustness in deep learning, serving as a starting point for new research ideas and applications. The project keeps dependencies minimal for easy forking and customization and documents benchmarking best practices for fair comparisons. While its API is not yet stable, it runs on platforms such as Google Cloud and Colab with TensorFlow, JAX, and PyTorch, making it a solid source of consistent, reliable baselines.
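A standard metric in this literature is expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence with its accuracy. The sketch below is generic NumPy, not the library's own implementation.

```python
# Expected calibration error: confidence-weighted gap between confidence
# and accuracy, averaged over confidence bins.
import numpy as np

def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = len(confidences)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            err += mask.sum() / total * gap
    return err

conf_scores = np.array([0.9, 0.8, 0.6, 0.95])
hits = np.array([1, 1, 0, 1])
print(f"ECE = {ece(conf_scores, hits):.3f}")
```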
x-stable-diffusion
Speed up Stable Diffusion image generation with optimized, cost-effective inference techniques based on AITemplate and TensorRT. Includes detailed benchmarks and a user-friendly CLI, and runs on Google Colab.
Feedback Email: [email protected]