#Python library
Scrapegraph-ai
ScrapeGraphAI is an open-source Python library designed for efficient data extraction using language models and graph logic. It supports extraction from both websites and local files such as XML, HTML, and JSON. The library offers flexible pipeline creation for various scraping needs, additional language model integrations, and advanced semantic processing tools. Easy to install via PyPI, it also provides features for script generation and audio output. Enhanced by OpenAI support and local model options, it serves as a versatile solution for web scraping tasks.
camel
CAMEL is an open-source project aimed at understanding the scaling laws of AI agents by exploring their behaviors and capabilities. It supports a variety of agents, tasks, and simulated environments to advance research. The project offers installation via PyPI and source code, and facilitates integration with platforms like HuggingFace. Comprehensive documentation and examples assist users in setup and model deployment, inviting contributions to the study of AI agent dynamics.
langchain-ray
This repository offers a range of examples to quickly build and deploy large language model applications using Python libraries LangChain and Ray. It includes cases like open-source search engines, scalable embedding generation, and retrieval-based QA systems, all designed to integrate efficiently into your projects. The site also provides community links, documentation, and resources for comprehensive support in LLM development.
blackmaria
Black Maria, a Python library, revolutionizes web scraping by employing natural language to access any webpage's data. Compatible with Python 3.6+, it is easily installed via pip. The library employs guardrails, guiding instructions for crafting structured output from LLMs. Black Maria effectively extracts organized data, like movie summaries and casts, streamlining tasks for developers. Installation is simplified through environment variables and easy function calls, offering precise and structured data effortlessly.
scattertext
Utilize Scattertext for interactive HTML scatter plots that highlight distinguishing terms in text corpora. This browser-based tool aids in analyzing word and phrase variations across different categories like political parties, providing insights with clear labels. Benefit from advanced plotting techniques, original term importance formulas, and comprehensive dispersion metrics. Perfect for text analysis applications in various domains, including political conventions, this tool supports customizable visualizations for effective data interpretation, suitable for comparing term usage in diverse document sets.
facetorch
Facetorch is a Python library designed for efficient face detection and facial feature analysis through deep neural networks. It combines open-source models optimized with TorchScript, facilitating configurability with Hydra, and supports reproducibility via conda-lock and Docker. Both CPU and GPU acceleration are available. Users can expand its features by adding models and configurations, utilizing components like detection and prediction. Facetorch is suitable for applications from facial recognition to emotion detection, easily installed via PyPI or Conda. Ethical use is recommended in accordance with European Commission guidelines.
pedalboard
This is a versatile Python library for audio processing, supporting popular formats and VST3®/Audio Unit plugins. Developed by Spotify, it enhances machine learning and simplifies content creation by applying advanced audio effects, compatible across macOS, Windows, and Linux, with seamless TensorFlow integration for fast processing.
ollama-python
The Ollama Python Library facilitates easy integration of chat functionalities into Python 3.8+ projects via Ollama REST API. By using function calls like chat and generate, developers can interact with models such as 'llama3.1'. The library also supports real-time streaming, custom clients, and asynchronous operations for improved application performance. Easily installable with pip, it provides efficient tools to manage, modify, and control chat models.
DnaFeaturesViewer
DnaFeaturesViewer is a Python library that visualizes DNA features from GenBank, GFF files, and Biopython SeqRecords. It produces clear plots for complex sequences with overlapping features and long labels. Compatible with Matplotlib and Biopython, it supports output formats like PNG, JPEG, SVG, and PDF, suitable for reports and scientific illustrations. With simple PIP installation and integration options for Bokeh and Pandas, it aids in genomic data visualization. It also allows plotting of nucleotide sequences, translations, and feature statistics such as GC content.
lhotse
Lhotse, a Python library, enhances speech and audio data preparation by offering flexible and accessible solutions. It smoothly integrates with PyTorch and supports both novice and seasoned users with its command-line interface and standardized data preparation methods. Lhotse's features include dynamic audio cuts for real-time operations like mixing and truncation, optimizing storage and bandwidth usage. It allows for data augmentation and feature extraction in both pre-computed and real-time modes, supports feature-space cut mixing, and works with Kaldi and ESPnet frameworks, making it a valuable tool for researchers and developers in audio processing.
ddddocr
DdddOcr is an open-source SDK that provides offline captcha recognition with minimal dependencies, designed to lower setup and usage costs. This tool leverages deep neural network training with random data to offer features such as character recognition and target detection, and accepts custom models from the dddd_trainer. Compatible with Windows, Linux, and macOS, it includes detailed deployment instructions for installation via PyPI or directly from the source.
tensorly
TensorLy, a comprehensive Python library, simplifies tensor computations and learning, effectively catering to both academic and practical needs. It integrates smoothly with NumPy, PyTorch, TensorFlow, and JAX, allowing flexible computations across various platforms. Supporting diverse tensor formats and decompositions, TensorLy offers a practical solution for machine learning and data analysis without exaggeration. Discover its features to optimize tensor operations efficiently.
harvesters
Harvester is a Python library that facilitates image acquisition in computer vision tasks, with an efficient data collection process. It supports image acquisition from GenTL Producers and allows manipulating GenICam feature nodes in Python. Harvester is licensed under Apache License-2.0, making it suitable for various uses, including commercial applications. Additionally, it supports multiple transport layers, adapting to different needs, and is linked to a GUI project for enhanced user interaction.
llm
This Python library and CLI tool facilitates interaction with Large Language Models via remote APIs or local installations. Key features include command-line prompts, SQLite storage, and embedding generation, with support for model expansion through plugins. Comprehensive documentation ensures streamlined usage and integration.
fastembed
FastEmbed is a Python library for generating text and image embeddings. It supports various popular models and uses ONNX Runtime instead of PyTorch, which is optimized for serverless environments and provides significant speed and accuracy improvements over competitors like OpenAI's Ada-002. The library can be installed via pip, with GPU support if needed, and is suitable for large datasets using data parallelism. FastEmbed supports multiple embeddings types including dense, sparse, and late interaction models, and integrates with Qdrant.
gTTS
gTTS allows Python users to leverage Google Translate's text-to-speech through simple CLI commands or library integration. Generate 'mp3' files with adjustable sentence tokenization and pronunciation for natural sound. Pip-installable for straightforward access with no Google Cloud requirements. For comprehensive usage, refer to the extensive documentation and join community discussions.
nitrain
Nitrain is an adaptable AI framework focused on medical imaging, offering streamlined model training and data augmentation across leading platforms such as Torch, TensorFlow, and Keras. It features intuitive defaults and high-level abstractions for easier use. Access comprehensive tutorials for medical imaging AI model integration and explore advanced Python techniques for improved medical visualization, catering to AI researchers and healthcare professionals aiming to enhance diagnostic capabilities.
UnlimitedGPT
UnlimitedGPT is a robust Python library that serves as an alternative to the OpenAI paid API for ChatGPT. It offers a range of features such as proxy support, session and user data management, and Cloudflare anti-bot bypass. Compatible with multiple operating systems including Windows, Linux, macOS, and headless servers like Google Colab, UnlimitedGPT is a flexible tool for streamlining ChatGPT automation and customization, making it suitable for developers aiming to enhance their AI interactions.
tetos
TeToS offers a streamlined Python library for integrating multiple Text-to-Speech (TTS) providers, including Google, Azure, and OpenAI, allowing for easy customization of output with various providers, languages, and voices via command-line or API. Installation is simple with Python 3.8 or newer. The library accommodates proxy settings, enhancing its utility across different network setups, and will eventually support SSML. Currently, its functionality is available under the Apache License 2.0.
pycantonese
Explore PyCantonese, a Python library specializing in Cantonese linguistics and NLP. It offers tools including corpus access, Jyutping conversion, text parsing, stop words filtering, word segmentation, and part-of-speech tagging. Perfect for research and programming, PyCantonese also provides consulting and training services for educational and commercial sectors. Stay informed about updates and engage through social media.
TextDescriptives
TextDescriptives is a Python library designed for calculating text metrics using spaCy v.3 components, providing a new API for enhanced analysis with metrics like quality, readability, and coherence. It features a code-free web application, ensuring seamless integration with spaCy pipelines for detailed analysis. Comprehensive documentation and tutorials support efficient use of the library.
pygod
PyGOD is a Python library for graph outlier detection, featuring over 10 algorithms for anomaly detection in networks and security systems. Utilizing PyTorch and PyTorch Geometric bases, it provides a smooth API, comprehensive documentation, and versatile examples for node, edge, and graph-level tasks. PyGOD is designed for scalability and efficiently handles large graphs through mini-batch processing and sampling, enabling rapid outlier detection with concise code. The ongoing updates offer optimal performance and solid community backing from renowned research institutions.
moto
Moto aids developers in efficiently simulating AWS services, improving test accuracy without real AWS expenses. This Python library covers a wide array of AWS features, enabling accurate scenario simulations. Moto's capabilities help ensure seamless AWS interaction, boosting reliability and minimizing test complexities. Access comprehensive documentation and community resources for optimal use and contribution to this open-source project.
Gymnasium
Gymnasium provides a consistent API for reinforcement learning, building on OpenAI's Gym by offering diverse environments like Classic Control and Atari. Explore flexible installations and related libraries like CleanRL for comprehensive learning toolkits.
huggingface_hub
The huggingface_hub library serves as an official Python client for seamless interactions with the Hugging Face Hub. This platform is committed to democratizing open-source machine learning by enabling users to discover, access, and exchange pre-trained models, datasets, and applications. Its core functionalities include efficient file management with downloading and uploading capabilities, comprehensive repository management, and running model inferences. The library also fosters community collaboration and supports integration with other machine learning libraries, offering free model hosting, versioning, and serverless API deployment. Dive into its features to enhance your machine learning projects and engage with an active ecosystem.
gan
TF-GAN is a versatile library that simplifies the process of training and evaluating Generative Adversarial Networks (GANs). Easily installable via pip, it offers seamless integration with existing workflows through TF-GAN calls, custom scripts, or other frameworks. Its modular components include Core for foundational training support, Features for standard GAN operations, Losses such as Wasserstein, and Evaluation metrics like Inception Score and Frechet Distance. TF-GAN is utilized in various Google projects and supports numerous research initiatives. It accommodates different GAN configurations and offers flexibility in model training, making it accessible for a broad audience, from academic researchers to industry professionals.
stanza
Stanza is a Python NLP library that offers comprehensive support for processing over 60 languages, including named entity recognition and syntactic analysis. It integrates with Java Stanford CoreNLP for efficient text processing and dependency graph manipulation. The library now includes specialized biomedical and clinical models for advanced text analysis. Stanza is easy to install using pip or Anaconda and provides interactive learning options through Google Colab. Users can also train custom models, thus enhancing the adaptability of NLP tasks.
FlashRank
FlashRank is a lightweight and fast Python library that enhances search pipeline efficiency with state-of-the-art re-ranking features. It supports pairwise and listwise re-rankers using LLMs and cross-encoders, optimizing ranking precision with compact models starting at 4MB. It's designed to work in serverless environments, providing competitive, resource-efficient performance across various languages.
tomotopy
Explore Tomotopy: a Python extension for the Gibbs-sampling based library Tomoto, optimized for fast and efficient topic modeling. With SIMD instruction support and compatibility across Linux, macOS, and Windows, Tomotopy offers LDA models like Labeled LDA and Supervised LDA. It facilitates faster iterations using multicore CPUs. Installation is straightforward, with options to save and load models easily. The interactive model viewer provides visual analysis, complemented by comprehensive documentation and robust features suitable for NLP research and applications.
pyntcloud
Pyntcloud is a Python 3 library that simplifies 3D point cloud processing through the Python scientific stack. It offers efficient methods to load, manipulate, and convert 3D data. Features include RGB to HSV conversion, voxel grid creation, and integration with Open3D and PyVista. Ideal for researchers and developers, it facilitates advanced 3D tasks. Installation is available via Conda or Pip, ensuring a straightforward setup.
hazm
Hazm is a fundamental tool for Persian text processing, offering features such as normalization, tokenization, and lemmatization. It includes robust tools for POS tagging, dependency parsing, and effective word embedding. The toolkit supports linguists and developers with comprehensive pre-trained models and easy access to documentation, providing a scalable solution for both research and practical applications in Persian NLP.
DataProfiler
DataProfiler is a Python library that transforms data analysis and sensitive data detection. It supports file types such as CSV, JSON, and Parquet, and efficiently loads them into Pandas DataFrames. The library excels in profiling data, recognizing schema, statistics, and sensitive data elements like PII/NPI. Featuring a straightforward setup and a pre-trained deep learning model, it offers flexibility for adding new entities or pipelines for entity recognition. Ideal for automated data monitoring and generating comprehensive reports, DataProfiler integrates seamlessly into various workflows, offering valuable insights.
augraphy
Augraphy provides a Python-based augmentation pipeline simulating realistic document distortions for AI/ML training. Its customizable features assist in creating varied synthetic data crucial for tasks like OCR and document classification.
pynlpl
PyNLPl serves as a comprehensive Python library supporting a range of natural language processing tasks, from basic n-gram extraction to sophisticated language model building. It features a variety of data structures, algorithms, and format parsers, notably excelling in FoLiA XML processing. Compatible with both Python 2.7 and 3, it includes modules for data evaluation, format parsing, search algorithms, and statistical analysis. Installation is straightforward via pip or available on select Linux distributions.
financial-datasets
The Financial Datasets library enables the creation of question-answer sets from documents such as 10-K, 10-Q, and PDFs using advanced Large Language Models (LLMs). This Python-based open-source library, featuring models like gpt-4-turbo, facilitates the production of realistic datasets from varied financial texts. It's ideal for analysts and developers wanting to enhance their analytical projects with tailored information. The library supports easy installation via pip or Poetry, with opportunities for community contributions.
bravado-core
Bravado-core provides client and server-side support for OpenAPI Specification v2.0, featuring schema validation, request/response transformation, and customization through Python models. The library is easily installed via pip and integrates smoothly with projects like 'bravado' and 'pyramid-swagger'. Developers are encouraged to contribute through forking and utilize tools such as Sphinx, virtualenv, and tox for enhanced development workflows.
camelot
Camelot is a Python library that extracts tables from PDFs with high accuracy and customizable settings, outputting to formats like CSV and Excel. Easily integrate with data workflows and install via conda or pip. Comprehensive documentation supports users in achieving precise data extraction.
ice
The Interactive Composition Explorer (ICE) is a Python library enabling language model program analysis through execution trace visualization. Key features include multiple recipe modes, browser-based debugging, and language model agent creation. ICE allows parallel execution and component recipe reuse for tasks such as question-answering. Note: The API is actively evolving, may undergo changes, and is compatible with Python 3.9 and above, requiring a virtual environment and WSL for Windows.
adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) is an extensive Python library designed to help developers and researchers enhance machine learning models' defense against adversarial attacks, including evasion, poisoning, extraction, and inference. Sponsored by the Linux Foundation AI & Data Foundation, ART supports major ML frameworks like TensorFlow, Keras, PyTorch, etc., and caters to varied data types and tasks. It acts as a crucial resource for security teams, facilitating evaluation and defense mechanisms to strengthen AI security across diverse applications. Continuous updates and community support keep it leading in machine learning security.
prompt-layer-library
PromptLayer is designed for prompt engineers, providing a platform to track, manage, and share GPT prompt activities efficiently. Acting as middleware with OpenAI's Python library, it records API requests for easy exploration in the dashboard. With a Python wrapper, it facilitates integration and enables efficient tracking with tags and REST API support for multilingual use.
agency
Agency provides a Python framework utilizing the Actor model for flexible agent integration with existing software systems. It features an intuitive API with comprehensive documentation, supporting concurrency through multiprocessing, multithreading, and AMQP for networked agents. The library emphasizes detailed logging and control via observability tools and access policy features. A demo illustrates integration with tools like Gradio UI and Docker using examples from OpenAI and HuggingFace, catering to those developing or experimenting with custom agent-based applications.
quickai
QuickAI facilitates testing of complex Machine Learning models with minimal Python code. It supports diverse architectures like EfficientNet, VGG, ResNet, YOLO, and GPT-NEO for tasks such as image classification, NLP, and object detection. Compatible with TensorFlow and PyTorch, and offering Docker for easy setup, it simplifies development by reducing code length and enhancing productivity.
openai-multi-client
The openai-multi-client Python library facilitates efficient handling of multiple requests to the OpenAI API with built-in retry mechanisms. It retains synchronous application code while effectively managing concurrent requests and potential API failures. Particularly useful for projects that handle a high volume of OpenAI API requests, it supports both ordered and unordered handling and offers customizable retry settings. This library simplifies API interactions without the complexity of managing concurrency.
audiomentations
Audiomentations is a Python library providing audio data augmentation tools for deep learning models. It operates on CPUs and supports mono and multichannel audio, adapting to frameworks such as Tensorflow/Keras and Pytorch. The library offers functions like Gaussian noise addition, pitch shift, and time stretch, vital for optimizing audio-based AI systems. Widely recognized for its success in Kaggle competitions, it is a preferred tool among top audio tech companies. The comprehensive documentation and examples ensure ease of integration and application in diverse projects.
dvclive
DVCLive, a Python library, facilitates the logging of machine learning metrics and metadata in simple file formats. It integrates seamlessly with DVC, eliminating the requirement for extra services or servers to track experiments. Data is recorded as plain text files, allowing for versioning with Git or tracking in DVC storage, providing flexible data management. Metrics visualization is possible via DVC CLI or tools like the VS Code DVC Extension. Furthermore, its compatibility with platforms such as PyTorch Lightning and Scikit-learn underscores its adaptability across various machine learning environments.
meerkat
Meerkat is an open-source Python library tailored for visualizing and annotating unstructured datasets like text and images. It integrates seamlessly with Pandas and SQL, offers diverse visualization tools, and supports machine learning models integration. Perfect for exploratory data analysis and model behavior assessment, Meerkat is developed by Stanford's Hazy Research lab but is less suited for structured data and large-scale labeling operations.
Auto1111SDK
The Auto1111SDK is a lightweight Python library for image creation and enhancement using Stable Diffusion models. It provides user-friendly integration with the Automatic 1111 Web UI parameters for seamless Text-to-Image and Image-to-Image transformations, Inpainting, and Outpainting. Supporting ESRGAN upscaling and model downloads from Civit AI, the SDK plans to incorporate additional features such as Dreambooth training and face restoration, offering developers a comprehensive tool for efficient image processing.
einx
Explore a versatile Python library that simplifies tensor operations across frameworks like Numpy, PyTorch, Jax, and Tensorflow. Drawing inspiration from Einstein notation, it features unique concepts like full composability and bracket notation. The library allows just-in-time compilation, offering smooth integration with existing code for optimized performance. Suitable for neural network operations including layer normalization and multi-head attention, it is excellent for advanced computational tasks.
clipsai
ClipsAI, an open-source Python library, efficiently converts long, audio-focused videos into short clips suitable for podcasts, interviews, and speeches. It uses advanced transcription to identify key segments for clips and dynamically resizes video to focus on speakers, easily changing footage from 16:9 to 9:16 ratios. Discover detailed documentation and view live demonstrations.
tnlearn
Tnlearn is an open-source Python library designed to optimize neural networks using symbolic regression to create task-based neurons. It constructs neural networks with diverse neuron types for improved feature representation and task adaptation, inspired by human brain diversity. Key features include vectorized symbolic regression and learnable parameter functions. Compatible with Python 3.9+, Tnlearn is easily installed with pip or conda, aiding efficient machine learning model development.
Feedback Email: [email protected]