Introduction to the Wiseflow Project
Wiseflow, also known as the "Chief Intelligence Officer," is an innovative tool designed for agile information extraction. It sifts through massive amounts of data from various information sources such as websites, WeChat public accounts, and social media platforms. The goal is to distill valuable insights by filtering out the noise. This helps in making meaningful information stand out, making it a valuable asset for users inundated with data overload.
Key Features of Wiseflow
-
Universal Web Content Parser: Wiseflow employs a combination of statistical learning, using the open-source project GNE, and Language Model (LLM) capabilities to cover over 90% of news sites effectively. It includes a specialized parser for WeChat public account articles but requires wxbot for real-time article fetching.
-
Asynchronous Task Architecture: This ensures seamless handling of multiple tasks without blocking other operational processes.
-
LLM-Powered Information Extraction: Wiseflow uses language models as small as 9B to efficiently execute its tasks of extracting relevant information and tagging.
Recent Updates in Version 0.3.1
In this version, enhancements have been made to address challenges in extracting complex or specific tags, and an explanation field has been introduced for further tag specification. Additionally, language selection issues observed in the previous versions have been streamlined, simplifying deployment and usage with support for both Simplified Chinese and English prompts.
How Wiseflow Differs from Traditional Tools
Feature | Wiseflow | Crawler / Scraper | LLM-Agent |
---|---|---|---|
Problem Solved | Data processing (filtering, extracting, tagging) | Raw data acquisition | Downstream applications |
Integration | Can integrate crawlers for enhanced data acquisition | Can serve as a dynamic knowledge base |
Integrating Wiseflow into Your Application
Wiseflow is lightweight and efficient, operating seamlessly on systems using LLMs as small as 7B to 9B without requiring vector models. It stores extracted data in its integrated Pocketbase database. Users can access this data directly, without needing to delve into the code. Developers interested in using Wiseflow as a real-time info processing tool can refer to the demonstration project, Awada.
Installation and Usage
-
Clone the Repository
git clone https://github.com/TeamWiseFlow/wiseflow.git cd wiseflow
-
Docker Installation (recommended)
docker compose up
Ensure network settings are fine-tuned as necessary.
-
Manual Python Installation (alternative)
conda create -n wiseflow python=3.10 conda activate wiseflow cd core pip install -r requirements.txt
-
Configure Your Environment Please configure your
.env
file with necessary tokens and API keys.
For Developers: More detailed steps can be found in the /core/README.md
documentation.
Setting Up Monitoring Channels and Scheduled Scans
Through the Pocketbase Admin Dashboard, users can manage their tags and site sources to control the information extracted and processed by Wiseflow.
Local Deployment Considerations
Wiseflow can be locally deployed using systems with a minimum configuration of an RTX 3090, demonstrating its low overhead. It's flexible and can integrate LLM services that comply with the OpenAI SDK.
Conclusion
Wiseflow is an open-source project licensed under Apache 2.0. It is designed to be resource-efficient while providing powerful data processing capabilities, making it an ideal choice for those looking to streamline their information mining operations.
For more information or collaboration inquiries, reach out via GitHub issues.
Acknowledgments
Wiseflow is supported by several open-source projects like GeneralNewsExtractor and Python Pocketbase. Users incorporating Wiseflow in their work are encouraged to cite the project appropriately.