wiseflow - Efficiently Extract and Filter Information with Agile Data Mining Tools

Introduction to the Wiseflow Project

Wiseflow, also known as the "Chief Intelligence Officer," is an innovative tool designed for agile information extraction. It sifts through massive amounts of data from various information sources such as websites, WeChat public accounts, and social media platforms. The goal is to distill valuable insights by filtering out the noise. This helps in making meaningful information stand out, making it a valuable asset for users inundated with data overload.

Key Features of Wiseflow

Universal Web Content Parser: Wiseflow employs a combination of statistical learning, using the open-source project GNE, and Language Model (LLM) capabilities to cover over 90% of news sites effectively. It includes a specialized parser for WeChat public account articles but requires wxbot for real-time article fetching.
Asynchronous Task Architecture: This ensures seamless handling of multiple tasks without blocking other operational processes.
LLM-Powered Information Extraction: Wiseflow uses language models as small as 9B to efficiently execute its tasks of extracting relevant information and tagging.

Recent Updates in Version 0.3.1

In this version, enhancements have been made to address challenges in extracting complex or specific tags, and an explanation field has been introduced for further tag specification. Additionally, language selection issues observed in the previous versions have been streamlined, simplifying deployment and usage with support for both Simplified Chinese and English prompts.

How Wiseflow Differs from Traditional Tools

Feature	Wiseflow	Crawler / Scraper	LLM-Agent
Problem Solved	Data processing (filtering, extracting, tagging)	Raw data acquisition	Downstream applications
Integration	Can integrate crawlers for enhanced data acquisition	Can serve as a dynamic knowledge base

Integrating Wiseflow into Your Application

Wiseflow is lightweight and efficient, operating seamlessly on systems using LLMs as small as 7B to 9B without requiring vector models. It stores extracted data in its integrated Pocketbase database. Users can access this data directly, without needing to delve into the code. Developers interested in using Wiseflow as a real-time info processing tool can refer to the demonstration project, Awada.

Installation and Usage

Clone the Repository

git clone https://github.com/TeamWiseFlow/wiseflow.git
cd wiseflow

Docker Installation (recommended)
```
docker compose up
```
Ensure network settings are fine-tuned as necessary.

Manual Python Installation (alternative)

conda create -n wiseflow python=3.10
conda activate wiseflow
cd core
pip install -r requirements.txt

Configure Your Environment Please configure your .env file with necessary tokens and API keys.

For Developers: More detailed steps can be found in the /core/README.md documentation.

Setting Up Monitoring Channels and Scheduled Scans

Through the Pocketbase Admin Dashboard, users can manage their tags and site sources to control the information extracted and processed by Wiseflow.

Local Deployment Considerations

Wiseflow can be locally deployed using systems with a minimum configuration of an RTX 3090, demonstrating its low overhead. It's flexible and can integrate LLM services that comply with the OpenAI SDK.

Conclusion

Wiseflow is an open-source project licensed under Apache 2.0. It is designed to be resource-efficient while providing powerful data processing capabilities, making it an ideal choice for those looking to streamline their information mining operations.

For more information or collaboration inquiries, reach out via GitHub issues.

Acknowledgments

Wiseflow is supported by several open-source projects like GeneralNewsExtractor and Python Pocketbase. Users incorporating Wiseflow in their work are encouraged to cite the project appropriately.