Introducing Milvus: An Open-Source Vector Database
What is Milvus?
Milvus is an open-source vector database built to power embedding similarity search and AI applications. It aims to make unstructured data search practical and efficient, with a consistent user experience regardless of the deployment environment.
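To make "embedding similarity search" concrete, here is a minimal sketch in plain Python of exhaustive (brute-force) top-k search using cosine similarity. It does not use Milvus or its API. The function names, the three-dimensional vectors, and the zero-vector convention are all assumptions made for illustration. A vector database exists to answer the same kind of query without scanning every stored vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], corpus: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k most similar."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embeddings are much larger.
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print([key for key, _ in results])  # ['cat', 'dog']
```

The exhaustive scan costs time proportional to the number of stored vectors per query, which is why large collections rely on approximate indexes instead.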
Milvus 2.0 is a cloud-native vector database that separates storage from computation by design. All components in this architecture are stateless, which improves elasticity and flexibility. For details, see the Milvus Architecture Overview. Milvus was first released under the Apache License 2.0 in October 2019 and is now a graduate project of the LF AI & Data Foundation.
Key Features
Millisecond Search on Trillion Vector Datasets
Milvus delivers search latency measured in milliseconds, even on trillion-vector datasets.
Simplified Unstructured Data Management
Milvus offers rich APIs designed for data science workflows and a consistent user experience across laptops, local clusters, and the cloud. It also lets you embed real-time search and analytics into virtually any application.
Reliable and Always-On
Milvus's built-in replication and failover/failback mechanisms ensure that data and applications continue to operate smoothly even in case of disruptions.
Highly Scalable and Elastic
Thanks to component-level scalability, Milvus can scale individual components up or down on demand, allocating resources efficiently according to the type of load.
Hybrid Search
Since version 2.4, Milvus has supported multiple vector fields in a single collection, which enables hybrid search. Each vector field can represent a different facet of the data. The results from searching these fields are then merged and reranked with strategies such as Reciprocal Rank Fusion (RRF) and weighted scoring.
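The two reranking strategies named above can be sketched in plain Python. This illustrates the general techniques, not Milvus's implementation or API. The function names, the RRF constant k=60 (a commonly used default), the tie-breaking rule, and the sample ids and scores are assumptions. The weighted variant also assumes that each field's scores are already normalized to a comparable scale where higher is better.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_fuse(score_lists: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Weighted scoring: sum each field's score multiplied by that field's weight."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per score list")
    totals: Dict[str, float] = defaultdict(float)
    for scores, weight in zip(score_lists, weights):
        for doc_id, score in scores.items():
            totals[doc_id] += weight * score
    return sorted(totals, key=lambda d: (-totals[d], d))


# Hypothetical results from two vector fields of the same collection.
text_ranking = ["a", "b", "c"]
image_ranking = ["b", "c", "a"]
print(rrf_fuse([text_ranking, image_ranking]))  # ['b', 'a', 'c']

text_scores = {"a": 0.9, "b": 0.8, "c": 0.1}
image_scores = {"a": 0.2, "b": 0.7, "c": 0.9}
print(weighted_fuse([text_scores, image_scores], weights=[0.2, 0.8]))  # ['c', 'b', 'a']
```

RRF uses only rank positions, so it needs no score normalization. Weighted scoring lets one field dominate, as the image field does here, but it depends on the scores being comparable.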
Unified Lambda Structure
Milvus combines stream and batch processing for data storage to balance timeliness and efficiency. Both paths sit behind a single interface, which simplifies vector similarity search.
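As a toy illustration of the idea of pairing a streaming path with a batch path behind one interface, the sketch below keeps fresh writes in a growing buffer and seals the buffer into an immutable batch once it reaches a threshold. It is not how Milvus is implemented. The class `ToyLambdaStore`, its `seal_threshold` parameter, and the brute-force L2 search are hypothetical. A real system would build an index over sealed batches instead of scanning them.

```python
from typing import List, Tuple

Vector = List[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyLambdaStore:
    """Hypothetical store: a growing buffer for fresh writes plus sealed batches."""

    def __init__(self, seal_threshold: int = 2) -> None:
        self.seal_threshold = seal_threshold
        self.growing: List[Tuple[str, Vector]] = []       # streaming side
        self.sealed: List[List[Tuple[str, Vector]]] = []  # batch side

    def insert(self, key: str, vector: Vector) -> None:
        """New data is searchable immediately; full buffers become sealed batches."""
        self.growing.append((key, vector))
        if len(self.growing) >= self.seal_threshold:
            self.sealed.append(self.growing)
            self.growing = []

    def search(self, query: Vector, k: int) -> List[str]:
        """One entry point that covers both the growing buffer and sealed batches."""
        candidates = list(self.growing)
        for batch in self.sealed:
            candidates.extend(batch)
        candidates.sort(key=lambda item: squared_l2(query, item[1]))
        return [key for key, _ in candidates[:k]]


store = ToyLambdaStore(seal_threshold=2)
store.insert("a", [0.0, 0.0])
store.insert("b", [5.0, 5.0])  # the second insert seals the first batch
store.insert("c", [1.0, 1.0])  # stays in the growing buffer
print(store.search([0.9, 0.9], k=2))  # ['c', 'a']
```

The caller never needs to know whether a vector lives in the growing buffer or in a sealed batch, which is the point of a unified interface.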
Community Supported, Industry Recognized
With more than 1,000 enterprise adopters and over 27,000 stars on GitHub, Milvus boasts a thriving open-source community. Its recognition as a graduate project by the LF AI & Data Foundation provides further institutional backing.
Quick Start
Zilliz Cloud is a fully managed cloud service and the simplest way to try Milvus. Users can start a free trial or follow one of the installation guides.
Building Milvus from source requires specific system prerequisites on Linux and macOS. More guidance can be found in the developer's documentation.
Milvus 2.0 vs. 1.x
Milvus 2.0 is a substantial improvement over the 1.x series, with a cloud-native, distributed architecture and better scalability. A more detailed comparison is available in the project documentation.
Real-World Demos
Milvus showcases its versatility through various real-world applications:
- Image Search: Quickly retrieve the most similar images from extensive databases.
- Chatbots: Efficiently handle digital customer interactions.
- Chemical Structure Search: Perform rapid similarity, substructure, or superstructure searches for molecules.
Bootcamps and Contributing
Milvus bootcamps are curated to demonstrate the simplicity and depth of vector databases, offering insights into building applications like chatbots, recommendation systems, and more. Contributions are warmly welcomed, with guidelines available on how to get involved and support the community.
Join the Community
Milvus boasts over 400 contributors who collaboratively enhance the project. Participation details and more about community collaboration can be explored through the community repository.
In summary, Milvus stands as a pivotal tool in the era of AI and unstructured data, offering robust solutions for vector similarity searches and a thriving community for support and development.