What is Hopsworks?
Hopsworks is a data platform for machine learning (ML) with a Python-centric Feature Store at its core, plus MLOps capabilities. It is modular: teams can use it as a standalone Feature Store, to manage and deploy models, or to develop and operate feature and training pipelines. It also supports collaboration within ML teams by providing a secure, governed environment for developing, managing, and sharing ML assets such as features, models, and training data.
Quickstart
APP - Serverless (Beta)
The quickest way to start with Hopsworks is the serverless app: visit app.hopsworks.ai and sign up with a Gmail or GitHub account. This lets new users work through the tutorials and try the platform before moving on to a more involved installation.
Azure, AWS & GCP
Hopsworks is also available as a managed cloud service that runs on AWS, Azure, and GCP, and it integrates with third-party platforms such as Databricks, SageMaker, and Kubeflow. A detailed setup guide is available for each of these cloud providers.
Installer - On-premise
Hopsworks can also run on-premises, so companies can use their own infrastructure for machine learning workloads. This offers more flexibility and control, can reduce costs, and helps meet specific security and compliance requirements. Because every on-premises environment differs, each installation is tailored to the existing infrastructure. The target machine must be a CentOS/RHEL 8.x or Ubuntu 22.04 server that meets the documented minimum hardware and software requirements.
Documentation and API
Hopsworks provides extensive documentation, including user guides, Feature Store documentation, and administration guides. The API reference is organized into the Hopsworks API (project-level operations), the Feature Store API (feature management), and the MLOps API (model deployment).
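The Feature Store API is typically driven from Python. The sketch below assumes the `hopsworks` Python client, in which `hopsworks.login()` returns a project handle, `project.get_feature_store()` returns a feature store, and feature groups are created with `get_or_create_feature_group` and filled with `insert`. The helper `publish_features`, the feature group name `transactions`, and the `customer_id` key are hypothetical, and the feature store handle is passed in as a parameter so the logic can run without an account. Exact parameters should be checked against the Feature Store API reference for the installed version.

```python
from typing import Any, Sequence


def publish_features(fs: Any, df: Any, name: str, version: int = 1,
                     primary_key: Sequence[str] = ("customer_id",)) -> Any:
    """Create a feature group if it does not exist yet, then append a batch of rows.

    fs: a feature store handle, e.g. hopsworks.login().get_feature_store()
    df: the feature data to write, typically a pandas DataFrame
    Hypothetical helper for illustration; not part of the Hopsworks API.
    """
    fg = fs.get_or_create_feature_group(
        name=name,
        version=version,
        primary_key=list(primary_key),  # column(s) that uniquely identify a row
        description=f"{name} features",
    )
    fg.insert(df)  # write the batch into the feature group
    return fg


# Typical use against a real project (requires a Hopsworks account and API key);
# "transactions_df" and "transactions" are hypothetical names:
#   import hopsworks
#   project = hopsworks.login()
#   fs = project.get_feature_store()
#   publish_features(fs, transactions_df, "transactions")
```

Taking the handle as an argument keeps the write logic separate from login, so the same function can be reused in a notebook, a scheduled job, or a test.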
Tutorials
Hopsworks offers a range of tutorials to help users get started with different use cases. To access these tutorials, an account on app.hopsworks.ai is required. Examples include fraud detection and churn prediction models, available with step-by-step guidance in a dedicated repository.
Main Features
Project-based Multi-Tenancy and Team Collaboration
Hopsworks organizes work into projects: secure workspaces in which ML teams can share and manage assets. Its multi-tenant project model allows even sensitive data to be stored on shared infrastructure, while fine-grained controls govern which assets are shared across project boundaries. Projects can be used to structure teams with clear end-to-end responsibilities, and to set up separate development, staging, and production environments.
Development and Operations
For data science development, the platform provides conda environments for Python and Jupyter notebooks. With the bundled Airflow, users can build production pipelines and run ML training pipelines on GPUs. Spark, Spark Streaming, and Flink programs are also supported, with elastic worker provisioning in the cloud (workers can be added or removed as needed).
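A training pipeline, whether run from a notebook or scheduled with Airflow, usually starts by reading features back out of the Feature Store. The sketch below assumes the feature view workflow of the Hopsworks Feature Store API (`get_feature_group`, `select_all`, `get_or_create_feature_view`, and `train_test_split`, which returns `X_train, X_test, y_train, y_test`). Feature views are not described above, so this is an illustration under that assumption rather than a prescribed pipeline. The helper name, the `_view` naming scheme, and the label column are hypothetical, and signatures may differ between Hopsworks versions.

```python
from typing import Any, Tuple


def build_training_split(fs: Any, fg_name: str, label: str, fg_version: int = 1,
                         test_size: float = 0.2) -> Tuple[Any, Any, Any, Any]:
    """Select all features from one feature group and return a train/test split.

    Hypothetical helper for illustration; fs is a Hopsworks feature store handle.
    """
    fg = fs.get_feature_group(name=fg_name, version=fg_version)
    query = fg.select_all()  # query over every feature in the group
    # A feature view fixes which features (and which label column) the model trains on.
    # The "<fg_name>_view" naming scheme is an assumption made for this sketch.
    fv = fs.get_or_create_feature_view(
        name=f"{fg_name}_view",
        version=1,
        query=query,
        labels=[label],
    )
    # Returns (X_train, X_test, y_train, y_test).
    return fv.train_test_split(test_size=test_size)
```

The returned splits can then be passed to any Python ML library to fit and evaluate a model, for example `X_train, X_test, y_train, y_test = build_training_split(fs, "transactions", "is_fraud")`, where both names are hypothetical.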
Available on any Platform
Hopsworks is available as a managed platform in the cloud and can be installed on Linux-based virtual machines, including in air-gapped data centers. It is also offered as a serverless platform that manages and serves both features and models.
Community
Hopsworks welcomes community contributions to improve and extend the platform. Users can report bugs, suggest enhancements, and engage with the community through forums, a public Slack channel, and Twitter. Hopsworks is open source under the AGPL-3.0 license, which permits free use and modification but requires that modified versions, including those offered to users over a network, be made available under the same license.
Overall, Hopsworks provides a comprehensive set of tools for managing the machine learning lifecycle, from features and training data to model deployment, within a collaborative environment.