FedScale - Scalable Open-Source Federated Learning with Extensive Datasets

Introduction to FedScale

FedScale is an open-source platform for federated learning (FL), a machine learning approach in which models are trained on decentralized data rather than on a single central dataset. It helps developers and researchers implement FL algorithms, then deploy and evaluate the resulting models across different hardware and software environments. FedScale also offers an FL benchmark, presented by the project as the most extensive available, covering tasks such as image classification, object detection, language modeling, and speech recognition.

Getting Started

Quick Installation on Linux

To get started quickly on a Linux system, run the install.sh script, which automates the installation. If you need CUDA support, append --cuda to the installation command.

source install.sh # Add `--cuda` if needed 
pip install -e .
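
Whether to pass --cuda depends on the machine. The following is a minimal sketch, not part of FedScale, that picks the flag by checking for an NVIDIA tool on the PATH. The helper name cuda_flag is hypothetical, and treating the presence of nvidia-smi as a sign that CUDA is wanted is an assumption. The sketch only prints the command so it can be reviewed before being run.

```shell
# cuda_flag [PROBE]: print "--cuda" when PROBE exists on PATH, else print nothing.
# PROBE defaults to nvidia-smi; using it as a CUDA signal is an assumption.
cuda_flag() {
    probe="${1:-nvidia-smi}"
    if command -v "$probe" >/dev/null 2>&1; then
        printf '%s\n' "--cuda"
    fi
}

flag="$(cuda_flag)"
# Print the install command instead of running it; ${flag:+ $flag} adds
# the flag (with a leading space) only when one was detected.
echo "source install.sh${flag:+ $flag}"
```

If the printed command looks right, run it from the FedScale directory, followed by pip install -e . as shown above.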

Installation from Source on Linux/macOS

If you prefer a more customized setup or are using macOS, you can install FedScale from source. This requires Anaconda to be installed. The steps are:

Navigate to your FedScale directory.
Set the FEDSCALE_HOME environment variable and create a handy alias.
Initialize your Conda environment and activate it.
Finally, install any additional necessary packages and set up GPU support if required.

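cd FedScale

# Record the repository root and persist it for future shells.
FEDSCALE_HOME=$(pwd)
echo export FEDSCALE_HOME=$(pwd) >> ~/.bashrc
# Add a `fedscale` alias that runs the repository's fedscale.sh script.
echo alias fedscale=\'bash $FEDSCALE_HOME/fedscale.sh\' >> ~/.bashrc
# Enable conda in bash, then reload the shell configuration.
conda init bash
. ~/.bashrc

# Create and activate the Conda environment, then install FedScale in editable mode.
conda env create -f environment.yml
conda activate fedscale
pip install -e .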
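
The echo commands above append to ~/.bashrc every time they run, so repeating the setup leaves duplicate lines in that file. The following is a minimal sketch of one way to avoid that. The helper append_once is hypothetical and not part of FedScale, and the demonstration writes to a scratch file rather than the real ~/.bashrc.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line already exists.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration against a scratch file instead of ~/.bashrc.
demo_rc="$(mktemp)"
append_once "export FEDSCALE_HOME=$PWD" "$demo_rc"
append_once "export FEDSCALE_HOME=$PWD" "$demo_rc"   # second call is a no-op
wc -l < "$demo_rc"                                   # reports a single line
rm -f "$demo_rc"
```

To use the same idea for the real setup, pass ~/.bashrc as the second argument in place of the scratch file.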

Tutorials

Once the installation is complete, you can dive into FedScale through a series of tutorials:

Explore FedScale datasets – Learn about the different datasets available within FedScale.
Deploy your FL experiment – Understand how to deploy a federated learning experiment.
Implement an FL algorithm – Try implementing a federated learning algorithm using FedScale.
Deploy FL on smartphones – Discover how to leverage FedScale for deploying FL on mobile devices.

FedScale Datasets

FedScale provides more than 20 large-scale, diverse datasets for federated learning tasks, spanning domains such as computer vision and natural language processing. Each dataset comes with training, validation, and testing subsets, so models can be developed and assessed on a consistent split. The project acknowledges the dataset contributors and encourages users to explore the datasets and contribute new ones.

FedScale Runtime

The FedScale Runtime is a platform for deploying and evaluating federated learning models. It builds on Oort, FedScale's predecessor, and scales FL experiments to thousands of clients per round. The documentation covers how to set up training scripts and deploy models, including on mobile devices.

Repository Structure

For those interested in more technical aspects, the FedScale repository is organized into key sections including the core source code, deployment tools, benchmarking datasets, example configurations, and documentation.

References

FedScale and its predecessor, Oort, are described in peer-reviewed papers: the FedScale paper was presented at the International Conference on Machine Learning (ICML), and the Oort paper at the USENIX Symposium on Operating Systems Design and Implementation (OSDI).

Contributions and Communication

FedScale invites contributions from the community. Users can engage by submitting issues or pull requests on GitHub. For communication and support, there is an active Slack channel, or users can reach out via email for any questions or feedback.

FedScale represents an exciting opportunity to advance the field of federated learning, with its extensive resources, supportive community, and focus on scalability and extensibility.