Introduction to TonY
TonY is a framework for natively running deep learning jobs on Apache Hadoop. It supports popular machine learning libraries, including TensorFlow, PyTorch, MXNet, and Horovod, and lets both single-node and distributed training run as first-class Hadoop applications. This native integration, together with its other features, aims to make machine learning jobs both reliable and flexible.
Compatibility
TonY is compatible with Hadoop 2.6.0 and above. If you need GPU isolation, you need Hadoop 2.10 or above (for Hadoop 2) or Hadoop 3.1.0 or above (for Hadoop 3).
Building TonY
TonY is built with Gradle. To build, run:
./gradlew build
To build without running tests, run:
./gradlew build -x test
After building, the TonY jar will be located in the ./tony-cli/build/libs/ directory.
Usage
TonY offers two main methods for launching deep learning jobs:
- Zipped Python Virtual Environment: This method does not require Docker support on the Hadoop cluster and avoids a dependency on a Docker registry. However, the zipped environment must be built on the same OS version as the cluster nodes.
- Docker Container: This method requires a Docker-enabled Hadoop cluster and a Docker image prepared with the necessary Python dependencies, such as TensorFlow or PyTorch.
Zipped Python Virtual Environment
With this setup, you prepare a zipped Python virtual environment and an XML configuration file (tony.xml). Here's a basic configuration example, followed by a sketch of preparing the zipped environment:
<configuration>
  <property>
    <name>tony.worker.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>tony.worker.memory</name>
    <value>4g</value>
  </property>
  <property>
    <name>tony.worker.gpus</name>
    <value>1</value>
  </property>
  <property>
    <name>tony.ps.memory</name>
    <value>3g</value>
  </property>
</configuration>
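One way to prepare the zipped environment is sketched below. This is only an illustration: the environment name and package list are examples, and the environment must be built on a machine running the same OS version as the cluster nodes.
# Build the virtual environment on a machine with the same OS as the cluster nodes
python3 -m venv my-venv
# Install the training dependencies (TensorFlow here is just an example)
my-venv/bin/pip install tensorflow
# Zip the environment so TonY can ship it to the cluster
zip -r my-venv.zip my-venv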
Then launch the job from the Java command line, passing components such as the zipped Python environment, the Python binary path, and the path to your training script.
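For illustration, a launch command looks roughly like the following. The jar version, paths, and flag names here are examples drawn from typical TonY usage and should be checked against the documentation for your TonY version:
java -cp "`hadoop classpath`:/path/to/tony-cli-x.x.x-all.jar" \
    com.linkedin.tony.cli.ClusterSubmitter \
    -executes src/mnist_distributed.py \
    -src_dir src \
    -python_venv my-venv.zip \
    -python_binary_path my-venv/bin/python \
    -conf_file /path/to/tony.xml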
Docker Container
In this configuration, you need a Docker image with the necessary dependencies. Configuration again uses a tony.xml, similar to the following:
<configuration>
  <property>
    <name>tony.worker.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>tony.worker.memory</name>
    <value>4g</value>
  </property>
  <property>
    <name>tony.worker.gpus</name>
    <value>1</value>
  </property>
  <property>
    <name>tony.ps.memory</name>
    <value>3g</value>
  </property>
  <property>
    <name>tony.docker.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>tony.docker.containers.image</name>
    <value>YOUR_DOCKER_IMAGE_NAME</value>
  </property>
</configuration>
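With the Docker image specified in tony.xml, the launch command is similar, except no zipped environment is shipped and the Python binary path points inside the image. As above, the jar version, paths, and flag names are illustrative rather than definitive:
java -cp "`hadoop classpath`:/path/to/tony-cli-x.x.x-all.jar" \
    com.linkedin.tony.cli.ClusterSubmitter \
    -executes src/mnist_distributed.py \
    -src_dir src \
    -python_binary_path /usr/bin/python3 \
    -conf_file /path/to/tony.xml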
TonY Arguments
TonY provides several command-line arguments for configuring training jobs, such as the script entry point, source directories, Python environments, and more.
TonY Configurations
Configurations for TonY jobs can be set in an XML file or passed directly on the command line, and command-line settings can override those in the file.
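For example, assuming the launcher accepts per-key overrides via a -conf flag (an assumption of this sketch; verify the exact flag name in the TonY documentation), individual settings from tony.xml could be overridden at submission time:
java -cp "`hadoop classpath`:/path/to/tony-cli-x.x.x-all.jar" \
    com.linkedin.tony.cli.ClusterSubmitter \
    -executes src/mnist_distributed.py \
    -conf_file /path/to/tony.xml \
    -conf tony.worker.instances=2 \
    -conf tony.worker.memory=8g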
Examples and Resources
Examples of distributed deep learning tasks with TensorFlow, PyTorch, and other frameworks are available. Additional resources such as presentations and papers provide deeper insights into TonY's capabilities and applications.
TonY is an open-source project aimed at leveraging Hadoop's potential for deep learning, providing flexibility and simplifying the integration of machine learning workloads into existing cluster and cloud-based infrastructure.