Introduction to arxiv_latex_cleaner
arxiv_latex_cleaner is a tool that prepares LaTeX projects for submission to the academic repository arXiv. It automates the cleanup, shrinking the files and removing unnecessary content, so a submission can be made ready with a few commands. Here's how it works and what it has to offer.
Usage Example
Using the tool is straightforward. With a simple command, you can initiate the cleaning process:
arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'
You can also use a configuration file for a more customized cleaning process:
arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
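When the cleaner is called from a build script, it can help to assemble the argument list in code instead of hand-quoting the JSON allowlist. The following is a minimal sketch that uses only the flags shown above; the helper name build_command and its parameters are hypothetical and are not part of the tool.

```python
import json
import shlex


def build_command(latex_dir, im_size=None, allowlist=None, config=None):
    """Assemble an arxiv_latex_cleaner argument list (hypothetical helper)."""
    args = ["arxiv_latex_cleaner", latex_dir]
    if im_size is not None:
        args += ["--resize_images", "--im_size", str(im_size)]
    if allowlist:
        # Compact JSON, matching the form used in the command above.
        encoded = json.dumps(allowlist, separators=(",", ":"))
        args.append("--images_allowlist=" + encoded)
    if config:
        args += ["--config", config]
    return args


cmd = build_command(
    "/path/to/latex", im_size=500, allowlist={"images/im.png": 2000}
)
# shlex.join adds the shell quoting needed around the JSON argument.
print(shlex.join(cmd))
```

Once the tool is installed, the resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting altogether.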
Installation
arxiv_latex_cleaner can be installed in any of the following ways:
- Via pip (Python's package installer):
pip install arxiv-latex-cleaner
- Using Homebrew on macOS:
brew install arxiv_latex_cleaner
- From the source code:
git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python setup.py install
Note that this tool requires Python 3.9 or later.
Key Features
Privacy-Oriented Features
One of the primary concerns the tool addresses is privacy. It performs the following tasks:
- Deletes auxiliary files (.aux, .log), which are unnecessary for the submission.
- Removes comments within the code, ensuring that any notes or remarks are not visible on arXiv.
- Allows user-defined command deletions to clean up the code further.
- Supports custom regex replacement rules, which can be specified through a configuration file.
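To make the comment-removal step concrete, here is a minimal sketch of stripping LaTeX comments with a regular expression. It illustrates the idea only and is not the tool's actual implementation; the function name strip_comments is hypothetical. It assumes an escaped \% is a literal percent sign, and it does not handle verbatim environments or a comment that directly follows a \\ line break.

```python
import re

# An unescaped '%' and everything after it on the same line.
# The lookbehind skips '\%', which is a literal percent sign in LaTeX.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    """Remove comment text from LaTeX source (illustrative sketch only)."""
    kept = []
    for line in tex.splitlines():
        # Keep a bare '%' so LaTeX still ignores the line break, as it did
        # when the comment was present.
        cleaned = _COMMENT.sub("%", line)
        if cleaned.strip() == "%":
            # The line held nothing but a comment, so drop it entirely.
            continue
        kept.append(cleaned)
    return "\n".join(kept)


sample = "a % secret note\n% whole-line remark\n50\\% done"
print(strip_comments(sample))
```

Leaving the trailing % in place is a deliberate choice: removing it would introduce a space at the end of the line that the original document did not typeset.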
Size Management
Given arXiv's 50MB submission limit, the tool helps manage file sizes by:
- Removing unused .tex files and images.
- Offering the option to resize images to a specified dimension.
- Compressing PDF files to reduce their sizes.
- Providing the option to allowlist certain images so that they are resized to a per-image size (such as 2000 for images/im.png in the usage example) instead of the default.
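The interaction between the default size and the allowlist can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: it assumes the size value bounds the longer side of the image and that the aspect ratio is preserved, and both function names are hypothetical.

```python
import json


def size_limit_for(path, default_size, allowlist_json="{}"):
    """Return the per-image limit if the path is allowlisted, else the default."""
    allowlist = json.loads(allowlist_json)
    return allowlist.get(path, default_size)


def resized_dimensions(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side.

    Assumption: the limit applies to the longer side, and images that are
    already small enough are left unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


allowlist = '{"images/im.png":2000}'
limit = size_limit_for("images/im.png", 500, allowlist)
print(limit, resized_dimensions(4000, 2000, limit))
```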
TikZ Picture Concealment
For users incorporating technical illustrations using the TikZ package, arxiv_latex_cleaner offers a feature to convert TikZ code into external PDF files. This prevents exposure of raw data by:
- Replacing the TikZ environment with a command to include externally compiled PDFs.
- Ensuring only specified TikZ diagrams are externalized, identified by a command preceding the environment.
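The replacement described above can be sketched with a regular expression. This sketch assumes the marker command is \tikzsetnextfilename{name} and that the precompiled PDFs live in a folder called ext_tikz; both are assumptions made for illustration, as is the function name externalize_tikz. It is not the tool's implementation.

```python
import re

# A marked TikZ picture: the marker command (assumed here to be
# \tikzsetnextfilename{name}) immediately followed by the environment.
_MARKED_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{(?P<name>[^}]+)\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,
)


def externalize_tikz(tex, pdf_dir="ext_tikz"):
    """Swap marked TikZ pictures for \\includegraphics of a compiled PDF."""

    def to_include(match):
        # Building the string in a function avoids backslash-escape
        # processing in re.sub replacement templates.
        return "\\includegraphics{%s/%s.pdf}" % (pdf_dir, match.group("name"))

    return _MARKED_TIKZ.sub(to_include, tex)


source = (
    "\\tikzsetnextfilename{fig1}\n"
    "\\begin{tikzpicture}\n\\draw (0,0) -- (1,1);\n\\end{tikzpicture}\n"
    "Text\n"
    "\\begin{tikzpicture}\\draw (0,0);\\end{tikzpicture}"
)
print(externalize_tikz(source))
```

Only the first picture is replaced; the second has no marker in front of it and is left in the source unchanged.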
Advanced Pattern Replacement
The tool supports sophisticated LaTeX pattern replacement using regular expressions. This feature is particularly useful for custom command removal:
- By defining patterns and corresponding replacements, certain LaTeX commands can be transformed or simplified.
- This ensures the final document adheres to plain LaTeX standards suitable for arXiv submission.
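A pattern-and-replacement rule of this kind can be sketched in a few lines. The rule keys "pattern" and "insertion", the helper apply_rules, and the \red macro are assumptions chosen for illustration; the tool's configuration format may differ in detail. Named groups captured by the pattern are filled into the insertion template.

```python
import re


def apply_rules(tex, rules):
    """Apply each regex rule in order (illustrative sketch, hypothetical keys)."""
    for rule in rules:
        pattern = re.compile(rule["pattern"])

        def fill(match, template=rule["insertion"]):
            # Named groups from the pattern become template fields.
            return template.format(**match.groupdict())

        tex = pattern.sub(fill, tex)
    return tex


rules = [
    {
        # Hypothetical custom macro \red{...}: keep only its argument.
        "pattern": r"\\red\{(?P<body>[^{}]*)\}",
        "insertion": "{body}",
    }
]
result = apply_rules(r"This is \red{important} text.", rules)
print(result)
```

Because the template is filled with str.format, any literal braces in an insertion would have to be doubled ({{ and }}).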
Basic Commands and Arguments
arxiv_latex_cleaner offers a variety of optional arguments for adjusting image resizing, PDF compression, and custom cleaning strategies, such as --resize_images, --im_size, and --compress_pdf.
Testing
To ensure the tool works correctly, users can run:
python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test
Conclusion
Overall, arxiv_latex_cleaner is a versatile utility that simplifies the task of cleaning LaTeX projects for arXiv submissions. By automating the cleanup process, it helps researchers focus on their content without worrying about minor formatting and compliance issues. This tool, although developed under Google's auspices, is not an officially supported Google product.