arxiv.py: Simplifying Access to Academic Papers
Overview
The arxiv.py project is a Python wrapper for the arXiv API. It lets developers and researchers search for, retrieve, and download academic papers in fields such as Physics, Mathematics, and Computer Science. arXiv is maintained by Cornell University and provides open access to more than a million articles. With arxiv.py, paper fetching and downloading can be built directly into Python applications.
Installation
The package is installed with pip:
$ pip install arxiv
Once installed, users can import the library into their Python scripts with:
import arxiv
Key Features and Usage
arxiv.py offers a range of features that simplify interactions with the arXiv API:
Fetching Results
To fetch results, users can use the Client and Search classes. Below is an example of how to retrieve the ten most recent articles related to "quantum":
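import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the ten most recently submitted articles matching "quantum".
search = arxiv.Search(
    query="quantum",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

# Iterate over the results and print each title.
results = client.results(search)
for r in results:
    print(r.title)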
Users can also perform more advanced queries or search for specific papers by ID.
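As a rough illustration of a more advanced query, the sketch below combines field filters into a single query string. It assumes the arXiv API's field prefixes (ti: for title, au: for author, cat: for category) and its AND operator. The helper names build_query and fetch_titles are hypothetical and not part of arxiv.py, and quoting multi-word values as phrases is an assumption worth checking against the API manual.

```python
def build_query(title=None, author=None, category=None):
    """Combine optional field filters into one arXiv query string.

    Hypothetical helper: assumes the arXiv API field prefixes
    ti: (title), au: (author), cat: (category) joined with AND.
    """
    parts = []
    if title:
        parts.append(f'ti:"{title}"')   # quoted so multi-word titles match as a phrase
    if author:
        parts.append(f'au:"{author}"')
    if category:
        parts.append(f"cat:{category}")
    return " AND ".join(parts)


def fetch_titles(query, max_results=5):
    """Run the query and return the matching titles (needs network access)."""
    import arxiv  # imported lazily so build_query works without the package

    client = arxiv.Client()
    search = arxiv.Search(query=query, max_results=max_results)
    return [result.title for result in client.results(search)]
```

A call such as fetch_titles(build_query(title="quantum computing", category="quant-ph")) would then run the combined query through the same Client and Search classes shown earlier.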
Downloading Papers
arxiv.py provides functionality for downloading paper PDFs or source files (.tar.gz). This is done using the Result object returned by searches. Here's a quick example:
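import arxiv
# Look up a single paper by its arXiv ID.
paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the PDF under a custom filename.
paper.download_pdf(filename="downloaded-paper.pdf")
# Download the source archive (.tar.gz) under a custom filename.
paper.download_source(filename="downloaded-paper.tar.gz")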
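When several papers need to be saved, the same call can be wrapped in a loop. The sketch below is one possible approach, not the library's own batch API: safe_pdf_name and download_all are hypothetical helpers. It relies on Result.get_short_id() and on the dirpath argument of download_pdf, neither of which appears in the example above, so both should be checked against the installed version.

```python
import os


def safe_pdf_name(short_id):
    """Turn an arXiv ID into a filename (hypothetical helper).

    Old-style IDs such as "hep-th/9901001v1" contain a slash,
    which is replaced so the name stays a single path component.
    """
    return short_id.replace("/", "_") + ".pdf"


def download_all(results, dirpath="papers"):
    """Download the PDF of every result into dirpath and return the paths."""
    os.makedirs(dirpath, exist_ok=True)  # the target directory must exist
    paths = []
    for result in results:
        filename = safe_pdf_name(result.get_short_id())
        # download_pdf is assumed to return the path of the written file
        paths.append(result.download_pdf(dirpath=dirpath, filename=filename))
    return paths
```

With a client and search in hand, download_all(client.results(search)) would save each matching paper under its own ID.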
Advanced Features
Custom Clients
For users needing more control, arxiv.py allows customization of the Client to adjust pagination, request delay, and retry settings:
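import arxiv

big_slow_client = arxiv.Client(
    page_size=1000,      # results requested per API call
    delay_seconds=10.0,  # wait between consecutive API requests
    num_retries=5,       # retries for a failing request before giving up
)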
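To get a feel for how these settings trade off, the following back-of-the-envelope sketch estimates how many API requests a large fetch needs and the minimum time spent waiting between them. It assumes one request per page and one delay between each pair of consecutive requests, and it ignores network time and retries. The function names are hypothetical and not part of arxiv.py.

```python
import math


def estimate_requests(total_results, page_size):
    """Number of page requests needed to fetch total_results items."""
    return math.ceil(total_results / page_size)


def estimate_min_wait(total_results, page_size, delay_seconds):
    """Lower bound on time spent in delays, in seconds.

    Assumes a delay only between consecutive requests, so a fetch
    that needs n requests waits (n - 1) times.
    """
    requests = estimate_requests(total_results, page_size)
    return max(requests - 1, 0) * delay_seconds
```

Under these assumptions, fetching 2,500 results with page_size=1000 and delay_seconds=10.0 takes three requests and at least 20 seconds of waiting. A smaller page size means more requests and more accumulated delay.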
Logging
To closely monitor the library’s network behavior and API interactions, users can configure a DEBUG-level logger:
import logging
import arxiv

# Emit DEBUG-level records, including those logged by the library.
logging.basicConfig(level=logging.DEBUG)
client = arxiv.Client()
paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
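Setting the root logger to DEBUG also enables debug output from every other library in the process. A narrower alternative, sketched below, raises verbosity only for loggers in the arxiv namespace. It assumes the library names its loggers after its module path, which is the usual logging.getLogger(__name__) convention but is not confirmed by the text above.

```python
import logging

# Keep the rest of the application at WARNING and add timestamps.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: arxiv.py logs under a logger whose name starts with "arxiv".
# Child loggers inherit this level unless they set their own.
arxiv_logger = logging.getLogger("arxiv")
arxiv_logger.setLevel(logging.DEBUG)
```

Any client calls made after this configuration would then log their requests at DEBUG level, while other loggers keep the root level.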
Summary
arxiv.py gives Python scripts a compact interface for searching the arXiv repository and downloading papers, with client settings for pagination, request delay, and retries when more control is needed. It serves both researchers automating literature reviews and developers integrating academic data into an application.