SceneGraphParser
SceneGraphParser (sng_parser) is a Python toolkit that transforms natural language sentences into scene graphs, a type of symbolic representation, using dependency parsing. It takes inspiration from the Stanford Scene Graph Parser, but is written entirely in Python. The parser offers an accessible user interface and straightforward configuration options. Its primary function is to convert sentences into graphs in which the nodes represent nouns (together with modifiers such as adjectives or determiners) and the edges represent relationships between these nouns.
Unique Aspects and Development
SceneGraphParser differs from its Stanford counterpart most notably in that it runs entirely within Python, which makes it easier to adopt in Python projects. The project is still under development, so its application programming interfaces (APIs) may change significantly. The developers welcome community contributions that identify and fix shortcomings or edge cases in the parsing process.
This project was developed as part of the paper Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations, which was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2019.
Installation and Setup
SceneGraphParser can be installed with the Python package manager, pip. It uses the spaCy library as its backend, and the spaCy English model must be downloaded separately before English sentences can be parsed:
pip install SceneGraphParser
python -m spacy download en # to use the parser for English
Using SceneGraphParser
The toolkit's functionality is accessed through the parse function. Although it is designed to work with multiple backends, it currently supports only spaCy. Here is an example:
import sng_parser
graph = sng_parser.parse('A woman is playing the piano in the room.')
The resulting graph can be inspected with Python's pprint module:
from pprint import pprint
pprint(graph)
This example would output a structured representation of the sentence, identifying entities such as 'woman,' 'piano,' and 'room,' and relationships such as 'playing' and 'in.' Additionally, SceneGraphParser provides a feature for tabular visualization of the graph:
sng_parser.tprint(graph)
For those who need a tailored parser, the toolkit allows customization of its settings:
import sng_parser
parser = sng_parser.Parser('spacy', model='en')
graph = parser.parse('A woman is playing the piano in the room.')
Graph Structure
Scene graphs created by SceneGraphParser are represented using Python's native dict and list structures. This approach is flexible and lets the toolkit be integrated into other Python projects without extra dependencies on custom graph classes. The generated graphs consist of entities, which are noun phrases, and relations, which describe how these noun phrases are connected. Each entity contains information about its span, its lemmatized form, its head noun, and any modifiers. Each relation identifies a subject and an object, often referred to as the "head" and "tail" in the research literature.
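Because the graph is plain dicts and lists, it can be traversed with ordinary Python code. The following is a minimal sketch rather than the library's documented output format: example_graph is hand-written, the key names (entities, relations, span, head, modifiers, subject, relation, object) are assumptions based on the description above, and it further assumes that relations refer to entities by their index in the entity list. The helper graph_to_triples is hypothetical and not part of sng_parser.

```python
from typing import Any, Dict, List, Tuple

# Hand-written stand-in for a parsed graph. The key names are assumptions
# made for illustration; check the actual output of sng_parser.parse.
example_graph: Dict[str, List[Dict[str, Any]]] = {
    "entities": [
        {"span": "A woman", "head": "woman", "modifiers": [{"span": "A"}]},
        {"span": "the piano", "head": "piano", "modifiers": [{"span": "the"}]},
        {"span": "the room", "head": "room", "modifiers": [{"span": "the"}]},
    ],
    "relations": [
        # Assumption: subject and object are indices into "entities".
        {"subject": 0, "relation": "playing", "object": 1},
        {"subject": 0, "relation": "in", "object": 2},
    ],
}


def graph_to_triples(graph: Dict[str, List[Dict[str, Any]]]) -> List[Tuple[str, str, str]]:
    """Flatten a graph into (subject head, relation, object head) triples."""
    entities = graph["entities"]
    triples = []
    for rel in graph["relations"]:
        subject_head = entities[rel["subject"]]["head"]
        object_head = entities[rel["object"]]["head"]
        triples.append((subject_head, rel["relation"], object_head))
    return triples


if __name__ == "__main__":
    for triple in graph_to_triples(example_graph):
        print(triple)
```

If the real output uses different key names, only the lookups inside graph_to_triples need to change; the traversal pattern stays the same.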
Overall, SceneGraphParser provides a user-friendly tool for converting natural language into structured, interpretable symbolic representations, suitable for integration into applications that bridge visual and linguistic information.