Introduction to pptxtojson
pptxtojson is a JavaScript library designed to convert .pptx files into readable JSON data directly in the web browser. This library distinguishes itself from other PPTX file parsing tools by providing results that are easy to understand and work with, rather than simply converting XML content into complex JSON structures.
Key Differences
- Browser-Based Operation: Unlike many parsing tools that run server-side, pptxtojson operates entirely within the browser, which makes it a good fit for client-side applications.
- Readable JSON Output: The result is not a mechanical translation of XML into JSON. Instead, it is a meaningful, comprehensible JSON structure that can be used directly for further data manipulation or analysis.
An online demo of pptxtojson is also available.
Usage Scenarios
Originally developed as part of the PPTist project, pptxtojson serves as a reference implementation for importing .pptx files. However, the parsed data still has significant stylistic differences compared to the original PowerPoint file. Therefore, it's not yet ready for direct deployment in production environments where styling accuracy is crucial.
If you mainly need to extract text content and media resource information from .pptx files, and precise layout and styling matter less, pptxtojson can be very useful.
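As a rough illustration of the text-extraction use case, the sketch below pulls plain text out of a parsed result. It assumes that text lives in an HTML string under each element's `content` field, as in the Example Output section. The helper names `htmlToText` and `collectText` are made up for this example and are not part of pptxtojson.

```javascript
// Minimal sketch, not part of pptxtojson: extract plain text from parsed output.
// Assumption: text-bearing elements carry an HTML string in `content`.

// The few entities likely to appear in simple markup.
const ENTITIES = {
  '&amp;': '&', '&lt;': '<', '&gt;': '>',
  '&quot;': '"', '&#39;': "'", '&nbsp;': ' '
};

function htmlToText(html) {
  return html
    .replace(/<\/p>\s*<p[^>]*>/gi, '\n') // paragraph boundaries become newlines
    .replace(/<[^>]+>/g, '')             // drop all remaining tags
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, m => ENTITIES[m])
    .trim();
}

function collectText(json) {
  // Accept either a single slide object or an array of slides.
  const slides = Array.isArray(json.slides) ? json.slides : [json.slides];
  // One array of text strings per slide.
  return slides.map(slide =>
    (slide.elements || [])
      .filter(el => typeof el.content === 'string')
      .map(el => htmlToText(el.content))
      .filter(text => text.length > 0)
  );
}
```

The regex-based stripping keeps the sketch free of dependencies. In a browser, parsing `content` with `DOMParser` and reading `textContent` would be a more robust choice.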
Data Unit Note
All length values in the JSON output are given in pt (points). This is a change from version 0.x, where the unit was px (pixels).
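If your rendering code still works in pixels, the values can be converted back. The sketch below relies on the standard CSS relationship of 72 pt and 96 px per inch. The helpers `ptToPx` and `boxToPx` are hypothetical names for this example, and the four box properties are taken from the sample output rather than from a complete schema.

```javascript
// Minimal sketch, not part of pptxtojson: convert point values to CSS pixels.
// 1 in = 72 pt = 96 px, so px = pt * 96 / 72.
function ptToPx(pt) {
  return (pt * 96) / 72;
}

// Convert the position and size of one element, leaving other fields alone.
// Assumption: `left`, `top`, `width`, `height` are numbers in pt.
function boxToPx(el) {
  const box = {};
  for (const key of ['left', 'top', 'width', 'height']) {
    if (typeof el[key] === 'number') box[key] = ptToPx(el[key]);
  }
  return box;
}
```

For example, `ptToPx` applied to a 960 x 540 pt slide size gives a 1280 x 720 px canvas.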
Installation
To install pptxtojson, use the Node Package Manager (npm):
npm install pptxtojson
Basic Usage
Below is a simple example demonstrating how to use pptxtojson in an HTML document:
<input type="file" accept="application/vnd.openxmlformats-officedocument.presentationml.presentation"/>

import { parse } from 'pptxtojson';

// Parse the selected .pptx file whenever the input changes.
document.querySelector('input').addEventListener('change', evt => {
  const file = evt.target.files[0];
  if (!file) return; // the selection was cleared

  const reader = new FileReader();
  reader.onload = async e => {
    // parse() receives the file contents as an ArrayBuffer and resolves to the JSON data
    const json = await parse(e.target.result);
    console.log(json);
  };
  reader.readAsArrayBuffer(file);
});
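In current browsers the `FileReader` step can be replaced with `Blob.arrayBuffer()`, which returns a promise. The sketch below shows one way to wrap that. The helper name `fileToJson` is hypothetical, and the parser is passed in as an argument (`parseFn`) so the helper can be tried without the library; in real use you would pass pptxtojson's `parse`.

```javascript
// Minimal sketch: read a File/Blob and hand its bytes to a parser.
// `parseFn` stands in for pptxtojson's `parse`; it is injected here only
// so the helper does not depend on the library being installed.
async function fileToJson(file, parseFn) {
  if (!file) throw new Error('No file selected');
  // File inherits arrayBuffer() from Blob.
  const buffer = await file.arrayBuffer();
  return parseFn(buffer);
}

// Possible usage in the browser, assuming `parse` was imported from pptxtojson:
//   input.addEventListener('change', async evt => {
//     console.log(await fileToJson(evt.target.files[0], parse));
//   });
```

Errors from reading or parsing surface as a rejected promise, so a single `try`/`catch` around the `await` covers both.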
Example Output
The JSON output from the pptxtojson library might look like:
{
  "slides": {
    "fill": {
      "type": "color",
      "value": "#FF0000"
    },
    "elements": [
      {
        "left": 0,
        "top": 0,
        "width": 72,
        "height": 72,
        "borderColor": "#1f4e79",
        "borderWidth": 1,
        "borderType": "solid",
        "borderStrokeDasharray": 0,
        "fillColor": "#5b9bd5",
        "content": "<p style=\"text-align: center;\"><span style=\"font-size: 18pt;font-family: Calibri;\">TEST</span></p>",
        "isFlipV": false,
        "isFlipH": false,
        "rotate": 0,
        "vAlign": "mid",
        "name": "矩形 1",
        "type": "shape",
        "shapType": "rect"
      },
      // more...
    ],
  },
  "size": {
    "width": 960,
    "height": 540
  }
}
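To get a feel for this structure, the sketch below walks the parsed result and prints one summary line per element. It uses only fields that appear in the sample above. The helper name `describeElements` is made up for this example, and because the sample does not settle whether `slides` is a single object or a list, the code accepts both.

```javascript
// Minimal sketch, not part of pptxtojson: summarize elements in parsed output.
function describeElements(json) {
  // Assumption: `slides` may be one slide object or an array of them.
  const slides = Array.isArray(json.slides) ? json.slides : [json.slides];
  const lines = [];
  slides.forEach((slide, index) => {
    for (const el of slide.elements || []) {
      lines.push(
        `slide ${index + 1}: ${el.type} "${el.name}" ` +
        `${el.width}x${el.height}pt at (${el.left}, ${el.top})`
      );
    }
  });
  return lines;
}
```

Calling `describeElements(json).forEach(line => console.log(line))` after parsing gives a quick overview before you decide which fields you need.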
Supported Features
Slide Dimensions
- width: The width of the slide.
- height: The height of the slide.
Page Background
- type: The background type (color, image, or gradient).
- value: The background value depending on its type.
Elements on Slides
- Text: Elements that contain textual content, described by properties like position, dimensions, border, shadow, fill color, and more.
- Image: Elements that represent images with attributes such as position, dimensions, source, and rotation.
- Shape: Elements like rectangles or custom paths, with detailed styling attributes.
- Table: Representing tabular data with position, size, and border properties.
- Chart: Graphical data representations with specific attributes for chart type and style.
- Video & Audio: Embedded media elements with position and source details.
- SmartArt and Grouped Items: Composed of multiple elements creating complex structures.
For more comprehensive details and types, please refer to the TypeScript definitions.
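Since every element carries a `type` field, a common first step is to tally what a slide contains. The sketch below does that. It rests on two assumptions that are not confirmed here: that type values other than `"shape"` (for example `"text"` or `"group"`) follow the names in the list above, and that grouped items expose their children in a nested `elements` array. The helper name `countByType` is hypothetical.

```javascript
// Minimal sketch, not part of pptxtojson: count elements by their `type`.
// Assumption: containers such as groups hold children in a nested `elements` array.
function countByType(elements, counts = {}) {
  for (const el of elements || []) {
    counts[el.type] = (counts[el.type] || 0) + 1;
    if (Array.isArray(el.elements)) {
      countByType(el.elements, counts); // descend into nested children
    }
  }
  return counts;
}
```

Passing a slide's `elements` array returns an object keyed by type, which is enough to decide, for instance, whether a deck contains any tables or charts worth handling.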
Acknowledgments
The development of pptxtojson was greatly inspired by PPTX2HTML and PPTXjs. However, unlike these projects, which focus on converting PPT files to web pages, pptxtojson aims to produce clean JSON data.
License
pptxtojson is released under the MIT License. Copyright 2020 to present, pipipi-pikachu.