AI-Powered Receipt and Invoice Scanner for Laravel
The receipt-scanner package extracts structured data from receipts and invoices within a Laravel application. It uses OpenAI's language models to parse receipt content from sources such as images, PDFs, and emails.
Features
The receipt-scanner package provides the following features for data extraction:
- Integration with OpenAI: The package utilizes OpenAI's Chat and Completion endpoints to process and extract relevant receipt information.
- Multiple Input Formats: It supports extraction from text, PDFs, images, Word documents, and web content, making it versatile for different data sources.
- Optimized Prompt: Includes a purpose-built prompt designed to improve the accuracy of receipt parsing.
- OCR Support: Integrates with AWS Textract for optical character recognition (OCR) to handle image-based receipts efficiently.
Installation
To install the receipt-scanner package, use the following Composer command:
composer require helgesverre/receipt-scanner
After installation, publish the configuration file using:
php artisan vendor:publish --tag="receipt-scanner-config"
Since the package relies on the OpenAI Laravel package, users must also publish its configuration and add an OPENAI_API_KEY to their .env file:
php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"
OPENAI_API_KEY="your-key-here"
Usage
Extracting Data from Plain Text
This is useful when the receipt data is already available as text. For example, the text of a Paddle.com receipt email can be parsed by passing it to ReceiptScanner::scan():
use HelgeSverre\ReceiptScanner\Facades\ReceiptScanner;
$receipt = ReceiptScanner::scan($text);
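ReceiptScanner::scan() hides the round trip to OpenAI. To give a feel for the final step, here is a framework-free sketch of turning a model reply into an array, assuming the model has been asked to answer in JSON. The helper decodeReceiptJson() and the sample reply are hypothetical and are not taken from the package (PHP 8.0+).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decodes a model reply that should contain JSON
 * describing a receipt and returns it as an associative array.
 */
function decodeReceiptJson(string $reply): array
{
    // Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    $json = trim($reply);
    $json = preg_replace('/^`{3}(?:json)?\s*|\s*`{3}$/', '', $json) ?? $json;

    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Reject replies that decode to a scalar (e.g. a bare number or string).
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected JSON describing the receipt.');
    }

    return $data;
}

// Made-up reply used only to exercise the helper.
$reply = '{"merchant": {"name": "Example Store"}, "totalAmount": 12.5, "lineItems": []}';
$data = decodeReceiptJson($reply);
```

Throwing on malformed output, rather than returning an empty array, keeps a bad model reply from silently becoming an empty receipt.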
Extracting Data from Other Formats
ReceiptScanner::scan() works on text, so receipts in other formats are first converted to text with the Text facade. Here's how to extract text from different sources:
use HelgeSverre\ReceiptScanner\Facades\Text;
$textPdf = Text::pdf(file_get_contents('./receipt.pdf'));
$textImageOcr = Text::textract(file_get_contents('./receipt.jpg'));
// And others...
The extracted text can then be passed to ReceiptScanner::scan() for data extraction.
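When files of mixed types arrive, a small dispatcher can choose the extraction helper from the file extension. The sketch below is hypothetical: textMethodFor() is an invented name, and it only maps to the pdf and textract helpers shown above; other formats would need the package's corresponding helpers (PHP 8.0+).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical dispatcher: returns the name of the Text helper to use for
 * a file, or null when the file is already plain text.
 */
function textMethodFor(string $path): ?string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return match ($extension) {
        'pdf' => 'pdf',                     // PDF with an embedded text layer
        'jpg', 'jpeg', 'png' => 'textract', // images need OCR
        'txt' => null,                      // already plain text, no conversion
        default => throw new InvalidArgumentException("Unsupported file type: .{$extension}"),
    };
}
```

In an application, the returned name could be used as a variable static call on the facade, Text::$method(file_get_contents($path)), falling back to file_get_contents($path) when the result is null.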
Data Models
The package parses the receipt into a Data Transfer Object (DTO) that includes:
- Receipt Metadata: Contains the core information regarding the receipt.
- Merchant Information: Details about the seller or issuer of the receipt.
- Line Items: All individual items listed on the receipt, encapsulated as separate DTOs.
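To make this nesting concrete, here is a minimal sketch of a receipt DTO with a nested merchant and a list of line-item DTOs, hydrated from a decoded array. The class names, property names, and array keys are illustrative assumptions, not the package's actual DTO classes; check the package source for the real fields (PHP 8.1+ for readonly properties).

```php
<?php

declare(strict_types=1);

// Illustrative DTOs; names and fields are assumptions, not the package's classes.
final class Merchant
{
    public function __construct(
        public readonly ?string $name,
        public readonly ?string $address,
    ) {}

    public static function fromArray(array $data): self
    {
        return new self($data['name'] ?? null, $data['address'] ?? null);
    }
}

final class LineItem
{
    public function __construct(
        public readonly string $text,
        public readonly float $quantity,
        public readonly float $price,
    ) {}

    public static function fromArray(array $data): self
    {
        return new self(
            (string) ($data['text'] ?? ''),
            (float) ($data['qty'] ?? 1),
            (float) ($data['price'] ?? 0),
        );
    }

    // Line total: quantity times unit price.
    public function total(): float
    {
        return $this->quantity * $this->price;
    }
}

final class Receipt
{
    /** @param LineItem[] $lineItems */
    public function __construct(
        public readonly ?string $date,
        public readonly ?string $currency,
        public readonly float $totalAmount,
        public readonly Merchant $merchant,
        public readonly array $lineItems,
    ) {}

    // Builds the receipt and its nested DTOs from a decoded array.
    public static function fromArray(array $data): self
    {
        return new self(
            $data['date'] ?? null,
            $data['currency'] ?? null,
            (float) ($data['totalAmount'] ?? 0),
            Merchant::fromArray($data['merchant'] ?? []),
            array_map([LineItem::class, 'fromArray'], $data['lineItems'] ?? []),
        );
    }
}

$receipt = Receipt::fromArray([
    'date' => '2023-06-01',
    'currency' => 'USD',
    'totalAmount' => 7.0,
    'merchant' => ['name' => 'Example Cafe'],
    'lineItems' => [
        ['text' => 'Coffee', 'qty' => 2, 'price' => 2.5],
        ['text' => 'Muffin', 'qty' => 1, 'price' => 2.0],
    ],
]);
```

Typed, read-only objects make missing or mistyped fields surface at construction time instead of deep inside application code.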
Flexibility and Customization
- Array Output: Users can choose to receive the output as an array for further manipulation.
- Model Specification: Users can specify different OpenAI models to optimize for speed or accuracy.
- Template Customization: Additional prompts/templates can be created to suit specific needs by adding Blade files.
OCR and Textract Configuration
AWS Textract is used to extract text via OCR from images and PDFs, including large files. This requires AWS credentials and a dedicated ‘Textract’ disk in Laravel's filesystems configuration (config/filesystems.php).
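As a sketch of the disk setup, assuming the Textract disk is backed by an S3 bucket, the entry in config/filesystems.php could look like the following. The disk name and the TEXTRACT_* environment variable names are placeholders chosen for illustration; align them with the values expected by the package's published configuration file.

```php
// config/filesystems.php (excerpt). Disk name and env variable names are
// illustrative placeholders, not values confirmed by the package.
'disks' => [
    // ... existing disks ...
    'textract' => [
        'driver' => 's3',
        'key' => env('TEXTRACT_KEY'),
        'secret' => env('TEXTRACT_SECRET'),
        'region' => env('TEXTRACT_REGION'),
        'bucket' => env('TEXTRACT_BUCKET'),
    ],
],
```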
Publishing Prompts
To customize the prompt templates used by the scanner, publish them with:
php artisan vendor:publish --tag="receipt-scanner-prompts"
The templates are published to Laravel's view directory, where they can be edited or used as a starting point for new ones.
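Since the templates are Blade views, the receipt text is presumably supplied as view data when they are rendered. The framework-free sketch below imitates only that substitution step, using strtr() and a made-up {placeholder} syntax, to show the shape of a reusable prompt template. The function renderPrompt() and the template wording are hypothetical; they are neither Blade nor the package's shipped prompt (PHP 8.0+).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for rendering a prompt template: replaces
 * {placeholder} tokens with the given values.
 */
function renderPrompt(string $template, array $vars): string
{
    $replacements = [];
    foreach ($vars as $key => $value) {
        $replacements['{' . $key . '}'] = (string) $value;
    }

    // strtr() replaces all tokens in one pass, so inserted values are not re-scanned.
    return strtr($template, $replacements);
}

$template = <<<'PROMPT'
Extract the merchant, date, total and line items from the receipt below.
Respond with JSON only.

RECEIPT:
{receipt}
PROMPT;

$prompt = renderPrompt($template, ['receipt' => "ACME Store\nTotal: 9.99"]);
```

The single-pass replacement matters for receipts: if the receipt text itself happens to contain something that looks like a placeholder, it is left untouched.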
Conclusion
The receipt-scanner package turns receipts and invoices in a range of formats into structured data within a Laravel application, using OpenAI models for parsing and AWS Textract for OCR. It is licensed under the MIT License, which permits free use and modification.