Automatic receipt parsing with LlamaIndex and Pydantic / Comments / Habr

Wiggin2014 Oct 13 2025 at 11:55

I do roughly the same thing on an AWS stack for receipts in Hebrew, without this framework: everything is hand-written. The use case: a user photographs a receipt and sends it to a Telegram bot; the system recognizes, classifies, and stores it. Later, in the same bot, the user can ask for insights, such as "how much did I spend on alcohol last month" or "where is milk cheapest", and the system (a pseudo-RAG) returns the answers.

An excerpt from README.md:

Architecture

Core Components

Producer Lambda (telegram_bot_handler.py) - Handles Telegram webhook and queues messages
Consumer Lambda (consumer_handler.py) - Processes SQS messages via OrchestratorService
Orchestrator Service - Routes messages by type (photo/text/command) and coordinates processing
PostgreSQL Database - Stores receipt data and analysis results
S3 Bucket - Stores receipt images
SQS FIFO Queue - Message queue for asynchronous processing with deduplication
API Gateway HTTP API - Webhook endpoint for Telegram
CloudWatch - Monitoring, alarms, and centralized logging
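
The routing done by the Orchestrator Service (photo/text/command) could look roughly like the sketch below. It is not the project's actual code: `classify_message`, `route`, and the `handlers` mapping are hypothetical names. The only facts it relies on are from the Telegram Bot API: a message carries a non-empty `photo` array for photos and a `text` string otherwise, and bot commands start with `/`.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


def classify_message(message: Message) -> str:
    """Return 'photo', 'command', 'text', or 'unsupported' for a Telegram message dict."""
    if message.get("photo"):  # non-empty list of photo sizes
        return "photo"
    text = message.get("text") or ""
    if text.startswith("/"):  # bot commands begin with a slash
        return "command"
    if text.strip():
        return "text"
    return "unsupported"


def route(message: Message, handlers: Dict[str, Callable[[Message], str]]) -> str:
    """Dispatch the message to the handler registered for its type (hypothetical helper)."""
    kind = classify_message(message)
    handler = handlers.get(kind, handlers["unsupported"])
    return handler(message)


# Stub handlers standing in for the receipt, query, and command services.
HANDLERS = {
    "photo": lambda m: "receipt pipeline",
    "text": lambda m: "query service",
    "command": lambda m: "command handler",
    "unsupported": lambda m: "ignored",
}
```

Keeping classification separate from dispatch makes the routing rule testable without any AWS or Telegram dependencies.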

Processing Flow

Webhook Reception: Producer Lambda receives Telegram updates
Message Queuing: Messages queued to SQS with deduplication
Message Processing: Consumer Lambda processes via OrchestratorService
Document Processing: Multi-strategy approach (LLM/OCR+LLM/Enhanced+OCR+LLM)
Data Validation: Pydantic schema validation and storage
User Response: Formatted results sent back via Telegram API
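
As an illustration of the "Message Queuing" step, here is a minimal sketch of how a producer might enqueue a Telegram update to an SQS FIFO queue. The choice of keys is an assumption, not something the README states: the Telegram `update_id` is used as the deduplication ID (so webhook retries of the same update are dropped inside the SQS deduplication window) and the chat ID as the message group (so one chat's messages stay ordered). `build_sqs_message` and `enqueue_update` are hypothetical names; `sqs_client` is expected to be a boto3 SQS client, whose `send_message` accepts these keyword arguments.

```python
import json
from typing import Any, Dict


def build_sqs_message(queue_url: str, update: Dict[str, Any]) -> Dict[str, str]:
    """Build keyword arguments for send_message on a FIFO queue."""
    # Assumption: group by chat so messages from one chat are processed in order.
    chat_id = update.get("message", {}).get("chat", {}).get("id", "unknown")
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(update, ensure_ascii=False),
        "MessageGroupId": str(chat_id),
        # Assumption: update_id is unique per update, so it can serve as the dedup key.
        "MessageDeduplicationId": str(update["update_id"]),
    }


def enqueue_update(sqs_client: Any, queue_url: str, update: Dict[str, Any]) -> Any:
    """Send the update to the queue; sqs_client would be boto3.client('sqs') in a Lambda."""
    return sqs_client.send_message(**build_sqs_message(queue_url, update))
```

Passing the client in as a parameter keeps the function easy to test with a stub and avoids creating a client on every call.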

Services

Orchestrator Service - Main message routing and processing coordination
Receipt Service - End-to-end receipt processing workflow
Document Processor Service - Hybrid OCR/LLM document analysis with strategy pattern
Query Service - Natural language query processing with filter-based retrieval
LLM Service - AI-powered text analysis and structured output generation
Message Queue Service - SQS message queuing for asynchronous processing
Telegram Service - Bot communication and file handling
Storage Service - Database operations and data persistence
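
The Query Service's "filter-based retrieval" can be pictured with the sketch below. It assumes, hypothetically, that the LLM turns a question such as "how much did I spend on alcohol last month" into a structured filter, which is then applied to stored line items. `LineItem`, `QueryFilter`, `total_spent`, and `cheapest_store` are illustrative names, and the in-memory list stands in for PostgreSQL.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class LineItem:
    purchased_on: date
    store: str
    name: str
    category: str
    price: Decimal


@dataclass(frozen=True)
class QueryFilter:
    """Structured filter an LLM might extract from a user's question (assumption)."""
    category: Optional[str] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None


def matches(item: LineItem, f: QueryFilter) -> bool:
    """True if the item satisfies every filter field that is set."""
    if f.category is not None and item.category != f.category:
        return False
    if f.date_from is not None and item.purchased_on < f.date_from:
        return False
    if f.date_to is not None and item.purchased_on > f.date_to:
        return False
    return True


def total_spent(items: Iterable[LineItem], f: QueryFilter) -> Decimal:
    """Sum the prices of all items matching the filter."""
    return sum((i.price for i in items if matches(i, f)), Decimal("0"))


def cheapest_store(items: Iterable[LineItem], name: str) -> Optional[str]:
    """Return the store with the lowest price for the named product, if any."""
    candidates = [i for i in items if i.name == name]
    return min(candidates, key=lambda i: i.price).store if candidates else None
```

In a real deployment the same filter would more likely be compiled into a SQL `WHERE` clause than applied in Python.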

Features

Receipt Processing

Supports Israeli receipts in Hebrew
OCR using Google Vision API or AWS Textract
LLM analysis using AWS Bedrock (Claude Sonnet 4) or OpenAI GPT models
Automatic categorization using predefined taxonomy system
Multi-image receipt support with album processing and image stitching
Advanced image preprocessing (deskewing, enhancement, grayscale conversion)
Pydantic-based data validation and schema enforcement
Receipt limits per user (100 receipts maximum)
Support for various payment methods and currencies

Processing Modes

LLM Mode: Direct image analysis using vision-enabled LLMs
OCR+LLM Mode: OCR text extraction followed by LLM structuring
Preprocessed+OCR+LLM Mode: Image enhancement + OCR + LLM analysis
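
One way to wire the three processing modes together is a simple fallback chain: try each strategy in order and accept the first result that contains line items. Whether the real project falls back like this or picks a single mode per request is not stated in the README, so the sketch below is an assumption; `process_receipt` and the stub strategies are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A strategy is a (name, callable) pair; the callable takes image bytes
# and returns a parsed receipt dict, or None if it could not parse.
Strategy = Tuple[str, Callable[[bytes], Optional[dict]]]


def process_receipt(image: bytes, strategies: Sequence[Strategy]) -> Tuple[str, dict]:
    """Run strategies in order and return (strategy_name, result) for the first success."""
    errors: List[str] = []
    for name, run in strategies:
        try:
            result = run(image)
        except Exception as exc:  # a failing strategy should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result and result.get("items"):
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


# Stubs standing in for the LLM, OCR+LLM, and Preprocessed+OCR+LLM modes.
def llm_only(image: bytes) -> Optional[dict]:
    raise ValueError("image too blurry")


def ocr_then_llm(image: bytes) -> Optional[dict]:
    return {"items": []}


def preprocessed_ocr_then_llm(image: bytes) -> Optional[dict]:
    return {"items": [{"name": "milk", "price": "5.90"}]}


STRATEGIES: List[Strategy] = [
    ("llm", llm_only),
    ("ocr+llm", ocr_then_llm),
    ("preprocessed+ocr+llm", preprocessed_ocr_then_llm),
]
```

Ordering the list from cheapest to most expensive keeps the heavier preprocessing path for images where the simpler modes fail.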

Deployment & Infrastructure

Multi-stage deployment (dev/prod)
AWS CDK Infrastructure as Code
Docker-based Lambda functions with shared image
GitHub Actions CI/CD pipeline
CloudWatch monitoring with custom alarms
Dead letter queue for failed message handling

So I ran into the problem of long receipts (ours can be half a meter long). It turned out that nothing can accurately stitch together photos of the two halves of a receipt. I tried various LLMs (literally all the top ones), OpenCV, and Pillow; the result always has defects. That is where I got stuck.
