TL;DR: Traditional OCR struggles with complex PDFs like bank statements. This guide presents a powerful solution using n8n workflow automation, Syncfusion Document Processing APIs, and advanced Multimodal LLMs to achieve highly accurate and automated data extraction from PDF statements, going beyond basic OCR to interpret structured content reliably.
Transcribing financial documents like bank statements has traditionally relied on optical character recognition (OCR), which often struggles with complex layouts and tabular data. Recent advances in AI, specifically multimodal large language models (AI systems that can process and understand data in multiple formats, such as text, images, audio, and video), offer a more intelligent and accurate approach.
In this blog, we’ll explore how to use n8n, an open-source, low-code workflow automation tool, to build a streamlined pipeline for extracting data from bank statement PDFs.
👉 New to n8n? Get started with installation here.
A key component of this solution is the use of Syncfusion® document processing APIs, which provide powerful tools for reading, converting, and manipulating PDF documents. These APIs ensure that the content is cleanly extracted and pre-processed before being passed to the language model, dramatically improving accuracy and efficiency.
By combining Syncfusion’s robust PDF handling, n8n’s flexible workflow automation, and the reasoning capabilities of multimodal large language models (LLMs), this solution goes far beyond basic OCR. It enables reliable interpretation of structured content such as tables, nested sections, and even non-standard document formats.
Whether you’re a developer looking to automate document processing or a business user aiming to streamline financial operations, this approach showcases the power of combining modern AI with enterprise-grade tools and visual automation.
Say goodbye to tedious PDF tasks and hello to effortless document processing with Syncfusion's PDF Library.
Building an Automated PDF Data Extraction Workflow in n8n
This workflow demonstrates how to automate the extraction and processing of bank statement data using n8n and Syncfusion’s PDF conversion tools.
Prerequisites
To run this workflow successfully, ensure you have:
- Access to the Google Gemini API (or an alternative multimodal LLM)
- Google Drive access for document storage
- Access to Syncfusion’s PDF to Image API for PDF conversion
Example PDF
In this demo, we use a mock bank statement. It contains complex 5-column layouts, which challenge traditional OCR tools.
n8n workflow JSON
You’ll find an example of the complete workflow below. Simply copy the JSON and paste it into the n8n workflow editor to get started instantly.
{
"name": "My workflow",
"nodes": [
{
"parameters": {
"sortFieldsUi": {
"sortField": [
{
"fieldName": "fileName"
}
]
},
"options": {}
},
"id": "f3f22508-1142-4751-89c8-fd7c2d10f372",
"name": "Sort Pages",
"type": "n8n-nodes-base.sort",
"position": [
1460,
300
],
"typeVersion": 1
},
{
"parameters": {},
"id": "4e77d811-2da0-418d-b3eb-0e0292948127",
"name": "Extract Zip File",
"type": "n8n-nodes-base.compression",
"position": [
940,
300
],
"typeVersion": 1.1
},
{
"parameters": {
"jsCode": "let results = [];\n\nfor (item of items) {\n for (key of Object.keys(item.binary)) {\n results.push({\n json: {\n fileName: item.binary[key].fileName\n },\n binary: {\n data: item.binary[key],\n }\n });\n }\n}\n\nreturn results;"
},
"id": "0a83fa28-3e0e-4da4-9af9-1694a1b13986",
"name": "Images To List",
"type": "n8n-nodes-base.code",
"position": [
1200,
300
],
"typeVersion": 2
},
{
"parameters": {
"content": "## 2. Convert and Split PDF Pages into Seperate Images\n\nCurrently, the vision model we'll be using can't accept raw PDFs so we'll have to convert our PDF to a image in order to use it. To achieve this, we'll use the Syncfusion's PDF to Image converter API from [document processing APIs](https://hub.docker.com/r/syncfusion/document-processing-apis) for convenience but if we need data privacy (recommended!).\n\nWe will ask the PDF service to return each page of our statement as separate images, which it does so as a zip file. Next steps is to just unzip the file and convert the output as a list of images.",
"height": 634,
"width": 1608,
"color": 7
},
"id": "b3620bd1-77a4-4eb7-aefc-35ced590e42c",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
0,
60
],
"typeVersion": 1
},
{
"parameters": {
"content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n### Privacy Warning!\nThis example uses a public third party service. If your data is senstive, please swap this out for the self-hosted version!",
"height": 374.95069767441856,
"width": 199.23348837209306
},
"id": "634c3171-07c9-4587-851d-26f588fb8032",
"name": "Sticky Note5",
"type": "n8n-nodes-base.stickyNote",
"position": [
1140,
580
],
"typeVersion": 1
},
{
"parameters": {},
"id": "5f7aad5b-ff26-4999-8991-d1b8b4cf7edb",
"name": "When clicking ‘Test workflow’",
"type": "n8n-nodes-base.manualTrigger",
"position": [
-500,
320
],
"typeVersion": 1
},
{
"parameters": {
"operation": "download",
"fileId": {
"__rl": true,
"mode": "id",
"value": "1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA"
},
"options": {}
},
"id": "9ca59074-0f2f-4e3e-abec-c8a0a0d36f34",
"name": "Get Bank Statement",
"type": "n8n-nodes-base.googleDrive",
"position": [
-260,
320
],
"typeVersion": 3
},
{
"parameters": {
"content": "## 1. Download Bank Statement PDF\n[Read more about Google Drive node](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.googledrive)\n\nFor this demonstration, we'll pull an example bank statement off Google Drive however, you can also swap this out for other triggers such as webhook.\n\nYou can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing",
"height": 478.89348837209275,
"width": 546.4534883720931,
"color": 7
},
"id": "ed8e430e-25e4-4d1d-add2-104f2f4a0b65",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-620,
60
],
"typeVersion": 1
},
{
"parameters": {
"content": "### 💡 About the Example PDF\nScanned PDFs (ie. where each page is a scanned image) are a use-case where extracting PDF text content will not work. Vision models are a great solution as this workflow aims to demonstrate!",
"height": 125.41023255813957,
"width": 366.00558139534894,
"color": 5
},
"id": "d966739b-024b-4d35-ad3a-9a192d09c102",
"name": "Sticky Note6",
"type": "n8n-nodes-base.stickyNote",
"position": [
-440,
560
],
"typeVersion": 1
},
{
"parameters": {
"method": "POST",
"url": "http://host.docker.internal:8003/v1/conversion/pdf-to-image",
"sendBody": true,
"contentType": "multipart-form-data",
"bodyParameters": {
"parameters": [
{
"name": "settings",
"value": "={{JSON.stringify({file: \"file\", password: \"\", imageFormat: \"JPG\"})}}"
},
{
"parameterType": "formBinaryData",
"name": "file",
"inputDataFieldName": "data"
}
]
},
"options": {}
},
"id": "c5848fa7-80b6-4fb9-8f02-00da1173173c",
"name": "Split PDF into Images",
"type": "n8n-nodes-base.httpRequest",
"position": [
140,
320
],
"typeVersion": 4.2,
"notesInFlow": false
},
{
"parameters": {
"content": "## 3. Resize and Convert the images to Markdown Using Vision Model\n[Learn more about using the Basic LLM node](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.chainllm)\n\nUnlike traditional OCR, vision models (\"VLMs\") \"transcribe\" what they see so while we shouldn't expect an exact replication of a document, they may perform better making sense of complex document layouts ie. such as with horizontally stacked tables.\n \nIn this demonstration, we can transcribe our bank statement scans to markdown text for the purpose of further processing. With markdown, we can retain tables or columnar data found in the document. We'll employ two optimisations however as a workaround for token and timeout limits (1) we'll only transcribe one page at a time and (2) we'll shrink the pages just a little just enough to speed up processing but not enough to reduce our required resolution.",
"height": 636,
"width": 895,
"color": 7
},
"id": "a383fb51-344b-4b44-86f2-e13eb2ea53a8",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
1700,
60
],
"typeVersion": 1
},
{
"parameters": {
"content": "## 4. Extract Key Data Confidently From Statement\n[Read more about the Information Extractor](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.information-extractor)\n\nWith our newly generated transcript, let's pull just the deposit line items from our statement. Processing all pages together as images may have been compute-extensive but as text, this is usually no problem at all for our LLM.\n\nFor our example bank statement PDF, the resulting extraction should be 8 table rows where a value exists in the \"deposits\" column.",
"height": 634,
"width": 780,
"color": 7
},
"id": "35c3e16f-eddb-434c-8414-80130984ec90",
"name": "Sticky Note8",
"type": "n8n-nodes-base.stickyNote",
"position": [
2660,
60
],
"typeVersion": 1
},
{
"parameters": {
"content": "### 💡 Don't use Google?\nFeel free to swap the model out for any state-of-the-art multimodal model which supports image inputs such as GPT4o(-mini) or Claude Sonnet/Opus. Note, I've found Gemini to produce the most accurate and consistent for this example use-case so no guarantees if you switch!",
"height": 130.35162790697677,
"width": 498.18790697674433,
"color": 5
},
"id": "0bb5eba0-f00d-4069-b030-24fc74e225be",
"name": "Sticky Note9",
"type": "n8n-nodes-base.stickyNote",
"position": [
2020,
740
],
"typeVersion": 1
},
{
"parameters": {
"operation": "resize",
"width": 75,
"height": 75,
"resizeOption": "percent",
"options": {}
},
"id": "01a8bd7f-d8b8-4284-99fc-e3db4767a0d7",
"name": "Resize Images for AI",
"type": "n8n-nodes-base.editImage",
"position": [
1860,
380
],
"typeVersion": 1
},
{
"parameters": {
"promptType": "define",
"text": "transcribe the image to markdown.",
"messages": {
"messageValues": [
{
"message": "=You help transcribe documents to markdown, keeping faithful to all text printed and visible to the best of your ability. Ensure you capture all headings, subheadings, titles as well as small print.\nFor any tables found with the document, convert them to markdown tables. If table row descriptions overflow into more than 1 row, concatanate and fit them into a single row. If two or more tables are adjacent horizontally, stack the tables vertically instead. There should be a newline after every markdown table.\nFor any graphics, use replace with a description of the image. Images of scanned checks should be converted to the phrase \"\"."
},
{
"type": "HumanMessagePromptTemplate",
"messageType": "imageBinary"
}
]
}
},
"id": "7b0b5bf0-06af-44a6-b4dc-dd6255adaff0",
"name": "Transcribe to Markdown",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
2160,
380
],
"typeVersion": 1.4
},
{
"parameters": {
"modelName": "models/gemini-1.5-pro-latest",
"options": {}
},
"id": "c9d5f4fe-d42d-4817-a1c0-9d15c9c28b86",
"name": "Google Gemini Chat Model",
"type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
"position": [
2180,
560
],
"typeVersion": 1
},
{
"parameters": {
"modelName": "models/gemini-1.5-pro-latest",
"options": {
"safetySettings": {
"values": [
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE"
}
]
}
}
},
"id": "d3b9e255-4a2d-4548-9ad8-222df43ec85b",
"name": "Google Gemini Chat Model1",
"type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
"position": [
3080,
540
],
"typeVersion": 1
},
{
"parameters": {
"fieldsToAggregate": {
"fieldToAggregate": [
{
"fieldToAggregate": "text",
"renameField": true,
"outputFieldName": "pages"
}
]
},
"options": {}
},
"id": "06f435fa-ed0b-4833-b958-0694ca4bc33e",
"name": "Combine All Pages",
"type": "n8n-nodes-base.aggregate",
"position": [
2760,
380
],
"typeVersion": 1
},
{
"parameters": {
"text": "= {{ $json.pages.join('---') }}",
"schemaType": "manual",
"inputSchema": "{\n \"type\": \"array\",\n \"items\": {\n\t\"type\": \"object\",\n\t\"properties\": {\n \"date\": { \"type\": \"string\" },\n \"description\": { \"type\": \"string\" },\n \"amount\": { \"type\": \"number\" }\n\t}\n }\n}",
"options": {
"systemPromptTemplate": "This statement contains tables with rows showing deposit and withdrawal made to the user's account. Deposits and withdrawals are identified by have the amount in their respective columns. What are the deposits to the account found in this statement?"
}
},
"id": "c31d83f2-9427-48c7-acac-51d190848e4f",
"name": "Extract All Deposit Table Rows",
"type": "@n8n/n8n-nodes-langchain.informationExtractor",
"position": [
3060,
380
],
"typeVersion": 1
},
{
"parameters": {},
"type": "n8n-nodes-base.wait",
"typeVersion": 1.1,
"position": [
720,
400
],
"id": "758a35c6-48fa-4c9d-8659-2e9dd7de963e",
"name": "5 Seconds delay",
"webhookId": "a6b4358f-b5b7-4aec-85aa-491b8a421335"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "c77662b7-68d0-4fd4-9376-ce35184c4328",
"leftValue": "={{$binary.data.mimeType}}",
"rightValue": "application/zip",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "or"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
520,
320
],
"id": "0f76b8a5-eb54-49b3-a6a9-f655b7d6f2fb",
"name": "If Job Completed"
},
{
"parameters": {
"url": "=http://host.docker.internal:8003/v1/conversion/status/{{ $json.jobID }}",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
340,
320
],
"id": "293c1809-6f38-45b9-941b-34ae67ebad2f",
"name": "Job Request Status"
}
],
"pinData": {},
"connections": {
"Sort Pages": {
"main": [
[
{
"node": "Resize Images for AI",
"type": "main",
"index": 0
}
]
]
},
"Extract Zip File": {
"main": [
[
{
"node": "Images To List",
"type": "main",
"index": 0
}
]
]
},
"Images To List": {
"main": [
[
{
"node": "Sort Pages",
"type": "main",
"index": 0
}
]
]
},
"When clicking ‘Test workflow’": {
"main": [
[
{
"node": "Get Bank Statement",
"type": "main",
"index": 0
}
]
]
},
"Get Bank Statement": {
"main": [
[
{
"node": "Split PDF into Images",
"type": "main",
"index": 0
}
]
]
},
"Split PDF into Images": {
"main": [
[
{
"node": "Job Request Status",
"type": "main",
"index": 0
}
]
]
},
"Resize Images for AI": {
"main": [
[
{
"node": "Transcribe to Markdown",
"type": "main",
"index": 0
}
]
]
},
"Transcribe to Markdown": {
"main": [
[
{
"node": "Combine All Pages",
"type": "main",
"index": 0
}
]
]
},
"Google Gemini Chat Model": {
"ai_languageModel": [
[
{
"node": "Transcribe to Markdown",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Google Gemini Chat Model1": {
"ai_languageModel": [
[
{
"node": "Extract All Deposit Table Rows",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Combine All Pages": {
"main": [
[
{
"node": "Extract All Deposit Table Rows",
"type": "main",
"index": 0
}
]
]
},
"5 Seconds delay": {
"main": [
[
{
"node": "Job Request Status",
"type": "main",
"index": 0
}
]
]
},
"If Job Completed": {
"main": [
[
{
"node": "Extract Zip File",
"type": "main",
"index": 0
}
],
[
{
"node": "5 Seconds delay",
"type": "main",
"index": 0
}
]
]
},
"Job Request Status": {
"main": [
[
{
"node": "If Job Completed",
"type": "main",
"index": 0
}
]
]
}
},
"active": false,
"settings": {
"executionOrder": "v1"
},
"versionId": "b792ccc8-fca3-440e-8cbd-9c5b23361369",
"meta": {
"templateCredsSetupCompleted": true,
"instanceId": "421720c8acb23308b326fb5f8046f722913be5c12f0e4650e656491b33729ce7"
},
"id": "z8lti4WW9UJGdy0B",
"tags": []
}
How the Workflow Works
Step 1: Import the PDF from Google Drive
The workflow starts by pulling a PDF from Google Drive. It contains complex tabular data (5 columns), which often causes misalignment issues with standard OCR, such as mistaking deposits for withdrawals.
Note: Authenticate the Google Drive node with your own credentials so the workflow can access and download the file.
Step 2: Convert PDF to images using Syncfusion® PDF to image API
Multimodal large language models (LLMs) do not support direct input of PDF files. Therefore, each page of the PDF must be converted into an image format (PNG or JPG) before further processing.
The Syncfusion® PDF to image API, part of Syncfusion’s Document Processing Libraries, enables high-quality, programmatic conversion of PDF files into image formats. This solution is fully self-hosted, making it ideal for handling sensitive or confidential documents, such as bank statements, where cloud-based services may not be appropriate.
You can easily deploy the API using Syncfusion’s official document processing APIs Docker image, which can be pulled from Docker Hub and hosted locally or on any platform you choose.
🔗Hosting and deployment documentation: Refer to Syncfusion’s hosting guide.
Note: Syncfusion’s Document Processing Libraries are licensed products. A valid license is required for use in production environments.
To convert a PDF to images, use the following endpoint path after your Syncfusion® Document Processing APIs hosted domain:
/v1/conversion/pdf-to-image
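As the workflow JSON above shows, the conversion runs as an asynchronous job: the POST returns a job ID, the workflow polls /v1/conversion/status/{jobID} every 5 seconds, and a completed job responds with a ZIP of page images. Outside n8n, the same flow might look like this minimal sketch (Node.js 18+; the jobID response field and the ZIP detection via Content-Type are assumptions inferred from the workflow’s HTTP nodes):

```javascript
// Minimal sketch of the async PDF-to-image conversion flow (assumptions noted above).
const fs = require('fs');

const BASE_URL = 'http://localhost:8003'; // your self-hosted Document Processing APIs container

async function convertPdfToImages(pdfPath) {
  // 1. Submit the PDF as multipart form data, mirroring the workflow's HTTP node.
  const form = new FormData();
  form.append('settings', JSON.stringify({ file: 'file', password: '', imageFormat: 'JPG' }));
  form.append('file', new Blob([fs.readFileSync(pdfPath)]), 'statement.pdf');

  const submit = await fetch(`${BASE_URL}/v1/conversion/pdf-to-image`, { method: 'POST', body: form });
  const { jobID } = await submit.json(); // assumed response shape, per the workflow's status URL

  // 2. Poll the status endpoint every 5 seconds (the workflow uses a Wait node for this).
  while (true) {
    const status = await fetch(`${BASE_URL}/v1/conversion/status/${jobID}`);
    if (status.headers.get('content-type')?.includes('application/zip')) {
      return Buffer.from(await status.arrayBuffer()); // ZIP with one image per page
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}
```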
Step 3: Decompress and sort images
If the images are returned as a ZIP archive, use n8n’s Decompress node to extract them. Then, use the Sort node to ensure images are arranged in the correct page order before further processing.
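This flattening step is handled by the workflow’s Images To List Code node. For reference, its logic (lightly tidied) turns each binary property of the unzipped item into its own n8n item, keeping the file name for the Sort node:

```javascript
// n8n Code node: emit one output item per extracted page image.
let results = [];

for (const item of items) {
  for (const key of Object.keys(item.binary)) {
    results.push({
      json: { fileName: item.binary[key].fileName }, // used by the Sort node
      binary: { data: item.binary[key] },
    });
  }
}

return results;
```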
Step 4: Resize images
Use the Edit Image node in n8n to resize each page image. This step helps balance image resolution (for accuracy) with processing speed and API limitations.
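For reference, the Edit Image node in the workflow above performs a percentage resize, scaling each page to 75% of its original size:

```json
{
  "operation": "resize",
  "resizeOption": "percent",
  "width": 75,
  "height": 75
}
```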
Step 5: Pass images to the Multimodal LLM (Gemini 1.5 Pro)
Each resized image is then passed to a basic LLM node configured to use Google Gemini 1.5 Pro (or another multimodal LLM of your choice).
- In the LLM node, set the user message type to binary (data) — this is how the image is injected.
- Your prompt can instruct the LLM to either:
  - Transcribe the entire page to markdown (for full page reconstruction), or
  - Extract specific data points directly (a faster, more focused method).
Note: Configure the node with your API credentials to ensure proper access and avoid authentication errors.
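To reproduce this step outside n8n, a minimal sketch using Google’s @google/generative-ai Node SDK might look like the following. The prompt is abridged from the workflow’s system message; treat the model name and response handling as assumptions to verify against the current SDK docs.

```javascript
// Minimal sketch: transcribe one page image to markdown with Gemini.
const fs = require('fs');
const { GoogleGenerativeAI } = require('@google/generative-ai');

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro-latest' });

async function transcribePage(imagePath) {
  const result = await model.generateContent([
    // Abridged from the workflow's system message.
    'You help transcribe documents to markdown, keeping faithful to all printed text. ' +
      'Convert any tables to markdown tables. Transcribe the image to markdown.',
    {
      inlineData: {
        data: fs.readFileSync(imagePath).toString('base64'),
        mimeType: 'image/jpeg',
      },
    },
  ]);
  return result.response.text(); // markdown transcription of the page
}
```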
Step 6: Extract structured data
If you choose Markdown transcription, pass the resulting text into a second LLM node to extract key information, such as the deposit line items.
After running the workflow, you’ll receive a clean, structured Markdown transcription of your bank statement, ready for easy review and further processing.
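For reference, the Information Extractor node in the workflow above uses this manual JSON schema, so the deposits come back as a clean array of objects:

```json
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "date": { "type": "string" },
      "description": { "type": "string" },
      "amount": { "type": "number" }
    }
  }
}
```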
Customizing the Workflow for Different Multimodal LLMs
At the time of writing, Gemini 1.5 Pro provides the most accurate results for document parsing at a relatively low cost. However, you can switch to other multimodal LLMs such as:
- OpenAI GPT-4 with vision
- Anthropic Claude 3
- Any other LLM that supports image input
If you don’t need a Markdown output, you can instruct the LLM to extract specific information directly within the prompt. This reduces complexity and speeds up processing.
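For example, a direct-extraction prompt (illustrative, not taken from the workflow) might read:

```
Transcribe only the deposit line items on this statement page. Return a JSON
array of objects with "date", "description", and "amount" fields. Ignore
withdrawals, running balances, and any marketing text.
```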
Unleash the full potential of Syncfusion's PDF Library! Explore our advanced resources and empower your apps with cutting-edge functionalities.
Bonus: Reuse the template for other document types
Need to process more than just bank statements? This workflow can also be adapted for:
- Invoices.
- Inventory lists.
- Contracts.
- Legal documents.
- Any other PDF with structured information.
Why use Syncfusion® Document Processing APIs?
1. Comprehensive PDF toolkit
Syncfusion® provides APIs for reading, converting, editing, and extracting data from PDFs, eliminating the need for multiple tools.
2. High accuracy with complex layouts
The APIs handle tables, nested sections, and irregular formats with precision, ensuring high-quality data extraction.
3. Seamless automation integration
Designed for easy use with platforms like n8n, Syncfusion’s APIs simplify building automated workflows.
4. Enterprise-grade performance
Ideal for small and large-scale operations, delivering consistent results even with heavy document loads.
5. Flexible licensing & cross-platform support
Supports Windows, Linux, and cloud environments with flexible licensing options to fit your deployment needs.
FAQs
Q1: What if my PDF contains scanned images or handwritten content—can this workflow still extract data accurately?
Yes, the workflow can handle scanned images and handwritten content, but accuracy depends on image quality and the vision capabilities of the model. When PDFs contain scanned pages or handwritten notes, the workflow first converts each page into an image. These images are then transcribed by the multimodal LLM, optionally supplemented by dedicated OCR tools.
- For typed text in scans: OCR tools like Tesseract or Azure OCR perform well.
- For handwritten content: Accuracy varies based on handwriting clarity. Advanced handwriting recognition models (e.g., Google Cloud Vision or Microsoft Read API) can help.
- Low-quality scans: Preprocessing steps like image enhancement, noise reduction, and contrast adjustment can significantly improve results.
Q2: How do I handle token limits or timeouts when processing large bank statements with multiple pages?
To manage token limits and timeouts effectively:
- Process one page at a time: Convert each PDF page to an image and send it individually to the LLM.
- Use summarization or chunking: For long pages, split content into smaller sections or summarize before sending to the model.
- Optimize image size: Shrink images to reduce processing time without losing readability.
- Parallel processing: If supported, process multiple pages concurrently to speed up workflows.
- Use external storage: Store intermediate results (e.g., extracted text or structured data) in a database or file system to avoid reprocessing.
Q3: How customizable is the data extraction step for different use cases (e.g., extracting only totals, dates, or specific vendors)?
The data extraction step is highly customizable. You can tailor prompts or post-processing logic to extract specific fields such as:
- Totals or balances
- Transaction dates
- Vendor names or categories
- Custom keywords or patterns
Ways to customize:
- Prompt engineering: Modify the LLM prompt to focus on specific data points.
- Regex or rule-based filters: Apply post-processing filters to refine extracted data (see the sketch after this list).
- Template-based extraction: Use predefined templates for known document formats.
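As one illustration of a rule-based filter, an n8n Code node could validate and normalize the rows produced by the extraction step. This is a sketch assuming each extracted row arrives as its own item with the date/description/amount fields from the schema shown earlier; the date pattern is a guess at this statement’s format and should be adjusted to yours.

```javascript
// n8n Code node: rule-based post-processing of extracted deposit rows.
const DATE_PATTERN = /^\d{2}\/\d{2}\/\d{4}$/; // assumed MM/DD/YYYY format

return items
  // Keep only rows whose date matches the expected format.
  .filter((item) => DATE_PATTERN.test(item.json.date ?? ''))
  // Keep only rows with a positive numeric amount (i.e., actual deposits).
  .filter((item) => typeof item.json.amount === 'number' && item.json.amount > 0)
  // Normalize whitespace in descriptions.
  .map((item) => ({
    json: { ...item.json, description: (item.json.description ?? '').trim() },
  }));
```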
Q4: Why is PDF conversion to images necessary for this workflow, and what role do Syncfusion® Document Processing APIs play?
Multimodal LLMs typically do not support direct PDF input. Converting each page to an image (e.g., JPG or PNG) allows the model to “see” the content visually, enabling better understanding of layout, tables, and embedded text.
Syncfusion® Document Processing APIs play a crucial role by:
- Converting PDFs to high-quality images programmatically.
- Preserving layout and resolution, which is vital for accurate extraction.
- Supporting batch processing and automation, making it scalable for enterprise use.
Syncfusion’s high-performance PDF Library allows you to create PDF documents from scratch without Adobe dependencies.
Conclusion
By combining Syncfusion® document processing APIs, n8n, and multimodal large language models (LLMs), we unlock a powerful and modern approach to transcribing bank statements and other complex documents. Syncfusion’s robust PDF tools handle the heavy lifting in converting, analyzing layouts, and extracting pages to deliver clean, structured input for the language model to interpret.
This integration enables significantly more accurate data extraction than traditional OCR methods, particularly when dealing with intricate tables, multi-column layouts, or scanned documents. With n8n as the automation backbone, the entire pipeline from file ingestion to AI-driven transcription to structured data output can be orchestrated seamlessly, without writing complex code.
As the demand for document automation grows across finance, insurance, and legal industries, this approach provides a scalable, intelligent solution that combines the strengths of low-code automation, AI, and enterprise-grade PDF processing.
Need help getting started? Feel free to reach out via our support forums, support portal, or feedback portal. We are always happy to assist you!