OCR Service

A Flask-based OCR (optical character recognition) service that recognizes text in images and PDF documents.

Features

Image formats such as JPG and PNG
PDF documents, with multi-page recognition
Uses the Tesseract OCR engine
Simplified Chinese recognition
REST API

Requirements

Python 3.7+
Tesseract OCR engine (installed separately)

Installation

1. Install Python dependencies

pip install -r requirements.txt

2. Install Tesseract OCR

Windows:

Download the installer from https://github.com/UB-Mannheim/tesseract/wiki

After installation, you may need to set the executable path in code:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
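
The hard-coded path above only matches one install location. A minimal sketch of a more portable lookup, assuming the executable is either on PATH or at the path shown above; `resolve_tesseract_cmd` is a hypothetical helper, not something main.py is known to contain:

```python
import shutil
from pathlib import Path
from typing import Optional

# Windows path taken from the line above; adjust if Tesseract lives elsewhere.
DEFAULT_WINDOWS_CMD = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(default: str = DEFAULT_WINDOWS_CMD) -> Optional[str]:
    """Return a usable tesseract path, or None if none is found."""
    # Prefer whatever is already on PATH (typical on Linux and macOS).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit path, but only if the file exists.
    if Path(default).is_file():
        return default
    return None


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    print(cmd or "tesseract not found")
    # With pytesseract installed, the result could then be applied as:
    #   pytesseract.pytesseract.tesseract_cmd = cmd
```

Returning None instead of raising lets the caller decide whether a missing engine is fatal at startup or only when the first request arrives.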

Linux (Ubuntu/Debian):

sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-chi-sim

macOS:

brew install tesseract
brew install tesseract-lang
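
After installing, it can help to confirm that the Simplified Chinese pack is actually visible to Tesseract. The sketch below parses the output of `tesseract --list-langs`; the helper names `parse_langs` and `installed_langs` are illustrative, and the exact header text of the output may differ between Tesseract versions:

```python
import subprocess
from typing import List


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def installed_langs(cmd: str = "tesseract") -> List[str]:
    """Run tesseract and return the language packs it reports."""
    proc = subprocess.run(
        [cmd, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Older versions may print the list to stderr, so both streams are parsed.
    return parse_langs(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("chi_sim installed:", "chi_sim" in langs)
```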

Usage

Starting the service

Run in the foreground:

python3 main.py

Run in the background:

nohup python3 main.py > app.log 2>&1 &

Check whether the service is running:

ps aux | grep main.py

View the log:

tail -f app.log

Stop the service:

pkill -f "python3 main.py"

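Once started, the service listens on http://0.0.0.0:8090, that is, port 8090 on all network interfaces. From the same machine, reach it at http://localhost:8090.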
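
When the service runs in the background, a quick TCP probe confirms that the port is accepting connections before any file is uploaded. This is a small sketch using only the standard library; `is_listening` is a hypothetical helper and the one-second timeout is an arbitrary choice:

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8090, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("service up:", is_listening())
```

A successful connection only shows that something is bound to the port; it does not prove that the OCR endpoint itself is healthy.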

API

POST /api/ocr

Upload an image or PDF file for OCR.

Example request:

curl -X POST -F "file=@test.jpg" http://localhost:8090/api/ocr

Example response:

{
  "pages": [
    {
      "text": "recognized text content"
    }
  ]
}

Example Python request:

import requests

with open('test.jpg', 'rb') as f:
    response = requests.post('http://localhost:8090/api/ocr', files={'file': f})
    print(response.json())
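
For multi-page PDFs, a caller will usually want the recognized text as a single string. The sketch below assumes the response always has the shape shown in the example response (a "pages" list whose entries carry a "text" field) and that each PDF page yields one entry, which the example does not confirm. `join_pages`, `ocr_file`, the file name `test.pdf`, and the 120-second timeout are all illustrative:

```python
from typing import Any, Dict


def join_pages(payload: Dict[str, Any], separator: str = "\n\n") -> str:
    """Concatenate the "text" field of every entry in "pages"."""
    pages = payload.get("pages", [])
    return separator.join(page.get("text", "") for page in pages)


def ocr_file(path: str, url: str = "http://localhost:8090/api/ocr") -> str:
    """Upload a file and return all recognized text as one string."""
    import requests  # third-party; imported here so join_pages works without it

    with open(path, "rb") as f:
        response = requests.post(url, files={"file": f}, timeout=120)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return join_pages(response.json())


if __name__ == "__main__":
    print(ocr_file("test.pdf"))
```

Calling `raise_for_status()` before `json()` keeps an error response from being mistaken for an empty OCR result.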

Directory structure

ocr_service/
├── main.py              # Main program
├── requirements.txt     # Python dependencies
├── .gitignore          # Git ignore rules
└── README.md           # Project documentation

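Notes

Tesseract OCR must be installed separately.
Chinese recognition requires the Simplified Chinese language pack (chi_sim).
PDF processing requires poppler (on Windows, add it to PATH).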
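
A missing poppler install tends to surface only when the first PDF is uploaded, so checking for it up front can save time. The sketch below looks for two command-line tools that ship with poppler. It assumes main.py converts PDFs through a wrapper that calls these tools (pdf2image is a common one), which this README does not state; `missing_binaries` is a hypothetical helper:

```python
import shutil
from typing import Iterable, List

# Command-line tools shipped with poppler; assumed to be what the PDF step needs.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_binaries(names: Iterable[str] = POPPLER_TOOLS) -> List[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("poppler tools found")
```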
