Using Python to Remove Red Seals and Recognize Auditors' Handwritten Signatures

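I use iFlytek's (讯飞) handwriting recognition API. In my experience it is more accurate than Baidu's general OCR and cheaper than Baidu's handwriting OCR. New users can claim 100,000 free calls, valid for 3 months. My thesis involves only about 6,000 screenshots, so that is more than enough.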

Official site: https://www.xfyun.cn/service/wordRecg

Tutorial

Usage is simple: download the demo from their documentation at https://www.xfyun.cn/doc/words/wordRecg/API.html

Below I only describe how I used it.

1 Define the API call functions

Copy the Python demo from the developer documentation. The only two things to change are your APPID and your API_KEY.

import base64
import hashlib
import json
import time

import requests
from PIL import Image

# Temporary file where the preprocessed image is saved
tmp_path = 'tmp.png'

URL = "http://webapi.xfyun.cn/v1/service/v1/ocr/handwriting"
# Your APPID
APPID = ""
# Your API_KEY
API_KEY = ""

# Business parameters sent in the X-Param header
language = "cn|en"
location = "true"

def getHeader():
    # Current Unix timestamp in seconds, as a string
    curTime = str(int(time.time()))
    # X-Param is the base64-encoded JSON of the business parameters
    param = json.dumps({"language": language, "location": location}, separators=(",", ":"))
    paramBase64 = base64.b64encode(param.encode('utf-8')).decode('utf-8')
    # X-CheckSum = MD5 hex digest of API_KEY + X-CurTime + X-Param
    checkSum = hashlib.md5((API_KEY + curTime + paramBase64).encode('utf-8')).hexdigest()
    header = {
        'X-CurTime': curTime,
        'X-Param': paramBase64,
        'X-Appid': APPID,
        'X-CheckSum': checkSum,
        'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
    }
    return header

def getBody(filepath):
    # The form body carries the image as a base64 string in the "image" field
    with open(filepath, 'rb') as f:
        imgfile = f.read()
    data = {'image': base64.b64encode(imgfile).decode('utf-8')}
    return data
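The demo signs each request with an MD5 checksum over API_KEY, X-CurTime and X-Param, and sends the image base64-encoded in the form body. As a sketch, the same two steps can be folded into one pure function that takes a Pillow image directly, so the preprocessed image never has to be written to tmp.png. The name build_request and the explicit cur_time argument (which makes the output reproducible) are my own hypothetical choices, not part of the iFlytek demo.

```python
import base64
import hashlib
import io
import json

from PIL import Image


def build_request(img: Image.Image, appid: str, api_key: str, cur_time: int,
                  language: str = "cn|en", location: str = "true"):
    """Hypothetical helper: return (headers, body) for one handwriting-OCR call.

    Mirrors getHeader()/getBody() from the demo, but encodes the image in memory.
    """
    # X-Param: base64 of the compact JSON business parameters
    param = json.dumps({"language": language, "location": location}, separators=(",", ":"))
    param_b64 = base64.b64encode(param.encode("utf-8")).decode("utf-8")
    cur = str(cur_time)
    # X-CheckSum: MD5 hex digest of API_KEY + X-CurTime + X-Param
    checksum = hashlib.md5((api_key + cur + param_b64).encode("utf-8")).hexdigest()
    headers = {
        "X-CurTime": cur,
        "X-Param": param_b64,
        "X-Appid": appid,
        "X-CheckSum": checksum,
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
    }
    # Body: PNG bytes written to an in-memory buffer, then base64-encoded
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    body = {"image": base64.b64encode(buf.getvalue()).decode("utf-8")}
    return headers, body
```

With it, a call would look like `headers, body = build_request(new_img, APPID, API_KEY, int(time.time()))` followed by `requests.post(URL, headers=headers, data=body)`.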

2 Select an image and preprocess it

Select an image:

img = Image.open(r"C:\Users\rvw\Desktop\毕业论文\2 数据核对\原始数据\签名截图\2019\002017_2.png")

Preprocess it and save the result to the temporary location:

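# Take band 0 (the red channel) and binarize it: values above 150 become white (255), the rest black (0)
new_img = img.convert('RGB').split()[0].point(lambda i: 255 if i > 150 else 0)
new_img.save(tmp_path)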

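The preprocessing removes the red seal by channel separation and binarization. In the red channel, the red seal is bright (close to white) while black or blue ink stays dark, so a single threshold keeps the signature and wipes out the seal. I picked the threshold of 150 arbitrarily; an algorithm such as Otsu could be used instead, though it is not necessarily better than a hand-picked value.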
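For the Otsu alternative, here is a minimal pure-Python sketch that picks the threshold from the red-channel histogram instead of hard-coding 150. The function names otsu_threshold and remove_red_seal are hypothetical, and I have not checked whether Otsu actually beats the hand-picked value on these screenshots.

```python
from PIL import Image


def otsu_threshold(channel: Image.Image) -> int:
    """Return the Otsu threshold t (0-255) of an 8-bit single-band image.

    Pixels <= t form one class and pixels > t the other; t maximises the
    between-class variance computed from the histogram.
    """
    hist = channel.histogram()[:256]
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def remove_red_seal(img: Image.Image, threshold=None) -> Image.Image:
    """Keep the red band and binarise it; Otsu picks the threshold if none is given."""
    red = img.convert("RGB").split()[0]
    if threshold is None:
        threshold = otsu_threshold(red)
    return red.point(lambda i: 255 if i > threshold else 0)
```

In the steps above, `new_img = remove_red_seal(img)` would replace the fixed-threshold line, and `remove_red_seal(img, 150)` reproduces it.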

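As for why I used Pillow rather than OpenCV: reading my PNG screenshots with OpenCV kept throwing inexplicable errors, so I simply switched to Pillow. The result looks like this: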

3 Call the API for recognition

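# Send the preprocessed image to the API and parse the JSON response once
r = requests.post(URL, headers=getHeader(), data=getBody(tmp_path))
result = r.json()
print(result)
# First word of the first line of the first text block
print(result['data']['block'][0]['line'][0]['word'][0]['content'])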
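The last line above prints only the first word of the first line. A slightly more defensive sketch walks every block, line and word using the same data → block → line → word → content nesting as that indexing, and joins them. Treating a `code` other than "0" as a failure, with the message in `desc`, is my assumption about the response format; check it against the API documentation. The name extract_text is hypothetical.

```python
def extract_text(result: dict) -> str:
    """Join all recognised words: words in a line are concatenated, lines joined by newlines."""
    # Assumption: success is signalled by code == "0"; otherwise "desc" holds the message
    if result.get("code") != "0":
        raise RuntimeError(f"OCR failed: code={result.get('code')}, desc={result.get('desc')}")
    lines = []
    for block in result.get("data", {}).get("block", []):
        for line in block.get("line", []):
            lines.append("".join(word.get("content", "") for word in line.get("word", [])))
    return "\n".join(lines)
```

Usage: `print(extract_text(r.json()))`.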

The output is shown below:

Let's try two more:

高晓峰

蒲金凤

The recognition results are quite good.
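Since the thesis involves about 6,000 screenshots, the single-image steps eventually need a loop. The sketch below is one way to structure it, under stated assumptions: the CSV columns and the name batch_recognize are hypothetical, and `recognize` stands for any callable that takes a preprocessed Pillow image and returns text (for example, a wrapper around the request code above), so the loop can be tried without network access. The optional pause between calls is a conservative guess, not a documented rate limit.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Iterable

from PIL import Image


def batch_recognize(paths: Iterable[Path],
                    recognize: Callable[[Image.Image], str],
                    out_csv: Path,
                    threshold: int = 150,
                    pause: float = 0.0) -> int:
    """Preprocess each screenshot, recognise it, and write (file, text, error) rows to a CSV.

    Returns the number of files processed. A failure is recorded in its row
    instead of stopping the whole run.
    """
    count = 0
    # utf-8-sig writes a BOM so Excel detects UTF-8 and shows Chinese names correctly
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text", "error"])
        for path in paths:
            try:
                with Image.open(path) as img:
                    # Same preprocessing as above: red band plus a fixed threshold
                    clean = img.convert("RGB").split()[0].point(
                        lambda i: 255 if i > threshold else 0)
                writer.writerow([path.name, recognize(clean), ""])
            except Exception as exc:  # keep going; log the failure in the CSV
                writer.writerow([path.name, "", repr(exc)])
            count += 1
            if pause:
                time.sleep(pause)
    return count
```

A typical call would pass `sorted(Path("your_screenshot_folder").glob("*.png"))` as `paths`, where the folder name is a placeholder.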
