I'm still a beginner, but I want to write a character-recognition program. The program isn't ready yet. I've edited it a lot, so the comments may not match exactly. I'm going to use 8-connectivity for the connected component labeling.
```python
from PIL import Image
import numpy as np

im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w, h = im.size
w = int(w)
h = int(h)

# 2D array for area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2)  # number 0 is white, number 1 is black

# 2D array for letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

# 2D array for label
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

# image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2]  # total value
        if bbb <= threshold:
            area[x][y] = 1
        if bbb > threshold:
            area[x][y] = 0

np.set_printoptions(threshold='nan', linewidth=10)

# matrix transposition
ccc = np.array(area)
area = ccc.T  # better solution?

# find all black pixels and set temporary label numbers
i = 1
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            letter[x][y] = 1
            label[x][y] = i
            i += 1

# connected components labeling
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            label[x][y] = i
            # if pixel has neighbour:
            if area[x][y+1] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            if area[x+1][y] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            # should i also compare pixel and left neighbour?

# find width of the letter
# find height of the letter
# find the middle of the letter
# middle = [width/2][height/2]  # ?
# divide letter into 30 parts --> 5 x 6 array
# model letter
# letter A-Z, a-z, 0-9 (maybe more)
# compare each of the 30 parts of the letter with all model letters
# make a weighting

# print(letter)
im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')
```
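On the question at the end of the labeling loop: in a row-by-row scan (top to bottom, left to right), the 8-connected neighbours that have already been visited, and so may already carry a label, are west, north-west, north, and north-east. A minimal sketch of that neighbour check, assuming a row-major scan order (the function name is illustrative, not from the code above):

```python
def prior_neighbors(x, y, w, h):
    # 8-connected neighbours of (x, y) that a row-major scan has
    # already visited: west, north-west, north, north-east
    candidates = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    # drop neighbours that fall outside the image bounds
    return [(nx, ny) for nx, ny in candidates if 0 <= nx < w and 0 <= ny < h]
```

During labeling, the current pixel would take the smallest label found among these neighbours, recording any other neighbour labels as equivalent.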
Answer from jbochi (34 votes):
OCR really isn't an easy task. That's why text CAPTCHAs still work :)

To talk only about letter extraction and not pattern recognition: the technique you're using to separate the letters is called connected-component labeling. Since you asked for a more efficient way to do this, try implementing the two-pass algorithm described in this article. Another description can be found in the article Blob extraction.

EDIT: Here's an implementation of the algorithm I suggested:
```python
import sys
from PIL import Image, ImageDraw


class Region():
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]


def find_regions(im):
    width, height = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    # first pass: find regions
    for x in xrange(width):
        for y in xrange(height):
            # look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255):  # BLACK
                # get the region number from north or west,
                # or create a new region
                region_n = pixel_region[x-1][y] if x > 0 else 0
                region_w = pixel_region[x][y-1] if y > 0 else 0

                max_region = max(region_n, region_w)
                if max_region > 0:
                    # a neighbour already has a region;
                    # new region is the smallest > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    # update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions

                pixel_region[x][y] = new_region

    # scan the image again, assigning all equivalent regions the same region value
    for x in xrange(width):
        for y in xrange(height):
            r = pixel_region[x][y]
            if r > 0:
                while r in equivalences:
                    r = min(equivalences[r])
                if not r in regions:
                    regions[r] = Region(x, y)
                else:
                    regions[r].add(x, y)

    return list(regions.itervalues())


def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw

    # im.show()
    output = file("output.png", "wb")
    im.save(output)
    output.close()

if __name__ == "__main__":
    main()
```
And this is the output file:
(Dead link)
It's not 100% perfect, but since you're only doing this for learning purposes, it may be a good starting point. With the bounding box of each character, you can now use a neural network as others have suggested here.
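As a bridge between the bounding boxes and a classifier, each detected character can be cropped out and scaled to a fixed-size binary grid first. A rough sketch, assuming Pillow is available; `crop_and_normalize` is an illustrative name, and the 5x6 grid size just echoes the 5x6 split mentioned in the question:

```python
from PIL import Image

def crop_and_normalize(im, box, size=(5, 6)):
    # box: [(min_x, min_y), (max_x, max_y)] as produced by Region.box()
    (x0, y0), (x1, y1) = box
    # PIL crop boxes are exclusive on the right/bottom edge, hence +1
    char = im.crop((x0, y0, x1 + 1, y1 + 1))
    # scale every character to the same small grid so grids are comparable
    char = char.convert("L").resize(size)
    # binarize: 1 = ink (dark), 0 = background (light)
    return [[1 if char.getpixel((x, y)) < 128 else 0
             for x in range(size[0])]
            for y in range(size[1])]
```

The resulting grids all have the same shape regardless of the original character size, which is exactly what a simple classifier or network input layer needs.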
OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I wouldn't call it a "beginner" programming project; it's quite subtle.

If you want to recognize scanned or handwritten characters, it's even harder - you'll need advanced math, algorithms, and machine learning. There are plenty of books and thousands of articles on the subject, so you don't need to reinvent the wheel.
I admire your effort, but I don't think you've hit any of the real difficulties yet. So far you're just exploring pixels at random and copying them from one array to another. You haven't actually done any comparison, and I'm not sure what the purpose of your "random walk" is.

Why random? Writing correct randomized algorithms is quite difficult. I'd recommend starting with a deterministic algorithm first.

Why are you copying from one array to another? Why not compare directly?

When you get to the comparison, you'll have to deal with the fact that the image isn't exactly like the "prototype", and it's not clear how you'll handle that.
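One simple way to cope with imperfect matches is to score each prototype by the fraction of grid cells that agree, then pick the best-scoring one. A toy sketch under that assumption (`match_score` and `classify` are made-up names; real OCR systems use far more robust features than raw pixel overlap):

```python
def match_score(sample, template):
    # fraction of cells on which two equal-sized binary grids agree
    total = 0
    same = 0
    for srow, trow in zip(sample, template):
        for s, t in zip(srow, trow):
            total += 1
            if s == t:
                same += 1
    return same / float(total)

def classify(sample, templates):
    # templates: dict mapping a character to its prototype binary grid;
    # returns the character whose prototype agrees with the sample most
    return max(templates, key=lambda c: match_score(sample, templates[c]))
```

This at least turns "not exactly like the prototype" into a graded score rather than an all-or-nothing comparison.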
Based on the code you've written so far, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus a start pixel and a goal pixel. The output would be a path through the maze from start to goal. This is a much easier problem than OCR - solving mazes is something computers are very well suited for - but it's still fun and challenging.
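The maze exercise above can be sketched as a breadth-first search over the pixel grid. Here a plain 0/1 grid stands in for the binarized image, and `solve_maze` is an illustrative name:

```python
from collections import deque

def solve_maze(grid, start, goal):
    # grid: list of rows, 0 = open, 1 = wall; cells are (x, y) with
    # grid[y][x]. BFS over 4-connected cells finds a shortest path.
    h, w = len(grid), len(grid[0])
    prev = {start: None}  # also serves as the visited set
    q = deque([start])
    while q:
        x, y = q.popleft()
        if (x, y) == goal:
            # walk the predecessor chain back to the start
            path = []
            cur = (x, y)
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 0 \
                    and (nx, ny) not in prev:
                prev[(nx, ny)] = (x, y)
                q.append((nx, ny))
    return None  # no path exists
```

On a real image you'd first binarize the pixels with a threshold (as the question's code already does) and treat dark pixels as walls.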
Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good starting point. Based on a Hopfield model available in C, I built a very basic image-recognition algorithm in Python, similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but it can get you started in the right direction.
The Hopfield model is used as an autoassociative memory to store and recall a set of bitmap images. Images are stored by computing a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly that stored image which is nearest to the starting configuration in terms of Hamming distance. Thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image.
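The store-and-recall cycle described above can be sketched in a few lines of NumPy: a Hebbian weight matrix stores bipolar (+1/-1) patterns, and repeated thresholded updates pull a corrupted pattern back toward the nearest stored one. This is a minimal illustration, not the linked project's actual code:

```python
import numpy as np

def train(patterns):
    # Hebbian rule: W is the sum of outer products of the stored
    # bipolar (+1/-1) patterns, with no self-connections
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, max_steps=10):
    # synchronous updates until the state stops changing
    state = np.array(state)
    for _ in range(max_steps):
        new = np.sign(W.dot(state))
        new[new == 0] = 1  # break ties toward +1
        if np.array_equal(new, state):
            break
        state = new
    return state
```

Each "image" here is just a flattened bipolar vector; flipping a few entries and calling `recall` recovers the stored pattern, which is the corruption-tolerance property described above.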
A Java applet with a toy example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click "test", and see the network's result.

Don't let the math notation intimidate you; the algorithm is straightforward once you have the source code.