Generating a Chinese Character Image Font Library with Python

Last Update:2014-06-21 Source: Internet

Author: User


I recently worked on a document recognition project and needed to build a character library for Chinese character recognition. I looked through all sorts of OCR material online and was not impressed. The technology should be quite mature, and there is plenty of OCR software, yet I found few papers of real substance and no character library published by any expert. So I built one in two ways: rendering a font with pygame, and cutting neatly laid-out page images with PIL.

Generating the library by rendering a font with pygame

For rendering a font with pygame I referred to an existing article. According to the GB2312-80 standard, the 3,500 commonly used Chinese characters cover 99.7% of usage; adding the less common ones gives 6,763 characters in total, covering 99.99%. To build the image library, find a list of the 3,500 commonly used characters online and render each character with the font.

    # Python 2 code (it relies on str.decode and the StringIO module)
    import os
    import StringIO

    import pygame
    import Image  # classic PIL import; with Pillow use "from PIL import Image"

    def pasteWord(word):
        '''Take a character and output an image containing it.'''
        pygame.init()
        # The font file name is missing in the source; only ".ttf" survives.
        font = pygame.font.Font(os.path.join("./fonts", ".ttf"), 22)
        text = word.decode('utf-8')
        imgName = "E:/dataset/chinesedb/chinese/" + text + ".png"
        paste(text, font, imgName)

    def paste(text, font, imgName, area=(0, -9)):
        '''Using the font, paste the text onto an image and save it.'''
        im = Image.new("RGB", (32, 32), (255, 255, 255))
        rtext = font.render(text, True, (0, 0, 0), (255, 255, 255))
        # Pass the rendered surface to PIL through an in-memory buffer.
        sio = StringIO.StringIO()
        pygame.image.save(rtext, sio)
        sio.seek(0)
        line = Image.open(sio)
        im.paste(line, area)
        # im.show()
        im.save(imgName)

When a large number of characters is rendered, an error is always reported at some point. I re-rendered the characters that failed and ended up with a library of 3,510 characters (the 3,500 characters plus the 10 digits).

Generating the library by character segmentation

Another approach is to lay out the 3,500 characters in Word, convert the document to PDF, and save the pages as images. The characters are dense but very neatly aligned, so no image-processing algorithm is needed.

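Because the characters sit on a regular grid, the cut positions are just the rows and columns that contain no ink. The following is a minimal Python 3 sketch of that idea, not the author's code. It works on a plain list of pixel rows so that it has no dependencies, and it assumes that 0 means ink and any other value means background. The names `blank_indices`, `ink_spans` and `cell_boxes` are hypothetical helpers introduced for illustration.

```python
def blank_indices(rows, axis):
    """Indices of rows (axis=0) or columns (axis=1) that contain no ink (0)."""
    if axis == 0:
        return [y for y, row in enumerate(rows) if all(p != 0 for p in row)]
    width = len(rows[0])
    return [x for x in range(width) if all(row[x] != 0 for row in rows)]


def ink_spans(blank, length):
    """Turn blank indices into (start, end) spans of non-blank positions.

    `end` is exclusive, matching the box convention used by image crops.
    """
    blank_set = set(blank)
    spans, start = [], None
    for pos in range(length):
        if pos not in blank_set and start is None:
            start = pos                      # a run of ink begins
        elif pos in blank_set and start is not None:
            spans.append((start, pos))       # the run ends at a blank line
            start = None
    if start is not None:
        spans.append((start, length))        # ink reaches the page edge
    return spans


def cell_boxes(rows):
    """(left, top, right, bottom) boxes in reading order, row by row."""
    height, width = len(rows), len(rows[0])
    x_spans = ink_spans(blank_indices(rows, 1), width)
    y_spans = ink_spans(blank_indices(rows, 0), height)
    return [(x0, y0, x1, y1) for (y0, y1) in y_spans for (x0, x1) in x_spans]


# A tiny "page" with two glyphs separated by one blank column.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
print(cell_boxes(page))  # [(1, 1, 3, 3), (4, 1, 6, 3)]
```

Each box has the (left, upper, right, lower) shape that PIL's `Image.crop` accepts, so the boxes could be passed to it directly. On a page with only a few rows, a character with an internal vertical gap could produce a spurious blank column, which a fixed cell size avoids.
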
You only need to find the blank rows and columns and cut along them. As long as the pieces are saved in order, each cropped image still corresponds to its character. The cutting code follows:

    # encoding=UTF-8
    # Python 2 code (xrange, integer division with "/")
    import Image  # classic PIL import; with Pillow use "from PIL import Image"
    import os

    def yStart(gray):
        '''First row (from the top) that contains a black pixel.'''
        m, n = gray.size
        for j in xrange(n):
            for i in xrange(m):
                if gray.getpixel((i, j)) == 0:
                    return j

    def yEnd(gray):
        '''Last row (from the bottom) that contains a black pixel.'''
        m, n = gray.size
        for j in xrange(n-1, -1, -1):
            for i in xrange(m):
                if gray.getpixel((i, j)) == 0:
                    return j

    def xStart(gray):
        '''First column (from the left) that contains a black pixel.'''
        m, n = gray.size
        for i in xrange(m):
            for j in xrange(n):
                if gray.getpixel((i, j)) == 0:
                    return i

    def xEnd(gray):
        '''Last column (from the right) that contains a black pixel.'''
        m, n = gray.size
        for i in xrange(m-1, -1, -1):
            for j in xrange(n):
                if gray.getpixel((i, j)) == 0:
                    return i

    def xBlank(gray):
        '''Indices of the columns that contain no black pixel.'''
        m, n = gray.size
        blanks = []
        for i in xrange(m):
            for j in xrange(n):
                if gray.getpixel((i, j)) == 0:
                    break
            else:
                # for/else: reached only when no black pixel was found, so a
                # black pixel in the very last row is not mistaken for blank
                blanks.append(i)
        return blanks

    def yBlank(gray):
        '''Indices of the rows that contain no black pixel.'''
        m, n = gray.size
        blanks = []
        for j in xrange(n):
            for i in xrange(m):
                if gray.getpixel((i, j)) == 0:
                    break
            else:
                blanks.append(j)
        return blanks

    def getWordsList():
        f = open('3500.txt')
        line = f.read().strip()
        # The separator was lost in the source; a single space is assumed.
        wordslist = line.split(' ')
        f.close()
        return wordslist

    count = 0
    wordslist = []

    def getWordsByBlank(img, path):
        '''Crop the characters using the blank rows and columns; works well.'''
        global count
        global wordslist
        gray = img.split()[0]
        xblank = xBlank(gray)
        yblank = yBlank(gray)
        # Blank pixels come in consecutive runs; keep only the first and last
        # blank pixel of each run, as the start and end points of the text.
        xblank = [xblank[i] for i in xrange(len(xblank))
                  if i == 0 or i == len(xblank)-1
                  or not (xblank[i] == xblank[i-1]+1 and xblank[i] == xblank[i+1]-1)]
        yblank = [yblank[i] for i in xrange(len(yblank))
                  if i == 0 or i == len(yblank)-1
                  or not (yblank[i] == yblank[i-1]+1 and yblank[i] == yblank[i+1]-1)]
        for j in xrange(len(yblank)/2):
            for i in xrange(len(xblank)/2):
                # The character size is fixed at 32 pixels here.
                area = (xblank[i*2], yblank[j*2], xblank[i*2+1]+32, yblank[j*2]+32)
                # area = (xblank[i*2], yblank[j*2], xblank[i*2+1], yblank[j*2+1])
                word = img.crop(area)
                word.save(path + wordslist[count] + '.png')
                count += 1
                if count >= len(wordslist):
                    return

    def getWordsFormImg(imgName, path):
        png = Image.open(imgName, 'r')
        img = png.convert('1')
        gray = img.split()[0]
        # First crop out the text area.
        area = (xStart(gray)-1, yStart(gray)-1, xEnd(gray)+2, yEnd(gray)+2)
        img = img.crop(area)
        getWordsByBlank(img, path)

    def getWrods():
        global wordslist
        wordslist = getWordsList()
        imgs = ["l1.png", "l2.png", "l3.png"]
        for img in imgs:
            getWordsFormImg(img, 'words/')

    if __name__ == "__main__":
        getWrods()

This approach also gives good results.

I am not familiar with image processing, so the methods I used are rather crude. Recognizing Chinese characters is relatively difficult. For neat images, using DTW to find the most similar item in the library works reasonably well, but for characters cut from documents captured by scanners or cameras the results are poor. I also tried a back-propagation neural network, but 3,500 Chinese characters amount to 3,500 classes.
A classification problem with that many classes is hard to handle, mainly because there is too little training data: I have only one font on hand.
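
The DTW matching mentioned above can be sketched as follows in Python 3. The article does not say which features DTW was run on, so this example assumes each glyph is reduced to its column profile (the number of ink pixels per column). The names `dtw_distance`, `column_profile` and `nearest_glyph` are hypothetical, and 0 is again taken to mean ink. It is an illustration of the technique under those assumptions, not the author's implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def column_profile(rows):
    """Number of ink pixels (value 0) in each column of a glyph bitmap."""
    return [sum(1 for row in rows if row[x] == 0) for x in range(len(rows[0]))]


def nearest_glyph(query, library):
    """Label in `library` (label -> bitmap) whose profile is closest to `query`."""
    q = column_profile(query)
    return min(library,
               key=lambda label: dtw_distance(q, column_profile(library[label])))


# Toy library: a vertical bar and a horizontal bar, 3x3 pixels each.
library = {
    "vertical": [[1, 0, 1], [1, 0, 1], [1, 0, 1]],
    "horizontal": [[1, 1, 1], [0, 0, 0], [1, 1, 1]],
}
# A vertical bar with one extra blank column on the right.
query = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1]]
print(nearest_glyph(query, library))  # vertical
```

DTW tolerates small stretches and shifts, which fits the observation that it works on neat images. A one-dimensional profile discards most of a glyph's structure, so it is only a starting point for 3,500 classes.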

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
