Use Mathematica to find the most similar Chinese Characters

Source: Internet
Author: User

Mathematica provides a seemingly useless function, Rasterize, which outputs the result of a computation as an image. For example, the following statement prints a "reflection" of the expansion of (x + 1)^n:

Today it suddenly occurred to me that this function makes it easy to analyze the properties of Chinese characters as images. The function Binarize converts an image into a single-channel black-and-white image, and ImageData converts an image into an array for quantitative analysis. The following statement therefore converts a Chinese character into a 12×12 matrix of 0s and 1s:
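As a rough illustration of the thresholding step only, here is a minimal Python sketch (not the author's Mathematica statement). It assumes the character has already been rendered to a grid of gray values from 0 (black) to 255 (white); the function name `binarize`, the threshold of 128, the choice of 1 for ink pixels, and the tiny 4×4 patch are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Turn a grid of 0-255 gray values into a 0/1 matrix.

    Assumption for this sketch: dark pixels (below the threshold) are ink
    and become 1; light pixels become 0.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Hypothetical 4x4 grayscale patch, not a real character.
patch = [
    [255, 250,  30, 255],
    [255,  20,  10, 255],
    [240,  15, 200, 255],
    [255, 255, 255, 255],
]

bits = binarize(patch)
for row in bits:
    print("".join(str(b) for b in row))
```

A real 12×12 character bitmap would be handled the same way, one row of 12 values at a time.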


The following statement sorts the 3,755 level-1 (most common) Chinese characters in GB2312 by the number of pixels each one occupies in a 12×12 dot matrix.
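The sorting idea can be sketched in a few lines of Python (again an illustration, not the author's code). The sketch assumes each character is already available as a 0/1 matrix in which 1 marks an ink pixel; the glyph names and 3×3 bitmaps below are made up and stand in for real 12×12 characters.

```python
def pixel_count(bitmap):
    """Count the ink pixels (cells equal to 1) in a 0/1 bitmap."""
    return sum(sum(row) for row in bitmap)


# Hypothetical 3x3 bitmaps standing in for 12x12 character bitmaps.
glyphs = {
    "dot":   [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "bar":   [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "box":   [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "cross": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}

# Sort glyph names from fewest ink pixels to most.
ranked = sorted(glyphs, key=lambda name: pixel_count(glyphs[name]))
print(ranked)
```

Taking the first or last few entries of `ranked` gives the glyphs with the fewest or the most pixels.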

The characters that use the fewest pixels are:

The 10 characters that use the most pixels are:

 
 
I have seen many posts on the Internet along the lines of "find me in three seconds" and "have you ever eaten Kang Shuai Bo instant noodles?", and I can't help marveling at the power of Chinese characters. That got me wondering: which Chinese characters look most alike? Using the functions above, I wrote a Mathematica program, and after several hours of computation it found the pairs of level-1 Chinese characters whose 12×12 dot-matrix forms differ in the fewest pixels. Only one pair differs by a single pixel: 己 ("self") and 已 ("already"). The other results are as follows:

Only 2 pixels apart: (Ming, Wu), (Cambodia, bundle), (unexpectedly, competing)
Only 3 pixels apart: (shell, bright), (inclusive, combined)
Only 4 pixels apart: (top, Earth), (free, rabbit), (soldiers, soldiers), (Shi, Earth)
Only 5 pixels apart: (husband, loss), (Minister, giant), (Wei, Zhu), (Yi, straight)
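The search behind these results is a brute-force comparison of every pair of characters. The following Python sketch shows one way to do it under the assumption that each character is a 0/1 matrix of the same size; it is not the author's Mathematica program, and the names `pixel_difference` and `closest_pairs` and the three 3×3 glyphs are invented for the example.

```python
from itertools import combinations


def pixel_difference(a, b):
    """Number of positions where two equally sized 0/1 bitmaps disagree."""
    return sum(
        x != y
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def closest_pairs(glyphs):
    """Return (difference, name1, name2) for every pair, smallest first."""
    scored = [
        (pixel_difference(glyphs[p], glyphs[q]), p, q)
        for p, q in combinations(sorted(glyphs), 2)
    ]
    return sorted(scored)


# Hypothetical 3x3 bitmaps standing in for 12x12 character bitmaps.
glyphs = {
    "g1": [[1, 1, 1], [0, 0, 1], [1, 1, 1]],
    "g2": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "g3": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

pairs = closest_pairs(glyphs)
for diff, p, q in pairs:
    print(diff, p, q)
```

With 3,755 characters there are 3,755 × 3,754 / 2 = 7,048,135 pairs to compare, which fits with a run time measured in hours.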

However, I am not satisfied with these results, because they ignore one problem: even when two pairs differ by the same number of pixels, the differences occur in different places, so the perceived visual difference is not the same. For example, among the pairs that differ by only four pixels, most people would say that (Shi, Earth) look far more alike than (top, Earth). A simpler example illustrates the situation:

Figure A and Figure B differ by only one pixel, and so do Figure A and Figure C, yet to the human eye Figure C looks more similar to Figure A. Why? Perhaps this is the difference between humans and machines. A machine knows the exact location of every pixel, but people find that hard; in general they can only tell roughly where each pixel is. To simulate human perception, I thought of blurring all the Chinese characters so that each pixel leaves a faint shadow around it. This amounts to quantifying differences in shape as a nearsighted person would see them.

After the three example figures are blurred and converted to 256-level grayscale, the sum of squared differences between the gray values of corresponding pixels is 33699 for Figure A and Figure B and 29330 for Figure A and Figure C, so the latter is clearly smaller than the former. After a few more hours, Mathematica finally found the 50 pairs of characters closest in shape in this sense:

(JI, already), (unexpectedly, competing), (Ming, Wu), (Cambodia, bundle), (shell, bright), (inclusive, combined), (free, rabbit), (POD, English), (Shi, Earth), (Yi, straight)
(Hehe, Jing), (DU, ), (Fu, lost), (Shi, Jing), (XI, Yin), (Wei, Zhu), (Wei, wai), (check, pick up), (taoban, taste), (bucket, poke)
(End, not), (cowardly, Ru), (end, end), (upper, Earth), (soldiers, secrets), (Su, suo), (Minister, giant), (, Jin), (cover, Wei), (Huai, Wei)
(Excellent, worried), (officer, Yan), (Block, grade), (alcohol, yeast), (yellow, twist), (cocoon, seedling), (child, A few), (canopy, canopy), (supply, flood), (power, curtain)
(Flat, shoulder), (expensive, greedy), (gold, full), (borrow, cherish), (, house), (analyze, fold), (small, large), (big, too), (quiet, pretty), (loss, vector)
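The blurred comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the blur here is a simple 3×3 mean filter with cells outside the grid treated as empty, whereas the author's actual blur settings are not given, and the three 7×7 figures are made up (a vertical bar plus one dot that is moved by one cell in `fig_c` and moved far away in `fig_b`). The numbers it prints are therefore not the 33699 and 29330 quoted earlier.

```python
def box_blur(bitmap, radius=1):
    """Mean-filter a 0/1 bitmap into gray values in the range 0-255.

    Assumption for this sketch: cells outside the grid count as 0.
    """
    height, width = len(bitmap), len(bitmap[0])
    window = (2 * radius + 1) ** 2
    blurred = []
    for r in range(height):
        row = []
        for c in range(width):
            total = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += bitmap[rr][cc]
            row.append(255 * total / window)
        blurred.append(row)
    return blurred


def blurred_distance(a, b, radius=1):
    """Sum of squared gray-value differences between two blurred bitmaps."""
    blur_a, blur_b = box_blur(a, radius), box_blur(b, radius)
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(blur_a, blur_b)
        for x, y in zip(row_a, row_b)
    )


def pixel_difference(a, b):
    """Number of cells where two 0/1 bitmaps disagree (no blurring)."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def figure(dot_row, dot_col, size=7):
    """Hypothetical test figure: a vertical bar in column 1 plus one dot."""
    grid = [[0] * size for _ in range(size)]
    for r in range(1, size - 1):
        grid[r][1] = 1
    grid[dot_row][dot_col] = 1
    return grid


fig_a = figure(2, 3)
fig_b = figure(5, 5)  # dot moved far away
fig_c = figure(2, 4)  # dot moved by one cell

print(pixel_difference(fig_a, fig_b), pixel_difference(fig_a, fig_c))
print(blurred_distance(fig_a, fig_b), blurred_distance(fig_a, fig_c))
```

In this sketch a moved dot counts as two differing cells, so the plain pixel difference is the same for both comparisons. After blurring, the figure whose dot moved by only one cell comes out closer, because the shadows of the old and new dot positions overlap.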

 
 
 
How similar are these characters? Let's use the first six pairs in the list above to make a "Chinese character eye chart":

 
 
 
