Use Mathematica to find the most similar Chinese Characters

Last Update:2018-12-05 Source: Internet

Author: User

Mathematica provides a function, Rasterize, that outputs the result of a computation as an image. For example, a single statement can render the expansion of (x + 1)^n as a picture.

Today it occurred to me that this function makes it easy to analyze the shapes of Chinese characters as images. The function Binarize converts an image to pure black and white, and ImageData converts an image into an array for quantitative analysis. A single statement can therefore convert a Chinese character into a 12×12 matrix of 0s and 1s.
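As a rough illustration of the binarization step, here is a minimal Python sketch. It is not the author's Mathematica code: the function name binarize, the 0.5 threshold, the tiny 4×4 grayscale sample, and the convention that 1 marks an ink pixel are all assumptions made for the example.

```python
# Sketch only: thresholds a grayscale image (0.0 = black ink, 1.0 = white
# paper) into a 0/1 matrix. The sample data and the polarity (1 = ink)
# are assumptions, not taken from the article.

def binarize(gray, threshold=0.5):
    """Return a 0/1 matrix with 1 wherever the pixel is darker than threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

# Hypothetical 4x4 grayscale "glyph"; a real character would be 12x12.
gray_glyph = [
    [1.0, 0.2, 0.1, 1.0],
    [1.0, 0.9, 0.1, 1.0],
    [1.0, 0.8, 0.0, 1.0],
    [0.9, 0.1, 0.1, 0.3],
]

bitmap = binarize(gray_glyph)
for row in bitmap:
    print("".join(str(bit) for bit in row))
```

Once a character is in this form, every later step works on plain lists of 0s and 1s.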

Another statement sorts the 3,755 most common (level-1) Chinese characters in GB2312 by the number of pixels they occupy in the 12×12 bitmap.
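The sorting step might look like the following Python sketch. The three 3×3 bitmaps and their labels are hypothetical stand-ins for real 12×12 character bitmaps, and pixel_count is a name introduced here for illustration.

```python
# Sketch only: rank glyphs by how many pixels are set in their bitmaps.
# The glyph names and 3x3 bitmaps below are made up for the example.

def pixel_count(bitmap):
    """Number of pixels set to 1 in a 0/1 matrix."""
    return sum(sum(row) for row in bitmap)

glyphs = {
    "box":   [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "bar":   [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "cross": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}

# Fewest pixels first; the ends of this list are the sparsest and
# densest glyphs respectively.
ranked = sorted(glyphs, key=lambda name: pixel_count(glyphs[name]))
print(ranked)
```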

The characters with the fewest pixels are:

The 10 characters with the most pixels are:

I have seen many posts online along the lines of "find me in three seconds" and "have you ever eaten Kang Shuai Bo instant noodles?", and I can't help marveling at how closely Chinese characters can resemble one another. That got me wondering: which Chinese characters look the most alike? I used the functions above to write a Mathematica program, and after several hours of computation it produced the pairs of level-1 characters whose 12×12 bitmaps differ in the fewest pixels. Exactly one pair differs by a single pixel: "self" and "already" (己 and 已). Other results are as follows:

Only 2 pixels apart: (Ming, Wu), (Cambodia, bundle), (unexpectedly, competing)
Only 3 pixels apart: (shell, bright), (inclusive, combined)
Only 4 pixels apart: (top, Earth), (free, rabbit), (soldiers, soldiers), (Shi, Earth)
Only 5 pixels apart: (husband, loss), (Minister, giant), (Wei, Zhu), (Yi, straight)
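
The pair search behind these results can be sketched in Python as below. This is an illustrative brute-force version, not the author's program: the helper names and the three 3×3 bitmaps are hypothetical. With 3,755 characters there are 3755 × 3754 / 2 = 7,048,135 pairs to compare, which helps explain the running time of several hours.

```python
# Sketch only: brute-force search for the glyph pairs that differ in the
# fewest pixels. Glyph names and bitmaps are made up for the example.
from itertools import combinations

def pixel_difference(a, b):
    """Count positions where two equally sized 0/1 matrices disagree."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def closest_pairs(glyphs, limit):
    """Return the `limit` pairs with the smallest pixel difference."""
    scored = [(pixel_difference(glyphs[m], glyphs[n]), m, n)
              for m, n in combinations(sorted(glyphs), 2)]
    return sorted(scored)[:limit]

glyphs = {
    "bar":   [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "cross": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    "tee":   [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
}

for distance, first, second in closest_pairs(glyphs, 2):
    print(distance, first, second)
```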

However, I am not satisfied with these results, because they ignore one problem: two pairs can differ by the same number of pixels, yet look very different to the eye depending on where the differing pixels fall. For example, among the pairs that differ by four pixels, (Shi, Earth) looks far more alike than (top, Earth). A simpler example illustrates the point:

Figure A and Figure B differ by only one pixel, and so do Figure A and Figure C, yet to the human eye Figure C looks more similar to Figure A. Why? Perhaps this is the difference between humans and machines. A machine knows the exact location of every pixel, but people find that hard; generally they can tell only roughly where each pixel is. To simulate human perception, I decided to blur every character so that each pixel leaves a faint shadow around it. This amounts to measuring differences in shape as a nearsighted person would see them.

After the three figures above are blurred and converted to 256 gray levels, the sum of squared differences between the gray values of corresponding pixels is 33699 for Figure A and Figure B and 29330 for Figure A and Figure C, so the latter is clearly smaller than the former. After a few more hours, Mathematica found the 50 pairs of characters that are closest in shape in this sense:

(JI, already), (unexpectedly, competing), (Ming, Wu), (Cambodia, bundle), (shell, bright), (inclusive, combined), (free, rabbit), (POD, English), (Shi, Earth), (Yi, straight)
(Hehe, Jing), (DU, ), (Fu, lost), (Shi, Jing), (XI, Yin), (Wei, Zhu), (Wei, wai), (check, pick up), (taoban, taste), (bucket, poke)
(End, not), (cowardly, Ru), (end, end), (upper, Earth), (soldiers, secrets), (Su, suo), (Minister, giant), (, Jin), (cover, Wei), (Huai, Wei)
(Excellent, worried), (officer, Yan), (Block, grade), (alcohol, yeast), (yellow, twist), (cocoon, seedling), (child, A few), (canopy, canopy), (supply, flood), (power, curtain)
(Flat, shoulder), (expensive, greedy), (gold, full), (borrow, cherish), (, house), (analyze, fold), (small, large), (big, too), (quiet, pretty), (loss, vector)
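
The blurred comparison described above can be sketched in Python as follows. Everything here is an assumption made for illustration: the article does not say which blur was used, so a simple 3×3 box sum stands in for it, and the three single-dot 7×7 images are toy figures, not the article's Figures A, B, and C. In the toy, a dot is moved by one column in one case and by four columns in the other; both moves change the raw bitmap by the same amount, but after blurring the small move scores as much closer.

```python
# Sketch only: compare bitmaps after blurring, using the sum of squared
# differences. The 3x3 box sum and the toy images are assumptions.

def box_blur(bitmap):
    """Replace each pixel by the sum of its 3x3 neighbourhood (zero outside)."""
    height, width = len(bitmap), len(bitmap[0])
    return [[sum(bitmap[rr][cc]
                 for rr in range(max(r - 1, 0), min(r + 2, height))
                 for cc in range(max(c - 1, 0), min(c + 2, width)))
             for c in range(width)]
            for r in range(height)]

def squared_difference(a, b):
    """Sum of squared differences between corresponding pixels."""
    return sum((pa - pb) ** 2 for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def dot(row, col, size=7):
    """A size x size image with a single pixel set at (row, col)."""
    return [[1 if (r, c) == (row, col) else 0 for c in range(size)]
            for r in range(size)]

fig_a, fig_b, fig_c = dot(3, 1), dot(3, 5), dot(3, 2)

raw = (squared_difference(fig_a, fig_b), squared_difference(fig_a, fig_c))
blurred = (squared_difference(box_blur(fig_a), box_blur(fig_b)),
           squared_difference(box_blur(fig_a), box_blur(fig_c)))
print("raw:", raw)          # identical scores for both comparisons
print("blurred:", blurred)  # the nearby move now scores lower
```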

How similar are these characters? Let's use the first six pairs in the list above to make a "Chinese character eye chart":

This article is an English version of an article originally written in Chinese on aliyun.com.
