Those on the Tokyo University Admission poster (Part 2)

Last Update:2018-12-04 Source: Internet

Author: User


I followed suit as well; my notes and the process are below.

I chose the binary code on the girl's umbrella. The first step is OCR of the 0s and 1s, for which I used Tesseract-OCR on Linux. Its usage is detailed in this post, although the post is a little old. For more information about the configuration process, see the official documentation.

A warning up front: even Tesseract-OCR is not accurate enough =! (See what the people in stage 1 used, Wang Han. I did not go back and try it. I kept going with the 0s and 1s I had and moved on to the next step ..) Here is what I did.

First, download the Tesseract-OCR source code from Google Code, resolve the dependencies according to the documentation, and then configure and install it. Next, download a language pack. The list includes a math/equation package, but you do not need it to recognize digits; the eng package is enough. Decompress the package into tessdata. If you installed to a custom directory, you need to export the TESSDATA_PREFIX environment variable so that Tesseract can find the tessdata sub-folder (if it cannot find the training data, Tesseract prompts you to set it); otherwise, put the files in /usr/local/share/tessdata. The latest Tesseract already supports many image formats, so converting the image with convert is no longer necessary.

Run tesseract 1.png 1.out -l eng. The result is nothing but a mess .. To improve the recognition accuracy, you can crop one image per row and process the rows one by one. You can even use Photoshop to increase the contrast and sharpness of the digits in the image, and then recognize it with tesseract.

This is a lot better, but the results still contain many letters. We only need digits, so letters have no place in the output, and besides, the result is still wrong .. You can pass the digits config to Tesseract so that it recognizes only digits: tesseract 1.png 1.out -l eng digits. The result now contains things like 94... So narrow the scope further, so that Tesseract recognizes only 0 and 1. No such config ships by default, so we write one by hand.

In /usr/local/share/tessdata/configs, run sudo cp digits binarys, and then edit binarys so that only 0 and 1 are recognized. The final content of binarys is as follows: tessedit_char_whitelist 01

Now run tesseract 1.png 1.out -l eng binarys. The results are much better. Even so, the output can never be fully trusted; you have to check Tesseract's result digit by digit.

To sum up, OCR is meant for meaningful text, where very high accuracy is not required. Recognizing binary code, however, demands a very high accuracy, because a single wrong bit breaks everything. Once there are no errors, proceed to the next step =!

The next step is to turn the recognized binary text into something meaningful. In Part 1, the author counts the digits in each row, the number of rows, and whether the total number of digits is a multiple of 8. Figuring out what these binary codes actually are is the most difficult step, and it takes a lot of imagination. For the code on the girl's umbrella, writing the bits directly into a binary file gives the input for the next step. The red and dignified pictures show the one. The author eventually noted that a stretch of the binary code repeats, that the last repetition contains a bit of noise, and simply cut that part out of the analysis. Whatever else, I always felt this was not quite clean .. It is still the most difficult step.
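The checks mentioned above (digits per row, number of rows, total a multiple of 8) are easy to automate. Below is a minimal C++ sketch; the names BitStats and analyzeBits are made up for illustration and do not come from Part 1. It also counts stray characters, which helps spot OCR mistakes such as a letter O in place of a 0.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Summary of OCR output that is supposed to contain only '0' and '1'.
struct BitStats {
    std::vector<std::size_t> digitsPerRow;  // number of 0/1 characters in each row
    std::size_t totalDigits = 0;            // sum over all rows
    std::size_t strayChars = 0;             // characters that are neither 0/1 nor whitespace
    bool wholeBytes() const { return totalDigits % 8 == 0; }
};

// Hypothetical helper: scans the recognized text row by row.
BitStats analyzeBits(const std::string& text) {
    BitStats stats;
    std::istringstream rows(text);
    std::string row;
    while (std::getline(rows, row)) {
        std::size_t digits = 0;
        for (char c : row) {
            if (c == '0' || c == '1') {
                ++digits;
            } else if (c != ' ' && c != '\t' && c != '\r') {
                ++stats.strayChars;  // likely an OCR error
            }
        }
        stats.digitsPerRow.push_back(digits);
        stats.totalDigits += digits;
    }
    return stats;
}
```

If strayChars is non-zero or wholeBytes() is false, the OCR output probably still needs manual correction before it is worth converting.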

The following code writes the recognized binary text into a binary file. Note that this code is not general-purpose, because it assumes that the total number of binary digits is a multiple of 8 (I have not written C++ in a long time, so it is rough).

#include <fstream>
#include <iostream>
using namespace std;

int main() {
    ifstream infile;
    infile.open("binary");  // text file containing the recognized 0s and 1s
    if (infile) {
        cout << "Open success!" << endl;
        ofstream outfile;
        outfile.open("binaryout", ofstream::out | ofstream::binary);
        if (!outfile) {
            cout << "Open Write File Failure" << endl;
            return 0;
        } else {
            cout << "Open outfile success!" << endl;
        }
        char in, out, tmp;
        int count = 0;           // bits collected for the current byte
        int totalinterator = 0;  // characters read so far (progress output only)
        out = 0;
        while (true) {
            cout << "totalinterator: " << totalinterator << endl;
            totalinterator++;
            in = infile.get();
            if (!infile.good()) {
                break;
            }
            if (in == 10) {  // skip newline characters
                continue;
            }
            // The first digit read becomes the most significant bit of the byte.
            tmp = (in - '0') << (7 - count);
            out = out | tmp;
            count++;
            if (count == 8) {
                count = 0;
                outfile.put(out);
                outfile.flush();
                out = 0;
            }
        }
        outfile.flush();
        outfile.close();
        infile.close();
        cout << "done" << endl;
    } else {
        cout << "Open error!" << endl;
    }
    return 0;
}
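For input that is not guaranteed to be a clean multiple of 8, the same packing idea can be written as a function that skips anything other than 0 and 1 and reports how many bits were left over. This is only a sketch under those assumptions; packBits and leftoverBits are illustrative names, not part of the program above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs '0'/'1' characters into bytes, most significant bit first.
// Every other character (newlines, spaces) is skipped. leftoverBits receives the number
// of trailing bits that did not fill a whole byte; those bits are dropped.
std::vector<unsigned char> packBits(const std::string& text, std::size_t& leftoverBits) {
    std::vector<unsigned char> bytes;
    unsigned char current = 0;
    int count = 0;
    for (char c : text) {
        if (c != '0' && c != '1') {
            continue;
        }
        current = static_cast<unsigned char>((current << 1) | (c - '0'));
        if (++count == 8) {
            bytes.push_back(current);
            current = 0;
            count = 0;
        }
    }
    leftoverBits = static_cast<std::size_t>(count);
    return bytes;
}
```

A non-zero leftoverBits is a hint that a digit was lost or duplicated during OCR.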

In Part 1, the author repeatedly mentions big-endian and little-endian, but the concept does not apply here =! Can it really be that every one of his final conclusions is big-endian? Big-endian and little-endian describe how multi-byte data is stored: big-endian puts the most significant byte at the lowest address, little-endian puts the least significant byte at the lowest address. The unit is the byte, which is why they are also called high byte order and low byte order. An ASCII character is a single byte, so byte order does not matter for it. What the author may have meant is whether the bits within a byte are stored in forward or reversed order. I do not think that is necessary to consider, unless the data is some kind of cipher.
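To make the distinction concrete, here is a small C++ sketch (all function names are made up for illustration). The first two functions show byte order, which only matters for multi-byte values; the third reverses the bits inside one byte, which is a different operation altogether.

```cpp
#include <array>
#include <cstdint>

// Big-endian layout: the most significant byte comes first (lowest address).
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)}};
}

// Little-endian layout: the least significant byte comes first (lowest address).
std::array<std::uint8_t, 4> toLittleEndian(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v), static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24)}};
}

// Reverses the bit order within a single byte. This is not endianness.
std::uint8_t reverseBits(std::uint8_t b) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = static_cast<std::uint8_t>((r << 1) | (b & 1));
        b = static_cast<std::uint8_t>(b >> 1);
    }
    return r;
}
```

For example, reverseBits(0x41) turns 01000001 into 10000010, while a single byte such as 0x41 looks the same under both byte orders.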

To determine the type of the binary file we just wrote, check its first few bytes. The type is generally given away by the first three bytes or so. I searched the Internet for some common signatures and list them here.

SWF (swf): "435753" or "465753"
JPEG (jpg): "ffd8ff"
PNG (png): "89504e47"
GIF (gif): "47494638"
TIFF (tif): "49492a00"
Bitmap (bmp): "424d"
CAD (dwg): "41433130"
XML (xml): "3c3f786d6c"
HTML (html): "68746d6c3e"
Word/Excel (xls or doc): "d0cf11e0"
Access (mdb): "5374616e64617264204a"
ZIP Archive (zip): "504b0304"
RAR Archive (rar): "52617221"
Wave (wav): "57415645"
AVI (avi): "41564920"
Real Media (rm): "2e524d46"
MPEG (mpg): "000001ba" or "000001b3"
QuickTime (mov): "6d6f6f76"
Windows Media (asf): "3026b2758e66cf11"

Because Linux does not rely on the extension to identify file types, GNOME recognizes this file as gzip at a glance. Otherwise, we would look up the signature and open the file with the corresponding software.
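The same kind of signature check can be sketched in a few lines of C++. The function name sniffType and the small signature table are illustrative assumptions; the gzip signature 1f 8b and the Java class signature ca fe ba be are added here because they are the ones that matter for this puzzle.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: guesses a file type from the leading bytes of a file.
std::string sniffType(const std::vector<unsigned char>& head) {
    struct Signature {
        const char* name;
        std::vector<unsigned char> magic;
    };
    static const std::vector<Signature> known = {
        {"gzip", {0x1f, 0x8b}},
        {"java class", {0xca, 0xfe, 0xba, 0xbe}},
        {"zip", {0x50, 0x4b, 0x03, 0x04}},
        {"png", {0x89, 0x50, 0x4e, 0x47}},
    };
    for (const Signature& sig : known) {
        // A signature matches when the file starts with exactly these bytes.
        if (head.size() >= sig.magic.size() &&
            std::equal(sig.magic.begin(), sig.magic.end(), head.begin())) {
            return sig.name;
        }
    }
    return "unknown";
}
```

To use it, read the first few bytes of binaryout into a vector (for example with an ifstream opened in binary mode) and pass them in.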

Add the extension .gz to the binary file (mine is named binaryout) and decompress it on the command line: gunzip binaryout.gz. Or, to keep the original archive: gunzip -c binaryout.gz > binaryout

Without the extension, gunzip does not decompress the file by default unless the -f option is added.

Then the gorgeous class file comes out ~ It even starts with CAFE BABE ~

You can take a look at the binary file: vim -b binaryout opens it in binary mode, and inside Vim you can convert it to a convenient hexadecimal view with :%!xxd. There is an article about this.
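If Vim and xxd are not at hand, a hex view of the first bytes takes only a few lines of C++. This is a bare sketch, and toHex is an illustrative name, not an existing tool.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: renders bytes as lowercase hexadecimal, two digits per byte.
std::string toHex(const std::vector<unsigned char>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0f]);  // low nibble
    }
    return out;
}
```

Applied to the first four bytes of a Java class file, this yields "cafebabe".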

It is normal that running the file directly fails with a class-not-found error, because Java requires the file name to match the class name. If the main program class is named A, the compiled class file is named A.class, and only in that form can we run it. In Part 1, the error thrown by Java directly says that the class name is I. I did not get that message, only that the class could not be found, but it is the same step either way. So which class name is it? Decompile the file with JD and check the result. I am not posting the code here; the full code is posted on the 62nd floor of the thread.

Then you can rename binaryout to I.class, or simply copy the code into Eclipse and run it. The result is a URL.

In fact, this Java code is also quite entertaining. How does this messy string become a URL?

The following code prints the index accessed at each step.

#include <stdio.h>

int main() {
    int arr[42];  /* marks which indexes have already been visited */
    int x, i;
    for (i = 0; i < 42; i++) {
        arr[i] = 0;
    }
    for (i = 0; i < 50; i++) {
        x = i * 6 % 41;
        printf("%d ", x);
        if (arr[x] == 0) {
            arr[x] = 1;
        } else {
            printf("collision!");
        }
        printf("\n");
    }
    return 0;
}

The output shows that the first 41 indexes do not collide at all, which is expected because 6 and 41 share no common factor. And the length of the URL www.i.u-tokyo.ac.jp/fun/hikari-loveletter is exactly 41! So each character of the URL is simply moved to a new position, which produces the scrambled string w-aurlwtcniewo./-t.kjhltiypioe.o/kvru.fae.
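The unscrambling can be reproduced with a short C++ sketch. This is not the decompiled Java code, which is not shown here; it only assumes that character i of the URL is read from index i * 6 % 41 of the scrambled string, and the function name unscramble is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: character i of the result is read from
// index (i * step) % n of the scrambled text, where n is its length.
std::string unscramble(const std::string& scrambled, std::size_t step) {
    const std::size_t n = scrambled.size();
    std::string out(n, '\0');
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scrambled[(i * step) % n];
    }
    return out;
}
```

Calling unscramble("w-aurlwtcniewo./-t.kjhltiypioe.o/kvru.fae", 6) gives back www.i.u-tokyo.ac.jp/fun/hikari-loveletter. The step must share no common factor with the length, otherwise some indexes are visited twice and others never.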

The University of Tokyo really put thought into this .. Puzzles, kimono, a cute girl, poetry, and music. This poster has it all.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
