Analyzing the protagonists of Jin Yong's novels with Python: he is the real protagonist!

Source: Internet
Author: User

Fan Monologue

Any discussion of martial arts (wuxia) novels has to mention the three great masters of the genre: Jin Yong, Liang Yusheng, and Gu Long. Since the 1970s and 1980s, a large number of their classics have been adapted for the screen. I have read almost everything the three masters wrote, and after learning Python and data analysis I found a lot of interesting things in their books. Today we use data analysis to explore martial arts fiction.

Points:

- Who is the protagonist (Jin Yong)

- Word-use habits (Liang Yusheng)

One: Who is the protagonist of Jin Yong's novels

Tian Long Ba Bu ("Demi-Gods and Semi-Devils") is a novel with several protagonists. The three sworn brothers Xiao Feng, Xu Zhu, and Duan Yu each have their own fortuitous encounters, and the question of which of them is the first lead once caused a controversy.

Now that we know how to approach a problem with data, let's see how to analyze a Chinese novel. Appearance rate is an important index for evaluating a character in a novel, so we start with a statistical analysis of how often each character appears in Tian Long Ba Bu.

1. Word segmentation

Word segmentation is the basis of Chinese text processing. Because Chinese is so rich and flexible, segmenting it is considerably harder than segmenting English. Fortunately Python has many Chinese word segmentation libraries, and jieba is one of the more popular ones. Let's try it out.

Because the file is too large, only a fixed-length string is read at a time.
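The chunked reading described above can be sketched as follows. This is a minimal sketch, not the author's original code: the helper names `read_chunks` and `segment_file` and the default chunk size are assumptions made for illustration, while `jieba.lcut` is the library's own function that returns a list of words.

```python
from typing import Iterator, List


def read_chunks(path: str, chunk_size: int = 4096,
                encoding: str = "utf-8") -> Iterator[str]:
    """Yield the file's text in pieces of at most chunk_size characters."""
    with open(path, "r", encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_size)  # in text mode this counts characters
            if not chunk:
                break
            yield chunk


def segment_file(path: str, chunk_size: int = 4096) -> List[str]:
    """Segment a large text file chunk by chunk with jieba."""
    import jieba  # third-party: pip install jieba

    words: List[str] = []
    for chunk in read_chunks(path, chunk_size):
        words.extend(jieba.lcut(chunk))
    return words
```

One caveat of fixed-length reads is that a chunk boundary can fall in the middle of a word or a name, so a few tokens near each boundary may be split incorrectly. With large chunks the effect on overall counts is small.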


    • jieba is very simple to use: a few lines of code complete the segmentation. But look carefully at the output and you will find some problems.

    • "Duan Yu" is not always separated out as a name but is sometimes joined to a following verb, and some character names are split into two or more words. For example, "Fairy Sister" is split into "fairy" and "sister".

    • This is no surprise. Chinese is very flexible, and a word often has several layers of meaning and a variety of usages. Using jieba directly evidently introduces sizable errors, so we have to find a way to fix this, or it will distort the analysis results.


2. Optimization

jieba ships with its own dictionary and has strong new-word recognition, but loading a custom dictionary gives higher accuracy.

Because our goal is to analyze character names, we first found a complete list of Tian Long Ba Bu characters compiled by an enthusiastic netizen, and then turned it into a dictionary file, my_dict.txt, following the format of "jieba/dict.txt": one word per line, each line made up of three parts separated by spaces, in this fixed order: the word, the word frequency (optional), and the part of speech (optional).
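To make the dictionary format concrete, here is a small sketch that writes such a file. The helper names `format_entry` and `write_user_dict`, and any frequencies you pass in, are assumptions for illustration only. The tag "nr" is the part-of-speech label jieba uses for person names.

```python
from typing import Iterable, Optional, Tuple

# (word, optional frequency, optional part-of-speech tag)
Entry = Tuple[str, Optional[int], Optional[str]]


def format_entry(word: str, freq: Optional[int] = None,
                 tag: Optional[str] = None) -> str:
    """Build one dictionary line: word, then frequency, then tag, space-separated."""
    parts = [word]
    if freq is not None:
        parts.append(str(freq))
    if tag is not None:
        parts.append(tag)
    return " ".join(parts)


def write_user_dict(entries: Iterable[Entry], path: str = "my_dict.txt") -> None:
    """Write one entry per line in the order the format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for word, freq, tag in entries:
            f.write(format_entry(word, freq, tag) + "\n")
```

For example, `write_user_dict([("段誉", None, "nr"), ("神仙姐姐", None, None)])` produces a two-line file in which the frequency is omitted and left for jieba to compute.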


Use the jieba.load_userdict() function to load the dictionary into the program. If other problems show up while the program is running, functions such as add_word() and suggest_freq() can adjust the dictionary and word frequencies dynamically.

    • load_userdict(file_name) imports a custom dictionary; file_name is a file-like object or the path to the custom dictionary

    • add_word() dynamically adds a word to the dictionary in the program

    • del_word() dynamically deletes a word from the dictionary in the program

    • suggest_freq() adjusts the frequency of an individual word so that it can (or cannot) be split out

This solves the problem of names being split by accident. For cases such as "Duan Yu listens" and "Duan Yu sees", where the name is not fully separated from the following word, we can use suggest_freq() to adjust the word frequency and force the split.
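The four calls above can be combined roughly as follows. This is a sketch under assumptions: the helper `tune_segmenter` and the example words are illustrative, not taken from the original program. The helper accepts either the `jieba` module itself or a `jieba.Tokenizer` instance, since both expose the same four methods.

```python
from typing import Any


def tune_segmenter(tokenizer: Any, dict_path: str = "my_dict.txt") -> None:
    """Apply dictionary adjustments to the jieba module or a jieba.Tokenizer."""
    # Load the custom name dictionary (a path or a file-like object).
    tokenizer.load_userdict(dict_path)

    # Add a name that was being split into two words (illustrative example).
    tokenizer.add_word("神仙姐姐")

    # Remove a wrongly merged "name + verb" token (hypothetical example).
    tokenizer.del_word("段誉听")

    # Passing a tuple with tune=True asks jieba to adjust frequencies so that
    # the two parts are segmented as separate words.
    tokenizer.suggest_freq(("段誉", "听"), True)
    tokenizer.suggest_freq(("段誉", "见"), True)
```

Typical usage would be `import jieba; tune_segmenter(jieba)` followed by `jieba.lcut(text)`. Passing the tokenizer in as an argument keeps the adjustments in one place and makes them easy to apply to a separate `jieba.Tokenizer()` if you do not want to change the global one.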

But all of this felt like too much trouble, so in the end I decided to use regular expressions to pull out the names we need and filter out the information we don't need (words that don't contain a character's name).
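The article does not show the regular expression that was used, so the following is only one possible reading of this step: build an alternation pattern from the list of character names and collect every match. The function names `build_name_pattern` and `extract_names` are assumptions for illustration.

```python
import re
from typing import Iterable, List


def build_name_pattern(names: Iterable[str]) -> re.Pattern:
    """Compile a pattern that matches any of the given names."""
    # Try longer names first so a short name never shadows a longer one
    # that starts with the same characters.
    ordered = sorted(set(names), key=len, reverse=True)
    return re.compile("|".join(re.escape(name) for name in ordered))


def extract_names(text: str, names: Iterable[str]) -> List[str]:
    """Return every occurrence of a known name, in order of appearance."""
    names = list(names)
    if not names:
        return []  # an empty pattern would match at every position
    return build_name_pattern(names).findall(text)
```

The same function works on raw text or on individual tokens from the segmenter. Applied to a token such as "段誉听", it returns just the name and drops the trailing verb.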

3. Charting

After these two steps we have a file of the character names that appear in Tian Long Ba Bu, which we can easily read and convert to a list.

Now the data is at our mercy, like fish on a chopping block. After some simple data processing we get the frequency with which each character's name appears in the novel. Because Xiao Feng and Qiao Feng are the same person, we merge the counts for the two names to simplify the statistics.
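The counting and merging step can be sketched with `collections.Counter`. The file layout (one name per line) and the helper names `load_names` and `count_names` are assumptions; the only alias taken from the article is that Qiao Feng (乔峰) and Xiao Feng (萧峰) are the same person.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Map alternative names to one canonical name before counting.
ALIASES: Dict[str, str] = {"乔峰": "萧峰"}


def load_names(path: str) -> List[str]:
    """Read extracted names into a list, assuming one name per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def count_names(names: Iterable[str],
                aliases: Dict[str, str] = ALIASES) -> Counter:
    """Count appearances, merging aliases into their canonical name."""
    return Counter(aliases.get(name, name) for name in names)
```

Merging before counting, rather than adding two totals afterwards, keeps the logic in one place if more aliases turn up later.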

Then take the 30 characters with the highest appearance rate and show them in a chart.
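The article does not say which plotting library produced the chart. As a sketch, assuming matplotlib and a simple bar chart, the top-30 selection and plot could look like this. The function names and the output file name are illustrative.

```python
from collections import Counter
from typing import List, Tuple


def top_characters(counts: Counter, n: int = 30) -> List[Tuple[str, int]]:
    """Return the n most frequent names with their counts, highest first."""
    return counts.most_common(n)


def plot_top_characters(counts: Counter, n: int = 30,
                        out_path: str = "top_characters.png") -> None:
    """Save a bar chart of the n most frequent names."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib

    # Chinese labels need a font with CJK glyphs, for example by setting
    # plt.rcParams["font.sans-serif"] to a font installed on your system.
    top = top_characters(counts, n)
    names = [name for name, _ in top]
    values = [value for _, value in top]

    fig, ax = plt.subplots(figsize=(12, 5))
    ax.bar(names, values)
    ax.set_xlabel("Character")
    ax.set_ylabel("Name occurrences")
    ax.tick_params(axis="x", rotation=90)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```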


Although many people feel that Xiao Feng fits the role of the book's main character better, the analysis shows that the name "Duan Yu" appears most frequently in the novel.

Xiao Feng (Qiao Feng) follows closely, so it is no wonder people argue about who the first lead is. But a character's importance cannot be judged by appearance count alone. Murong Bo, Xiao Yuanshan, Xuanci, and Duan Zhengchun, for example, do not appear especially often.

Yet it is the feuds of this older generation that drive the twists and turns in the story of Xiao Feng and his two sworn brothers; they are in effect the protagonists of a second, hidden plot line.

We will not settle the question of the main character here, but the appearance frequencies of some of the other characters are also quite interesting to ponder.

In fact, the central theme of Tian Long Ba Bu is "wanting what one cannot have":

    • Duan Yu did not want to learn kung fu but became a peerless master

    • He pursued the peerless beauty Wang Yuyan, yet she would not give up on Murong Fu

    • Xiao Feng was determined to defend the Great Song, never imagining that he was himself a Khitan

    • He decided to go away and herd horses with the woman he loved, but fate played its trick, and she died by his own hand

The world is often hard to control, so in our everyday reading and study there is no need to be impatient or to chase quick success. Stay calm and practice steadily. Even if the goal cannot be fully realized, you may run into the situation in the saying "plant flowers with care and they do not bloom; stick in a willow twig without a thought and it grows into shade."

Two: Word-use habits in the novels

Every writer has a personal style, and that style is reflected to a large extent in word choice. We have just analyzed a work by Jin Yong; this time we look at two representative works by Liang Yusheng, who is regarded as the founder of the new school of martial arts novels: "Thou wilt" and "Yunhai Yugong Yuan".

1. Extracting words

Since this is Chinese text analysis, the first step is of course word segmentation. Simple words such as "say", "go", "start", and "ordinary" are used very frequently and do little to reflect writing style, so here we only extract idioms, sayings, and phrases that are at least 4 characters long. Names (for example Waner, Dan Ming) and other proper nouns would also interfere with the analysis, so they are filtered out during segmentation as well. The end result is a file of words.
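The filter described above can be sketched as a single pass over the segmented tokens. The function name `long_words` and the extra check that every character is a CJK ideograph (to drop punctuation runs and numbers) are assumptions added for illustration.

```python
from typing import Iterable, List, Set


def is_chinese(word: str) -> bool:
    """True if every character is in the basic CJK ideograph range."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in word)


def long_words(words: Iterable[str], names: Set[str],
               min_len: int = 4) -> List[str]:
    """Keep tokens of at least min_len characters that are not names."""
    return [
        w for w in words
        if len(w) >= min_len and w not in names and is_chinese(w)
    ]
```

In practice `words` would be the output of the segmenter and `names` a set of character names and other proper nouns collected for the novel being analyzed.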


2. Word cloud

Analyzing word-use habits is more of a qualitative analysis, so we use a word cloud to display it. First we draw the word cloud for "Thou wilt".
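The article does not name the tool used to draw the word cloud. One common choice is the third-party `wordcloud` package, and the sketch below assumes it. The function names, image size, and output path are illustrative, and `font_path` must point to a font file with Chinese glyphs on your machine.

```python
from collections import Counter
from typing import Dict, Iterable


def word_frequencies(words: Iterable[str]) -> Dict[str, int]:
    """Count how often each word occurs."""
    return dict(Counter(words))


def draw_word_cloud(words: Iterable[str], font_path: str,
                    out_path: str = "wordcloud.png") -> None:
    """Render a word cloud image sized by word frequency."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(
        font_path=font_path,        # required for Chinese text to render
        width=800,
        height=600,
        background_color="white",
    )
    cloud.generate_from_frequencies(word_frequencies(words))
    cloud.to_file(out_path)
```

Passing frequencies directly, instead of a raw string, avoids the package's own tokenization, which is designed for space-separated languages.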


We can see that words such as "a slight smile", "laughter", "surprise", and "serious" are used very frequently in this novel. Now look at another work, "Nüdi Qiying Zhuan", whose word cloud is as follows:


It is easy to see that the most frequently used words in this novel are again "a slight smile", "laughter", "surprise", and "serious", and the other frequently used words show similar patterns.

So Mr. Liang's word-use habits are indeed traceable. Interested readers can run the same analysis on the works of other masters or on web novels and may find some very interesting patterns.

Chinese word segmentation is a very interesting subject, especially when exploring these martial arts novels! I am a martial arts fan too and love Jin Yong's novels. From the couplet built out of their titles, "Fei xue lian tian she bai lu, xiao shu shen xia yi bi yuan", every book is a classic. Beyond the two dimensions above, an article analyzing the relationships between characters is also being written. Stay tuned.


If you have any suggestions for this article, please leave a comment! I am still a beginner in Chinese word segmentation and language analysis, so if anything in my technical understanding or my reading of the novels is off, criticism is welcome.

