Based on the Python analysis of the protagonist in Jin Yong's novels, he is the real protagonist!

Source: Internet
Author: User

Fan monologue

Any discussion of martial arts (wuxia) novels has to mention the three great masters of the genre: Jin Yong, Liang Yusheng, and Gu Long. From the 1970s and 1980s onward, a large number of their classics were adapted for the screen. I have read almost everything the three masters wrote, and after learning Python and data analysis I found a lot of interesting things in their books. Today we use data analysis to explore martial arts fiction.

Points:

    • Who is the protagonist (Jin Yong)

    • Word-usage habits (Liang Yusheng)

One: Who is the protagonist of Jin Yong's novels

"Tianlong Babu" ("Demi-Gods and Semi-Devils") is a multi-protagonist novel. The three sworn brothers Xiao Feng, Xu Zhu, and Duan Yu each have their own adventures, and the question of who is the first lead once caused a controversy.

Now that we know how to think with data, let's see how to analyze a Chinese novel. Appearance rate is an important index for evaluating a character, so we start with a statistical analysis of how often each character appears in "Tianlong Babu".

1. Word segmentation

Word segmentation is the basis of Chinese text processing, but because the language is so rich, segmenting Chinese is much harder than segmenting English. Fortunately Python has many Chinese word segmentation libraries, and jieba is one of the more popular ones. Let's try it out.

Because the file is too large, only a fixed-length string is read at a time
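
The original code was shown as an image that did not survive, so the following is only a minimal sketch of reading the novel in fixed-length pieces and segmenting each piece. The function names, the chunk size, and the UTF-8 encoding are assumptions; jieba.lcut() is jieba's list-returning segmentation call.

```python
from typing import Iterator, List


def read_in_chunks(path: str, chunk_size: int = 4096,
                   encoding: str = "utf-8") -> Iterator[str]:
    """Yield the file's text in pieces of at most chunk_size characters."""
    with open(path, encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            yield chunk


def segment_file(path: str, chunk_size: int = 4096) -> List[str]:
    """Segment a large text file chunk by chunk with jieba."""
    import jieba  # third-party package, imported here so the reader above stays stdlib-only

    words: List[str] = []
    for chunk in read_in_chunks(path, chunk_size):
        # A fixed-length chunk can cut a word in half at its boundary,
        # which adds a small error to the later counts.
        words.extend(jieba.lcut(chunk))
    return words
```

Reading line by line would avoid cutting words at chunk boundaries, at the cost of depending on how the source text is wrapped.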

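jieba is very simple to use: a few short lines of code complete the segmentation. On closer inspection, though, something is wrong. "Duan Yu" (段誉) is a name, yet it is not split out on its own and instead stays attached to some of the verbs that follow it. Other character names are split into two or more words; for example "Fairy Sister" (神仙姊姊) is split into "神仙" and "姊姊". This is hardly surprising: Chinese is very flexible, and one word often has several meanings and uses. Using jieba as is therefore still produces considerable error, and we have to solve this problem or it will interfere with the analysis results.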


2. Optimization

jieba ships with its own dictionary and has strong new-word recognition, but loading a custom dictionary gives a higher accuracy rate.

Since the goal here is to analyze character names, we first found a complete character list for "Tianlong Babu" compiled by an enthusiastic netizen, and then turned it into a dictionary file, my_dict.txt, following the format of "jieba/dict.txt": one word per line, each line made up of three space-separated parts in a fixed order: the word, its frequency (optional), and its part of speech (optional).

Use the jieba.load_userdict() function to load it into the program. If other problems turn up while the program runs, functions such as add_word() and suggest_freq() can adjust the dictionary and word frequencies dynamically.

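    • load_userdict(file_name) imports a custom dictionary; file_name is a file-like object or the path to the custom dictionary

    • add_word() adds a word to the dictionary while the program runs

    • del_word() removes a word from the dictionary while the program runs

    • suggest_freq() adjusts the frequency of a single word so that it can (or cannot) be split out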

This solves the problem of names being split by mistake. For tokens such as "Duan Yu listened" (段誉听) and "Duan Yu saw" (段誉见), where the name is still not fully separated from the word after it, we can use the suggest_freq() function to adjust the word frequency and force the split.
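
The dictionary workflow described above might look like the sketch below. The helper names, the default frequency of 1000, and the "nr" (person name) tag are assumptions chosen for illustration; jieba.load_userdict() and jieba.suggest_freq() are the library calls named in the text.

```python
from typing import Iterable, Tuple


def write_user_dict(names: Iterable[str], path: str,
                    freq: int = 1000, tag: str = "nr") -> None:
    """Write one 'word freq tag' line per name, separated by spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(f"{name} {freq} {tag}\n")


def configure_jieba(dict_path: str,
                    force_split: Iterable[Tuple[str, ...]] = ()) -> None:
    """Load the custom dictionary, then force the given splits."""
    import jieba  # third-party package

    jieba.load_userdict(dict_path)
    for parts in force_split:
        # Passing a tuple asks jieba to keep the pieces apart,
        # e.g. ("段誉", "听") so that "段誉听" is no longer a single token.
        jieba.suggest_freq(parts, tune=True)
```

A typical call would be write_user_dict(["段誉", "神仙姊姊"], "my_dict.txt") followed by configure_jieba("my_dict.txt", [("段誉", "听"), ("段誉", "见")]).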

After all this trouble, though, I decided in the end to use regular expressions to pull out the names we need and filter out everything else (words that do not contain a character name).
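
The article does not show its regular expression, so this is one possible sketch rather than the author's method: build a single alternation from the known character names and keep only the matches. The function names and the sample names are assumptions.

```python
import re
from typing import Iterable, List


def build_name_pattern(names: Iterable[str]) -> "re.Pattern":
    """Compile an alternation that matches any of the given names."""
    # Longest names first, so a long name wins over a shorter one it contains.
    ordered = sorted(set(names), key=len, reverse=True)
    return re.compile("|".join(re.escape(n) for n in ordered))


def extract_names(text: str, names: Iterable[str]) -> List[str]:
    """Return every occurrence of a known name, in order of appearance."""
    return build_name_pattern(names).findall(text)
```

Because only the names themselves are matched, a token such as "段誉听" contributes just "段誉" and needs no further tuning of the segmenter.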

3. Charting

After the two steps above we have a file of the character names from "Tianlong Babu", which we can easily read and convert to a list.

Now the data is entirely at our mercy: "I am the knife and chopping block, it is the fish". After some simple data processing we get how often each character's name appears in the novel. Because Xiao Feng and Qiao Feng are the same person, the two names are merged to make the statistics easier.

Then take the 30 characters with the highest appearance counts and show them in a chart.
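
The counting, merging, and top-30 steps can be sketched with collections.Counter. The function names and the output file name are assumptions, the alias table holds only the Qiao Feng / Xiao Feng pair mentioned in the text, and the plotting helper assumes matplotlib is installed and configured with a font that can render Chinese labels.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Alias -> canonical name; only this pair comes from the article.
ALIASES: Dict[str, str] = {"乔峰": "萧峰"}


def count_appearances(names: Iterable[str],
                      aliases: Dict[str, str] = ALIASES) -> Counter:
    """Count name occurrences, folding each alias into its canonical name."""
    return Counter(aliases.get(name, name) for name in names)


def top_characters(counts: Counter, n: int = 30) -> List[Tuple[str, int]]:
    """Return the n most frequent characters as (name, count) pairs."""
    return counts.most_common(n)


def plot_top(counts: Counter, n: int = 30,
             out_path: str = "top_characters.png") -> None:
    """Save a bar chart of the n most frequent characters."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels, values = zip(*top_characters(counts, n))
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.bar(labels, values)
    ax.set_ylabel("appearances")
    ax.tick_params(axis="x", labelrotation=90)
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```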

Although for many readers Xiao Feng fits the image of the book's main character better, the analysis shows that the name "Duan Yu" appears most often in the novel.

Xiao Feng (Qiao Feng) follows closely, so it is no wonder people argue about who the first lead is. But a character's importance cannot be judged by appearance count alone. Murong Bo, Xiao Yuanshan, Xuanci, and Duan Zhengchun, for example, do not appear especially often.

Yet it is the feuds of that older generation that set the stories of Xiao Feng and his two sworn brothers in motion, which makes them the protagonists of another, hidden plot line.

We will not discuss the question of the main character any further. The appearance frequencies of some of the other characters are also quite interesting to ponder.

In fact, the central idea of "Tianlong Babu" is "seeking but not obtaining" (求不得):

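    • Duan Yu did not want to learn martial arts, yet he mastered peerless skills; he single-mindedly pursued Wang Yuyan, but in the end the beauty stayed loyal to Murong Fu

    • Xiao Feng vowed to defend the Great Song, never expecting that he was himself a Khitan; he decided to herd horses beyond the frontier with A'Zhu, but fate played its tricks and the woman he loved died by his own hand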

Things in this world are often hard to control, so in everyday reading and study there is no need to be impatient or chase quick success. Stay calm and practice steadily. Even if the goal is not fully reached, you may run into the situation described by the proverb "flowers planted with care fail to bloom, while a willow stuck in carelessly grows into shade".

Two: Word-usage habits in fiction

Every writer has a personal style, and that style shows to a large extent in the choice of words. We have just analyzed Jin Yong's work; now we look at two representative works by Liang Yusheng, known as the founding father of the new-school martial arts novel: "Thou wilt" and "Yunhai Yugong Yuan".

1. Extracting words

For Chinese text analysis the first step is of course word segmentation. Simple words such as "say", "go", "start", and "ordinary" tend to be used very often and say little about writing style.

So here we extract only idioms, sayings, and phrases that are at least 4 characters long. Some names (for example Waner and Dan Ming) and other proper nouns would also interfere with the results, so they are filtered out in the same pass, which finally gives us a word file.
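
A minimal sketch of this filter, assuming the text has already been segmented into tokens. The function name is hypothetical, and dropping every token that contains an excluded proper noun is one reading of "filtered together", not necessarily the author's exact rule.

```python
from typing import Iterable, List


def long_words(tokens: Iterable[str], exclude: Iterable[str] = (),
               min_len: int = 4) -> List[str]:
    """Keep tokens of at least min_len characters that contain no excluded name."""
    banned = tuple(exclude)
    return [
        token for token in tokens
        if len(token) >= min_len and not any(name in token for name in banned)
    ]
```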

2. Word cloud

Analyzing word-usage habits is more of a qualitative analysis, so we display the results as word clouds, starting with the word cloud for "Thou wilt".
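
The article does not say which tool drew its word clouds. One common choice is the third-party wordcloud package, sketched here under that assumption; the function names, image size, and background color are illustrative, and font_path must point to a font file that contains Chinese glyphs.

```python
from collections import Counter
from typing import Dict, Iterable


def word_frequencies(words: Iterable[str]) -> Dict[str, int]:
    """Map each word to the number of times it occurs."""
    return dict(Counter(words))


def save_word_cloud(freqs: Dict[str, int], font_path: str,
                    out_path: str) -> None:
    """Render the frequencies as a word cloud image file."""
    from wordcloud import WordCloud  # third-party package

    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(out_path)
```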

We can see that in this novel words such as "slight smile", "laughter", "surprise", and "serious" are used very frequently. Then we look at the word cloud of another work, "Nüdi Qiying Zhuan".

It is easy to see that the most frequently used words in this novel are also "slight smile", "laughter", "surprise", and "serious", and the other frequently used words show similar patterns.

It seems Mr. Liang's word-usage habits really can be traced. Interested readers can run the same analysis on other masters' works or on web novels, and may find some very interesting patterns.

Chinese word segmentation is a very interesting subject, especially when applied to martial arts novels! I am a martial arts fan myself and love Jin Yong's novels. Every book in the couplet "Fei xue lian tian she bai lu, xiao shu shen xia yi bi yuan", formed from the first characters of his novel titles, is a classic. Beyond the two dimensions above, an article analyzing the relationships between characters is in the works, so stay tuned.

If you have any suggestions for this article, please leave a message! The author is only a beginner in Chinese word segmentation and language analysis, so criticism is welcome wherever the technical understanding or the reading of the novels is off.
