> library(tm)
> library(tmcn)
> library(jiebaR)
> library(tau)
> library(Rwordseg)
> library(wordcloud2)
> oldman <- readLines("Old Man and the Sea.txt", encoding = "UTF-8") # read the file line by line
Warning message:
In readLines("Old Man and the Sea.txt", encoding = "UTF-8") :
  incomplete final line found on 'Old Man and the Sea.txt'
> # The warning above appears because the file ends without a final newline
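The warning is harmless, and it can be reproduced and silenced with base R alone. The sketch below uses a hypothetical temporary file rather than the novel, and relies only on the `warn` argument of `readLines()`.

```r
# Hypothetical demo file; it is not the text used in the session above.
path <- tempfile(fileext = ".txt")
# cat() writes exactly what it is given, so this file has no final newline.
cat("first line\nlast line", file = path)

# Reading it with the defaults raises the "incomplete final line" warning.
warned <- tryCatch({
  readLines(path, encoding = "UTF-8")
  FALSE
}, warning = function(w) TRUE)

# warn = FALSE skips that check; the lines returned are the same.
lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
unlink(path)
```

Adding a newline at the end of the source file removes the warning as well.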
> class(oldman) # inspect the class of the object
[1] "character"
> head(oldman) # inspect the first few elements
[1] "The Old Man and the Sea / Author: Ernest Hemingway"
[2] "Status: All"
[3] "『Description:"
[4] "Based on the real experience of a Cuban fisherman, this novel records with camera-like realism the whole course of the old man Santiago's fishing trip, portraying
an old man who remains graceful under pressure and is never defeated in spirit. The novel set an unprecedented record in publishing history: 5.3 million copies sold in 48 hours! That same year the work
won the Pulitzer Prize, and two years later the Nobel Prize. 』"
[5] "For reading and downloading TXT e-books and sharing more e-books, please visit: http://www.txdzs.com, mobile access: http://wap.txdzs.com, e-mail: [email protected]"
[6] "------Chapter content begins-------"
> wk = worker() # create a segmentation worker
> oldman_seg = segment(oldman, wk) # segment the text into words
> head(oldman_seg)
[1] "The Old Man and the Sea" "Author" "Ernest" "Hemingway" "Status" "All"
> length(oldman_seg) # get the length of the vector
[1] 25296
> oldman_seg <- remove_stopwords(oldman_seg, stopwordsCN()) # remove stop words
> length(oldman_seg) # length of the vector after removing stop words
[1] 15085
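> oldman_freq <- getWordFreq(oldman_seg) # build the word-frequency table
> oldman_freq <- oldman_freq[oldman_freq$freq > 20, ] # keep words with a frequency greater than 20
> nrow(oldman_freq) # number of words with a frequency greater than 20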
[1] 91
> wordcloud2(oldman_freq) # draw the word cloud
> oldman_freq[order(oldman_freq$freq, decreasing = TRUE), ] # view word frequencies in descending order
word freq
1961 Old Person 216
2793 on 211
3607 Think 189
3112 say 186
4100 Fish 161
788 fishing 152
2636 go to 117
226 not 106
1282 Children 97
844 both 85
417 Eat 79
2209 No 75
2770 Shark 75
1994 Miles 74
4375 know 73
1385 Very 72
3948 1 Article 66
1483 will 64
1330 Good 61
1273 or 60
1827 See 55
1096 Feel 54
592 Large 51
3645 Boat 49
3595 now 48
4363 only 48
830 things 47
4416 in 46
723 Places 45
2854 Body 45
2680 People 44
4185 again 44
3834 maybe 43
2930 hours 42
3987 already 39
2956 make 36
3556 under 36
3624 Elephant 36
3232 Sun 35
482 Boat 34
496 Bow 33
2890 Voices to 33
3116 say 33
4585 left Hand 33
2329 that article 32
494 Ship Tip 30
2531 up 30
3090 Water Surface 30
3781 Eyes 30
3085 in Water 29
4535 Walk 29
3872 1 points 28
4303 This fish 28
1174 more 27
3076 Hands 27
3956 1 under 27
4127 Fisherman 27
618 Big Fish 26
1650 feel 26
2627 26
3005 Lot 26
3451 Tail 26
3786 right now 26
4237 Long 26
309 is only 25
1296 Sea level 25
3570 down 25
253 not 24
265 not 24
3877 1 Set 24
3932 1 up 24
4118 Fish 24
1300 Seawater 23
1685 Tuna 23
1814 See 23
4059 Manic 23
4099 right Hand 23
1100 Dry 22
1474 back to 22
2003 Strength 22
2410 Lane 22
3384 Drag 22
3874 1 o'er 22
3971 1 only 22
4105 Bait 22
4352 is 22
530 at the moment 21
1008 Flying Fish 21
1618 Paddles 21
1889 Fast 21
3101 in Water 21
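The filter-and-sort steps that produce a listing like the one above can be tried on a toy vector with base R alone. This is a minimal sketch, not the `getWordFreq()` implementation: the tokens are made up, and the column names `word` and `freq` are assumed here to match the ones used in the session.

```r
# Hypothetical toy tokens standing in for a segmented text.
tokens <- c("fish", "old man", "fish", "sea", "fish", "old man", "boat")

# table() counts occurrences; as.data.frame() turns the counts into two columns.
freq <- as.data.frame(table(tokens), stringsAsFactors = FALSE)
names(freq) <- c("word", "freq")  # assumed column names

# Keep words that occur more than once, then sort by descending frequency.
freq <- freq[freq$freq > 1, ]
freq <- freq[order(freq$freq, decreasing = TRUE), ]
print(freq)
```

Subsetting keeps the original row names, which is why the first column of the listing shows non-consecutive numbers.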
> # Excluding words that are only one Chinese character long gives the following results
> oldman_freq1 <- oldman_freq[nchar(oldman_freq$word) > 1, ]
> nrow(oldman_freq1)
[1] 55
> wordcloud2(oldman_freq1)
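The `nchar(...) > 1` filter works because `nchar()` counts characters rather than bytes by default, so a single Chinese character has length 1. The sketch below illustrates this with three made-up words written as Unicode escapes; the data frame and its counts are hypothetical, not taken from the novel.

```r
# Hypothetical words: 鱼 (fish), 老人 (old man), 鲨鱼 (shark).
words <- c("\u9c7c", "\u8001\u4eba", "\u9ca8\u9c7c")
toy <- data.frame(word = words, freq = c(30L, 25L, 22L),
                  stringsAsFactors = FALSE)

# nchar() defaults to type = "chars", so it returns 1, 2, 2 here.
lengths <- nchar(toy$word)

# Keep only the words made of more than one character.
toy_multi <- toy[lengths > 1, ]
```

In the translated listing the same words appear as English glosses, so their English lengths do not reflect what the filter actually removed.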
Word segmentation and word clouds in R, using "The Old Man and the Sea" as the example