Introduction to the WordCloud class: making a word cloud in Python, fixing garbled text, and other small pitfalls

Source: Internet
Author: User



We have all seen word clouds; they turn up everywhere in the big-data era. Today we will use the third-party Python library wordcloud to make one, and go through the various pitfalls encountered along the way.



For example, here is a word cloud made from my friends' signatures; it seems the phrase used most often is still "square to always":









First we need a few libraries. Install them with pip, then import them:



import chardet  # detects the character encoding of raw bytes
from wordcloud import WordCloud  # word cloud library
import matplotlib.pyplot as plt  # plotting library





Our example has two steps: first, read a passage of text from a file; second, make the word cloud and display it.



Here is the code, which reads a file from the desktop:



 
 
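with open("C:\\Users\\fyc\\Desktop\\virgo.txt", "rb") as f:  # binary mode: chardet needs raw bytes
    text = f.read()
detected = chardet.detect(text)  # guess the encoding (avoids shadowing the built-in type)
text1 = text.decode(detected["encoding"])  # text1 is now a Unicode string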


A bit of encoding work is done here: the generate() function of the word cloud expects a Unicode object, and any other object causes an exception. After tracing through the code layer by layer, I finally found these lines in wordcloud.py:


 
 
stopwords = set(map(str.lower, self.stopwords))

flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
         else 0)
regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

words = re.findall(regexp, text, flags)


The problem lies in the regular expression: if the incoming text is not of Unicode type, re.findall matches nothing, words is an empty list, and an exception is thrown later.
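The effect can be reproduced in isolation. The sketch below is a Python 3 analogue, not the library's own code path: it applies the same default pattern once to a str and once to undecoded UTF-8 bytes, where \w only matches ASCII letters, digits, and the underscore. The sample string is made up for illustration.

```python
import re

# Default token pattern quoted from wordcloud.py above.
REGEXP = r"\w[\w']+"

sample = "白羊座 女生"  # hypothetical sample text, not taken from virgo.txt

# Unicode text: \w matches CJK characters, so both tokens are found.
unicode_words = re.findall(REGEXP, sample)

# Undecoded bytes: \w is ASCII-only for bytes patterns, so nothing matches.
byte_words = re.findall(REGEXP.encode("ascii"), sample.encode("utf-8"))

print(unicode_words)
print(byte_words)
```

The second list comes back empty, which is the situation described above: there are no words left to build a cloud from.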



So be sure to decode the text into a Unicode object before calling generate().
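The decoding step can be wrapped in a small helper. This is a minimal sketch, not part of wordcloud or chardet: the name to_unicode and its fallback parameter are made up for illustration, and it assumes that falling back to UTF-8 is acceptable when chardet is missing or cannot make a guess.

```python
def to_unicode(raw, fallback="utf-8"):
    """Decode raw bytes to str, guessing the encoding when chardet is installed."""
    encoding = None
    try:
        import chardet  # third-party; optional in this sketch
        encoding = chardet.detect(raw)["encoding"]  # may be None for empty input
    except ImportError:
        pass
    try:
        return raw.decode(encoding or fallback)
    except (LookupError, UnicodeDecodeError):
        # The guess was unusable: use the fallback and replace undecodable bytes.
        return raw.decode(fallback, errors="replace")


text1 = to_unicode(b"Aries woman")
print(type(text1).__name__, text1)
```

In the example above, the bytes read from the file would be passed in, and the returned string handed to generate().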



Step two: generate the word cloud and display it:


wc1 = WordCloud(
    background_color="white",
    width=1000,
    height=860,
    font_path="C:\\Windows\\Fonts\\STFANGSO.ttf",  # without a font that has Chinese glyphs, the text is garbled
    margin=2)
wc2 = wc1.generate(text1)  # generate() expects a Unicode object, so the text must be decoded first

plt.imshow(wc2)
plt.axis("off")
plt.show()





WordCloud constructs a word cloud object, and its generate() method takes the text and sizes each word according to how often it appears. For the text, I found an introduction to Aries, which reads as follows:


Bold and outspoken Aries girls, rich and powerful imagination, enthusiastic and courageous, female man full of flavor. It is your greatest trait to move forward bravely. So even in the face of difficult setbacks, white sheep women dare to meet the challenge. It can be said that this is a very fighting spirit of the new era of women.   Such a strong personality of the white Sheep girl, in the eyes of the opposite sex is always less a little feminine flavor of the exclusive gentleness, often is a brother's share. If you look forward to a Lin Daiyu-like girl to meet your manhood, met her, you really do not have, sensible words, quickly find a window to run! The Aries woman under Mars is usually positive and strong.  Adjectives such as bird and delicate are difficult to add to her. Aries women should be among the 12 most independent women in the constellation. She is definitely not the kind of person who keeps at home all day, waiting for you to pick her up, send her, completely lacking the ability to act independently. For most Aries girls, she would rather believe that she might be much more efficient without you being around. Listen to me, you might think a woman like that doesn't need a man! That would be wrong! Self-confident and proud Aries women do have a strong ability to survive independently. But they are deeply longing for the Prince Charming in her dream to appear quickly! It's hard to believe! She looks so sharp, in fact, is full of fairy-tale dreams.  For all Aries women, the greatest contradiction in their hearts is the desire to conquer each other, and look forward to being conquered by each other in the delicate heart. You may be a little worried now, and you don't know how to play your part, do you? Don't panic!  First put your "sincere" ready, after the method is good to discuss, although a little bit hard, but the guarantee is worth it. First of all, you must recognize that the Aries woman is basically "heroic". 
She would be fascinated by a man who admired her. She wants to marry a husband that she prides herself on. She might be more appreciative of a successful man, but that doesn't mean she's a money-worship. Wealthy playboy will not let her heart, filled with the ideal passionate young people will be favored by her. Therefore, if you fall in love with an Aries woman, please do not have to start a passionate pursuit. A man like a pug can make her hate and be afraid, and when she's afraid to give you a polite smile, you're going to leave her behind.  You'd better let her know your talent, your charm, arouse her curiosity (or should say to arouse her to conquer your interest), wait for you to feel that she is really fond of you, after you sincerely to her to express love, then the future will be promising! You have to look like a "big man" (I said she's heroic), but you can't dictate to her. You must sincerely care for her, but never too indulgent to her.  I think you should use a "hero cherish hero" attitude to your Aries woman, is more appropriate. The vast majority of Aries women are very competitive, they insist on the other side of the mind to maintain the most important position. And of course she will.Put you in the most important position in her heart. And she is very loyal and very generous. She is willing to share her everything with you. Of course, she will also think you should share everything with her, including your secret. The Aries woman who deceives you is as serious as the sin of bullying. You must remember that she is willing to listen to your heart-breaking confession, and not to accept the beautiful lies.  You'd better not praise her girl in front of her, especially that kind of heartfelt compliment, which is likely to cause the Thunder fire. Because of their positive and strong personality, many Aries women give the impression that they are sharp and trouble-loving. On the surface, they do not allow others (especially men) to take advantage of it. 
Many people will think that Aries women are always the nose of the teeth of the reason. Because of this, they often eat some dark loss, setbacks, they are always more than other girls live hard. In fact, you should understand that their hearts are mostly upright, kind, and fragile.  As long as you sincerely care for her, when she was wronged, give her a warm embrace, she will become your lifelong faithful partner. Aries women can almost become excellent professional women, but also can be a competent housewife. In fact, it would be helpful for your marriage to have her own career. When she tried to play her competitive and conquering desires at work, the chances of returning home to be a sheep were quite large. If you want an energetic Aries woman, put all your mind on the "You", I'm afraid you will be a little unbearable.  As for the family, you can rest assured that although she daily necessities such trifles, not so interested, but the strong, she will not let herself become a failed housewife. One more thing you should be thankful for is that I seldom see a sloppy Aries wife, most of whom are still very bright after marriage and do not want to be ridiculed by others for marrying a yellow-faced woman. Even if occasionally lazy, as long as the husband a little reminder, they will immediately alert. I have a little fat after the birth of the Aries girlfriend, because the husband said a "before I most admire your pair of slender legs." "She's been a 20-pound loser for two months. With her strong perseverance, can you not believe that she will go all out to be a good Wife?
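Before rendering, it can help to check which words will dominate the picture. The sketch below uses only the standard library: it tokenizes with the default pattern shown earlier, drops stop words, and counts what is left. It only approximates the idea of sizing by frequency and is not the library's actual pipeline (it ignores collocations and plural normalization); the function name top_words and the tiny stop-word set are made up for illustration.

```python
import re
from collections import Counter

# Illustrative stop words only; wordcloud ships its own, larger STOPWORDS set.
STOPWORDS = {"the", "is", "are", "of", "and", "to", "in"}


def top_words(text, n=2, stopwords=STOPWORDS):
    """Return the n most frequent non-stop-word tokens with their counts."""
    words = re.findall(r"\w[\w']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = "Aries women are independent. Aries women are proud. Aries is bold."
print(top_words(sample))
```

Running it on the decoded file contents instead of the short sample string gives a quick preview of which words should appear largest.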





A few points about the constructor need explaining. First, pass the arguments by keyword: there is no need to remember parameter positions, and knowing the key parameters is enough.
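One way to take advantage of keyword arguments is to keep the settings in a dict and unpack it into the constructor. This is a sketch that assumes the wordcloud package is installed; the helper name make_cloud is hypothetical, and the values are the ones used in step two.

```python
# Settings keyed by the constructor's parameter names; their order is irrelevant.
settings = dict(
    background_color="white",
    width=1000,
    height=860,
    margin=2,
    font_path="C:\\Windows\\Fonts\\STFANGSO.ttf",  # a font with Chinese glyphs
)


def make_cloud(text, **overrides):
    """Build a word cloud from text; overrides replace entries in settings."""
    from wordcloud import WordCloud  # third-party package, imported lazily
    return WordCloud(**{**settings, **overrides}).generate(text)


# Example: the same settings on a smaller canvas.
# cloud = make_cloud(text1, width=400, height=200)
```

Keeping the settings in one place makes it easy to try different canvas sizes or fonts without retyping the whole call.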



For the meaning of the parameters, see the quick documentation in PyCharm, reproduced below:


 
class WordCloud(object)
def __init__(self, font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode="RGB", relative_scaling=.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)

Documentation is missing. The following is copied from class WordCloud.
Word cloud object for generating and drawing.
  
font_path:
(string) Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.
width:
(int (default=400)) Width of the canvas.
height:
(int (default=200)) Height of the canvas.
prefer_horizontal:
(float (default=0.90)) The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)
mask:
(nd-array or None (default=None)) If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered "masked out" while other entries will be free to draw on. [This changed in the most recent version!]
scale:
(float (default=1)) Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.
min_font_size:
(int (default=4)) Smallest font size to use. Will stop when there is no more room in this size.
font_step:
(int (default=1)) Step size for the font. font_step > 1 might speed up computation but give a worse fit.
max_words:
(number (default=200)) The maximum number of words.
stopwords:
(set of strings or None) The words that will be eliminated. If None, the built-in STOPWORDS list will be used.
background_color:
(color value (default="black")) Background color for the word cloud image.
max_font_size:
(int or None (default=None)) Maximum font size for the largest word. If None, height of the image is used.
mode:
(string (default="RGB")) Transparent background will be generated when mode is "RGBA" and background_color is None.
relative_scaling:
(float (default=.5)) Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good. 
color_func:
(callable, default=None) Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites "colormap". See colormap for specifying a matplotlib colormap instead.
regexp:
(string or None (optional)) Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used.
collocations:
(bool, default=True) Whether to include collocations (bigrams) of two words. 
colormap:
(string or matplotlib colormap, default="viridis") Matplotlib colormap to randomly draw colors from for each word. Ignored if "color_func" is specified. 
normalize_plurals:
(bool, default=True) Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's', unless the word ends with 'ss'.
Notes
Larger canvases will make the code significantly slower. If you need a large word cloud, try a lower canvas size, and set the scale parameter.
The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the max_font_size and the scaling heuristic.





With a translation tool such as Baidu Translate, the documentation above should be clear enough. Here are a few of the more important parameters:



font_path: the path to the font used to draw the text in the word cloud. This parameter is especially important when displaying Chinese; with the default font, the text easily comes out garbled, as follows:









width, height: as the names imply, the width and height of the canvas.






prefer_horizontal: how strongly horizontal placement of words is preferred over vertical placement.

mask: the shape of the area in which words are drawn; by default it is the rectangular canvas.

The other parameters are not covered here.
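As an illustration of the mask parameter, here is a sketch that builds a circular mask with NumPy, following the rule quoted above that white (255) entries are masked out and all other entries can be drawn on. The function name circle_mask and the size are chosen for illustration, and passing the result to WordCloud assumes the wordcloud package is installed.

```python
import numpy as np


def circle_mask(size=300):
    """Return a size x size array: 0 inside the circle, 255 (white) outside."""
    y, x = np.ogrid[:size, :size]
    center = radius = size / 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


mask = circle_mask(300)
print(mask.shape, mask[150, 150], mask[0, 0])

# With a mask, width and height are ignored and the mask's shape is used:
# wc = WordCloud(mask=mask, background_color="white", font_path=...)
```

Words are then placed only inside the circle, since the corners of the array are white.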





Finally, the Matplotlib library is used to display the word cloud. This part of the code is fairly fixed and rarely changes, so it can simply be memorized:



plt.imshow(wc2)
plt.axis("off")
plt.show()


Here axis controls whether the coordinate axes are displayed; we choose not to display them. The overall effect is as follows:

[Image: the generated word cloud]

