Python Word Cloud Implementation

Source: Internet
Author: User

A nice thing about Python is how easy it is to build a word cloud. The
open-source word_cloud project is hosted on GitHub:
https://github.com/amueller/word_cloud
Note that the wordcloud folder is deleted when you run the routine.
Building a word cloud relies partly on NLP (counting the words) and partly on
image processing (laying the words out). The following example is taken from
the wordcloud repository on GitHub:

    from os import path
    from PIL import Image
    import numpy as np
    import matplotlib.pyplot as plt

    from wordcloud import WordCloud, STOPWORDS

    d = path.dirname(__file__)

    # Read the whole text.
    text = open(path.join(d, 'alice.txt')).read()

    # read the mask image
    # taken from
    # http://www.stencilry.org/stencils/movies/alice%20in%20wonderland/255fk.jpg
    alice_mask = np.array(Image.open(path.join(d, "alice_mask.png")))

    stopwords = set(STOPWORDS)
    stopwords.add("said")

    wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
                   stopwords=stopwords)
    # generate word cloud
    wc.generate(text)

    # store to file
    wc.to_file(path.join(d, "alice.png"))

    # show
    plt.imshow(wc, interpolation='bilinear')
    plt.axis("off")
    plt.figure()
    plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
    plt.axis("off")
    plt.show()

Original image and result: a picture of Alice and the rabbit (images not included).
Where:
text reads the whole document
alice_mask loads the mask image as an array
stopwords sets the words that should not be displayed
WordCloud sets the properties of the word cloud
generate generates the word cloud
to_file saves the picture to a file
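
As a rough illustration of the mask step: the class documentation quoted in this article says that all-white entries of the mask are treated as masked out, while the other entries are free to draw on. The sketch below mimics that rule in pure Python on nested lists instead of a NumPy array. The function name boolean_mask and the sample pixels are made up for this example and are not part of the wordcloud API.

```python
def boolean_mask(pixels):
    """Return True where a pixel is pure white (masked out), else False.

    `pixels` is a list of rows. Each entry is either a greyscale int
    (0-255) or an RGB/RGBA tuple; only the first three channels count.
    This is a hypothetical helper, not the library's implementation.
    """
    def is_white(pixel):
        if isinstance(pixel, int):
            return pixel == 255
        return all(channel == 255 for channel in pixel[:3])

    return [[is_white(pixel) for pixel in row] for row in pixels]


# Greyscale example: 255 is masked out, everything else is drawable.
grey = [[255, 0],
        [128, 255]]
print(boolean_mask(grey))

# RGBA example: the alpha channel is ignored.
rgba = [[(255, 255, 255, 0), (255, 255, 254, 255)]]
print(boolean_mask(rgba))
```

In the real example the same idea applies to alice_mask: words can only land where the mask image is not white.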
Open wordcloud.py to see the documented parameters of the WordCloud class:

    """Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is not
        None, width and height will be ignored and the shape of mask will be
        used instead. All white (#FF or #FFFFFF) entries will be considered
        "masked out" while other entries will be free to draw on. [This
        changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud images,
        using scale instead of larger canvas size is significantly faster, but
        might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in this
        size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image is
        used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size.  With
        relative_scaling=0, only word-ranks are considered.  With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size.  If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in process_text.
        If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples (string, int, (int, int), int, color))
        Encodes the fitted word cloud. Encodes for each word the string, font
        size, position, orientation and color.

    Notes
    -----
    Larger canvases will make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and the
    scaling heuristic.
    """
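
The normalize_plurals rule in the docstring is compact, so here is a minimal sketch of how such a merge could work on a plain dictionary of counts. It follows only the wording of the docstring (fold a trailing-'s' form into the form without it, unless the word ends in 'ss'). The function name merge_plurals and the sample counts are assumptions for illustration, and the library's own implementation may differ in details such as case handling.

```python
def merge_plurals(counts):
    """Fold 'words' into 'word' when both forms are present.

    Hypothetical helper following the docstring's description of
    normalize_plurals; words ending in 'ss' are left alone.
    """
    merged = dict(counts)
    for word in counts:
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                # Move the plural's count onto the singular form.
                merged[singular] += merged.pop(word)
    return merged


counts = {"rabbit": 3, "rabbits": 2, "glass": 1, "cats": 4}
print(merge_plurals(counts))
```

Here "rabbits" is folded into "rabbit", "glass" is kept because it ends in 'ss', and "cats" is kept because "cat" never appears on its own.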

The parameters, in brief:
font_path is the path to the font
width and height are the width and height of the canvas
prefer_horizontal adjusts the ratio of horizontal to vertical words in the word cloud
mask is the mask that defines the area in which the word cloud is drawn
scale is the scaling between computation and drawing
min_font_size sets the minimum font size
max_words sets the maximum number of words
stopwords sets the words to exclude
background_color sets the background color of the word cloud
max_font_size sets the maximum font size
mode sets the color mode of the image; the background is transparent when mode is "RGBA" and background_color is None
relative_scaling sets the importance of relative word frequency for font size
regexp sets the regular expression used to split the text into tokens
collocations sets whether to include collocations (bigrams) of two words
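
The effect of relative_scaling is easier to see with numbers. The sketch below applies the same font-size update that appears in the generate_from_frequencies listing in this article, but it ignores layout entirely: the real algorithm also shrinks the font whenever a word does not fit. The function name starting_font_sizes and the sample frequencies are assumptions for illustration.

```python
def starting_font_sizes(frequencies, max_font_size, relative_scaling=0.5):
    """Return (word, font_size) pairs before any layout-driven shrinking.

    Hypothetical helper: it normalizes the frequencies so the largest
    is 1, then scales each size relative to the previous word.
    """
    ordered = sorted(frequencies.items(), key=lambda item: item[1],
                     reverse=True)
    top = float(ordered[0][1])
    normalized = [(word, freq / top) for word, freq in ordered]

    sizes = []
    font_size = max_font_size
    last_freq = 1.0
    rs = relative_scaling
    for word, freq in normalized:
        if rs != 0:
            font_size = int(round((rs * (freq / last_freq) + (1 - rs))
                                  * font_size))
        sizes.append((word, font_size))
        last_freq = freq
    return sizes


freqs = {"alice": 8, "rabbit": 4, "queen": 2}
for rs in (0, 0.5, 1):
    print(rs, starting_font_sizes(freqs, 100, relative_scaling=rs))
```

With relative_scaling=1 each halving of the frequency halves the size (100, 50, 25); with 0 the sizes stay at 100 until layout forces them down; 0.5 sits in between (100, 75, 56).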
Stepping through the generate function in a debugger shows two calls:
words = process_text(text) returns the word frequencies of the text
generate_from_frequencies creates the word cloud from the words and their frequencies
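
To give a feel for the first of these two steps, here is a minimal sketch of frequency counting using the default regular expression from the docstring and a stopword set. It is not the library's process_text, which also handles plurals, collocations and letter case more carefully. The function name count_words and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter


def count_words(text, stopwords=frozenset(), regexp=r"\w[\w']+"):
    """Tokenize `text`, drop stopwords, and count what is left.

    Hypothetical helper: tokens are lowercased for simplicity, which
    is a shortcut and not necessarily what the library does.
    """
    tokens = [token.lower() for token in re.findall(regexp, text)]
    return Counter(token for token in tokens if token not in stopwords)


text = "Alice said the rabbit was late. The rabbit said nothing."
print(count_words(text, stopwords={"said", "the", "was"}))
```

Note that the default pattern needs at least two characters, so one-letter words such as "a" never become tokens. The resulting dictionary of counts is the kind of input that generate_from_frequencies expects.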
Below is the implementation of the generate_from_frequencies function:

    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A contains words and associated frequency.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self

        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=item1, reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        max_frequency = float(frequencies[0][1])

        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()

        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)

        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []

        last_freq = 1.

        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size

        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]),
                                               max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                font_size = int(2 * sizes[0] * sizes[1]
                                / (sizes[0] + sizes[1]))
        else:
            font_size = max_font_size

        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)

        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq))
                                       + (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None else
                                   Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

            if font_size < self.min_font_size:
                # we were unable to draw any more
                break

            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size,
                                          position=(x, y),
                                          orientation=orientation,
                                          random_state=random_state,
                                          font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq

        self.layout_ = list(zip(frequencies, font_sizes, positions,
                                orientations, colors))
        return self
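
The placement step relies on an "integral image" to find free space quickly. The sketch below shows the general idea in pure Python: build a summed-area table of occupied pixels, then treat a box as free when the sum over its area is zero. It is a simplified stand-in, not the library's IntegralOccupancyMap; the function names and the tiny grid are assumptions for illustration.

```python
from random import Random


def integral_image(grid):
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(grid), len(grid[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_sum = 0
        for j in range(cols):
            row_sum += grid[i][j]
            table[i + 1][j + 1] = table[i][j + 1] + row_sum
    return table


def free_positions(grid, box_h, box_w):
    """All top-left (row, col) corners where a box_h x box_w box is empty."""
    table = integral_image(grid)
    rows, cols = len(grid), len(grid[0])
    hits = []
    for i in range(rows - box_h + 1):
        for j in range(cols - box_w + 1):
            # Four lookups give the number of occupied cells in the box.
            area = (table[i + box_h][j + box_w] - table[i][j + box_w]
                    - table[i + box_h][j] + table[i][j])
            if area == 0:
                hits.append((i, j))
    return hits


def sample_position(grid, box_h, box_w, random_state):
    """Pick one free position at random, or None if nothing fits."""
    hits = free_positions(grid, box_h, box_w)
    return random_state.choice(hits) if hits else None


# 1 marks an occupied (or masked-out) pixel, 0 a free one.
grid = [[1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
print(free_positions(grid, 2, 2))
print(sample_position(grid, 2, 2, Random(0)))
print(sample_position(grid, 3, 4, Random(0)))
```

When sample_position returns None, the loop in generate_from_frequencies first tries the other orientation and then reduces the font size by font_step, which is why later words end up smaller.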

Unfortunately, the word cloud does not use the OpenCV library; with OpenCV it should be possible to do even cooler things.
