We start by loading our own text files and plotting the frequencies of the most common words:
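from nltk import FreqDist
from nltk.corpus import PlaintextCorpusReader

if __name__ == "__main__":
    corpus_root = '/home/zhf/word'
    # Load every file under corpus_root as plain text.
    wordlists = PlaintextCorpusReader(corpus_root, '.*')
    for w in wordlists.words():
        print(w)
    # Plot the cumulative frequency of the 20 most common words.
    fdist = FreqDist(wordlists.words())
    fdist.plot(20, cumulative=True)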
The text file contains the following:
The RRC setup success rate dropped
Erab Setup Success rate dropped
Prach issue
Customer Feedback
In the resulting plot, the Chinese characters are displayed as garbled symbols.
The reason is that the plot method in nltk actually calls the plot function in matplotlib. On Ubuntu, matplotlib reads its configuration from a file named matplotlibrc, which also holds the font-related settings. matplotlib looks for this file in the following four locations, in order:
- matplotlibrc in the current working directory.
- $MATPLOTLIBRC/matplotlibrc.
- matplotlibrc in the user's configuration directory, typically ~/.config/matplotlib/matplotlibrc on Linux and ~/.matplotlib/matplotlibrc on macOS.
- INSTALL/matplotlib/mpl-data/matplotlibrc, where INSTALL is the specific installation directory.
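The lookup order in this list can be sketched with standard-library Python. This only illustrates the order described here and is not matplotlib's actual implementation; the function names `candidate_rc_paths` and `first_existing` and their parameters are hypothetical.

```python
import os

def candidate_rc_paths(cwd, env_dir, config_dir, install_dir):
    """List matplotlibrc candidates in the order described in the text.

    All arguments are directory paths supplied by the caller (assumptions
    for illustration); env_dir stands in for $MATPLOTLIBRC and may be None.
    """
    paths = [os.path.join(cwd, "matplotlibrc")]
    if env_dir:
        paths.append(os.path.join(env_dir, "matplotlibrc"))
    paths.append(os.path.join(config_dir, "matplotlibrc"))
    paths.append(os.path.join(install_dir, "matplotlib", "mpl-data", "matplotlibrc"))
    return paths

def first_existing(paths):
    """Return the first candidate that exists as a file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None
```

To ask matplotlib itself which file it actually loaded, call `matplotlib.matplotlib_fname()`.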
Let's find out where the matplotlib configuration file is located. There are two ways to check.
Method one: query the configuration directory and the font currently in use. The output shows that the font in use is DejaVuSans.ttf, which does not contain Chinese glyphs.
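import matplotlib
from matplotlib.font_manager import findfont, FontProperties

if __name__ == "__main__":
    # Directory that holds the per-user matplotlib configuration.
    print(matplotlib.get_configdir())
    # Path of the font file that the default font family resolves to.
    print(findfont(FontProperties(family=FontProperties().get_family())))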
/home/zhf/.config/matplotlib
/usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf
Method two: use locate to find the mpl-data directories of both the Python 2.7 and Python 3.6 installations.
/home/zhf/Desktop# locate -b '\mpl-data'
/usr/local/lib/python2.7/dist-packages/matplotlib/mpl-data
/usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data
Under mpl-data, the fonts/ttf folder contains all the bundled fonts. Displaying Chinese requires the SimHei font file simhei.ttf, which is not in that folder.
You can run fc-list | grep SimHei to check whether simhei.ttf is installed anywhere on the system. If an office suite is installed (WPS Office in this case), the font is usually already there. If not, you need to download it.
/home/zhf# fc-list | grep SimHei
/usr/share/fonts/wps-office/simhei.ttf: 黑体,SimHei:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
Copy this file to the /usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf folder.
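The copy step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `install_font` is hypothetical, and writing into a system-wide dist-packages directory normally requires root privileges.

```python
import os
import shutil

def install_font(font_path, ttf_dir):
    """Copy font_path into ttf_dir and return the destination path."""
    if not os.path.isfile(font_path):
        raise FileNotFoundError(font_path)
    if not os.path.isdir(ttf_dir):
        raise NotADirectoryError(ttf_dir)
    # shutil.copy keeps the file name when the target is a directory.
    return shutil.copy(font_path, ttf_dir)

# Example call with the paths from this article (run as root):
# install_font(
#     "/usr/share/fonts/wps-office/simhei.ttf",
#     "/usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf",
# )
```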
The mpl-data directory also contains the matplotlib configuration file, matplotlibrc. Edit it and put SimHei first in the font.sans-serif list so that it has the highest priority.
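If you prefer to script this edit, the sketch below rewrites the font.sans-serif entry in the text of a matplotlibrc file. It assumes the entry sits on a single line of the form `font.sans-serif: A, B, C` (optionally commented out with one `#`) and has no trailing comment; the helper name `prioritize_font` is hypothetical.

```python
import re

# Matches an active or single-# commented font.sans-serif entry.
_SANS_LINE = re.compile(r"^\s*#?\s*font\.sans-serif\s*:\s*(.*)$")

def prioritize_font(rc_text, font="SimHei"):
    """Return rc_text with `font` placed first in font.sans-serif."""
    out = []
    for line in rc_text.splitlines():
        match = _SANS_LINE.match(line)
        if match:
            names = [n.strip() for n in match.group(1).split(",") if n.strip()]
            # Put the requested font first and drop any later duplicate.
            names = [font] + [n for n in names if n != font]
            line = "font.sans-serif: " + ", ".join(names)
        out.append(line)
    return "\n".join(out)
```

Read the file, pass its contents through `prioritize_font`, and write the result back. Keeping a backup copy of the original matplotlibrc is a sensible precaution.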
Then clear the matplotlib font cache:
/home/zhf/Desktop# cd ~/.cache/matplotlib
~/.cache/matplotlib# rm *.*
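The same cleanup can be sketched in Python. Note one difference from the shell command: this hypothetical helper, `clear_font_cache`, deletes every regular file directly inside the given directory, not only names containing a dot, and it leaves subdirectories alone.

```python
import os

def clear_font_cache(cache_dir):
    """Delete regular files directly inside cache_dir; return their names."""
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

The cache directory can be obtained from `matplotlib.get_cachedir()`; in this article it is ~/.cache/matplotlib. matplotlib rebuilds the font cache the next time it is imported.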
Run the program again and the Chinese text is displayed correctly.
If you only need Chinese in individual scripts, you can leave the matplotlibrc configuration file unmodified and set the font inside the script instead:
import matplotlib
import matplotlib.pyplot as plt

# Use SimHei for sans-serif text in this script only.
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
x = [1, 2, 3]
y = [4, 5, 6]
plt.plot(x, y)
plt.title('test')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.grid(True)
plt.show()
Python + NLTK natural language learning, part three: how to display Chinese in plots with nltk/matplotlib