I have recently been looking into how SEO and Python can be combined and, drawing on some information from the Web, wrote this program.
Objective: To find the words that an industry cares about most (for example, cylinder template), so that the TDK (title, description, keywords) can be adjusted automatically according to demand and the content pages of a column can be planned.
How to use:
1. Download and install Cygwin: http://www.cygwin.com/
2. When installing Cygwin, don't forget to select curl, wget, iconv, lynx, dos2unix, python and other common tools. Python matters most, since it is the main tool used here.
3. Download the jieba Chinese word segmentation library. First choice: https://github.com/fxsjy/jieba/archive/master.zip Alternative: https://pypi.python.org/pypi/jieba/
4. Install jieba in one of three ways:
   Automatic installation: easy_install jieba, or pip install jieba / pip3 install jieba.
   Semi-automatic installation: first download from https://pypi.python.org/pypi/jieba/, unzip it, and run python setup.py install.
   Manual installation: place the jieba directory in the current directory or in the site-packages directory, then reference it with import jieba.
5. Copy the code below and save it as "jiebacmd.py".
6. Create a new folder and copy the text to be analyzed and jiebacmd.py into it. The text must be saved in UTF-8 encoding. In Cygwin, use the cd command to switch the working directory to the new folder, then enter the following command:
   cat abc.txt | python jiebacmd.py | sort | uniq -c | sort -nr | head -100
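The counting half of the pipeline in step 6 (sort, uniq -c, sort -nr, head -100) can also be done in Python. The following is a minimal sketch and not part of the original program: it assumes the input is the one-word-per-line output of jiebacmd.py, and the function name top_words is made up for illustration.

```python
from collections import Counter


def top_words(lines, n=100):
    """Return the n most frequent non-empty lines as (word, count) pairs.

    Hypothetical helper: `lines` is any iterable of strings holding one
    word per line, such as the output of jiebacmd.py.
    """
    counts = Counter()
    for line in lines:
        word = line.strip()
        if word:  # skip blank lines
            counts[word] += 1
    # most_common orders by descending count, like sort -nr | head -n
    return counts.most_common(n)


# Small in-memory demonstration with made-up words.
sample = ["template", "cylinder", "template", "", "price", "template", "cylinder"]
for word, count in top_words(sample, n=2):
    print(count, word)
```

To count real output instead of the sample list, pass sys.stdin as the lines argument and pipe the output of jiebacmd.py into the script.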
Code:
#encoding=utf-8
# Usage example (find the top 100 words in abc.txt):
# Purpose: find the 100 most frequent words in the file abc.txt.
# Copy the following command into Cygwin and run it. abc.txt is the name of
# your text file; change head -100 to the number of words you want to extract.
# cat abc.txt | python jiebacmd.py | sort | uniq -c | sort -nr -k1 | head -100
# The lines above are comments and do not affect how the program runs.
from __future__ import unicode_literals
import sys
sys.path.append("./")  # lets Python find a jieba directory copied next to this script
if sys.version_info[0] < 3:
    # Python 2 only: make UTF-8 the default for implicit str/unicode conversions
    reload(sys)
    sys.setdefaultencoding("utf-8")
import jieba

default_encoding = 'utf-8'
if len(sys.argv) > 1:
    default_encoding = sys.argv[1]  # optional encoding name (read here but not used below)

while True:
    line = sys.stdin.readline()
    if line == "":  # an empty string means end of input
        break
    line = line.strip()
    for word in jieba.cut(line):  # segment the line into words
        print(word)  # one word per line, ready for sort | uniq -c
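The script accepts an optional encoding name on the command line, but the listing never applies it. One way the argument could be put to use is to decode each input line explicitly before segmenting. This is a sketch under assumptions, not the author's code: the function name segment_lines and its cut parameter are hypothetical, and cut stands in for jieba.cut so that the example runs even without jieba installed.

```python
def segment_lines(raw_lines, encoding="utf-8", cut=None):
    """Yield words from an iterable of byte strings.

    Hypothetical helper, not part of jieba. `cut` is any callable that
    takes a text string and returns an iterable of words; when omitted,
    jieba.cut is used (which requires jieba to be installed).
    """
    if cut is None:
        import jieba  # imported lazily so a custom `cut` works without jieba
        cut = jieba.cut
    for raw in raw_lines:
        # Decode with the requested encoding; undecodable bytes are replaced
        # instead of raising, so one bad line does not stop the run.
        line = raw.decode(encoding, "replace").strip()
        if not line:
            continue
        for word in cut(line):
            yield word


# Demonstration with a whitespace splitter standing in for jieba.cut.
gbk_input = ["圆柱 模板\n".encode("gbk"), b"\n"]
words = list(segment_lines(gbk_input, encoding="gbk", cut=str.split))
print(len(words))
```

Under Python 3 with jieba installed, the raw bytes of standard input are available as sys.stdin.buffer, so a call such as segment_lines(sys.stdin.buffer, default_encoding) would segment text saved in an encoding other than UTF-8.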
SEO combined with Python big data: segmenting text into words and extracting high-frequency words