Python note (1): installation + crawler environment configuration + packaging as an EXE file,
1,Install
Https://www.python.org/downloads/windows/Download the installer from the official website.
Windows x86 32-bit Operating System
Windows x86-64 64-bit Operating System
Web-based installer network-based installer
Executable installer
Embeddable zip file python compressed package
You can download one of the three above. They are all the same. I downloaded 2nd
Help file help document
Remember to ADD... To path.
After the installation is successful, enter PYTHON in the command prompt, and the version number indicates that the installation is successful. If it is not an executable command, add the PYTHON installation path to the path of the system variable.
PYTHONImportant environment variables (not required, configured as needed ):
PYTHONPATH |
PYTHONPATH is the Python search path. By default, all the import modules are searched for in PYTHONPATH. |
PYTHONSTARTUP |
After Python is started, search for the PYTHONSTARTUP environment variable and execute the code in the file specified by this variable. |
PYTHONCASEOK |
Adding environment variables of PYTHONCASEOK will make the python import module case insensitive. |
PYTHONHOME |
Search Path for another module. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directory, making it easier to switch between two module libraries. |
2,Crawler environment Configuration
Note: The Pip module is automatically installed. You can enter PIP in the Command Prompt window to test whether the installation is successful. (If PIP is directly executed, run CD in the Scripts folder of the PYTHON installation directory or directly execute python-m pip)
After pip is installed, run the following command:
1Pip install beautifulsoup4
2,Https://pypi.python.org/pypi/lxml/4.1.1Download lxmlLibrary, remember to download the appropriate version of your operating system and PYTHON, for example, I download windows 64-bit + PYTHON3.6
3And then execute the command pip install lxml file name and path (for example, d: \ xx. whl)
4If the installation is successful, the system prompts Successfully installed
3,Package the script into an EXE file
(1) Environment Configuration
1,Run pip install pypiwin32
2,Run pip install pyinstaller
If the installation fails, download the installation package at http://www.pyinstaller.org/downloads.html.
PythonDecompressed file path \ setup. py install
(2) package the source file into an EXE file
Pyinstaller-F-wThe file name and path. A prompt is displayed for storing the file.
-F: Package as a separate Exe file-w: no command window is displayed
4. Simple crawler and a simple GUI
Import urllib
Import re
From urllib import request
Import tkinter as tk
Def getHtml (url ):
Page = urllib. request. urlopen (url)
Html = page. read ()
Return html
Def getImg (html ):
Imgre = re. compile (r'src = "(. +? \. Jpg )"')
Html = html. decode ('utf-8') # python3
Imglist = re. findall (imgre, html)
X = 0
For imgurl in imglist:
Urllib.request.urlretrieve(imgurl,'{s.jpg '% x)
X + = 1
Def start (url ):
Html = getHtml (url)
Print (getImg (html ))
X = tk. Tk ()
Label1 = tk. Label (x, text = "Enter the URL :")
Label1.grid (row = 0, column = 0)
Label2 = tk. Label (x, text = "Enter the file storage path :")
Label2.grid (row = 1, column = 0)
Var1 = tk. StringVar ()
Entry1 = tk. Entry (x, textvariable = var1)
Entry1.grid (row = 0, column = 1)
Entry2 = tk. Entry (x)
Entry2.grid (row = 1, column = 1)
Def seturl ():
Url = "https://tieba.baidu.com/p/5475267611"
# I wanted to dynamically obtain the input in the text box. I don't know why the return value on Windows 10 is NONE,
# Windows 7 on the other computer is okay. I don't know if it is an environment configuration problem.
# Print (var1.get ())
Start (url)
Cbbtn1 = tk. Checkbutton (x, text = "agree to the Protocol ")
Cbbtn1.grid (row = 2, column = 0)
Btn1 = tk. Button (x, text = "start crawling", command = seturl)
Btn1.grid (row = 2, column = 2)
Btn2 = tk. Button (x, text = "cancel ")
Btn2.grid (row = 2, column = 3)
# Img = tk. PhotoImage (file = "C: \ Users \ 123456 \ Pictures \ lovewallpaper \ 11.jpg ")
# Imgview = tk. Label (x, image = img)
# Imgview. grid (row = 0, column = 2)
X. mainloop ()