Crawling recruitment information from the Shixiseng website and visualizing it with Python
Objective: use Python to crawl job postings from the Shixiseng internship website, analyze the job information, and visualize the results.
Software: Python 3.0
I. Introduction to the website crawler
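Shixiseng (实习僧) website: http://www.shixiseng.com/
Enter a search term in the search box to go to the results page. Press F12 (Fn + F12 on some keyboards) to open the browser's developer tools.
Refresh the page and click the first request in the list.
The Request URL shown there is the URL the crawler will use. Its query parameters are k and p: p is the page number, and k appears to carry the search keyword. Click through to the last page of results to see that there are 109 pages of data in total.
Next, copy the Request Headers. The crawler sends them with each request so that it looks like an ordinary browser visit.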
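As a rough sketch of this step, the request can be assembled with the standard library. The endpoint path in BASE_URL and the header values are placeholders (assumptions, not taken from the site), and k is assumed to be the search keyword; substitute the Request URL and Request Headers copied from the developer tools.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical search endpoint; replace with the Request URL you copied.
BASE_URL = "http://www.shixiseng.com/interns"

# Example browser-like headers; replace with the Request Headers you copied.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
}


def build_request(keyword, page):
    """Build a request for one results page (assumes k = keyword, p = page)."""
    query = urlencode({"k": keyword, "p": page})
    return Request(BASE_URL + "?" + query, headers=HEADERS)


req = build_request("data", 1)
print(req.full_url)  # http://www.shixiseng.com/interns?k=data&p=1
```

Nothing is sent until the request is passed to urllib.request.urlopen, so the URL and headers can be checked before crawling.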
Right-click the webpage and view its source. We want to crawl the job name, the URL of the job details page, the monthly salary, the work location, and other information. The regular expression is as follows:
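The original pattern is not reproduced here, so the following is only an illustration of the technique. Both the sample markup and the pattern are made up; the real page source will need a pattern written against its actual tags and class names.

```python
import re

# Made-up markup for illustration; the real page source will differ.
SAMPLE_HTML = """
<div class="job"><a href="/intern/abc123">Data Analyst Intern</a>
<span class="salary">100-150/day</span><span class="city">Beijing</span></div>
<div class="job"><a href="/intern/def456">BI Intern</a>
<span class="salary">120-200/day</span><span class="city">Shanghai</span></div>
"""

# One named group per field we want to extract.
JOB_PATTERN = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<name>[^<]+)</a>\s*'
    r'<span class="salary">(?P<salary>[^<]+)</span>'
    r'<span class="city">(?P<city>[^<]+)</span>'
)


def parse_jobs(html):
    """Return one dict per job found in the page source."""
    return [match.groupdict() for match in JOB_PATTERN.finditer(html)]


jobs = parse_jobs(SAMPLE_HTML)
print(len(jobs))        # 2
print(jobs[0]["name"])  # Data Analyst Intern
```

Named groups keep each field labeled, which makes it easier to combine the fields into rows later.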
With the groundwork done, the next step is to build out the code.
To move through the result pages, use a loop that changes the parameter p, so that every page is crawled.
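The paging loop might look like the sketch below. The fetch and parse functions are passed in as arguments; the two stand-ins used here are hypothetical and exist only so the example runs without network access.

```python
def crawl_all_pages(fetch_page, parse_page, last_page):
    """Call fetch_page(p) for p = 1..last_page and collect the parsed records."""
    records = []
    for p in range(1, last_page + 1):
        html = fetch_page(p)
        records.extend(parse_page(html))
    return records


# Stand-in fetch and parse functions (hypothetical, no network access).
def fake_fetch(p):
    return "job-a-%d,job-b-%d" % (p, p)


def fake_parse(html):
    return html.split(",")


records = crawl_all_pages(fake_fetch, fake_parse, last_page=3)
print(len(records))  # 6
```

For the real site, last_page would be the page count found in the browser (109 here), and fetch_page would download the page for the given p.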
Then combine the crawled fields and write them to an Excel file.
Required: import xlwt # writes Excel (.xls) files; it cannot read them
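A minimal sketch of the writing step, assuming each crawled record is a dict. The field names in FIELDS are hypothetical. xlwt is imported inside write_xls so that the row-building helper can be tried without the package installed.

```python
FIELDS = ["name", "url", "salary", "city"]  # hypothetical field names


def build_rows(records, fields=FIELDS):
    """Return a header row followed by one row per record."""
    rows = [list(fields)]
    for record in records:
        rows.append([record.get(field, "") for field in fields])
    return rows


def write_xls(rows, path, sheet_name="jobs"):
    """Write the rows to an .xls workbook with xlwt."""
    import xlwt  # third-party; imported here so build_rows works without it

    workbook = xlwt.Workbook(encoding="utf-8")
    sheet = workbook.add_sheet(sheet_name)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
    workbook.save(path)


rows = build_rows([{"name": "Data Analyst Intern", "city": "Beijing"}])
print(rows[1])  # ['Data Analyst Intern', '', '', 'Beijing']
```

Calling write_xls(rows, "jobs.xls") would then save the sheet; the file name is only an example.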
Finally, run the code to get the results: 1085 records in total, in a little over 30 seconds.
II. Python data analysis
First, import the required package and then read the Excel file.
Get:
Two of the columns have no data available from the website for now, so they are dropped.
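Reading the file and dropping the two columns could be sketched with pandas as follows. This assumes pandas is the package in use; the column names are hypothetical, and a small in-memory table stands in for the Excel file.

```python
import pandas as pd


def load_jobs(path):
    """Read the crawler's Excel output into a DataFrame."""
    return pd.read_excel(path)


def drop_columns(df, columns):
    """Return a copy of df without the given columns."""
    return df.drop(columns=columns)


# In-memory stand-in for the Excel data; column names are hypothetical.
df = pd.DataFrame({
    "name": ["Data Analyst Intern", "BI Intern"],
    "city": ["Beijing", "Shanghai"],
    "col_a": [None, None],
    "col_b": [None, None],
})
df = drop_columns(df, ["col_a", "col_b"])
print(list(df.columns))  # ['name', 'city']
```

Note that reading an .xls file with pd.read_excel needs the xlrd package installed.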
The analysis focuses on salary, working days per week, work location, and required internship duration.
Let's start with a simple one:
1. Distribution of required working days
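One way to compute such a distribution, assuming pandas, is value_counts. The sample values below are made up; the real column would come from the Excel data.

```python
import pandas as pd

# Made-up sample; the real values would come from the crawled data.
days = pd.Series(["5 days/week", "5 days/week", "4 days/week", "3 days/week"])

counts = days.value_counts()  # frequency of each requirement
share = (days.value_counts(normalize=True) * 100).round(2)  # percent of postings

print(counts["5 days/week"])  # 2
print(share["5 days/week"])   # 50.0
```

With matplotlib installed, counts.plot(kind="bar") would draw the frequencies as a bar chart.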
2. Internship duration requirements
3. Distribution of internship locations
What on earth? There are too many categories for the chart to be readable.
Filter out locations with a frequency below 5:
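The filtering step can be illustrated without any third-party package using collections.Counter (a stand-in chosen for this sketch, not necessarily what the original code used). The city list is made up.

```python
from collections import Counter


def filter_rare(values, min_count):
    """Count values and keep only those seen at least min_count times."""
    counts = Counter(values)
    return {value: n for value, n in counts.most_common() if n >= min_count}


# Made-up sample of work locations.
cities = ["Beijing"] * 6 + ["Shanghai"] * 5 + ["Chengdu"] * 2 + ["Xiamen"]
kept = filter_rare(cities, min_count=5)
print(kept)  # {'Beijing': 6, 'Shanghai': 5}
```

The same helper works with any threshold, for example min_count=10 for a column with many more categories.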
4. Internship salary level
The same problem...
There are 168 categories, which is why the chart is so crowded. Filter out those with a frequency below 10.
Summary:
Internship location: most data analysis internships are in Beijing and Shanghai, followed by Guangzhou and Shenzhen, and then second-tier cities such as Chengdu, Nanjing, and Hangzhou.
Working days: five days/week is the most common requirement, accounting for 44.61% of postings, followed by four days/week and three days/week.
Internship duration: a minimum of three months is the most common requirement, followed by six months and four months.
Internship salary: salaries are most concentrated in the-yuan range. More than half of the internship salaries exceed 100 yuan.
--------------------------------------------------
This is my first write-up, so corrections and suggestions are welcome.