Scraping special-code lottery records with Python
Background:
When I was a child there was a kind of gambling game with two protagonists: Bai XX and Zeng XX. Every household kept them like living Bodhisattvas, feeding and clothing them.
Business owners poured their fortunes into the two living Bodhisattvas.
Farmers sold their cattle, fields, and harvests for the two living Bodhisattvas.
Officials threw away their black gauze caps for the two living Bodhisattvas.
Jumping off buildings and into rivers became, for a while, something of a trend.
Of course, buying codes produced its share of jokes too.
For example, an illiterate farmer saw the CCTV7 channel on screen, took it as a tip for the special code, and ended up betting everything on 07.
For the sake of Bai XX and Zeng XX, plenty of people became loyal fans of the Teletubbies on the children's channel, which supposedly hinted at the special code as well.
I still think of an elderly man who, after retirement, became a full-time clerk for Bai XX and Zeng XX, updating his records and assorted charts by hand every day. At the time I thought that if I had been half as diligent in my own studies, I could at least have made it to Beijing.
Time to get to work; I plan to treat this as a series and dig into it properly.
Nothing can be done without data, so today Buddyquan goes and fetches the data from the Internet.
Knowledge points: using the splinter library, saving scraped data to a database, connecting pandas to a MySQL database, and doing some basic statistics and sorting on the data.
Also covered: installing the Google Chrome driver.
A brief note on why this library was chosen for the scraping:
1. It all started with 12306. Back then I had cobbled together a little ticket-checking script but kept wondering how to get past the CAPTCHA. I went online looking for a solution; I never found an answer for the CAPTCHA, but I did find a clickbait post that actually introduced this library and patiently walked through its operations. After reading the official documentation, I found the thing genuinely useful.
2. The response it returns is the page text after the JavaScript has run (the same as what you see with the browser's inspect-element view).
3. From 1976 to 2017 is 41 years, which means visiting only forty-odd pages; that won't put any real strain on memory. All things considered, splinter will do.
So what can splinter do? In the words of a post found on the Internet:
When splinter runs, it automatically opens the browser you specify and visits the specified URL.
Then every simulated action you wrote is carried out automatically; you just sit in front of the computer, watch the screen play out like a movie, and collect the results.
Reference: http://www.chinaz.com/program/2015/1209/481234.shtml
Official documents: https://splinter.readthedocs.io/en/latest/tutorial.html
How to install splinter and the Chrome browser driver
Step 1: Install the splinter library (pip install splinter).
Step 2: Install the browser driver; I chose the Chrome driver.
1. Download it. 2. Unzip it. 3. Put the driver in C:\Windows\System32. 4. Done.
Driver link: http://pan.baidu.com/s/1nv2ni5N password: 5c1c
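Once the driver is in place, a minimal sketch to check that everything works (the URL here is only a placeholder; swap in any page you like):

from splinter.browser import Browser

b = Browser(driver_name="chrome")   # should open a Chrome window via the driver
b.visit("http://www.example.com")   # placeholder URL
print(b.html[:200])                 # page source after the JavaScript has run
b.quit()                            # close the browser

If a Chrome window opens and some HTML gets printed, the driver is installed correctly.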
Next, let's start scraping the fields related to the draw results!
The target site can be found with a browser search (there are plenty of sites of this kind, but the draw records are basically all generated by JavaScript).
These draw records are much richer than the bare final number we cared about as kids; they even include stroke counts and the five elements. Examined with a magnifying glass, there are 27 fields worth keeping.
1. It's best to create a table and write the data into the database first; then whenever an interesting bit of statistical analysis comes up, the data is right there to use.
I didn't put the CREATE TABLE statement into the program, because Navicat for MySQL turns out to be genuinely handy (a rough SQL sketch of the table is given after the code below).
2. With the basic analysis of the page done and the table created, scraping can begin.
3. Use splinter directly to get the JS-rendered page, with no need for regular expressions or JSON parsing.
# coding: utf-8
import requests
from bs4 import BeautifulSoup
from splinter.browser import Browser
import pymysql
from PIL import Image
import pandas as pd  # used later for data processing and analysis

b = Browser(driver_name="chrome")  # start the browser driver
con = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='000000',
                      db='quany', charset='utf8')  # connect to the database
cur = con.cursor()  # create a cursor

def get_html(url):
    b.visit(url)
    html = b.html  # page source after the JavaScript has run
    return html

pic_url = 'http://c...'  # image URL, truncated in the original post
url1 = 'http://...'      # base URL of the draw-record pages, truncated in the original post
year = [a for a in range(1976, 2018)]
num = 1
for c in year:
    url = url1 + str(c)
    s = BeautifulSoup(get_html(url), 'lxml')
    length = len(s.find_all('tr', class_='nowto001'))
    for i in range(length):
        value_list = []
        for d in s.find_all('tr', class_='nowto001')[i].stripped_strings:
            value_list.append(d)
        try:
            sql = "replace into buddyquan " \
                  "(year, qishu, ma1, shengxiao1, ma2, shengxiao2, ma3, shengxiao3, ma4, shengxiao4, " \
                  "ma5, shengxiao5, ma6, shengxiao6, tema, texiao, tebo, tetou, danshuang, wuxing, " \
                  "jiaye, daxiao, weishu, duanwei, bihua, nannv, heshu, zonghe) " \
                  "values ('%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s'," \
                  "'%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s')" \
                  % (value_list[0][:4], value_list[0][5:-1], value_list[1], value_list[2],
                     value_list[3], value_list[4], value_list[5], value_list[6],
                     value_list[7], value_list[8], value_list[9], value_list[10],
                     value_list[11], value_list[12], value_list[25], value_list[27],
                     value_list[28], value_list[29], value_list[30], value_list[31],
                     value_list[32], value_list[33], value_list[34], value_list[35],
                     value_list[36], value_list[37], value_list[38], value_list[39])
            cur.execute(sql)
            con.commit()  # commit the transaction
            print('inserted successfully %s %s %s' % (value_list[0][:4], value_list[0][5:], num))
        except Exception as e:
            print('insertion error\n%s' % e)
            con.rollback()  # roll back the transaction
        num = num + 1
    print('%s annual lottery results collected' % value_list[0][:4])

print('The special code is coming!\nYearly collection finished, %s draws in total' % num)
with open("d:/miss0000.jpg", 'wb') as f:
    f.write(requests.get(pic_url).content)
missbai = Image.open("d:/miss0000.jpg")
missbai.show()
b.quit()
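As mentioned in point 1, the table itself was built in Navicat rather than in code. For reference only, here is a rough sketch of an equivalent CREATE TABLE; the column names are taken from the INSERT statement above, while the types, lengths, and primary key are my assumptions:

import pymysql

con = pymysql.connect(host='127.0.0.1', port=3306, user='root',
                      passwd='000000', db='quany', charset='utf8')
cur = con.cursor()
# hypothetical table definition; a primary key is assumed so REPLACE INTO can overwrite duplicates
cur.execute("""
    create table if not exists buddyquan (
        year varchar(4), qishu varchar(10),
        ma1 varchar(4), shengxiao1 varchar(4), ma2 varchar(4), shengxiao2 varchar(4),
        ma3 varchar(4), shengxiao3 varchar(4), ma4 varchar(4), shengxiao4 varchar(4),
        ma5 varchar(4), shengxiao5 varchar(4), ma6 varchar(4), shengxiao6 varchar(4),
        tema varchar(4), texiao varchar(4), tebo varchar(8), tetou varchar(4),
        danshuang varchar(4), wuxing varchar(4), jiaye varchar(4), daxiao varchar(4),
        weishu varchar(4), duanwei varchar(8), bihua varchar(4), nannv varchar(4),
        heshu varchar(4), zonghe varchar(8),
        primary key (year, qishu)
    ) default charset=utf8
""")
con.commit()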
The scraping code above is rather clumsy and there is plenty left to optimize; advice is very welcome.
Eat hot pot, sing a song, watch the browser churn through the pages, and the data lands in the database.
Here num is used to print the running count of draws.
As you can see, there have been 4865 draws so far. How many people's years went into those. Ah.
The data is all in the database now. Today we'll just use the pandas library to warm up a bit (although a dataset this small could be handled in Excel).
First set pandas to display at most 8 rows, with a display width of 200, which makes the data easier to look at.
Read all the data from the database (you can select whatever subset you like here; think of it as pulling the data out with SQL and then handing it to pandas).
Take a look at the data (the display is still messy because the column width is off; pd.set_option('display.max_colwidth', 20) sets the maximum column width, and we will use it).
So first, bring the maximum column width of the display down; a sketch of these warm-up steps follows.
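A minimal sketch of this warm-up (connection details reused from the scraping script; the option values are the ones mentioned above):

import pymysql
import pandas as pd

# display at most 8 rows, 200 characters wide, and cap each column's width
pd.set_option('display.max_rows', 8)
pd.set_option('display.width', 200)
pd.set_option('display.max_colwidth', 20)

con = pymysql.connect(host='127.0.0.1', port=3306, user='root',
                      passwd='000000', db='quany', charset='utf8')
# pull everything from the table into a DataFrame (any SQL would do here)
df = pd.read_sql('select * from buddyquan', con)
print(df)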
Come on then, Teletubbies fans: guess which number has come up as the special code most often.
When you're told to buy 4X, buy it; in this life there's no wool to be had from the other numbers.
And do you know which zodiac sign comes up most often?
Yet you keep funding Bai XX and Zeng XX; take the Teletubbies as your idols and you really will end up a pig.
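For the record, a sketch of how those two frequency counts can be done with pandas, assuming the DataFrame df from the warm-up above and the tema (special number) and texiao (special zodiac) columns:

# special numbers, most frequent first
print(df['tema'].value_counts().head(10))

# special-code zodiac signs, most frequent first
print(df['texiao'].value_counts())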
As for predicting the draws: please listen to the next installment.
Note: the statistical descriptions in this article have no reference value. The lottery data was deliberately mixed with fake records, and everything shown is dummy data, so cherish your life and stay away from Bai X-jie and the Teletubbies.
Small-probability events are crafty little goblins. Tempted? My own ability is limited, so if you are tempted too, you are sincerely invited to join in; there is a lot of interesting statistics and analysis work to be done on this data.
My ability is limited and the write-up is not great; please bear with me, great ones, especially on the logic and the reasoning, and feel free to offer corrections and criticism.
Buddyquan blog: https://home.cnblogs.com/u/buddyquan/
QQ: 1749061919, for exchanging ideas on crawlers and data.