This article walks through a Python example that automatically inflates ("brushes") a blog's page-view count. The editor found it worthwhile and shares it here for reference. Let's take a look.
Source of Ideas
Today I happened to overhear people discussing "brushing" page views, that is, artificially inflating view counts, and it piqued my curiosity. I had just been reading about the requests module, so I wrote a simple test case. To my surprise, the trick actually worked. So what are we waiting for? Let's brush.
Prelude
The idea is simple: send a request to the page, and that is all it takes. The code is as follows:
import urllib2

headers = {
    'Referer': 'www.php.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
}

def gethtml(url, headers):
    # Send a GET request with the custom headers and return the raw HTML
    req = urllib2.Request(url, headers=headers)
    page = urllib2.urlopen(req)
    html = page.read()
    return html
We supply the target URL and a headers dictionary by hand. Naturally, I used my own blog for the test.
I ran the code, and the view count did indeed go up.
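For readers on Python 3, where urllib2 no longer exists, the same request can be sketched with urllib.request. This is only an illustration under that assumption: the names build_request and fetch_html, the header values, and the example.com URL are placeholders and do not come from the original code.

```python
import urllib.request

# Hypothetical header values for illustration; substitute your own.
HEADERS = {
    "Referer": "http://example.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)",
}


def build_request(url, headers):
    """Create a request object that carries the custom headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, headers, timeout=10):
    """Send the request and return the response body as bytes."""
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as page:
        return page.read()
```

Splitting request construction from sending keeps the header handling easy to inspect without touching the network.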
Slow growth
Since this approach works, the line of thinking must be correct. The natural next step is a loop: would that keep pushing the view count up?
Yes, and that is exactly what I did. The code is as follows:
i = 0
while i < N:  # N is a placeholder: the loop bound is missing in the original
    url = 'http://jb51.net/marksinoberg/article/details/51501377'
    gethtml(url, headers)
    i = i + 1
At first the view count visibly went up (^^). However, after it had increased by 10, it stopped.
After that it would not go up any further. My guess is that the server places some limit on my visits; otherwise the loop should have kept working.
Finding a Workaround
As the saying goes, "where there is a policy, there is a countermeasure," and I was not going to be stopped that easily. My guess was that the server recorded my IP address and then restricted my visits.
My solutions:
Proxy IP access: since I have no server of my own, I cannot route requests through a proxy IP.
Change IP: that being the case, I tried changing my own IP instead. So how do you change your IP? (I now regret not paying attention in my computer networking class; I never learned IP spoofing, otherwise I could have used it here.) But all roads lead to Rome, and I found another way, as follows:
C:\Users\Administrator>ipconfig /release

Windows IP Configuration

No operation can be performed on Local Area Connection while it has its media disconnected.

Wireless LAN adapter Wireless Network Connection:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::1d9f:d97b:fd16:1f6f%
   Default Gateway . . . . . . . . . :

Ethernet adapter Local Area Connection:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : oureda.cn

Ethernet adapter VMware Network Adapter VMnet1:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::359d:e81d:741:f257%1
   IPv4 Address. . . . . . . . . . . : 192.168.229.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter VMware Network Adapter VMnet8:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::94b1:d10f:b68:101d%1
   IPv4 Address. . . . . . . . . . . : 192.168.244.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter VirtualBox Host-Only Network:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::a5eb:545c:7d89:9451%
   IPv4 Address. . . . . . . . . . . : 192.168.56.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Tunnel adapter isatap.{4F399971-B739-4B71-BD79-E48233EEC9BE}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.{1860C94E-1007-4418-9A26-7D8AA8F06E15}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.oureda.cn:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.dlut.edu.cn:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.{6F7F27ED-942E-4EFB-ACF2-A4E8793B161D}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

C:\Users\Administrator>ipconfig /renew

Windows IP Configuration

No operation can be performed on Local Area Connection while it has its media disconnected.

Wireless LAN adapter Wireless Network Connection:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::1d9f:d97b:fd16:1f6f%12
   IPv4 Address. . . . . . . . . . . : 192.168.58.70
   Subnet Mask . . . . . . . . . . . : 255.255.252.0
   Default Gateway . . . . . . . . . : 192.168.56.1

Ethernet adapter Local Area Connection:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : oureda.cn

Ethernet adapter VMware Network Adapter VMnet1:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::359d:e81d:741:f257%14
   IPv4 Address. . . . . . . . . . . : 192.168.229.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter VMware Network Adapter VMnet8:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::94b1:d10f:b68:101d%15
   IPv4 Address. . . . . . . . . . . : 192.168.244.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter VirtualBox Host-Only Network:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::a5eb:545c:7d89:9451%16
   IPv4 Address. . . . . . . . . . . : 192.168.56.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Tunnel adapter isatap.{4F399971-B739-4B71-BD79-E48233EEC9BE}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.{1860C94E-1007-4418-9A26-7D8AA8F06E15}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.oureda.cn:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.dlut.edu.cn:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.{6F7F27ED-942E-4EFB-ACF2-A4E8793B161D}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

As you have surely noticed, the two core commands, which change the network configuration, are:
ipconfig /release    // release the current IP address
ipconfig /renew      // request a newly assigned IP address
This is reasonably effective for changing your IP, especially for users on a LAN.
So I only need to invoke these cmd commands from my Python code to change my IP dynamically, which is exactly what I need.
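Calling the two commands from Python could look roughly like the sketch below. It assumes a Windows machine with ipconfig on the PATH; the function name renew_ip and the injectable run parameter are hypothetical conveniences for this illustration, not part of the original script.

```python
import subprocess

# The two commands discussed above, in the order they must be issued.
COMMANDS = (["ipconfig", "/release"], ["ipconfig", "/renew"])


def renew_ip(run=subprocess.run):
    """Release and then renew the address; return each command's exit code.

    `run` defaults to subprocess.run and is injectable so the function can
    be exercised without touching the real network configuration.
    """
    return [run(command).returncode for command in COMMANDS]
```

Exit codes are returned rather than raised as errors, since ipconfig may complain about disconnected adapters (as in the output above) even when the active adapter is renewed.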
Problem
The IP problem is solved, but brushing this way is still far too slow, because releasing and renewing the address takes time, and that is very slow compared with how fast the code itself runs. On top of that, only about 10 views can be added per address. Honestly, it is a bit embarrassing: all that effort for 10 views. How can this be solved?
I did not really solve it, but I found that the restriction is not especially strict. I went off for a meal partway through, and when I came back the original IP could add views again. That was about 45 minutes later. This is the opening to exploit.
Source Code
The idea is really simple: keep looking for a way around each problem. However strong the other side's system is, it cannot be airtight, and there is always a way through. Here is the code.
# coding:utf-8
# author = 'Mark Sinoberg'
# date = '2016/5/26'
# desc = test: refresh your blog's view count

import urllib2, re
from bs4 import BeautifulSoup

def gethtml(url, headers):
    # Send a GET request with the custom headers and return the raw HTML
    req = urllib2.Request(url, headers=headers)
    page = urllib2.urlopen(req)
    html = page.read()
    return html

def parse(data):
    # Parse the HTML with the lxml parser
    content = BeautifulSoup(data, 'lxml')
    return content

def getreadnums(data, st):
    # Return every match of the regex pattern st found in data
    reg = re.compile(st)
    return re.findall(reg, data)

url = 'http://jb51.net/marksinoberg/article/details/51493318'
headers = {
    'Referer': 'http://jb51.net/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
}

i = 0
while i < 24:
    html = gethtml(url, headers)
    content = parse(html)
    # The view count is shown in a span with class "link_view"
    result = content.find_all('span', class_='link_view')
    print result[0].get_text()
    i = i + 1
Code Run Result:
D:\Software\Python2\python.exe e:/code/python/mytestset/ulib2/addwatcher.py
94 people reading
95 people reading
96 people reading
97 people reading
98 people reading
99 people reading
100 people reading
101 people reading
102 people reading
103 people reading
104 people reading
105 people reading
106 people reading
107 people reading
108 people reading
109 people reading
110 people reading
111 people reading
112 people reading
113 people reading
114 people reading
115 people reading
115 people reading
115 people reading

Process finished with exit code 0
A nice touch here is using BeautifulSoup to grab the data at one specific place in the page, in this case the view count. The output above also shows that the number of views a single IP can add is limited, generally somewhere between 10 and 30; here it appears to be 22.
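The script's getreadnums helper hints at a regex-based alternative to BeautifulSoup. A minimal sketch of that idea, using only the standard library, is shown here. The sample markup and the name extract_view_count are assumptions made for illustration; the real page structure may differ.

```python
import re

# Hypothetical markup for illustration; the real page may look different.
SAMPLE_HTML = '<span class="link_view" title="views">94 people reading</span>'

# Capture the leading number inside the span whose class is "link_view".
VIEW_PATTERN = re.compile(r'<span[^>]*class="link_view"[^>]*>\s*(\d+)')


def extract_view_count(html):
    """Return the view count as an int, or None if the span is not found."""
    match = VIEW_PATTERN.search(html)
    return int(match.group(1)) if match else None
```

Returning an integer instead of the raw text makes it straightforward to compare successive readings.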
Outlook
In fact, I could simply rerun the script many times to keep refreshing, but that is not very elegant, so here are some ideas of my own.
Check the result (the view count): when two consecutive results are identical, have Python run the cmd commands to update the IP. This is a time-consuming operation, though, so it can be moved into a thread.
Crawl the list page of your blog to get all of your posts (this will obviously require a simulated login), then brush views for each post. This does not really solve the problem, but little by little it still adds up.
Run a timed thread that brushes again every so often. Over a day, a single article could probably reach a few hundred views. (I have not tried this, so I do not know.)
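The first idea, detecting two identical consecutive results, can be sketched as a small check. The function name has_stalled and the repeats parameter are hypothetical; this only illustrates the comparison and is not a tested part of the original script.

```python
def has_stalled(counts, repeats=2):
    """Return True when the last `repeats` view counts are identical."""
    if len(counts) < repeats:
        return False
    tail = counts[-repeats:]
    return all(value == tail[0] for value in tail)
```

For the run shown earlier, the readings end in 115, 115, 115, so the check would fire as soon as the second 115 appears.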