Turning a lucky draw into a technical blog post
What is it like to write up a lucky draw as a technical blog post?
Prerequisite reading for this activity: A Preliminary Exploration of Python Crawlers
http://blog.csdn.net/eclipsexys/article/details/48193541
Please read it first. Otherwise you are only here for the lucky draw!
Lottery
Soon after my new book "English for Android" was published, I hope more people will be able to support it in order to reward you for your great recommendations. I would like to prepare for this lucky draw.
Who can enter
Anyone who leaves a comment on this blog post is entered in the lucky draw.
The comment content is as follows:
PS: Please do not post repeated comments. They may increase my popularity, but they will not improve your odds in the draw.
PS: Please do not take advantage of the blogger's sincere, pure, kind heart!
PS: If you win but want neither my book nor the reimbursement, come straight to Shanghai, pusoft building 703, and I! Will! Treat! You! To! The! Canteen!
Prize !!!
- If you have not purchased "Android group English Biography", the prize will be a signed version of "Android group English Biography" (if you think my words are too ugly, you can also upload images)
- If you have purchased "Android group English pass", the prize will be reimbursed for your book purchases.
It is not easy to write a book. I only earn 4 yuan for a book.In line with the core socialist values, please do not deceive my sincere feelings ~~~~
Prize quantity
(Comments/40) + 1
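As a quick worked example of the formula above, here is a minimal sketch. It assumes the division rounds down (integer division), which the post does not actually state, and the function name `prize_count` is made up for illustration.

```python
def prize_count(comment_count):
    """Number of prizes for a given number of comments: (comments / 40) + 1.

    Assumption: the division is integer division (it rounds down).
    """
    return comment_count // 40 + 1


if __name__ == "__main__":
    for n in (10, 40, 100):
        print(n, "comments ->", prize_count(n), "prize(s)")
```

Under that assumption there is always at least one prize, and one more is added for every 40 comments.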
Deadline
The draw is tentatively scheduled for September 25, 2015, on the eve of the Mid-Autumn Festival. I hope it brings you a nice Mid-Autumn Festival gift.
Below is the technical post. Below is the technical post.
How the draw works
The method is very simple: count all the valid comments, collect their usernames, and use random numbers to pick the usernames of the winners.
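The drawing step can be sketched roughly as follows. This is only an illustration: the usernames are invented, the function name `draw_winners` is hypothetical, and `random.sample` is simply one reasonable way to pick distinct winners.

```python
import random


def draw_winners(usernames, winner_count, rng=random):
    """Pick distinct winners from the usernames of the valid comments."""
    # Each user gets exactly one entry, however many comments they left
    candidates = list(dict.fromkeys(usernames))
    # random.sample never returns the same candidate twice
    return rng.sample(candidates, min(winner_count, len(candidates)))


if __name__ == "__main__":
    # Made-up usernames for illustration only
    names = ["alice", "bob", "alice", "carol"]
    print(draw_winners(names, 1))
```

Deduplicating before the draw is what keeps repeated comments from improving anyone's odds.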
As a tech geek, I certainly don't want to tally the comments by hand. Whatever can be automated should not be handed to a girlfriend. Whatever can be scripted should not be handed to a girlfriend. So the real purpose of this post is to teach you how to use your girlfriend correctly.
Analysis
First, let's look at the CSDN blog comment system.
Oops, I didn't mean to capture so many compliments in the screenshot. Please ignore them.
Open Chrome's Inspect Element:
Use the magnifier to locate the username:
Right-click and view the page source to look for it there. But wait: the page source does not contain the comments at all.
That makes sense, though. The comments must be loaded via Ajax; otherwise the whole page would have to refresh after every comment.
OK, so go to the Network tab, refresh the page, and capture the requests:
This shows everything the page loads, including the JS behind the comments. Let's look through these files.
First, try filtering by a few keywords, such as comment:
Dear reader, the first link looks very suspicious. Right-click it and open it in a new window:
It seems that people who are good at English never have bad luck.
In this way, we easily get the address of the comment interface:
http://blog.csdn.net/eclipsexys/comment/list/47405045?page=1
The structure is readable from the URL itself; the username is included only to tell one blog from another.
OK. At this point we can say:
Why crawl at all? Just request this interface directly and we get the data. So the prerequisite post mentioned above was really just fishing for traffic.
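Requesting the interface directly could look something like the sketch below. It uses only the standard library. The helper names `comment_list_url` and `fetch_comment_page` are hypothetical, and it assumes the `page` query parameter selects the page of comments, as the URL above suggests. Whether the endpoint still responds this way is not something this sketch can guarantee.

```python
import json
import urllib.request


def comment_list_url(username, article_id, page=1):
    """Build a comment-list URL in the shape shown above."""
    return "http://blog.csdn.net/{}/comment/list/{}?page={}".format(
        username, article_id, page)


def fetch_comment_page(username, article_id, page=1):
    """Request one page of comments and decode the JSON body (network I/O)."""
    request = urllib.request.Request(
        comment_list_url(username, article_id, page),
        # Assumption: a browser-like User-Agent header is enough for the server
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent here
    print(comment_list_url("eclipsexys", 47405045))
```

Keeping the URL builder separate from the request makes it easy to check the address by eye before sending anything.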
Implementation
The implementation is very simple. Heck, the interface already hands us all the data. We only need to drop repeated comments and replies, which are invalid, and whatever is left is the valid data.
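The filtering described above might be sketched like this. The sample records are invented, the function name `valid_usernames` is hypothetical, and the `content` and `username` keys and the `[reply]` marker follow the script in this post; their exact spelling should be checked against the real response.

```python
def valid_usernames(comments, reply_marker="[reply]"):
    """Return usernames of non-reply comments, each once, in first-seen order."""
    seen = set()
    result = []
    for comment in comments:
        if reply_marker in comment["content"]:
            continue  # replies do not count as entries
        name = comment["username"]
        if name not in seen:  # repeated comments by one user count once
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    # Made-up records for illustration only
    sample = [
        {"username": "alice", "content": "Great book"},
        {"username": "bob", "content": "[reply]alice[/reply] thanks"},
        {"username": "alice", "content": "Commenting again"},
        {"username": "carol", "content": "Count me in"},
    ]
    print(valid_usernames(sample))
```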
Here comes the Python, so that those who read the prerequisite post are not disappointed:
# coding: utf-8
import json
import random

import requests


class Prize(object):
    def __init__(self):
        print('start drawing')

    # Get the raw response text for the given URL
    def getSource(self, url):
        head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'}
        html = requests.get(url, headers=head)
        html.encoding = 'utf-8'
        return html.text

    # Retrieve all comment information from the JSON response.
    # Key names are kept as they appear in this post; check their exact
    # casing against the actual response.
    def getAllCommentInfo(self, source):
        return json.loads(source)['LIST']

    # Save the candidate usernames to info.txt
    def saveinfo(self, commentInfo):
        with open('info.txt', 'w', encoding='utf-8') as f:
            for each in commentInfo:
                print(each)
                f.write('username: ' + each + '\n')


if __name__ == '__main__':
    # Set the number of winners
    winnerCount = 1
    userList = []
    url = "http://blog.csdn.net/eclipsexys/comment/list/47405045?page=1"
    androidHeros = Prize()
    html = androidHeros.getSource(url)
    commentsInfo = androidHeros.getAllCommentInfo(html)
    # Skip replies and keep each username only once
    for each in commentsInfo:
        if '[reply]' not in each['content'] and each['username'] not in userList:
            userList.append(each['username'])
    androidHeros.saveinfo(userList)
    # random.sample picks distinct winners and cannot index past the end of
    # the list, which random.randint(0, len(userList)) could do
    for winner in random.sample(userList, min(winnerCount, len(userList))):
        print('\n----------------- Winner: ' + winner + ' -------------------')
More
Technology exists to be put to practical use. There is no best language in the world, only the most suitable one. Please use the most suitable language for the job at hand, and say no to language wars, starting with you and me. -- To the people in a certain chat group arguing about why the crawler is not written in Java
Remember to leave a comment on this post!!!
Lottery results
Coming soon. Please spread the word and leave a comment. People who comment tend to be lucky.
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.