QQ Space Python crawler v2.0--data analysis


Following up on the last post about the v1.0 Space crawler, where I said I would write an analysis of the likes data.

First parse the JSON:

You can find the likes data at the node: data --> vfeeds (list) --> like --> likemans (list) --> user --> nickname & uin
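To illustrate that path, here is a minimal hand-constructed response fragment; the field names follow the node path above, but the values (nicknames, uin numbers) are invented for illustration:

```python
import json

# Hypothetical response fragment -- field names match the node path above,
# values are invented example data
sample = json.loads('''
{
  "data": {
    "vfeeds": [
      {
        "like": {
          "num": 2,
          "likemans": [
            {"user": {"nickname": "Alice", "uin": 10001}},
            {"user": {"nickname": "Bob",   "uin": 10002}}
          ]
        }
      }
    ]
  }
}
''')

# Walk the path: data -> vfeeds -> like -> likemans -> user
for vfeed in sample['data']['vfeeds']:
    for like_man in vfeed['like']['likemans']:
        print(like_man['user']['nickname'], like_man['user']['uin'])
```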

The code is as follows:

for i in range(0, page):
    try:
        html = requests.get(url_x + str(numbers) + url_y, headers=headers).content
        data = json.loads(html)

        if 'vfeeds' in data['data']:
            for vfeed in data['data']['vfeeds']:
                if 'like' in vfeed:
                    for like_man in vfeed['like']['likemans']:
                        qq_list.append(int(like_man['user']['uin']))
                        # This dict must be created inside the loop, because
                        # list.append() stores a reference, not a copy
                        like_me_map = dict()
                        like_me_map['nick_name'] = like_man['user']['nickname']
                        like_me_map['qq'] = like_man['user']['uin']
                        like_me_list.append(like_me_map)
        numbers += 40
        time.sleep(10)
        print('Parsed the first ' + str(numbers) + ' entries')
    except:
        numbers += 40
        time.sleep(10)
        print('Parse error near entry ' + str(numbers))

like_me_list is a list of dicts, and qq_list holds all the QQ numbers. Now define a dict so we can look up a nickname by QQ number:

# Build a map from QQ number to nickname for easy lookup
qq_name_map = dict()
for man in like_me_list:
    qq_name_map[man['qq']] = man['nick_name']

Use a set to de-duplicate automatically, then count:

# Count the likes per person and put the QQ -> count mapping into a map
like_me_result = dict()
qq_set = set(qq_list)
for qq in qq_set:
    like_me_result[str(qq)] = qq_list.count(qq)
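As an aside, the standard library's collections.Counter does this de-duplicate-and-count step in one pass; a sketch with invented example data standing in for the real qq_list:

```python
from collections import Counter

qq_list = [10001, 10002, 10001, 10003, 10001]  # invented example data

# Counter counts each element in a single pass, instead of calling
# list.count() once per unique element
like_me_result = {str(qq): n for qq, n in Counter(qq_list).items()}
print(like_me_result)
```

This is also faster for large lists: Counter is O(n), while calling qq_list.count() per unique QQ is O(n) each time.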

Then sort by like count in descending order. The code here is a bit ugly =. =

# The following sorts by like count and saves a new map as the final result;
# the code is not elegant =. =
result = dict()
num_result = sorted(like_me_result.values(), reverse=True)
print(num_result)
for num in num_result:
    for key in like_me_result.keys():
        if like_me_result[key] == num:
            result[qq_name_map[key] + '(' + key + ')'] = num
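For what it's worth, the double loop can be replaced by sorting the dict items directly; a sketch assuming the same like_me_result and qq_name_map structures, with invented example values:

```python
# Invented example data in the same shape as the real dicts
like_me_result = {'10001': 3, '10002': 1, '10003': 2}
qq_name_map = {'10001': 'Alice', '10002': 'Bob', '10003': 'Carol'}

# Sort the (qq, count) pairs by count, descending, and build the final map;
# dicts keep insertion order in Python 3.7+
result = {
    qq_name_map[qq] + '(' + qq + ')': count
    for qq, count in sorted(like_me_result.items(),
                            key=lambda kv: kv[1], reverse=True)
}
print(result)
```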

Finally, write the results to a file and we are done:

try:
    with open(os.getcwd() + '\\' + 'like_me_result.txt', 'wb') as fo:
        for k, v in result.items():
            record = k + ': liked ' + str(v) + ' times!\r\n'
            fo.write(record.encode('utf-8'))
        print('Likes analysis results written to file')
except IOError as msg:
    print(msg)
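The manual encode() plus 'wb' dance can also be avoided by opening the file in text mode with an explicit encoding; a sketch writing the same record format, with invented example data:

```python
import os

result = {'Alice(10001)': 3, 'Bob(10002)': 1}  # invented example data

path = os.path.join(os.getcwd(), 'like_me_result.txt')
# Text mode with encoding='utf-8' handles the encoding for us;
# newline='\r\n' translates each written '\n' into '\r\n'
with open(path, 'w', encoding='utf-8', newline='\r\n') as fo:
    for k, v in result.items():
        fo.write(k + ': liked ' + str(v) + ' times!\n')
```

os.path.join also makes the path portable instead of hard-coding the '\\' separator.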

However, in the end I found a problem: the likes data in the JSON returned by QQ Space is incomplete.

num gives the total number of likes; for example, one post said 13 people had liked it, but from what I observed, the likemans collection contains at most 3 entries.

My guess is that since the mobile QQ Space page does not need to display who liked a post, the API simply does not return the complete like information;

the UI does not show any liker details either =. =

So this crawler can only be counted as a semi-finished product.
