Using the search interface of the Sina Weibo API to build a microblog repost ranking


Zheng Yi 20100929

 

Application entry: http://t.rtmeme.com/

Here is a brief look at how our list differs from Sina's own popular-repost lists:

Our list ignores celebrity reposts, pays more attention to grassroots reposts and to reposts about social and livelihood issues, and blocks reposts with no real content.

Popular reposts from the major microblogging sites, led by Sina Weibo, are eligible for the list.

 

 

1. Sina Interfaces

The Sina Weibo API provides a search method, documented as follows:

URL: http://api.t.sina.com.cn/search.json
Format: JSON only
HTTP method: GET
Login required: true
Rate limited: true
Request parameters:
    page: optional. Page number (starting from 1; default: 1).
    rpp: optional. Number of microblogs returned per page. The default is 10 and the maximum is 200.

Although the documentation says login is required, in practice you only need to supply the appkey: no login and no OAuth. Sina may of course require OAuth login in the future, but Twitter has not yet imposed such a requirement on its search interface.
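As a rough illustration, a request URL for this interface could be assembled as below. This is only a sketch: the text does not give the name of the parameter that carries the appkey or the search term, so `source` and `q` are assumptions, and `build_search_url` is a hypothetical helper. Only `page`, `rpp`, and their limits come from the documentation quoted above.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://api.t.sina.com.cn/search.json"
MAX_RPP = 200  # documented maximum number of results per page


def build_search_url(appkey, query, page=1, rpp=10):
    """Build a search request URL without any login or OAuth step.

    Assumption: the appkey is sent as `source` and the search term as `q`;
    neither parameter name is stated in the article.
    """
    if page < 1:
        raise ValueError("page starts from 1")
    if not 1 <= rpp <= MAX_RPP:
        raise ValueError("rpp must be between 1 and 200")
    params = {"source": appkey, "q": query, "page": page, "rpp": rpp}
    return SEARCH_URL + "?" + urlencode(params)
```

Validating `page` and `rpp` locally avoids spending rate-limited requests on calls the server would reject anyway.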

 

 

2. Capture only reposts

We only need the repost records from Sina Weibo.

 

Computing the Sina Weibo repost list works the same way as computing the Twitter repost list in two respects:

    • The information fingerprint is extracted only from the body of the original post;
    • Only the content and author of the original post are retained.
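The article does not say how the information fingerprint is computed. One simple possibility, shown here purely as an assumption, is to collapse whitespace in the original body and hash the result, so that every repost of the same original maps to the same key. The function name `fingerprint` is hypothetical.

```python
import hashlib
import re


def fingerprint(body):
    """Return a hex digest identifying the body of an original post.

    Assumption: whitespace is collapsed and the text is hashed with SHA-1;
    the article does not specify its actual fingerprint method.
    """
    normalized = re.sub(r"\s+", " ", body).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()
```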

 

Unlike the Twitter repost list:

    • There is no need to save the text added by every reposter, because most of it is filler along the lines of "forwarding Weibo" and is not worth storing in bulk. You only need to save these users' personal information, such as profile picture, screen name, and location.
    • It is important to know how many people have reposted a message in the last few hours. When deciding whether a message can make the list, you additionally fetch its repost count and comment count. Because an old message can set off a fresh wave of reposts, it is also important to determine when the original message was published.
    • The original message's thumbnail, large image, and other multimedia information are saved.

 

3. List Calculation Method

The data of the original message (body, author, profile picture, thumbnail, and information fingerprint) is stored by default. For each scanned reposter, only the name, profile picture, and ID are recorded.
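As a rough illustration, the two kinds of record could be modeled as follows. The class and field names are hypothetical; the text only lists which pieces of data are kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OriginalMessage:
    """Everything kept about an original post."""
    fingerprint: str
    body: str
    author: str
    profile_picture: str
    thumbnail: Optional[str] = None  # assumption: not every post has an image


@dataclass
class Reposter:
    """The minimal data kept about each scanned reposter."""
    user_id: int
    name: str
    profile_picture: str
```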

Periodically count how many times each information fingerprint has appeared in the last four hours. If it has appeared often enough, for example five times, fetch the original message's repost count and comment count (from the database first, then from the API). If the repost count is large enough, for example more than 40, and the comment count is smaller than the repost count, the message becomes a candidate and goes through a final automated review before it is listed.
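The two-stage check described above can be sketched as follows. The thresholds are the example values from the text; the function names and the `(fingerprint, timestamp)` input shape are assumptions made for illustration, and the database/API lookup and the final review are left out.

```python
from collections import Counter

WINDOW_SECONDS = 4 * 3600  # "the last four hours"
MIN_SIGHTINGS = 5          # example threshold from the text
MIN_REPOSTS = 40           # the repost count must exceed this


def frequent_fingerprints(sightings, now):
    """Return fingerprints seen at least MIN_SIGHTINGS times in the window.

    `sightings` is an iterable of (fingerprint, unix_timestamp) pairs.
    """
    recent = Counter(fp for fp, ts in sightings if now - ts <= WINDOW_SECONDS)
    return {fp for fp, n in recent.items() if n >= MIN_SIGHTINGS}


def is_candidate(repost_count, comment_count):
    """Apply the count test that precedes the final automated review."""
    return repost_count > MIN_REPOSTS and comment_count < repost_count
```

Running the cheap local count first means the rate-limited counts lookup is only made for fingerprints that already look promising.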

 

To keep the quality of the list high, the following rules are applied:

1. Block the IDs of certain original authors and reposters;

2. Block certain keywords;

3. Prevent entertainment celebrities from being pushed onto the list;

4. The number of tags extracted from the message body must be greater than 2, to keep content-free or overly short messages off the list;

5. Block, and try to identify, people who flood the timeline with chat, for example by ignoring reposters who are the same person as the original author;

6. Focus on blocking horoscope posts, birthday posts, holiday posts, and missing-person posts;

7. The original post being reposted must have been published within the last n hours, to prevent old posts from resurfacing;

8. Block certain junk information sources, such as bookmarks, sharing, 56.com, Youku, Tudou, and related blogs;

9. Block professional posting accounts, such as "XX quotations" and "XX jokes";

10. Block original messages that contain too many characters such as "@" and "#".
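A few of these rules (1, 2, 4, 5, 7, and 10) are mechanical enough to sketch as a single filter. Everything specific here is an assumption for illustration: the dictionary keys, the function name `passes_filters`, and the default limits for post age and symbol count, none of which the article specifies.

```python
def passes_filters(msg, now, *, blocked_ids, blocked_keywords,
                   max_age_hours=24, max_symbols=5, min_tags=3):
    """Return True if `msg` survives a subset of the list rules.

    `msg` is a dict with hypothetical keys: author_id, reposter_id, body,
    tags (list of str), and created_at (unix time of the original post).
    """
    if msg["author_id"] in blocked_ids or msg["reposter_id"] in blocked_ids:
        return False  # rule 1: blocked author or reposter
    if any(word in msg["body"] for word in blocked_keywords):
        return False  # rule 2: blocked keyword
    if len(msg["tags"]) < min_tags:
        return False  # rule 4: needs more than 2 tags
    if msg["reposter_id"] == msg["author_id"]:
        return False  # rule 5: author reposting their own message
    if now - msg["created_at"] > max_age_hours * 3600:
        return False  # rule 7: original post is too old
    if msg["body"].count("@") + msg["body"].count("#") > max_symbols:
        return False  # rule 10: too many "@" and "#" characters
    return True
```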

 

4. Table Structure

Original messages are stored in MongoDB. The list is stored in MySQL to make web access easier.

 

5. Frequency

Sina Weibo limits how often the search interface may be polled, for example to 1000 requests per hour, so we should avoid polling too quickly.

Computing list entries also calls the counts API of the Sina Weibo API, so pay attention to the call frequency there as well.
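One simple way to stay under an hourly quota is to space calls evenly: at 1000 requests per hour, that is one request every 3.6 seconds. The class below is a minimal sketch of that idea, not the author's implementation; the name `Throttle` and the injectable `clock` and `sleep` arguments are assumptions chosen so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Space calls evenly so at most `max_per_hour` happen in any hour."""

    def __init__(self, max_per_hour=1000, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / max_per_hour  # 3.6 s at 1000 requests/hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Calling `wait()` before each search or counts request keeps the polling loop within the quota; a separate `Throttle` instance can be used for each interface if their limits differ.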

 

Weibo ranking application website:

http://t.rtmeme.com/

[End]

 

 
