How is the graph feature implemented technically? Taking MySQL as an example, paging is traditionally written as:
SELECT * FROM msgs WHERE thread_id = ? LIMIT page * count, count
But looking at the Twitter API, we find that many interfaces use a cursor rather than the more intuitive page/count form, for example the followers IDs interface:
URL:
http://twitter.com/followers/ids.format

Returns an array of numeric IDs for every user following the specified user.

Parameters:
* cursor. Required. Breaks the results into pages. Provide a value of -1 to begin paging. Provide values as returned in the response body's next_cursor and previous_cursor attributes to page back and forth in the list.
o example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1
o example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1300794057949944903
As you can see, a call to http://twitter.com/followers/ids.xml has to pass a cursor parameter for paging instead of the traditional url?page=n&count=n form. What is the advantage of this? Does each cursor keep a snapshot of the current dataset, to prevent duplicates in the query results when the underlying result set changes in real time?
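Concretely, the client starts with cursor=-1, reads the next_cursor value out of each response (such as the -1300794057949944903 in the second example URL above), and passes it back verbatim on the following request; paging continues until the API returns a next_cursor of 0, which marks the end of the list. previous_cursor works the same way in the opposite direction.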
In a Google Groups discussion on cursor expiration, Twitter architect John Kalucki explained:
A cursor is an opaque deletion-tolerant index into a btree keyed by source userid and modification time. It brings you to a point in time in the reverse chron sorted list. So, since you can't change the past, other than erasing it, it's effectively stable. (Modifications bubble to the top.) But you have to deal with additions at the list head and also blocks shrinking due to deletions, so your blocks begin to overlap quite a bit as the data ages. (If you cache cursors and read much later, you'll see the first few rows of cursor[n+1]'s block as duplicates of the last rows of cursor[n]'s block. The intersection cardinality is equal to the number of deletions in cursor[n]'s block.) Still, there may be value in caching these cursors and then heuristically rebalancing them when the overlap cardinality crosses some threshold.
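To picture this in MySQL terms, here is a rough sketch of what such a cursor could look like. The table and column names are hypothetical, not Twitter's actual schema; the point is only that the cursor encodes a position in a (user id, modification time) btree rather than a row offset:

-- hypothetical follow list, indexed the way Kalucki describes:
-- a btree keyed by source user id and modification time
CREATE INDEX idx_followers ON followers (user_id, modified_at, follower_id);

-- the cursor is the (modified_at, follower_id) pair of the last row of the
-- previous block; the next block starts just past that position in the
-- reverse-chron order
SELECT follower_id, modified_at
FROM followers
WHERE user_id = ?
  AND (modified_at, follower_id) < (?, ?)   -- values decoded from the cursor
ORDER BY modified_at DESC, follower_id DESC
LIMIT 5000;

Because the cursor names a position in the key space rather than an offset, deleted rows simply vanish from the block and new follows appear at the list head without shifting later blocks, which is exactly the deletion tolerance and block overlap behavior described in the quote.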
In another discussion, "new cursor-based pagination not multithread-friendly", John also mentions:
The page based approach does not scale with large sets. We can no longer support this kind of API without throwing a painful number of 503s.

Working with row-counts forces the data store to recount rows in an O(n^2) manner. Cursors avoid this issue by allowing practically constant time access to the next block. The cost becomes O(n/block_size) which, yes, is O(n), but a graceful one given n < 10^7 and a block_size of 5000. The cursor approach provides a more complete and consistent result set.

Proportionally, very few users require multiple page fetches with a page size of 5,000.

Also, scraping the social graph repeatedly at high speed could often be considered a low-value, borderline abusive use of the social graph API.
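To put numbers on that claim: with the page-based approach, fetching page k forces the store to count past roughly k * block_size rows, so reading a full list of n rows costs about block_size * (1 + 2 + ... + n/block_size), which is roughly n^2 / (2 * block_size) row visits. For n = 10^7 and block_size = 5000 that is on the order of 10^10 row visits, while the cursor approach touches each row about once: 10^7 rows across n/block_size = 2000 block fetches.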
It is clear from these two passages that the main purpose of using cursors for large result sets is performance. To illustrate with MySQL again: to page 100,000 records deep without a cursor, the corresponding SQL is
SELECT * FROM msgs LIMIT 100000, 100
On a table with millions of records, this SQL takes more than 5 seconds on its first (uncached) execution, because MySQL has to scan past and discard the 100,000 skipped rows.
Assuming we use the table's primary key value as the cursor_id, the cursor-style paging SQL can be optimized to
SELECT * FROM msgs WHERE id > cursor_id ORDER BY id LIMIT 100;
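The client then drives paging the same way a Twitter cursor is driven: remember the primary key of the last row in each block and hand it back as the next cursor_id. A minimal sketch, assuming an auto-increment id:

-- first page: no cursor yet, start from the beginning of the key range
SELECT * FROM msgs ORDER BY id LIMIT 100;

-- the client keeps the id of the last returned row (say it was 100123)
-- and passes it back as cursor_id for the next block
SELECT * FROM msgs WHERE id > 100123 ORDER BY id LIMIT 100;

Since id is the primary key, each query is an index seek followed by a 100-row scan no matter how deep the paging goes, whereas the LIMIT 100000, 100 form above has to walk past all of the skipped rows every time.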