Following on from the previous blog, let's talk about paging Cassandra with Java. Note that this kind of paging is different from the pagination we usually implement; exactly how it differs, read on.

The last blog covered Cassandra paging, and you will have noticed that each query depends on the previous one (it needs all the primary keys from the previous query). This is not as flexible as MySQL, so we can only implement "previous page" and "next page" navigation; jumping to an arbitrary page number cannot be implemented (or can be, but the performance is too low).
Let's take a look at the official driver's paging practices.

If a query matches a large number of records and returns them all at once, efficiency is very low and a memory overflow is likely, crashing the whole application. So the driver pages through the result set and returns the appropriate data for a particular page.
First, setting the fetch size

The fetch size is the number of records fetched from Cassandra at a time, in other words the number of records per page. We can specify a default fetch size for the cluster instance when it is created; if not specified, the default is 5000.
```java
// At initialization:
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withQueryOptions(new QueryOptions().setFetchSize(2000))
    .build();

// Or at runtime:
cluster.getConfiguration().getQueryOptions().setFetchSize(2000);
```
The fetch size can also be set on a statement:

```java
Statement statement = new SimpleStatement("your query");
statement.setFetchSize(2000);
```
If the fetch size is set on the statement, the statement's fetch size takes effect; otherwise the fetch size configured on the cluster is used.

Note: setting the fetch size does not mean Cassandra always returns exactly that many rows; the result set may contain slightly more or fewer rows than the fetch size.
Second, iterating the result set

The fetch size limits the number of rows returned per page; when you iterate past the end of a page, the driver automatically fetches the next page of records in the background. With a fetch size of 20, for example, a new page is fetched every 20 rows.
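As a self-contained illustration of this behavior (a live cluster is needed to use the driver's real ResultSet, so a plain Java iterator stands in for it here; SimulatedResultSet and its counters are invented names for the demo, not driver API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simulates how a driver-side result set hands out rows one by one while
// pulling them from the server in fixed-size pages behind the scenes.
public class PagedIterationDemo {

    static class SimulatedResultSet implements Iterator<Integer> {
        private final List<Integer> source;   // stands in for the server-side rows
        private final int fetchSize;
        private final List<Integer> buffer = new ArrayList<>();
        private int fetched = 0;              // rows pulled from the "server" so far
        private int pageFetches = 0;          // how many page fetches have happened

        SimulatedResultSet(List<Integer> source, int fetchSize) {
            this.source = source;
            this.fetchSize = fetchSize;
        }

        @Override
        public boolean hasNext() {
            if (buffer.isEmpty() && fetched < source.size()) {
                // Like the driver, fetch the next page only when the
                // current page is exhausted.
                int end = Math.min(fetched + fetchSize, source.size());
                buffer.addAll(source.subList(fetched, end));
                fetched = end;
                pageFetches++;
            }
            return !buffer.isEmpty();
        }

        @Override
        public Integer next() {
            return buffer.remove(0);
        }

        int getPageFetches() {
            return pageFetches;
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            rows.add(i);
        }
        SimulatedResultSet rs = new SimulatedResultSet(rows, 20);
        int count = 0;
        while (rs.hasNext()) {
            rs.next();
            count++;
        }
        // 45 rows at a fetch size of 20 -> 3 page fetches (20 + 20 + 5)
        System.out.println(count + " rows in " + rs.getPageFetches() + " fetches");
    }
}
```

Iterating 45 rows at a fetch size of 20 triggers three page fetches, which mirrors how the real driver turns one logical iteration into several round trips.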
By default, this background fetch happens at the last moment, when all records of the current page have been iterated. If finer control is needed, the ResultSet interface provides the following methods:
- getAvailableWithoutFetching() and isFullyFetched() to check the current state;
- fetchMoreResults() to force a page fetch.
Here's how to use these methods to prefetch the next page in advance, avoiding the performance hit of fetching it only after the current page has been fully iterated:

```java
ResultSet rs = session.execute("your query");
for (Row row : rs) {
    // When the buffer is down to 100 rows and more pages remain,
    // start fetching the next page in the background
    if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) {
        rs.fetchMoreResults(); // this is asynchronous
    }
    // Process the row ...
    System.out.println(row);
}
```
Third, saving and reusing the paging state

Sometimes it is useful to save the paging state so it can be restored later. Imagine a stateless web service that displays a list of results along with a link to the next page: when the user clicks the link, we need to execute exactly the same query as before, except that the iteration should start where the previous page stopped. In other words, we remember where the previous iteration ended, and the next page starts from there.
To support this, the driver exposes a PagingState object that captures where we are in the result set when the next page is fetched.
```java
ResultSet resultSet = session.execute("your query");
// iterate the result set ...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();

// The paging state can be serialized for transport or storage:
String str = pagingState.toString();
byte[] bytes = pagingState.toBytes();
```
The serialized PagingState can be persisted, or passed along as a parameter of the paging request, then deserialized back into an object for reuse:

```java
PagingState.fromBytes(bytes);
PagingState.fromString(str);
```
Note that a paging state can only be reused with exactly the same statement (same query, same parameters). Moreover, it is an opaque value, meant only to be stored and reused as-is; if you try to modify its contents or use it with a different statement, the driver throws an error.
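To make the stateless web-service idea concrete without a live cluster, here is a minimal sketch of carrying an opaque paging token through a URL: in real code the byte[] would come from PagingState.toBytes() and be restored with PagingState.fromBytes(); the Base64 helpers and the stand-in byte array here are illustrative, not part of the driver:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of how an opaque paging token could travel through a stateless web
// layer: serialize it to a URL-safe string for the "next page" link, then
// decode it back on the follow-up request.
public class PagingTokenDemo {

    // Encode the opaque state for embedding in a URL or hidden form field
    static String encodeToken(byte[] pagingStateBytes) {
        return Base64.getUrlEncoder().encodeToString(pagingStateBytes);
    }

    // Decode the token on the next request; the driver itself validates
    // the decoded state when it is reused
    static byte[] decodeToken(String token) {
        return Base64.getUrlDecoder().decode(token);
    }

    public static void main(String[] args) {
        byte[] fakeState = "opaque-driver-state".getBytes(StandardCharsets.UTF_8);
        String token = encodeToken(fakeState);
        byte[] roundTrip = decodeToken(token);
        // The round-tripped bytes are identical to the original state
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

The key property this mirrors is that the token is treated as opaque: the web layer only transports it, never inspects or edits it.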
Let's look at the code. The following example simulates paged requests, traversing all the records in the teacher table:
Interface:
```java
import java.util.Map;

import com.datastax.driver.core.PagingState;

public interface ICassandraPage {

    Map<String, Object> page(PagingState pagingState);
}
```
Main Code:
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.huawei.cassandra.dao.ICassandraPage;
import com.huawei.cassandra.factory.SessionRepository;
import com.huawei.cassandra.model.Teacher;

public class CassandraPageDao implements ICassandraPage {

    private static final Session session = SessionRepository.getSession();
    private static final String CQL_TEACHER_PAGE = "select * from mycas.teacher;";

    @Override
    public Map<String, Object> page(PagingState pagingState) {
        final int RESULTS_PER_PAGE = 2;
        Map<String, Object> result = new HashMap<String, Object>(2);
        List<Teacher> teachers = new ArrayList<Teacher>(RESULTS_PER_PAGE);

        Statement st = new SimpleStatement(CQL_TEACHER_PAGE);
        st.setFetchSize(RESULTS_PER_PAGE);
        // The first page does not have a paging state
        if (pagingState != null) {
            st.setPagingState(pagingState);
        }
        ResultSet rs = session.execute(st);
        result.put("pagingState", rs.getExecutionInfo().getPagingState());

        // Note that we do not rely on RESULTS_PER_PAGE, because the fetch size
        // does not mean Cassandra always returns exactly that many rows; it may
        // return slightly more or fewer. In addition, we may have reached the
        // end of the result set.
        int remaining = rs.getAvailableWithoutFetching();
        for (Row row : rs) {
            Teacher teacher = this.obtainTeacherFromRow(row);
            teachers.add(teacher);
            if (--remaining == 0) {
                break;
            }
        }
        result.put("teachers", teachers);
        return result;
    }

    private Teacher obtainTeacherFromRow(Row row) {
        Teacher teacher = new Teacher();
        teacher.setAddress(row.getString("address"));
        teacher.setAge(row.getInt("age"));
        teacher.setHeight(row.getInt("height"));
        teacher.setId(row.getInt("id"));
        teacher.setName(row.getString("name"));
        return teacher;
    }
}
```
Test code:
```java
import java.util.Map;

import com.datastax.driver.core.PagingState;
import com.huawei.cassandra.dao.ICassandraPage;
import com.huawei.cassandra.dao.impl.CassandraPageDao;

public class PagingTest {

    public static void main(String[] args) {
        ICassandraPage cassPage = new CassandraPageDao();
        Map<String, Object> result = cassPage.page(null);
        PagingState pagingState = (PagingState) result.get("pagingState");
        System.out.println(result.get("teachers"));

        while (pagingState != null) {
            // PagingState objects can be serialized to strings or byte arrays
            System.out.println("==============================================");
            result = cassPage.page(pagingState);
            pagingState = (PagingState) result.get("pagingState");
            System.out.println(result.get("teachers"));
        }
    }
}
```
Note the use of Statement's setPagingState(PagingState) method above: it is what feeds the saved state back into the next query.
Fourth, offset queries

Saving the paging state works well for moving from one page to the next (or the previous), but it does not support random jumps, such as jumping directly to page 10, because we do not know the paging state of the page before page 10. Cassandra does not natively support such offset queries: an offset query is inefficient (performance is inversely proportional to the number of skipped rows), so Cassandra does not encourage its use. If we need offset queries, we can emulate them on the client side. Performance still degrades linearly with the offset (the larger the offset, the lower the performance), but if that is within our acceptable range, emulation is workable. For example, if each page shows 10 rows and there are at most 20 pages, then displaying page 20 means fetching and discarding at most 190 extra rows, which does not cause too much performance degradation, so emulating the offset query is feasible here.
For example, suppose 10 records are displayed per page and the fetch size is 50, and we request page 12 (that is, rows 110 to 119):
1. Execute the query the first time; the result set contains rows 0 to 49. We do not need these rows, only the paging state.
2. Execute the query a second time with the paging state from the first query; the result set contains rows 50 to 99, which we also discard.
3. Execute the query a third time with the paging state from the second query; the result set contains rows 100 to 149.
4. From the third result set, skip the first 10 rows, read the next 10 rows (110 to 119, exactly what page 12 needs to display), and discard the rest.
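The arithmetic behind these steps can be sketched as follows; plan() is a hypothetical helper for the demo, not a driver API:

```java
// Client-side offset emulation: given a target page, compute how many
// paging-state hops (queries whose rows are discarded) are needed, and how
// many rows to skip inside the final fetched page. Pages are numbered from 1;
// rows from 0.
public class OffsetPlanDemo {

    static int[] plan(int pageNumber, int pageSize, int fetchSize) {
        int firstRow = (pageNumber - 1) * pageSize; // e.g. page 12 -> row 110
        int hops = firstRow / fetchSize;            // full fetches skipped via paging state
        int skipInPage = firstRow % fetchSize;      // rows to discard in the final page
        return new int[] { hops, skipInPage };
    }

    public static void main(String[] args) {
        // Page 12, 10 rows per page, fetch size 50: rows 110..119.
        // Expect 2 discarded fetches (rows 0-49 and 50-99), then skip 10 rows
        // of the page holding rows 100-149 before reading the 10 wanted rows.
        int[] p = plan(12, 10, 50);
        System.out.println(p[0] + " hops, skip " + p[1]);
    }
}
```

This matches the walkthrough above: two queries are executed purely for their paging state, and 10 rows of the third result set are discarded before reading page 12.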
We need to find the fetch size that strikes the best balance: too small means more background queries; too large means bigger responses and more unwanted rows returned.

Also remember that Cassandra itself does not support offset queries; emulating offsets on the client side is only a compromise, to be used when the performance is acceptable. The official recommendations are:
1. Test the code with your expected query patterns to make sure the assumptions are correct.
2. Set a hard limit on the highest page number, so that a malicious user cannot trigger a query that skips a huge number of rows.
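A minimal sketch of the second recommendation, assuming an example limit of 20 pages (PageLimitGuard and MAX_PAGE are invented names for the demo):

```java
// Reject page numbers beyond a configured maximum before running any query,
// so a malicious client cannot force the server to skip a huge number of rows.
public class PageLimitGuard {

    static final int MAX_PAGE = 20; // example hard limit

    static int validatePage(int requestedPage) {
        if (requestedPage < 1 || requestedPage > MAX_PAGE) {
            throw new IllegalArgumentException(
                "page must be between 1 and " + MAX_PAGE);
        }
        return requestedPage;
    }

    public static void main(String[] args) {
        System.out.println(validatePage(12)); // within the limit, accepted
        try {
            validatePage(1000);               // rejected before any query runs
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is that the check runs before any Cassandra query is issued, so an out-of-range request costs nothing on the database side.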
Fifth, summary

Cassandra's support for pagination is limited: previous page and next page are easy to implement, but offset queries are not supported and are hard to implement. You can emulate them on the client side, but this approach is best avoided with Cassandra: Cassandra is generally used to solve big-data problems, and once the data volume gets large, offset-query performance is unflattering.
In my own project, Cassandra paging was used for index repair. The scenario: the Cassandra tables have no secondary indexes; instead, Elasticsearch implements secondary indexing over the Cassandra tables, which raises the problem of keeping the index consistent. Cassandra paging is used to do a full-table traversal of a Cassandra table, comparing each row against the data in Elasticsearch: if a row is missing from Elasticsearch it is added, and if it is present but inconsistent it is fixed in Elasticsearch. How Elasticsearch implements the Cassandra indexing function will be explained in a dedicated follow-up blog, so I will not say more here. The full-table traversal needs paging because the tables are too large (billions of rows) to load into memory at once.
Project attachment:

Java implementation of Cassandra advanced operations: paging (with project-specific requirements)