Apache Ignite Series (iii): Data processing (data loading, data collocation, data queries)


A common way to use Ignite is to import data from an existing relational database into Ignite and then work with the data directly in Ignite, effectively using Ignite as a caching service; of course, Ignite can do far more than that. The following demonstrates Ignite's data storage and query functionality by integrating Ignite into a Java service. Out of personal habit, the samples are exposed through a REST interface rather than written as test code.

Before loading data, it helps to review the three modes in which Ignite stores data (LOCAL, PARTITIONED, REPLICATED):

LOCAL: local mode; data is stored only on the local node, with no data rebalancing, similar to a conventional standalone storage service.

PARTITIONED: partitioned mode; data is spread across the nodes of the cluster, which makes this mode well suited to storing large amounts of data. The number of backups can be configured and defaults to 0; if no backups are set in partitioned mode, there is a risk of data loss.

REPLICATED: replicated mode; it includes the data rebalancing process. The primary copies of the data are laid out exactly as in partitioned mode, except that every other node additionally holds a full backup of the data. Replicated mode is suitable for data sets that are small and grow slowly.

Partitioned mode and replicated mode each have advantages and disadvantages; the choice should be weighed against the characteristics of the actual scenario:

| Mode | Advantages | Disadvantages |
| --- | --- | --- |
| Partitioned (PARTITIONED) | Can store large amounts of data; frequent updates have little impact on it | Querying the cache involves data movement, which affects query performance |
| Replicated (REPLICATED) | Suited to small data sets; stable query performance | Frequent updates have a large impact on it |
1, Data loading

Here MyBatis is used to query the MySQL data, which is then stored into Ignite; the complete code is available at:

github.com/cording/ignite-example

To run the demonstration, first generate the sample data in MySQL: the SQL script is at ignite-example\src\main\resources\import.sql; executing it creates the table and initializes the test data.

Define the cache in the configuration file:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="student"/>
        <property name="cacheMode" value="REPLICATED"/>
        <property name="backups" value="1"/>
        <property name="atomicityMode" value="ATOMIC"/>
        <property name="copyOnRead" value="false"/>
        <property name="dataRegionName" value="Default_Region"/>
        <property name="indexedTypes">
            <list>
                <value>java.lang.Long</value>
                <value>org.cord.ignite.data.domain.Student</value>
            </list>
        </property>
    </bean>

Add the required dependencies:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-core</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-spring</artifactId>
    <version>${ignite.version}</version>
</dependency>
<!-- the ignite-indexing module is required if indexes are used -->
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

In general, the way to import data into an Ignite cluster is the cache.put(...) method, but when there is a lot of data to import, put is not efficient enough. For bulk imports, the Ignite data streamer can be used:

DataLoader.java

......
    /** Import data into Ignite. */
    public void loadData() {
        // query the full student list
        List<Student> result = studentDao.findAllStudents();
        // distributed ID generator
        IgniteAtomicSequence sequence = ignite.atomicSequence("studentPk", 0, true);
        // obtain a data streamer by cache name and add data to it
        try (IgniteDataStreamer<Long, Student> streamer = ignite.dataStreamer(CacheKeyConstant.STUDENT)) {
            result.stream().forEach(r -> streamer.addData(sequence.incrementAndGet(), r));
            // push any remaining buffered data into Ignite
            streamer.flush();
        }
    }
......

After importing the data, the stored entries can be seen in the monitoring program.

A streamer speeds up data loading because it is essentially batch processing. Ignite keeps data placement consistent through consistent hashing: every time a cache record is written into the cluster, Ignite computes, via the consistent hashing algorithm, which node the record maps to, and stores it on that node. The data streamer buffers records that map to the same node and ships them to that node in batches, which significantly improves loading efficiency.
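Because per-node batching is what drives the speedup, the streamer's buffering can be tuned. Below is a minimal sketch of such tuning; the methods are from the IgniteDataStreamer API, but the buffer values are illustrative rather than recommendations:

    /** A variant of loadData() with streamer tuning; buffer values are illustrative only. */
    public void loadDataTuned(List<Student> result) {
        IgniteAtomicSequence sequence = ignite.atomicSequence("studentPk", 0, true);
        try (IgniteDataStreamer<Long, Student> streamer = ignite.dataStreamer(CacheKeyConstant.STUDENT)) {
            streamer.perNodeBufferSize(1024);       // entries buffered per node before a batch is shipped
            streamer.perNodeParallelOperations(8);  // concurrent batch operations in flight per node
            streamer.allowOverwrite(true);          // required if existing keys may be overwritten on reload
            result.forEach(r -> streamer.addData(sequence.incrementAndGet(), r));
        } // close() flushes any remaining buffered data
    }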

2, Data query

The most direct way to query the cache is the cache.get(...) method, but that only handles simple key-value lookups. If index types (indexedTypes) are set, the cache becomes an SQL table, and SQL must be used to query it. SQL queries usually involve a variety of conditions, and the fields behind those conditions need indexes created in advance. Ignite has two kinds of index, ordinary indexes and composite indexes, both declared with the @QuerySqlField annotation. The main query APIs are SqlFieldsQuery and SqlQuery: the former is a field (domain) query, i.e. it returns a result set of selected fields, while the latter is an ordinary query that returns whole cache entries.
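Before turning to SQL, plain key-value access for comparison; a minimal sketch assuming the student cache defined earlier (getName() is an assumed getter on Student):

    IgniteCache<Long, Student> cache = ignite.cache(CacheKeyConstant.STUDENT);
    // direct lookup by key; key 1 is the first value produced by the atomic sequence above
    Student s = cache.get(1L);
    if (s != null) {
        System.out.println("student 1: " + s.getName()); // getName() assumed to exist on Student
    }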

Therefore, to use SQL queries, you need to set the index types (indexedTypes) in the cache definition before loading the data, annotate the relevant attributes in the corresponding entity class for any fields that may appear in queries, and create indexes where necessary. Once the index types are set in the cache definition, the cache is no longer a plain key-value cache but takes on the characteristics of a database table; at that point Ignite becomes a distributed in-memory database, and its SQL-related functionality is implemented on top of the H2 SQL engine.

1) Set the cache index type
    • Setting the index type when defining the cache in Java code

Taking Long as the primary key type and String as the entity class, for example:

use CacheConfiguration.setIndexedTypes(Long.class, String.class) to set the index types.
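A minimal sketch in Java, mirroring the XML definition above with the Long/Student pair from the sample project:

    CacheConfiguration<Long, Student> cfg = new CacheConfiguration<>("student");
    cfg.setCacheMode(CacheMode.REPLICATED);
    // key type first, then value type; this is what turns the cache into an SQL table
    cfg.setIndexedTypes(Long.class, Student.class);
    IgniteCache<Long, Student> cache = ignite.getOrCreateCache(cfg);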

    • Setting the index type in the XML configuration

The indexedTypes property can also be set in XML:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
......
    <property name="indexedTypes">
        <list>
            <value>java.lang.Long</value>
            <value>org.cord.ignite.data.domain.Student</value>
        </list>
    </property>
......
</bean>
2) Three ways to use the @QuerySqlField annotation
    • Making an entity-class property available to the query domain
    @QuerySqlField
    private String test;

After the annotation is added, the test field can be accessed in SQL statements; this alone does not create an index on the column.

    • Making a property queryable and creating an ordinary index on the column
    @QuerySqlField(index = true)
    private String test;
    • Making properties queryable and creating a composite index
    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(
            name = "student", order = 0)})
    private String name;

    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(
            name = "student", order = 1)})
    private String email;

Here the name attribute specifies the name of the composite index, and order indicates the position of the field within that index.

Like composite indexes in an ordinary database, Ignite's composite indexes follow the leftmost-prefix principle: whether the composite index is used is governed by its leftmost columns. With the (name, email) index above, for example, a condition on name alone, or on name and email together, can use the index, while a condition on email alone cannot.

3) Using SqlFieldsQuery for field (domain) queries

There are two predefined fields in Ignite's SQL syntax, _key and _val:

_key: the key object of a cache entry

_val: the value object of a cache entry

List<List<?>> res = cache.query(new SqlFieldsQuery("select _VAL, name from \"Test\".student")).getAll();
System.out.format("The name is %s.\n", res.get(0).get(0));
4) Using SqlQuery for ordinary queries

NormalController.class

    @RequestMapping ("/sqlquery") public @ResponseBody String sqlquery (httpservletrequest request, httpservletrespons        E response) {ignitecache<long, student> Tempcache = Ignite.cache (cachekeyconstant.student); /** Plain Query */String sql_query = "name =?"        and email =? ";        Sqlquery<long, student> csqlquery = new Sqlquery<> (Student.class, sql_query);        Csqlquery.setreplicatedonly (True). Setargs ("student_44", "student_44gmail.com");        List<cache.entry<long, student>> tempresult = Tempcache.query (csqlquery). GetAll ();        if (Collectionutils.isempty (Tempresult)) {return "result is empty!";        } Student Student = Tempresult.stream (). Map (T-t.getvalue ()). FindFirst (). get ();        System.out.format ("The beginning of student[student_44] is%s\n", Student.getdob ());        /** aggregate function Query *//**[count]*/String sql_count = "SELECT count (1) from student"; Sqlfieldsquery countquery = new SqLfieldsquery (Sql_count);        Countquery.setreplicatedonly (TRUE);        list<list<?>> countlist = Tempcache.query (countquery). GetAll ();        Long Count = 0; if (!        Collectionutils.isempty (countlist)) {count = (Long) countlist.get (0). Get (0);        } System.out.format ("Count of Cache[student] is%s\n", count);        /**[sum]*/String sql_sum = "Select sum (studid) from student";        Sqlfieldsquery sumquery = new Sqlfieldsquery (sql_sum);        Sumquery.setreplicatedonly (TRUE);        list<list<?>> sumlist = Tempcache.query (sumquery). GetAll ();        Long sum = 0; if (!        Collectionutils.isempty (sumlist)) {sum = (Long) sumlist.get (0). Get (0);        } System.out.format ("Sum of cache[student.id] is%s\n", sum);    Return "All executed!"; }

The results of the operation are as follows:

The birthday of student[student_44] is Thu Sep 28 00:00:00 GMT+08:00 2017
Count of cache[student] is 500
Sum of cache[student.id] is 125250
3, Data collocation and association queries

Data collocation mainly concerns data stored in partitioned mode. So-called collocation provides a constraint that keeps related data on the same node, so that neither queries nor distributed computations need to move data between nodes, which improves overall performance.

The following takes the collocation of three caches X, Y and Z as an example; see the sample project ignite-example for the complete code.

Here X, Y and Z are three caches in partitioned mode. Y is collocated with X: each Y entry is stored, according to its xId property, on the node that holds the corresponding X entry. In the same way, Z is collocated with Y: each Z entry is stored, according to its yId property, on the node that holds the corresponding Y entry. These constraints make it possible to control the placement of data deliberately.

To use data collocation, one API has to be mentioned: AffinityKey. When a cache is collocated with another cache, its cache keys are of type AffinityKey.
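A hedged sketch of what such an entity can look like; the field names follow the xId convention used above, while the actual classes live in the sample project:

    public class Y {
        private Long id;
        private String info;
        private Long xId; // id of the X entry this Y should be collocated with

        public Y(Long id, String info, Long xId) {
            this.id = id;
            this.info = info;
            this.xId = xId;
        }

        /** Cache key: the second constructor argument (the affinity key) decides node placement. */
        public AffinityKey<Long> key() {
            return new AffinityKey<>(id, xId);
        }
    }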

Start with data initialization:

CollocatedController.java

    private String init() {
        if (init.get()) {
            return "already execute init.";
        }
        // define three caches
        CacheConfiguration<Long, X> xcf = new CacheConfiguration<Long, X>("X")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(Long.class, X.class);
        CacheConfiguration<AffinityKey<Long>, Y> ycf = new CacheConfiguration<AffinityKey<Long>, Y>("Y")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(AffinityKey.class, Y.class);
        CacheConfiguration<AffinityKey<Long>, Z> zcf = new CacheConfiguration<AffinityKey<Long>, Z>("Z")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(AffinityKey.class, Z.class);

        ignite.destroyCache("X");
        ignite.destroyCache("Y");
        ignite.destroyCache("Z");
        ignite.getOrCreateCache(xcf);
        ignite.getOrCreateCache(ycf);
        ignite.getOrCreateCache(zcf);

        IgniteCache<Long, X> xc = ignite.cache("X");
        IgniteCache<AffinityKey<Long>, Y> yc = ignite.cache("Y");
        IgniteCache<AffinityKey<Long>, Z> zc = ignite.cache("Z");

        // load data; the loop bound (100) matches the verification loop below
        Y y;
        Z z;
        for (long i = 0; i < 100; i++) {
            xc.put(i, new X(i, String.valueOf(i)));
            y = new Y(i, String.valueOf(i), i);
            yc.put(y.key(), y);
            z = new Z(i, String.valueOf(i), i);
            zc.put(z.key(), z);
        }
        init.set(true);
        return "all executed.";
    }

After the caches are collocated, how can we verify that collocation actually succeeded? This is where the Affinity.mapKeyToNode() method comes in: given a key, it returns the node that stores it. It is used as follows:

@RequestMapping("/verify")public @ResponseBodyString verifyCollocate(HttpServletRequest request, HttpServletResponse response) throws Exception {    if(!init.get()){        init();    }    Affinity<Long> affinityX = ignite.affinity("X");    Affinity<Long> affinityY = ignite.affinity("Y");    Affinity<Long> affinityZ = ignite.affinity("Z");    for (long i = 0; i < 100; i++) {        ClusterNode nodeX = affinityX.mapKeyToNode(i);        ClusterNode nodeY = affinityY.mapKeyToNode(i);        ClusterNode nodeZ = affinityZ.mapKeyToNode(i);        if(nodeX.id() != nodeY.id() || nodeY.id() != nodeZ.id() || nodeX.id() != nodeZ.id()){            throw new Exception("cache collocated is error!");        }    }    System.out.println("cache collocated is right!");    return "all executed.";}

After /verify executes without throwing an exception, check the storage distribution in the monitoring program:

The data distribution of the three caches is exactly the same, consistent with the verifier's result (no exception thrown), which shows that the caches were collocated successfully.

Once the data is collocated successfully, association queries can be used, which are analogous to multi-table joins in a database:

@RequestMapping ("/query") public @ResponseBodyString query (httpservletrequest request, httpservletresponse response)    {if (!init.get ()) {init ();    } ignitecache<long, X> XC = Ignite.cache ("X");    Ignitecache<affinitykey<long>, y> YC = Ignite.cache ("Y");    Ignitecache<affinitykey<long> z> Zc = Ignite.cache ("Z"); String SQL1 = "from y,\" x\ ".    X "+" where Y.xid = x.id "+" and Y.info =? "; String sql2 = "from z,\" y\ ".    Y "+" where Z.yid = y.id "+" and Z.info =? "; String sql3 = "from z,\" y\ ". Y,\ "X\".    X "+" where Z.yid = y.id and Y.xid = x.id "+" and Z.info =? ";    int i = Intstream.range (1, +). Skip ((int) (100*math.random ())). FindFirst (). Getasint ();    SYSTEM.OUT.PRINTLN ("Query X and Y:");    System.out.println (Yc.query (new Sqlquery<affinitykey<long>, y> (Y.class, SQL1). Setargs (i)). GetAll ()); System.out.println ("**************************************************************************************");    SYSTEM.OUT.PRINTLN ("Query Y and Z:");    System.out.println (Zc.query (new Sqlquery<affinitykey<long>, z> (Z.class, SQL2). Setargs (i)). GetAll ());    System.out.println ("**************************************************************************************");    SYSTEM.OUT.PRINTLN ("Query X and Y and Z:");    System.out.println (Zc.query (new Sqlquery<affinitykey<long>, z> (Z.class, Sql3). Setargs (i)). GetAll ());    System.out.println ("**************************************************************************************"); Return "all executed";}

The results of the execution are as follows:

query X and Y:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Y@605e8969]]
**************************************************************************************
query Y and Z:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Z@562dbd4]]
**************************************************************************************
query X and Y and Z:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Z@7ff851ce]]
**************************************************************************************

If the caches are not collocated, you need to enable non-collocated distributed joins when running an association query: SqlQuery.setDistributedJoins(true).
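A minimal sketch (assuming the Z cache from above): enabling distributed joins lets the engine move data between nodes at query time, at a corresponding performance cost:

    IgniteCache<AffinityKey<Long>, Z> zc = ignite.cache("Z");
    SqlQuery<AffinityKey<Long>, Z> q = new SqlQuery<>(Z.class,
            "from Z, \"Y\".Y where Z.yId = Y.id and Z.info = ?");
    // allow the join even though the matching Z and Y entries may live on different nodes
    q.setDistributedJoins(true);
    List<Cache.Entry<AffinityKey<Long>, Z>> res = zc.query(q.setArgs("42")).getAll();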

Data collocation can also be expressed with the @AffinityKeyMapped annotation, which is used in much the same way as AffinityKey; a complete example can be found in AffinityMappedController.class.
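A hedged sketch of the annotation style; the key class and field names here are illustrative, not the sample project's exact code:

    public class YKey {
        private Long id;

        /** The annotated field is used as the affinity key, so each Y lands on the node of its X. */
        @AffinityKeyMapped
        private Long xId;

        public YKey(Long id, Long xId) {
            this.id = id;
            this.xId = xId;
        }
        // equals() and hashCode() omitted for brevity; a real cache key must implement both
    }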

This concludes the data-processing portion of the Ignite series.
