Apache Ignite Series (iii): Data processing (data loading, data collocation, data queries)


A common way to use Ignite is to import data from an existing relational database into Ignite and then work with the data directly in Ignite, effectively using Ignite as a caching service; of course, Ignite can do far more than that. The following demonstrates Ignite's data storage and query functionality by integrating Ignite into a Java service. Out of personal habit, the samples are exposed through a REST interface rather than written as test code.

Before loading data, it helps to review the three modes in which Ignite stores data (LOCAL, PARTITIONED, REPLICATED):

LOCAL: local mode; data is stored only on the local node, with no data rebalancing, similar to a conventional standalone storage service.

PARTITIONED: partitioned mode; data is spread across the nodes of the cluster, which makes this mode well suited to storing large amounts of data. The number of backups can be configured and defaults to 0; if no backups are set in partitioned mode, there is a risk of data loss.

REPLICATED: replicated mode; it includes the data rebalancing process. The primary copies of the data are laid out exactly as in partitioned mode, except that every other node additionally holds a full backup of the data. Replicated mode is suitable for data sets that are small and grow slowly.

Partitioned mode and replicated mode each have advantages and disadvantages; the choice should be weighed against the characteristics of the actual scenario:

| Mode | Advantages | Disadvantages |
| --- | --- | --- |
| Partitioned (PARTITIONED) | Can store large amounts of data; frequent updates have little impact on it | Querying the cache involves data movement, which affects query performance |
| Replicated (REPLICATED) | Suited to small data sets; stable query performance | Frequent updates have a large impact on it |
1, Data loading

Here MyBatis is used to query the MySQL data, which is then stored into Ignite; the complete code is available at:

github.com/cording/ignite-example

To run the demonstration, first generate the sample data in MySQL: the SQL script is at ignite-example\src\main\resources\import.sql; executing it creates the table and initializes the test data.

Define the cache in the configuration file:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="student"/>
        <property name="cacheMode" value="REPLICATED"/>
        <property name="backups" value="1"/>
        <property name="atomicityMode" value="ATOMIC"/>
        <property name="copyOnRead" value="false"/>
        <property name="dataRegionName" value="Default_Region"/>
        <property name="indexedTypes">
            <list>
                <value>java.lang.Long</value>
                <value>org.cord.ignite.data.domain.Student</value>
            </list>
        </property>
    </bean>

Add the required dependencies:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-core</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-spring</artifactId>
    <version>${ignite.version}</version>
</dependency>
<!-- the ignite-indexing module is required if indexes are used -->
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

In general, the way to import data into an Ignite cluster is the cache.put(...) method, but when there is a lot of data to import, put is not efficient enough. For bulk imports, the Ignite data streamer can be used:

DataLoader.java

......
    /** Import data into Ignite. */
    public void loadData() {
        // query the full student list
        List<Student> result = studentDao.findAllStudents();
        // distributed ID generator
        IgniteAtomicSequence sequence = ignite.atomicSequence("studentPk", 0, true);
        // obtain a data streamer by cache name and add data to it
        try (IgniteDataStreamer<Long, Student> streamer = ignite.dataStreamer(CacheKeyConstant.STUDENT)) {
            result.stream().forEach(r -> streamer.addData(sequence.incrementAndGet(), r));
            // push any remaining buffered data into Ignite
            streamer.flush();
        }
    }
......

After importing the data, the stored entries can be seen in the monitoring program.

A streamer speeds up data loading because it is essentially batch processing. Ignite keeps data placement consistent through consistent hashing: every time a cache record is written into the cluster, Ignite computes, via the consistent hashing algorithm, which node the record maps to, and stores it on that node. The data streamer buffers records that map to the same node and ships them to that node in batches, which significantly improves loading efficiency.
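Because per-node batching is what drives the speedup, the streamer's buffering can be tuned. Below is a minimal sketch of such tuning; the methods are from the IgniteDataStreamer API, but the buffer values are illustrative rather than recommendations:

    /** A variant of loadData() with streamer tuning; buffer values are illustrative only. */
    public void loadDataTuned(List<Student> result) {
        IgniteAtomicSequence sequence = ignite.atomicSequence("studentPk", 0, true);
        try (IgniteDataStreamer<Long, Student> streamer = ignite.dataStreamer(CacheKeyConstant.STUDENT)) {
            streamer.perNodeBufferSize(1024);       // entries buffered per node before a batch is shipped
            streamer.perNodeParallelOperations(8);  // concurrent batch operations in flight per node
            streamer.allowOverwrite(true);          // required if existing keys may be overwritten on reload
            result.forEach(r -> streamer.addData(sequence.incrementAndGet(), r));
        } // close() flushes any remaining buffered data
    }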

2, Data query

The most direct way to query the cache is the cache.get(...) method, but that only handles simple key-value lookups. If index types (indexedTypes) are set, the cache becomes an SQL table, and SQL must be used to query it. SQL queries usually involve a variety of conditions, and the fields behind those conditions need indexes created in advance. Ignite has two kinds of index, ordinary indexes and composite indexes, both declared with the @QuerySqlField annotation. The main query APIs are SqlFieldsQuery and SqlQuery: the former is a field (domain) query, i.e. it returns a result set of selected fields, while the latter is an ordinary query that returns whole cache entries.
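Before turning to SQL, plain key-value access for comparison; a minimal sketch assuming the student cache defined earlier (getName() is an assumed getter on Student):

    IgniteCache<Long, Student> cache = ignite.cache(CacheKeyConstant.STUDENT);
    // direct lookup by key; key 1 is the first value produced by the atomic sequence above
    Student s = cache.get(1L);
    if (s != null) {
        System.out.println("student 1: " + s.getName()); // getName() assumed to exist on Student
    }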

Therefore, to use SQL queries, you need to set the index types (indexedTypes) in the cache definition before loading the data, annotate the relevant attributes in the corresponding entity class for any fields that may appear in queries, and create indexes where necessary. Once the index types are set in the cache definition, the cache is no longer a plain key-value cache but takes on the characteristics of a database table; at that point Ignite becomes a distributed in-memory database, and its SQL-related functionality is implemented on top of the H2 SQL engine.

1) Set the cache index type
    • Setting the index type when defining the cache in Java code

Taking Long as the primary key type and String as the entity class, for example:

use CacheConfiguration.setIndexedTypes(Long.class, String.class) to set the index types.
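A minimal sketch in Java, mirroring the XML definition above with the Long/Student pair from the sample project:

    CacheConfiguration<Long, Student> cfg = new CacheConfiguration<>("student");
    cfg.setCacheMode(CacheMode.REPLICATED);
    // key type first, then value type; this is what turns the cache into an SQL table
    cfg.setIndexedTypes(Long.class, Student.class);
    IgniteCache<Long, Student> cache = ignite.getOrCreateCache(cfg);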

    • Setting the index type in the XML configuration

The indexedTypes property can also be set in XML:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
......
    <property name="indexedTypes">
        <list>
            <value>java.lang.Long</value>
            <value>org.cord.ignite.data.domain.Student</value>
        </list>
    </property>
......
</bean>
2) Three ways to use the @QuerySqlField annotation
    • Making an entity-class property available to the query domain
    @QuerySqlField
    private String test;

After the annotation is added, the test field can be accessed in SQL statements; this alone does not create an index on the column.

    • Making a property queryable and creating an ordinary index on the column
    @QuerySqlField(index = true)
    private String test;
    • Making properties queryable and creating a composite index
    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(
            name = "student", order = 0)})
    private String name;

    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(
            name = "student", order = 1)})
    private String email;

Here the name attribute specifies the name of the composite index, and order indicates the position of the field within that index.

Like composite indexes in an ordinary database, Ignite's composite indexes follow the leftmost-prefix principle: whether the composite index is used is governed by its leftmost columns. With the (name, email) index above, for example, a condition on name alone, or on name and email together, can use the index, while a condition on email alone cannot.

3) Using SqlFieldsQuery for field (domain) queries

There are two predefined fields in Ignite's SQL syntax, _key and _val:

_key: the key object of a cache entry

_val: the value object of a cache entry

List<List<?>> res = cache.query(new SqlFieldsQuery("select _VAL, name from \"Test\".student")).getAll();
System.out.format("The name is %s.\n", res.get(0).get(0));
4) Using SqlQuery for ordinary queries

NormalController.class

    @RequestMapping ("/sqlquery") public @ResponseBody String sqlquery (httpservletrequest request, httpservletrespons        E response) {ignitecache<long, student> Tempcache = Ignite.cache (cachekeyconstant.student); /** Plain Query */String sql_query = "name =?"        and email =? ";        Sqlquery<long, student> csqlquery = new Sqlquery<> (Student.class, sql_query);        Csqlquery.setreplicatedonly (True). Setargs ("student_44", "student_44gmail.com");        List<cache.entry<long, student>> tempresult = Tempcache.query (csqlquery). GetAll ();        if (Collectionutils.isempty (Tempresult)) {return "result is empty!";        } Student Student = Tempresult.stream (). Map (T-t.getvalue ()). FindFirst (). get ();        System.out.format ("The beginning of student[student_44] is%s\n", Student.getdob ());        /** aggregate function Query *//**[count]*/String sql_count = "SELECT count (1) from student"; Sqlfieldsquery countquery = new SqLfieldsquery (Sql_count);        Countquery.setreplicatedonly (TRUE);        list<list<?>> countlist = Tempcache.query (countquery). GetAll ();        Long Count = 0; if (!        Collectionutils.isempty (countlist)) {count = (Long) countlist.get (0). Get (0);        } System.out.format ("Count of Cache[student] is%s\n", count);        /**[sum]*/String sql_sum = "Select sum (studid) from student";        Sqlfieldsquery sumquery = new Sqlfieldsquery (sql_sum);        Sumquery.setreplicatedonly (TRUE);        list<list<?>> sumlist = Tempcache.query (sumquery). GetAll ();        Long sum = 0; if (!        Collectionutils.isempty (sumlist)) {sum = (Long) sumlist.get (0). Get (0);        } System.out.format ("Sum of cache[student.id] is%s\n", sum);    Return "All executed!"; }

The results of the operation are as follows:

The birthday of student[student_44] is Thu Sep 28 00:00:00 GMT+08:00 2017
Count of cache[student] is 500
Sum of cache[student.id] is 125250
3, Data collocation and association queries

Data collocation mainly concerns data stored in partitioned mode. So-called collocation provides a constraint that keeps related data on the same node, so that neither queries nor distributed computations need to move data between nodes, which improves overall performance.

The following takes the collocation of three caches X, Y and Z as an example; see the sample project ignite-example for the complete code.

Here X, Y and Z are three caches in partitioned mode. Y is collocated with X: each Y entry is stored, according to its xId property, on the node that holds the corresponding X entry. In the same way, Z is collocated with Y: each Z entry is stored, according to its yId property, on the node that holds the corresponding Y entry. These constraints make it possible to control the placement of data deliberately.

To use data collocation, one API has to be mentioned: AffinityKey. When a cache is collocated with another cache, its cache keys are of type AffinityKey.
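A hedged sketch of what such an entity can look like; the field names follow the xId convention used above, while the actual classes live in the sample project:

    public class Y {
        private Long id;
        private String info;
        private Long xId; // id of the X entry this Y should be collocated with

        public Y(Long id, String info, Long xId) {
            this.id = id;
            this.info = info;
            this.xId = xId;
        }

        /** Cache key: the second constructor argument (the affinity key) decides node placement. */
        public AffinityKey<Long> key() {
            return new AffinityKey<>(id, xId);
        }
    }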

Start with data initialization:

CollocatedController.java

    private String init() {
        if (init.get()) {
            return "already execute init.";
        }
        // define three caches
        CacheConfiguration<Long, X> xcf = new CacheConfiguration<Long, X>("X")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(Long.class, X.class);
        CacheConfiguration<AffinityKey<Long>, Y> ycf = new CacheConfiguration<AffinityKey<Long>, Y>("Y")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(AffinityKey.class, Y.class);
        CacheConfiguration<AffinityKey<Long>, Z> zcf = new CacheConfiguration<AffinityKey<Long>, Z>("Z")
                .setCacheMode(CacheMode.PARTITIONED)
                .setIndexedTypes(AffinityKey.class, Z.class);

        ignite.destroyCache("X");
        ignite.destroyCache("Y");
        ignite.destroyCache("Z");
        ignite.getOrCreateCache(xcf);
        ignite.getOrCreateCache(ycf);
        ignite.getOrCreateCache(zcf);

        IgniteCache<Long, X> xc = ignite.cache("X");
        IgniteCache<AffinityKey<Long>, Y> yc = ignite.cache("Y");
        IgniteCache<AffinityKey<Long>, Z> zc = ignite.cache("Z");

        // load data; the loop bound (100) matches the verification loop below
        Y y;
        Z z;
        for (long i = 0; i < 100; i++) {
            xc.put(i, new X(i, String.valueOf(i)));
            y = new Y(i, String.valueOf(i), i);
            yc.put(y.key(), y);
            z = new Z(i, String.valueOf(i), i);
            zc.put(z.key(), z);
        }
        init.set(true);
        return "all executed.";
    }

After the caches are collocated, how can we verify that collocation actually succeeded? This is where the Affinity.mapKeyToNode() method comes in: given a key, it returns the node that stores it. It is used as follows:

@RequestMapping("/verify")public @ResponseBodyString verifyCollocate(HttpServletRequest request, HttpServletResponse response) throws Exception {    if(!init.get()){        init();    }    Affinity<Long> affinityX = ignite.affinity("X");    Affinity<Long> affinityY = ignite.affinity("Y");    Affinity<Long> affinityZ = ignite.affinity("Z");    for (long i = 0; i < 100; i++) {        ClusterNode nodeX = affinityX.mapKeyToNode(i);        ClusterNode nodeY = affinityY.mapKeyToNode(i);        ClusterNode nodeZ = affinityZ.mapKeyToNode(i);        if(nodeX.id() != nodeY.id() || nodeY.id() != nodeZ.id() || nodeX.id() != nodeZ.id()){            throw new Exception("cache collocated is error!");        }    }    System.out.println("cache collocated is right!");    return "all executed.";}

After /verify executes without throwing an exception, check the storage distribution in the monitoring program:

The data distribution of the three caches is exactly the same, consistent with the verifier's result (no exception thrown), which shows that the caches were collocated successfully.

Once the data is collocated successfully, association queries can be used, which are analogous to multi-table joins in a database:

@RequestMapping ("/query") public @ResponseBodyString query (httpservletrequest request, httpservletresponse response)    {if (!init.get ()) {init ();    } ignitecache<long, X> XC = Ignite.cache ("X");    Ignitecache<affinitykey<long>, y> YC = Ignite.cache ("Y");    Ignitecache<affinitykey<long> z> Zc = Ignite.cache ("Z"); String SQL1 = "from y,\" x\ ".    X "+" where Y.xid = x.id "+" and Y.info =? "; String sql2 = "from z,\" y\ ".    Y "+" where Z.yid = y.id "+" and Z.info =? "; String sql3 = "from z,\" y\ ". Y,\ "X\".    X "+" where Z.yid = y.id and Y.xid = x.id "+" and Z.info =? ";    int i = Intstream.range (1, +). Skip ((int) (100*math.random ())). FindFirst (). Getasint ();    SYSTEM.OUT.PRINTLN ("Query X and Y:");    System.out.println (Yc.query (new Sqlquery<affinitykey<long>, y> (Y.class, SQL1). Setargs (i)). GetAll ()); System.out.println ("**************************************************************************************");    SYSTEM.OUT.PRINTLN ("Query Y and Z:");    System.out.println (Zc.query (new Sqlquery<affinitykey<long>, z> (Z.class, SQL2). Setargs (i)). GetAll ());    System.out.println ("**************************************************************************************");    SYSTEM.OUT.PRINTLN ("Query X and Y and Z:");    System.out.println (Zc.query (new Sqlquery<affinitykey<long>, z> (Z.class, Sql3). Setargs (i)). GetAll ());    System.out.println ("**************************************************************************************"); Return "all executed";}

The results of the execution are as follows:

query X and Y:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Y@605e8969]]
**************************************************************************************
query Y and Z:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Z@562dbd4]]
**************************************************************************************
query X and Y and Z:
[Entry [key=AffinityKey [key=83, affKey=83], val=org.cord.ignite.example.collocated.Z@7ff851ce]]
**************************************************************************************

If the caches are not collocated, you need to enable non-collocated distributed joins when running an association query: SqlQuery.setDistributedJoins(true).
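A minimal sketch (assuming the Z cache from above): enabling distributed joins lets the engine move data between nodes at query time, at a corresponding performance cost:

    IgniteCache<AffinityKey<Long>, Z> zc = ignite.cache("Z");
    SqlQuery<AffinityKey<Long>, Z> q = new SqlQuery<>(Z.class,
            "from Z, \"Y\".Y where Z.yId = Y.id and Z.info = ?");
    // allow the join even though the matching Z and Y entries may live on different nodes
    q.setDistributedJoins(true);
    List<Cache.Entry<AffinityKey<Long>, Z>> res = zc.query(q.setArgs("42")).getAll();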

Data collocation can also be expressed with the @AffinityKeyMapped annotation, which is used in much the same way as AffinityKey; a complete example can be found in AffinityMappedController.class.
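A hedged sketch of the annotation style; the key class and field names here are illustrative, not the sample project's exact code:

    public class YKey {
        private Long id;

        /** The annotated field is used as the affinity key, so each Y lands on the node of its X. */
        @AffinityKeyMapped
        private Long xId;

        public YKey(Long id, Long xId) {
            this.id = id;
            this.xId = xId;
        }
        // equals() and hashCode() omitted for brevity; a real cache key must implement both
    }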

This concludes the data-processing portion of the Ignite series.
