1. Overview
Apache Ignite, like Apache Arrow, is a distributed in-memory management system in the Big Data category. Arrow was covered in an earlier post on Apache Arrow's in-memory data format, which unifies the data formats of the various ecosystems in the big data domain and avoids the resource overhead of serialization and deserialization (which can save around 80% of CPU resources). This post analyzes Apache Ignite.
2. Content
Apache Ignite is a memory-centric data platform with strong consistency, high availability, powerful SQL, key/value storage, and corresponding application interfaces (APIs). The architecture diagram is as follows:
Across the nodes of an Ignite cluster, data in memory follows one of three modes: local, replicated, or partitioned. This improves Ignite's scalability: Ignite can control automatically how data is partitioned, and users can also plug in custom methods or colocate parts of the data for efficiency.
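As a minimal sketch of how these modes are selected, each cache is assigned a mode through its CacheConfiguration (the cache names below are illustrative; LOCAL mode is configured the same way via CacheMode.LOCAL):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheModeExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // PARTITIONED: data is split across the nodes (suits large data sets).
            CacheConfiguration<Long, String> partCfg = new CacheConfiguration<>("partitionedCache");
            partCfg.setCacheMode(CacheMode.PARTITIONED);
            partCfg.setBackups(1); // keep one backup copy of each partition

            // REPLICATED: every node holds a full copy (suits small, read-heavy data).
            CacheConfiguration<Long, String> replCfg = new CacheConfiguration<>("replicatedCache");
            replCfg.setCacheMode(CacheMode.REPLICATED);

            IgniteCache<Long, String> cache = ignite.getOrCreateCache(partCfg);
            cache.put(1L, "hello");
            System.out.println(cache.get(1L));
        }
    }
}
```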
Ignite behaves much like other relational databases, but differs slightly in how it handles constraints and indexes. Ignite supports primary and secondary indexes, but only primary indexes enforce uniqueness. On the persistence side, Ignite's durable memory works well both in memory and on disk; persistence to disk can also be disabled, in which case Ignite is generally used as an in-memory database.
Since Ignite is a full-featured data grid, it can be used either in pure in-memory mode or with Ignite native persistence. It can also integrate with any third-party database, including RDBMS and NoSQL stores. For example, a SQL engine developed on a big data platform with Hadoop, HDFS, and Kafka can use Ignite to operate over large data storage media such as HDFS and Kafka.
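As a sketch of that third-party integration, Ignite can treat an external database as a read-through/write-through backing store via its CacheStore API. The class and cache names below are illustrative, and the method bodies only indicate where calls to the external system would go:

```java
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;

import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class ThirdPartyStoreExample {

    // Hypothetical adapter: Ignite delegates cache misses and writes to it.
    public static class ExternalStore extends CacheStoreAdapter<Long, String> {
        @Override public String load(Long key) {
            // e.g. SELECT ... from an RDBMS, or a fetch from a NoSQL store
            return null;
        }

        @Override public void write(Cache.Entry<? extends Long, ? extends String> entry) {
            // e.g. INSERT/UPDATE into the external database
        }

        @Override public void delete(Object key) {
            // e.g. DELETE from the external database
        }
    }

    public static CacheConfiguration<Long, String> config() {
        CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("externalBackedCache");
        cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(ExternalStore.class));
        cfg.setReadThrough(true);  // cache misses are loaded from the store
        cfg.setWriteThrough(true); // cache writes are propagated to the store
        return cfg;
    }
}
```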
2.1 Memory and Disk
Apache Ignite is based on a durable memory architecture that stores and processes data and indexes both in memory and on disk when the Ignite persistent storage feature is enabled. With durable memory and Ignite persistent storage enabled, the benefits are:
2.1.1 Memory Benefits
- Off-heap memory
- No significant GC pauses
- Automatic memory defragmentation
- Predictable memory consumption
- High SQL performance
2.1.2 Disk Advantages
- Optional persistence
- Supports SSD media
- Distributed storage
- Transaction (ACID) support
- Instantaneous cluster restarts
2.2 Persistence Process
Ignite persistent storage is a distributed, ACID-compliant, SQL-compliant disk store. It is an optional disk layer that can keep data and indexes on disk media such as SSDs and integrates transparently with Ignite's durable memory. Ignite's persistent storage has the following advantages:
- SQL operations can be executed over the data regardless of whether it is in memory or on disk, which means Ignite can be used as a memory-optimized distributed SQL database
- Not all data and indexes have to stay in memory: persistent storage can keep large data sets on disk while holding only the frequently accessed subset in memory
- Cluster restarts are instantaneous: if the entire cluster goes down, there is no need to "warm up" memory by preloading data; once all cluster nodes are connected, the cluster works normally
- Data and indexes are stored in similar formats in memory and on disk, avoiding complex format conversions; data sets are simply moved between memory and disk
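To obtain these benefits, native persistence has to be enabled on a data region. A minimal sketch (the configuration values are illustrative):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistenceExample {
    public static void main(String[] args) {
        // Enable Ignite native persistence on the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // With persistence on, the cluster starts inactive and must be activated.
            ignite.cluster().active(true);
            // Caches created from here on are stored both in memory and on disk.
        }
    }
}
```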
The persistence process is as follows:
2.3 Distributed SQL Memory database
Apache Ignite provides distributed SQL database functionality: an in-memory database that scales horizontally, is fault tolerant, and is compatible with standard SQL syntax, supporting all SQL and DML commands, including SELECT, INSERT, DELETE, and so on. Thanks to the durable memory architecture, data sets and indexes can be stored both in memory and on disk, so distributed SQL operations can run across the different storage tiers, achieving memory-level performance with durability on disk. You can interact with Ignite data through SQL using native APIs such as Java, Python, and C++, or through Ignite's JDBC and ODBC drivers, which provide true cross-platform connectivity. The architecture is shown below:
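As a sketch of the JDBC route, the thin driver connects over a "jdbc:ignite:thin://" URL and speaks standard SQL (the host, table, and data below are illustrative; a local Ignite node must already be running):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcThinExample {
    public static void main(String[] args) throws Exception {
        // Register Ignite's JDBC thin driver and connect to a local node.
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS city (id LONG PRIMARY KEY, name VARCHAR)");
            stmt.executeUpdate("INSERT INTO city (id, name) VALUES (1, 'Beijing')");
            try (ResultSet rs = stmt.executeQuery("SELECT name FROM city WHERE id = 1")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```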
3. Code practices
After understanding the role of Apache Ignite, we can simulate a big data SQL engine to implement queries over a Kafka topic. First, implement a KafkaSqlFactory class, as follows:
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * TODO
 *
 * @author smartloli.
 *
 * Created by Mar 9, 2018
 */
public class KafkaSqlFactory {

    private static final Logger LOG = LoggerFactory.getLogger(KafkaSqlFactory.class);

    private static Ignite ignite = null;

    private static void getInstance() {
        if (ignite == null) {
            ignite = Ignition.start();
        }
    }

    private static IgniteCache<Long, TopicX> processor(List<TopicX> collectors) {
        getInstance();
        CacheConfiguration<Long, TopicX> topicDataCacheCfg = new CacheConfiguration<Long, TopicX>();
        topicDataCacheCfg.setName(TopicCache.NAME);
        topicDataCacheCfg.setCacheMode(CacheMode.PARTITIONED);
        topicDataCacheCfg.setIndexedTypes(Long.class, TopicX.class);
        IgniteCache<Long, TopicX> topicDataCache = ignite.getOrCreateCache(topicDataCacheCfg);
        for (TopicX topic : collectors) {
            topicDataCache.put(topic.getOffsets(), topic);
        }
        return topicDataCache;
    }

    public static String sql(String sql, List<TopicX> collectors) {
        try {
            IgniteCache<Long, TopicX> topicDataCache = processor(collectors);
            SqlFieldsQuery qry = new SqlFieldsQuery(sql);
            QueryCursor<List<?>> cursor = topicDataCache.query(qry);
            for (List<?> row : cursor) {
                System.out.println(row.toString());
            }
        } catch (Exception ex) {
            LOG.error("Query Kafka topic has error, msg is " + ex.getMessage());
        } finally {
            close();
        }
        return "";
    }

    private static void close() {
        try {
            if (ignite != null) {
                ignite.close();
            }
        } catch (Exception ex) {
            LOG.error("Close Ignite has error, msg is " + ex.getMessage());
        } finally {
            // Reset so getInstance() can start a fresh node next time.
            ignite = null;
        }
    }
}
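The TopicX model class referenced above is not shown in the post. The following is a minimal reconstruction: the field names are inferred from the getters and setters used in KafkaSqlFactory and from the sample SQL, and Ignite's @QuerySqlField annotation is assumed in order to make the fields visible to SQL queries:

```java
import java.io.Serializable;

import org.apache.ignite.cache.query.annotations.QuerySqlField;

// Hypothetical reconstruction of the TopicX model; field names are inferred
// from the getters/setters used in KafkaSqlFactory and the sample SQL.
public class TopicX implements Serializable {

    @QuerySqlField(index = true)
    private long offsets;      // message offset, also used as the cache key

    @QuerySqlField(index = true)
    private int partitionId;   // Kafka partition id

    @QuerySqlField
    private String message;    // message payload

    @QuerySqlField
    private String topicName;  // Kafka topic name

    public long getOffsets() { return offsets; }
    public void setOffsets(long offsets) { this.offsets = offsets; }

    public int getPartitionId() { return partitionId; }
    public void setPartitionId(int partitionId) { this.partitionId = partitionId; }

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }

    public String getTopicName() { return topicName; }
    public void setTopicName(String topicName) { this.topicName = topicName; }
}
```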
Then write a simulated producer to generate data and query the data set; the implementation code looks like this:
public static void ignite() {
    List<TopicX> collectors = new ArrayList<>();
    int count = 0;
    for (int i = 0; i < 10; i++) {
        TopicX td = new TopicX();
        if (count > 3) {
            count = 0;
        }
        td.setPartitionId(count);
        td.setOffsets(i);
        td.setMessage("hello_" + i);
        td.setTopicName("test");
        collectors.add(td);
        count++;
    }
    String sql = "select offsets,message from TopicX where offsets > 6 and partitionId in (0,1) limit 1";
    long stime = System.currentTimeMillis();
    KafkaSqlFactory.sql(sql, collectors);
    System.out.println("Cost time [" + (System.currentTimeMillis() - stime) + "]ms");
}
The execution results are as follows:
4. Summary
Overall, Apache Ignite integrates most of today's familiar distributed concepts, including distributed storage, distributed computing, distributed services, stream computing, and so on. Its Java support is well integrated with the JDK and works naturally with existing JDK APIs: when you use a thread pool, you do not need to care whether it is a local or a distributed thread pool; you just submit tasks. Apache Ignite also integrates with traditional relational databases (RDBMS) and mainstream big data components such as Hadoop, Spark, and Kafka, providing a very flexible and usable component API.
5. Concluding remarks
That is all for this blog post. If you run into any problems while studying, you can join the discussion group or send me an e-mail, and I will do my best to answer. Keep at it!
Apache Ignite Anatomy