Hazelcast Cluster Service (1): Introduction to Hazelcast

Last Update:2016-12-28 Source: Internet

Author: User

What is Hazelcast?

"Distributed", "cluster service", "in-memory data grid", "distributed cache", "elastically scalable service": these shiny buzzwords are hard for anyone in IT to resist. In the Java world there is an open source project that delivers all of them with nothing more than one jar, a little configuration, and a little code. That project is Hazelcast.

Hazelcast is an open source product developed and maintained by the Hazelcast company (yes, the company has the same name). It provides distributed clustering and distributed caching for applications that run on the JVM. Hazelcast can be used from products written in Java, C++, and .NET (C++ and .NET are supported only as clients). Hazelcast is currently at version 3.x and offers distributed implementations of most of the familiar Java data structures. Take the Map interface that every Java developer knows: once a map instance has been obtained from Hazelcast, calling Map::put("A", "A_data") on node A lets node B call Map::get("A") and receive the value "A_data". Hazelcast provides distributed implementations of the Map, Queue, MultiMap, Set, List, Semaphore, Atomic, and other interfaces; topic-based message queuing (publish/subscribe); a distributed ID generator (IdGenerator); distributed events; distributed computing; and distributed queries. In short, data structures and models that normally live inside a single JVM get cluster-wide, distributed implementations from Hazelcast.

Hazelcast has an open source edition and a commercial edition. The open source edition is free and is released under the Apache License 2.0. The main difference between the two is that the commercial edition provides high-density data storage. The JVM has its own GC mechanism: once a block of data is no longer referenced, it becomes eligible for collection. Because Hazelcast's distributed data is stored in JVM memory, frequent reads and writes can cause a large amount of GC overhead. The commercial edition's high-density storage greatly reduces the JVM's memory overhead and therefore the GC overhead.

Many open source products use Hazelcast to build microservice clusters. Vert.x, for example, uses Hazelcast as its preferred way to build distributed services. If you are interested, see my post at http://my.oschina.net/chkui/blog/678347, which explains how Vert.x uses Hazelcast to form a cluster.

Appendix:

Hazelcast source: https://github.com/hazelcast/hazelcast
Questions about Hazelcast can be raised at https://github.com/hazelcast/hazelcast/issues or http://stackoverflow.com.

Hazelcast features

Autonomous cluster (no central node)

Hazelcast has no fixed central node (in this article a node means a standalone JVM running on any server; the same applies below). Put differently, Hazelcast does not require you to designate a central node. At run time the cluster itself selects one of its nodes to act as the coordinator that manages all the nodes.

Data stored in a distributed way, close to the application

Hazelcast stores its data in a distributed way. As far as possible it keeps data on the nodes that actually use it, which removes the need for a central data store. In traditional storage models (MySQL, MongoDB, Redis, and so on) the data is stored separately from the application, and when more database performance is needed, the only option is to keep strengthening a single database server. Many databases now support cluster mode or read/write splitting, but the basic idea is still that some instances accept writes while the others continuously copy the updates. This has two drawbacks. First, stale (dirty) reads become common. Second, a lot of resources are spent moving data around: reading from and writing to the data source frequently costs extra resources, and that cost grows rapidly as the data volume grows or as more master and replica instances are added.

Hazelcast effectively solves the problem of centralized data. It stores the data across the nodes, and the more nodes there are, the more widely the data is spread. Each node runs its own application service, and the Hazelcast cluster distributes the data according to how each application uses it, keeping the data as close to the application as possible. The data in the cluster shares the storage space and computing resources of the whole cluster.

Resistance to single points of failure

No node in the cluster is central, and any node may leave or join at any time. The data stored in the cluster therefore has backups (the number of backups is configurable, and backups can also be turned off). This is somewhat similar to Hadoop: when a piece of data is stored on one node, at least one backup of it exists on another node. When a node leaves, the backup takes the place of the data that was stored on that node, and the cluster then creates a new backup.
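The promotion of a backup when a node leaves can be pictured with a small toy model. This is only a sketch under simplifying assumptions (6 partitions, one backup per partition, round-robin placement); the class BackupFailoverSketch and its methods are invented for this illustration and are not part of Hazelcast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of backup promotion; not Hazelcast's real implementation. */
final class BackupFailoverSketch {
    static final int PARTITIONS = 6;              // tiny on purpose, for readability
    final List<String> nodes = new ArrayList<>();
    final String[] primary = new String[PARTITIONS];
    final String[] backup = new String[PARTITIONS];

    /** Expects at least two nodes so that every partition can have a backup. */
    BackupFailoverSketch(List<String> initialNodes) {
        nodes.addAll(initialNodes);
        for (int p = 0; p < PARTITIONS; p++) {
            primary[p] = nodes.get(p % nodes.size());
            backup[p] = nodes.get((p + 1) % nodes.size());
        }
    }

    /** A node leaves: promote the backups of its primaries, then re-create lost backups. */
    void nodeLeft(String gone) {
        nodes.remove(gone);
        for (int p = 0; p < PARTITIONS; p++) {
            if (primary[p].equals(gone)) {
                primary[p] = backup[p];           // the backup replaces the lost primary
                backup[p] = null;
            } else if (gone.equals(backup[p])) {
                backup[p] = null;                 // only the backup copy was lost
            }
            if (backup[p] == null) {
                backup[p] = otherThan(primary[p], p);
            }
        }
    }

    /** Picks a surviving node that is not the owner, or null if none exists. */
    private String otherThan(String owner, int p) {
        for (int i = 0; i < nodes.size(); i++) {
            String candidate = nodes.get((p + i) % nodes.size());
            if (!candidate.equals(owner)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BackupFailoverSketch cluster = new BackupFailoverSketch(Arrays.asList("A", "B", "C"));
        cluster.nodeLeft("B");
        for (int p = 0; p < PARTITIONS; p++) {
            System.out.println("partition " + p + ": primary=" + cluster.primary[p]
                    + ", backup=" + cluster.backup[p]);
        }
    }
}
```

After node B leaves, every partition still has a primary and a backup on two different surviving nodes, which is the property the paragraph above describes.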

Ease of use

All Hazelcast features come from a single jar, and that jar has no third-party dependencies. It can therefore be embedded easily and efficiently in all kinds of application servers without worrying about side effects such as jar conflicts or type conflicts. It offers a set of distributed features and nothing more, and it is not tied to any framework, so it fits a wide range of scenarios.

In addition to the features above, Hazelcast also supports a server/client model, supports script-based management, integrates quickly with Docker, and more.

A simple usage example

After so many concepts, it is time for something practical. Here is a minimal example of using Hazelcast. All the code in this article is on GitHub: https://github.com/chkui/hazelcast-demo.

First, add the Hazelcast jar to the project.

Maven (pom.xml):

    <dependency>
        <groupId>com.hazelcast</groupId>
        <artifactId>hazelcast</artifactId>
        <version>${hazelcast.version}</version>
    </dependency>

Gradle (build.gradle):

    compile "com.hazelcast:hazelcast:${hazelcast.version}"

First, create an embedded Hazelcast node:

    // org.palm.hazelcast.getstart.HazelcastGetStartServerMaster
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.Map;
    import java.util.Queue;

    public class HazelcastGetStartServerMaster {
        public static void main(String[] args) {
            // Create a HazelcastInstance; this starts a node
            HazelcastInstance instance = Hazelcast.newHazelcastInstance();
            // Get a cluster-wide map and add an entry
            Map<Integer, String> clusterMap = instance.getMap("MyMap");
            clusterMap.put(1, "Hello hazelcast map!");
            // Get a cluster-wide queue and add two items
            Queue<String> clusterQueue = instance.getQueue("Myqueue");
            clusterQueue.offer("Hello hazelcast!");
            clusterQueue.offer("Hello hazelcast queue!");
        }
    }

The code above creates a node by creating a Hazelcast instance. It then obtains a distributed map and a distributed queue from that instance and adds data to both. Running this main method prints the following on the console:

Members [1] {
Member [192.168.1.103]:5701 this
}

Then create another node:

    // org.palm.hazelcast.getstart.HazelcastGetStartServerSlave
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.Map;
    import java.util.Queue;

    public class HazelcastGetStartServerSlave {
        public static void main(String[] args) {
            // Create a HazelcastInstance; it joins the cluster started by the first node
            HazelcastInstance instance = Hazelcast.newHazelcastInstance();
            Map<Integer, String> clusterMap = instance.getMap("MyMap");
            Queue<String> clusterQueue = instance.getQueue("Myqueue");
            // Read what the first node wrote
            System.out.println("Map Value:" + clusterMap.get(1));
            System.out.println("Queue Size:" + clusterQueue.size());
            System.out.println("Queue Value 1:" + clusterQueue.poll());
            System.out.println("Queue Value 2:" + clusterQueue.poll());
            System.out.println("Queue Size:" + clusterQueue.size());
        }
    }

This node reads the data from the map and the queue and prints it. Running it produces the following output:

Members [2] {
Member [192.168.1.103]:5701
Member [192.168.1.103]:5702 this
}

August 06, 2016 11:33:29 PM com.hazelcast.core.LifecycleService
INFO: [192.168.1.103]:5702 [dev] [3.6.2] Address[192.168.1.103]:5702 is STARTED
Map Value:Hello hazelcast map!
Queue Size:2
Queue Value 1:Hello hazelcast!
Queue Value 2:Hello hazelcast queue!
Queue Size:0

At this point a cluster of 2 nodes has been created. The first node adds {key:1, value:"Hello hazelcast map!"} to the map instance and ["Hello hazelcast!", "Hello hazelcast queue!"] to the queue instance; the second node reads the data and prints it.

Besides embedding Hazelcast directly to form the cluster, Hazelcast also provides a client package that is separate from the server package. The biggest difference between a client and a server member is that the client does not store any of the cluster's data itself; it connects to the members and works on the data they hold (in the example below the client reads the map and polls the queue). Clients are currently available for C++, .NET, and Java.

To use the client, first add the client jar.

Maven (pom.xml):

    <dependency>
        <groupId>com.hazelcast</groupId>
        <artifactId>hazelcast-client</artifactId>
        <version>${hazelcast.version}</version>
    </dependency>

Gradle (build.gradle):

    compile "com.hazelcast:hazelcast-client:${hazelcast.version}"

Create a client node.

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.client.config.ClientConfig;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.Map;
    import java.util.Queue;

    public class HazelcastGetStartClient {
        public static void main(String[] args) {
            // Connect to the running cluster as a client (default client settings)
            ClientConfig clientConfig = new ClientConfig();
            HazelcastInstance instance = HazelcastClient.newHazelcastClient(clientConfig);
            Map<Integer, String> clusterMap = instance.getMap("MyMap");
            Queue<String> clusterQueue = instance.getQueue("Myqueue");
            System.out.println("Map Value:" + clusterMap.get(1));
            System.out.println("Queue Size:" + clusterQueue.size());
            System.out.println("Queue Value 1:" + clusterQueue.poll());
            System.out.println("Queue Value 2:" + clusterQueue.poll());
            System.out.println("Queue Size:" + clusterQueue.size());
        }
    }

First start HazelcastGetStartServerMaster::main, then start HazelcastGetStartClient::main. The client prints:

Members [1] {
Member [192.168.197.54]:5701
}

August 08, 2016 10:54:22 AM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6.2] is CLIENT_CONNECTED
Map Value:Hello hazelcast map!
Queue Size:2
Queue Value 1:Hello hazelcast!
Queue Value 2:Hello hazelcast queue!
Queue Size:0

At this point the client is working as well. The client's console output is much shorter than the server's, because the client does not take on the server's data-processing duties and does not have to maintain node information.

Analyzing the example run

Now let's look at the console output to see what Hazelcast does at startup. (The output may differ depending on the environment or the IDE.)

Class: com.hazelcast.config.XmlConfigLocator
Info: Loading 'hazelcast-default.xml' from classpath.

This output shows which configuration file was loaded at startup. If the user does not provide a valid configuration file, Hazelcast uses the default one. Later articles will cover Hazelcast configuration in detail.

Class: com.hazelcast.instance.DefaultAddressPicker
Info: Prefer IPv4 stack is true.
Class: com.hazelcast.instance.DefaultAddressPicker
Info: Picked Address[192.168.197.54]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true

This part of the output describes the network environment of the current Hazelcast instance. It first detects that IPv4 is available and that the current IPv4 address is 192.168.197.54. It then opens the server socket, whose address the log shows in IPv6 notation (0:0:0:0:0:0:0:0). In environments where IPv6 cannot be used, you can force IPv4 by adding the JVM startup parameter -Djava.net.preferIPv4Stack=true.

Class: com.hazelcast.system
Info: Hazelcast 3.6.2 (20160405-0f88699) starting at Address[192.168.197.54]:5701
Class: com.hazelcast.system
Info: [192.168.197.54]:5701 [dev] [3.6.2] Copyright (c) 2008-2016, Hazelcast, Inc. All Rights Reserved.

This output shows that the current instance starts on port 5701. Hazelcast uses port 5701 by default. If that port is occupied, it adds 1 and checks whether 5702 is available; if that is also unavailable, it keeps probing upward until 5800. By default Hazelcast uses a port between 5701 and 5800 and throws a startup exception if none of them is available.
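The probing described above boils down to a simple loop. The following is a rough sketch of the idea, not Hazelcast's actual code: the class PortProbeSketch and its methods are hypothetical, and "free" is approximated here by whether a local ServerSocket can bind the port.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

/** Sketch of "start at 5701 and try the next port until one is free". */
final class PortProbeSketch {

    /** Returns the first port in [start, start + count) accepted by isFree, or -1 if none is. */
    static int firstFreePort(int start, int count, IntPredicate isFree) {
        for (int port = start; port < start + count; port++) {
            if (isFree.test(port)) {
                return port;
            }
        }
        return -1;
    }

    /** Treats a port as free if a server socket can be bound to it right now. */
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 5701 plus 100 candidates covers 5701 through 5800.
        int port = firstFreePort(5701, 100, PortProbeSketch::canBind);
        System.out.println(port == -1 ? "no free port in range" : "would use port " + port);
    }
}
```

Passing the availability check in as a predicate keeps the probing loop separate from the socket handling, so the loop can be exercised without opening real sockets.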

Class: com.hazelcast.system
Info: [192.168.197.54]:5701 [dev] [3.6.2] Configured Hazelcast Serialization version : 1
Class: com.hazelcast.spi.OperationService
Info: [192.168.197.54]:5701 [dev] [3.6.2] Backpressure is disabled
Class: com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor
Info: [192.168.197.54]:5701 [dev] [3.6.2] Starting with 2 generic operation threads and 4 partition operation threads.

This part shows the serialization version in use and which threads are started. Hazelcast supports more than one serialization method for transferring data between nodes; they are described in detail in later articles. Hazelcast manages multiple threads that perform different tasks: some maintain node connections and some manage data partitions.

Class: com.hazelcast.instance.Node
Info: [192.168.197.54]:5701 [dev] [3.6.2] Creating MulticastJoiner
Class: com.hazelcast.core.LifecycleService
Info: [192.168.197.54]:5701 [dev] [3.6.2] Address[192.168.197.54]:5701 is STARTING
Class: com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThreadingModel
Info: [192.168.197.54]:5701 [dev] [3.6.2] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads
Class: com.hazelcast.cluster.impl.MulticastJoiner
Info: [192.168.197.54]:5701 [dev] [3.6.2]

In the output above, Creating MulticastJoiner means that multicast is used to assemble the cluster. Six threads (3 input and 3 output) are also created to handle non-blocking I/O.

Members [2] {
Member [192.168.197.54]:5701
Member [192.168.197.54]:5702 this
}

Class: com.hazelcast.core.LifecycleService
Info: [192.168.197.54]:5701 [dev] [3.6.2] Address[192.168.197.54]:5701 is STARTED
Class: com.hazelcast.partition.InternalPartitionService
Info: [192.168.197.54]:5701 [dev] [3.6.2] Initializing cluster partition table arrangement...

Members [2] indicates that the cluster currently has 2 nodes. Both nodes run on the machine with IP 192.168.197.54 and occupy ports 5701 and 5702 respectively. The word this after the port marks the current node; entries without it are the other nodes that have joined the cluster. The final InternalPartitionService line indicates that the cluster is initializing its data shards; the concept and principle of data sharding are described later.

That is the startup process Hazelcast performs by default. It also shows several points in the initialization where Hazelcast's behavior can be adjusted:

Hazelcast starts the cluster with the default configuration file hazelcast-default.xml, so we can customize this file to change Hazelcast's behavior.
The cluster is built over IPv4 or IPv6, which tells us that cluster communication is based on TCP and UDP and needs open sockets to support cluster interaction, so we can specify the communication scheme to use.
Hazelcast starts multiple threads for different tasks: some maintain data, some handle cluster communication, and some perform low-level operations, so we can configure and manage these threads.
Hazelcast uses multicast to assemble the cluster by default, which lets it form a cluster on a LAN without any manual configuration, so we can specify TCP/IP or another protocol instead.
Hazelcast probes for a usable port by default and takes an unoccupied port between 5701 and 5800, so we can configure how these ports are used.
Hazelcast initializes a scheme called data sharding to manage and store data, so we can adjust and control these shards.

Everything in the "so we can" clauses above (shown in red in the original article) can be controlled through the configuration file. The relevant configuration options are described in detail in later articles (to be continued).
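Besides the XML file, Hazelcast 3.x also lets you build the same kind of configuration in Java through its Config API. The fragment below is only a sketch of a few of the settings listed above and assumes the Hazelcast 3.x jar is on the classpath; the class name CustomConfigSketch, the member address, and the chosen values are examples, not recommendations.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.NetworkConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

/** Example of programmatic configuration; all values are illustrative. */
final class CustomConfigSketch {
    public static void main(String[] args) {
        Config config = new Config();

        // Ports: start at 5701 and probe upward over 100 candidate ports.
        NetworkConfig network = config.getNetworkConfig();
        network.setPort(5701);
        network.setPortAutoIncrement(true);
        network.setPortCount(100);

        // Cluster assembly: turn multicast off and list members explicitly over TCP/IP.
        JoinConfig join = network.getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getTcpIpConfig().setEnabled(true).addMember("192.168.1.103"); // example address

        // Sharding: partition count and the number of backups for one map.
        config.setProperty("hazelcast.partition.count", "271");
        config.getMapConfig("MyMap").setBackupCount(1);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
        System.out.println(instance.getCluster().getMembers());
        instance.shutdown();
    }
}
```

Whether to configure in XML or in code is mostly a matter of deployment style; the XML file keeps settings out of the application, while the programmatic form is convenient for tests and embedded use.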

----------------------------------------------------------------------

If you are not interested in how Hazelcast works internally, you can skip the following sections on the operating structure and on data sharding and go straight to Hazelcast basic configuration (http://my.oschina.net/chkui/blog/732408) to learn how to use Hazelcast.

Hazelcast operating structure

The official Hazelcast documentation lists 2 operating modes: peer-to-peer (P2P) mode, and a client/server (C/S) mode that extends the peer-to-peer mode. The figure in the original article shows the topology of the peer-to-peer mode.

In peer-to-peer mode every node is a service node of the cluster and provides the same functionality and computing power. Each node contributes to the overall performance of the cluster, and each added node increases the cluster's capacity linearly.

On top of the peer-to-peer cluster we can attach many clients, which gives the C/S mode: the cluster that provides the service is the server side, and the attached clients are the client side. The clients do not contribute to the cluster's performance; they only use its resources. The figure in the original article shows clients attached to the cluster.

A client can be given a special caching capability that tells the cluster to keep the data the client uses frequently on the "closest" node.

Concept and principle of Hazelcast sharding

Hazelcast stores and manages all data that enters the cluster through sharding. The goals of sharding are that data can be read and written quickly, that data is not lost when a node leaves, and that adding nodes expands storage capacity linearly. The following explains, at a conceptual level, how Hazelcast manages shards.

Sharding

Each Hazelcast data shard is called a partition. Partitions are memory segments, each holding hundreds to thousands of data entries depending on the memory capacity of the system. By default Hazelcast divides the data into 271 partitions, and each partition has a backup copy. When a cluster member starts, all 271 partitions are started with it.

The figure in the original article shows the partition layout when the cluster has only one node.

As the single-node layout shows, when only one node is started all 271 partitions are stored on that node. When a second node is started, the partitions are redistributed across the two nodes.

In the two-node diagram, primary partitions are marked in black text and backup partitions in blue text. The first member owns 135 primary partitions (black), each of which has a backup on the second member (blue); likewise, the first member holds the backups of the second member's primary partitions.
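To make the numbers concrete, here is a small sketch that lays out 271 partitions over a given number of members. It assumes a plain round-robin placement with the backup on the next member, which is a simplification chosen for this illustration (Hazelcast's real placement is not round-robin); PartitionLayoutSketch and its methods are hypothetical names. With two members the split is 136 and 135 primaries; which member gets the extra partition is arbitrary in this sketch.

```java
/** Toy layout of primary and backup partitions; round-robin is an assumption. */
final class PartitionLayoutSketch {
    static final int PARTITION_COUNT = 271;

    /** owner[p] is the index of the member holding the primary copy of partition p. */
    static int[] primaryOwners(int members) {
        int[] owner = new int[PARTITION_COUNT];
        for (int p = 0; p < PARTITION_COUNT; p++) {
            owner[p] = p % members;
        }
        return owner;
    }

    /** The backup goes to the next member, so it never shares a member with its primary (needs >= 2 members). */
    static int backupOwner(int primaryOwner, int members) {
        return (primaryOwner + 1) % members;
    }

    /** Counts how many primary partitions a member owns. */
    static int countOwnedBy(int[] owner, int member) {
        int count = 0;
        for (int o : owner) {
            if (o == member) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] owner = primaryOwners(2);
        System.out.println("member 0 primaries: " + countOwnedBy(owner, 0));
        System.out.println("member 1 primaries: " + countOwnedBy(owner, 1));
    }
}
```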

As more members are added, Hazelcast migrates primary and backup partitions one by one to the new members until the data is balanced across the members and the members back each other up. During this rebalancing only the minimum number of partitions is moved. The figure in the original article shows the partition distribution across 4 member nodes.
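The idea of moving only as many partitions as necessary can be sketched as follows. This is a simplified model for primary partitions only, not Hazelcast's migration algorithm: the new member takes partitions from members that hold more than the fair share until it reaches that share. RebalanceSketch and addMember are names made up for the example.

```java
/** Toy rebalancing: move just enough partitions to a newly joined member. */
final class RebalanceSketch {

    /**
     * owner[p] holds the member index that owns partition p. A new member with
     * index oldMembers joins; partitions are reassigned in place.
     * Returns the number of partitions that were moved.
     */
    static int addMember(int[] owner, int oldMembers) {
        int newMember = oldMembers;
        int members = oldMembers + 1;
        int[] load = new int[members];
        for (int o : owner) {
            load[o]++;
        }
        int target = owner.length / members;      // fair share, rounded down
        int moved = 0;
        for (int p = 0; p < owner.length && load[newMember] < target; p++) {
            int from = owner[p];
            if (load[from] > target) {            // take only from members above the fair share
                owner[p] = newMember;
                load[from]--;
                load[newMember]++;
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] owner = new int[271];
        for (int p = 0; p < owner.length; p++) {
            owner[p] = p % 2;                     // two members: 136 and 135 partitions
        }
        int moved = addMember(owner, 2);
        System.out.println("partitions moved to the new member: " + moved);
    }
}
```

Going from two to three members in this model moves 90 of the 271 partitions and leaves the other 181 where they were, instead of reshuffling everything.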

The figures above illustrate how Hazelcast distributes partitions. In practice the distribution is unordered: partitions are placed randomly on the nodes of the cluster. What matters is that Hazelcast spreads the partitions evenly across the members and creates the backups evenly across the members as well.

Hazelcast 3.6 added a new kind of cluster member, the lite member, which does not own any partitions. Lite members are intended for computationally heavy task execution (CPU-intensive operations) or for registering listeners. Although lite members have no partitions of their own, they can still access the partitions held by the other members of the cluster.

In general, when the membership of a cluster changes (a node joins or leaves), partitions move between nodes and are rebalanced so that the data stays evenly distributed. A lite member joining or leaving does not trigger repartitioning, because a lite member does not hold any partitions.

Data partition management

Once the partitions are created, Hazelcast stores all data in them. It assigns data to partitions by hashing. It takes the key of the stored data (for a map, for example) or the object name (for a topic or list, for example) and does the following:

The key or name is converted to a byte[];
The resulting byte[] is hashed;
The hash result is taken modulo the number of partitions (271), that is, the remainder operation (mod, or the % operator).

Because the hash of the byte[] is taken modulo 271, the result is always between 0 and 270, and that value identifies the partition that holds the data.
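The three steps can be written down in a few lines. This is a minimal sketch of the idea only: it uses UTF-8 bytes and Arrays.hashCode as stand-ins, whereas Hazelcast applies its own serialization and hash function, so the partition IDs computed here will not match a real cluster. PartitionIdSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Key -> byte[] -> hash -> modulo 271; serialization and hash are stand-ins. */
final class PartitionIdSketch {
    static final int PARTITION_COUNT = 271;

    static int partitionId(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);  // step 1: convert to byte[]
        int hash = Arrays.hashCode(bytes);                     // step 2: hash the bytes (stand-in hash)
        return Math.floorMod(hash, PARTITION_COUNT);           // step 3: modulo the partition count
    }

    public static void main(String[] args) {
        for (String key : new String[] {"A", "B", "hazelcast"}) {
            System.out.println(key + " -> partition " + partitionId(key));
        }
    }
}
```

Math.floorMod is used instead of % because a Java hash can be negative, and the partition ID has to stay in the range 0 to 270.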

Partition table

Once the partitions are created, every member of the cluster must know which node each partition is stored on. The cluster therefore maintains a partition table to track this information.

When the first node starts, the partition table is created with it. The table contains each partition ID and the cluster node that owns the partition. The purpose of the partition table is to let every node in the cluster (including lite members) know where data is stored. The oldest node in the cluster (usually the first member started) periodically sends the partition table to all nodes, so every node is informed when the ownership of a partition changes. Ownership changes in many situations, for example when a new node joins or a node leaves the cluster. If the oldest node in the cluster is shut down, the next oldest node takes over the task of sending the partition table to all members.
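A toy version of such a table might look like the sketch below. It only models the two points made above: the table maps partition IDs to owners, and the oldest remaining member is the one that publishes it. Rebalancing on join is left out, and handing a departed member's partitions to the oldest survivor is a simplification; PartitionTableSketch and its methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy partition table; not Hazelcast's real data structure. */
final class PartitionTableSketch {
    static final int PARTITION_COUNT = 271;
    final List<String> membersByAge = new ArrayList<>();     // oldest member first
    final String[] ownerOf = new String[PARTITION_COUNT];    // partition ID -> owner address

    void join(String address) {
        membersByAge.add(address);
        if (membersByAge.size() == 1) {
            Arrays.fill(ownerOf, address);   // the first node creates the table and owns everything
        }
        // Rebalancing toward the new member is left out of this sketch.
    }

    void leave(String address) {
        membersByAge.remove(address);
        if (membersByAge.isEmpty()) {
            return;
        }
        for (int p = 0; p < PARTITION_COUNT; p++) {
            if (address.equals(ownerOf[p])) {
                ownerOf[p] = membersByAge.get(0);   // simplification: hand over to the oldest survivor
            }
        }
    }

    /** The oldest remaining member is the one that sends the table to everyone else. */
    String publisher() {
        return membersByAge.get(0);
    }

    public static void main(String[] args) {
        PartitionTableSketch table = new PartitionTableSketch();
        table.join("192.168.197.54:5701");
        table.join("192.168.197.54:5702");
        System.out.println("publisher: " + table.publisher());
        table.leave("192.168.197.54:5701");
        System.out.println("publisher after the oldest node left: " + table.publisher());
    }
}
```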

Original address: https://my.oschina.net/chkui/blog/729698
