Infinispan's GridFilesystem: a file system based on an in-memory data grid

Brief introduction

Infinispan is the successor to the JBoss Cache caching framework: an open-source data grid platform that lets cluster nodes access distributed state. GridFilesystem is a new, experimental API that exposes the data in an Infinispan-backed grid as a file system. The API extends the JDK's File, InputStream and OutputStream classes with corresponding GridFile, GridInputStream and GridOutputStream classes. A helper class, GridFilesystem, is also included. These APIs are available in Infinispan 4.1.0 (starting with 4.1.0.ALPHA2).

GridFilesystem is backed by two Infinispan caches: one for metadata (typically replicated) and one for the actual data (typically distributed). Because the metadata cache is replicated, each node has the metadata locally, so tasks such as listing files need no remote procedure calls (RPCs). The data cache is distributed because that is where most of the storage space is used, so it needs a mechanism that scales. Every file is split into chunks, and each chunk is stored as a cache entry.
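The two-cache layout can be illustrated with a small, self-contained sketch. This is not the Infinispan API: plain Java maps stand in for the replicated metadata cache and the distributed data cache, and the class name `ChunkedFileStore`, the `Meta` record and the `"<path>.#<index>"` key scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the two-cache layout; plain maps stand in for the Infinispan caches.
class ChunkedFileStore {
    // Hypothetical metadata record: total length and chunk size of one file.
    static final class Meta {
        final int length;
        final int chunkSize;
        Meta(int length, int chunkSize) { this.length = length; this.chunkSize = chunkSize; }
    }

    private final Map<String, Meta> metadata = new TreeMap<>(); // stands in for the replicated cache
    private final Map<String, byte[]> data = new HashMap<>();   // stands in for the distributed cache
    private final int chunkSize;

    ChunkedFileStore(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // Hypothetical key scheme: "<path>.#<chunk index>".
    static String chunkKey(String path, int index) { return path + ".#" + index; }

    // Splits the content into chunks, one map entry per chunk.
    // Overwriting a file with shorter content would leave stale chunks; ignored in this sketch.
    void write(String path, byte[] content) {
        int chunks = (content.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < chunks; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, content.length);
            data.put(chunkKey(path, i), Arrays.copyOfRange(content, from, to));
        }
        metadata.put(path, new Meta(content.length, chunkSize));
    }

    // Reassembles a file from its chunks, using the metadata to know how many to fetch.
    byte[] read(String path) {
        Meta meta = metadata.get(path);
        if (meta == null) throw new IllegalArgumentException("no such file: " + path);
        byte[] out = new byte[meta.length];
        int chunks = (meta.length + meta.chunkSize - 1) / meta.chunkSize;
        for (int i = 0; i < chunks; i++) {
            byte[] chunk = data.get(chunkKey(path, i));
            System.arraycopy(chunk, 0, out, i * meta.chunkSize, chunk.length);
        }
        return out;
    }

    // Listing touches only the metadata map, which is why local metadata avoids remote calls.
    List<String> list() { return new ArrayList<>(metadata.keySet()); }

    int chunkCount() { return data.size(); }
}
```

In this sketch, listing files never reads the data map, which mirrors the reason for keeping the metadata on every node.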

This article focuses on Infinispan's distribution mode, which is new in Infinispan and is based on consistent hashing. JBoss Cache supports only replication mode, in which every node in the cluster replicates all of its data to every other node.

Full replication works well for small clusters, or when each node stores a relatively small amount of data. When every node replicates its data to every other node, the average amount of data stored on each node is a function of the cluster size and the data size. The advantage of replication is that reads are usually served from the local node, because every node holds all the data, and no rebalancing is needed when nodes join or leave the cluster.

On the other hand, when you need fast access to a large data set and cannot afford to retrieve it from disk (for example, from a database), an in-memory data grid is the better solution.

A previous article discussed ReplCache, which implements a grid data container using distribution based on consistent hashing. In a way, ReplCache was the prototype for Infinispan's distribution mode.

In Infinispan, data can be stored in the grid with or without redundant copies. For example, if Infinispan is configured for distribution mode with numOwners (the number of owners) set to 1, and a data item D is stored in the grid, Infinispan uses consistent hashing to pick exactly one server node to store D. If numOwners is set to 2, Infinispan picks two servers to store D, and so on.
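Owner selection of this kind can be sketched with a simple hash ring. This is a generic consistent-hashing illustration, not Infinispan's actual algorithm: the class `OwnerLocator`, the use of CRC32 as the hash function and the node names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Generic consistent-hashing ring (illustrative; not Infinispan's implementation).
class OwnerLocator {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    OwnerLocator(List<String> nodes) {
        // Each node gets one position on the ring, derived from its name.
        for (String node : nodes) ring.put(hash(node), node);
    }

    // CRC32 is an arbitrary choice of hash function for this sketch.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns up to numOwners distinct nodes, walking clockwise from the key's position.
    List<String> owners(String key, int numOwners) {
        if (numOwners < 1) throw new IllegalArgumentException("numOwners must be at least 1");
        int wanted = Math.min(numOwners, ring.size());
        List<String> result = new ArrayList<>();
        // First the nodes at or after the key's hash...
        for (String node : ring.tailMap(hash(key), true).values()) {
            if (result.size() == wanted) break;
            result.add(node);
        }
        // ...then wrap around to the start of the ring if more owners are needed.
        for (String node : ring.values()) {
            if (result.size() == wanted) break;
            result.add(node);
        }
        return result;
    }
}
```

With five nodes, `owners("D", 1)` returns a single node and `owners("D", 2)` returns that same node followed by its successor on the ring, so raising numOwners only adds copies and does not move the first one.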

The advantage is that Infinispan provides aggregated grid memory. For example, with 5 hosts of 1GB memory each and numOwners set to 1, the grid has a total capacity of 5GB, less some overhead. Even with redundant copies, for example numOwners set to 2, the grid still offers 2.5GB.
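The capacity arithmetic can be written down as a small helper. The class and method names are hypothetical, and the calculation deliberately ignores overhead such as per-entry bookkeeping and JVM headroom.

```java
// Back-of-the-envelope capacity estimate; ignores all overhead.
class GridCapacity {
    // Usable capacity = total memory across nodes / number of copies kept per entry.
    static double usableGb(int nodes, double gbPerNode, int numOwners) {
        if (nodes < 1 || numOwners < 1) {
            throw new IllegalArgumentException("nodes and numOwners must be at least 1");
        }
        // An entry cannot have more copies than there are nodes.
        int copies = Math.min(numOwners, nodes);
        return nodes * gbPerNode / copies;
    }
}
```

For the 5-host example, `usableGb(5, 1.0, 1)` gives 5.0 and `usableGb(5, 1.0, 2)` gives 2.5. Setting numOwners to 5 models full replication, where the grid holds no more than a single node can: 1.0.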

However, there is a problem: if we have many 1K data items but only a few 200MB items, the data will be distributed unevenly. Servers that hold the 200MB items may use up almost their entire heap, while other servers have memory sitting unused.

Another problem arises when a data item is larger than the heap available on any single server. For example, storing a 2GB item fails because it exceeds the 1GB heap of a server node, and calling Infinispan's Cache.put(k, v) can throw an OutOfMemoryError.

To solve these problems, we divide each data item into chunks and spread the chunks across the nodes of the cluster. With a chunk size of 8K, a 2GB data item becomes about 250,000 chunks of 8K each in the grid. Storing 250,000 equally sized chunks is certainly better balanced than storing a few 200MB data items.
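A rough sketch of the chunk arithmetic and its effect on balance follows. The placement rule (key hash modulo node count) is a toy stand-in rather than Infinispan's consistent hashing, and the class name, method names and key scheme are assumptions.

```java
// Illustrates chunk counts and how chunking spreads one large item over several nodes.
class ChunkBalance {
    // Number of chunks needed for an item, rounding up for a partial last chunk.
    static long chunkCount(long itemBytes, long chunkBytes) {
        return (itemBytes + chunkBytes - 1) / chunkBytes;
    }

    // Toy placement rule (not Infinispan's): node index = key hash modulo node count.
    static int nodeFor(String key, int nodes) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    // Bytes stored on each node when one item is split into chunks of chunkBytes.
    // Passing chunkBytes == itemBytes models storing the item whole.
    static long[] load(String key, long itemBytes, long chunkBytes, int nodes) {
        long[] perNode = new long[nodes];
        long chunks = chunkCount(itemBytes, chunkBytes);
        for (long i = 0; i < chunks; i++) {
            long size = Math.min(chunkBytes, itemBytes - i * chunkBytes);
            perNode[nodeFor(key + ".#" + i, nodes)] += size;
        }
        return perNode;
    }
}
```

With decimal units, `chunkCount(2_000_000_000L, 8_000L)` is 250,000, matching the figure above; with binary units (2 GiB in 8 KiB chunks) it is 262,144. Calling `load` for a 200MB item with a chunk size equal to the item size puts all 200MB on one node, while an 8K chunk size spreads the same bytes over several nodes.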
