ES Distributed Document System: The _bulk API's Peculiar JSON Format and Its Relationship to Performance Optimization


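1. The peculiar JSON format of the Bulk API
Each line is a separate JSON object terminated by a newline (\n): an action line carrying the request metadata, followed by the line carrying the document data.
{"action": {"metadata"}}\n
{"data"}\n
{"action": {"metadata"}}\n
{"data"}\n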
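For illustration, here is a minimal Python sketch that builds a body in this line-oriented format. The helper name build_bulk_body and the sample index, ID, and field values are hypothetical. It follows two rules of the _bulk format: each action line is followed by its data line (except for delete, which has no data line), and the body must end with a newline.

```python
import json

def build_bulk_body(operations):
    """Build an NDJSON _bulk body from (action, metadata, source) tuples.

    Hypothetical helper: source is None for actions such as "delete"
    that take no data line.
    """
    lines = []
    for action, meta, source in operations:
        # Action/metadata line, e.g. {"index": {"_index": "test", "_id": "1"}}
        lines.append(json.dumps({action: meta}))
        if source is not None:
            # The document data line immediately follows its action line
            lines.append(json.dumps(source))
    if not lines:
        return ""
    # Every line, including the last one, must be terminated by "\n"
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
])
print(body, end="")
```

Each JSON object sits on exactly one line, so a receiver can find every request boundary just by looking for "\n".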

2. What if a more readable JSON array format were used instead?
[{
  "action": {
    "metadata"
  },
  "data": {
  }
}]
This format allows arbitrary line breaks and is far more readable. However, to handle a standard JSON string like this, ES would have to go through the following steps:
(1) Parse the JSON array into a JSONArray object. At this point the entire payload exists twice in memory: one copy is the raw JSON text, the other is the JSONArray object.
(2) Parse each JSON object in the array and route the document of each request.
(3) Build a request array for each shard, grouping together the requests routed to the same shard.
(4) Serialize each request array.
(5) Send each serialized request array to the corresponding node.
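The five steps above can be sketched in Python as follows. This is a simplified model, not ES's actual code: the shard count, the function names, and the array element shape are hypothetical, and zlib.crc32 is a stand-in routing hash (ES really hashes the routing value, which defaults to _id, with murmur3). The point is where the extra copies appear.

```python
import json
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical shard count

def shard_for(doc_id):
    # Stand-in routing function; ES actually uses murmur3 on the routing value
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def process_json_array(raw_text):
    # Step 1: parse the whole array; raw_text and parsed now coexist in memory
    parsed = json.loads(raw_text)
    groups = defaultdict(list)
    for item in parsed:
        # Step 2: read each request's metadata and compute its shard
        meta = next(iter(item["action"].values()))
        # Step 3: group requests routed to the same shard
        groups[shard_for(meta["_id"])].append(item)
    # Step 4: serialize each per-shard request array (yet another copy)
    # Step 5 would send each serialized string to the node holding that shard
    return {shard: json.dumps(reqs) for shard, reqs in groups.items()}

raw = json.dumps([{"action": {"index": {"_id": str(i)}}, "data": {"n": i}} for i in range(10)])
out = process_json_array(raw)
```

At the peak, the raw text, the fully parsed object tree, and the re-serialized strings can all be alive at the same time.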

3. More memory consumption and more JVM GC overhead
Bulk requests have an optimal size. A few thousand documents per request is generally recommended, which works out to roughly 10MB. Now suppose 100 bulk requests of 10MB each are sent to one node: that is 100 × 10MB = 1000MB, about 1GB. If each request's JSON text is also copied into a JSONArray object, memory usage doubles to about 2GB, and likely more, because building a JSONArray may create additional data structures. The result is 2GB+ of memory consumption.
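The arithmetic above can be written as a back-of-the-envelope estimate. The function name is hypothetical, and it assumes each in-memory copy is the same size as the raw text. That is a lower bound, since, as noted above, a parsed object graph may be larger than the text it came from.

```python
def bulk_memory_mb(num_requests, request_mb, copies):
    """Lower-bound estimate: each request's payload held `copies` times in memory."""
    return num_requests * request_mb * copies

raw_only = bulk_memory_mb(100, 10, 1)    # JSON text only: 1000 MB, about 1 GB
with_array = bulk_memory_mb(100, 10, 2)  # JSON text + JSONArray copy: 2000 MB, about 2 GB
```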
Consuming more memory squeezes out the memory available to other requests, above all the critical search and analytics requests, which can make their performance plummet. In addition, higher memory usage makes Java virtual machine garbage collection more frequent, with more garbage objects to reclaim each time, so the ES JVM spends more time with its worker threads stopped.

4. How the peculiar format avoids this
(1) The body is not converted into a JSONArray object, so no duplicate copy of the data sits in memory; the JSON is split directly on newline characters.
(2) For each pair of lines, only the action/metadata line is read in order to route the document.
(3) The corresponding JSON lines are sent directly to the target node.
The biggest advantage is that there is no need to parse the JSON array into a JSONArray object, which would create a copy of a large amount of data and waste memory. This keeps performance as high as possible.
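The newline-based approach can be sketched in Python like this. As before, this is a simplified model under stated assumptions: the shard count and function names are hypothetical, zlib.crc32 stands in for ES's murmur3-based routing, and mapping each shard to its node is left out. Only the small action lines are parsed; data lines are forwarded as raw bytes.

```python
import json
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical shard count

def shard_for(doc_id):
    # Stand-in for ES routing (which really uses murmur3 on the routing value)
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def process_ndjson(raw_bytes):
    # Step 1: split on newlines only; no parsed copy of the whole body is built
    lines = raw_bytes.split(b"\n")
    per_shard = defaultdict(list)
    i = 0
    while i < len(lines):
        if not lines[i]:
            i += 1  # skip the empty segment after the final "\n"
            continue
        # Step 2: parse only the small action/metadata line
        (op, meta), = json.loads(lines[i]).items()
        pair = [lines[i]]
        if op != "delete":
            # The data line is kept as raw bytes and never parsed here
            pair.append(lines[i + 1])
            i += 1
        i += 1
        per_shard[shard_for(meta["_id"])].extend(pair)
    # Step 3: each shard's lines are re-joined and forwarded as-is
    return {s: b"\n".join(shard_lines) + b"\n" for s, shard_lines in per_shard.items()}

body = (b'{"index": {"_id": "1"}}\n{"a": 1}\n'
        b'{"delete": {"_id": "2"}}\n'
        b'{"index": {"_id": "3"}}\n{"a": 3}\n')
out = process_ndjson(body)
```

Compared with the array sketch, the document bodies here are only sliced out of the original buffer. They are never turned into an object tree or re-serialized.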
