Neo4j Data Insertion Test

Last Update: 2018-06-06 Source: Internet

Author: User

Tags neo4j

Test machine: Intel Core i3, 2.4 GHz, 4 cores, 8 GB RAM

Method 1: Use the Native (Transactional) Interface

JVM: -Xms1024m -Xmx1024m -Xmn512m -XX:PermSize=128m -XX:MaxPermSize=256m

4000 nodes (50 attributes each), 4000 relationships: 1 second, CPU usage 25%, memory 761 MB

8000 nodes (50 attributes each), 8000 relationships: 2 seconds, CPU usage 25%, memory 829 MB

16000 nodes (50 attributes each), 16000 relationships: 5 seconds, CPU usage 25%, memory 983 MB

24000 nodes (50 attributes each), 24000 relationships: 9 seconds, CPU usage 25%, memory 1079 MB

32000 nodes (50 attributes each), 32000 relationships: 14 seconds, CPU usage 25%, memory 1187 MB

40000 nodes (50 attributes each), 40000 relationships: after running for more than 1 minute, the insert failed with java.lang.OutOfMemoryError: Java heap space.

[Figure: memory usage]

Conclusion: with the transactional insertion interface and a 1 GB JVM heap, at most about 32,000 nodes and 32,000 relationships could be inserted in one run; inserting more caused a memory overflow (OutOfMemoryError).

Method 2: Use the BatchInserter Interface

JVM: default settings.

40000 nodes (50 attributes each), 40000 relationships: 6 seconds, CPU usage 25%, memory 288 MB

80000 nodes (50 attributes each), 80000 relationships: 17 seconds, CPU usage 25%, memory 288 MB

120000 nodes (50 attributes each), 120000 relationships: 31 seconds, CPU usage 25%, memory 289 MB

200000 nodes (50 attributes each), 200000 relationships: 56 seconds, CPU usage 25%, memory 288 MB
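To compare the two methods on the same scale, the timings reported above can be converted into nodes inserted per second. The small Java program below only does that arithmetic on the published numbers; the class and method names are illustrative, and only nodes are counted (each run also created the same number of relationships).

```java
import java.util.Locale;

// Throughput derived from the timings reported in this test (nodes per second;
// each run also inserted the same number of relationships).
class InsertThroughput {
    static double perSecond(int nodes, int seconds) {
        return (double) nodes / seconds;
    }

    static void report(String label, int[][] runs) {
        System.out.println(label);
        for (int[] run : runs) {
            System.out.printf(Locale.ROOT, "  %,7d nodes in %2d s -> %,6.0f nodes/s%n",
                    run[0], run[1], perSecond(run[0], run[1]));
        }
    }

    public static void main(String[] args) {
        // {nodes, seconds} pairs copied from the Method 1 and Method 2 results
        int[][] transactional = {{4000, 1}, {8000, 2}, {16000, 5}, {24000, 9}, {32000, 14}};
        int[][] batchInserter = {{40000, 6}, {80000, 17}, {120000, 31}, {200000, 56}};
        report("Method 1 (transactional interface)", transactional);
        report("Method 2 (BatchInserter)", batchInserter);
    }
}
```

On these numbers, Method 1 drops from 4,000 to about 2,300 nodes/s as the run grows, while Method 2 runs at roughly 6,700 down to 3,600 nodes/s on much larger data sets with almost constant memory.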

Analysis:

According to the official documentation, the transactional insert interface (the usual Neo4j data-operation API) is recommended for small amounts of data (in this test, fewer than about 5,000 items), and its speed is acceptable. For large volumes, the dedicated BatchInserters interface is recommended: it does not create transactions during insertion, so its memory usage is very small. In this test, its memory usage stayed essentially constant (about 288 MB) across all data volumes. Two methods can therefore be used to import a large amount of data into Neo4j quickly:

Split-batch method

This method splits a large dataset into batches of 5,000 records or fewer and inserts each batch through the transactional insert interface. Based on the test results above, about 100,000 records can be inserted in under 30 seconds this way. The disadvantage is that the dataset must be split into small batches. The advantage is that, if a Neo4j database is already running, you only need to change the import code, and the database does not have to be stopped during the import.
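The split-batch method can be sketched in plain Java as below. This is a minimal sketch under stated assumptions: `BatchedTransactionalImport`, `partition`, and `importInBatches` are hypothetical names, and the `commitBatch` callback stands in for one Neo4j transaction, so the example runs without a database. With the Neo4j 3.x embedded API, each callback would wrap its batch in `try (Transaction tx = db.beginTx()) { ...; tx.success(); }`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the split-batch method: chop a large record list into chunks of at most
// MAX_BATCH and commit each chunk in its own transaction, so no single transaction
// has to hold the whole import in memory.
class BatchedTransactionalImport {
    static final int MAX_BATCH = 5000; // upper bound suggested by the test results

    // Splits records into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    // commitBatch is a hypothetical stand-in for one Neo4j transaction; with the
    // 3.x embedded API it would wrap: try (Transaction tx = db.beginTx()) { ...; tx.success(); }
    // Returns the number of transactions used.
    static <T> int importInBatches(List<T> records, int batchSize, Consumer<List<T>> commitBatch) {
        List<List<T>> batches = partition(records, batchSize);
        for (List<T> batch : batches) {
            commitBatch.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<Integer> records = Collections.nCopies(100_000, 0); // 100,000 dummy records
        int[] committed = {0};
        int transactions = importInBatches(records, MAX_BATCH, batch -> committed[0] += batch.size());
        System.out.println(transactions + " transactions, " + committed[0] + " records");
    }
}
```

Run as-is, it prints `20 transactions, 100000 records`: 100,000 records in batches of 5,000 need 20 separate transactions, each small enough to stay well below the size at which Method 1 ran out of heap.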

Batch insert method

This method inserts data quickly regardless of volume and strikes a balance between speed and memory usage. It is suitable for importing a large amount of data at one time when the database is initialized (or whenever a large amount of data must be imported). The disadvantage is that the database must be stopped while the BatchInserters interface imports the data, so business operations cannot continue uninterrupted during the import.
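The shape of a BatchInserter-style load can be sketched as follows. To keep the example runnable without Neo4j, `GraphBatchWriter` is a hypothetical interface and `CountingBatchWriter` a hypothetical in-memory implementation that only counts writes; the ring of `NEXT` relationships is an assumed data shape chosen to give n nodes and n relationships, as in the test. In Neo4j 3.x the corresponding real calls live in `org.neo4j.unsafe.batchinsert` (`BatchInserters.inserter(File)`, `createNode(Map<String, Object>, Label...)`, `createRelationship(long, long, RelationshipType, Map<String, Object>)`, `shutdown()`); signatures differ between versions, so check the documentation for the version in use.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a BatchInserter-style load: write nodes and relationships directly,
// with no transactions, and shut the writer down exactly once at the end.
class BatchImportSketch {
    static final int ATTRIBUTES_PER_NODE = 50; // matches the 50-attribute nodes in the test

    // Creates n nodes and n relationships (a ring of NEXT links); hypothetical data shape.
    static void load(GraphBatchWriter writer, int n) {
        try {
            long first = -1;
            long previous = -1;
            for (int i = 0; i < n; i++) {
                Map<String, Object> properties = new HashMap<>();
                for (int p = 0; p < ATTRIBUTES_PER_NODE; p++) {
                    properties.put("attr" + p, "value" + i + "_" + p);
                }
                long id = writer.createNode(properties);
                if (previous < 0) {
                    first = id;
                } else {
                    writer.createRelationship(previous, id, "NEXT");
                }
                previous = id;
            }
            if (n > 0) {
                writer.createRelationship(previous, first, "NEXT"); // close the ring: n links total
            }
        } finally {
            writer.shutdown(); // calling shutdown is required with the real BatchInserter
        }
    }

    public static void main(String[] args) {
        CountingBatchWriter writer = new CountingBatchWriter();
        load(writer, 40_000);
        System.out.println(writer.nodeCount + " nodes, " + writer.relationshipCount + " relationships");
    }
}

// Hypothetical interface standing in for Neo4j's BatchInserter; only the calls used above.
interface GraphBatchWriter {
    long createNode(Map<String, Object> properties);
    void createRelationship(long from, long to, String type);
    void shutdown();
}

// In-memory implementation that only counts writes, so the sketch runs without a Neo4j store.
class CountingBatchWriter implements GraphBatchWriter {
    long nodeCount;
    long relationshipCount;
    boolean closed;

    @Override
    public long createNode(Map<String, Object> properties) {
        return nodeCount++; // ids 0, 1, 2, ...
    }

    @Override
    public void createRelationship(long from, long to, String type) {
        relationshipCount++;
    }

    @Override
    public void shutdown() {
        closed = true;
    }
}
```

The point of the structure is that nothing is held per transaction: each node's property map becomes garbage as soon as it is written, which is consistent with the flat memory usage observed for Method 2.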

Suggestion:

Break large imports into smaller pieces. When more than 1,000 records are to be inserted (imported), use batch inserts: they insert data quickly while keeping memory usage roughly constant, which avoids OOM errors.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
