Performance summary of collecting files through FTP

Source: Internet
Author: User
Tags: server, memory, GC overhead limit exceeded

1, Introduction

During data acquisition, data is uploaded to the server as Excel or CSV files. Java file and stream operations then locate the uploaded files, CsvReader reads their contents, and the JDBC interface stores those contents in the database.

This article focuses on using Java with various open source frameworks to collect data uploaded over FTP, and on how to load that data quickly while keeping system resource consumption low.
2, Performance test

The performance test measures how well the system can scan and store files containing large volumes of data after the JDBC interface has been optimized. During testing, 5 records are stored at a time, and the bulk (batch) interface is used.

The test environment is configured as follows:

Application server:

1, CPU: Intel i3

2, Memory: 8 GB

Database server:

1, CPU: Intel(R) Xeon(R) CPU

2, Memory: 4 GB

3, Database: MySQL

The performance indicators for batch testing are as follows:

| Records inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
|---|---|---|---|---|
| 100 | 833 | 35.5M | 15.7% | |
| 1000 | 3971 | 54.5M | 15.7% | |
| 10000 | 13399 | 111.5M | 15.7% | |
| 20000 | 19639 | 154M | 15.7% | |
| 30000 | 27308 | 188M | 14.7% | |
| 100000 | 90039 (1.5min) | 453M | 15.7% | |
| 200000 | 204598 (3.4min) | 847M | 15.7% | |
| 300000 | 295438 (4.9min) | 1014M | 15.7% | |
| 400000 | 1093927 (18.2min) | 1292M | 15.7% | |
| 500000 | 1570964 (26.2min) | 1433M | 15.7% | |
| 600000 | 1668328 (27.8min) | 1780M | 15.7% | |
| 700000 | 2372982 (39.5min) | 1855M | 15.7% | |
| 800000 | 2865081 (47.8min) | 2015M | 15.7% | |
| 900000 | 3584745 (59.7min) | 2129M | 15.7% | Inserting 900,000 records in a single run takes a relatively long time |
| 1000000 | | 2095M | | java.lang.OutOfMemoryError: GC overhead limit exceeded |

3, Optimization plan 1

The fetched data is first placed in an array and then inserted into the database in batches of 10,000 records each. The test results are as follows.

| Records inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
|---|---|---|---|---|
| 100 | 833 | 35.5M | 15.7% | |
| 1000 | 3971 | 54.5M | 15.7% | |
| 10000 | 13399 | 111.5M | 15.7% | |
| 20000 | 22684 | 157M | 15.7% | |
| 30000 | 30716 | 208M | 15.7% | |
| 100000 | 95958 (1.5min) | 463M | 15.7% | |
| 200000 | 180388 (3.0min) | 384M | 15.7% | |
| 300000 | 275388 (4.5min) | 420M | 15.7% | |
| 400000 | 377242 (6.28min) | 505M | 15.7% | |
| 500000 | 527830 (8.8min) | 552M | 15.7% | |
| 600000 | 588103 (9.8min) | 726M | 15.7% | |
| 700000 | 723151 (12.1min) | 880M | 15.7% | |
| 800000 | 901322 (15.0min) | 850M | 15.7% | |
| 900000 | 1113462 (18.6min) | 928M | 15.7% | |
| 1000000 | 1325419 (22.0min) | 806M | 15.7% | |
| 1500000 | 2179144 (36.3min) | 1097M | 15.7% | |
| 2000000 | 3569986 (59.5min) | 1562M | 15.7% | |
| 3000000 | | | | |
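The chunked insertion described in this plan might look roughly like the following sketch. It is not the author's code: the table `collected_data`, its columns `col_a` and `col_b`, and the class and method names are all hypothetical, and the rows are assumed to be already parsed into string arrays. Only standard `java.sql` calls (`addBatch`, `executeBatch`, `commit`, `rollback`) are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ChunkedBatchInsert {
    // Batch size taken from the article; everything else here is an assumption.
    public static final int BATCH_SIZE = 10_000;

    // Number of executeBatch() calls needed to insert rowCount rows.
    public static int batchCount(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    // Inserts rows in chunks of batchSize and returns the number of batches sent.
    public static int insertInChunks(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        // Hypothetical table and columns, for illustration only.
        String sql = "INSERT INTO collected_data (col_a, col_b) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // send one full chunk to the database
                    conn.commit();
                    pending = 0;
                    batches++;
                }
            }
            if (pending > 0) {         // send the final, partially filled chunk
                ps.executeBatch();
                conn.commit();
                batches++;
            }
        } catch (SQLException e) {
            conn.rollback();           // discard the chunk that was not committed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return batches;
    }
}
```

With this structure, 1,000,000 rows would be sent as `batchCount(1_000_000, BATCH_SIZE)` = 100 batches. The whole row list is still held in memory, which would be consistent with the application server memory still growing with the record count in the table above.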

4, Optimization plan 2

Using the Java streaming mechanism, the data is inserted into the database in batches of 10,000 records each. The test results are as follows.

| Records inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
|---|---|---|---|---|
| 100 | 273 | 40.4M | 15.8% | |
| 1000 | 1236 | 51.4M | 15.8% | |
| 10000 | 9645 | 91.5M | 15.8% | |
| 20000 | 21685 | 134M | 15.8% | |
| 30000 | 27509 | 201M | 15.8% | |
| 100000 | 89067 (1.5min) | 198M | 15.8% | |
| 200000 | 175856 (2.9min) | 186M | 15.8% | |
| 300000 | 265390 (4.4min) | 196M | 15.8% | |
| 400000 | 356549 (5.9min) | 202M | 15.8% | |
| 500000 | 453659 (7.6min) | 202M | 15.8% | |
| 600000 | 576562 (9.6min) | 199M | 15.8% | |
| 700000 | 730090 (12.2min) | 200M | 15.8% | |
| 800000 | 769520 (12.8min) | 200M | 15.8% | |
| 900000 | 1124001 (18.7min) | 209M | 15.8% | |
| 1000000 | 1341186 (22.4min) | 202M | 15.8% | |
| 1500000 | 2566401 (42.8min) | 204M | 15.8% | |
| 2000000 | | | | |
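The article does not show how the streaming was implemented. One plausible reading is that the file is read line by line and each group of 10,000 rows is handed off for insertion as soon as it is full, so the whole file is never held in memory. The sketch below illustrates that idea under several assumptions: the class name, the `sink` callback (which would wrap the JDBC batch insert), and the naive comma split (no quoted-field handling) are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingCsvBatcher {
    // Reads lines one at a time and passes every full batch to the sink.
    // Returns the total number of non-empty rows read.
    public static long streamInBatches(BufferedReader reader, int batchSize,
                                       Consumer<List<String[]>> sink) {
        List<String[]> batch = new ArrayList<>(batchSize);
        long total = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;                       // skip blank lines
                }
                batch.add(line.split(",", -1));     // naive split, assumes no quoted commas
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);             // e.g. a JDBC batch insert
                    batch = new ArrayList<>(batchSize); // drop the reference to the old rows
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);                     // final, partially filled batch
        }
        return total;
    }
}
```

Because at most one batch of rows is referenced at any time, memory use under this approach depends on the batch size rather than on the file size, which matches the roughly constant 200M application server figure reported above for 100,000 records and beyond.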

5, Optimization plan 3

Reduce object creation and optimize the program. The test results are as follows.

| Records inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
|---|---|---|---|---|
| 10000 | 10022 | 99.6M | 15.8% | |
| 100000 | 88925 (1.5min) | 203M | 15.8% | |
| 500000 | 444200 (7.4min) | 201M | 15.8% | |
| 1000000 | 1327358 (22.1min) | 202M | 15.8% | |
| 1500000 | 2588956 (43.1min) | 202M | 15.8% | |
| 2000000 | | | | |
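The article does not say which objects were eliminated. As one hypothetical example of the general idea, the per-row field array can be allocated once and refilled for every line, instead of creating a new array (or a new bean object) per record. The class and method names below are illustrative assumptions, not the author's code.

```java
class ReusableFieldSplitter {
    // Splits line on sep into the caller-supplied array and returns the number
    // of fields written. Fields beyond out.length are ignored. Entries past the
    // returned count may still hold values from a previous call.
    public static int splitInto(String line, char sep, String[] out) {
        int count = 0;
        int start = 0;
        while (count < out.length) {
            int end = line.indexOf(sep, start);
            if (end < 0) {
                out[count++] = line.substring(start); // last field on the line
                return count;
            }
            out[count++] = line.substring(start, end);
            start = end + 1;
        }
        return count;
    }
}
```

A caller would create `String[] fields = new String[columnCount]` once before the read loop, call `splitInto(line, ',', fields)` for each line, and bind `fields[0..n-1]` directly to the prepared statement. The field strings themselves are still allocated per row, so the saving is modest, which would be consistent with the small differences between this table and the previous one.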

6, Summary

When collecting large volumes of data through FTP, there are two optimization approaches. The first is to use bulk inserts while limiting the number of records inserted in a single batch. The second is to use the underlying mechanisms directly where possible: open source software is sometimes designed to cover many scenarios, which can increase the system's resource consumption.
