Performance Summary of Collecting File Data over FTP
1, Introduction
During data acquisition, data is uploaded to the server as Excel or CSV files. Java file and stream operations then pick up the uploaded files, CsvReader parses their contents, and the parsed records are stored in the database through the JDBC interface.
This article focuses on using Java and various open source frameworks to collect data uploaded over FTP: how to load the data quickly while keeping system resource consumption low.
2, Performance test
The performance test measures how well the system scans and stores files containing large volumes of data once the JDBC interface has been optimized. Five data items are stored at a time during testing, and the batch interface is used.
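The batch interface mentioned here is presumably JDBC's `addBatch()` / `executeBatch()` pair on `PreparedStatement`; the original does not show its code. The following is a minimal sketch under that assumption. The class name, the helper names, and the table name `ftp_data` used in the note below are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchSketch {

    // Builds "INSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    static String buildInsertSql(String table, int columnCount) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Queues every row on one PreparedStatement and sends them as a single
    // JDBC batch inside one transaction. Returns the number of queued statements.
    static int insertBatch(Connection conn, String sql, List<String[]> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            conn.commit();
            return counts.length;
        } catch (SQLException e) {
            conn.rollback(); // undo the partial batch before reporting the failure
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

For a five-column table, `JdbcBatchSketch.buildInsertSql("ftp_data", 5)` yields `INSERT INTO ftp_data VALUES (?, ?, ?, ?, ?)`. With MySQL Connector/J, batched statements are only rewritten into multi-row inserts when the `rewriteBatchedStatements` connection property is enabled; whether the tests here set it is not stated.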
The system is configured as follows:
Application server:
1, CPU: Intel i3
2, Memory: 8 GB
Database server:
1, CPU: Intel(R) Xeon(R)
2, Memory: 4 GB
3, Database: MySQL
The performance indicators for batch testing are as follows:
Rows inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
100 | 833 | 35.5M | 15.7% | |
1000 | 3971 | 54.5M | 15.7% | |
10000 | 13399 | 111.5M | 15.7% | |
20000 | 19639 | 154M | 15.7% | |
30000 | 27308 | 188M | 14.7% | |
100000 | 90039 (1.5 min) | 453M | 15.7% | |
200000 | 204598 (3.4 min) | 847M | 15.7% | |
300000 | 295438 (4.9 min) | 1014M | 15.7% | |
400000 | 1093927 (18.2 min) | 1292M | 15.7% | |
500000 | 1570964 (26.2 min) | 1433M | 15.7% | |
600000 | 1668328 (27.8 min) | 1780M | 15.7% | |
700000 | 2372982 (39.5 min) | 1855M | 15.7% | |
800000 | 2865081 (47.8 min) | 2015M | 15.7% | |
900000 | 3584745 (59.7 min) | 2129M | 15.7% | Inserting 900,000 rows in a single run takes a relatively long time |
1000000 | | 2095M | | java.lang.OutOfMemoryError: GC overhead limit exceeded |
3, Optimization plan 1
The data read from the file is collected into an array and inserted into the database in batches of 10,000 rows. The test results are as follows.
Rows inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
100 | 833 | 35.5M | 15.7% | |
1000 | 3971 | 54.5M | 15.7% | |
10000 | 13399 | 111.5M | 15.7% | |
20000 | 22684 | 157M | 15.7% | |
30000 | 30716 | 208M | 15.7% | |
100000 | 95958 (1.5 min) | 463M | 15.7% | |
200000 | 180388 (3.0 min) | 384M | 15.7% | |
300000 | 275388 (4.5 min) | 420M | 15.7% | |
400000 | 377242 (6.3 min) | 505M | 15.7% | |
500000 | 527830 (8.8 min) | 552M | 15.7% | |
600000 | 588103 (9.8 min) | 726M | 15.7% | |
700000 | 723151 (12.1 min) | 880M | 15.7% | |
800000 | 901322 (15.0 min) | 850M | 15.7% | |
900000 | 1113462 (18.6 min) | 928M | 15.7% | |
1000000 | 1325419 (22.0 min) | 806M | 15.7% | |
1500000 | 2179144 (36.3 min) | 1097M | 15.7% | |
2000000 | 3569986 (59.5 min) | 1562M | 15.7% | |
3000000 | | | | |
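The batching idea of this plan can be sketched as follows: the rows are already held in memory, and they are handed to the database writer in slices of at most 10,000. This is an illustrative sketch and not the original program. The class name and the `batchWriter` callback are hypothetical; the callback stands in for whatever performs the JDBC batch insert.

```java
import java.util.List;
import java.util.function.Consumer;

class BatchedArrayInsert {

    static final int BATCH_SIZE = 10_000; // batch size used in this plan

    // Hands the in-memory rows to batchWriter in slices of at most batchSize
    // and returns the number of batches written.
    static <T> int insertInBatches(List<T> rows, int batchSize, Consumer<List<T>> batchWriter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            // subList is a view on the original list, so no rows are copied
            batchWriter.accept(rows.subList(from, to));
            batches++;
        }
        return batches;
    }
}
```

Note that the whole file is still loaded into the list before the first batch is written, so application server memory continues to grow with the number of rows; only the size of each database round trip is bounded.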
4, Optimization plan 2
Read the file through Java's stream mechanism instead of loading it into memory first, and insert into the database in batches of 10,000 rows. The test results are as follows.
Rows inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
100 | 273 | 40.4M | 15.8% | |
1000 | 1236 | 51.4M | 15.8% | |
10000 | 9645 | 91.5M | 15.8% | |
20000 | 21685 | 134M | 15.8% | |
30000 | 27509 | 201M | 15.8% | |
100000 | 89067 (1.5 min) | 198M | 15.8% | |
200000 | 175856 (2.9 min) | 186M | 15.8% | |
300000 | 265390 (4.4 min) | 196M | 15.8% | |
400000 | 356549 (5.9 min) | 202M | 15.8% | |
500000 | 453659 (7.6 min) | 202M | 15.8% | |
600000 | 576562 (9.6 min) | 199M | 15.8% | |
700000 | 730090 (12.2 min) | 200M | 15.8% | |
800000 | 769520 (12.8 min) | 200M | 15.8% | |
900000 | 1124001 (18.7 min) | 209M | 15.8% | |
1000000 | 1341186 (22.4 min) | 202M | 15.8% | |
1500000 | 2566401 (42.8 min) | 204M | 15.8% | |
2000000 | | | | |
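A streaming loader along the lines of this plan might look like the sketch below. It reads one line at a time and never holds more than one batch in memory. The original code is not shown, so the class name, the `batchWriter` callback (a stand-in for the JDBC batch insert), and the use of `BufferedReader` rather than CsvReader are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingLoader {

    // Reads lines one by one and flushes them to batchWriter whenever
    // batchSize lines have accumulated. Returns the total number of lines read.
    static int load(BufferedReader reader, int batchSize, Consumer<List<String>> batchWriter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> buffer = new ArrayList<>(batchSize);
        int total = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                total++;
                if (buffer.size() == batchSize) {
                    batchWriter.accept(buffer);
                    buffer.clear(); // the same list is reused for the next batch
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!buffer.isEmpty()) {
            batchWriter.accept(buffer); // final, partially filled batch
        }
        return total;
    }
}
```

Because the buffer is reused, the writer must finish with each batch before returning and must not keep a reference to the list. Bounding the buffer this way would be consistent with application server memory staying near 200M regardless of file size.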
5, Optimization plan 3
Reduce the creation of objects and optimize the program. The test results are as follows.
Rows inserted | Time (ms) | Application server memory | Database server memory | Problems encountered |
10000 | 10022 | 99.6M | 15.8% | |
100000 | 88925 (1.5 min) | 203M | 15.8% | |
500000 | 444200 (7.4 min) | 201M | 15.8% | |
1000000 | 1327358 (22.1 min) | 202M | 15.8% | |
1500000 | 2588956 (43.1 min) | 202M | 15.8% | |
2000000 | | | | |
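The original does not say which objects were eliminated in this plan. One common way to reduce object creation in a loader of this kind is to reuse a single field array per row instead of calling `String.split`, which allocates a new array on every call. The sketch below illustrates that idea only. The class and method names are hypothetical, and it assumes simple comma-separated lines without quoted fields.

```java
class FieldSplitter {

    // Splits line on sep into the caller-supplied array and returns the number
    // of fields written. Only out[0..count) is valid after the call; later slots
    // may still hold values from a previous row. Fields beyond out.length are ignored.
    static int splitInto(String line, char sep, String[] out) {
        int count = 0;
        int start = 0;
        while (count < out.length) {
            int end = line.indexOf(sep, start);
            if (end < 0) {
                out[count++] = line.substring(start); // last field on the line
                return count;
            }
            out[count++] = line.substring(start, end);
            start = end + 1;
        }
        return count;
    }
}
```

A caller would allocate the array once, for example `String[] fields = new String[5];`, and pass it to `splitInto` for every line. The field strings themselves are still created per row; only the per-row array is avoided.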
6, Summary
When collecting large volumes of data over FTP, two optimization ideas stand out. The first is to use batch inserts and limit the number of rows inserted at a time. The second is to make direct use of the underlying mechanisms where possible: open source software sometimes tries to account for too many things in its design, which increases the system's resource consumption.