Any system, large or small, needs import and export, and a good import function brings a lot of convenience. In the Itoo project every subsystem needs import functionality, and because I had been doing some research on importing, this shared function landed on my plate. Here I mainly want to introduce my overall approach to the import and the optimization I made later.
The overall idea of the import can be explained by this picture:
The first step is to convert the filled-in Excel file into a List collection.
The second step is to bulk save the list collection to the corresponding table in the database.
That is the whole process of importing a filled-in Excel file into the corresponding table in the database.
The figure above shows the main operations needed to convert an Excel file into a list.
Check whether the Excel data contains duplicates: Because the data in Excel is entered by hand, sooner or later someone will accidentally fill in the same data twice. The filled-in Excel file therefore has to be checked for duplicate rows, and the duplicates removed before the later steps. The primary purpose of this step is to ensure that the data within the Excel file is not duplicated.
Check whether the Excel data already exists in the database: This step ensures that the data to be saved does not duplicate data already in the database, which avoids dirty data. This step requires interacting with the database.
Handle data that has a primary/foreign key relationship: This step also needs to interact with the database. Sometimes the imported data is related to other tables. To keep the data consistent, the database stores the primary key of the related record as the foreign key, while the Excel file contains the name of the related record. So the name has to be converted to the key internally.
Save the data to the list row by row: Each Excel row is stored in one object in the list, and the number of objects equals the number of rows that meet the criteria for being saved to the database.
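As an illustration of the first check (duplicates inside the Excel file itself), here is a minimal sketch. It assumes each Excel row has already been read into a String[] and that one column, identified by a hypothetical keyColumn index, decides whether two rows are duplicates. The class and method names are made up for this example and are not the Itoo code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: rows are assumed to be already read from Excel as String[].
class ExcelRowDeduplicator {

    // Keeps the first row for each distinct value in keyColumn and drops later repeats.
    static List<String[]> removeDuplicates(List<String[]> rows, int keyColumn) {
        Set<String> seen = new HashSet<>();
        List<String[]> unique = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[keyColumn].trim();
            // Set.add returns false when the key was already present.
            if (seen.add(key)) {
                unique.add(row);
            }
        }
        return unique;
    }
}
```

This check needs no database access at all, because it only compares the Excel rows with each other.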
Those are the steps needed to convert an Excel file to a list. Two of them, checking whether the Excel data already exists in the database and handling primary/foreign key relationships, require interacting with the database, which can be represented by the following diagram:
In the process of converting each row into a list entry, you need to interact with the database twice, so with 10 rows you interact with the database 20 times. When the data volume is small you hardly notice; the import is over before you know it. But when the data volume grew to 20,000 rows, the excessive IO reads and writes made the system extremely slow. To give you an example: once the data of 20,000 students had to be imported into the system, and because I was familiar with the system, the manager asked me to do it. When I got the data I set to work cheerfully, but after a long time the import had still not finished and the computer's CPU usage was completely in the red. Seeing this, I quietly set about optimizing the import function. My idea for the optimization can be explained by the following diagram:
The first step: query the relevant table data in advance and put it in a Map.
The second step: to check whether the Excel data already exists in the database, compare it directly against the data that was queried in advance and placed in the Map.
The third step: when handling primary/foreign key relationships, look the data up directly in the Map that was queried in advance, so the database does not need to be queried again.
With this optimization, if you need to handle 3 primary/foreign key relationships and use 1 column to decide whether a row duplicates data in the database, you only need to interact with the database 4 times, regardless of whether there are 10 rows or 100,000. Only 4 IO reads are required.
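The three steps can be sketched roughly as follows. This is only an illustration under assumptions, not the actual Itoo code: FakeDb is a hypothetical in-memory stand-in that merely counts round trips, and the row layout {studentNo, className, majorName, collegeName} and the table names are invented for the example. Skipping rows whose names cannot be resolved is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PreloadImportSketch {

    // Hypothetical stand-in for the real database; it only counts round trips.
    static class FakeDb {
        int queryCount = 0;
        final Set<String> studentNumbers = new HashSet<>();
        final Map<String, Map<String, String>> nameToIdByTable = new HashMap<>();

        void addLookup(String table, String name, String id) {
            nameToIdByTable.computeIfAbsent(table, t -> new HashMap<>()).put(name, id);
        }

        // One round trip: every existing value of the uniqueness column.
        Set<String> loadExistingStudentNumbers() {
            queryCount++;
            return new HashSet<>(studentNumbers);
        }

        // One round trip: the whole name -> primary key mapping of one table.
        Map<String, String> loadNameToId(String table) {
            queryCount++;
            Map<String, String> copy = new HashMap<>();
            Map<String, String> stored = nameToIdByTable.get(table);
            if (stored != null) {
                copy.putAll(stored);
            }
            return copy;
        }
    }

    // Assumed foreign key tables, in the same order as columns 1..3 of each row.
    static final String[] LOOKUP_TABLES = {"class", "major", "college"};

    // Turns {studentNo, className, majorName, collegeName} into
    // {studentNo, classId, majorId, collegeId} using only preloaded data.
    static List<String[]> convert(List<String[]> excelRows, FakeDb db) {
        // Preload: 1 query for the duplicate check + 3 for the foreign keys.
        Set<String> existing = db.loadExistingStudentNumbers();
        List<Map<String, String>> lookups = new ArrayList<>();
        for (String table : LOOKUP_TABLES) {
            lookups.add(db.loadNameToId(table));
        }

        List<String[]> result = new ArrayList<>();
        for (String[] row : excelRows) {
            // In-memory check replaces a per-row "already exists?" query.
            if (existing.contains(row[0])) {
                continue;
            }
            String[] converted = new String[LOOKUP_TABLES.length + 1];
            converted[0] = row[0];
            boolean resolved = true;
            for (int i = 0; i < LOOKUP_TABLES.length; i++) {
                // In-memory lookup replaces a per-row name -> key query.
                String id = lookups.get(i).get(row[i + 1]);
                if (id == null) {
                    resolved = false; // assumption: unresolved rows are skipped
                    break;
                }
                converted[i + 1] = id;
            }
            if (resolved) {
                result.add(converted);
            }
        }
        return result;
    }
}
```

However many rows are passed to convert, queryCount ends at 4 in this sketch. The price is memory: the preloaded sets and maps hold whole columns of the tables, so for very large tables it may be worth loading only the columns that are actually compared.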
Now I ask myself: why were there so many IO reads and writes when I first wrote it?
When I wrote the import function, I only aimed to get it finished and working. I did not think about how to reduce the IO operations, or about whether the import would become slow with a large data volume. These problems all arose because I lacked a heart to serve people wholeheartedly and lacked a global view; the deep-seated mentality of being content with small gains has not been rooted out and still affects me now. To make good software, start by thinking of others.
[Java] Itoo project: ideas for reducing IO reads and writes in the import