A Batch Database Write Solution
I'd like to share a batch database write solution from a recent project. Since this is my first blog post, please point out anything I've gotten wrong.
A recent project involved highly concurrent writes to the database. Committing after every single row made inserts very slow and put heavy pressure on the database, so my first idea was to change the commit strategy: commit once every 10 or 100 rows. But this has a problem: if the incoming data never reaches the batch size, the buffered rows are never committed, and downstream processing of that data is delayed.
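The problem above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the table name, batch size, and use of SQLite as a stand-in database are all assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

BATCH_SIZE = 100
buffer = []

def write(row):
    """Buffer a row and commit only once the batch is full."""
    buffer.append(row)
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    """Insert the whole batch in one statement, then commit once."""
    if not buffer:
        return
    conn.executemany("INSERT INTO events VALUES (?, ?)", buffer)
    conn.commit()
    buffer.clear()

# 250 rows: two full batches are committed, but the last 50 rows
# sit in the buffer indefinitely unless more data arrives.
for i in range(250):
    write((i, "data"))

committed = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(committed)  # 200
print(len(buffer))  # 50 rows stuck waiting for the batch to fill
```

The final `print` lines show exactly the delay described above: the leftover 50 rows are never committed until the batch fills, which is what motivates the file-based scheme below.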
Buffering data in files, tracked by a database table:
While working with the operations team on server configuration and deployment, they proposed a batch-commit scheme for the database writes and suggested MongoDB. But I had only heard of MongoDB and never studied it; at my current skill level it would take me a month to learn, so instead I decided to write the incoming data to files and have a scheduled job flush them into the database (the business requirements tolerate a delay of up to 10 minutes).

My first design worked like this: whenever a new data file is created, insert a record into the database marking that file's status as "being written". When the next file is generated, repeat this step and update the previous file's record to "ready to load". A scheduled job periodically reads the database, finds the files marked ready, writes their contents into the database, and updates each record's status to "loaded".

I only spotted the flaw in this scheme after finishing it: the application will eventually run in a cluster, and all nodes share one database. If node 1 finishes loading its own files and updates the shared records, then when node 2's job starts it may find no file records left to load, so node 2's files are never loaded into the database and that data is lost.

The final solution drops the database table entirely. The scheduled job reads each file and, only after its contents are successfully inserted, renames the file by appending a uniform suffix. If the program crashes or the machine goes down, the file names themselves show which files are still unloaded, making it easy to finish loading the remaining data. This way the data is guaranteed not to be lost.
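The final scheme can be sketched as follows. This is a simplified illustration of the idea, not the production code: the `.done` suffix, table name, one-row-per-line file format, and SQLite stand-in database are all assumptions.

```python
import os
import sqlite3
import tempfile

SUFFIX = ".done"  # uniform suffix marking a successfully loaded file

def load_file(conn, path):
    """Insert every line of the file, then rename it.

    The rename happens only after the commit succeeds, so a file
    that still lacks the suffix after a crash is simply picked up
    again on the next timer run and no data is lost."""
    with open(path) as f:
        rows = [(line.rstrip("\n"),) for line in f]
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
    conn.commit()                    # commit first ...
    os.rename(path, path + SUFFIX)   # ... then mark the file as done

def load_pending(conn, directory):
    """One timer tick: load every file not yet marked done."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SUFFIX):
            load_file(conn, os.path.join(directory, name))

# Demo: a temp directory stands in for the data-file directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "batch-001.dat"), "w") as f:
    f.write("a\nb\nc\n")

load_pending(conn, data_dir)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)                 # 3
print(os.listdir(data_dir))  # ['batch-001.dat.done']
```

Because each node only renames files it has itself loaded, the scheme needs no shared state between cluster nodes, which is what sidesteps the lost-file problem of the tracking-table design.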