A Solution for Batch-Loading Data into a Database

Source: Internet
Author: User


I would like to share a batch-loading solution I worked out recently. This is my first blog post, so if anything is wrong, please point it out.

A recent project involved highly concurrent writes to a database. At first every record was committed on its own, so inserts were very slow and the load on the database was high. My first idea was to change how commits are made and commit once every 10 or 100 records. This has a problem, though: if the number of pending records never reaches the threshold, they are never committed, and any later processing of that data is delayed.
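The count-based idea, and the delay it causes, can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the BatchWriter class, the events table and the batch size of 10 are assumptions made for the example, not part of the original project.

```python
import sqlite3


class BatchWriter:
    """Hypothetical helper: buffers rows and commits them in groups of batch_size."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def add(self, payload):
        self.pending.append((payload,))
        # Commit only when the buffer reaches the threshold.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.conn:  # one transaction (one commit) per batch
            self.conn.executemany(
                "INSERT INTO events (payload) VALUES (?)", self.pending
            )
        self.pending.clear()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")  # assumed table layout

writer = BatchWriter(conn, batch_size=10)
for i in range(25):
    writer.add(f"row-{i}")

# 20 rows are committed; the last 5 wait in memory until the
# threshold is reached again or flush() is called explicitly.
```

The five leftover rows are exactly the problem described above: without a timer or an explicit flush they stay uncommitted for as long as no new data arrives.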

A recoverable solution, first attempt: tracking data files in a database table

While I was working through server configuration and deployment with the operations team, they proposed a batch-commit scheme for database writes and mentioned MongoDB. I had only heard of MongoDB and never studied it, and at my current level I would need about a month to understand it. So I thought of writing the data to files as it arrives and having the program flush those files into the database on a timer (the business requirements allow the data to be delayed by up to 10 minutes).

My first design worked like this. Each time a new data file is created, the program inserts a record in the database and marks that file as "being written". The next time a new file is created, the same step is repeated, and the record for the previous file is updated to "ready to load". A timer in the program reads the database periodically, finds the files marked as ready, writes each of them into the database, and updates the record status to "loaded".

This design has a problem that I only understood after I had finished it. We will run on a cluster in the future, and every program instance in the cluster accesses the same database. If node 1 has finished loading its own files and updated the records, node 2 may find no file to load in the database when its task starts, so node 2's files are never loaded and that data is lost.

The final solution does not use the database for tracking at all. The program reads each file, and after the file has been loaded successfully it renames the file by adding a common suffix. If the program fails or the machine goes down, the file names show which files have been loaded and which have not, which makes it easy to recover the missing data. This way the data is guaranteed not to be lost.
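The final scheme can be sketched roughly like this. It is a minimal illustration, not the original program: the ".dat" extension for finished data files, the ".done" suffix, the events table and the load_pending_files function are all assumptions made for the example, and sqlite3 stands in for whatever database the project actually used.

```python
import os
import sqlite3
import tempfile

DONE_SUFFIX = ".done"  # assumed suffix marking a file as already loaded


def load_pending_files(data_dir, conn):
    """Load every not-yet-loaded .dat file into the database, then rename it."""
    loaded = []
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith(".dat"):
            continue  # skips files that already carry the .done suffix
        path = os.path.join(data_dir, name)
        with open(path, encoding="utf-8") as f:
            rows = [(line.rstrip("\n"),) for line in f if line.strip()]
        with conn:  # one transaction per file
            conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
        # Rename only after the commit succeeded, so the file name
        # itself records the state.
        os.rename(path, path + DONE_SUFFIX)
        loaded.append(name)
    return loaded


# Demo with two small files in a temporary directory.
data_dir = tempfile.mkdtemp()
for name, lines in [("batch-001.dat", ["a", "b"]), ("batch-002.dat", ["c"])]:
    with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")  # assumed table layout

first = load_pending_files(data_dir, conn)   # loads both files
second = load_pending_files(data_dir, conn)  # finds nothing left to load
```

Because each node only looks at files in its own directory, nodes in a cluster do not interfere with each other. One caveat of this sketch: if the process dies after the commit but before the rename, the file would be loaded again on the next run, so a real implementation would need some way to detect or tolerate duplicates.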
