Berkeley DB data processing

Last Update: 2014-09-16 Source: Internet

Author: User


Goal: design an architecture that uses Berkeley DB to store, back up, and query large volumes of data.

What we already have:

1. Basic Berkeley DB operations.

2. Data is not lost when it is dumped.

3. Storage of more than a hundred GB of data.
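One way to check point 2 is to compare a record count and a checksum before and after a dump and reload. The following is a minimal sketch in plain Python, not Berkeley DB code: the stores are ordinary dictionaries, and the `dump`/`load` line format is a hypothetical stand-in for whatever dump tool is actually used.

```python
import hashlib


def store_digest(records):
    """Return (record count, SHA-256 hex digest) over key/value pairs in key order."""
    h = hashlib.sha256()
    count = 0
    for key, value in sorted(records.items()):
        # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc") hash differently.
        for field in (key, value):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
        count += 1
    return count, h.hexdigest()


def dump(records):
    """Hypothetical dump format: one 'hexkey hexvalue' line per record."""
    return [f"{k.hex()} {v.hex()}" for k, v in sorted(records.items())]


def load(lines):
    """Rebuild a store from lines produced by dump()."""
    out = {}
    for line in lines:
        k, v = line.split(" ")
        out[bytes.fromhex(k)] = bytes.fromhex(v)
    return out


if __name__ == "__main__":
    source = {b"user:1": b"alice", b"user:2": b""}
    restored = load(dump(source))
    print(store_digest(source) == store_digest(restored))
```

Comparing digests rather than whole stores keeps the check cheap to ship between machines, which matters once the data no longer fits on one disk.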

The way data moves through the system is what I call the data stream here; I am not concerned with whether this clashes with other uses of the term.

What each part does:

A: Responsible for writing the data into Berkeley DB, that is, converting the source data into a format Berkeley DB can access so that it is convenient for later use. A's disk is limited, and A consumes a lot of system resources during continuous insertion. If it goes down, the consequences are serious (perhaps because we have not yet found the right way to handle it). In addition, inserting data on A with one process while retrieving from A with another is possible, and the two can be handled separately, but retrieval is not necessarily efficient in that setup. So we made the following changes.

Node[]: Periodically fetches the data from A. The fetch rules depend on the actual situation. This is a bit like the map principle. Our tests led us to add one more step: only the first node can access the data, and opening the environment on any other node raises an error. So a merge collects the data from all nodes, after which the environment opens without problems and the data can be accessed.

Merge: As stated above, merge is responsible for merging the data so that the resulting Berkeley DB environment can be opened. The goal is to be able to search on merge.

Problems:
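The collecting step can be pictured with a small sketch. It is plain Python under stated assumptions, not Berkeley DB code: each node's data is modeled as a dictionary, the `Merge` class and its method names are hypothetical, and the rule that a later batch wins on a duplicate key is an assumption the article does not state.

```python
from typing import Dict, Iterable, Optional

Batch = Dict[bytes, bytes]


class Merge:
    """Single searchable store built from the batches held by the nodes."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def collect(self, node_batches: Iterable[Batch]) -> int:
        """Copy every record from every node; return how many writes were made."""
        written = 0
        for batch in node_batches:
            for key, value in batch.items():
                # Assumption: on a duplicate key, the later batch overwrites the earlier one.
                self._data[key] = value
                written += 1
        return written

    def search(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None if the key is unknown."""
        return self._data.get(key)


if __name__ == "__main__":
    merge = Merge()
    merge.collect([{b"a": b"1"}, {b"b": b"2"}])
    print(merge.search(b"b"))
```

The point of the design is that only merge ever opens the combined store, so searches do not depend on whether an individual node's environment can be opened.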

1. If A goes down, the jdb files on the nodes end up out of sequence. If A hits a program exception and jdb files in its environment are lost, the environment fails to open when the program is restarted. Even if the contents of the environment are emptied, the regenerated files start again from scratch and can no longer be merged with the earlier data. One idea was to rename the newly generated files so that they continue from the highest earlier file name, and then merge them into merge. Testing showed this is not feasible: the files of two different environments cannot be put in order by file name, combined into one environment, and still be accessed. After all, an in-memory database is not the same thing as files on a hard disk. Another option is to collect the files produced after the outage into a second merge, merge2. A search then goes through each merge in turn to access the database. (For now, all communication and connections between machines are assumed to use SSH.) This approach works in theory; in practice it still needs investigation.
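The "go through each merge in turn" lookup can be sketched as follows. This is plain Python, not Berkeley DB code: each merge is modeled as a mapping, `search_merges` is a hypothetical helper, and the search order is simply the order in which the merges are passed in, which is an assumption.

```python
from typing import Mapping, Optional, Sequence, Tuple

NamedStore = Tuple[str, Mapping[bytes, bytes]]


def search_merges(merges: Sequence[NamedStore], key: bytes) -> Optional[Tuple[str, bytes]]:
    """Try each merge in order; return (merge name, value) for the first hit, else None."""
    for name, store in merges:
        if key in store:
            return name, store[key]
    return None


if __name__ == "__main__":
    merge = {b"before-outage": b"old data"}
    merge2 = {b"after-outage": b"new data"}
    print(search_merges([("merge", merge), ("merge2", merge2)], b"after-outage"))
```

Each outage adds one more store to the list, so lookup cost grows with the number of outages. That is one of the things that would need measuring before relying on this scheme.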

2. Personally, I think the biggest problem is that this design uses too many resources. The nodes could be removed. Their role is to guard against losing everything if the data on merge is lost. In practice, though, the data on merge is hard to lose: if merge is used only for retrieval, there is essentially no chance of loss unless someone manually deletes the files from the hard drive. So we are now considering changing the structure. The revised structure is explained below.

The function of A needs no further explanation; it is the same as before. The nodes store the data. However, the condition for switching to the next node is that the previous node's disk is exhausted or A has gone down. We guarantee that the data stored on each node forms one environment. When A goes down, we switch to the next node. On the lab machines, the data volume at which the program terminates abnormally is at least 200 GB. A machine with good performance should let Berkeley DB manage terabytes of files without strain. A's disk should be as large as possible, so that when the data volume is small the data on A does not have to be deleted.
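The switching rule can be sketched as follows. This is plain Python, not Berkeley DB code, and every name in it (`Node`, `NodeChain`, `on_a_restart`) is hypothetical. It assumes that disk usage is just the sum of the key and value sizes of every write, and that running out of nodes is an error.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    capacity_bytes: int
    used_bytes: int = 0
    records: Dict[bytes, bytes] = field(default_factory=dict)

    def has_room(self, size: int) -> bool:
        return self.used_bytes + size <= self.capacity_bytes


class NodeChain:
    """Writes go to one node at a time; each node holds one self-contained environment."""

    def __init__(self, nodes: List[Node]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = nodes
        self.current = 0

    def _advance(self) -> None:
        if self.current + 1 >= len(self.nodes):
            raise RuntimeError("no spare node left")
        self.current += 1

    def on_a_restart(self) -> None:
        """A went down: the restarted A begins a fresh environment on the next node."""
        self._advance()

    def store(self, key: bytes, value: bytes) -> str:
        """Store one record and return the name of the node that received it."""
        size = len(key) + len(value)
        # Switch while the current node's disk cannot take the record.
        while not self.nodes[self.current].has_room(size):
            self._advance()
        node = self.nodes[self.current]
        node.records[key] = value
        node.used_bytes += size  # every write counts, as in an append-only log
        return node.name


if __name__ == "__main__":
    chain = NodeChain([Node("node1", 1024), Node("node2", 1024)])
    print(chain.store(b"k", b"v"))
```

Because a node is never written to again once the chain has moved past it, the files on each node always come from a single environment, which is the guarantee described above.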

The requirements for the search entry point are modest: basically any machine that can run the program will do. Of course, a node can also be used directly as the search entry point.

Personally, I think this plan is better than the first one. There are probably hidden problems I have not found yet, owing to my lack of experience.

Reprinted from: http://www.cnblogs.com/ickelin/p/3975676.html


This article is an English version of an article originally written in Chinese on aliyun.com.
