Two days ago, I published an article about the XML database I am developing. It explains what an XML database is and how to use it:
Quick XML database failover
http://www.cnblogs.com/chenxizhang/archive/2009/08/09/1542354.html
After the article was published, I received feedback from many friends who actively joined the discussion; I will not list it all here. Several friends mentioned another open-source object-oriented database: db4o.
Its home page is at: http://www.db4o.com/
Today I took some time to learn about this database and downloaded its complete source code. I have not yet been able to examine it closely, but its architecture looks quite good, and one important aspect genuinely impressed me this afternoon. I will hold off on detailed comments for now, but I will not deny that my first impression is a bright spot.
db4o is an object-oriented database. The concept itself is not new, but I do not think novelty is what matters; that db4o has been built, and built well, is what deserves attention. As I see it, practice and time are the only criteria for testing truth.
I know many friends are concerned about the performance of storing data in XML: will it be slow? I have repeatedly stressed the following points in my comments on the previous article:
1. In my experience, reading local disk files is usually faster than querying a real relational database.
2. The XML format itself is not slow; if it is slow, the design needs rethinking. Moreover, XML provides a good programming interface, which actually gives it an advantage over other file formats. For example, reading a flat file (such as an INI file) or a CSV file may well be slower, because all the parsing must be written by hand.
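To make the programming-interface point concrete, here is a minimal sketch of my own (not from the original article; the sample data is invented) contrasting hand-rolled CSV parsing with a LINQ to XML query over the same records:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public class FlatFileVsXml
{
    // CSV: every consumer re-implements splitting and type conversion by hand.
    public static int CountHighScorersCsv(string csv)
    {
        return csv.Split('\n')
                  .Skip(1) // skip the header row
                  .Select(line => line.Split(','))
                  .Count(fields => int.Parse(fields[1]) >= 90);
    }

    // XML: the structure travels with the data, so the query reads naturally.
    public static int CountHighScorersXml(string xml)
    {
        return XElement.Parse(xml)
                       .Elements("Pilot")
                       .Count(p => (int)p.Attribute("Points") >= 90);
    }

    public static void Main()
    {
        string csv = "Name,Points\nMichael Schumacher,100\nRubens Barrichello,80";
        string xml = "<Pilots>"
                   + "<Pilot Name='Michael Schumacher' Points='100' />"
                   + "<Pilot Name='Rubens Barrichello' Points='80' />"
                   + "</Pilots>";
        Console.WriteLine(CountHighScorersCsv(csv)); // 1
        Console.WriteLine(CountHighScorersXml(xml)); // 1
    }
}
```

Both queries return the same answer, but the CSV version bakes the file layout (column order, delimiter, header row) into every caller, while the XML version queries named elements and attributes.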
I did some exercises based on the official tutorial materials provided by db4o (which, by the way, are quite good). db4o keeps everything in a single database file in a binary format, yet even when writing and reading large volumes of data its performance is good. This shows the importance of the algorithm. Why do I say this? Because it is usually not easy to extract a specific part from a binary file, and random reads and writes within one are a thorny problem.
Imagine a file that has already grown to 1 GB, and I want to rewrite one object (for example, to modify it) somewhere in the middle. That is not easy, because all the content lives in one file. Frankly, that is one reason I want to develop this XML database: I think if we still rely on one or a few standalone binary files, the result differs little from a traditional relational database, and there remains a bottleneck in how to optimize it.
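The point above can be sketched in a few lines (my own illustration, not db4o's actual storage code): overwriting bytes in place is cheap as long as the replacement is exactly the same size, but the moment a record grows, everything after it must be rewritten.

```csharp
using System;
using System.IO;
using System.Text;

public class MidFileUpdate
{
    // Overwrite bytes in the middle of a file in place; only safe when the
    // replacement is exactly as long as the bytes it replaces.
    public static void OverwriteInPlace(string path, long offset, string replacement)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            byte[] bytes = Encoding.ASCII.GetBytes(replacement);
            fs.Write(bytes, 0, bytes.Length);
        }
    }

    // When the record grows there is no in-place option: the whole tail of
    // the file must be rewritten. For a 1 GB file that can mean copying
    // hundreds of megabytes just to change one record.
    public static void GrowRecord(string path, int start, int oldLength, string replacement)
    {
        string text = File.ReadAllText(path);
        File.WriteAllText(path,
            text.Substring(0, start) + replacement + text.Substring(start + oldLength));
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "AAAA|BBBB|CCCC");

        OverwriteInPlace(path, 5, "XXXX");          // cheap: one seek + one write
        Console.WriteLine(File.ReadAllText(path));  // AAAA|XXXX|CCCC

        GrowRecord(path, 5, 4, "XXXXXX");           // expensive: tail rewritten
        Console.WriteLine(File.ReadAllText(path));  // AAAA|XXXXXX|CCCC

        File.Delete(path);
    }
}
```

Real single-file engines work around this with page management and free-space maps, which is exactly the algorithmic work I suspect db4o has done well.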
Of course, I cannot guess what the db4o development team was thinking at the time; perhaps a single file was the best solution then.
To illustrate this, I wrote a simple demo program comparing bulk writes and reads between db4o and my XML database. I insert 100,000 objects into each, then read back nearly 35,000 objects from each.
Part 1: the business entity class (taken from the db4o documentation)
/// <summary>
/// This is a simple type, taken from the official db4o example.
/// </summary>
public class Pilot
{
    string _name;
    int _points;

    public Pilot() { }

    public Pilot(string name, int points)
    {
        _name = name;
        _points = points;
    }

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }

    public int Points
    {
        get { return _points; }
        set { _points = value; }
    }

    public void AddPoints(int points)
    {
        _points += points;
    }

    public override string ToString()
    {
        return string.Format("{0}/{1}", _name, _points);
    }
}
Part 2: Test code
class Program
{
    static void Main(string[] args)
    {
        // This test program compares bulk operations
        // between db4o and the XML database.

        // Create a db4o database and insert 100,000 objects.
        Console.WriteLine("db4o database test result");
        IObjectContainer db = Db4oFactory.OpenFile("test.yps");
        try
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < 100000; i++)
            {
                Pilot pilot1 = new Pilot("Michael Schumacher", i);
                db.Store(pilot1);
            }
            watch.Stop();
            Console.WriteLine("\tInserted 100,000 objects, time consumed: {0} seconds",
                watch.ElapsedMilliseconds / 1000);

            watch.Reset(); // reset so the read is timed separately from the insert
            watch.Start();
            var query = from p in db.Query<Pilot>()
                        where p.Points % 3 == 0
                        select p;
            Console.WriteLine("\tRead {0} objects", query.ToArray().Count());
            watch.Stop();
            Console.WriteLine("\tRead time: {0} seconds",
                watch.ElapsedMilliseconds / 1000);
        }
        finally
        {
            db.Close();
        }

        Console.WriteLine("XMLDatabase test result");
        // Create an XML database and insert 100,000 objects.
        using (XDatabase xdb = XDatabase.CreateInstance("Test", @"E:\Temp"))
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            XTable<Pilot> table = xdb.Create<Pilot>("Pilots", new[] { "Name" });
            for (int i = 0; i < 100000; i++)
            {
                Pilot pilot1 = new Pilot("Ares Chen", i);
                table.Insert(pilot1);
            }
            xdb.SubmitChanges();
            watch.Stop();
            Console.WriteLine("\tInserted 100,000 objects, time consumed: {0} seconds",
                watch.ElapsedMilliseconds / 1000);

            watch.Reset();
            watch.Start();
            var query = from p in table.Select()
                        where p.Points % 3 == 0
                        select p;
            Console.WriteLine("\tRead {0} objects", query.ToArray().Count());
            watch.Stop();
            Console.WriteLine("\tRead time: {0} seconds",
                watch.ElapsedMilliseconds / 1000);
        }
        Console.Read();
    }
}
Part 3: Test Results
I then reversed the execution order and ran the test again. The result is as follows:
Part 4: file comparison
The XML database stores each table as a separate XML file (I am still considering how to store tables in a distributed manner; for example, one table could span N XML files). After inserting the 100,000 objects, the file is about 4.08 MB.
This file is a very intuitive XML file.
db4o, by contrast, places all objects in a single database file, whose name can be anything. That file is about 9.07 MB.
This file is a typical binary file; if you open it, it looks roughly like this:
[Note] The comparison above is purely for technical research. In fact, as I analyzed earlier, achieving this performance with a single binary file is already remarkable; db4o's algorithms must be very good. I still do not know whether it supports multiple files, because its documentation says it supports a maximum single-database size of 250 GB (which sounds daunting, though I doubt anyone actually pushes it that far). If 250 GB of data really all sat in one file, that would be quite something.
Finally, storing the whole database in one file is not without merit. One of its biggest advantages is convenient backup and management, because there is only one file. The XML database I am developing is at a disadvantage here, because my plan is to store different things in different directories: Tables holds all the table data, Schemas holds all the schema files, Blobs holds all the large objects (for example), and a dedicated Rels directory records the relationships between these files. That introduces some risk, because there are many files. This needs more thought.
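For concreteness, the on-disk layout described above could be set up like this. This is my own sketch of the idea, not the actual implementation; the root directory name and the exact casing of the subdirectory names are assumptions.

```csharp
using System;
using System.IO;
using System.Linq;

public class LayoutSketch
{
    // Create the four per-concern directories described in the article
    // under the given database root, and return their names sorted.
    public static string[] CreateLayout(string root)
    {
        string[] dirs = { "Tables", "Schemas", "Blobs", "Rels" };
        foreach (string dir in dirs)
            Directory.CreateDirectory(Path.Combine(root, dir));

        return Directory.GetDirectories(root)
                        .Select(Path.GetFileName)
                        .OrderBy(name => name)
                        .ToArray();
    }

    public static void Main()
    {
        string root = Path.Combine(Path.GetTempPath(), "XDatabaseLayoutDemo");
        foreach (string name in CreateLayout(root))
            Console.WriteLine(name); // Blobs, Rels, Schemas, Tables

        Directory.Delete(root, true); // clean up the demo directory
    }
}
```

The trade-off is exactly the one noted above: many small files are easy to update independently but harder to back up or move as a unit than db4o's single file.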
Development of the XML database continues. If you have any suggestions or ideas, please leave a message. Since I have other work during the day, I usually reply in the evening.
Author: Chen Xizhang, at 20:26:05
Published on: Blog Park (cnblogs.com). Please indicate the source when reposting.