Repository data storage
In Subversion 1.2, there are two ways to store data in a repository. One is a Berkeley DB database; the other is ordinary flat files in a custom format. Because Subversion developers often refer to a repository as "the (versioned) filesystem", they have adopted the habit of calling the latter type FSFS[14]: a versioned filesystem implementation that uses the native operating system filesystem to store its data.
When creating a repository, the administrator must decide whether it will use Berkeley DB or FSFS. Each has advantages and disadvantages, which are described in detail below. Neither is more "official" than the other, and programs that access the repository are insulated from this implementation detail. They have no idea how the repository stores its data; they only see revision and transaction trees through the repository API.
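The choice is made when the repository is created (svnadmin create accepts an --fs-type option), and it can be checked afterwards. The following is a minimal sketch that assumes the repository records its back-end name in a db/fs-type file, as repositories created by Subversion 1.1 and later do. The function name repository_backend is hypothetical and is not part of any Subversion API.

```python
from pathlib import Path


def repository_backend(repo_root):
    """Return the back-end name stored in <repo_root>/db/fs-type.

    Assumption: the marker file holds a single word such as "fsfs" or "bdb".
    Returns None when the marker file is missing.
    """
    marker = Path(repo_root) / "db" / "fs-type"
    if not marker.is_file():
        return None
    return marker.read_text(encoding="ascii").strip()
```

A caller would pass the repository's root directory, for example repository_backend("/path/to/repos"), and treat a None result as "unknown" rather than guessing.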
Table 5.1, "Repository Data Storage Comparison", gives a general comparison of Berkeley DB and FSFS repositories; the sections that follow go into detail.
Table 5.1. Repository Data Storage Comparison

| Feature | Berkeley DB | FSFS |
| --- | --- | --- |
| Sensitivity to interruptions | Very sensitive. Crashes and permission problems can leave the database "wedged", requiring a recovery procedure. | Not sensitive. |
| Usable from a read-only mount | No | Yes |
| Platform-independent storage | No | Yes |
| Usable over network filesystems | No | Yes |
| Repository size | Slightly larger | Slightly smaller |
| Scalability: number of revision trees | Database; no problems | Some older native filesystems do not scale well with thousands of entries in a single directory. |
| Scalability: directories with many files | Slower | Faster |
| Speed: checking out the latest code | Faster | Slower |
| Speed: large commits | Slower, but the work is spread throughout the commit | Faster, but the delay at the end of the commit may cause client operations to time out |
| Group permissions handling | Sensitive to user umask settings; best if accessed by only one user. | Not sensitive to umask settings |
| Code maturity | In use since 2001 | In use since 2004 |
Berkeley DB
During the initial design phase of Subversion, the developers decided to use Berkeley DB for a variety of reasons, including its open-source license, transaction support, reliability, performance, API simplicity, thread safety, and support for cursors.
Berkeley DB provides real transaction support, which is perhaps its most powerful feature. Multiple processes accessing your Subversion repository do not have to worry about accidentally clobbering each other's data. The transaction system provides isolation: for any given operation, the Subversion repository code sees a static view of the database (not a database that is constantly changing at the hands of other processes) and can make decisions based on that view. If a decision happens to conflict with what another process is doing, the entire operation is rolled back as if it had never happened, and Subversion gracefully retries the operation against a new, updated, and still static view of the database.
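The "static view, roll back, retry" behaviour described above can be illustrated with a toy model. This is only a sketch of the general idea under simplifying assumptions: the class ToyDatabase, its version counter, and the function run_transaction are all hypothetical, and Berkeley DB's actual mechanism (which involves locking and logging) is considerably more involved.

```python
import copy


class ToyDatabase:
    """Hypothetical in-memory stand-in for a transactional store."""

    def __init__(self):
        self.version = 0   # bumped on every successful commit
        self.tables = {}

    def snapshot(self):
        # Hand out a private copy: the caller's view never changes under it.
        return self.version, copy.deepcopy(self.tables)

    def try_commit(self, based_on_version, new_tables):
        # Refuse the commit if someone else committed in the meantime.
        if based_on_version != self.version:
            return False
        self.tables = new_tables
        self.version += 1
        return True


def run_transaction(db, operation, max_attempts=5):
    """Apply operation to a static view; on conflict, discard it and retry."""
    for attempt in range(1, max_attempts + 1):
        version, view = db.snapshot()
        operation(view)                    # decisions are based on the view only
        if db.try_commit(version, view):
            return attempt                 # number of attempts that were needed
        # Conflict: the modified view is simply dropped (the "rollback").
    raise RuntimeError("gave up after repeated conflicts")
```

Because a failed attempt only ever touched its private copy, abandoning it leaves the shared data exactly as the other writer left it.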
Another powerful feature of Berkeley DB is hot backup: the ability to back up the database environment without taking it "offline". We discuss how to back up your repository in the "Repository Backup" section; the benefit of being able to make a full backup of the repository without stopping the system should be obvious.
Berkeley DB is also a very reliable database system. Subversion uses Berkeley DB's logging facilities, which means that the database first writes a description of the modifications it is about to make to log files on disk, and only then makes the modifications themselves. This ensures that if anything goes wrong, the database system can back up to a previous checkpoint (a position in the log files known not to be corrupt) and replay transactions until the data is restored to a usable state. For more information about Berkeley DB log files, see the "Managing Disk Space" section.
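The log-first discipline described above can be sketched in a few lines. This is a toy illustration only, not Berkeley DB's implementation: the class ToyJournaledStore and its methods are hypothetical, the "log" is an in-memory list rather than on-disk files, and every logged record is treated as committed.

```python
import json


class ToyJournaledStore:
    """Hypothetical key/value store that logs each change before applying it."""

    def __init__(self):
        self.log = []               # stands in for the on-disk log files
        self.data = {}              # stands in for the database tables
        self.checkpoint = ({}, 0)   # (known-good snapshot, log position)

    def write(self, key, value, crash_before_apply=False):
        # 1. Describe the change in the log first.
        self.log.append(json.dumps({"key": key, "value": value}))
        if crash_before_apply:
            raise RuntimeError("simulated crash after logging")
        # 2. Only then change the data itself.
        self.data[key] = value

    def take_checkpoint(self):
        # Remember a consistent snapshot and how much of the log it covers.
        self.checkpoint = (dict(self.data), len(self.log))

    def recover(self):
        # Go back to the checkpoint, then replay every later log record.
        snapshot, position = self.checkpoint
        self.data = dict(snapshot)
        for record in self.log[position:]:
            change = json.loads(record)
            self.data[change["key"]] = change["value"]
```

If a write is interrupted after its log record exists but before the data changes, calling recover() rebuilds the data from the checkpoint plus the log, so the interrupted change is not lost.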
But every rose has its thorn, and so we must also note some known limitations of Berkeley DB. First, Berkeley DB environments are not portable. You cannot simply copy a Subversion repository created on a Unix system to a Windows system and expect it to work. Although much of the Berkeley DB database format is architecture-independent, other aspects of the environment are not. Second, Subversion uses Berkeley DB in a way that does not work on Windows 95/98 systems; if you need to host a repository on a Windows machine, use Windows 2000 or Windows XP. In addition, a Berkeley DB repository should never be kept on a network share. Berkeley DB promises to behave correctly on network shares that meet a particular set of specifications, but almost no known share types actually meet them.
Finally, because Berkeley DB is a library linked directly into Subversion, it is more sensitive to interruptions than a typical relational database system. Most SQL systems, for example, have a dedicated server process that mediates all access to the database tables. If a program accessing the database crashes for some reason, the database daemon notices the lost connection and cleans up after it. And because the database daemon is the only process that accesses the tables, applications do not need to worry about permission conflicts. None of this is true of Berkeley DB. Subversion (and any program using the Subversion libraries) accesses the database tables directly, which means that a program crash can leave the database in a temporarily inconsistent, inaccessible state. When this happens, an administrator has to ask Berkeley DB to restore to a checkpoint, which is a bit of an annoyance. Besides crashed processes, other things can "wedge" a repository, such as programs conflicting over the ownership or permissions of the database files. So although a Berkeley DB repository is quite fast and scalable, it is best used by a single server process running as a single user, such as Apache's httpd or svnserve (see Chapter 6, Configuring the server), rather than accessed by many different users through file:/// or svn+ssh:// URLs. If you do access a Berkeley DB repository directly as multiple users, read the section "Support multiple repository access methods" first.
FSFS
In mid-2004, a second type of repository storage system came into being: one that does not use a database at all. An FSFS repository stores each revision tree in a single file, so all of the repository's revisions can be found as numbered files in a single subdirectory. Transactions are created in separate subdirectories. When a transaction is complete, a single transaction file is created and moved into the revisions directory, which guarantees that commits are atomic. And because a revision file is permanent and never changes, the repository can also be backed up while "hot", just like a Berkeley DB repository.
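The "build the file elsewhere, then move it into place" pattern can be sketched as follows. This is an illustration of the general technique, not FSFS's actual code: the function commit_revision and the directory names transactions and revs are assumptions made for the example, and a real implementation would also hold a write lock so that two committers cannot race for the same revision number.

```python
import os
import tempfile


def commit_revision(repo_dir, revision_number, payload):
    """Write payload to a scratch file, then move it into revs/ in one step."""
    txn_dir = os.path.join(repo_dir, "transactions")   # hypothetical layout
    revs_dir = os.path.join(repo_dir, "revs")          # hypothetical layout
    os.makedirs(txn_dir, exist_ok=True)
    os.makedirs(revs_dir, exist_ok=True)

    final_path = os.path.join(revs_dir, str(revision_number))
    if os.path.exists(final_path):
        # Revision files are permanent; never overwrite one.
        raise FileExistsError(f"revision {revision_number} already exists")

    # Build the complete revision file outside the revisions directory.
    fd, scratch_path = tempfile.mkstemp(dir=txn_dir)
    with os.fdopen(fd, "wb") as scratch:
        scratch.write(payload)
        scratch.flush()
        os.fsync(scratch.fileno())

    # A rename within one filesystem is atomic on POSIX systems: readers see
    # either no revision file at all or the finished one, never a partial file.
    os.rename(scratch_path, final_path)
    return final_path
```

A crash before the final rename leaves only a stray scratch file in the transaction area; the revisions directory is untouched.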
The revision file format represents a revision's directory structure, file contents, and deltas against files in other revision trees. Unlike a Berkeley DB database, this storage format is portable across operating systems and is not sensitive to CPU architecture. Because no journaling or shared-memory files are used, the repository can be safely accessed over a network filesystem and examined in a read-only environment. The lack of database overhead also means that the overall repository size is slightly smaller.
FSFS also has different performance characteristics. When committing a directory with a large number of files, FSFS uses an O(N) algorithm to append entries, while Berkeley DB uses an O(N^2) algorithm to rewrite the entire directory. On the other hand, FSFS writes the latest version of a file as a delta against an earlier version, which means that checking out the latest revision is somewhat slower than with Berkeley DB. FSFS also has a longer delay when finalizing a commit, which in extreme cases can cause clients to time out while waiting for a response.
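To see where a linear versus quadratic cost can come from, consider a simplified model in which N entries are added to a directory one at a time. The model and both function names below are assumptions made for illustration; they are not measurements of either back end.

```python
def entries_written_appending(n):
    """Each addition writes only the new entry: 1 + 1 + ... + 1."""
    return sum(1 for _ in range(n))


def entries_written_rewriting(n):
    """Each addition rewrites the whole directory: 1 + 2 + ... + n."""
    return sum(size for size in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, entries_written_appending(n), entries_written_rewriting(n))
```

Under this model, adding 1000 entries writes 1000 entries when appending but 500500 entries (that is, N(N+1)/2) when the directory is rewritten each time.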
The most important difference, however, is that FSFS cannot become "wedged" when something goes wrong. If a process using a Berkeley DB database runs into a permissions problem or suddenly crashes, the database is left unusable until an administrator recovers it. If the same thing happens to a process using an FSFS repository, the repository is not affected at all. At worst, some transaction data is left behind.
The only real argument against FSFS is its relative immaturity compared to Berkeley DB. It has not been used or stress-tested nearly as much, so many of these claims about speed and scalability are just that: claims, based on good guesses. In theory, it promises a lower barrier to entry for new administrators and is less prone to problems. In practice, only time will tell.
Reposted from: http://www.blogjava.net/jasmine214--love/archive/2011/01/18/343160.html