Mnesia: A Robust Distributed DBMS for Telecommunications Applications


Original article: http://www.erlang.se/publications/mnesia_overview.pdf

Abstract: The Mnesia DBMS runs in the same address space as the application that owns the data, yet the application cannot destroy the contents of the database. Mnesia provides both fast access and efficient fault tolerance, two requirements that normally conflict. The implementation is based on features of the Erlang programming language, in which Mnesia is embedded.
1. Introduction

Data management in telecommunications systems has many aspects, some (but not all) of which are addressed by traditional commercial DBMSs (Database Management Systems). In particular, the very high level of fault tolerance required by many nonstop systems, combined with the requirement that the DBMS run in the same address space as the application, has led us to implement a brand-new DBMS. This article describes the motivation for, and the design of, this new DBMS, called Mnesia. Mnesia is implemented in, and tightly coupled to, the programming language Erlang, which provides the functionality needed to build fault-tolerant telecommunications systems. Mnesia is a multi-user distributed DBMS made specifically for industrial telecommunications applications written in Erlang [1], which is also its intended target language. Mnesia tries to address all the data management issues that arise in typical telecommunications systems, and it has a number of features not normally found in traditional databases. Telecommunications applications have needs that differ from what traditional DBMSs provide: the applications we now see implemented in Erlang require a broad mix of features that traditional DBMSs generally cannot satisfy. Mnesia is designed with requirements such as the following in mind:
  1. Fast real-time key/value lookup;
  2. Complex non-real-time queries, mainly for operation and maintenance;
  3. Distributed data, due to distributed applications;
  4. High fault tolerance;
  5. Dynamic reconfiguration;
  6. Complex objects.
What sets Mnesia apart from most other DBMSs is that it is designed with the typical data management problems of telecommunications applications in mind. Mnesia therefore combines concepts from traditional databases, such as transactions and queries, with concepts from data management systems for telecommunications applications, such as very fast real-time operations, a configurable degree of fault tolerance (by means of replication), and the ability to reconfigure the system without stopping or suspending it. Mnesia is also interesting because of its tight coupling to Erlang, which almost turns Erlang into a database programming language. This brings many benefits, the most important being that the impedance mismatch between the data format used by the DBMS and the data format used by the programming language disappears completely. Mnesia is currently used in almost all Erlang-based projects at Ericsson, ranging from small prototype systems to large switching product projects. The rest of this article is organized as follows: Section 2 gives a brief overview of the DBMS; Section 3 lists typical DBMS features, discusses them from a telecommunications perspective, and describes how Mnesia provides them; Section 4 covers implementation aspects; Section 5 presents some performance measurements; and Section 6 gives our conclusions.

2. A brief overview of Mnesia

Mnesia is both an extension of the programming language Erlang and an Erlang application. The DBMS components, such as the lock manager, transaction manager, replication manager, logger, primary and secondary storage, and backup system, are all implemented as regular Erlang programs. The query language, however, is implemented as part of the Erlang syntax itself. The data model of Mnesia is a hybrid: data is organized as tables of records, similar to relations in a relational database, but the record attributes (including the key) can hold arbitrarily complex compound data structures such as trees, functions, closures, and code. In this sense Mnesia can be characterized as an object-relational DBMS. For example, assume the following definition of a person record:

    -record(person, {name,        % atomic, unique key
                     data,        % compound, unspecified structure
                     married_to,  % name of partner, or undefined
                     children}).  % list of children

Given this definition, we can create a person record with the following Erlang syntax:

    X = #person{name = klacke,
                data = {male, 36, 971191},
                married_to = eva,
                children = [marten, maja, klara]}.

This binds the variable X to a new person record, and binds its data field to the three-element tuple {male, 36, 971191}. This is an example of a complex object: Mnesia places no restrictions on the complexity of attribute values, and we can even use function objects as attribute values. The variable X is merely an Erlang term; to insert it into the database, it must be written, from within a transaction:

    mnesia:write(X)

A series of Mnesia operations can be executed together as one atomic transaction. To have Mnesia execute a transaction, the programmer first constructs a function object and then hands it to the Mnesia system. Let us explain this with an example.
Suppose we want to write an Erlang function divorce(Name) that takes the name of a person, finds that person in the database, and sets the married_to field of both the person and the partner back to the value undefined:

    divorce(Name) ->
        F = fun() ->
                case mnesia:read(person, Name) of
                    [] ->
                        mnesia:abort(no_such_person);
                    [Pers] ->
                        %% Assumes Pers is married: an undefined partner makes
                        %% the match below fail, which aborts the transaction.
                        [Partner] = mnesia:read(person, Pers#person.married_to),
                        mnesia:write(Pers#person{married_to = undefined}),
                        mnesia:write(Partner#person{married_to = undefined})
                end
            end,
        mnesia:transaction(F).

The divorce/1 function consists of two statements. The first, F = ..., creates a function object; it does not execute anything, it merely constructs an anonymous function. The second hands this function to the Mnesia system, which executes it in the context of a transaction with traditional transaction semantics. The function F first performs a read to find the person named Name, then a second read to find that person's partner, and finally two writes that store two new records, with married_to set to undefined, in the database. The new records overwrite the old values. The divorce/1 function returns the value of the transaction, which is either {aborted, Reason} or {atomic, Value}, depending on whether the transaction was aborted or completed successfully.

Queries in Mnesia are expressed with list comprehension syntax [15]. A query that finds the names of all persons with more than X children is written:

    query [P.name || P <- table(person), length(P.children) > X] end

This reads: form the list of P.name such that P is taken from the person table and the length of P's children list is greater than X. It is both possible and natural to mix user-defined predicates into a query. Given the predicate

    maturep({_Sex, Age, _Phone}) when Age > 30 -> true;
    maturep({_Sex, _Age, _Phone}) -> false.

the query

    query [P.name || P <- table(person), maturep(P.data), length(P.children) > X] end

extracts the names of all persons who have more than X children and whose data field has a second element (the age) greater than 30. It is also possible to define rules in an embedded logic language similar to Datalog [16]:

    oldies(Name) :-
        P <- table(person),
        maturep(P.data),
        Name = P.name.

This rule acts as a virtual table, oldies, which applications can access like any other table. The virtual table oldies contains a subset of the real person table. This is similar to the view concept in relational databases, but more powerful. Queries are compiled by an optimizing query compiler that has been integrated with the regular Erlang compiler.

Tables can be replicated to several sites (or nodes), and the network of nodes can be heterogeneous. Replication is the mechanism used to build fault-tolerant systems. Access to tables is location transparent, that is, programs do not need explicit knowledge of where the data is located. A table has a unique name and a number of properties, such as:
  • type: whether the table is a set or a bag. In a set, keys are unique; a bag allows several objects with the same key;
  • ram_copies: the Mnesia nodes holding ram_copies replicas of the table keep it in RAM only;
  • disc_copies: the Mnesia nodes holding disc_copies replicas keep the table in RAM, but all updates to the table are also logged to disk;
  • disc_only_copies: the Mnesia nodes holding disc_only_copies replicas keep the table on disk only. These replicas are naturally slower to access than replicas in RAM;
  • index: the record attributes to be indexed. All records are automatically indexed on the primary key;
  • snmp: whether the table should be manipulable through the SNMP protocol [4].
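In current Erlang/OTP releases, the table properties listed above correspond to options of mnesia:create_table/2, while the `query ... end` syntax of Section 2 is no longer shipped; QLC (qlc:q/1 over mnesia:table/1) is the interface used for such queries today. The following is a minimal sketch under stated assumptions, not code from this article: it assumes a single, non-distributed node running Mnesia with a RAM-only schema, and the module name person_demo, its helper functions, and the second person record (eva, with made-up data) are hypothetical.

```erlang
%% person_demo.erl: hypothetical module; assumes one local node and a
%% RAM-only schema (no mnesia:create_schema/1 call, so nothing is on disk).
-module(person_demo).
-export([setup/0, more_children_than/1, mature_with_children/1]).

-include_lib("stdlib/include/qlc.hrl").

-record(person, {name,        % atom, unique primary key
                 data,        % compound term, here {Sex, Age, Phone}
                 married_to,  % partner's name, or undefined
                 children}).  % list of children's names

%% Create a set table with a RAM replica on this node and a secondary
%% index on married_to, then load two sample records.
setup() ->
    ok = mnesia:start(),
    {atomic, ok} =
        mnesia:create_table(person,
                            [{type, set},
                             {ram_copies, [node()]},
                             {index, [married_to]},
                             {attributes, record_info(fields, person)}]),
    {atomic, ok} =
        mnesia:transaction(
          fun() ->
                  ok = mnesia:write(#person{name = klacke,
                                            data = {male, 36, 971191},
                                            married_to = eva,
                                            children = [marten, maja, klara]}),
                  mnesia:write(#person{name = eva,
                                       data = {female, 29, 971192}, % made up
                                       married_to = klacke,
                                       children = [marten, maja, klara]})
          end),
    ok.

%% Same predicate as maturep/1 in Section 2.
maturep({_Sex, Age, _Phone}) when Age > 30 -> true;
maturep({_Sex, _Age, _Phone}) -> false.

%% QLC counterpart of:
%%   query [P.name || P <- table(person), length(P.children) > X] end
more_children_than(X) ->
    run(qlc:q([P#person.name || P <- mnesia:table(person),
                                length(P#person.children) > X])).

%% QLC counterpart of the query that also applies maturep/1.
mature_with_children(X) ->
    run(qlc:q([P#person.name || P <- mnesia:table(person),
                                maturep(P#person.data),
                                length(P#person.children) > X])).

%% Evaluate a query handle inside a transaction; sort for stable output.
run(Q) ->
    {atomic, Names} = mnesia:transaction(fun() -> qlc:e(Q) end),
    lists:sort(Names).
```

After person_demo:setup(), calling person_demo:more_children_than(2) returns both names, while person_demo:mature_with_children(2) returns only klacke, since only his age exceeds 30.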
The descriptions of all tables are kept in the database schema, and Mnesia provides many functions for manipulating the schema dynamically: tables can be created, moved, replicated, changed, destroyed, and so on. Furthermore, all these system activities run in the background, so the application can keep using the system as usual while the system itself is being changed. Backups of the entire distributed system can be taken, and these backups can be installed as fallbacks. This means that if the system crashes, the database is automatically reconstructed from the fallback.
3. DBMS features

Different DBMSs have different characteristics. This section lists a number of DBMS features and discusses those that are important and necessary in our telecommunications systems.

3.1 Complex values

The ability to manipulate complex values (such as lists, sets, and trees) in the DBMS is possibly the most important feature of a telecommunications DBMS. Telecommunications applications that handle traffic are usually driven by external stimuli arriving at the system. When such a stimulus arrives in the form of a PDU (Protocol Data Unit), the PDU is decoded and a series of operations is performed. After decoding, the system usually extracts some data object, for example a subscriber record, which determines the operations to perform in response to the PDU. In many telecommunications systems, one of the most important requirements on data management is that this lookup be very efficient: the DBMS must allow data to be organized and stored so that a single simple lookup retrieves it. This requirement also makes data modeling for telecommunications systems harder, since it is generally not feasible to organize telecommunications data in third normal form, or even in first normal form. This is one of the reasons why the telecommunications industry has shown interest in object-oriented database systems, which allow data to be organized more flexibly than relational databases do. Mnesia allows arbitrarily complex objects to be used as attribute values, and even as keys, in the database.

3.2 Data format and address space

Many DBMSs store data in an internal, language-independent format. Because of the fast-lookup requirement described above, this is unfortunate for telecommunications systems. Many OODBMSs (object-oriented DBMSs) are tightly coupled to a programming language such as C++ or Smalltalk.
This ability to manipulate regular programming language objects in the database makes the impedance mismatch disappear. Not only does this make DBMS programming easier, it also enables efficient lookups, since, depending on the programming language, a lookup can immediately return a pointer to an object. For example, if we implement a routing table as a database table, it is unrealistic to convert the routing data from an external DBMS format to the format we need. It is also unrealistic, for each packet, to perform a context switch and search for the relevant data in a process executing in another address space. This rules out every DBMS that cannot be linked directly into the application's address space, as well as every DBMS that stores data in a language-independent format even when linked into that address space.

The biggest drawback of running the application and the DBMS in the same address space is that if the application crashes because of a programming error, the DBMS may not have time to save important data to secondary storage before terminating. The entire DBMS must then be recovered before it can be restarted, which is usually very time-consuming, while downtime in a telecommunications system must be kept as short as possible. Mnesia avoids this problem because both the application and the DBMS are implemented in Erlang, and an Erlang application cannot crash in a way that affects the DBMS. The application and the DBMS run in the same address space, but Erlang guarantees that a crash in one application cannot affect another. Erlang processes have the advantages of running in the same address space, yet they cannot explicitly read or write each other's memory.
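The isolation property described above can be illustrated with a toy example. This is a hypothetical sketch, not how Mnesia stores its tables: one process keeps some data in its own state, an unrelated process crashes with a programming error, and the data-holding process carries on unaffected. The module name isolation_demo and its message formats are invented for illustration.

```erlang
%% isolation_demo.erl: hypothetical module illustrating Erlang process
%% isolation; this is not Mnesia's storage design.
-module(isolation_demo).
-export([run/0]).

%% A tiny key/value server whose data lives only in its own process state.
db_loop(Data) ->
    receive
        {put, Key, Value} ->
            db_loop(Data#{Key => Value});
        {get, Key, From} ->
            From ! {value, maps:get(Key, Data, undefined)},
            db_loop(Data);
        stop ->
            ok
    end.

%% Store a value, let an unrelated "application" process crash, then
%% check that the server is still alive and still holds the value.
run() ->
    Db = spawn(fun() -> db_loop(#{}) end),
    Db ! {put, subscriber, 4711},
    {App, Ref} = spawn_monitor(fun() -> erlang:error(bug) end),
    receive
        {'DOWN', Ref, process, App, {bug, _Stack}} -> ok
    end,
    Db ! {get, subscriber, self()},
    Result = receive
                 {value, V} -> {is_process_alive(Db), V}
             end,
    Db ! stop,
    Result.
```

The caller monitors rather than links to the crashing process, so it only receives a 'DOWN' message and is not taken down itself.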
3.3 Fault tolerance

Many telecommunications applications are nonstop systems that must remain available even in the presence of hardware or software faults. This requirement affects not only the DBMS but the design of the entire application, and the DBMS must give the application designer good mechanisms to build on. Mnesia provides such a mechanism by replicating tables to multiple nodes. All replicas of a Mnesia table are equivalent: at the DBMS level there is no concept of a primary or a backup copy. If a table is replicated, all writes in a transaction are applied to all replicas. If some replicas are unreachable, the writes still succeed, and the missing replicas are brought up to date when they recover. This mechanism makes it possible to design nonstop systems from cooperating systems distributed across different geographic locations. Other highly fault-tolerant systems, such as ClustRa [11], also achieve fault tolerance through replication, but they cannot execute in the same address space as the application.

Mnesia also supports partial disaster recovery: all objects written to disk are encoded in such a way that data can safely be distinguished from garbage. This makes it possible to scan a damaged disk or file system after a crash and recover the data from it.

3.4 Distribution and location transparency

Mnesia is a truly distributed DBMS, whose data can be replicated and stored remotely. In such an environment it is very important that programmers can access data without knowing where it is located; that is, location transparency is very important.
On the other hand, since remote data access is expensive, application programmers sometimes need explicit location information so that they can run code where the data resides. We therefore need to provide both location transparency and the ability to find out explicitly where data is located, since different applications have different requirements. A Mnesia application can access a table by its name alone, without regard to where the table is located; the system keeps track of where the data is replicated. However, Mnesia programmers can also ask the system for the location of a table and then execute code remotely, either by sending the code to the remote site or by using code already loaded there.

3.5 Transactions and ACID

DBMSs have the ACID properties: atomicity, consistency, isolation, and durability. In Mnesia these are provided through transactions, write-ahead logging, and recovery. Many Mnesia transactions consist of a series of operations on tables that reside only in memory (possibly replicated). These transactions never touch disk storage, so durability is not provided for them. An example of the transaction semantics needed in a telecommunications system is adding a new subscriber: some resources are allocated and some data objects are written to memory, and it is crucial that all of this happens as one atomic action. Otherwise the system may become inconsistent, for example with resources that are never released.

3.6 Ability to bypass the transaction manager

Transactions are costly for telecommunications traffic-processing applications. Accessing all data through the transaction system is not feasible, so the ability to bypass it is very useful.
A DBMS suitable for telecommunications systems must support both atomic transactions composed of a series of database operations and very lightweight locking on the same data. The traffic-processing systems described above use many tables, many of which are rarely written but often read. For example, processing a call is far more common than adding a subscriber, and routing a PDU is far more common than modifying the routing table. When executing performance-critical code that only reads data, we do not want to be forced to use transactions. Conversely, it is acceptable to lose some packets because routing information was read while the routing table was being modified. What is needed here is very lightweight lock protection, so that application processes can read a table and be sure that each data object they read is intact and not garbled by a concurrent write. Mnesia supports this through the so-called dirty interface: it is possible to read, write, and search Mnesia tables without the protection of a transaction. These dirty operations are true real-time operations: they complete in a predictable time regardless of the size of the database.

3.7 Queries

In addition to traffic processing, a telecommunications system contains a large amount of operation and maintenance (O&M) code. For example, deleting a subscriber from a switching system requires searching several tables for data related to that subscriber, which calls for a query language. O&M code has the following characteristics:
  1. It has no, or very low, real-time requirements;
  2. It reads, searches, and manipulates large amounts of traffic data;
  3. It makes up a large part of the system's code volume;
  4. It is rarely executed; how often depends on the quality of the software and the number of bugs.
A powerful query language that executes on the target system and has full access to all traffic tables can remedy this, by making O&M code smaller and declarative, and by adapting automatically to changes in table layout or network topology. Because an optimizing compiler determines the execution order of a query, O&M code can also become more efficient. The Mnesia query language is based on list comprehensions, an idea that has also been applied in several other functional DBMSs [15]. The list comprehension syntax fits naturally into Erlang.

3.8 Schema alteration

Erlang has extensive support for changing code in a running system without stopping processes. This makes it possible to change the layout and organization of Erlang data without stopping the service; in particular, the Mnesia database schema can be changed at runtime without stopping the system. Since Mnesia is intended for nonstop systems, all system activities, such as backups, schema changes, dumping tables to secondary storage, and copying replicas, run in the background while applications continue to access and modify tables as usual.
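Current Mnesia exposes this kind of runtime schema change through mnesia:transform_table/3, which converts every stored record of a table with a user-supplied fun and installs the new attribute list. The following is a minimal sketch, assuming a single node with a RAM-only schema; the subscriber table, its attributes, and the module name schema_demo are hypothetical.

```erlang
%% schema_demo.erl: hypothetical module; assumes a single node with a
%% RAM-only schema.
-module(schema_demo).
-export([run/0]).

run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(subscriber, [{attributes, [id, name]}]),
    ok = mnesia:dirty_write({subscriber, 1, "klacke"}),
    %% Convert every stored record to the new layout, which has an extra
    %% status attribute; existing subscribers default to 'active'.
    AddStatus = fun({subscriber, Id, Name}) -> {subscriber, Id, Name, active} end,
    {atomic, ok} = mnesia:transform_table(subscriber, AddStatus, [id, name, status]),
    %% Return the new attribute list and the converted record.
    {mnesia:table_info(subscriber, attributes),
     mnesia:dirty_read(subscriber, 1)}.
```

Code that still writes the old three-element tuples would have to be upgraded alongside the table, which is where Erlang's hot code loading comes in.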
4. Implementation aspects

Mnesia is implemented entirely in Erlang. The Erlang programming environment is an ideal tool for implementing a distributed DBMS, and the complete implementation covers every part of the system, from low-level storage management to the optimizing query compiler. The code is small: about 20,000 lines of Erlang. Persistent storage is implemented on top of the operating system's file system. The drawback is that performance depends on the file system's disk operations; the advantage is good portability. Since Mnesia is primarily a main-memory (primary memory) DBMS, we consider portability more important.

Tables and indexes in primary memory are implemented as linear hash lists [13]. Secondary storage tables are implemented as named files, each organized as a linear hash list in which the medium chain length of the hash buckets is set to a small value. Lookups in a linear hash list are very efficient, insertions are also efficient, and files and tables can grow and shrink dynamically. Space management within each file uses the buddy algorithm.

Lock management in Mnesia uses many traditional techniques. Locking is dynamic: a transaction acquires locks as it needs them. Regular two-phase locking is used, and deadlocks are prevented with the traditional wait-die algorithm [14]. The timestamps for wait-die are taken from a Lamport clock [12] maintained by the transaction manager on each node. When a transaction is restarted, it keeps its original timestamp, which also makes Mnesia livelock free. The lock manager also implements multi-granularity locking [10]. At commit time, the transaction manager uses traditional two-phase commit [9].
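The wait-die rule and the Lamport-clock timestamps described above can be sketched as a few pure functions. This is a minimal illustration under stated assumptions, not Mnesia's lock manager: transaction ids are modelled as hypothetical {Timestamp, Node} tuples compared with Erlang's ordinary term order, and the module and function names are invented.

```erlang
%% wait_die.erl: hypothetical sketch of wait-die deadlock prevention with
%% Lamport timestamps; not Mnesia's internal representation.
-module(wait_die).
-export([tick/1, receive_ts/2, new_tid/2, decide/2]).

%% Lamport clock: advance the local counter for a local event.
tick(Clock) -> Clock + 1.

%% Lamport clock: on receiving a message stamped Ts, move past it.
receive_ts(Clock, Ts) -> erlang:max(Clock, Ts) + 1.

%% Start a transaction: tick the clock and use {Timestamp, Node} as its id.
%% Node breaks ties, so ids are totally ordered. Returns {NewClock, Tid}.
new_tid(Clock, Node) ->
    Ts = tick(Clock),
    {Ts, {Ts, Node}}.

%% Wait-die: a requester older than the lock holder (smaller id) waits;
%% a younger requester dies, i.e. is aborted and restarted. A restarted
%% transaction keeps its original id, so it only grows older relative to
%% new transactions and eventually cannot be told to die: no livelock.
decide(Requester, Holder) when Requester < Holder -> wait;
decide(_Requester, _Holder) -> die.
```

For example, if {1, a} holds a lock that {2, b} requests, {2, b} dies; restarted with the same id, it will wait for a newer holder such as {5, c} instead of dying again. Because waits only ever go from older to younger transactions, no cycle of waiting transactions, and hence no deadlock, can form.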
Simple queries are evaluated with relational database operators, while recursive queries are evaluated with SLG resolution [3]. Because Mnesia runs on distributed Erlang, its implementation is much simpler. In a distributed application, many Erlang nodes run on different machines, and Erlang handles the communication between processes on different nodes. Distributed Erlang works transparently across machines with different endianness, so a Mnesia system can be built from many heterogeneous computers. Processes and nodes can easily be started, monitored, and stopped by processes on other nodes. This removes many of the difficulties of implementing communication, both for Mnesia and for applications.
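The linear hashing scheme [13] that this section names for both in-memory and on-disk tables can be sketched as a small, purely functional structure. This is a hypothetical illustration, not Mnesia's implementation: it keeps buckets in an Erlang map, uses erlang:phash2/1 as the hash function, starts with an arbitrary 2 buckets, and splits one bucket, the one at the split pointer, whenever the average chain length exceeds an arbitrary limit of 2.

```erlang
%% linear_hash.erl: hypothetical sketch of linear hashing; the bucket
%% count, load limit and representation are illustrative choices.
-module(linear_hash).
-export([new/0, insert/3, lookup/2, bucket_count/1]).

-define(N0, 2).        % initial number of buckets
-define(MAX_LOAD, 2).  % split when size > MAX_LOAD * number of buckets

-record(lh, {level = 0, split = 0, size = 0, buckets = #{}}).

new() ->
    #lh{buckets = maps:from_list([{I, []} || I <- lists:seq(0, ?N0 - 1)])}.

bucket_count(#lh{buckets = B}) -> map_size(B).

%% Address with N0 * 2^Level buckets; buckets below the split pointer have
%% already been split, so they are addressed with twice as many buckets.
addr(Key, #lh{level = L, split = S}) ->
    H = erlang:phash2(Key),
    case H rem (?N0 bsl L) of
        I when I < S -> H rem (?N0 bsl (L + 1));
        I -> I
    end.

lookup(Key, T = #lh{buckets = B}) ->
    case lists:keyfind(Key, 1, maps:get(addr(Key, T), B)) of
        {Key, Value} -> {ok, Value};
        false -> error
    end.

insert(Key, Value, T = #lh{buckets = B, size = N}) ->
    I = addr(Key, T),
    Chain = maps:get(I, B),
    T1 = case lists:keymember(Key, 1, Chain) of
             true ->
                 T#lh{buckets = B#{I := lists:keystore(Key, 1, Chain, {Key, Value})}};
             false ->
                 T#lh{buckets = B#{I := [{Key, Value} | Chain]}, size = N + 1}
         end,
    maybe_split(T1).

maybe_split(T = #lh{size = N, buckets = B}) when N > ?MAX_LOAD * map_size(B) ->
    split(T);
maybe_split(T) ->
    T.

%% Split the bucket at the split pointer S into S and S + N0 * 2^Level,
%% then advance the pointer; after a full round, increment the level.
split(T = #lh{level = L, split = S, buckets = B}) ->
    Mod = ?N0 bsl (L + 1),
    {Stay, Move} = lists:partition(fun({K, _}) -> erlang:phash2(K) rem Mod =:= S end,
                                   maps:get(S, B)),
    B1 = B#{S := Stay, S + (?N0 bsl L) => Move},
    case S + 1 =:= ?N0 bsl L of
        true -> T#lh{level = L + 1, split = 0, buckets = B1};
        false -> T#lh{split = S + 1, buckets = B1}
    end.
```

The table grows one bucket at a time rather than rehashing everything at once, which is what lets files and tables grow smoothly while lookups stay a single bucket scan.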
5. Performance measurements

In this section we present some measurements of Mnesia. The figures clearly show that:
  • Using the transaction system is considerably more expensive than using the dirty interface. The correct interpretation is that the dirty interface is fast and the transaction system is comparatively slow;
  • Replication is quite expensive. The computers in the test were connected by an ordinary 10 Mbit/s LAN.
The test machines were three Sun UltraSPARCs running Solaris 2.5. All transactions were initiated on a 167 MHz UltraSPARC; the other two machines ran at 143 MHz.
Table 1. Wall-clock time, in microseconds (µs), to execute divorce/1 in different configurations.

                               Number of replicas
                               1        2        3
    divorce/1                  1877     5009     13372
    divorce/1 using wread      1225     4703     12185
    dirty divorce/1            181      592      1121

The first row comes from running the divorce/1 function from Section 2. The second row comes from the same function with mnesia:read replaced by mnesia:wread, which reads data while setting a write lock instead of a read lock. If we know in advance that the object will subsequently be written, this is more efficient, because the lock does not have to be upgraded from a read lock to a write lock. The last row comes from reading and writing the replicated tables with the dirty functions, which use lightweight locking and bypass the transaction system.
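The three rows of Table 1 correspond to three ways of writing divorce/1. The sketch below shows them side by side using current Mnesia calls (mnesia:read/2, mnesia:wread/1, mnesia:dirty_read/2, and mnesia:dirty_write/1). It is an assumption-based reconstruction, not the benchmark code behind the table, it takes no timings, and it assumes a single node with a RAM-only schema; the module name divorce_demo and the marry/2 helper are hypothetical.

```erlang
%% divorce_demo.erl: hypothetical module; assumes one node, RAM-only schema.
-module(divorce_demo).
-export([setup/0, marry/2, divorce/1, divorce_wread/1, dirty_divorce/1]).

-record(person, {name, data, married_to, children = []}).

setup() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(person,
                                       [{attributes, record_info(fields, person)}]),
    marry(klacke, eva).

%% Helper: store two persons married to each other (dirty writes).
marry(A, B) ->
    ok = mnesia:dirty_write(#person{name = A, married_to = B}),
    mnesia:dirty_write(#person{name = B, married_to = A}).

%% Row 1: plain reads take read locks, which the writes later upgrade.
divorce(Name) ->
    divorce_with(fun(Key) -> mnesia:read(person, Key) end, Name).

%% Row 2: wread/1 takes write locks up front, so no upgrade is needed.
divorce_wread(Name) ->
    divorce_with(fun(Key) -> mnesia:wread({person, Key}) end, Name).

divorce_with(Read, Name) ->
    mnesia:transaction(
      fun() ->
              case Read(Name) of
                  [] ->
                      mnesia:abort(no_such_person);
                  [Pers] ->
                      [Partner] = Read(Pers#person.married_to),
                      ok = mnesia:write(Pers#person{married_to = undefined}),
                      mnesia:write(Partner#person{married_to = undefined})
              end
      end).

%% Row 3: dirty operations bypass the transaction system entirely, so
%% the two writes are not atomic with respect to each other.
dirty_divorce(Name) ->
    case mnesia:dirty_read(person, Name) of
        [] ->
            {error, no_such_person};
        [Pers] ->
            [Partner] = mnesia:dirty_read(person, Pers#person.married_to),
            ok = mnesia:dirty_write(Pers#person{married_to = undefined}),
            mnesia:dirty_write(Partner#person{married_to = undefined})
    end.
```

The transactional versions return {atomic, ok} or {aborted, Reason}; the dirty version returns ok directly and has no way to undo the first write if the second fails.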

6. Conclusions

Many DBMSs are available today, including many commercial systems and countless research systems. Using a commercial DBMS might seem a good choice, but given the factors discussed in Section 3, no suitable commercial DBMS was available. We believe our main contributions are:

  • By combining many existing techniques, we have implemented a complete distributed DBMS. Many research groups study only individual aspects of DBMSs, and few complete distributed DBMSs exist;
  • We have shown that Erlang is suitable not only for telecommunications systems but also for implementing a DBMS such as Mnesia. To our knowledge, this is the first time a distributed DBMS has been implemented in a symbolic programming language;
  • We provide a comprehensive DBMS solution, at least for data management in telecommunications systems.

Today Mnesia is used within Ericsson to build real software products. It is no longer a prototype; it is mature enough to be called a product. More information about the system is available at http://www.ericsson.se/erlang.

References

  1. Armstrong, J. L., Williams, M. C., Wikström, C. and Virding, S. R. Concurrent Programming in Erlang, 2nd ed. Prentice Hall, 1995.
  2. Bernstein, P. A., Hadzilacos, V. and Goodman, N. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
  3. Chen, W. and Warren, D. S. Query evaluation under the well-founded semantics. Proc. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Washington, 1993.
  4. Case, J., McCloghrie, K., Rose, M. and Waldbusser, S. Management Information Base for Version 2 of the Simple Network Management Protocol (SNMPv2). January 1996.
  5. Copeland, G. and Maier, D. Making Smalltalk a database system. Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 316-325, Boston, 1984.
  6. Eswaran, K. P., Gray, J. N., Lorie, R. A. and Traiger, I. L. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11): 624-633, November 1976.
  7. Faehndrich, M., Morrisett, G., Nettles, S. and Wing, J. Extensions to Standard ML to support transactions. ACM SIGPLAN Workshop on ML and its Applications, June 20-21, 1992.
  8. Graefe, G. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2): 73-170, June 1993.
  9. Gray, J. N. Notes on data base operating systems. In Operating Systems: An Advanced Course, Lecture Notes in Computer Science 60, pp. 393-481. Springer-Verlag, Berlin, 1978.
  10. Gray, J. N., Lorie, R. A., Putzolu, G. R. and Traiger, I. L. Granularity of locks and degrees of consistency in a shared data base. IBM Research Report RJ1654, September 1975.
  11. Hvasshovd, S.-O., Torbjørnsen, Ø., Bratsberg, S. E. and Holager, P. The ClustRa telecom database: high availability, high throughput, and real-time response. Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, pp. 469-477, September 1995.
  12. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7): 558-565, July 1978.
  13. Larson, P.-Å. Dynamic hash tables. Communications of the ACM, 31(4), 1988.
  14. Rosenkrantz, D. J., Stearns, R. E. and Lewis, P. M. System level concurrency control for distributed database systems. ACM Transactions on Database Systems, 3(2): 178-198, June 1978.
  15. Trinder, P. W. and Wadler, P. List comprehensions and the relational calculus. Proceedings of the 1988 Glasgow Workshop on Functional Programming, Rothesay, August 1988, pp. 115-123.
  16. Ullman, J. D. Principles of Database and Knowledge-Base Systems, Vol. 2. Computer Science Press, 1989.
