Originally from: Http://www.infoq.com/cn/articles/MarkLogic-NoSQL-with-Transactions
For almost its entire history, the Java platform has worked hard to give developers database persistence capabilities. Whether you have used the early JDBC specification, EJB, O/R mappers such as Hibernate, or the more recent JPA specification, you have very likely encountered a relational database along the way. You probably also understand the gap between object-oriented modeling and the way relational databases store data (developers sometimes call this the impedance mismatch).
Recently, however, NoSQL databases have arrived, and in many cases they offer a more natural fit from a modeling standpoint. This is especially true of document-oriented databases (such as MarkLogic, MongoDB, CouchDB, and so on), whose rich JSON and/or XML persistence models effectively eliminate this impedance mismatch. While this has been a boon to developer productivity, in some cases developers have come to believe they must give up other features they are used to, such as ACID transaction support. The reason is that many NoSQL databases do not provide this capability, trading it for flexibility and scalability that traditional relational databases lack. For many people, the root of this tradeoff is the CAP theorem.
CAP Theorem
In 2000, Eric Brewer put forward a conjecture that the technical community now calls the CAP theorem. It concerns three system properties in a distributed database environment:
- Consistency: all nodes see the same data at the same time;
- Availability: every request to the system receives a response indicating success or failure;
- Partition tolerance: the system continues to operate despite arbitrary message loss or the failure of part of the system.
The consensus around the CAP theorem is that a distributed database system can provide at most two of these three properties. Accordingly, most NoSQL databases cite it as the basis for handling updates with an eventual consistency model (sometimes called BASE: Basically Available, Soft state, Eventual consistency).
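To make the BASE idea concrete, here is a minimal in-memory sketch that is not tied to any real database. It uses two hypothetical replicas. A write lands on one replica immediately, but it reaches the other only when a hypothetical `sync()` step runs. Until then, a reader on the second replica sees stale data. After propagation, the two replicas converge.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of eventual consistency: writes apply locally at once and are
// propagated to a peer only when sync() runs. Replica and sync() are
// hypothetical names used for illustration only.
class Replica {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();

    void write(String key, String value) {
        data.put(key, value);     // visible on this replica right away
        pending.put(key, value);  // queued for later propagation
    }

    String read(String key) {
        return data.get(key);
    }

    // Push queued writes to a peer (simulates asynchronous replication).
    void sync(Replica peer) {
        peer.data.putAll(pending);
        pending.clear();
    }
}

class EventualConsistencyDemo {
    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        a.write("stock:airplane", "1");
        System.out.println("before sync, b sees: " + b.read("stock:airplane")); // null (stale)
        a.sync(b);
        System.out.println("after sync, b sees: " + b.read("stock:airplane"));  // 1
    }
}
```

A real system replicates in the background rather than on an explicit call. The window between the write and `sync()` is exactly the period during which readers may observe stale data.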
A common misconception, however, is that the CAP theorem makes it impossible to build a distributed database with ACID transactions. As a result, many people assume that distributed NoSQL databases and ACID transactions can never be combined. This is not the case. In fact, Brewer himself has clarified some of his statements, particularly regarding the concept of consistency as it applies to ACID.
It turns out that ACID properties matter, and newer database technologies have either addressed them or are in the process of doing so. In fact, Google is the author of the BigTable white paper and implementation, and an authority on large-scale distributed data storage for the web. It is already implementing distributed database transactions through its Spanner project.
Transactions are therefore back in the NoSQL conversation. If you are a Java developer looking for the agility and scale of NoSQL but still want ACID transactions, that is good news. In this article we explore one NoSQL database, MarkLogic, and show how it provides multi-statement transactions to Java developers. It does so without sacrificing other NoSQL advantages, such as agility and the ability to scale out across hardware. Before proceeding, let's review the ACID concepts.
ACID Support
Let's first look at the textbook definitions behind the ACID acronym. We will define each term and discuss the context in which it matters:
- Atomicity: This property is fundamental to the concept of a transaction. It describes the need for the database to group data changes and apply them in an "all or nothing" manner. For example, a transaction that debits one account and credits another must ensure that both changes happen (or neither happens) as a unit. This must hold not only during normal operation but also under unexpected error conditions.
- Consistency: Closely related to atomicity, this property means that a transaction must move the database from one valid state to another (from the system's standpoint). For example, if referential integrity or security constraints are defined on the data, consistency guarantees that no transaction violates any of them.
- Isolation: This property concerns the behavior observed when database operations run concurrently. Its purpose is to ensure that one user's database operations are isolated from another's. For this ACID property there are often several concurrency control options, known as isolation levels. These vary from database to database, and a single database system sometimes offers more than one. MarkLogic relies on a modern technique called multi-version concurrency control (MVCC) to provide isolation.
- Durability: This property ensures that once a transaction has been committed, its changes persist even if normal database operation is unexpectedly interrupted (network outage, power failure, and so on). Essentially, it guarantees that once the database has committed data, it will not "lose" that data.
In a fully ACID-compliant database, all of the above properties typically work together. They rely on mechanisms such as logging and transaction checkpoints to prevent data corruption and other undesirable side effects.
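The sketch below is a rough illustration of how a log can support atomicity in the debit/credit example. It is a minimal in-memory model, not MarkLogic's implementation. Each change first records the previous value in an undo log. If any step fails, the log is replayed in reverse, so the transfer happens entirely or not at all. The `Bank` and `UndoEntry` classes are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy bank that makes a debit/credit pair atomic with an undo log.
// Bank and UndoEntry are hypothetical names for illustration.
class Bank {
    final Map<String, Long> balances = new HashMap<>();

    // One undo-log entry: an account and the balance it had before a change.
    private static final class UndoEntry {
        final String account;
        final long previous;
        UndoEntry(String account, long previous) { this.account = account; this.previous = previous; }
    }

    void transfer(String from, String to, long amount) {
        Deque<UndoEntry> undoLog = new ArrayDeque<>();
        try {
            long fromBalance = balanceOf(from);              // debit
            if (fromBalance < amount) {
                throw new IllegalStateException("Insufficient funds in " + from);
            }
            undoLog.push(new UndoEntry(from, fromBalance));
            balances.put(from, fromBalance - amount);

            long toBalance = balanceOf(to);                  // credit (may fail)
            undoLog.push(new UndoEntry(to, toBalance));
            balances.put(to, toBalance + amount);
        } catch (RuntimeException e) {
            // Roll back: restore previous values, newest change first.
            while (!undoLog.isEmpty()) {
                UndoEntry u = undoLog.pop();
                balances.put(u.account, u.previous);
            }
            throw e;
        }
    }

    private long balanceOf(String account) {
        Long b = balances.get(account);
        if (b == null) {
            throw new IllegalStateException("Unknown account " + account);
        }
        return b;
    }
}
```

Consider a transfer to a nonexistent account. It fails after the debit has already been applied, and the undo log restores the original balance, so no partial transfer is ever left behind.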
NoSQL and Java: Basic Write Operations
Now let's set the textbook definitions aside and get concrete about how these properties show up in Java code. As mentioned earlier, our example NoSQL database is MarkLogic. We'll start with some housekeeping.
When talking to a database from Java (or indeed any other language), the first thing we do is open a connection. MarkLogic represents this connection with a DatabaseClient object. To obtain one, we use the factory pattern and call DatabaseClientFactory, as shown in the following example:
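import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.DatabaseClientFactory.Authentication;
// Open a connection on localhost:8072 with username/password
// credentials of admin/admin using DIGEST authentication
DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8072, "admin", "admin", Authentication.DIGEST);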
Once a connection is established, there is another level of abstraction. MarkLogic's Java client library includes many features, grouped logically to keep them organized. One way it does this is by dividing functionality into manager classes obtained from the DatabaseClient. For our first example, we will use an XMLDocumentManager object to perform a basic insert. To get an XMLDocumentManager instance, we again use a factory method, this time on DatabaseClient:
import com.marklogic.client.document.XMLDocumentManager;
// Get a document manager from the client
XMLDocumentManager docMgr = client.newXMLDocumentManager();
When it comes to data, MarkLogic is considered a "document-oriented" NoSQL database. From the Java point of view, this means we no longer rely on O/R mapping to flatten complex objects into relational rows and columns. Instead, objects can simply be serialized to a language-neutral, self-describing document format without complex mapping. Specifically, any Java object that can be serialized to XML (for example, through JAXB or other tools) or JSON (for example, through Jackson or another library) can be persisted to the database as is. No predefined schema is required in the database.
Let's look at the code:
import javax.xml.bind.JAXBContext;
import com.marklogic.client.io.JAXBHandle;
import com.marklogic.samples.infoq.model.Customer;
// Establish a context object for the Customer class
JAXBContext customerContext = JAXBContext.newInstance(com.marklogic.samples.infoq.model.Customer.class);
// Get a new customer object and populate it
Customer customer = new Customer();
customer.setId(1L);
customer.setFirstName("Frodo")
    .setLastName("Baggins")
    .setEmail("[email protected]")
    .setStreet("Bagshot Row, Bag End")
    .setCity("Hobbiton")
    .setStateOrProvince("The Shire");
// Get a handle for round-tripping the serialization
JAXBHandle customerHandle = new JAXBHandle(customerContext);
customerHandle.set(customer);
// Write the object to the DB
docMgr.write("/infoq/customers/customer-" + customer.getId() + ".xml", customerHandle);
System.out.println("Customer " + customer.getId() + " was written to the DB");
The example above uses JAXB, which is one way to store POJOs in MarkLogic (others include JDOM, raw XML strings, JSON, and so on). JAXB requires us to create a context, an instance of javax.xml.bind.JAXBContext, which is what the first line of code does. For our first example, we use a JAXB-annotated Customer class, create an instance, and populate it with some data. (Note: this is only for demonstration purposes, so please don't judge the quality of the modeling.) After that, we get back to MarkLogic specifics. To save the Customer object, we first get a handle. Because we chose the JAXB approach, we create a JAXBHandle from the context instantiated earlier. Finally, we use the XMLDocumentManager to write the document to the database, giving it a URI (that is, a key) to identify it.
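The key idea is a self-describing document stored under a URI, with no predefined table schema. It can be sketched without a database. In this minimal sketch, a hypothetical `Customer` class is serialized to XML by hand. This stands in for JAXB, which is no longer bundled with recent JDKs. The result is kept in a hypothetical `DocumentStore` backed by a `Map` and keyed by the same URI pattern used above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the article's JAXB-annotated Customer class.
class Customer {
    final long id;
    final String firstName;
    final String lastName;
    final String city;

    Customer(long id, String firstName, String lastName, String city) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
        this.city = city;
    }

    // Hand-rolled XML serialization standing in for JAXB or Jackson.
    String toXml() {
        return "<customer>"
            + "<id>" + id + "</id>"
            + "<firstName>" + escape(firstName) + "</firstName>"
            + "<lastName>" + escape(lastName) + "</lastName>"
            + "<city>" + escape(city) + "</city>"
            + "</customer>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

// Toy document store: URI -> serialized document. No tables, no O/R mapping.
class DocumentStore {
    private final Map<String, String> docs = new HashMap<>();

    void write(String uri, String content) { docs.put(uri, content); }

    String read(String uri) { return docs.get(uri); }
}

class DocumentStoreDemo {
    public static void main(String[] args) {
        DocumentStore store = new DocumentStore();
        Customer frodo = new Customer(1L, "Frodo", "Baggins", "Hobbiton");
        store.write("/infoq/customers/customer-" + frodo.id + ".xml", frodo.toXml());
        System.out.println(store.read("/infoq/customers/customer-1.xml"));
    }
}
```

The store never needs to know the shape of a Customer in advance. The document carries its own structure, and the URI is the only thing the store uses to find it.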
Once the operation above completes, the Customer object is saved in the database. The following shows the object in the MarkLogic Query Console:
Notably (apart from our first customer being a famous hobbit), we did not create any tables, nor did we configure or use any O/R mapping.
A Transaction Example
OK, we've seen a basic write operation, but what about transactions? Let's look at a simple use case.
Suppose we have an e-commerce site called ABC Business Network, where you can buy almost anything whose name starts with A, B, or C. As with many modern e-commerce sites, it is important for users to see up-to-date, accurate inventory. After all, whether they are buying artichokes, bongo drums, or classic cars, customers need to know what's in your warehouse.
To meet this requirement, we can rely on ACID properties to ensure that when a product is purchased, the inventory reflects the purchase (that is, stock is reduced). From the database's point of view, this must happen on an "all or nothing" basis. That way, whether or not the purchase transaction succeeds, we can guarantee that the inventory is accurate.
Let's take a look at the code:
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.DatabaseClientFactory.Authentication;
import com.marklogic.client.FailedRequestException;
import com.marklogic.client.ForbiddenUserException;
import com.marklogic.client.Transaction;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.io.JAXBHandle;
import javax.xml.bind.JAXBContext;
import com.marklogic.samples.infoq.model.*;
import com.marklogic.samples.infoq.exception.InventoryUnavailableException;
DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8072, "admin", "admin", Authentication.DIGEST);
XMLDocumentManager docMgr = client.newXMLDocumentManager();
Class[] classes = {
    com.marklogic.samples.infoq.model.Customer.class,
    com.marklogic.samples.infoq.model.InventoryEntry.class,
    com.marklogic.samples.infoq.model.Order.class
};
JAXBContext context = JAXBContext.newInstance(classes);
JAXBHandle jaxbHandle = new JAXBHandle(context);
Transaction transaction = client.openTransaction();
try {
    // Get the artichoke inventory (passing the transaction makes the read part of it)
    String artichokeUri = "/infoq/inventory/artichoke.xml";
    docMgr.read(artichokeUri, jaxbHandle, transaction);
    InventoryEntry artichokeInventory = jaxbHandle.get(InventoryEntry.class);
    System.out.println("Got the entry for " + artichokeInventory.getItemName());
    // Get the bongo inventory
    String bongoUri = "/infoq/inventory/bongo.xml";
    docMgr.read(bongoUri, jaxbHandle, transaction);
    InventoryEntry bongoInventory = jaxbHandle.get(InventoryEntry.class);
    System.out.println("Got the entry for " + bongoInventory.getItemName());
    // Get the airplane inventory
    String airplaneUri = "/infoq/inventory/airplane.xml";
    docMgr.read(airplaneUri, jaxbHandle, transaction);
    InventoryEntry airplaneInventory = jaxbHandle.get(InventoryEntry.class);
    System.out.println("Got the entry for " + airplaneInventory.getItemName());
    // Get the customer
    docMgr.read("/infoq/customers/customer-2.xml", jaxbHandle, transaction);
    Customer customer = jaxbHandle.get(Customer.class);
    System.out.println("Got the customer " + customer.getFirstName());
    // Prep the order
    String itemName = null;
    double itemPrice = 0;
    int quantity = 0;
    Order order = new Order().setOrderNum(1).setCustomer(customer);
    LineItem[] items = new LineItem[3];
    // Add 3 artichokes
    itemName = artichokeInventory.getItemName();
    itemPrice = artichokeInventory.getPrice();
    quantity = 3;
    items[0] = new LineItem().setItem(itemName).setUnitPrice(itemPrice)
        .setQuantity(quantity).setTotal(itemPrice * quantity);
    System.out.println("Added artichoke line item.");
    // Decrement artichoke inventory (throws InventoryUnavailableException if short)
    artichokeInventory.decrementItem(quantity);
    System.out.println("Decremented " + quantity + " artichoke(s) from inventory.");
    // Add a bongo
    itemName = bongoInventory.getItemName();
    itemPrice = bongoInventory.getPrice();
    quantity = 1;
    items[1] = new LineItem().setItem(itemName).setUnitPrice(itemPrice)
        .setQuantity(quantity).setTotal(itemPrice * quantity);
    System.out.println("Added bongo line item.");
    // Decrement bongo inventory
    bongoInventory.decrementItem(quantity);
    System.out.println("Decremented " + quantity + " bongo(s) from inventory.");
    // Add an airplane
    itemName = airplaneInventory.getItemName();
    itemPrice = airplaneInventory.getPrice();
    quantity = 1;
    items[2] = new LineItem().setItem(itemName).setUnitPrice(itemPrice)
        .setQuantity(quantity).setTotal(itemPrice * quantity);
    System.out.println("Added airplane line item.");
    // Decrement airplane inventory
    airplaneInventory.decrementItem(quantity);
    System.out.println("Decremented " + quantity + " airplane(s) from inventory.");
    // Add all line items to the order
    order.setLineItems(items);
    // Add some notes to the order
    order.setNotes("Customer may either have a dog or is possibly a talking dog.");
    jaxbHandle.set(order);
    // Write the order to the DB
    docMgr.write("/infoq/orders/order-" + order.getOrderNum() + ".xml", jaxbHandle, transaction);
    System.out.println("Order is written to the DB");
    jaxbHandle.set(artichokeInventory);
    docMgr.write(artichokeUri, jaxbHandle, transaction);
    System.out.println("Artichoke inventory is written to the DB");
    jaxbHandle.set(bongoInventory);
    docMgr.write(bongoUri, jaxbHandle, transaction);
    System.out.println("Bongo inventory is written to the DB");
    jaxbHandle.set(airplaneInventory);
    docMgr.write(airplaneUri, jaxbHandle, transaction);
    System.out.println("Airplane inventory is written to the DB");
    // Commit the whole thing
    transaction.commit();
} catch (FailedRequestException fre) {
    transaction.rollback();
    throw new RuntimeException("Things did not go as planned.", fre);
} catch (ForbiddenUserException fue) {
    transaction.rollback();
    throw new RuntimeException("You don't have permission to do such things.", fue);
} catch (InventoryUnavailableException iue) {
    transaction.rollback();
    throw new RuntimeException("It appears there's not enough inventory for something. Want to do something about it...", iue);
}
In the example above, we do several things within a single transactional context:
- Read the relevant customer and inventory data from the database;
- Create an order for that customer containing three line items;
- For each item, reduce the corresponding inventory quantity;
- Commit everything as a single transaction (or roll back on failure).
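The all-or-nothing shape of these steps can be sketched in memory. This minimal sketch uses hypothetical `InventoryStore` and `Tx` classes, not the MarkLogic Java API. A transaction buffers its updates in a private write set, and `commit()` publishes them together. If an exception is thrown partway through, `rollback()` discards the write set and the shared store is left untouched, much like `transaction.rollback()` in the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counterpart of the article's InventoryUnavailableException.
class InventoryUnavailableException extends RuntimeException {
    InventoryUnavailableException(String msg) { super(msg); }
}

// Shared committed state: item name -> quantity in stock.
class InventoryStore {
    final Map<String, Integer> stock = new HashMap<>();

    Tx begin() { return new Tx(this); }
}

// A transaction that reads through its own write set and publishes on commit.
class Tx {
    private final InventoryStore store;
    private final Map<String, Integer> writes = new HashMap<>();

    Tx(InventoryStore store) { this.store = store; }

    int read(String item) {
        Integer pending = writes.get(item);
        return pending != null ? pending : store.stock.getOrDefault(item, 0);
    }

    void decrement(String item, int qty) {
        int available = read(item);
        if (available < qty) {
            throw new InventoryUnavailableException("Not enough inventory. Requested "
                + qty + " but only " + available + " available.");
        }
        writes.put(item, available - qty);   // staged, not yet visible to others
    }

    void commit() { store.stock.putAll(writes); writes.clear(); }

    void rollback() { writes.clear(); }
}

class OrderDemo {
    // Orders 3 artichokes, 1 bongo and 1 airplane as one all-or-nothing unit.
    static boolean placeOrder(InventoryStore store) {
        Tx tx = store.begin();
        try {
            tx.decrement("artichoke", 3);
            tx.decrement("bongo", 1);
            tx.decrement("airplane", 1);
            tx.commit();
            return true;
        } catch (InventoryUnavailableException e) {
            tx.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        InventoryStore store = new InventoryStore();
        store.stock.put("artichoke", 10);
        store.stock.put("bongo", 5);
        store.stock.put("airplane", 1);
        System.out.println(placeOrder(store) + " " + store.stock); // first order succeeds
        System.out.println(placeOrder(store) + " " + store.stock); // second fails, stock unchanged
    }
}
```

In the second call the artichoke and bongo decrements succeed inside the write set, but the airplane decrement fails. None of the three changes reaches the shared stock.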
Semantically, the code is still an all-or-nothing unit of work, even though it performs multiple updates. If any part of the transaction fails, the whole transaction is rolled back. In addition, the queries that retrieve customer and inventory data also fall within the scope of the transaction. This highlights another aspect of MarkLogic transactions: multi-version concurrency control (MVCC). It means that database queries (such as inventory lookups) are accurate as of a specific point in time. Moreover, because this is a multi-statement transaction, MarkLogic does something it normally does not do on reads: it takes a document-level lock (reads are normally lock-free). This prevents "stale read" scenarios among concurrent transactions.
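The read-lock behavior described above can be approximated with one `ReentrantLock` per document URI. This is a minimal sketch with hypothetical `LockManager` and `LockingTx` classes, not how MarkLogic implements locking internally. It also simplifies by using exclusive locks only, whereas real databases typically distinguish shared read locks from exclusive write locks. A read inside a transaction acquires the document's lock and holds it until the transaction ends. Meanwhile, a concurrent transaction cannot lock that document and therefore cannot act on a stale read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplification: one exclusive lock per document URI.
class LockManager {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String uri) {
        return locks.computeIfAbsent(uri, u -> new ReentrantLock());
    }
}

// Hypothetical transaction that locks every document it reads and
// releases all of those locks when it ends (commit or rollback).
class LockingTx {
    private final LockManager mgr;
    private final List<ReentrantLock> held = new ArrayList<>();

    LockingTx(LockManager mgr) { this.mgr = mgr; }

    // Lock the document before reading it; give up after timeoutMs.
    boolean lockForRead(String uri, long timeoutMs) {
        ReentrantLock lock = mgr.lockFor(uri);
        try {
            if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                held.add(lock);
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Must be called on the thread that acquired the locks.
    void end() {
        for (ReentrantLock lock : held) {
            lock.unlock();
        }
        held.clear();
    }

    // Runs another transaction on a separate thread and reports whether it
    // could lock the given URI within timeoutMs.
    static boolean tryFromOtherThread(LockManager mgr, String uri, long timeoutMs) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            LockingTx other = new LockingTx(mgr);
            acquired[0] = other.lockForRead(uri, timeoutMs);
            other.end();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }
}
```

Suppose one transaction holds the lock on `/infoq/inventory/airplane.xml`. A second transaction on another thread times out trying to lock it, and succeeds once the first transaction calls `end()`.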
So when the code runs successfully, we get the following output:
Got the entry for artichoke
Got the entry for bongo
Got the entry for airplane
Got the customer Rex
Added artichoke line item.
Decremented 3 artichoke(s) from inventory.
Added bongo line item.
Decremented 1 bongo(s) from inventory.
Added airplane line item.
Decremented 1 airplane(s) from inventory.
Order is written to the DB
Artichoke inventory is written to the DB
Bongo inventory is written to the DB
Airplane inventory is written to the DB
The result in the database is an order with three line items, along with correspondingly reduced inventory. To illustrate, here is the order XML and one of the reduced inventory items (the airplane).
We can see that the airplane inventory has dropped to 0, because we had only one in stock. Now we run the program again. This (admittedly artificially) forces a transactional exception, because there is not enough inventory. In this case, we choose to abandon the entire transaction, and the error appears as follows:
Got the entry for artichoke
Got the entry for bongo
Got the entry for airplane
Got the customer Rex
Added artichoke line item.
Decremented 3 artichoke(s) from inventory.
Added bongo line item.
Decremented 1 bongo(s) from inventory.
Added airplane line item.
Exception in thread "main" java.lang.RuntimeException: Things did not go as planned.
    at com.marklogic.samples.infoq.main.TransactionSample1.main(TransactionSample1.java:148)
Caused by: java.lang.RuntimeException: It appears there's not enough inventory for something. Want to do something about it...
    at com.marklogic.samples.infoq.main.TransactionSample1.main(TransactionSample1.java:143)
Caused by: com.marklogic.samples.infoq.exception.InventoryUnavailableException: Not enough inventory. Requested 1 but only 0 available.
    at com.marklogic.samples.infoq.model.InventoryEntry.decrementItem(InventoryEntry.java:61)
    at com.marklogic.samples.infoq.main.TransactionSample1.main(TransactionSample1.java:103)
This is neat: the database was not updated, and the entire transaction was rolled back. This is what's known as a multi-statement transaction. If you come from the relational world, you are used to this behavior. In the NoSQL world, however, that is not always the case, and MarkLogic does provide this capability.
The example above omits some real-world details. For instance, when inventory runs short we might choose a different action (such as placing a backorder). However, in many business scenarios atomicity requirements are very real, and without multi-statement transactions, meeting them can be very difficult and error-prone.
Optimistic Locking
In the example above, the logic is simple and predictable, and it exercises all four ACID properties. However, attentive readers may have noticed my remark that "MarkLogic also does something it normally does not do on reads." As a consequence of MVCC, reads are normally lock-free. This works by making a document visible to a read as of a specific point in time, even if the document is modified after that point. It is as if the read gets its own copy of the document, so no lock is needed to hold off writers. In some cases, however, a document is locked when it is read, for example when the read happens within a transaction, as in the example above. Why do we do this? In highly concurrent applications, transactions take milliseconds or less. When we read an object that we may modify, we want to make sure that other threads cannot change its state until we are done. In other words, we want our transaction to be isolated. So when we perform a read inside a transaction block, we are signaling an intent to modify, and we take a lock to guarantee consistency throughout the transaction.
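Lock-free, point-in-time reads are the essence of MVCC. This minimal sketch uses a hypothetical `VersionedDoc` class and does not reflect MarkLogic's internals. Every write appends a new version stamped with a monotonically increasing timestamp. A reader that started at timestamp `t` sees the newest version whose timestamp is at most `t`. As a result, a later write never disturbs that reader, and the reader never blocks the writer.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical multi-version document: old versions are kept, new writes
// append a version, and readers pick the version visible at their timestamp.
class VersionedDoc {
    private static final AtomicLong CLOCK = new AtomicLong();
    // commit timestamp -> document content
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Timestamp a reader would use to start a point-in-time read.
    static long now() { return CLOCK.get(); }

    // 'synchronized' only guards the in-memory map for an instant; it is not
    // a document lock held for the duration of anyone's transaction.
    synchronized long write(String content) {
        long ts = CLOCK.incrementAndGet();
        versions.put(ts, content);
        return ts;
    }

    // Snapshot read: the latest version committed at or before readTs.
    synchronized String readAsOf(long readTs) {
        Map.Entry<Long, String> e = versions.floorEntry(readTs);
        return e == null ? null : e.getValue();
    }
}

class MvccDemo {
    public static void main(String[] args) {
        VersionedDoc airplane = new VersionedDoc();
        airplane.write("<quantity>1</quantity>");
        long snapshot = VersionedDoc.now();        // a reader starts here
        airplane.write("<quantity>0</quantity>");  // a concurrent update commits
        System.out.println(airplane.readAsOf(snapshot));           // <quantity>1</quantity>
        System.out.println(airplane.readAsOf(VersionedDoc.now())); // <quantity>0</quantity>
    }
}
```

The reader holding `snapshot` keeps seeing a quantity of 1 even after the update commits. That is point-in-time visibility without a read lock. It is also why a transaction that intends to modify what it read needs something more, such as the lock or the version check discussed here.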
As most developers know, however, locking has a cost, even for a single document and even when there is no real lock contention between concurrent operations. In fact, given what we know about the application's behavior and the speed of its operations, such overlap is unlikely by design. Still, we want a safeguard in case overlap does occur. So what can we do when we want a transactional update but only need to read an object's state, without paying for a lock during the read? First, we place the read outside the transaction context, so that it does not implicitly take a lock. Second, we use a DocumentDescriptor object. This object captures the state of a document at a point in time. The server can then determine whether the document has been modified between the time it was read and the time an update is requested. To achieve this, we obtain a document descriptor for the read and then pass that descriptor to the subsequent update. Here is the sample code:
JAXBHandle jaxbHandle = new JAXBHandle(context);
// Get the artichoke inventory
String artichokeUri = "/infoq/inventory/artichoke.xml";
// Get a document descriptor for the URI
DocumentDescriptor desc = docMgr.newDescriptor(artichokeUri);
// Read the document, now using the descriptor (outside the transaction, so no lock is taken)
docMgr.read(desc, jaxbHandle);
// etc...
try {
    // etc...
    // Write the order to the DB
    docMgr.write("/infoq/orders/order-" + order.getOrderNum() + ".xml", jaxbHandle, transaction);
    System.out.println("Order is written to the DB");
    // etc...
    jaxbHandle.set(artichokeInventory);
    // Note: using the descriptor again, so the write fails if the document changed since the read
    docMgr.write(desc, jaxbHandle, transaction);
    // etc...
    transaction.commit();
}
// etc...
catch (FailedRequestException fre) {
    // Do something about the failed request
    transaction.rollback();
}
Doing so ensures that no lock is taken for the read and that locks are used only for the update. In this case, however, it remains technically possible for another thread to "sneak in" and modify the same document between our read and our update. With the technique above, if that happens, an exception is thrown to let us know. This is optimistic locking. Technically, no lock is held during the read, because we are optimistic that nothing will change before we make our subsequent update. In doing so, we tell the database that we expect isolation violations to be absent most of the time, but if one does occur, we want to know about it. The benefit is that reads do not take locks. Occasionally (hopefully rarely), another thread may modify the same object after we read it and before we update it. In that case MarkLogic, which tracks version numbers behind the scenes, throws a FailedRequestException.
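The version check behind optimistic locking can be sketched as a conditional write. This minimal in-memory sketch uses hypothetical `VersionedStore`, `Descriptor`, and `StaleVersionException` classes. In the MarkLogic Java API, the corresponding roles are played by `DocumentDescriptor` and `FailedRequestException`. A read records the version it saw, and a write succeeds only if the stored version still matches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the exception raised on a stale update.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String msg) { super(msg); }
}

// What a reader remembers: the URI and the version it observed.
final class Descriptor {
    final String uri;
    final long version;
    Descriptor(String uri, long version) { this.uri = uri; this.version = version; }
}

class VersionedStore {
    private final Map<String, String> content = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Unconditional write: stores the value and bumps the version number.
    synchronized void put(String uri, String value) {
        content.put(uri, value);
        versions.merge(uri, 1L, Long::sum);
    }

    // Record the current version without taking any document lock.
    synchronized Descriptor describe(String uri) {
        return new Descriptor(uri, versions.getOrDefault(uri, 0L));
    }

    synchronized String read(String uri) { return content.get(uri); }

    // Conditional write: rejected if someone else wrote since the descriptor was taken.
    synchronized Descriptor write(Descriptor desc, String value) {
        long current = versions.getOrDefault(desc.uri, 0L);
        if (current != desc.version) {
            throw new StaleVersionException(desc.uri + ": expected version "
                + desc.version + " but found " + current);
        }
        put(desc.uri, value);
        return describe(desc.uri);
    }
}
```

Suppose two readers take descriptors for the same document. Whichever writes first wins. The other write is rejected rather than silently overwriting the newer state, which is the "tell us if it happens" behavior described above.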
Another point to note is that optimistic locking must be enabled explicitly, which essentially tells the server to track document versions behind the scenes. Here is an example of a complete server configuration and an optimistic locking exercise.
Developers who use version control tools such as CVS, SVN, and Git will recognize this behavior from working with source code. Most of the time we "check out" a module without locking it, knowing that others rarely work on the same module at the same time. However, suppose we try to commit a change and the repository considers our copy "stale." It tells us we cannot commit, because someone else changed the module after we read it.
Summary
The examples above are simple, but ACID transactions and optimistic locking are never trivial topics, and they are not usually associated with NoSQL databases. MarkLogic's goal, however, is to give developers powerful, easy-to-use features without sacrificing its own other strengths. For more details, please visit this website. The multi-statement transaction example used in this article is available on GitHub.