Your microservices architecture requires a variety of data models. Should you choose hybrid persistence or a multi-model database?
Over the past decade, large-scale distributed systems have exploded. This trend has sparked a tremendous amount of creativity in the field of databases, arguably without precedent in the history of the software industry. The result is a healthy, competitive database market that gives us a large number of platforms to pick from. But how should we choose?
In this article, we'll explore how to choose the data model or models that fit your application (yes, there can be more than one!). We'll also look at how the choice of data models helps determine which technologies are selected in the data tier.
Cloud architecture, NoSQL, and microservices architectures
As developers began to build web-scale applications, the relational databases that had historically dominated data architectures started to show strain. We have built hugely popular social applications and are connecting more and more devices to the Internet of Things (IoT). The sheer volume of data being read and written created the need to scale the data tier, and new types of databases emerged to meet these high scalability requirements.
In many cases, these new "NoSQL" or "non-relational" solutions are based on data models that differ from the traditional relational model. NoSQL databases include document stores, key-value stores, column-oriented stores, and even graph databases. Typically, these databases sacrifice some familiar features of relational databases, such as strong consistency, ACID transactions, and joins.
Alongside these changes in database technology, the service-oriented architecture (SOA) trend of the early 2000s has matured into the microservices architectural style. Many enterprises are moving away from heavyweight SOA infrastructure such as the enterprise service bus (ESB) toward a more decentralized approach. The appeal of a microservices architecture is that services can be developed, managed, and scaled relatively independently. This gives us a lot of flexibility in implementation choices, including infrastructure technologies such as databases.
For example, let's assume we are developing a microservices architecture and anticipate the need for large-scale scalability. Whether the project is a new application or a refactoring of an existing one, we have the opportunity to make fresh choices about the database.
Hybrid persistence (Polyglot persistence)
One of the key benefits of the microservices architectural style is the encapsulation of persistence. We can choose a different persistence technology according to the needs of each service. This approach of choosing a data store based on the characteristics of each data type is known as hybrid persistence (polyglot persistence), a term popularized by Martin Fowler and others. Hybrid persistence and microservices architectures are a natural fit.
The figure shows a series of microservices and how we might choose a different data model for each. I do not intend to enumerate the appropriate use cases for each type of database in this article. My intention is to highlight the strengths of the various types of databases and why the hybrid persistence approach is attractive.
The team developing service A, a core application service that manages data at large scale, might use a tabular database such as Apache Cassandra. For example, an inventory service for a retail application might be a good fit for Cassandra. Cassandra provides a range of coordination mechanisms, such as tunable consistency, batches, and lightweight transactions, that can be used as alternatives to full ACID transactions.
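As a rough sketch of those mechanisms, the CQL below shows what an inventory service might run in cqlsh. The keyspace, table, and SKU values are hypothetical and chosen only for illustration. The replication settings assume a single-node development cluster, and a production keyspace would normally use a higher replication factor.

```cql
-- Hypothetical keyspace and table for a retail inventory service.
-- replication_factor 1 assumes a single-node development cluster.
CREATE KEYSPACE IF NOT EXISTS inventory
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS inventory.stock (
  sku text PRIMARY KEY,
  quantity int
);

-- Tunable consistency: CONSISTENCY is a cqlsh shell command (not a CQL
-- statement) that sets the consistency level for the requests that follow.
CONSISTENCY QUORUM;

-- Lightweight transaction: the row is written only if it does not exist yet.
INSERT INTO inventory.stock (sku, quantity) VALUES ('SKU-1001', 10) IF NOT EXISTS;

-- Lightweight transaction: the update applies only if the stored quantity
-- still matches the value the service read earlier (compare-and-set).
UPDATE inventory.stock SET quantity = 9 WHERE sku = 'SKU-1001' IF quantity = 10;

-- Logged batch: the two writes are applied together.
BEGIN BATCH
  UPDATE inventory.stock SET quantity = 5 WHERE sku = 'SKU-2001';
  UPDATE inventory.stock SET quantity = 7 WHERE sku = 'SKU-2002';
APPLY BATCH;
```

Lightweight transactions cost more than ordinary writes because they need extra coordination between replicas, so they are usually reserved for the few operations that really need a conditional check.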
Service B supports looking up values by well-known keys, such as descriptive data for a product catalog. This is a good use case for a key-value store, where we look up a block of data by a well-known key, such as a product ID. Many in-memory caches use a key-value data model to support fast reads at large scale.
Service C may deal primarily with semi-structured content, such as the forms or pages of a website, and a document store may be well suited to that type of data. Document stores have much in common with key-value stores, but one key difference is that a document store lets you add some structure to the data, such as indexing specific attributes to support fast retrieval.
Service D may involve navigating complex relationships between data, such as customer data and the history of a customer's contacts with various departments of the organization. This may involve relationships between data types that are owned by other services. This is an interesting case, because it starts to push against the principle that each of the services above owns its own data types. In this case, you might choose to build a graph with read-only access to the underlying tables for your service, and handle all changes through the "front door," that is, by calling the APIs of the other services that own those data types.
Finally, we may have a legacy system or service that uses relational database technology, or a service that manages data that is small in volume or changes infrequently. A relational database may be perfectly adequate for these cases.
Should a single service use hybrid persistence?
It is also possible to design a service that sits on top of more than one database. For example, we could create a hotel service that uses a key-value store as an index mapping hotel names to IDs, and stores descriptive data about each hotel in Cassandra.
Note that the mapping of names to IDs could also be implemented in Cassandra with a denormalized design, in which a separate table maintains the mapping of names to IDs. This uses more storage space, but reduces the operational complexity of managing a separate key-value store.
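A minimal sketch of that denormalized design might look like the following. The table names, column names, and sample values are hypothetical, and the replication settings assume a single-node development cluster.

```cql
-- Hypothetical keyspace for the hotel service (single-node development settings).
CREATE KEYSPACE IF NOT EXISTS hotel
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Descriptive data about each hotel, keyed by ID.
CREATE TABLE IF NOT EXISTS hotel.hotel_details (
  id uuid PRIMARY KEY,
  name text,
  description text
);

-- Denormalized lookup table that maps a hotel name to its ID.
CREATE TABLE IF NOT EXISTS hotel.hotel_ids_by_name (
  name text PRIMARY KEY,
  id uuid
);

-- Write both tables in one logged batch so the mapping stays in step with the details.
BEGIN BATCH
  INSERT INTO hotel.hotel_details (id, name, description)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Grand Hotel', 'A 100-room hotel near the station.');
  INSERT INTO hotel.hotel_ids_by_name (name, id)
    VALUES ('Grand Hotel', 123e4567-e89b-12d3-a456-426614174000);
APPLY BATCH;

-- Resolve the name to an ID, then fetch the details by ID.
SELECT id FROM hotel.hotel_ids_by_name WHERE name = 'Grand Hotel';
SELECT name, description FROM hotel.hotel_details
  WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

The name is stored twice, which is the extra storage cost mentioned above. In exchange, the service reads and writes a single database.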
My recommendation: for a microservice, stick with a single data model (and database) whenever that works. If you find that a single service needs two different databases, consider whether the service has become too coarse-grained. You may want to split it into smaller services.
Tradeoffs and limitations of hybrid persistence
The main disadvantage of hybrid persistence is the cost of supporting multiple technologies, both in the initial development phase and in future operations.
The main development cost is training developers on each new database technology. This matters especially in organizations where developers move frequently between teams.
Another cost is the operational cost of supporting multiple databases. This is a particular problem when databases are centrally managed and one team has to maintain deep expertise in a variety of technologies. It is less of an issue in a DevOps environment, where each development team supports the databases it selects in production.
Multi-model databases
As an alternative or complement to the hybrid persistence approach, database vendors have started to build and promote multi-model databases. The term "model" refers to the core abstraction provided by the data store, such as tables (relational and non-relational), column stores, key-value stores, documents, or graphs. We can think of an application built on hybrid persistence as one that uses multiple types of data stores, whereas a multi-model database is a single database that supports multiple abstractions.
DataStax Enterprise (DSE) is a typical example of a multi-model database. It supports Cassandra's partitioned row store (tabular) model, along with a graph abstraction layer built on top of it (DSE Graph). It is also straightforward to build key-value and document models on top of the core model with DSE, as the accompanying figure shows. In this way, we can modify the hybrid persistence approach above to use a single underlying database engine for all of our services, while using separate Cassandra keyspaces to maintain clear boundaries between the data owned by different services.
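To illustrate the keyspace-per-service idea, here is a small sketch in CQL. The keyspace and table names are hypothetical, and the replication settings assume a single-node development cluster rather than a production deployment.

```cql
-- One hypothetical keyspace per service, all hosted on the same cluster.
CREATE KEYSPACE IF NOT EXISTS service_a
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE KEYSPACE IF NOT EXISTS service_b
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Each service creates and queries tables only inside its own keyspace.
CREATE TABLE IF NOT EXISTS service_a.stock (
  sku text PRIMARY KEY,
  quantity int
);

CREATE TABLE IF NOT EXISTS service_b.product_descriptions (
  product_id uuid PRIMARY KEY,
  description text
);
```

Because replication is configured per keyspace, each service can also tune its own replication settings without affecting the others.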
Here is how each of the models can be supported:
Tabular: Our main application service A can interact directly with the DSE database through the Cassandra Query Language (CQL).
Key-value: Although neither Apache Cassandra nor the DataStax distribution provides an explicit key-value API, a service like service B can access Cassandra through a table designed with a single key column and a single value column, for example:
CREATE TABLE hotel.hotels (key uuid PRIMARY KEY, value text); -- or use the blob type for the value
Document: Cassandra supports document-style data through its JSON support, which service C can use. Note that because Cassandra requires a schema to be defined for each table, you cannot insert JSON containing arbitrary new attributes, a capability typically associated with document databases.
Graph: For highly interconnected data like that of service D, DSE Graph is a highly scalable graph database built on the DSE database. DSE Graph supports the powerful and expressive Gremlin API from the Apache TinkerPop project.
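The key-value and document styles described in the list above can be sketched in CQL as follows. The hotel.pages table, its columns, and all sample values are hypothetical. The replication settings assume a single-node development cluster, and the graph model is not shown because it is queried through Gremlin rather than CQL.

```cql
CREATE KEYSPACE IF NOT EXISTS hotel
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Key-value style: a table with one key column and one value column.
CREATE TABLE IF NOT EXISTS hotel.hotels (key uuid PRIMARY KEY, value text);

-- "put" and "get" expressed as ordinary CQL statements.
INSERT INTO hotel.hotels (key, value)
  VALUES (123e4567-e89b-12d3-a456-426614174000, 'Grand Hotel, 100 rooms');
SELECT value FROM hotel.hotels WHERE key = 123e4567-e89b-12d3-a456-426614174000;

-- Document style: the table schema must declare every attribute up front.
CREATE TABLE IF NOT EXISTS hotel.pages (
  id uuid PRIMARY KEY,
  title text,
  body text
);

-- Write and read a row as JSON; the JSON keys must match declared columns.
INSERT INTO hotel.pages JSON
  '{"id": "223e4567-e89b-12d3-a456-426614174000", "title": "Welcome", "body": "Check-in starts at 3 pm."}';
SELECT JSON id, title, body FROM hotel.pages
  WHERE id = 223e4567-e89b-12d3-a456-426614174000;
```

An INSERT ... JSON statement whose document contains an attribute that is not a declared column should be rejected. Supporting a new attribute therefore means altering the table first, which is the schema restriction noted in the document item above.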
Advantages and limitations of multi-model databases
When considering whether to invest in a multi-model database (or in the multi-model features of a database you already use), weigh the same development and operational costs that we discussed earlier for hybrid persistence.
Using a multi-model database can simplify operations. Even if different development teams use different APIs and interaction patterns against the back-end database platform, we only need to manage a single platform, which improves efficiency.
One question to consider when choosing a multi-model database is how the various models are supported. A common approach is to build the database engine around a single native underlying model and layer the other models on top of it. The layered models will tend to exhibit the characteristics of the underlying native model.
For example, volume 16 of the ThoughtWorks Technology Radar discusses DSE Graph, which is built on Cassandra, and notes the tradeoffs involved:
Built on top of Cassandra, DSE Graph targets the kind of large data sets where our longtime favorite Neo4j begins to show some limitations. This scale has its tradeoffs: for example, you lose the ACID transactions and runtime schema-free nature of Neo4j, but access to the underlying Cassandra tables, the integration of Spark for analytics workloads, and the powerful TinkerPop/Gremlin query language make this an option worth considering.
If you consider the various data types in your Web application, you may find that different data types have different requirements for consistency, and that the number of data types that actually require immediate consistency is relatively small.
The ThoughtWorks assessment cited above also points to another important factor in evaluating a multi-model database: the integration and interaction between the different models and data engines, and the variety of data access use cases across operations and analytics. DSE supports analytics on graph data using Spark (DSE Analytics), and DSE Search provides the ability to create a variety of search indexes on data in the DSE database.
Four steps for choosing microservices data models
Now that we've explored the pros and cons of hybrid persistence and multi-model approaches, how do we decide which data models are suitable for massively scalable microservices applications? You can follow these steps:
1. Identify the primary data types in your application, create a service for each of these types, and have each service control its own persistence layer. Where possible, use a multi-model database for all services, allowing each service to interact with its data through a different model.
2. Use a tabular model (such as the DSE database) as the primary model for web-scale scalability and availability, then layer key-value and document data models on top as needed. Be sure to consider the different ways data will be accessed in both operational and analytic use cases, so you can plan ahead for features such as search indexing and replication to analytics data centers.
3. Use a graph representation (such as DSE Graph) for highly interconnected data, especially when the relationships between entities have as many attributes as the entities themselves or more, or when you need to capture many-to-many relationships between entities of the same type.
4. Retain legacy investments in relational database technology where there is no need to change, for example where large scale, low latency, and high availability are not requirements.
I hope this article provides a useful framework for thinking about whether and how to support multiple data models in an application, and when to consider using a multi-model database.
Jeff Carpenter, a technology evangelist at DataStax, leverages his knowledge of system architecture, microservices, and Apache Cassandra to help developers and operations engineers build scalable, reliable, and secure distributed systems. Jeff is the author of Cassandra: The Definitive Guide, 2nd Edition.