Data Models and Query Languages: "Designing Data-Intensive Applications" Reading Notes 2

Source: Internet
Author: User

Data models are perhaps the most important part of developing software, because they have a profound effect not only on how the software is written but also on how we think about the problem we are solving. This second reading note covers the design of data models.

1. Layering of data models

A complex application may contain many layers of data models, but the basic idea is always the same: each layer hides the complexity of the layers below it by providing a clean data model. These abstractions allow different groups of people to work together effectively.

Every data model embodies assumptions about how it is going to be used. Some kinds of usage are easy and some are not supported; some operations are fast and some perform badly; some data transformations feel natural and some are clumsy. It is important to choose a data model that is appropriate for your application, because the data model has a profound effect on what the software built on top of it can and cannot do.

(This chapter surveys a range of data models and the query languages that go with them.)

2. Data Model
    • Relational Data Model
      The best-known data model today is probably that of SQL, which is based on the relational model proposed by Edgar Codd: data is organized into relations (called tables in SQL), where each relation is an unordered collection of tuples (rows in SQL). The goal of the relational data model is to hide the implementation details behind a cleaner interface.

    • Non-relational data model (NoSQL)
      The non-relational data model has some advantages over the relational data model, including:
        • Greater scalability: very large datasets and very high read and write throughput.
        • Specialized query operations that the relational model does not support well.
        • A more flexible data model.

For example:

Most application development today is done in object-oriented programming languages, which leads to a common criticism of the SQL data model: when data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns (this is the job of the ORM frameworks we use every day).
LinkedIn is a popular career profile site; let us look at how a profile is represented under different data models.

    • In the traditional SQL model, the most common normalized representation places positions, education, and contact information in separate tables, each with a foreign key reference to the users table. The problem is obvious: the dependencies between multiple tables complicate the application code considerably.

    • The JSON model reduces the mismatch between the application code and the storage layer, and it is more flexible. The JSON representation also has better locality than the multi-table schema. To fetch a profile's education or career history in the multi-table model, you must either perform multiple queries (querying each table by user_id) or perform a multi-way join. In the JSON data model, all the relevant information is in one place, and one query is enough.
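
As a rough sketch of the locality argument, the snippet below holds one profile in both forms: rows spread over normalized "tables" (plain arrays here) and a single JSON-like document. All table names, field names, and sample values are assumptions made up for illustration; they are not the book's exact schema, and no real database is involved.

```javascript
// Illustrative data only: every name and value here is an assumption.

// Normalized form: one "table" (array of rows) per kind of information,
// each row pointing back to the user through user_id.
const users = [{ user_id: 251, first_name: "Ada", last_name: "Example" }];
const positions = [
  { id: 1, user_id: 251, job_title: "Engineer", organization: "Acme" },
  { id: 2, user_id: 251, job_title: "Architect", organization: "Globex" },
];
const education = [
  { id: 1, user_id: 251, school_name: "Example University" },
];

// Reading a full profile from the normalized form touches every table.
function loadProfileFromTables(userId) {
  const user = users.find((u) => u.user_id === userId);
  if (!user) return null;
  return {
    ...user,
    positions: positions.filter((p) => p.user_id === userId),
    education: education.filter((e) => e.user_id === userId),
  };
}

// Document form: the one-to-many data is nested inside the profile itself,
// so a single read returns everything.
const profileDocument = {
  user_id: 251,
  first_name: "Ada",
  last_name: "Example",
  positions: [
    { job_title: "Engineer", organization: "Acme" },
    { job_title: "Architect", organization: "Globex" },
  ],
  education: [{ school_name: "Example University" }],
};

console.log(loadProfileFromTables(251).positions.map((p) => p.job_title));
console.log(profileDocument.positions.map((p) => p.job_title));
```

Both reads yield the same job titles; the difference is that the first one needs three lookups (or a join) while the second needs one.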

(Note: in the example above, region_id and industry_id are given as IDs rather than as plain-text strings such as "Greater Seattle Area" and "Philanthropy". There are several reasons: (1) it avoids ambiguity, (2) updates can be made in one place, and (3) it makes localization into other languages easier. The advantage of an ID is that, because it has no meaning to humans, it never needs to change: the ID can stay the same even if the information it identifies changes. Anything that is meaningful to humans may need to change at some point in the future, and if that information is duplicated, all the redundant copies need to be updated. This incurs write overhead and risks inconsistencies. The lists of regions and industries are likely to be small and slow-changing, so the application can simply keep them in memory.)
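
The ID argument can be sketched as follows. The ID values, the lookup maps, and the helper names `describeUser` and `renameRegion` are hypothetical, chosen only to show that a rename is a single write when records store IDs instead of display strings.

```javascript
// Hypothetical lookup tables, small enough to keep in application memory.
const regions = new Map([["us:91", "Greater Seattle Area"]]);
const industries = new Map([[131, "Philanthropy"]]);

// The user record stores only IDs, never the human-readable text.
const user = { user_id: 251, region_id: "us:91", industry_id: 131 };

// Resolve IDs to display text at read time.
function describeUser(u) {
  return {
    user_id: u.user_id,
    region: regions.get(u.region_id) ?? "(unknown region)",
    industry: industries.get(u.industry_id) ?? "(unknown industry)",
  };
}

// Renaming a region is one write to the lookup table;
// no user record has to be touched.
function renameRegion(regionId, newName) {
  regions.set(regionId, newName);
}

console.log(describeUser(user).region);
renameRegion("us:91", "Seattle Metropolitan Area");
console.log(describeUser(user).region);
```

If the region name had been copied into every user record instead, the rename would have to rewrite each copy, which is the write overhead and inconsistency risk described in the note.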

The flexibility of a document-based data model:

Flexibility matters most when an application wants to change its data format. For example, suppose we currently store each user's full name in a single field, and we now want to store the first name and last name separately.

  • In a document database, you simply start writing new documents with the new fields, and add code in the application that handles old documents when they are read:

    if (user && user.name && !user.first_name) {
      // Older documents have only the full name, so derive first_name on read
      user.first_name = user.name.split(" ")[0];
    }

  • In a relational database, you would typically perform a migration along these lines:

    ALTER TABLE users ADD COLUMN first_name text;
    UPDATE users SET first_name = split_part(name, ' ', 1);      -- PostgreSQL
    UPDATE users SET first_name = substring_index(name, ' ', 1); -- MySQL

    Running the UPDATE statement on a large table is likely to be slow on any database, because every row needs to be rewritten. If that is unacceptable, the application can leave first_name set to its default of NULL and fill it in at read time, just as it would with a document database.

Summary: The main advantages of the document data model are schema flexibility, better performance due to locality (especially for programs that often need to access the entire document), and, for some applications, a closer match to the data structures used by the application code. The document model is a good choice if the data in your application has a document-like structure (that is, a tree of one-to-many relationships, where the entire tree is usually loaded at once). The relational model counters by providing better support for joins and for many-to-one and many-to-many relationships, so it is the better fit if your application uses many-to-many relationships. You can emulate a join in application code by making multiple requests to the database, but this moves the complexity into your application.

(Document databases have begun to support relational-style joins, and relational databases have begun to add JSON and XML support. A hybrid data model may be the direction in which databases develop.)
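
Emulating a join in application code might look like the sketch below. The `collections` object is an in-memory stand-in for a document store without join support, and `findAll`, `findByIds`, and the sample documents are hypothetical helpers and data, not the API of any real database.

```javascript
// In-memory stand-in for a document store (hypothetical data).
const collections = {
  users: [
    { _id: 1, name: "Ann", organization_id: 10 },
    { _id: 2, name: "Bo", organization_id: 10 },
    { _id: 3, name: "Cy", organization_id: 20 },
  ],
  organizations: [
    { _id: 10, name: "Acme" },
    { _id: 20, name: "Globex" },
  ],
};

// Count round trips to make the cost of the emulated join visible.
let queryCount = 0;

function findAll(collection) {
  queryCount += 1;
  return collections[collection].slice();
}

function findByIds(collection, ids) {
  queryCount += 1;
  const wanted = new Set(ids);
  return collections[collection].filter((doc) => wanted.has(doc._id));
}

// Many-to-one "join" done by the application: fetch the users, collect the
// referenced organization IDs, fetch those, then stitch the results together.
function usersWithOrganizations() {
  const users = findAll("users"); // request 1
  const orgIds = [...new Set(users.map((u) => u.organization_id))];
  const orgs = findByIds("organizations", orgIds); // request 2
  const orgById = new Map(orgs.map((o) => [o._id, o]));
  return users.map((u) => ({
    name: u.name,
    organization: orgById.get(u.organization_id)?.name ?? null,
  }));
}

const joined = usersWithOrganizations();
console.log(joined, "requests:", queryCount);
```

The bookkeeping that a relational database would do inside a single JOIN (collecting keys, deduplicating them, matching rows) now lives in the application, which is the shift in complexity the summary refers to.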

3. Data Query Language

Have you ever wondered why we have SQL at all? The logic of an SQL statement can also be expressed in an ordinary programming language, so why use a separate language to work with the data model?

In fact, a separate query language is not strictly necessary: we can interact with the data directly from a programming language (for example, MongoDB uses JavaScript as its native interaction language). But most of the programming languages we use are imperative, and a declarative query language like SQL, which is grounded in relational algebra, has some advantages when working with a data model.

For example:

Suppose you have a list of animal species and you need to return only the sharks in the list:

An imperative language tells the computer to perform certain operations in a certain order. You can step through the code one line at a time, evaluating conditions, updating variables, and deciding whether to go around the loop once more. In a declarative query language like SQL or relational algebra, you only specify the pattern of the data you want (what conditions the results must meet, and how the data should be transformed, for example by sorting, grouping, and aggregating) rather than how to achieve that goal. The database system's query optimizer decides which indexes and which join methods to use, and in which order to execute the various parts of the query.
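
A minimal sketch of the shark example, assuming each animal is an object with a `family` field (the field name and sample data are assumptions for illustration). The `filter` version is only closer in spirit to a declarative query: JavaScript still runs it in order, whereas the SQL statement shown in the comment leaves the execution strategy entirely to the database.

```javascript
// Sample data; names and families are illustrative.
const animals = [
  { name: "Great white shark", family: "Sharks" },
  { name: "Lion", family: "Felidae" },
  { name: "Hammerhead shark", family: "Sharks" },
];

// Imperative: spell out the loop, the condition, and the result list.
function getSharksImperative(list) {
  const sharks = [];
  for (let i = 0; i < list.length; i++) {
    if (list[i].family === "Sharks") {
      sharks.push(list[i]);
    }
  }
  return sharks;
}

// Closer to declarative: state the condition and let filter do the iteration.
function getSharksDeclarative(list) {
  return list.filter((animal) => animal.family === "Sharks");
}

// The fully declarative equivalent in SQL would be:
//   SELECT * FROM animals WHERE family = 'Sharks';

console.log(getSharksImperative(animals).map((a) => a.name));
console.log(getSharksDeclarative(animals).map((a) => a.name));
```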

    • Declarative query languages are generally more concise and easier to work with than imperative APIs. More importantly, they hide the implementation details of the database engine, which allows the database system to introduce performance improvements without requiring any changes to queries.
    • SQL is more limited in functionality, but that restriction is exactly what gives the database more room for automatic optimization.
    • Declarative languages lend themselves to parallel execution, because they specify only the pattern of the results, not the algorithm used to compute them.

4. Summary

Data models are a huge topic. All of the models discussed here are widely used today, and each is good in its own domain. One model can be emulated in terms of another (for example, document data can be represented in a relational database), but the result is often awkward. That is why we have different systems for different purposes rather than a single one-size-fits-all solution.
