The differences and connections between graph databases and relational databases

Last Update:2017-03-19 Source: Internet

Author: User

I recently used a graph database to support a start-up project. Working with this kind of database turned out to be very interesting, so here is a brief introduction.

You have probably heard of NoSQL databases. They are often used to handle problems that traditional relational databases find difficult to solve. NoSQL databases are typically divided into four types: graph, document, column family, and key-value store. Each type uses a different data structure to record information, so each suits different scenarios.

The most unusual of these is the graph database. It differs considerably from the other NoSQL databases: it offers rich relationship representation and full transactional support, but no pure scale-out solution.

In this article, we give a brief introduction to Neo4j, a graph database that is very popular in the industry.

Introduction to graph databases

When using a relational database, you and I often run into very complex design problems. A movie, for example, has lead and supporting actors, and also involves a director, special-effects staff, and other people. These people are usually abstracted into a Person type that maps to a single database table. At the same time, a director may also be an actor in other movies or TV dramas, may be a singer, or may even be an investor in a film and television company (yes, I am using Vicki as the template for this example). Those companies are in turn often behind a series of films and TV dramas. Such interrelated relationships are often complex, and there are often several different relationships between the same two entities.

When we try to model these relationships in a relational database, we first need a series of tables representing the entities: a table for people, a table for movies, a table for TV dramas, a table for film and television companies, and so on. These tables then need to be linked through a series of association tables that record which movies a person has acted in, which TV dramas they have appeared in, which songs they have sung, and which companies they have invested in. We also need association tables that record who plays the lead in a movie, who plays a supporting role, who directs it, who handles special effects, and so on. As you can see, we need a large number of association tables to record this complex set of relationships. As more entities are introduced, we need more and more association tables, which makes a solution based on a relational database cumbersome and error-prone.
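As a rough illustration of the paragraph above, the following sketch builds a tiny schema with Python's built-in sqlite3 module. The table and column names (movie_actor, movie_director, movie_investor, is_lead) are assumptions made for this example and do not come from any real system. It only shows that every new kind of relationship between people and movies requires another association table.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- One association table per kind of relationship.
    CREATE TABLE movie_actor (
        person_id INTEGER NOT NULL REFERENCES person(id),
        movie_id  INTEGER NOT NULL REFERENCES movie(id),
        is_lead   INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE movie_director (
        person_id INTEGER NOT NULL REFERENCES person(id),
        movie_id  INTEGER NOT NULL REFERENCES movie(id)
    );
    CREATE TABLE movie_investor (
        person_id INTEGER NOT NULL REFERENCES person(id),
        movie_id  INTEGER NOT NULL REFERENCES movie(id)
    );
""")

# List the tables that exist only to connect people and movies.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
association_tables = [t for t in tables if t.startswith("movie_")]
print(association_tables)
```

With only two entity tables, three of the five tables already exist purely to record relationships. Adding TV dramas or companies would multiply them again.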

The crux of the problem is that relational databases are designed around the idea of modeling entities. That design philosophy provides no direct support for the relationships between entities. When we need to describe such relationships, we usually create an association table to record them, and these association tables often hold no data other than foreign keys. In other words, association tables merely simulate relationships between entities using functionality the relational database already has. This simulation has two very bad consequences: the database has to maintain relationships indirectly through association tables, which makes execution inefficient, and the number of association tables grows sharply.

How inefficient is this implementation? Take the investment relationship between people and movies as an example. A design that uses an association table typically has a person table, a movie table, and an association table whose rows each hold a person ID and a movie ID.

If we now want to find all the investors in a movie through this relationship, what does a relational database typically do? First, it performs a table scan on the association table (assuming there is no index support) to find all records whose film field matches the target movie ID. Next, it uses the person field of those records, which holds the primary key of a person, to find the corresponding records in the person table. If there are few records, this step uses a clustered index seek (assuming that operator is chosen). The time complexity of the whole operation becomes O(N log N).
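The two steps described above can be imitated in a few lines of plain Python. This is a minimal sketch, not a model of any real query engine: the data is made up, the association rows are scanned one by one, and the index seek on the person table is approximated by a binary search over a sorted list of primary keys.

```python
from bisect import bisect_left

# Hypothetical person table, sorted by primary key like a clustered index.
person_ids = [1, 2, 3, 4, 5]
person_names = ["Alice", "Bob", "Carol", "Dave", "Eve"]

# Hypothetical association rows (person_id, movie_id), with no index on movie_id.
invests_in = [(1, 10), (2, 11), (3, 10), (5, 10), (4, 12)]


def find_person(pid):
    """Index seek: binary search over the sorted primary keys, O(log M)."""
    i = bisect_left(person_ids, pid)
    if i < len(person_ids) and person_ids[i] == pid:
        return person_names[i]
    return None


def investors_of(movie_id):
    """Scan every association row (O(N)), then do one seek per matching row."""
    matches = [pid for pid, mid in invests_in if mid == movie_id]
    return [find_person(pid) for pid in matches]


print(investors_of(10))
```

Every query pays for the full scan of the association rows plus one logarithmic lookup per match, which is where the combined cost comes from.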

You can see that relationships organized through association tables do not perform well at run time. If the datasets we work with contain many relationships, and most operations work on those relationships, you can imagine how poorly a relational database will perform.

Beyond performance, managing the growing number of association tables is also a very annoying problem. The example we just gave has only four entities: people, movies, TV dramas, and film and television companies. Real-life cases are not so simple. In some scenarios we need to model many more entities to fully describe the relationships within a domain. These may include the controlling relationships among film and television companies, the complex shareholding relationships among holding companies, the loan and collateral relationships between companies, the relationships between people, the relationships between individuals and brands, and the relationships between brands and the companies that own them.

As you can see, traditional relational databases are overwhelmed when they need to describe a large number of relationships. They are better suited to cases with many entities but relatively simple relationships between them. When the relationships between entities are very complex, when data often needs to be recorded on the relationships themselves, and when most operations on the data involve relationships, a graph database with native support for relationships is the right choice. It not only improves run-time performance, but also greatly improves development efficiency and reduces maintenance costs.

A graph database has two main components: node sets and the relationships that connect nodes. A node set is a collection of nodes in the graph and is the closest counterpart to the table in a relational database. Relationships are the component unique to graph databases. For someone used to developing with relational databases, understanding relationships correctly is therefore the key to using a graph database correctly.

But don't worry. Once you understand how a graph database abstracts data, you will find that the abstraction is actually very close to that of a relational database. Simply put, each node still has a label that identifies its entity type, and therefore the node set it belongs to, and records a series of attributes that describe the node. In addition, nodes can be connected to each other through relationships. The abstraction of a node set is therefore somewhat similar to the abstraction of a table in a relational database.

But when it comes to representing relationships, relational databases and graph databases are very different.

When a relational database needs to represent many-to-many relationships, we usually create an association table for each pair of entities, and these association tables often record no other information. If there are several relationships between two entities, we need to create several association tables between them. In a graph database, we only need to state that different relationships exist between the two, for example a DirectBy relationship that points to the director of a film, or an ActBy relationship that points to the actors who took part in shooting it. In an ActBy relationship, we can also use an attribute on the relationship to indicate whether the actor is starring in the movie. As the relationship names suggest, relationships are directional. If you want a two-way relationship between two node sets, you need to define one relationship for each direction.
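To make the idea of a relationship with its own type, direction, and attributes concrete, here is a small sketch using Python dataclasses. The Node and Relationship classes, the starring attribute, and the sample data are assumptions for illustration only; they are not the API of Neo4j or of any other graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # entity type, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str                                # e.g. "DirectBy" or "ActBy"
    start: Node                                  # the relationship points from start...
    end: Node                                    # ...to end
    properties: dict = field(default_factory=dict)


movie = Node("Movie", {"title": "Example Movie"})
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})

# Two different relationships between the same pair of nodes, no extra tables.
relationships = [
    Relationship("DirectBy", movie, alice),
    Relationship("ActBy", movie, alice, {"starring": True}),
    Relationship("ActBy", movie, bob, {"starring": False}),
]

# The attribute on the relationship tells us who stars in the movie.
stars = [r.end.properties["name"] for r in relationships
         if r.rel_type == "ActBy" and r.properties.get("starring")]
print(stars)
```

Alice is connected to the same movie twice, once as director and once as a starring actor, and both facts live on the relationships rather than in separate association tables.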

In other words, because relationships in a graph database can carry attributes, they provide a richer representation than the association tables of a relational database. Compared with users of a relational database, users of a graph database therefore have an additional weapon when abstracting things: rich relationships.

So when defining the data representation for a graph database, we should abstract the things to be represented in a more natural way: first define the node sets for those things and the properties of each node set, then identify the relationships between them and create the corresponding abstractions for those relationships.

The data held in a graph database therefore ends up as a network of nodes connected by relationships.

Designing a high-quality graph

After learning the basics of graph databases, we can start experimenting with one. The first thing to figure out is how to define a well-designed graph for our graph database. This is not difficult; you only need to understand a few key points of graph database design.

The first is to distinguish between node sets, nodes, and relationships in the graph. When designing relational databases, we usually use a table to abstract a class of things. For the concept of a person, for example, we would create a table and add to it two records representing two people, Alice and Bob.

A graph database has two corresponding concepts: the node set and the node. In general, data in a graph database is presented not as node sets but as individual nodes.

If you need to add support for books to the graph, each book is likewise represented as a node.

In other words, although the concept of a node set often exists in a graph database, it is no longer the most important abstraction. Some graph databases even allow software developers to use schemaless nodes, which weakens the concept of node sets further. Instead, we should think in terms of individual nodes and the relationships that exist between them.

So can we define the data on each node arbitrarily? No. A common guideline here is to use schemaless flexibility only where it actually helps you. Weakly typed languages, for example, give software developers more flexibility than strongly typed languages, but code written in them is often less maintainable and less rigorous. In the same way, using schemaless nodes requires a balance between flexibility and maintainability.

This flexibility lets software developers add a variety of relationships between nodes without having to change the database schema to record foreign keys, as they would in a relational database.

Therefore, in a graph database the node set is not the most important concept. In some graph databases, for example, node IDs are not allocated per node set but are assigned in the order in which nodes are created. When debugging, you may find that the first node in a node set has ID 1 and the second has ID 3, while the node with ID 2 belongs to another node set.
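The ID behavior described above is easy to reproduce in a toy model. The Graph class below is a hypothetical in-memory structure written for this example; it does not describe how any particular graph database allocates IDs internally. It simply uses one counter for all nodes, whatever their label.

```python
from itertools import count


class Graph:
    def __init__(self):
        self._next_id = count(1)   # one counter shared by every node set
        self.nodes = {}            # node id -> (label, properties)

    def add_node(self, label, **properties):
        node_id = next(self._next_id)
        self.nodes[node_id] = (label, properties)
        return node_id

    def ids_with_label(self, label):
        """Return the IDs of all nodes in the node set identified by label."""
        return [i for i, (lbl, _) in self.nodes.items() if lbl == label]


g = Graph()
g.add_node("Person", name="Alice")
g.add_node("Book", title="Example Book")
g.add_node("Person", name="Bob")

print(g.ids_with_label("Person"))
print(g.ids_with_label("Book"))
```

Because the book was created between the two people, the Person node set ends up with non-consecutive IDs, and the ID in between belongs to the Book node set.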

So how do we define an appropriate graph for our business logic? Simply put, a single thing should be abstracted as a node, and nodes of the same type are recorded in the same node set. The nodes within a node set may differ somewhat in the data they hold; a person, for instance, may have different responsibilities and therefore be connected to other nodes through different relationships. A person may be an actor, a director, or both. In a relational database, we might need to create separate tables for actors and directors. In a graph database, all three kinds of people are data in the same node set; they differ only in that they are connected to different nodes through different relationships. In other words, a node set in a graph database is not as fine-grained as a table in a relational database.
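The point about actors and directors sharing one node set can be sketched as follows. All names here are made up for illustration, and relationships are stored as plain tuples pointing from a movie to a person. A person's role is not stored on the person at all; it is derived from the relationships that point at them.

```python
# Hypothetical nodes: every person belongs to the same "Person" node set.
people = {"alice", "bob", "carol"}

# (start node, relationship type, end node); each one points from a movie to a person.
relationships = [
    ("movie_a", "ActBy", "alice"),
    ("movie_a", "DirectBy", "bob"),
    ("movie_b", "ActBy", "carol"),
    ("movie_b", "DirectBy", "carol"),
]


def roles_of(person):
    """Derive a person's roles from the types of relationships that point at them."""
    return sorted({rel_type for _, rel_type, end in relationships if end == person})


for person in sorted(people):
    print(person, roles_of(person))
```

No separate actor or director collection is needed: carol is both simply because two kinds of relationship end at her node.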

Once the individual node sets have been abstracted, we need to identify the possible relationships between the nodes. These relationships do not only cross node sets. Sometimes they connect nodes within the same node set, or even connect a node to itself.

These relationships usually have a starting point and an end point. In other words, relationships in a graph database are usually directed. If you want a two-way relationship between two nodes, such as Alice and Bob, you need to create two know_about relationships between them: one pointing from Alice to Bob, and the other pointing from Bob to Alice.
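A minimal sketch of this rule, assuming relationships are stored as (start, type, end) tuples in a plain Python set; the helper names relate, relate_both_ways, and knows are invented for the example:

```python
# Each relationship is a directed (start, type, end) tuple.
relationships = set()


def relate(start, rel_type, end):
    relationships.add((start, rel_type, end))


def relate_both_ways(a, rel_type, b):
    """A two-way relationship is stored as two directed relationships."""
    relate(a, rel_type, b)
    relate(b, rel_type, a)


def knows(a, b):
    return (a, "know_about", b) in relationships


relate_both_ways("Alice", "know_about", "Bob")
relate("Alice", "know_about", "Carol")   # one direction only

print(knows("Alice", "Bob"), knows("Bob", "Alice"))
print(knows("Alice", "Carol"), knows("Carol", "Alice"))
```

Alice and Bob know about each other because two relationships were created, whereas Carol does not know about Alice because only one direction exists.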

It is important to note that although relationships in a graph database are one-way, some graph database implementations, such as Neo4j, let us find not only the relationships that start from a node but also the relationships that point to it. In other words, although a relationship in the graph is one-way, it can be found from both its start node and its end node.
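One simple way to get this behavior in a toy model is to register every relationship under both of its endpoints. The Graph class below is an assumption made for illustration and says nothing about how Neo4j stores relationships internally.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.outgoing = defaultdict(list)   # start node -> relationships leaving it
        self.incoming = defaultdict(list)   # end node -> relationships pointing to it

    def relate(self, start, rel_type, end):
        rel = (start, rel_type, end)
        # The relationship stays one-way, but it is registered at both endpoints.
        self.outgoing[start].append(rel)
        self.incoming[end].append(rel)


g = Graph()
g.relate("Alice", "know_about", "Bob")
g.relate("Carol", "know_about", "Bob")

print(g.outgoing["Alice"])   # relationships that start at Alice
print(g.incoming["Bob"])     # relationships that point to Bob
```

Starting from Bob, we can still discover who knows about him, even though no relationship starts at Bob.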

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.