Before delving into the graph database, first understand the basic concepts of the property graph. A property graph consists of vertices, edges, labels, relationship types, and properties. Vertices are also called nodes, and edges are also called relationships. In a graph, nodes and relationships are the most important entities. All nodes are independent, and nodes can be labeled, so nodes with the same label belong to one group (a set); relationships are grouped by relationship type, so relationships of the same type belong to the same set. A relationship is directed: its two ends are the start node and the end node, the direction is indicated by an arrow, and a bidirectional connection between two nodes is represented by two relationships, one in each direction. A node can have zero, one, or more labels, but a relationship must have a relationship type, and exactly one. The query language of the Neo4j graph database is Cypher, which is the de facto standard among graph query languages for manipulating property graphs.
First, basic concepts of the graph database
Neo4j builds graphs on the property graph model, in which each entity has a unique identifier (ID), nodes are grouped by labels, and each relationship has exactly one type. The basic concepts of the property graph model are:
- Entities are nodes and relationships;
- Each entity has a unique ID;
- Each entity has zero, one, or more properties, and property keys are unique within an entity;
- Each node has zero, one, or more labels, and so can belong to one or more groups;
- Each relationship has exactly one type and connects two nodes;
- A path is an ordered sequence of entities (nodes and relationships) between a start node and an end node;
- A token is a non-empty string that identifies a label, a relationship type, or a property key;
- Label: used to mark nodes; multiple nodes can have the same label, a node can have multiple labels, and labels group nodes;
- Relationship type: used to mark the type of a relationship; multiple relationships can have the same relationship type;
- Property key: used to uniquely identify a property;
- A property is a key-value pair; each node or relationship can have zero, one, or more properties. A property value can be a scalar type or a list (array) of a scalar type;
Second, a graph example
In the example graph, there are three nodes and two relationships, five entities in total. Person and Movie are labels, ACTED_IN and DIRECTED are relationship types, and name, title, roles, and so on are properties of the nodes and relationships.
Entities include nodes and relationships. Nodes have labels and properties; relationships are directed, connect two nodes, and have properties and a relationship type.
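The example graph could be created with a Cypher statement along the following lines. This is a minimal sketch, not the original script behind the figure: the second person's name (Robert Zemeckis), the release year, and the role value are assumptions added for illustration, and only some of the property keys named in the text are filled in.

```cypher
// Three nodes and two relationships, mirroring the example graph.
// Assumption: 'Robert Zemeckis', 1994, and 'Forrest' are illustrative values.
CREATE (tom:Person {name: 'Tom Hanks'}),
       (director:Person {name: 'Robert Zemeckis'}),
       (movie:Movie {title: 'Forrest Gump', released: 1994}),
       (tom)-[:ACTED_IN {roles: ['Forrest']}]->(movie),
       (director)-[:DIRECTED]->(movie);
```

Each relationship is written with an arrow from its start node to its end node, and each carries exactly one type.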
1, Entity
The example graph contains three nodes: two Person nodes and one Movie node.
It contains two relationships:
- Two relationship types: ACTED_IN and DIRECTED;
- Two relationships: one connects the Person node whose name property is Tom Hanks to the Movie node, and the other connects the second Person node to the same Movie node, Forrest Gump.
One of the relationships is as follows:
2, Label
In a graph, labels are used to group nodes. A label works somewhat like a node type: nodes with the same label belong to the same group. A node can have zero, one, or more labels, so it can belong to more than one group. Querying by group narrows the set of nodes to search and improves query performance.
In the example graph there are two labels, Person and Movie: two nodes are labeled Person and one is labeled Movie. A label is a bit like a node type, except that a node can have more than one label.
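Grouping by label can be sketched in Cypher as follows. The queries assume a graph with Person nodes that have a name property; Actor is a hypothetical second label used only to show that a node can carry several labels.

```cypher
// Restrict the match to nodes labeled Person, so the planner can start
// from that group instead of considering every node in the graph.
MATCH (p:Person)
RETURN p.name;

// Add a second label to an existing node and list all of its labels.
// Assumption: Actor is a hypothetical label, not part of the example graph.
MATCH (p:Person {name: 'Tom Hanks'})
SET p:Actor
RETURN labels(p);
```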
3, Property
A property is a key-value pair that provides information about a node or relationship. By convention, a node's name property is often used to name it.
In the example graph, the Person nodes have two properties, name and title, and the Movie node has two properties, title and released.
The relationship of type ACTED_IN has one property, roles, which is an array; the relationship of type DIRECTED has no properties.
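Reading properties can be sketched as follows, assuming a graph shaped like the example (Person and Movie nodes connected by ACTED_IN and DIRECTED relationships).

```cypher
// Read properties from the nodes and from the relationship between them.
// r.roles is a list, because the roles property holds an array.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title, m.released, r.roles;

// keys() lists the property keys present on an entity;
// for a relationship without properties it returns an empty list.
MATCH (:Person)-[d:DIRECTED]->(:Movie)
RETURN keys(d);
```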
Third, traversal
Traversing a graph means visiting its nodes by following relationships according to their direction. Relationships are directed and connect two nodes; navigating step by step from a start node along relationships to an end node is called a traversal, and the ordered sequence of nodes and relationships visited along the way is called a path.
In the example graph, to find the movies Tom Hanks acted in, start at the Tom Hanks node and follow ACTED_IN relationships to target nodes labeled Movie.
Traversed path
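The traversal described above can be sketched in Cypher as a pattern match. It assumes the Tom Hanks node has a name property and the Movie node has a title property, as in the example graph.

```cypher
// Start at the Tom Hanks node, follow outgoing ACTED_IN relationships,
// and stop at nodes labeled Movie. The matched path is bound to a variable.
MATCH path = (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title, length(path) AS hops;
```

Here length(path) is the number of relationships in the path, so a direct connection between the two nodes counts as one hop.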
Fourth, the schema of the graph database
The schema usually refers to indexes, constraints, and statistics. A Neo4j database can be schema-less, since the schema is optional; if you do create a schema for the database, you gain better query performance and modeling benefits.
1, Index
A graph database can also create indexes to improve query performance. As in a relational database, an index is a redundant copy of part of the graph data: it speeds up searches at the cost of additional storage space and slower writes. Avoiding unnecessary indexes therefore reduces the performance penalty on data updates.
Neo4j creates an index on one or more properties of nodes with a given label. After the index is created, Neo4j updates it automatically whenever the graph data changes, so the index stays in sync with the data, and Neo4j automatically uses the index when a query filters on the indexed properties.
For example, use Cypher to create indexes:
CREATE INDEX ON :Person(firstname);
CREATE INDEX ON :Person(firstname, surname);
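To check whether an index exists and whether a query uses it, something like the following may help. This is a sketch under assumptions: it uses the same older syntax as the statements above (Neo4j 3.x and 4.x; newer versions use SHOW INDEXES and named indexes instead), and the value 'Tom' is only an illustrative lookup key.

```cypher
// List the indexes that currently exist in the database.
CALL db.indexes();

// EXPLAIN shows the execution plan without running the query; an
// index-backed lookup typically appears as a NodeIndexSeek operator.
EXPLAIN
MATCH (p:Person)
WHERE p.firstname = 'Tom'
RETURN p;

// Drop an index that is no longer needed, to avoid its write overhead.
DROP INDEX ON :Person(firstname);
```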
2, Constraint
In a graph database, you can create four types of constraints:
- Unique node property constraint: among nodes that have the specified label and the specified property, the property value must be unique;
- Node property existence constraint: every node with the specified label must have the specified property;
- Relationship property existence constraint: every relationship with the specified type must have the specified property;
- Node key constraint: for nodes with the specified label, all the specified properties must exist and the combination of their values must be unique.
For example, create constraints using Cypher:
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE;
CREATE CONSTRAINT ON (book:Book) ASSERT exists(book.isbn);
CREATE CONSTRAINT ON ()-[like:LIKED]-() ASSERT exists(like.day);
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY;
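The effect of the uniqueness constraint can be sketched as follows. The example assumes the constraint on the isbn property of Book nodes has been created; the isbn and title values are sample data only. Note that, as far as I know, property existence and node key constraints require Neo4j Enterprise Edition, so only the uniqueness constraint is exercised here.

```cypher
// The first insert succeeds.
CREATE (:Book {isbn: '1449356265', title: 'Graph Databases'});

// The second insert reuses the same isbn value, so it is expected to be
// rejected with a constraint violation error.
CREATE (:Book {isbn: '1449356265', title: 'Duplicate entry'});

// Drop the constraint again, using the same older syntax as above.
DROP CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE;
```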
3, Statistics
When you query a graph database with Cypher, the Cypher query is compiled into an execution plan, and the plan is executed to obtain the results. To generate a well-optimized execution plan, Neo4j needs to collect statistics. When the statistics change beyond a certain threshold, Neo4j regenerates the execution plan so that the Cypher query stays optimized. The statistics that Neo4j stores include:
- The number of nodes with a certain label.
- Selectivity per index.
- The number of relationships by type.
- The number of relationships by type, ending or starting from a node with a specific label.
By default, Neo4j updates statistics automatically, but not in real time. Because updating statistics can be time-consuming, Neo4j does it in the background and only when the amount of changed data reaches a certain threshold.
Neo4j keeps the statistics up to date in different ways. For label counts, for example, the number is updated whenever you set or remove a label from a node. For indexes, Neo4j needs to scan the full index to produce the selectivity number. Since this is potentially a very time-consuming operation, these numbers are collected in the background when enough data on the index has been changed.
Neo4j caches execution plans, and a cached plan is not regenerated until the statistics change sufficiently. Configuration options control how statistics are collected and when execution plans are rebuilt:
- dbms.index_sampling.background_enabled: whether index statistics are collected in the background. Because the execution plan of a Cypher query is generated from statistics, keeping index statistics up to date is important for generating well-optimized plans;
- dbms.index_sampling.update_percentage: the percentage of the index that must have been updated before its statistics are collected again;
- cypher.statistics_divergence_threshold: when statistics change, Neo4j does not immediately regenerate the execution plan of a Cypher query; it regenerates the plan only once the statistics have changed beyond this threshold.
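The settings above can be inspected, and index statistics refreshed by hand, with built-in procedures. This is a sketch that assumes a Neo4j 3.x or 4.x server, where these procedures exist with the signatures shown; newer versions identify an index by name rather than by the label and property string used here.

```cypher
// Show the current values of the index-sampling settings.
CALL dbms.listConfig('dbms.index_sampling');

// Schedule resampling of a single index...
CALL db.resampleIndex(':Person(firstname)');

// ...or of every index whose statistics are considered outdated.
CALL db.resampleOutdatedIndexes();
```

Resampling by hand is mainly useful after a bulk load, when waiting for the background threshold would leave the planner working from stale selectivity numbers.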