Graph database Neo4j

Last Update:2018-06-03 Source: Internet

Author: User

Tags neo4j



Introduction to Neo4j

Data storage is an indispensable part of almost every application. The data an application generates and needs while running must be persisted in some format. A common task in application development is converting between the application's domain object model and its data storage format. The closer the storage format is to the domain object model, the more natural the required mapping is and the easier it is to implement. For a specific application, the domain object model is determined by the characteristics of the application and is usually built in the most natural and intuitive way. It is therefore important to choose a suitable data storage format. Currently, the most common storage format is the relational database. Relational databases model data with the entity-relationship (E-R) model, that is, through tables and the relationships between them. Many relational database implementations are available, both open-source and commercial. Relational databases are well suited to tabular data in which every entry has the same structure. If the relationships between objects in the domain object model are complex, however, you need tedious object-relational mapping (ORM) technology to convert between the two.

For many applications, the domain object model does not fit well into a relational database. This is one reason why NoSQL databases are becoming popular. NoSQL databases include key-value stores, document-oriented databases, and graph databases. Neo4j, the subject of this article, is one of the best-known graph databases. Neo4j models data with the graph concept from data structures. Its two basic concepts are nodes and edges (called relationships in Neo4j). Nodes represent entities, and edges represent relationships between entities. Both nodes and relationships can have their own properties. Entities connected by different relationships form complex object graphs, and Neo4j can search and traverse these graphs.

For many applications, the domain object model is itself a graph. Such applications are best served by a graph database such as Neo4j, because the cost of model conversion is minimal. Take social-network applications as an example. Users, the entities of the application, are connected to each other through different relationships, such as relatives, friends, and colleagues. Different relationships have different properties; for example, a colleague relationship includes the company name, start time, and end time. For such applications, storing data in Neo4j is not only simple to implement but also keeps maintenance costs low.

Neo4j models data with a very general data structure, the graph, which makes its data model highly expressive. Data structures such as linked lists, trees, and hash tables can all be abstracted as graphs. Neo4j also has the basic features of general-purpose databases, including transaction support, high availability, and high performance. Neo4j has been used in many production environments, and the popular cloud application platform Heroku offers Neo4j as an optional add-on.

Having briefly introduced Neo4j, we now describe its basic usage.

Basic use of Neo4j

Before using Neo4j, you must first understand the basic concepts in Neo4j.

Nodes and relationships

The most basic concepts in Neo4j are nodes and relationships. Nodes represent entities and are modeled by the org.neo4j.graphdb.Node interface. Two nodes can be connected by different relationships, which are modeled by the org.neo4j.graphdb.Relationship interface. Each relationship consists of three elements: a start node, an end node, and a type. Because every relationship has a start node and an end node, it is directed, like an edge in a directed graph. In some cases, however, the direction is not meaningful and is ignored during processing. Every relationship has a type that distinguishes the different kinds of relationships between nodes, and you must specify the type when you create a relationship. Relationship types are represented by the org.neo4j.graphdb.RelationshipType interface. Both nodes and relationships can have their own properties. Each property is a simple name-value pair: the name is a String, and the value can only be a primitive type, a String, or an array of primitives or Strings. A node or relationship can have any number of properties. The methods for working with properties are declared in the org.neo4j.graphdb.PropertyContainer interface, which both Node and Relationship extend. Commonly used PropertyContainer methods include getProperty and setProperty for reading and setting property values. The following example shows how to use nodes and relationships.

The example is a simple song information management program that records information about singers, songs, and albums. Its entities are singers, songs, and albums. Its relationships are the publishing relationship between singers and albums and the inclusion relationship between albums and songs. Listing 1 shows how the program uses Neo4j to work with these entities and relationships.

Listing 1. Example of using nodes and relationships

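// Types: org.neo4j.graphdb.{GraphDatabaseService, Node, RelationshipType, Transaction}
// and org.neo4j.kernel.EmbeddedGraphDatabase
private static enum RelationshipTypes implements RelationshipType {
    PUBLISH, CONTAIN
}

public void useNodeAndRelationship() {
    // Embedded database stored in the "music" directory
    GraphDatabaseService db = new EmbeddedGraphDatabase("music");
    Transaction tx = db.beginTx();
    try {
        Node node1 = db.createNode();
        node1.setProperty("name", "singer 1");
        Node node2 = db.createNode();
        node2.setProperty("name", "album 1");
        node1.createRelationshipTo(node2, RelationshipTypes.PUBLISH);
        Node node3 = db.createNode();
        node3.setProperty("name", "song 1");
        node2.createRelationshipTo(node3, RelationshipTypes.CONTAIN);
        tx.success(); // mark the transaction as successful
    } finally {
        tx.finish();  // commit if success() was called, otherwise roll back
    }
    db.shutdown();    // release the store
}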

Listing 1 first defines two relationship types. The usual way to define relationship types is an enum that implements the RelationshipType interface. Here, RelationshipTypes.PUBLISH and RelationshipTypes.CONTAIN represent the publishing and inclusion relationships, respectively. A Java program can run Neo4j as an embedded database: you only need to create an org.neo4j.kernel.EmbeddedGraphDatabase object and specify the directory where the database files are stored. Modifications to a Neo4j database generally need to be performed inside a transaction. The createNode method of the GraphDatabaseService interface creates a new node, and the createRelationshipTo method of the Node interface creates a relationship from the current node to another node.
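Listing 1 stores only String values, but as described above, property values may also be primitives or arrays of primitives or Strings. The following plain-Java sketch expresses that rule as a check. It is not Neo4j code: the class name PropertyValueRule is hypothetical, and treating boxed wrapper objects (Integer, Long, and so on) as primitive values is an assumption of this illustration.

```java
import java.util.Set;

// Hypothetical sketch of the property-value rule described above; NOT Neo4j's own validation.
// Allowed: boxed primitives, String, arrays of primitives, and String[].
final class PropertyValueRule {
    // Boxed forms of Java's eight primitive types, plus String.
    private static final Set<Class<?>> SCALAR_TYPES = Set.of(
            Boolean.class, Byte.class, Short.class, Integer.class, Long.class,
            Float.class, Double.class, Character.class, String.class);

    private PropertyValueRule() {}

    static boolean isAllowed(Object value) {
        if (value == null) {
            return false; // in this sketch, a property must have a value
        }
        Class<?> type = value.getClass();
        if (type.isArray()) {
            Class<?> component = type.getComponentType();
            // e.g. int[] or String[]; arrays of other objects are rejected
            return component.isPrimitive() || component == String.class;
        }
        return SCALAR_TYPES.contains(type);
    }
}
```

For example, isAllowed("singer 1") and isAllowed(new int[] {1, 2}) return true, while isAllowed(new java.util.Date()) returns false.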

Another concept related to nodes and relationships is the path. A path consists of a start node followed by a sequence of alternating relationships and nodes. Paths are the results of queries and traversals on the graph. In Neo4j, a path is represented by the org.neo4j.graphdb.Path interface, which provides operations on the nodes and relationships it contains: startNode and endNode return the start and end nodes, and nodes and relationships return Iterable objects for iterating over all of the path's nodes and relationships. The following sections describe how to query and traverse a graph.
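The shape of a path (a start node, then alternating relationships and nodes) can be sketched with two lists. The class SimplePath below is hypothetical and uses strings for nodes and relationships. It only mirrors the operations mentioned above (startNode, endNode, nodes, relationships) and is not Neo4j's Path implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a path's shape (not org.neo4j.graphdb.Path):
// a start node followed by alternating relationship/node pairs.
final class SimplePath {
    private final List<String> nodes = new ArrayList<>();
    private final List<String> relationships = new ArrayList<>();

    SimplePath(String startNode) {
        nodes.add(startNode);
    }

    // Extend the path by one relationship and the node it leads to.
    SimplePath append(String relationship, String node) {
        relationships.add(relationship);
        nodes.add(node);
        return this;
    }

    String startNode() { return nodes.get(0); }

    String endNode() { return nodes.get(nodes.size() - 1); }

    // Length counts relationships, so a single-node path has length 0.
    int length() { return relationships.size(); }

    List<String> nodes() { return Collections.unmodifiableList(nodes); }

    List<String> relationships() { return Collections.unmodifiableList(relationships); }
}
```

For the music example, new SimplePath("singer 1").append("PUBLISH", "album 1").append("CONTAIN", "song 1") has start node "singer 1", end node "song 1", and length 2.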

Using Indexes

When a Neo4j database contains a large number of nodes, it is hard to find the nodes that meet given conditions quickly. Neo4j can index nodes so that matching nodes can be located quickly by index value. Listing 2 shows the basic usage of indexes.

Listing 2. Index usage example

// Index is org.neo4j.graphdb.index.Index; other types as in Listing 1
public void useIndex() {
    GraphDatabaseService db = new EmbeddedGraphDatabase("music");
    Index<Node> index = db.index().forNodes("nodes");
    Transaction tx = db.beginTx();
    try {
        Node node1 = db.createNode();
        String name = "singer 1";
        node1.setProperty("name", name);
        index.add(node1, "name", name); // index entries are added explicitly
        node1.setProperty("gender", "male");
        tx.success();
    } finally {
        tx.finish();
    }
    Object result = index.get("name", "singer 1").getSingle().getProperty("gender");
    System.out.println(result); // output: "male"
}

In Listing 2, the index method of the GraphDatabaseService interface returns an implementation of the org.neo4j.graphdb.index.IndexManager interface, which manages indexes. Neo4j can index both nodes and relationships: the forNodes and forRelationships methods of IndexManager return indexes on nodes and on relationships, respectively. The add method of the org.neo4j.graphdb.index.Index interface adds a node or relationship to the index, and the get method searches the index for a given key and value.
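Conceptually, an index of this kind maps a key-value pair to the entities added under it. The sketch below is a hypothetical in-memory stand-in (SimpleIndex, with nodes represented by long IDs), not Neo4j's index implementation. As in Listing 2, entries exist only when add is called explicitly, so a property that is set but never added (like gender) cannot be found through the index. Returning null for no match and throwing on an ambiguous match in getSingle is a choice made for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical in-memory stand-in for an exact-match node index (not Neo4j code).
// Nodes are represented by their long IDs.
final class SimpleIndex {
    // key -> value -> IDs of the nodes added under that key/value pair
    private final Map<String, Map<Object, Set<Long>>> entries = new HashMap<>();

    // Entries are only created by explicit add calls.
    void add(long nodeId, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new LinkedHashSet<>())
               .add(nodeId);
    }

    // All nodes added under key=value (empty if none).
    Set<Long> get(String key, Object value) {
        Set<Long> hits = entries.getOrDefault(key, Collections.emptyMap())
                                .getOrDefault(value, Collections.emptySet());
        return Collections.unmodifiableSet(hits);
    }

    // The only matching node, or null if there is none; an ambiguous match throws.
    Long getSingle(String key, Object value) {
        Set<Long> hits = get(key, value);
        if (hits.isEmpty()) {
            return null;
        }
        if (hits.size() > 1) {
            throw new NoSuchElementException("more than one node for " + key + "=" + value);
        }
        return hits.iterator().next();
    }
}
```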

Graph Traversal

The most useful operation on a graph is traversal. Traversal reveals how the nodes in a graph are related to one another. Neo4j supports very complex traversal operations. Before traversing, you must describe how the traversal should proceed. A traversal description consists of the following elements.

Traversal path: usually expressed by relationship types and directions.
Traversal order: the common orders are depth-first and breadth-first.
Uniqueness: whether the traversal may visit the same node, relationship, or path more than once.
Evaluator: decides during the traversal whether to continue and which results to return.
Start node: the point where the traversal begins.
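The difference between the two traversal orders listed above shows up even on a small graph. The following plain-Java sketch is illustrative only: TraversalOrder is a hypothetical class, and the graph is an adjacency map rather than a Neo4j database. A visited set ensures that no node is returned twice, which corresponds to one of the uniqueness options above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch comparing depth-first and breadth-first order on a directed
// graph given as an adjacency map. A visited set ensures no node is returned twice.
final class TraversalOrder {
    private TraversalOrder() {}

    static List<String> depthFirst(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        visit(graph, start, new HashSet<>(), order);
        return order;
    }

    // Recursive pre-order visit: go as deep as possible before trying siblings.
    private static void visit(Map<String, List<String>> graph, String node,
                              Set<String> visited, List<String> order) {
        if (!visited.add(node)) {
            return; // already visited
        }
        order.add(node);
        for (String next : graph.getOrDefault(node, List.of())) {
            visit(graph, next, visited, order);
        }
    }

    // Queue-based visit: finish each level before moving to the next one.
    static List<String> breadthFirst(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>(List.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            order.add(node);
            for (String next : graph.getOrDefault(node, List.of())) {
                if (visited.add(next)) {
                    queue.addLast(next);
                }
            }
        }
        return order;
    }
}
```

Take a singer with album1 (containing song1 and song2) and album2 (containing song3). Depth-first order is singer, album1, song1, song2, album2, song3. Breadth-first order is singer, album1, album2, song1, song2, song3.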

In Neo4j, a traversal description is represented by the org.neo4j.graphdb.traversal.TraversalDescription interface, whose methods set each of the elements listed above. The org.neo4j.kernel.Traversal class provides factory methods that create TraversalDescription implementations. Listing 3 shows an example of a traversal.

Listing 3. Examples of traversal operations

// Uses RelationshipTypes (Listing 1) and index (Listing 2).
// Traversal is org.neo4j.kernel.Traversal; Evaluators, TraversalDescription, and
// Traverser are in org.neo4j.graphdb.traversal; Node and Path are in org.neo4j.graphdb
TraversalDescription td = Traversal.description()
        .relationships(RelationshipTypes.PUBLISH)
        .relationships(RelationshipTypes.CONTAIN)
        .depthFirst()
        .evaluator(Evaluators.pruneWhereLastRelationshipTypeIs(RelationshipTypes.CONTAIN));
Node node = index.get("name", "singer 1").getSingle();
Traverser traverser = td.traverse(node);
for (Path path : traverser) {
    System.out.println(path.endNode().getProperty("name"));
}

In Listing 3, the description method of the Traversal class creates a default traversal description. The relationships method of the TraversalDescription interface sets the relationship types that the traversal may follow, and the depthFirst method selects depth-first order. The more complex part is the evaluator method, which sets the decision maker for the traversal. Its parameter is an implementation of the org.neo4j.graphdb.traversal.Evaluator interface. The Evaluator interface has a single method, evaluate, whose parameter is a Path object representing the current traversal path. The evaluate method returns a value of the enum org.neo4j.graphdb.traversal.Evaluation, which specifies how to proceed. Each value combines two decisions: whether to include the current path in the results, and whether to continue traversing beyond it. An Evaluator implementation must make these decisions based on the current path and return the appropriate Evaluation value. The org.neo4j.graphdb.traversal.Evaluators class provides utility methods that create commonly used Evaluator implementations. Listing 3 uses its pruneWhereLastRelationshipTypeIs method, which returns an Evaluator that decides based on the type of the last relationship in the traversal path: if that type matches the given type, the traversal does not continue beyond that path.

The traversal in Listing 3 finds all songs published by a singer. It starts from the node representing the singer and follows relationships of types RelationshipTypes.PUBLISH and RelationshipTypes.CONTAIN in depth-first order. If the last relationship of the current path is of type RelationshipTypes.CONTAIN, the last node of the path is a song and the traversal stops going further along that path. The traverse method of the TraversalDescription interface runs the traversal from a given node. The result is represented by the org.neo4j.graphdb.traversal.Traverser interface, from which all paths in the result can be obtained. The end node of a path in the result represents a song.
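The evaluator mechanism of Listing 3 can be illustrated without Neo4j. In the plain-Java sketch below, every type (Evaluation, Rel, Path) is a hypothetical stand-in for the Neo4j types described above. The sketch follows only outgoing relationships and performs no uniqueness check, so it assumes an acyclic graph. Its stopAtLastType evaluator includes and prunes paths whose last relationship has the given type and excludes but continues every other path, so only song paths are returned. This include/exclude choice is an assumption of the sketch and may differ from that of Neo4j's pruneWhereLastRelationshipTypeIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of evaluator-driven, depth-first traversal
// (NOT org.neo4j.graphdb.traversal). Outgoing relationships only, no uniqueness check.
final class EvaluatorTraversal {
    // Each evaluation answers two questions: include this path? continue past it?
    enum Evaluation {
        INCLUDE_AND_CONTINUE(true, true),
        INCLUDE_AND_PRUNE(true, false),
        EXCLUDE_AND_CONTINUE(false, true),
        EXCLUDE_AND_PRUNE(false, false);

        final boolean includes;
        final boolean continues;

        Evaluation(boolean includes, boolean continues) {
            this.includes = includes;
            this.continues = continues;
        }
    }

    // A directed, typed relationship between two named nodes.
    record Rel(String start, String type, String end) {}

    // A path: the start node plus the relationships walked so far.
    record Path(String startNode, List<Rel> rels) {
        String endNode() {
            return rels.isEmpty() ? startNode : rels.get(rels.size() - 1).end();
        }

        Optional<String> lastRelationshipType() {
            return rels.isEmpty() ? Optional.empty() : Optional.of(rels.get(rels.size() - 1).type());
        }
    }

    private EvaluatorTraversal() {}

    // Include and stop at paths whose last relationship has the given type;
    // exclude but keep expanding every other path.
    static Function<Path, Evaluation> stopAtLastType(String type) {
        return path -> path.lastRelationshipType().filter(type::equals).isPresent()
                ? Evaluation.INCLUDE_AND_PRUNE
                : Evaluation.EXCLUDE_AND_CONTINUE;
    }

    static List<Path> traverse(List<Rel> graph, Set<String> allowedTypes, String start,
                               Function<Path, Evaluation> evaluator) {
        List<Path> results = new ArrayList<>();
        walk(graph, allowedTypes, new Path(start, List.of()), evaluator, results);
        return results;
    }

    private static void walk(List<Rel> graph, Set<String> allowedTypes, Path path,
                             Function<Path, Evaluation> evaluator, List<Path> results) {
        Evaluation evaluation = evaluator.apply(path);
        if (evaluation.includes) {
            results.add(path);
        }
        if (!evaluation.continues) {
            return; // prune: do not expand this path any further
        }
        for (Rel rel : graph) {
            if (rel.start().equals(path.endNode()) && allowedTypes.contains(rel.type())) {
                List<Rel> extended = new ArrayList<>(path.rels());
                extended.add(rel);
                walk(graph, allowedTypes, new Path(path.startNode(), extended), evaluator, results);
            }
        }
    }
}
```

Consider the relationships singer 1 -PUBLISH-> album 1, album 1 -CONTAIN-> song 1, and album 1 -CONTAIN-> song 2. Traversing from "singer 1" with stopAtLastType("CONTAIN") returns two paths, ending at "song 1" and "song 2".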

Neo4j Development

Having introduced the basic usage of Neo4j, we now look at a concrete case. As a database, Neo4j can be used in web application development just as easily as relational databases such as MySQL, SQL Server, and DB2. The difference lies in how the application's data is modeled to fit the back-end storage. The same domain model can be mapped either to an E-R model in a relational database or to a graph model in a graph database. For some applications, mapping to a graph model is more natural because the relationships between objects in the domain model form a complex graph structure.

The example in this section is a simple Weibo (microblogging) application. It has two main types of entities: users and messages. Users can follow each other, forming a graph. Users publish Weibo messages, and the message entities are also part of the graph. From this perspective, a graph database such as Neo4j describes the application's domain model better.

As with relational databases, you can use Neo4j either through its own APIs or through a third-party framework. The Spring Data project of the Spring framework provides good support for Neo4j. It wraps CRUD operations on a Neo4j database, as well as indexing and graph traversal, in more abstract and easier-to-use APIs, and its annotations reduce the amount of code developers need to write. The sample code uses Spring Data to access the Neo4j database. The following steps show how to use Spring Data with Neo4j.

Development Environment

Setting up a development environment for Neo4j is simple. Download the Neo4j jar and the jars it depends on from the address given in the references, and add them to the CLASSPATH of the Java program. However, we recommend using Maven or Gradle to manage the Neo4j dependencies.

Define a data storage model

As mentioned above, the application has two types of entities: users and messages. Both must be defined as nodes in the object graph. The way objects are created in Listing 1 is not intuitive, and there is no dedicated class for each kind of object, which leads to high maintenance costs. Spring Data lets you declare Neo4j nodes by annotating Java classes: you only need to add the org.springframework.data.neo4j.annotation.NodeEntity annotation to the class, as shown in Listing 4.

Listing 4. Using NodeEntity annotation to declare a node class

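import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;

@NodeEntity
public class User {
    @GraphId
    Long id;            // entity identifier

    @Indexed
    String loginName;   // indexed, so users can be looked up by login name

    String displayName;
    String email;
}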

As shown in Listing 4, the User class represents a user as a node in the graph. The fields of the User class automatically become properties of the node. The org.springframework.data.neo4j.annotation.GraphId annotation marks the field used as the entity identifier, which must be of type Long. The org.springframework.data.neo4j.annotation.Indexed annotation adds an index on the field.

The relationships between nodes are also declared using annotations, as shown in listing 5.

Listing 5. Using the RelationshipEntity annotation to declare a relationship class

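import java.util.Date;
import org.springframework.data.neo4j.annotation.EndNode;
import org.springframework.data.neo4j.annotation.RelationshipEntity;
import org.springframework.data.neo4j.annotation.StartNode;

@RelationshipEntity(type = "FOLLOW")
public class Follow {
    @StartNode
    User follower;   // the user who follows

    @EndNode
    User followed;   // the user being followed

    Date followingDate = new Date();
}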

In Listing 5, the type attribute of the RelationshipEntity annotation specifies the relationship type, and the StartNode and EndNode annotations mark the start and end nodes of the relationship, respectively.

A class that represents an entity can also hold references to related nodes, as the additional fields of the User class in Listing 6 show.

Listing 6. References to associated nodes in the User class

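// Additional fields of User; Direction is org.neo4j.graphdb.Direction
@RelatedTo(type = "FOLLOW", direction = Direction.INCOMING)
@Fetch
Set<User> followers = new HashSet<User>();       // users who follow this user

@RelatedTo(type = "FOLLOW", direction = Direction.OUTGOING)
@Fetch
Set<User> followed = new HashSet<User>();        // users this user follows

@RelatedToVia(type = "PUBLISH")
Set<Publish> messages = new HashSet<Publish>();  // PUBLISH relationship entities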

As shown in Listing 6, the RelatedTo annotation declares the nodes related to the current node through a given type of relationship. Because relationships are directed, the direction attribute of RelatedTo declares which direction to follow. For the current user node, a FOLLOW relationship whose end node is the current node means that the user at its start node follows the current user; this is expressed as direction = Direction.INCOMING. Therefore, the followers field holds the users who follow the current user, and the followed field holds the users whom the current user follows. The RelatedToVia annotation is similar to RelatedTo, except that the annotated field holds the relationship entities themselves (here, Publish objects) rather than the nodes at the other end. The messages field therefore holds the PUBLISH relationships through which the current user published messages.
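The effect of the direction attribute can be reproduced in plain Java. In this hypothetical sketch (FollowDirection is not part of Spring Data), each Follow record is a FOLLOW relationship from follower to followed. Collecting by end node corresponds to Direction.INCOMING (followers), and collecting by start node corresponds to Direction.OUTGOING (followed users).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each Follow is a directed FOLLOW relationship
// from follower (start node) to followed (end node).
final class FollowDirection {
    record Follow(String follower, String followed) {}

    private FollowDirection() {}

    // Like Direction.INCOMING: relationships that end at the user give its followers.
    static Set<String> followers(List<Follow> follows, String user) {
        Set<String> result = new TreeSet<>();
        for (Follow f : follows) {
            if (f.followed().equals(user)) {
                result.add(f.follower());
            }
        }
        return result;
    }

    // Like Direction.OUTGOING: relationships that start at the user give whom it follows.
    static Set<String> followed(List<Follow> follows, String user) {
        Set<String> result = new TreeSet<>();
        for (Follow f : follows) {
            if (f.follower().equals(user)) {
                result.add(f.followed());
            }
        }
        return result;
    }
}
```

If alice and carol follow bob and bob follows alice, followers(follows, "bob") is {alice, carol} and followed(follows, "bob") is {alice}.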

Data Operations

After defining the data storage model, you need classes that operate on the data. These operations act on the nodes and relationship instances in the data model. They include the common CRUD operations (create, read, update, and delete) as well as index-based searches and graph traversals. Because these operations are implemented in similar ways, Spring Data wraps them in simple interfaces. The core data operation interface in Spring Data Neo4j is org.springframework.data.neo4j.repository.GraphRepository. It extends three interfaces that provide different functions. org.springframework.data.neo4j.repository.CRUDRepository provides methods such as save, delete, findOne, and findAll for basic CRUD operations. org.springframework.data.neo4j.repository.IndexRepository provides methods such as findByPropertyValue, findAllByPropertyValue, and findAllByQuery for index-based searches. org.springframework.data.neo4j.repository.TraversalRepository provides the findAllByTraversal method, which performs a traversal described by a TraversalDescription.
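To make these categories of operations concrete, here is a hypothetical in-memory repository. It is not Spring Data's implementation and does not share its signatures: findOne returns an Optional here, and findAllByPropertyValue takes a property accessor and scans all entities instead of using an index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical in-memory repository, NOT Spring Data's implementation.
// It groups the same kinds of operations: CRUD plus lookup by property value.
final class InMemoryRepository<T> {
    private final Map<Long, T> store = new LinkedHashMap<>();
    private final Function<T, Long> idOf; // how to read an entity's ID

    InMemoryRepository(Function<T, Long> idOf) {
        this.idOf = idOf;
    }

    // Create or update: stores the entity under its ID.
    T save(T entity) {
        store.put(idOf.apply(entity), entity);
        return entity;
    }

    Optional<T> findOne(long id) {
        return Optional.ofNullable(store.get(id));
    }

    List<T> findAll() {
        return new ArrayList<>(store.values());
    }

    void delete(long id) {
        store.remove(id);
    }

    // Analogue of an index-based lookup; this sketch simply scans all entities.
    List<T> findAllByPropertyValue(Function<T, ?> property, Object value) {
        List<T> hits = new ArrayList<>();
        for (T entity : store.values()) {
            if (Objects.equals(property.apply(entity), value)) {
                hits.add(entity);
            }
        }
        return hits;
    }
}
```

A Spring Data repository differs mainly in that its implementation is created at runtime and is backed by the Neo4j database.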

Spring Data provides a default implementation of the GraphRepository interface. In most cases, you only need to declare an interface that extends GraphRepository, and Spring Data creates an implementation object at runtime. Listing 7 shows the UserRepository interface for operating on User nodes.

Listing 7. The UserRepository interface for the User class

import org.springframework.data.neo4j.repository.GraphRepository;

public interface UserRepository extends GraphRepository<User> {
}

As shown in Listing 7, UserRepository extends GraphRepository and uses its generic type parameter to declare that it operates on the User class. Operations on node classes are relatively simple, while operations on relationship classes are more complex. Listing 8 shows the PublishRepository interface for operating on publishing relationships.

Listing 8. PublishRepository interface for Publish operations

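import java.util.List;
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;

public interface PublishRepository extends GraphRepository<Publish> {

    // PUBLISH relationships of all users that the given user follows
    @Query("start user1=node({0}) " +
           " match user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage" +
           " return r2")
    List<Publish> getFollowingUserMessages(User user);

    // PUBLISH relationships of the given user's own messages
    @Query("start user=node({0}) match user-[r:PUBLISH]->message return r")
    List<Publish> getOwnMessages(User user);
}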

In Listing 8, the getFollowingUserMessages method retrieves the messages published by all users that a given user follows. It is implemented as a traversal of the graph. Spring Data Neo4j lets you describe such traversals in Cypher, Neo4j's query language: adding the org.springframework.data.neo4j.annotation.Query annotation to a method declares the query that the method runs. Take the query of the getFollowingUserMessages method as an example. "node({0})" refers to the node passed as the method's first parameter, so "start user1=node({0})" starts the query from that node and names it user1. "match" specifies the pattern that the selected nodes must satisfy. The pattern "user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage" first finds the users that user1 follows, named user2, and then follows PUBLISH relationships to find the messages published by user2. "return" specifies the query result: r2 refers to the PUBLISH relationships, which matches the method's return type, List<Publish>.
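What the match pattern selects can be spelled out as two nested loops over relationships. The following plain-Java sketch is hypothetical (FollowingFeed and its Rel record are not part of Spring Data or Neo4j) and only illustrates the pattern user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage. For simplicity it scans every relationship, which says nothing about how Neo4j actually executes the query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Cypher pattern in Listing 8:
//   start user1=node({0}) match user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage return r2
final class FollowingFeed {
    record Rel(String start, String type, String end) {}

    private FollowingFeed() {}

    static List<Rel> followingUserMessages(List<Rel> graph, String user1) {
        List<Rel> result = new ArrayList<>();
        for (Rel follow : graph) {
            // user1-[:FOLLOW]->user2
            if (!follow.type().equals("FOLLOW") || !follow.start().equals(user1)) {
                continue;
            }
            String user2 = follow.end();
            for (Rel r2 : graph) {
                // user2-[r2:PUBLISH]->followedMessage; collect r2, as "return r2" does
                if (r2.type().equals("PUBLISH") && r2.start().equals(user2)) {
                    result.add(r2);
                }
            }
        }
        return result;
    }
}
```

Suppose alice follows bob, bob published m1 and m2, and carol published m3. For alice, the sketch returns the two PUBLISH relationships bob to m1 and bob to m2.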

Use in applications

After defining the data operation interfaces, you can use them in the application's service layer. Listing 9 shows the method for publishing a new Weibo message.

Listing 9. How users publish new Weibo posts

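@Autowired
UserRepository userRepository;

@Autowired
MessageRepository messageRepository; // assumed: a GraphRepository<Message>, like Listing 7

@Transactional
public void publish(User user, String content) {
    Message message = new Message(content);
    messageRepository.save(message);   // persist the new Message node
    user.publish(message);             // create the PUBLISH relationship (Listing 10)
    userRepository.save(user);         // persist the updated user
}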

As shown in Listing 9, the publish method publishes a Weibo message with the given content on behalf of a user. The userRepository field is a reference to the UserRepository interface. The Spring IoC container injects it automatically at runtime, and Spring Data provides its implementation. The publish method first creates a Message object to represent the message node and saves it to the database with the save method. The publish method of the User class, shown in Listing 10, then creates a Publish instance that represents the publishing relationship between the user and the message. Finally, the user object is saved.

Listing 10. publish method of the User class

@RelatedToVia(type = "PUBLISH")
Set<Publish> messages = new HashSet<Publish>();

public Publish publish(Message message) {
    // Create the PUBLISH relationship entity: this user -> message
    Publish publish = new Publish(this, message);
    this.messages.add(publish);
    return publish;
}

After creating the service layer classes, you can expose them as REST services that use JSON and build the application's front end on top of these services. The front-end implementation is unrelated to Neo4j. The whole program is built on the Spring framework. Spring Data provides a dedicated configuration namespace for Neo4j that makes it easy to configure Neo4j in Spring configuration files. Listing 11 shows the Neo4j-related Spring configuration.

Listing 11. Spring configuration file of Neo4j

In Listing 11, the config element sets the storage directory of the Neo4j database, and the repositories element declares the package that contains the interfaces extending GraphRepository. Spring Data scans this package at runtime and creates implementation objects for the interfaces it contains.

The complete code of the sample application is stored on GitHub. For more information, see references.

Using the Neo4j Native API

If you do not use the Neo4j support provided by Spring Data, you can develop directly against Neo4j's native API. However, the native API has a low level of abstraction and is less convenient to use. Listing 12 shows the basic usage of the native API for the scenario in the sample application where a user publishes a Weibo message.

Listing 12. Using Neo4j native API

// Types as in Listings 1 and 2; RelationshipTypes is an enum like the one in Listing 1
public void publish(String username, String message) {
    GraphDatabaseService db = new EmbeddedGraphDatabase("mblog");
    try {
        Index<Node> index = db.index().forNodes("nodes");
        // Find the user node by login name
        Node userNode = index.get("user-loginName", username).getSingle();
        if (userNode != null) {
            Transaction tx = db.beginTx();
            try {
                Node messageNode = db.createNode();
                messageNode.setProperty("message", message);
                userNode.createRelationshipTo(messageNode, RelationshipTypes.PUBLISH);
                tx.success();
            } finally {
                tx.finish();
            }
        }
    } finally {
        db.shutdown(); // release the store so it can be opened again
    }
}

Listing 12 shows the basic pattern of the native API: first find the node representing the user through a Neo4j index, then create a node representing the Weibo message, and finally create a relationship between the two nodes. All of these steps use Neo4j's basic APIs.

Compared with the Spring Data version in Listings 9 and 10, the code that uses the native API is considerably more complex, while the Spring Data code is much simpler. Spring Data is therefore recommended for actual development.

Summary

For a long time, relational databases have been the primary choice for data storage in most applications. As technology develops, more and more NoSQL databases are becoming popular. Instead of defaulting to a relational database, application developers should choose the storage method that best suits the characteristics of their application. Neo4j uses graphs to describe the relationships between data, which makes it well suited to applications whose data is itself organized as a graph. This article introduced the use of the Neo4j database in detail to help developers understand and use it.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
