https://www.ibm.com/developerworks/cn/java/j-lo-neo4j/
Neo4j is a high-performance NoSQL graph database. It describes its data model with graph concepts, storing data as the nodes of a graph and the relationships between those nodes. In many applications the relationships between data items can be modeled directly as nodes and relationships in a graph, and for such applications storing data in Neo4j is natural and a better fit than a relational database. This article introduces Neo4j in depth and walks through concrete examples, so that you can understand it well enough to choose it as a storage option where it is appropriate.
Fu Cheng, senior software engineer
June 20, 2013
Neo4j Introduction
Data storage is an integral part of most application development. The data that an application produces and needs at run time is persisted in some specific format, and a common development task is converting between the application's own domain object model and that storage format. The closer the storage format is to the domain object model, the more natural and easier the mapping between them is. For a given application, the domain object model is determined by the nature of the application and is usually modeled in the most natural, intuitive way, so choosing a suitable data storage format matters. The most common storage format today is the relational database. A relational database is modeled with the entity-relationship (E-R) model, that is, as tables and the relationships between tables. Many relational database implementations, both open source and commercial, are available for real-world development. Relational databases are well suited to tabular data whose entries all have the same structure. If the relationships between objects in the domain object model are complex, however, tedious object-relational mapping (ORM) techniques are needed to convert between the two.
For many applications, the domain object model does not lend itself to storage in a relational format, which is why non-relational (NoSQL) databases have become popular. There are many kinds of NoSQL database, including key-value stores, document-oriented databases, and graph databases. Neo4j, the subject of this article, is one of the most important graph databases. Neo4j models data with the graph concept from data structures. Its two most basic concepts are nodes and edges: nodes represent entities, and edges represent the relationships between entities. Both nodes and edges can have their own properties. Entities linked by various relationships form a complex object graph, and Neo4j provides operations to search and traverse that graph.
For many applications, the domain object model is itself a graph. A graph database such as Neo4j suits these applications best because the cost of converting between models is minimal. Take a social networking application: users are the entities, and they are connected by different relationships such as kinship, friendship, and being colleagues. Different relationships carry different properties; a colleague relationship, for example, might include the company name, a start time, and an end time. Using Neo4j for storage in such an application is simple to implement and keeps later maintenance costs low.
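To make the model concrete, here is a minimal plain-Java sketch of a property graph for the social networking example. It does not use the Neo4j API; the names `SocialGraphSketch`, `Person`, `Relation`, and the property keys and values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a property graph; NOT the Neo4j API.
class SocialGraphSketch {

    // A node: an entity with arbitrary name-value properties.
    static class Person {
        final Map<String, Object> properties = new HashMap<>();

        Person(String name) {
            properties.put("name", name);
        }
    }

    // A directed, typed edge that carries its own properties.
    static class Relation {
        final Person start;
        final Person end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relation(Person start, Person end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    final List<Relation> relations = new ArrayList<>();

    // Connect two people with a relationship of the given type.
    Relation relate(Person start, Person end, String type) {
        Relation relation = new Relation(start, end, type);
        relations.add(relation);
        return relation;
    }

    // Count the relationships of a given type.
    long count(String type) {
        return relations.stream().filter(r -> r.type.equals(type)).count();
    }

    public static void main(String[] args) {
        SocialGraphSketch graph = new SocialGraphSketch();
        Person alice = new Person("Alice");
        Person bob = new Person("Bob");

        // The colleague relationship has properties of its own.
        Relation colleague = graph.relate(alice, bob, "COLLEAGUE");
        colleague.properties.put("company", "Example Corp");
        colleague.properties.put("startYear", 2010);

        graph.relate(alice, bob, "FRIEND");
        System.out.println(graph.count("COLLEAGUE") + " colleague relationship(s)");
    }
}
```

The point of the sketch is that the stored structure has the same shape as the domain model: there is no join table or mapping layer between "Alice is a colleague of Bob" and its representation.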
Because Neo4j models data with the graph, one of the most general data structures, its data model is highly expressive: linked lists, trees, and hash tables can all be abstracted as graphs. Neo4j also has the basic features expected of a database, including transaction support, high availability, and high performance. It has been used in many production environments, and the popular cloud application platform Heroku offers Neo4j as an optional add-on.
With that brief introduction, the sections below describe the basic use of Neo4j.
Basic Use of Neo4j
Before using Neo4j, you need to understand its basic concepts.
Nodes and relationships
The most basic concepts in Neo4j are the node and the relationship. A node represents an entity and is represented by the org.neo4j.graphdb.Node interface. Two nodes can be connected by different relationships, each represented by the org.neo4j.graphdb.Relationship interface. Every relationship has three elements: a start node, an end node, and a type. Having a start node and an end node makes a relationship directed, like an edge in a directed graph. In some cases, however, the direction carries no meaning and is ignored during processing. Every relationship has a type, which distinguishes relationships with different meanings; you specify the type when you create the relationship. A relationship type is represented by the org.neo4j.graphdb.RelationshipType interface. Nodes and relationships can both have properties. Each property is a simple name-value pair: the name is a String, and the value must be a primitive type, a String, or an array of primitives or Strings. A node or relationship can hold any number of properties. The methods for manipulating properties are declared in the org.neo4j.graphdb.PropertyContainer interface, which both Node and Relationship extend. Its commonly used methods include getProperty and setProperty, which get and set property values. The example below illustrates the use of nodes and relationships.
The example is a simple music catalog program that records information about singers, songs, and albums. Its entities are singers, songs, and albums, and its relationships are the publishing relationship between a singer and an album and the containment relationship between an album and a song. Listing 1 shows how the program uses Neo4j to manipulate these entities and relationships.
Listing 1. Using nodes and relationships

    private static enum RelationshipTypes implements RelationshipType {
        PUBLISH, CONTAIN
    }

    public void useNodeAndRelationship() {
        GraphDatabaseService db = new EmbeddedGraphDatabase("music");
        Transaction tx = db.beginTx();
        try {
            Node node1 = db.createNode();
            node1.setProperty("name", "singer 1");
            Node node2 = db.createNode();
            node2.setProperty("name", "album 1");
            // singer -[PUBLISH]-> album
            node1.createRelationshipTo(node2, RelationshipTypes.PUBLISH);
            Node node3 = db.createNode();
            node3.setProperty("name", "song 1");
            // album -[CONTAIN]-> song
            node2.createRelationshipTo(node3, RelationshipTypes.CONTAIN);
            tx.success();
        } finally {
            tx.finish();
        }
    }
In Listing 1, two relationship types are defined first. The usual way to define relationship types is to create an enum that implements the RelationshipType interface; PUBLISH and CONTAIN in RelationshipTypes represent the publishing and containment relationships. A Java program can start Neo4j in embedded mode simply by creating an instance of the org.neo4j.kernel.EmbeddedGraphDatabase class and specifying the directory in which the database files are stored. Operations that modify the database generally have to run inside a transaction. The createNode method of the GraphDatabaseService interface creates a new node, and the createRelationshipTo method of the Node interface creates a relationship from the current node to another node.
Another concept related to nodes and relationships is the path. A path has a start node followed by any number of relationship-node pairs, and is the result of a query or traversal over the object graph. Neo4j represents paths with the org.neo4j.graphdb.Path interface, which provides operations on the nodes and relationships the path contains: startNode and endNode return the start and end nodes, and nodes and relationships return Iterable objects for walking all the nodes and relationships in the path. Queries and traversals on the graph are described in detail in the following subsections.
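The structure of a path (one start node, then pairs of relationship and node) can be sketched in plain Java as follows. This is a simplified stand-in written for this article, not Neo4j's Path interface: the class name `PathSketch` and the `extend` method are hypothetical, and nodes and relationships are reduced to strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java model of a path: a start node followed by (relationship, node)
// pairs. Hypothetical illustration; NOT Neo4j's Path interface.
class PathSketch {
    private final List<String> nodes = new ArrayList<>();
    private final List<String> relationships = new ArrayList<>();

    PathSketch(String startNode) {
        nodes.add(startNode);
    }

    // Extend the path by one relationship and the node it leads to.
    PathSketch extend(String relationship, String node) {
        relationships.add(relationship);
        nodes.add(node);
        return this;
    }

    String startNode() {
        return nodes.get(0);
    }

    String endNode() {
        return nodes.get(nodes.size() - 1);
    }

    // The length of a path is its number of relationships.
    int length() {
        return relationships.size();
    }

    List<String> nodes() {
        return Collections.unmodifiableList(nodes);
    }

    List<String> relationships() {
        return Collections.unmodifiableList(relationships);
    }

    public static void main(String[] args) {
        PathSketch path = new PathSketch("singer 1")
                .extend("PUBLISH", "album 1")
                .extend("CONTAIN", "song 1");
        System.out.println(path.startNode() + " -> " + path.endNode()
                + ", length " + path.length());
    }
}
```

A path with n relationships always has n + 1 nodes, which is why the start node is stored separately from the pairs.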
Working with Indexes
When a Neo4j database contains many nodes, it is hard to quickly find the nodes that match given criteria. Neo4j can index nodes so that they can be found quickly by indexed value. Listing 2 shows the basic use of an index.
Listing 2. Using an index

    public void useIndex() {
        GraphDatabaseService db = new EmbeddedGraphDatabase("music");
        Index<Node> index = db.index().forNodes("nodes");
        Transaction tx = db.beginTx();
        try {
            Node node1 = db.createNode();
            String name = "singer 1";
            node1.setProperty("name", name);
            // Index the node under the key "name"
            index.add(node1, "name", name);
            node1.setProperty("gender", "male");
            tx.success();
        } finally {
            tx.finish();
        }
        Object result = index.get("name", "singer 1").getSingle()
                .getProperty("gender");
        System.out.println(result); // Output is "male"
    }
In Listing 2, the index method of the GraphDatabaseService interface returns an implementation of the org.neo4j.graphdb.index.IndexManager interface, which manages indexes. Neo4j supports indexing both nodes and relationships: the forNodes and forRelationships methods of IndexManager return a node index and a relationship index respectively. An index is represented by the org.neo4j.graphdb.index.Index interface; its add method adds a node or relationship to the index, and its get method looks up entries in the index by a given key and value.
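Conceptually, the add and get operations above behave like a two-level map from key to value to entities. The following plain-Java sketch shows that idea only; it assumes exact-match lookup, the class name `ExactIndexSketch` is hypothetical, and it says nothing about how Neo4j implements its indexes internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of an exact-match index; NOT Neo4j's Index implementation.
class ExactIndexSketch<T> {
    // key -> value -> entities indexed under that key/value pair
    private final Map<String, Map<Object, List<T>>> entries = new HashMap<>();

    // Register an entity under a key/value pair.
    void add(T entity, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new ArrayList<>())
               .add(entity);
    }

    // Look up all entities stored under the key/value pair; never returns null.
    List<T> get(String key, Object value) {
        Map<Object, List<T>> byValue = entries.get(key);
        if (byValue == null) {
            return Collections.emptyList();
        }
        List<T> hits = byValue.get(value);
        return hits == null ? Collections.<T>emptyList() : hits;
    }

    public static void main(String[] args) {
        ExactIndexSketch<String> index = new ExactIndexSketch<>();
        index.add("node-1", "name", "singer 1");
        index.add("node-2", "name", "album 1");
        System.out.println(index.get("name", "singer 1")); // prints [node-1]
    }
}
```

As in Listing 2, only values that were explicitly added are searchable: setting a property on an entity does not by itself put it into the index.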
Traversing the graph
The most useful operation on a graph is traversal, which yields information about how the nodes in the graph are related. Neo4j supports very complex traversal operations. Before traversing, you describe how the traversal should proceed. The description consists of the following elements.
- Paths to follow: usually expressed as relationship types and directions.
- Traversal order: the common orders are depth first and breadth first.
- Uniqueness: whether the same node, relationship, or path may appear more than once during the traversal.
- Decider: decides during the traversal whether to continue, and which paths to include in the result.
- Start node: the point at which the traversal begins.
In Neo4j, a traversal description is represented by the org.neo4j.graphdb.traversal.TraversalDescription interface, whose methods set the elements listed above. The class org.neo4j.kernel.Traversal provides factory methods that create TraversalDescription implementations. Listing 3 gives an example of a traversal.
Listing 3. A traversal operation

    TraversalDescription td = Traversal.description()
            .relationships(RelationshipTypes.PUBLISH)
            .relationships(RelationshipTypes.CONTAIN)
            .depthFirst()
            .evaluator(Evaluators.pruneWhereLastRelationshipTypeIs(
                    RelationshipTypes.CONTAIN));
    Node node = index.get("name", "singer 1").getSingle();
    Traverser traverser = td.traverse(node);
    for (Path path : traverser) {
        System.out.println(path.endNode().getProperty("name"));
    }
In Listing 3, the description method of the Traversal class first creates a default traversal description. The relationships method of the TraversalDescription interface sets the relationship types the traversal may follow, and the depthFirst method selects depth-first order. The more complex part is the evaluator method, which sets the decider for the traversal. Its parameter is an implementation of the org.neo4j.graphdb.traversal.Evaluator interface. The Evaluator interface has a single method, evaluate, which takes a Path representing the current traversal path and returns a value of the enum org.neo4j.graphdb.traversal.Evaluation indicating how to proceed. The decision has two parts: whether to include the current node in the result, and whether to continue traversing from it. An Evaluator implementation makes this decision from the current path and returns the corresponding Evaluation value. The class org.neo4j.graphdb.traversal.Evaluators provides utility methods that create common Evaluator implementations. Listing 3 uses its pruneWhereLastRelationshipTypeIs method, which returns an Evaluator that looks at the type of the last relationship in the path: if that type matches the given one, the traversal does not continue along that path.
The traversal in Listing 3 finds all the songs a singer has published. It starts at the node representing the singer and follows relationships of the two types RelationshipTypes.PUBLISH and RelationshipTypes.CONTAIN, depth first. If the last relationship on the current path is of type RelationshipTypes.CONTAIN, the last node on the path holds song information and the traversal of that path can stop. The traverse method of the TraversalDescription interface starts the traversal from a given node. The result is represented by the org.neo4j.graphdb.traversal.Traverser interface, from which all the paths in the result can be obtained. The end node of such a path is the entity that represents a song.
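The two-part decision (include or exclude, continue or prune) and its effect on a depth-first walk can be sketched in plain Java. Everything here is a hypothetical stand-in rather than the Neo4j API: `TraversalSketch`, `Edge`, and the helper methods are invented for the example, and the enum only mirrors the four include/continue combinations. The evaluator policy (include and prune on a CONTAIN edge, otherwise exclude and continue) is chosen for this sketch to match the stated goal of returning only songs; it is not a claim about what Neo4j's built-in evaluators return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Plain-Java sketch of an evaluator-driven depth-first traversal; NOT the Neo4j API.
class TraversalSketch {

    // Two independent decisions per path: include it? keep expanding it?
    enum Evaluation {
        INCLUDE_AND_CONTINUE(true, true), INCLUDE_AND_PRUNE(true, false),
        EXCLUDE_AND_CONTINUE(false, true), EXCLUDE_AND_PRUNE(false, false);

        final boolean includes;
        final boolean continues;

        Evaluation(boolean includes, boolean continues) {
            this.includes = includes;
            this.continues = continues;
        }
    }

    // A directed, typed edge between two named nodes.
    static class Edge {
        final String from, type, to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Depth-first walk from start; returns the end nodes of included paths.
    // The evaluator receives the last edge of the path (null at the start node).
    static List<String> traverse(List<Edge> edges, String start,
                                 Function<Edge, Evaluation> evaluator) {
        List<String> result = new ArrayList<>();
        visit(edges, start, null, evaluator, new HashSet<>(), result);
        return result;
    }

    private static void visit(List<Edge> edges, String node, Edge lastEdge,
                              Function<Edge, Evaluation> evaluator,
                              Set<String> visited, List<String> result) {
        if (!visited.add(node)) {
            return; // node uniqueness: never visit a node twice
        }
        Evaluation evaluation = evaluator.apply(lastEdge);
        if (evaluation.includes) {
            result.add(node);
        }
        if (!evaluation.continues) {
            return; // pruned: do not expand this path further
        }
        for (Edge edge : edges) {
            if (edge.from.equals(node)) {
                visit(edges, edge.to, edge, evaluator, visited, result);
            }
        }
    }

    // Sketch policy: stop at and include nodes reached over a CONTAIN edge.
    static Function<Edge, Evaluation> songsOnly() {
        return last -> last != null && last.type.equals("CONTAIN")
                ? Evaluation.INCLUDE_AND_PRUNE
                : Evaluation.EXCLUDE_AND_CONTINUE;
    }

    static List<Edge> sample() {
        return Arrays.asList(
                new Edge("singer 1", "PUBLISH", "album 1"),
                new Edge("album 1", "CONTAIN", "song 1"),
                new Edge("album 1", "CONTAIN", "song 2"));
    }

    public static void main(String[] args) {
        System.out.println(traverse(sample(), "singer 1", songsOnly()));
    }
}
```

Swapping the evaluator for one that always returns INCLUDE_AND_CONTINUE would return every node reachable from the start node, which shows that the walk itself and the decision about what to keep are separate concerns.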
Neo4j in Practice
With the basics covered, this section illustrates the use of Neo4j through a concrete case. As a database, Neo4j can be used in web application development just as easily as widely used relational databases such as MySQL, SQL Server, and DB2. The difference lies in how the application's data is modeled to fit the back-end storage. The same domain model can be mapped to an E-R model in a relational database or to a graph model in a graph database. For some applications the graph mapping is more natural, because the relationships among objects in the domain model form a complex graph.
The example used in this section is a simple microblogging application. It has two main entities, users and messages. Users can follow one another, which forms a graph. Users also publish microblog messages, and the entities that represent those messages are part of the graph as well. From this perspective, a graph database such as Neo4j describes the application's domain model well.
As with relational databases, you can use Neo4j either through its own API or through a third-party framework. The Spring Data project, part of the Spring ecosystem, provides good support for Neo4j and can be used in application development. Spring Data wraps CRUD operations, index use, and graph traversal for Neo4j in a more abstract, easier-to-use API, and uses annotations to reduce the amount of code developers have to write. The example code accesses Neo4j through Spring Data. The steps below show how to use Spring Data with Neo4j.
Development environment
Setting up a development environment for Neo4j is straightforward. Download the Neo4j JAR and the JARs it depends on from the address given in the references, and add them to the Java program's CLASSPATH. It is recommended, however, to manage the Neo4j dependencies with a tool such as Maven or Gradle.
Defining a data storage model
As mentioned earlier, the application has two kinds of entities, users and messages, and both must be defined as nodes in the object graph. The way entities are created in Listing 1 is not intuitive: no dedicated classes represent the entities, which makes later maintenance more costly. Spring Data lets you declare Neo4j nodes by adding annotations to ordinary Java classes. Simply add the org.springframework.data.neo4j.annotation.NodeEntity annotation to the class, as shown in Listing 4.
Listing 4. Declaring a node class with the NodeEntity annotation

    @NodeEntity
    public class User {
        @GraphId Long id;
        @Indexed String loginName;
        String displayName;
        String email;
    }
As Listing 4 shows, the User class represents users as nodes in the graph. The fields of the User class automatically become properties of the node. The annotation org.springframework.data.neo4j.annotation.GraphId marks a field as the identifier of the entity; the field can only be of type Long. The annotation org.springframework.data.neo4j.annotation.Indexed adds an index on the field.
Relationships between nodes are also declared with annotations, as shown in Listing 5.
Listing 5. Declaring a relationship class with the RelationshipEntity annotation

    @RelationshipEntity(type = "FOLLOW")
    public class Follow {
        @StartNode User follower;
        @EndNode User followed;
        Date followingDate = new Date();
    }
In Listing 5, the type attribute of the RelationshipEntity annotation gives the type of the relationship, and the StartNode and EndNode annotations mark the start and end nodes of the relationship respectively.
A class that represents an entity can also hold references to related nodes, such as the additional fields of the User class shown in Listing 6.
Listing 6. References to related nodes in the User class

    @RelatedTo(type = "FOLLOW", direction = Direction.INCOMING)
    @Fetch Set<User> followers = new HashSet<User>();

    @RelatedTo(type = "FOLLOW", direction = Direction.OUTGOING)
    @Fetch Set<User> followed = new HashSet<User>();

    @RelatedToVia(type = "PUBLISH")
    Set<Publish> messages = new HashSet<Publish>();
As Listing 6 shows, the RelatedTo annotation marks nodes that are connected to the current node through a relationship. Because relationships are directed, the direction attribute of RelatedTo declares the direction. For the current user node, if the end node of a FOLLOW relationship is the current node, then the user at the start node of that relationship is a follower of the current user; this is expressed as "direction = Direction.INCOMING". The followers field therefore holds the set of the current user's followers, and the followed field holds the set of users that the current user follows. The RelatedToVia annotation is similar to RelatedTo, except that the annotated field holds the relationship entities themselves (here, Publish objects) rather than the nodes at the other end. The messages field therefore contains the PUBLISH relationships, and through them the messages, published by the current user.
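How direction separates followers from followed users can be shown with a small plain-Java sketch. It does not use Spring Data Neo4j; `FollowDirectionSketch`, its nested types, and the user names are hypothetical and exist only to illustrate the incoming and outgoing cases described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of relationship direction; NOT the Spring Data Neo4j API.
class FollowDirectionSketch {

    enum Direction { INCOMING, OUTGOING }

    // A directed FOLLOW relationship: follower -> followed.
    static class Follow {
        final String follower;
        final String followed;

        Follow(String follower, String followed) {
            this.follower = follower;
            this.followed = followed;
        }
    }

    // Users at the other end of the FOLLOW relationships that touch `user`
    // in the given direction.
    static List<String> related(List<Follow> follows, String user, Direction direction) {
        List<String> result = new ArrayList<>();
        for (Follow f : follows) {
            if (direction == Direction.INCOMING && f.followed.equals(user)) {
                result.add(f.follower);   // relationship ends at `user`
            } else if (direction == Direction.OUTGOING && f.follower.equals(user)) {
                result.add(f.followed);   // relationship starts at `user`
            }
        }
        return result;
    }

    static List<Follow> sample() {
        return Arrays.asList(
                new Follow("alice", "bob"),
                new Follow("carol", "bob"),
                new Follow("bob", "dave"));
    }

    public static void main(String[] args) {
        System.out.println("followers of bob: " + related(sample(), "bob", Direction.INCOMING));
        System.out.println("followed by bob:  " + related(sample(), "bob", Direction.OUTGOING));
    }
}
```

The same set of FOLLOW relationships answers both questions; only the end of the relationship that is matched against the current user changes.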
Data manipulation
Once the data storage model is defined, you need classes that operate on the data. These operate on instances of the node and relationship classes in the model, and cover the usual CRUD operations (create, read, update, and delete) as well as index lookups and graph traversals. Because such operations are implemented in much the same way every time, Spring Data encapsulates them behind a simple interface. The core interface for data operations is org.springframework.data.neo4j.repository.GraphRepository, which extends three interfaces that provide different functions:
- org.springframework.data.neo4j.repository.CRUDRepository provides methods such as save, delete, findOne, and findAll for basic CRUD operations.
- org.springframework.data.neo4j.repository.IndexRepository provides methods such as findByPropertyValue, findAllByPropertyValue, and findAllByQuery for lookups based on indexes.
- org.springframework.data.neo4j.repository.TraversalRepository provides the findAllByTraversal method, which performs a traversal described by a TraversalDescription.
Spring Data provides a default implementation of the GraphRepository interface. In most cases you only declare an interface that extends GraphRepository, and Spring Data creates an object of the corresponding implementation class at run time. Listing 7 shows UserRepository, the interface for operating on the User node class.
Listing 7. The UserRepository interface for the User class

    public interface UserRepository extends GraphRepository<User> {
    }
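An interface with no hand-written implementation, like the one in Listing 7, can still be given an implementation at run time. As a rough illustration of that general idea, the sketch below uses the JDK's java.lang.reflect.Proxy to back a repository-style interface with an in-memory map. It is not how Spring Data is implemented, which is considerably more elaborate; `RepositoryProxySketch` and `NoteRepository` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Sketch of creating an implementation for an interface at run time with a
// JDK dynamic proxy. Illustrative only; NOT Spring Data's actual mechanism.
class RepositoryProxySketch {

    // Hypothetical repository interface with no hand-written implementation.
    interface NoteRepository {
        void save(Long id, String note);

        String findOne(Long id);
    }

    // Build an implementation at run time, backed by an in-memory map.
    static NoteRepository create() {
        final Map<Long, String> store = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            // Dispatch on the name of the interface method being called.
            switch (method.getName()) {
                case "save":
                    store.put((Long) args[0], (String) args[1]);
                    return null;
                case "findOne":
                    return store.get((Long) args[0]);
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (NoteRepository) Proxy.newProxyInstance(
                NoteRepository.class.getClassLoader(),
                new Class<?>[] { NoteRepository.class },
                handler);
    }

    public static void main(String[] args) {
        NoteRepository repository = create();
        repository.save(1L, "hello");
        System.out.println(repository.findOne(1L)); // prints hello
    }
}
```

The caller only ever sees the interface, which is what lets a framework supply the behavior without the application declaring an implementing class.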
As Listing 7 shows, UserRepository extends GraphRepository, and the generic type parameter declares that it operates on the User class. Operating on node classes is fairly simple; operating on relationship classes is more involved. Listing 8 shows PublishRepository, the interface for operating on the publishing relationship.
Listing 8. The PublishRepository interface for the Publish class

    public interface PublishRepository extends GraphRepository<Publish> {
        @Query("start user1=node({0}) "
                + "match user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage "
                + "return r2")
        List<Publish> getFollowingUserMessages(User user);

        @Query("start user=node({0}) match user-[r:PUBLISH]->message return r")
        List<Publish> getOwnMessages(User user);
    }
In Listing 8, the getFollowingUserMessages method returns the messages published by all the users that a given user follows. It is implemented as a traversal of the graph. The traversal is described in Cypher, Neo4j's query language, which you attach to a repository method by adding the org.springframework.data.neo4j.annotation.Query annotation. In the query for getFollowingUserMessages, "node({0})" refers to the node passed as the method's first argument; "start user1=node({0})" starts the traversal from that node and names it user1. "match" specifies the pattern that the selected nodes must satisfy. The pattern "user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage" first follows relationships of type FOLLOW to find the users that user1 follows, named user2, and then follows relationships of type PUBLISH to find the messages published by user2. "return" specifies the result of the traversal: r2 is a relationship of type PUBLISH, matching the return type List<Publish> of the getFollowingUserMessages method.
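The two-step pattern in that query amounts to a nested match over relationships: first the FOLLOW edges leaving user1, then the PUBLISH edges leaving each followed user. The plain-Java sketch below spells this out over a flat list of edges. It is only an illustration of what the pattern selects, not of how Neo4j executes queries; `FollowedMessagesSketch`, `Edge`, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the pattern
//   user1-[:FOLLOW]->user2-[r2:PUBLISH]->followedMessage  return r2
// NOT a query engine and NOT the Neo4j API.
class FollowedMessagesSketch {

    // A directed, typed edge between two named nodes.
    static class Edge {
        final String from, type, to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Returns the PUBLISH edges (r2) of every user that user1 follows.
    static List<Edge> followedMessages(List<Edge> edges, String user1) {
        List<Edge> result = new ArrayList<>();
        for (Edge follow : edges) {
            // Step 1: user1-[:FOLLOW]->user2
            if (!follow.type.equals("FOLLOW") || !follow.from.equals(user1)) {
                continue;
            }
            String user2 = follow.to;
            for (Edge publish : edges) {
                // Step 2: user2-[r2:PUBLISH]->followedMessage
                if (publish.type.equals("PUBLISH") && publish.from.equals(user2)) {
                    result.add(publish);
                }
            }
        }
        return result;
    }

    static List<Edge> sample() {
        return Arrays.asList(
                new Edge("alice", "FOLLOW", "bob"),
                new Edge("bob", "PUBLISH", "m1"),
                new Edge("bob", "PUBLISH", "m2"),
                new Edge("carol", "PUBLISH", "m3"),
                new Edge("alice", "PUBLISH", "m4"));
    }

    public static void main(String[] args) {
        for (Edge r2 : followedMessages(sample(), "alice")) {
            System.out.println(r2.from + " published " + r2.to);
        }
    }
}
```

Messages from users that alice does not follow, and alice's own messages, are not returned, because neither is reachable through a FOLLOW edge that starts at alice.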
Use in the application
Once the interfaces for data operations are defined, you can use them in the application's service-layer code. Listing 9 shows the operation performed when a user posts a new microblog message.
Listing 9. Posting a new microblog message

    @Autowired
    UserRepository userRepository;

    // messageRepository is used below but not declared in this listing; it is
    // presumably injected in the same way as userRepository.

    @Transactional
    public void publish(User user, String content) {
        Message message = new Message(content);
        messageRepository.save(message);
        user.publish(message);
        userRepository.save(user);
    }
As Listing 9 shows, the publish method posts a microblog message with the given content for a user. The userRepository field is a reference to the UserRepository interface; the Spring IoC container injects it automatically at run time, and Spring Data supplies the implementation of the interface. The publish method first creates a Message entity object representing the message node and saves it to the database with the save method. It then calls the publish method of the User class, shown in Listing 10, which creates an instance of the Publish class to represent the publishing relationship and so connects the user and message entities. Finally, the user object is saved to persist the update.
Listing 10. The publish method of the User class

    @RelatedToVia(type = "PUBLISH")
    Set<Publish> messages = new HashSet<Publish>();

    public Publish publish(Message message) {
        Publish publish = new Publish(this, message);
        this.messages.add(publish);
        return publish;
    }
Once the service-layer classes are in place, you can expose them as JSON-based REST services and build the application's front end on top of those services. The front-end implementation is unrelated to Neo4j and is not covered here. The whole program is built on the Spring Framework. Spring Data provides a dedicated configuration namespace for Neo4j, which makes it easy to configure Neo4j in the Spring configuration file. Listing 11 shows the Neo4j-related part of the Spring configuration file.
Listing 11. Spring configuration for Neo4j

    <neo4j:config storeDirectory="data/neo-mblog.db"/>
    <neo4j:repositories base-package="com.chengfu.neomblog.repository"/>
In Listing 11, the config element sets the directory in which the Neo4j database stores its data, and the repositories element declares the Java package that contains the sub-interfaces of GraphRepository used to operate on the node and relationship classes. Spring Data scans that package at run time and creates an implementation object for each interface it finds.
The complete code for the sample application is on GitHub; see the references.
Using the Neo4j native API
If you do not use the Neo4j support in Spring Data, you can develop against Neo4j's native API. Because the native API works at a lower level of abstraction, however, it is less convenient to use. Listing 12 illustrates its basic use with the scenario of a user posting a microblog message.
Listing 12. Using the Neo4j native API

    public void publish(String username, String message) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("mblog");
        Index<Node> index = db.index().forNodes("nodes");
        // Look up the user node through the index
        Node userNode = index.get("user-loginName", username).getSingle();
        if (userNode != null) {
            Transaction tx = db.beginTx();
            try {
                Node messageNode = db.createNode();
                messageNode.setProperty("message", message);
                userNode.createRelationshipTo(messageNode, RelationshipTypes.PUBLISH);
                tx.success();
            } finally {
                tx.finish();
            }
        }
    }
As Listing 12 shows, the basic pattern with the native API is to look up the node that represents the user through a Neo4j index, create a node that represents the microblog message, and then create a relationship between the two nodes. Each step uses Neo4j's basic API.
Compared with the Spring Data implementation of the same functionality in Listings 9 and 10, the native API code is noticeably more complex. Spring Data is therefore recommended for real-world development.
Summary
For a long time, relational databases have been the first choice for data storage in most applications. As technology has developed, more and more NoSQL databases have become popular. Application developers should not reach for a relational database by default, but should choose the storage approach that best fits the characteristics of the application. Neo4j uses the graph to describe the relationships between data, and is well suited to applications whose data is itself organized as a graph. This article has introduced the use of Neo4j in detail to help developers understand and use it.
Graph Database Neo4j: Development in Practice