1. Summary
Querying a complex graph database is very difficult for non-expert users, because graph databases often have complex schemas and the same information can be described in many different ways.
A good graph query engine should support multiple conversions (synonyms, acronyms, abbreviations, ontologies, and so on) and should rank the search results well.
To address this problem, the paper proposes a new query framework that makes querying easier and spares users the painful work of constructing exact query graphs.
2. Application Background
2.1 Application
Graph databases are a popular way of storing data: knowledge graphs, information networks and social networks are commonly stored in them. However, graph data is often schemaless, or its schema is too complex, and the same information can be described in many ways. Together these make querying graph data very difficult, and for the average user they are a real deterrent.
Figure 2-1 A shows part of a graph database. Suppose the query asks for actors who are about 30 years old and are associated with "University of California Berkeley" and "Mission: Impossible". From Figure 2-1 it is easy to see that the green and yellow parts are good results. Figure 2-1 B is a query graph that expresses these semantics, but existing graph database queries can find only the green part, or nothing at all.
The reason is that the node labels in the query do not literally match those in the database: existing query methods either support no semantic transformation or support only a single kind of transformation.
Figure 2-1 Graph Database G
The problem solved in the paper can be stated as follows: given a query Q and a graph database G, find all subgraphs of G into which Q can be transformed by the conversion functions.
2.2 Problem Definition
Given a query Q, a graph database G and a set of conversion functions L, find the top-K matches of Q in G. L contains all the conversions listed in Table 2-1.
Table 2-1 Conversion functions supported by the paper
Note: the method can easily be extended with other conversion functions to meet different needs.
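To make the extensibility claim concrete, here is a minimal sketch of conversion functions as pluggable 0/1 indicators: each returns 1 if a query label can be turned into a data label, else 0. The three conversions shown (acronym, abbreviation, synonym), their matching rules, the stop-word list and the tiny synonym table are hypothetical illustrations, not the paper's implementations of Table 2-1.

```python
from typing import Callable, Dict, List

# A conversion function returns 1 if query label q can be turned into data label d, else 0.
Conversion = Callable[[str, str], int]

STOPWORDS = {"of", "the", "and", "for"}  # hypothetical: skipped when building acronyms

def acronym(q: str, d: str) -> int:
    """1 if q is the acronym of d, e.g. 'UCB' for 'University of California Berkeley'."""
    initials = "".join(w[0] for w in d.split() if w.lower() not in STOPWORDS).upper()
    letters = "".join(c for c in q if c.isalnum()).upper()
    return int(len(initials) > 1 and letters == initials)

def abbreviation(q: str, d: str) -> int:
    """1 if each token of q (trailing dot removed) is a prefix of the matching token of d."""
    qt, dt = q.lower().split(), d.lower().split()
    return int(q.lower() != d.lower() and len(qt) == len(dt)
               and all(b.startswith(a.rstrip(".")) for a, b in zip(qt, dt)))

# Hypothetical synonym table; a real system would use a thesaurus or ontology.
SYNONYMS = {frozenset({"film", "movie"}), frozenset({"actor", "performer"})}

def synonym(q: str, d: str) -> int:
    """1 if q and d are listed as synonyms (case-insensitive)."""
    return int(frozenset({q.lower(), d.lower()}) in SYNONYMS)

# Adding a new conversion only requires adding an entry here.
CONVERSIONS: Dict[str, Conversion] = {
    "acronym": acronym,
    "abbreviation": abbreviation,
    "synonym": synonym,
}

def applicable(q: str, d: str) -> List[str]:
    """Names of the conversions i with f_i(q, d) = 1."""
    return [name for name, f in CONVERSIONS.items() if f(q, d)]
```

For example, `applicable("UCB", "University of California Berkeley")` yields `["acronym"]`, and `acronym("M:I", "Mission: Impossible")` yields 1.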
3. Existing Methods
Spark only requires the user to enter keywords, without specifying complex relationships between graph nodes, to obtain query results. However, it supports only string-similarity matching, although it could be modified to support other transformations.
NeMa supports both graph-structure matching and string-similarity matching (Jaccard similarity).
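To illustrate why string similarity alone falls short, here is a minimal sketch of Jaccard similarity over the word sets of two labels. Word-level tokenization is an assumption of this sketch; NeMa's exact tokenization and thresholds are not described here.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the lower-cased word sets of two labels."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty labels are treated as identical
    return len(sa & sb) / len(sa | sb)
```

"University of California Berkeley" and "UC Berkeley" share only one of five distinct words (similarity 0.2), and "UCB" shares none (similarity 0.0), even though an acronym conversion relates them directly.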
4. Methods of the Paper
4.1 Offline Phase
4.1.1 Metric function
In the formulas below, V is the set of query nodes, E the set of query edges, and φ(·) denotes a match: for example, φ(v) is the node of graph database G that is matched to query node v. If v can be transformed into φ(v) by the i-th conversion function, then f_i(v, φ(v)) = 1; otherwise f_i(v, φ(v)) = 0.
Node matching cost:
Edge matching cost:
Graph matching function:
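Assuming, as the parameter vector w = {α1, α2, ..., β1, β2, ...} in 4.1.2 suggests, that the node and edge costs are weighted sums of the 0/1 conversion indicators (g_j is introduced here as a hypothetical edge analogue of f_i), a plausible form of the three costs is given below. The paper's exact formulas may differ, for instance by wrapping the sum in a normalized exponential.

```latex
\begin{aligned}
F_V\big(v, \varphi(v)\big) &= \sum_i \alpha_i \, f_i\big(v, \varphi(v)\big) \\
F_E\big(e, \varphi(e)\big) &= \sum_j \beta_j \, g_j\big(e, \varphi(e)\big) \\
p\big(\varphi(Q)\big) &= \sum_{v \in V} F_V\big(v, \varphi(v)\big) + \sum_{e \in E} F_E\big(e, \varphi(e)\big)
\end{aligned}
```

Under this reading, α_i and β_j act as the costs of using conversion i on a node and conversion j on an edge.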
The lower the value of the graph matching function p, the better the match φ(Q) of query Q; the query result is therefore the K matches with the lowest p.
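The ranking rule can be sketched as follows, assuming p is the sum of weighted node and edge conversion costs and that candidate matches are already given as explicit query-node to data-node maps. The dictionaries `node_conv` and `edge_conv` (which conversions relate each query element to each data element) and the conversion names are hypothetical inputs for illustration; the brute-force enumeration is exactly what the heuristics of 4.2 avoid.

```python
import heapq
import math
from typing import Dict, List, Tuple

Match = Dict[str, str]  # phi: query node -> data node

def node_cost(v: str, u: str, node_conv, alpha: Dict[str, float]) -> float:
    """F_V(v, u) = sum of alpha_i over conversions i with f_i(v, u) = 1; inf if none applies."""
    convs = node_conv.get((v, u))
    return math.inf if convs is None else sum(alpha[c] for c in convs)

def graph_cost(phi: Match, q_edges, node_conv, edge_conv, alpha, beta) -> float:
    """p(phi(Q)) = sum of node costs + sum of edge costs."""
    p = sum(node_cost(v, u, node_conv, alpha) for v, u in phi.items())
    for v1, v2 in q_edges:
        convs = edge_conv.get(((v1, v2), (phi[v1], phi[v2])))
        p += math.inf if convs is None else sum(beta[c] for c in convs)
    return p

def top_k(candidates: List[Match], k: int, q_edges, node_conv, edge_conv,
          alpha, beta) -> List[Tuple[float, Match]]:
    """The K candidate matches with the lowest p, skipping impossible (infinite-cost) ones."""
    scored = [(graph_cost(phi, q_edges, node_conv, edge_conv, alpha, beta), phi)
              for phi in candidates]
    finite = [(p, phi) for p, phi in scored if p < math.inf]
    return heapq.nsmallest(k, finite, key=lambda t: t[0])
```

Passing `key` to `heapq.nsmallest` keeps it from ever comparing the match dictionaries themselves when two costs are equal.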
4.1.2 Parameter Estimation
Let w = {α1, α2, ..., β1, β2, ...}; then
where T represents the training set.
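One common way to learn w, offered here as an assumption rather than the paper's stated objective, is a log-linear model in which a match has probability proportional to exp(−p), with w chosen to maximize the log-likelihood of the correct matches in T. Because p is linear in w, each match can be summarized by a count vector F (how often each node or edge conversion is used), so that p = w · F. The candidate lists, learning rate, L2 penalty and plain gradient ascent are all choices of this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Each training example: (F of the correct match, F of every candidate match incl. the correct one).
# F[k] counts how often conversion k (a node alpha_i or edge beta_j) is used, so p = w . F.
Example = Tuple[Sequence[float], List[Sequence[float]]]

def cost(w: Sequence[float], feats: Sequence[float]) -> float:
    return sum(wk * fk for wk, fk in zip(w, feats))

def _softmin_weights(w, candidates):
    """Unnormalised exp(-p) for each candidate, shifted by the minimum cost for stability."""
    costs = [cost(w, c) for c in candidates]
    m = min(costs)
    return [math.exp(-(c - m)) for c in costs], m

def log_likelihood(w, data: List[Example]) -> float:
    total = 0.0
    for correct, candidates in data:
        weights, m = _softmin_weights(w, candidates)
        log_z = -m + math.log(sum(weights))  # log sum_c exp(-p(c))
        total += -cost(w, correct) - log_z   # log P(correct | Q)
    return total

def gradient(w, data: List[Example], l2: float) -> List[float]:
    g = [-l2 * wk for wk in w]               # from the -(l2/2) * ||w||^2 penalty
    for correct, candidates in data:
        weights, _ = _softmin_weights(w, candidates)
        z = sum(weights)
        for k in range(len(w)):
            expected = sum(q * c[k] for q, c in zip(weights, candidates)) / z
            g[k] += expected - correct[k]    # d log P / d w_k
    return g

def fit(data: List[Example], n_features: int, lr: float = 0.1,
        steps: int = 200, l2: float = 0.01) -> List[float]:
    w = [0.0] * n_features
    for _ in range(steps):
        g = gradient(w, data, l2)
        w = [wk + lr * gk for wk, gk in zip(w, g)]  # gradient ascent
    return w
```

If the training matches keep using conversion 0 while the rejected candidates use conversion 1, the learned cost of conversion 0 ends up lower than that of conversion 1.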
4.1.3 Cold start
The purpose of the cold start is to generate a good training set of queries from which the parameters of the matching function can be learned. The cold-start steps are:
(1) Randomly sample some subgraphs from the graph database as query templates Q';
(2) Apply conversion functions to some nodes and edges of a query template to obtain a query Q;
(3) Extract the subgraphs Qe that exactly match Q;
(4) The pairs (Q, Q') and (Q, Qe) form the training set.
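The four steps can be sketched on a toy graph as follows. The data model (node id to label, undirected edges), the conversion table, the random-growth sampling and the backtracking exact matcher are all assumptions made for illustration; in particular only node labels are converted here, not edges.

```python
import random
from typing import Dict, List, Set, Tuple

# Toy graph: node id -> label, plus undirected edges (hypothetical data model).
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]

# Hypothetical conversion table standing in for Table 2-1.
CONVERSIONS = {"University of California Berkeley": "UC Berkeley",
               "Mission: Impossible": "M:I"}

def has_edge(edges, a, b) -> bool:
    return (a, b) in edges or (b, a) in edges

def sample_template(g: Graph, size: int, rng: random.Random) -> Graph:
    """Step (1): grow a random connected subgraph Q' of up to `size` nodes."""
    labels, edges = g
    chosen = [rng.choice(sorted(labels))]
    while len(chosen) < size:
        frontier = sorted({b for a in chosen for b in labels
                           if b not in chosen and has_edge(edges, a, b)})
        if not frontier:
            break
        chosen.append(rng.choice(frontier))
    sub_edges = {(a, b) for (a, b) in edges if a in chosen and b in chosen}
    return {n: labels[n] for n in chosen}, sub_edges

def transform(template: Graph) -> Graph:
    """Step (2): rewrite node labels with a conversion where one applies."""
    labels, edges = template
    return {n: CONVERSIONS.get(lab, lab) for n, lab in labels.items()}, set(edges)

def exact_matches(q: Graph, g: Graph) -> List[Dict[int, int]]:
    """Step (3): all label- and edge-preserving injective embeddings of q in g (backtracking)."""
    q_labels, q_edges = q
    g_labels, g_edges = g
    order = sorted(q_labels)
    results: List[Dict[int, int]] = []

    def extend(assign: Dict[int, int]) -> None:
        if len(assign) == len(order):
            results.append(dict(assign))
            return
        v = order[len(assign)]
        for u, lab in g_labels.items():
            if lab != q_labels[v] or u in assign.values():
                continue
            # every query edge to an already-assigned node must exist in g
            if all(has_edge(g_edges, u, assign[w]) for w in assign if has_edge(q_edges, v, w)):
                assign[v] = u
                extend(assign)
                del assign[v]

    extend({})
    return results

def cold_start(g: Graph, n_queries: int, size: int, seed: int = 0):
    """Step (4): collect (Q, Q') and (Q, Qe) pairs as (query, match map, source) triples."""
    rng = random.Random(seed)
    training = []
    for _ in range(n_queries):
        template = sample_template(g, size, rng)
        query = transform(template)
        training.append((query, {n: n for n in template[0]}, "template"))  # (Q, Q')
        for match in exact_matches(query, g):                              # (Q, Qe)
            training.append((query, match, "exact"))
    return training
```

The template's own nodes give the match for (Q, Q'), since Q was derived from Q' by conversion, while step (3) finds the database subgraphs that match the converted labels literally.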
4.2 Online Query Processing
General graph querying is NP-hard: it contains the subgraph isomorphism problem, which is NP-complete, as a special case, so subgraph isomorphism can be reduced to it.
The paper therefore designs two heuristics to solve the problem.
4.2.1 Heuristic 1
If the cost of the graph match is distributed over its nodes, the matching score of each node can represent the matching cost of the whole graph.
The score of each node is computed as:
Here m_ji^(t)(u_i) denotes the score that node u_j contributes to the matching of node u_i in iteration t.
Figure 4-1 illustrates the intuition behind the formula. In both panels A and B, the left side represents the database and the right side represents the query graph.
Figure 4-1 Intuition behind Heuristic 1
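The per-iteration messages m_ji^(t)(u_i) described in 4.2.1 resemble loopy belief propagation. Below is a minimal sketch of the standard min-sum form of that update, assuming costs as in 4.1.1 (lower is better). It is not necessarily the paper's exact formula. A message from query node j to query node i scores each candidate u of i by the cheapest way of matching j (and, recursively, j's other neighbours). Messages are not normalised, and on cyclic queries the resulting scores are only approximate.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

def min_sum_scores(
    q_nodes: List[Hashable],
    q_edges: List[Tuple[Hashable, Hashable]],
    candidates: Dict[Hashable, List[Hashable]],            # query node -> candidate data nodes
    node_cost: Callable[[Hashable, Hashable], float],       # F_V(v, u)
    edge_cost: Callable[[Hashable, Hashable, Hashable, Hashable], float],  # (j, i, w, u)
    iterations: int = 10,
) -> Dict[Hashable, Dict[Hashable, float]]:
    neighbours: Dict[Hashable, List[Hashable]] = {v: [] for v in q_nodes}
    for a, b in q_edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # m[(j, i)][u]: message from query node j to query node i about candidate u of i
    m = {(j, i): {u: 0.0 for u in candidates[i]} for i in q_nodes for j in neighbours[i]}
    for _ in range(iterations):
        new_m = {}
        for (j, i) in m:
            new_m[(j, i)] = {
                u: min((node_cost(j, w) + edge_cost(j, i, w, u)
                        + sum(m[(k, j)][w] for k in neighbours[j] if k != i)
                        for w in candidates[j]), default=math.inf)
                for u in candidates[i]
            }
        m = new_m
    # score of matching query node i to u: own cost plus all incoming messages
    return {i: {u: node_cost(i, u) + sum(m[(j, i)][u] for j in neighbours[i])
                for u in candidates[i]}
            for i in q_nodes}
```

On a tree-shaped query, enough iterations make each score the exact cost of the best full match that maps i to u. Picking the lowest-scoring candidate for each node then recovers the best match.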
4.2.2 Heuristic 2
Heuristic 1 requires scores to be computed for a large number of nodes. These scores are derived from node matching costs, and for any query node v, all database nodes obtained from v by the same conversion function have the same matching cost. Therefore the nodes produced from the same query node by the same conversion can be condensed into one node and scored once, which greatly reduces the number of node-score computations. The graph made up of these condensed nodes is called a schematic graph. If two nodes are adjacent in the query graph, the corresponding condensed nodes are connected in the schematic graph, regardless of whether the underlying database nodes are connected. The matching cost of such an edge is the upper bound of the matching costs of all the corresponding edges in the graph database.
Based on this observation, the problem is solved as follows:
(1) Build the schematic graph;
(2) Apply the node-score formula of Heuristic 1 on the schematic graph;
(3) Use the results computed on the schematic graph to obtain the scores of the corresponding subgraphs in the original graph.
Repeat until K results are found.
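Step (1) can be sketched as follows: candidates of the same query node produced by the same conversion are grouped into one schematic node, and every query edge connects all corresponding schematic nodes, with cost taken as the maximum (upper bound) over the member edges, as described above. The input format (candidates tagged with a conversion name) and the finite `edge_cost` callback are assumptions of this sketch; steps (2) and (3) are not shown.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical input: query node -> candidate data nodes tagged with the conversion that produced them.
Candidates = Dict[Hashable, List[Tuple[Hashable, str]]]
SNode = Tuple[Hashable, str]  # schematic node = (query node, conversion)

def build_schematic(
    cands: Candidates,
    q_edges: List[Tuple[Hashable, Hashable]],
    edge_cost: Callable[[Hashable, Hashable, Hashable, Hashable], float],  # finite cost
) -> Tuple[Dict[SNode, List[Hashable]], Dict[Tuple[SNode, SNode], float]]:
    # condense: one schematic node per (query node, conversion) pair
    groups: Dict[SNode, List[Hashable]] = defaultdict(list)
    for v, pairs in cands.items():
        for u, conv in pairs:
            groups[(v, conv)].append(u)
    s_edges: Dict[Tuple[SNode, SNode], float] = {}
    for v1, v2 in q_edges:
        left = [k for k in groups if k[0] == v1]
        right = [k for k in groups if k[0] == v2]
        for k1, k2 in product(left, right):
            # connected whenever v1-v2 is a query edge; cost = upper bound over member pairs
            s_edges[(k1, k2)] = max(edge_cost(v1, v2, u1, u2)
                                    for u1 in groups[k1] for u2 in groups[k2])
    return dict(groups), s_edges
```

With three actor candidates (two via identity, one via synonym) and two school candidates (both via acronym), five data nodes collapse into three schematic nodes, so the node scores of step (2) are computed three times instead of five.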
The above is my personal understanding of the paper "Schemaless and Structureless Graph Querying" (VLDB 2014). Only the main contents of the paper are covered here; for detailed explanations, see the slides explaining the paper at http://download.csdn.net/detail/woniu317/7391539.
Graph database queries based on a variety of conversion semantics