1. Summary
Because graph databases have complex schemas and describe the same information in many different ways, non-expert users find it very difficult to query them. A good graph query engine should support multiple conversions (synonyms, acronyms, abbreviations, ontologies, and so on) and should rank the search results well.
To address this problem, the paper proposes a new query framework that makes querying easier and frees users from the burden of constructing a precise query graph.
2. Application Background
2.1 Application
Graph databases are a popular way to store data: knowledge graphs, information networks, and social networks all store their data as graphs. Because graph data is schemaless or its schema is too complex, and because the same information can be described in many ways, querying graph data is very difficult and puts off ordinary users.
Figure 2-1 A shows part of a graph database. For a query asking for actors who are about 30 years old and related to "University of California, Berkeley" and "Mission: Impossible", it is easy to see that both the green and the yellow parts of Figure 2-1 are good results. Figure 2-1 B is a query graph that expresses this intent, but existing graph database queries return only the green part, or only one of the two. The reason is that the node information does not match literally, and existing queries either support no semantic transformations or support only one kind of conversion.
Figure 2-1 Graph Database G
The problem this paper solves can be described as follows: given a query Q and a graph database G, find all subgraphs of G that Q can be transformed into by the conversion functions.
2.2 Abstract Definitions
Given a query Q, a graph database G, and a set of conversion functions L, find the top K subgraphs of G that best match Q, where L includes all the conversions in Table 2-1.
Table 2-1 the conversion functions supported in this article
Note: Other conversion functions can easily be added to the method in this paper to meet different needs.
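To make the idea of a conversion function concrete, here is a minimal Python sketch. It assumes that each conversion function can be modeled as a predicate that says whether a query label can be turned into a data label. The function names, the tiny synonym table, and the rule for building acronyms are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table; a real system would consult a thesaurus or ontology.
SYNONYMS = {"teacher": {"instructor", "tutor"}}

def is_acronym(query: str, target: str) -> bool:
    """True if `query` equals the initials of the capitalized words in `target`."""
    initials = "".join(word[0] for word in target.split() if word[0].isupper())
    return len(query) > 1 and query.lower() == initials.lower()

def is_abbreviation(query: str, target: str) -> bool:
    """True if `query`, without trailing dots, is a proper prefix of `target`."""
    stem = query.rstrip(".").lower()
    return 0 < len(stem) < len(target) and target.lower().startswith(stem)

def is_synonym(query: str, target: str) -> bool:
    """True if `target` is listed as a synonym of `query` in the toy table."""
    return target.lower() in SYNONYMS.get(query.lower(), set())

# Registry of conversion functions; adding a new one is a single dictionary entry.
CONVERSIONS: Dict[str, Callable[[str, str], bool]] = {
    "acronym": is_acronym,
    "abbreviation": is_abbreviation,
    "synonym": is_synonym,
}

def applicable(query: str, target: str) -> List[str]:
    """Names of all conversions that turn the query label into the data label."""
    return [name for name, check in CONVERSIONS.items() if check(query, target)]
```

For example, `applicable("UCB", "University of California Berkeley")` reports the acronym conversion. Keeping the functions in a registry mirrors the note above that further conversions can be added without touching the rest of the method.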
3. Existing Methods
Spark requires the user to enter only keywords, with no need to specify complex relationships between graph nodes, to get query results. However, it supports only string-style matching and must be modified to support other conversions.
NeMa supports matching on graph structure and on string similarity (Jaccard).
4. Methods of This Paper
4.1 Offline Operation
4.1.1 Metric function
In the formulas below, v is a node, e is an edge, and φ(·) denotes a match; for example, φ(v) is the node in the graph database G that matches node v of the query graph. If v can be changed into φ(v) by the i-th conversion function, then fi(v, φ(v)) = 1; otherwise fi(v, φ(v)) = 0.
Node matching cost:
Edge Matching Cost:
Graph matching function:
The lower the value of the graph matching function P, the better the subgraph φ(Q) matches Q; that is, the query result should be the K subgraphs with the lowest P values.
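The formulas themselves are not reproduced in this post, so the following Python sketch only illustrates the general shape of such a ranking. It assumes that the node and edge costs are weighted sums of the indicators fi, with one weight per conversion function, that the graph cost is the sum over all matched nodes and edges, and that every candidate has already been filtered so that each node and edge is covered by at least one conversion. These assumptions, the function names, and the example weights are mine, not the paper's.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def weighted_cost(indicators: Sequence[int], weights: Sequence[float]) -> float:
    """Sum the weights of the conversions whose indicator is 1."""
    return sum(w * f for w, f in zip(weights, indicators))

def graph_cost(node_indicators: Sequence[Sequence[int]],
               edge_indicators: Sequence[Sequence[int]],
               alpha: Sequence[float],
               beta: Sequence[float]) -> float:
    """Assumed additive form: total node cost plus total edge cost."""
    node_part = sum(weighted_cost(f, alpha) for f in node_indicators)
    edge_part = sum(weighted_cost(g, beta) for g in edge_indicators)
    return node_part + edge_part

def top_k(costs: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k candidate matches with the lowest cost, best first."""
    return heapq.nsmallest(k, costs.items(), key=lambda item: item[1])
```

With hypothetical node weights `alpha = [0.0, 0.5, 1.0]` (exact match, synonym, acronym) and edge weights `beta = [0.2]`, a match with one exact node, one acronym node, and one edge has cost 1.2. `top_k` then keeps the K candidates with the lowest values, which is the ranking rule stated above.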
4.1.2 Parameter Determination
Let W = {α1, α2, ..., β1, β2, ...}; then
where T represents the training set.
4.1.3 Cold start
The purpose of the cold start is to generate a good training set of queries, so that good parameters for the matching function can be learned. The cold start steps are:
(1) Randomly select some subgraphs from the graph database as query templates Q';
(2) Transform some nodes and edges in each query template to get a query q;
(3) Extract the subgraphs Qe that exactly match q;
(4) The pairs (q, Q') and (q, Qe) constitute the training set.
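The four steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the graph is small and undirected, only node labels are transformed (the paper also transforms edges), exact matches are found by brute force, and each match is stored as a mapping from query node to database node. All names are hypothetical.

```python
import random
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

Labels = Dict[int, str]             # node id -> label
Edges = Set[Tuple[int, int]]        # undirected edges stored as (smaller id, larger id)
Graph = Tuple[Labels, Edges]
Match = Dict[int, int]              # query node -> database node

def sample_template(graph: Graph, size: int, rng: random.Random) -> Graph:
    """Step 1: grow a random connected subgraph Q' with at most `size` nodes."""
    labels, edges = graph
    chosen = [rng.choice(sorted(labels))]
    while len(chosen) < size:
        # Neighbours of the chosen nodes that are not chosen yet.
        frontier = sorted({b if a in chosen else a for a, b in edges
                           if (a in chosen) != (b in chosen)})
        if not frontier:
            break
        chosen.append(rng.choice(frontier))
    kept = {(a, b) for a, b in edges if a in chosen and b in chosen}
    return {n: labels[n] for n in chosen}, kept

def transform_query(template: Graph, transform: Callable[[str], str],
                    fraction: float, rng: random.Random) -> Graph:
    """Step 2: rewrite the labels of a random fraction of the nodes to get q."""
    labels, edges = template
    count = max(1, round(fraction * len(labels)))
    picked = set(rng.sample(sorted(labels), count))
    new_labels = {n: transform(l) if n in picked else l for n, l in labels.items()}
    return new_labels, set(edges)

def exact_matches(query: Graph, graph: Graph) -> List[Match]:
    """Step 3: brute-force search for subgraphs Qe that match q exactly."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    q_nodes = sorted(q_labels)
    found = []
    for image in permutations(sorted(g_labels), len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        if any(q_labels[n] != g_labels[mapping[n]] for n in q_nodes):
            continue
        if all(tuple(sorted((mapping[a], mapping[b]))) in g_edges for a, b in q_edges):
            found.append(mapping)
    return found

def cold_start(graph: Graph, transform: Callable[[str], str], rounds: int,
               size: int = 3, fraction: float = 0.5,
               seed: int = 0) -> List[Tuple[Graph, Match]]:
    """Step 4: collect the pairs (q, Q') and (q, Qe) as the training set."""
    rng = random.Random(seed)
    training = []
    for _ in range(rounds):
        template = sample_template(graph, size, rng)
        query = transform_query(template, transform, fraction, rng)
        training.append((query, {n: n for n in template[0]}))       # (q, Q')
        training.extend((query, m) for m in exact_matches(query, graph))  # (q, Qe)
    return training
```

The brute-force matcher is exponential in the query size and is only meant for toy graphs; it stands in for whatever exact subgraph matcher a real system would use.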
4.2 Online Query
General graph querying is NP-hard, since the subgraph isomorphism problem, which is itself NP-hard, can be reduced to it. The paper therefore designs two heuristics to solve the problem.
4.2.1 Heuristic 1
Heuristic 1 attributes the cost of a graph match to its nodes: the matching score of each node represents the cost of the graph match that contains that node.
The score of each node is computed as:
where Mji(t)(ui) represents the score that node uj contributes to the match of ui in iteration t. Figure 4-1 gives an intuitive view of this formula; in panels A and B, the left side represents the database and the right side represents the query graph.
Figure 4-1 The visual meaning of heuristic 1
4.2.2 Heuristic 2
Heuristic 1 has to compute scores for a large number of nodes. From the node matching cost formula, however, the cost is the same for all nodes obtained from a given query node v through the same conversion function. Based on this, the nodes obtained from the same query node through the same conversion function are condensed into a single node, which effectively reduces the number of node score computations. The graph made up of these condensed nodes is called the sketch graph. If there is an edge between two nodes in the query graph, the corresponding nodes in the sketch graph are connected, regardless of the graph database. The matching cost of such an edge is the upper bound of the matching costs of all the corresponding edges in the graph database.
Based on this observation, the solution steps are:
(1) Construct the sketch graph;
(2) Run heuristic 1 on the sketch graph;
(3) Use the results computed on the sketch graph to obtain the scores of the corresponding subgraphs in the original graph.
Repeat until K results are found.
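Step (1) can be illustrated with a small Python sketch that condenses candidate matches into the condensed graph described above. It assumes that the candidates for each query node are already known, together with the single conversion function that produced each one. The data layout and names are hypothetical, and the edge cost bound is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CondensedNode = Tuple[str, str]   # (query node, conversion function name)

def build_sketch(candidates: Dict[str, Dict[int, str]],
                 query_edges: List[Tuple[str, str]]
                 ) -> Tuple[Dict[CondensedNode, Set[int]],
                            Set[Tuple[CondensedNode, CondensedNode]]]:
    """Condense candidates that share a query node and a conversion function.

    candidates[q][d] names the conversion that turns query node q into
    database node d.
    """
    groups: Dict[CondensedNode, Set[int]] = defaultdict(set)
    for q_node, matches in candidates.items():
        for data_node, conversion in matches.items():
            groups[(q_node, conversion)].add(data_node)

    # Connect condensed nodes whenever their query nodes share a query edge,
    # without looking at the edges of the graph database.
    edge_set = set(query_edges)
    edges = {(s, t) for s in groups for t in groups if (s[0], t[0]) in edge_set}
    return dict(groups), edges
```

With two acronym candidates and one synonym candidate for a query node, the condensed graph holds two nodes instead of three, so heuristic 1 has fewer scores to compute there.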
The above is my personal understanding of the paper "Schemaless and Structureless Graph Querying" (VLDB 2014). It covers only the main content of the paper; for a detailed explanation, please see the PPT at http://download.csdn.net/detail/woniu317/7391539.