Graph database query based on multiple transformation semantics

Source: Internet
Author: User


1. Summary

Querying a complex graph database is very difficult for non-expert users, because graph databases often have complex schemas and the same information can be described in many different ways.

A good graph query engine should support multiple kinds of transformations (synonyms, acronyms, abbreviations, ontologies, and so on) and should rank the search results well.

To address this problem, this paper proposes a new query framework that makes querying easier and frees users from having to construct exact query graphs from scratch.

2. Application Background

2.1 Application

Graph databases are a popular way of storing data: the data of applications such as knowledge graphs, information networks and social networks are stored in graph databases. However, graph data is often schemaless or has an overly complex schema, and the same information can be described in many different ways, so querying graph data is very difficult. For ordinary users this is a serious deterrent.

Figure 2-1 (a) shows part of a graph database. Suppose the query asks for actors who are about 30 years old and are associated with "University of California Berkeley" and "Mission: Impossible". It is easy to see from Figure 2-1 that the green and yellow parts are good answers. Figure 2-1 (b) shows a query graph that expresses these query semantics, but existing graph database queries either find only the green part or find nothing at all.

The reason is that the node information in the query does not literally match the node information in the database, and existing query methods either do not support semantic transformations at all or support only a single kind of transformation.

Figure 2-1 Graph Database G

The problem solved in this paper can be stated as follows: given a query Q and a graph database G, find all subgraphs of G that Q can be transformed into by applying the transformation functions.

2.2 Abstract Definitions

Given a query Q, a graph database G and a set of transformation functions L, find the top-k subgraphs of G that best match Q. The set L contains all the transformation functions in Table 2-1.

Table 2-1 The transformation functions supported in this article

Note: the method in this article can easily be extended with other transformation functions to meet different needs.
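To make the idea of a transformation function concrete, the following Python sketch implements a few of the transformations named in the summary (identity, acronym, abbreviation, synonym) as 0/1 indicator functions of a query label v and a database label u. It is an illustration under assumptions, not the paper's implementation: the synonym and stopword tables, the function names and the exact matching rules are all hypothetical.

```python
# Hypothetical lookup tables; a real system would use a thesaurus or an ontology.
SYNONYMS = {("film", "movie"), ("movie", "film")}
STOPWORDS = {"of", "the", "and", "for"}


def f_identity(v: str, u: str) -> int:
    """1 if the two labels are identical, ignoring case."""
    return int(v.lower() == u.lower())


def f_acronym(v: str, u: str) -> int:
    """1 if v is formed by the initials of the non-stopword words of u."""
    initials = "".join(w[0] for w in u.split() if w.lower() not in STOPWORDS)
    return int(len(v) > 1 and v.upper() == initials.upper())


def f_abbreviation(v: str, u: str) -> int:
    """1 if v looks like 'Univ.', i.e. a dotted prefix of the longer label u."""
    stem = v.rstrip(".").lower()
    return int(v.endswith(".") and len(stem) >= 2
               and u.lower().startswith(stem) and u.lower() != stem)


def f_synonym(v: str, u: str) -> int:
    """1 if (v, u) is listed as a synonym pair."""
    return int((v.lower(), u.lower()) in SYNONYMS)


TRANSFORMATIONS = [f_identity, f_acronym, f_abbreviation, f_synonym]


def feature_vector(v: str, u: str) -> list:
    """The indicator values f_1(v, u), ..., f_n(v, u) for one candidate pair."""
    return [f(v, u) for f in TRANSFORMATIONS]


print(feature_vector("UCB", "University of California Berkeley"))  # [0, 1, 0, 0]
```

Adding a new transformation in this sketch only means appending another indicator to `TRANSFORMATIONS`, which mirrors the note that further transformation functions are easy to plug in.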

3. Existing Methods

SPARK only requires the user to enter keywords, so users get query results without having to specify complex relationships between graph nodes. However, it only supports string-similarity matching and would have to be modified to support other transformations.

NEMA supports matching on graph structure and on string similarity (Jaccard).
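As a reminder of what Jaccard string similarity measures, here is a minimal Python sketch over sets of words. Lowercasing and splitting on whitespace are assumptions made for illustration, not necessarily what NEMA does.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the lowercase word sets of a and b."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0  # treat two empty labels as identical
    return len(x & y) / len(x | y)


print(jaccard("University of California Berkeley", "UC Berkeley"))  # 0.2
```

Under a purely string-based measure like this, the short form "UC Berkeley" scores only 0.2 against the full name, which is the kind of case the framework in this article handles with explicit transformations such as acronyms.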

4. Method of This Article

4.1 Offline Stage

4.1.1 Metric function

In the formulas below, V denotes nodes and E denotes edges, and φ(·) denotes a match: for example, φ(v) is the node of the graph database G that matches the query node v. If v can be transformed into φ(v) by the i-th transformation function, then f_i(v, φ(v)) = 1; otherwise f_i(v, φ(v)) = 0.

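Node matching cost:

$$F_V(v,\varphi(v)) = \sum_{i} \alpha_i \, f_i(v,\varphi(v))$$

Edge matching cost, where g_j(e, φ(e)) is the analogous 0/1 indicator that edge e can be transformed into φ(e) by the j-th edge transformation:

$$F_E(e,\varphi(e)) = \sum_{j} \beta_j \, g_j(e,\varphi(e))$$

Graph matching function, summed over the nodes V_Q and edges E_Q of the query:

$$p(\varphi(Q) \mid Q) = \sum_{v \in V_Q} F_V(v,\varphi(v)) + \sum_{e \in E_Q} F_E(e,\varphi(e))$$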

The lower the value of the graph matching function p, the better the match φ(Q) fits Q; the query result is therefore the k matches with the lowest values of p.
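As a minimal sketch of how the matching functions fit together, the Python code below assumes that the node cost is a weighted sum Σ α_i f_i of the node indicators, that the edge cost is a weighted sum Σ β_j g_j of analogous edge indicators, and that p adds these costs over all query nodes and edges. The weights, the feature layout and the candidate matches are made-up numbers. Ranking then simply sorts candidates by p and keeps the k lowest.

```python
from typing import Dict, List, Tuple

# A candidate match is described by its indicator vectors:
#   node_feats[v] = [f_1(v, φ(v)), f_2(v, φ(v)), ...]
#   edge_feats[e] = [g_1(e, φ(e)), g_2(e, φ(e)), ...]
Candidate = Tuple[Dict[str, List[int]], Dict[Tuple[str, str], List[int]]]


def weighted_cost(weights: List[float], indicators: List[int]) -> float:
    """Weighted sum of 0/1 transformation indicators."""
    assert len(weights) == len(indicators)
    return sum(w * x for w, x in zip(weights, indicators))


def match_cost(alpha: List[float], beta: List[float], cand: Candidate) -> float:
    """p(φ(Q) | Q): node costs plus edge costs over the whole query."""
    node_feats, edge_feats = cand
    return (sum(weighted_cost(alpha, f) for f in node_feats.values())
            + sum(weighted_cost(beta, g) for g in edge_feats.values()))


def top_k(alpha: List[float], beta: List[float],
          candidates: Dict[str, Candidate], k: int) -> List[str]:
    """Names of the k candidate matches with the lowest p."""
    return sorted(candidates, key=lambda name: match_cost(alpha, beta, candidates[name]))[:k]


# Made-up weights: node features = (identity, acronym, abbreviation, synonym),
# edge features = (direct edge, edge replaced by a 2-hop path).
alpha = [0.0, 0.5, 0.8, 0.3]
beta = [0.0, 1.0]
candidates = {
    "m1": ({"actor": [1, 0, 0, 0], "school": [0, 1, 0, 0]}, {("actor", "school"): [1, 0]}),
    "m2": ({"actor": [1, 0, 0, 0], "school": [0, 0, 0, 1]}, {("actor", "school"): [0, 1]}),
    "m3": ({"actor": [1, 0, 0, 0], "school": [1, 0, 0, 0]}, {("actor", "school"): [1, 0]}),
}
print(top_k(alpha, beta, candidates, 2))  # ['m3', 'm1']
```

Giving the identity transformation weight 0 in this toy setting makes an exact match the cheapest possible answer, so every transformation that is actually applied adds to the cost.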

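4.1.2 Parameter Estimation

Let w = {α1, α2, ..., β1, β2, ...} be the vector of all weights in the matching functions. The weights w are learned from a training set T, which pairs queries with their correct matches, so that correct matches are ranked ahead of incorrect ones (that is, receive lower values of p).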
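One common way to fit such weights on a training set T, assumed here purely for illustration and not necessarily the paper's objective, is to turn costs into probabilities with a softmax over candidate matches, P(m) ∝ exp(-p(m)), and to minimise the negative log-likelihood of the correct match by gradient descent. Because p is linear in w here, the gradient of p with respect to w is just the candidate's feature vector. All names below are hypothetical.

```python
import math
from typing import List, Tuple

# One training example: the feature vectors of several candidate matches for a
# query (node and edge indicators concatenated, so that p = w · x) and the index
# of the correct match.
Example = Tuple[List[List[float]], int]


def cost(w: List[float], x: List[float]) -> float:
    """Matching cost p = w · x of one candidate."""
    return sum(wi * xi for wi, xi in zip(w, x))


def nll_and_grad(w: List[float], example: Example) -> Tuple[float, List[float]]:
    """Negative log-likelihood of the correct match under P ∝ exp(-p), and its gradient."""
    feats, y = example
    scores = [-cost(w, x) for x in feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    nll = -math.log(probs[y])
    # d(nll)/dw = x_correct - E_P[x]
    grad = [feats[y][d] - sum(p * x[d] for p, x in zip(probs, feats))
            for d in range(len(w))]
    return nll, grad


def train(examples: List[Example], dim: int, lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Per-example gradient descent on the negative log-likelihood."""
    w = [0.0] * dim
    for _ in range(epochs):
        for ex in examples:
            _, grad = nll_and_grad(w, ex)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


w = train([([[0, 1], [1, 0]], 0)], dim=2)
print(cost(w, [0, 1]) < cost(w, [1, 0]))  # True
```

No regularisation or non-negativity constraint is applied in this sketch, so learned weights can become negative; a real implementation would likely add one.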

4.1.3 Cold start

The purpose of the cold start is to generate a good training set of queries from which the parameters of the matching function can be learned. The cold-start steps are:

(1) Randomly select some subgraphs of the graph database as query templates Q';

(2) Apply transformation functions to some of the nodes and edges of each template to obtain a query Q;

(3) Extract the subgraphs Qe of G that match Q exactly;

(4) The pairs (Q, Q') and (Q, Qe) form the training set.
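Steps (1), (2) and (4) can be sketched in Python as follows, assuming the graph is a plain adjacency dict keyed by node label and that a transformation is a function from label to label. The toy graph, the `to_acronym` transformation and the sampling strategy (growing a random connected subgraph) are hypothetical choices; step (3) would additionally need an exact-match search over G and is omitted.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # node label -> labels of its neighbours (undirected)


def sample_template(g: Graph, size: int, rng: random.Random) -> Graph:
    """Step (1): grow a random connected subgraph Q' with up to `size` nodes."""
    start = rng.choice(sorted(g))
    nodes = {start}
    frontier = set(g[start])
    while len(nodes) < size and frontier:
        nxt = rng.choice(sorted(frontier))
        nodes.add(nxt)
        frontier |= g[nxt]
        frontier -= nodes
    return {n: g[n] & nodes for n in nodes}


def transform(template: Graph, fn: Callable[[str], str], rate: float,
              rng: random.Random) -> Tuple[Graph, Dict[str, str]]:
    """Step (2): relabel each node of Q' with fn with probability `rate`, giving Q."""
    mapping = {n: (fn(n) if rng.random() < rate else n) for n in template}
    query = {mapping[n]: {mapping[m] for m in nbrs} for n, nbrs in template.items()}
    return query, mapping


def make_training_pairs(g: Graph, fn: Callable[[str], str], n: int, size: int,
                        rate: float, seed: int = 0) -> List[Tuple[Graph, Graph, Dict[str, str]]]:
    """Step (4), without step (3): n triples (Q, Q', template node -> query label)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        template = sample_template(g, size, rng)
        query, mapping = transform(template, fn, rate, rng)
        pairs.append((query, template, mapping))
    return pairs


# Tiny toy graph and a hypothetical acronym transformation.
g = {
    "Tom Cruise": {"Mission Impossible"},
    "Mission Impossible": {"Tom Cruise", "Paramount Pictures"},
    "Paramount Pictures": {"Mission Impossible"},
}


def to_acronym(label: str) -> str:
    return "".join(w[0] for w in label.split()).upper()


for query, template, mapping in make_training_pairs(g, to_acronym, n=2, size=2, rate=0.5):
    print(sorted(template), "->", sorted(query))
```

Because this sketch uses labels as node identities, a transformation that maps two template labels to the same string would merge those nodes; a fuller generator would keep node ids separate from labels.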

4.2 Online Query

The general graph query problem is NP-hard: the subgraph isomorphism problem, which is NP-hard, can be reduced to it.

Therefore, this paper designs two heuristics to solve the problem.

4.2.1 Heuristic 1

If the cost of the graph match is distributed onto its nodes, the matching score of each node can stand for the cost of matching the whole graph.

The score of each node is computed iteratively, from contributions passed between neighbouring nodes.

Here m_ji^(t)(u_i) denotes the score that node u_j contributes, in iteration t, to the matching of node u_i.

Figure 4-1 illustrates the intuition behind this computation. In both panels (a) and (b), the left side represents the database and the right side represents the query graph.

Fig. 4-1 Intuition behind Heuristic 1
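One standard way to realise this kind of iterative node scoring is min-sum message passing, the cost-minimising form of loopy belief propagation. The Python sketch below uses it as an assumption, not as the paper's exact update: query nodes are the variables, candidate database nodes are their possible values, node costs play the role of F_V and pairwise costs the role of F_E. `msgs[(j, i)][u]` plays a role similar to m_ji^(t)(u_i) in 4.2.1. The candidates and costs are made-up.

```python
from typing import Callable, Dict, List, Tuple

# (query node j, query node i, candidate of j, candidate of i) -> edge cost
EdgeCost = Callable[[str, str, str, str], float]


def min_sum(node_cost: Dict[str, Dict[str, float]], edge_cost: EdgeCost,
            query_edges: List[Tuple[str, str]], iters: int = 10) -> Dict[str, str]:
    """Pick a low-cost candidate for every query node by min-sum message passing."""
    nbrs = {q: set() for q in node_cost}
    for a, b in query_edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # msgs[(j, i)][u]: cost that query node j reports to query node i if i is matched to u
    msgs = {(j, i): {u: 0.0 for u in node_cost[i]} for j in nbrs for i in nbrs[j]}
    for _ in range(iters):
        msgs = {
            (j, i): {
                ui: min(node_cost[j][uj] + edge_cost(j, i, uj, ui)
                        + sum(msgs[(k, j)][uj] for k in nbrs[j] if k != i)
                        for uj in node_cost[j])
                for ui in node_cost[i]
            }
            for (j, i) in msgs
        }
    # Belief of each candidate: its own cost plus everything its neighbours report.
    beliefs = {i: {u: node_cost[i][u] + sum(msgs[(k, i)][u] for k in nbrs[i])
                   for u in node_cost[i]} for i in node_cost}
    return {i: min(b, key=b.get) for i, b in beliefs.items()}


db_edges = {frozenset(("Tom Cruise", "Mission Impossible")),
            frozenset(("Tom Hanks", "Forrest Gump"))}


def edge_cost(j: str, i: str, uj: str, ui: str) -> float:
    return 0.0 if frozenset((uj, ui)) in db_edges else 2.0  # penalise missing edges


node_cost = {"actor": {"Tom Cruise": 0.0, "Tom Hanks": 0.0},
             "film": {"Mission Impossible": 0.3, "Forrest Gump": 0.9}}
print(min_sum(node_cost, edge_cost, [("actor", "film")]))
# {'actor': 'Tom Cruise', 'film': 'Mission Impossible'}
```

On a tree-shaped query this is exact after enough iterations; on queries with cycles it is only approximate. The sketch also returns a single best assignment and does not enforce one-to-one matching or produce the top k.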

4.2.2 Heuristic 2

The first heuristic requires scores to be computed for a large number of nodes. However, these scores are derived from the node matching costs, and for any query node v, all database nodes obtained from v by the same transformation function have the same matching cost. Therefore, all candidates obtained from the same query node by the same transformation can be condensed into a single node, which greatly reduces the number of node scores that must be computed. The graph formed by these condensed nodes is called the sketch graph. If two nodes are connected by an edge in the query graph, the corresponding condensed nodes are connected in the sketch graph, regardless of whether the underlying nodes are adjacent in the graph database. The matching cost of such a sketch edge is the upper bound of the matching costs of all the corresponding edges in the graph database.
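The condensing step can be sketched in Python as follows, assuming each candidate database node is already known together with the transformation that produced it. The data layout and names are hypothetical. Candidates of the same query node produced by the same transformation collapse into one node of the condensed graph; a condensed edge is added whenever the two query nodes are adjacent, and its cost is the maximum over all member-to-member edge costs, following the upper-bound rule in 4.2.2.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

SketchNode = Tuple[str, str]  # (query node, transformation that produced the candidates)


def build_sketch(candidates: Dict[str, List[Tuple[str, str]]],
                 query_edges: List[Tuple[str, str]],
                 edge_cost: Callable[[str, str], float]):
    """Condense candidates that share a query node and a transformation into one node."""
    groups: Dict[SketchNode, List[str]] = defaultdict(list)
    for q, cands in candidates.items():
        for db_node, transformation in cands:
            groups[(q, transformation)].append(db_node)
    edges: Dict[Tuple[SketchNode, SketchNode], float] = {}
    for q1, q2 in query_edges:
        for s1 in [s for s in groups if s[0] == q1]:
            for s2 in [s for s in groups if s[0] == q2]:
                # Upper bound: the largest cost among all member-to-member edges.
                edges[(s1, s2)] = max(edge_cost(a, b)
                                      for a, b in product(groups[s1], groups[s2]))
    return dict(groups), edges


candidates = {"q1": [("a", "identity"), ("b", "acronym"), ("c", "acronym")],
              "q2": [("d", "synonym")]}


def edge_cost(x: str, y: str) -> float:
    return 0.0 if {x, y} == {"b", "d"} else 1.0  # only b-d is a real edge here


nodes, edges = build_sketch(candidates, [("q1", "q2")], edge_cost)
print(len(nodes), len(edges))  # 3 2
```

Four candidates shrink to three condensed nodes in this toy input; on real data, where one transformation such as an acronym can match many database nodes, the reduction is what makes the first heuristic affordable.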

Based on this observation, the query is answered in the following steps:

(1) Construct the sketch graph from the condensed nodes;

(2) Apply the formula of the first heuristic to the sketch graph;

(3) Use the results computed on the sketch graph to obtain the scores of the corresponding subgraphs in the original graph.

Repeat until k results are found.
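The loop over steps (1) to (3) can be sketched in Python under simplifying assumptions: the matches found on the condensed graph are assumed to arrive already sorted by their cost there (for example from running the first heuristic on it), and a hypothetical `expand` function lists the concrete subgraph matches behind one such match together with their exact costs p. This sketch simply stops once k concrete matches have been collected; a faithful implementation would also use the bounds on the condensed edges to decide when the results found so far can no longer be beaten, which is omitted here.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Result = Tuple[float, Tuple[str, ...]]  # (exact cost p, matched database nodes)


def top_k_via_sketch(sketch_matches: Iterable[str],
                     expand: Callable[[str], Iterable[Result]],
                     k: int) -> List[Result]:
    """Expand condensed-graph matches in the given order until at least k concrete matches exist."""
    found: List[Result] = []
    for sm in sketch_matches:
        found.extend(expand(sm))  # step (3): concrete subgraphs behind this match
        if len(found) >= k:
            break
    return heapq.nsmallest(k, found)


# Made-up expansions of three condensed-graph matches, listed in ascending order.
expansions = {"S1": [(0.5, ("a", "d"))],
              "S2": [(0.7, ("b", "d")), (0.6, ("c", "d"))],
              "S3": [(0.9, ("e", "f"))]}
print(top_k_via_sketch(["S1", "S2", "S3"], lambda s: expansions[s], k=2))
# [(0.5, ('a', 'd')), (0.6, ('c', 'd'))]
```

Here "S3" is never expanded, which is where the savings come from: only as much of the original graph is examined as is needed to fill the k result slots.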

The above is my personal understanding of the paper "Schemaless and Structureless Graph Querying" (VLDB 2014). Only the main contents of the paper are introduced here; for detailed explanations, please see the slides presenting the paper, available at http://download.csdn.net/detail/woniu317/7391539.
