Graph Database Querying Based on Multiple Semantic Conversions

Source: Internet
Author: User


1. Summary

Graph databases often have complex schemas and describe the same information in many different ways. This makes it extremely difficult for non-expert users to query them. A good graph query engine should support conversions such as synonyms, acronyms, abbreviations, and ontology relations, and it should rank the search results well.

To address this problem, the paper proposes a new query framework that makes querying easier and spares users the head-scratching of constructing an exact query graph.

2. Background

2.1 Application

Graph databases are a popular way to store data. Knowledge graphs, information networks, social networks, and similar application data are commonly kept in them. However, graph data is often schemaless or has a very complex schema, and the same information can be described in many different ways. Both make graph data very hard to query, and this is even more frustrating for ordinary users.

Figure 2-1 a shows part of a graph database. Suppose the query asks for someone about 30 years old who is related to "University of California, Berkeley" and to "Mission: Impossible". In Figure 2-1, the green and the yellow parts are both reasonably good answers. Figure 2-1 b is a query graph that expresses these semantics, but existing graph query methods can find only the green part, or nothing at all. The reason is that the node labels do not match exactly, and existing methods either do not support semantic conversion or support only one kind of conversion.

Figure 2-1 Graph database G

The problem addressed in this article can be stated as follows. Given a query Q and a graph database G, find all subgraphs of G that Q can be converted into by applying conversion functions.

2.2 Abstract definition

Given a query Q, a graph database G, and a set of conversion functions L, find the k subgraphs that best match Q. Here, L includes all of the conversions listed in Table 2-1.

Table 2-1 Conversion functions supported in this paper

Note: the method can easily be extended with other conversion functions to meet different needs.
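To make the conversion functions concrete, here is a minimal sketch of a small conversion library. The lookup tables, function names, and matching rules are hypothetical assumptions for illustration, not the contents of Table 2-1. For example, one such rule forms an acronym from the initials of capitalized words. Each function plays the role of an indicator f_i(v, φ(v)) from Section 4.1.1. It returns 1 if the query label can be converted into the data label and 0 otherwise.

```python
from typing import Callable, Dict, List, Set

# Hypothetical lookup tables; a real system would load curated dictionaries.
SYNONYMS: Dict[str, Set[str]] = {"film": {"movie"}, "movie": {"film"}}
ONTOLOGY_PARENT: Dict[str, str] = {"actor": "person", "director": "person"}
ABBREVIATIONS: Dict[str, str] = {"Univ.": "University", "Calif.": "California"}


def identical(q: str, d: str) -> int:
    return int(q == d)


def synonym(q: str, d: str) -> int:
    return int(d.lower() in SYNONYMS.get(q.lower(), set()))


def acronym(q: str, d: str) -> int:
    # "UCB" matches "University of California Berkeley": initials of capitalized words.
    initials = "".join(w[0] for w in d.split() if w[0].isupper())
    return int(q.isupper() and len(q) > 1 and q == initials)


def abbreviation(q: str, d: str) -> int:
    # Expand known abbreviations word by word and compare with the data label.
    expanded = " ".join(ABBREVIATIONS.get(w, w) for w in q.split())
    return int(expanded != q and expanded == d)


def ontology(q: str, d: str) -> int:
    # The query label names a more general type than the data label ("person" vs "actor").
    return int(ONTOLOGY_PARENT.get(d.lower()) == q.lower())


CONVERSIONS: List[Callable[[str, str], int]] = [
    identical, synonym, acronym, abbreviation, ontology,
]


def features(q: str, d: str) -> List[int]:
    """The indicator vector [f_1(q, d), ..., f_n(q, d)] over all conversions."""
    return [f(q, d) for f in CONVERSIONS]
```

For instance, `features("UCB", "University of California Berkeley")` sets only the acronym indicator. Adding a new conversion, as the note above suggests, amounts to appending another predicate to `CONVERSIONS`.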

3. Existing methods

With Spark, the user only needs to enter keywords to obtain query results, with no need to specify complex relationships between graph nodes. However, it supports only string-similarity matching, although it can be modified to support other conversions.

It supports graph structure matching and string similarity matching (Jaccard).
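Jaccard similarity of two strings is |A ∩ B| / |A ∪ B|, where A and B are set representations of the two strings. The text does not say how these systems tokenize labels. The sketch below assumes lower-cased character trigrams, one common choice, so that a misspelling such as "Berkley" still scores reasonably well against "Berkeley".

```python
def char_ngrams(s: str, n: int = 3) -> set:
    """Lower-cased character n-grams; strings shorter than n become one gram."""
    s = s.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity |A & B| / |A | B| over character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(jaccard("Berkley", "Berkeley"))  # → 0.375 (3 shared trigrams out of 8)
```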

4. Method

4.1 Offline operations

4.1.1 Measurement functions

In the formulas below, v denotes a node, e denotes an edge, and φ denotes a match. For example, φ(v) denotes the node in graph database G that matches query node v. If v can be converted into φ(v) by the i-th conversion function, then f_i(v, φ(v)) = 1; otherwise f_i(v, φ(v)) = 0.

Node matching cost:


Edge matching cost:


Graph matching function:
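One way to write the three functions is sketched below. It assumes the weighted-sum form suggested by the weights W = {α1, α2, ...; β1, β2, ...} of Section 4.1.2, with each weight read as the cost of applying the corresponding conversion. F_V, F_E, the edge indicators g_j, and the query's node and edge sets V_Q and E_Q are notation introduced for this sketch. The paper's exact formulas may differ, for example by normalizing or exponentiating the sum.

```latex
F_V\bigl(v, \varphi(v)\bigr) = \sum_i \alpha_i \, f_i\bigl(v, \varphi(v)\bigr)

F_E\bigl(e, \varphi(e)\bigr) = \sum_j \beta_j \, g_j\bigl(e, \varphi(e)\bigr)

P\bigl(\varphi(Q)\bigr) = \sum_{v \in V_Q} F_V\bigl(v, \varphi(v)\bigr) + \sum_{e \in E_Q} F_E\bigl(e, \varphi(e)\bigr)
```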


The smaller the value of the graph matching function P, the better the matched subgraph fits Q. The query result is therefore the k subgraphs with the smallest P.
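Read literally, the top-k definition can be answered by scoring every candidate match and keeping the k cheapest. The sketch below does exactly that by brute force. It also shows why Section 4.2 needs heuristics: the number of assignments grows exponentially with the number of query nodes. It rests on several hypothetical assumptions:

- Each query node already has a candidate set with precomputed 0/1 conversion-feature vectors.
- Edge features are looked up per (undirected) data-node pair.
- Node and edge costs are weighted sums of those features with weights `alpha` and `beta`.

```python
from heapq import nsmallest
from itertools import product


def weighted(weights, feats):
    """Assumed cost form: a weighted sum of 0/1 conversion indicators."""
    return sum(w * x for w, x in zip(weights, feats))


def match_cost(match, query_edges, node_feats, edge_feats, alpha, beta):
    """P for one match: node costs plus edge costs, or None if an edge is missing."""
    cost = sum(weighted(alpha, node_feats[v][u]) for v, u in match.items())
    for v1, v2 in query_edges:
        a, b = match[v1], match[v2]
        g = edge_feats.get((a, b), edge_feats.get((b, a)))  # undirected lookup
        if g is None:
            return None  # the data graph has no edge for this query edge
        cost += weighted(beta, g)
    return cost


def top_k(query_nodes, query_edges, node_feats, edge_feats, alpha, beta, k):
    """Enumerate every injective assignment (exponential) and keep the k cheapest."""
    scored = []
    for combo in product(*(sorted(node_feats[v]) for v in query_nodes)):
        if len(set(combo)) < len(combo):
            continue  # two query nodes mapped to the same data node
        cost = match_cost(dict(zip(query_nodes, combo)), query_edges,
                          node_feats, edge_feats, alpha, beta)
        if cost is not None:
            scored.append((cost, combo))
    return nsmallest(k, scored)  # list of (P, matched data nodes), cheapest first
```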

4.1.2 Parameter estimation

Let W = {α1, α2, ...; β1, β2, ...} be the parameters of the matching function. Then

where T denotes the training set.
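The following is a minimal sketch of how W could be fitted on T. It assumes maximum-likelihood training of a log-linear model in which a candidate match is more likely the cheaper it is, Pr(c) ∝ exp(−W·x_c). Here x_c is the vector of node- and edge-conversion counts of candidate c. The training-example format, the gradient-ascent optimizer, the learning rate, and the small L2 regularizer are all assumptions made for illustration, not the paper's procedure.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical training example: feature vectors of the candidate matches for
# one query (node-conversion counts followed by edge-conversion counts) and
# the index of the correct candidate.
Example = Tuple[List[List[float]], int]


def cost(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def candidate_probs(w, cands):
    """Pr(c) proportional to exp(-cost(c)): cheaper matches are more likely."""
    costs = [cost(w, x) for x in cands]
    lo = min(costs)
    exps = [math.exp(lo - c) for c in costs]  # shift by the minimum for stability
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(w, T: List[Example]) -> float:
    return sum(math.log(candidate_probs(w, cands)[gold]) for cands, gold in T)


def learn_weights(T: List[Example], dim: int, lr=0.5, epochs=200, reg=0.01):
    """Gradient ascent on the L2-regularized log-likelihood over T."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [-reg * wi for wi in w]  # gradient of -reg/2 * ||w||^2
        for cands, gold in T:
            p = candidate_probs(w, cands)
            for d in range(dim):
                expected = sum(pk * x[d] for pk, x in zip(p, cands))
                # d/dw_d log Pr(gold) = E_p[x_d] - x_gold[d]
                grad[d] += expected - cands[gold][d]
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w
```

The log-likelihood of a log-linear model is concave, so plain gradient ascent with a modest step size is a reasonable choice. The regularizer keeps the weights finite when the training candidates are perfectly separable.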

4.1.3 Cold start

The purpose of cold start is to generate a good training set of queries, from which good parameters for the matching function can be learned. The cold-start steps are:

(1) Randomly select some subgraphs from the graph database as query templates Q';

(2) Apply conversion functions to some of the nodes and edges of each template to obtain a query Q;

(3) Extract the subgraph Qe that exactly matches Q;

(4) The pairs (Q, Q') and (Q, Qe) form the training set.
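Steps (1) and (2), together with the (Q, Q') half of step (4), can be sketched as follows. Step (3) would need an exact subgraph matcher and is left out. The sketch rests on several hypothetical assumptions:

- The data graph is given as a label map and an adjacency list.
- Templates are grown as random connected node sets.
- Only node labels are converted, using a hypothetical `reverse_conv` table that maps a data label to labels a user might type instead.

```python
import random
from typing import Dict, Hashable, List


def sample_template(adj: Dict[Hashable, List[Hashable]], size: int, rng: random.Random):
    """Step (1): grow a random connected node set of up to `size` nodes."""
    start = rng.choice(sorted(adj))
    nodes = [start]
    frontier = set(adj[start])
    while len(nodes) < size and frontier:
        nxt = rng.choice(sorted(frontier))
        nodes.append(nxt)
        frontier = (frontier | set(adj[nxt])) - set(nodes)
    return nodes


def convert_labels(template, labels, reverse_conv, p, rng):
    """Step (2): with probability p, replace a label by a user-style variant."""
    query = {}
    for u in template:
        variants = reverse_conv.get(labels[u], [])
        query[u] = rng.choice(variants) if variants and rng.random() < p else labels[u]
    return query


def cold_start(labels, adj, reverse_conv, n, size, p=0.5, seed=0):
    """Build n (Q, Q') pairs; shared node ids give the ground-truth correspondence."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        tmpl = sample_template(adj, size, rng)
        inside = set(tmpl)
        # Induced edges of the template, kept identical in the query.
        edges = sorted({tuple(sorted((a, b))) for a in tmpl for b in adj[a] if b in inside})
        q_labels = convert_labels(tmpl, labels, reverse_conv, p, rng)
        t_labels = {u: labels[u] for u in tmpl}
        pairs.append(((q_labels, edges), (t_labels, edges)))
    return pairs
```

Because Q is produced from Q' by known conversions, every pair comes with its correct match for free. This is what makes the generated set usable for parameter estimation without manual labeling.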

4.2 Online query

In general, this kind of graph querying is NP-hard: the subgraph isomorphism problem can be reduced to it. Therefore, two heuristic methods are designed to solve it.

4.2.1 Heuristic 1

Heuristic 1 distributes the graph matching cost onto individual nodes. The matching score of each node then represents the cost of the graph match that contains that node.

The score of each node is computed as follows:

Here, m_ji^(t)(u_i) denotes the contribution of node u_j in iteration t to the matching of node u_i. Figure 4-1 illustrates the intuition behind the formula. In both (a) and (b), the left side shows the graph database and the right side shows the query graph.

Figure 4-1 Intuitive meaning of heuristic 1
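The message m_ji^(t)(u_i) can be sketched as min-sum message passing, the cost form of loopy belief propagation. This choice is an assumption consistent with the description above, not the paper's exact formula. The function and parameter names are hypothetical, and the sketch makes the following assumptions:

- `node_cost[v][u]` holds the cost of matching query node v to candidate u.
- `edge_cost(a, b)` returns the cost of a data edge, or infinity if there is none.

On a tree-shaped query, after as many iterations as the query's diameter, each node score equals the cost of the cheapest match containing that node. The sketch does not enforce injectivity. On queries with cycles the scores are only approximate.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

INF = math.inf


def min_sum_scores(
    query_nodes: List[Hashable],
    query_edges: List[Tuple[Hashable, Hashable]],
    node_cost: Dict[Hashable, Dict[Hashable, float]],
    edge_cost: Callable[[Hashable, Hashable], float],
    iters: int,
) -> Dict[Hashable, Dict[Hashable, float]]:
    nbrs = {v: [] for v in query_nodes}
    for a, b in query_edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    # m[(j, i)][ui]: cheapest cost contributed by the query part behind node j
    # when query node i is matched to data node ui (all zero before iterating).
    m = {(j, i): {ui: 0.0 for ui in node_cost[i]} for i in query_nodes for j in nbrs[i]}
    for _ in range(iters):
        new_m = {}
        for (j, i) in m:
            new_m[(j, i)] = {
                ui: min(
                    (cj + edge_cost(uj, ui)
                     + sum(m[(k, j)][uj] for k in nbrs[j] if k != i)
                     for uj, cj in node_cost[j].items()),
                    default=INF,
                )
                for ui in node_cost[i]
            }
        m = new_m  # synchronous update: iteration t reads only iteration t-1
    # Node score: its own cost plus all incoming messages.
    return {
        i: {ui: node_cost[i][ui] + sum(m[(j, i)][ui] for j in nbrs[i]) for ui in node_cost[i]}
        for i in query_nodes
    }
```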

4.2.2 Heuristic 2

Heuristic 1 still requires computing scores for a large number of nodes. The node matching cost formula shows that, for any query node v, all candidates reached through the same conversion function have the same matching cost. Based on this observation, the candidate nodes converted from the same query node by the same conversion are merged into a single node. This greatly reduces the number of node scores to compute. The merged nodes form a summary graph. If two nodes are connected by an edge in the query graph, the corresponding summary nodes are connected as well, regardless of whether the graph database connects every pair of their members. The matching cost of such a summary edge is the upper bound of the matching costs of all the corresponding edges in the graph database.

The procedure is as follows:

(1) Construct the summary graph;

(2) Run heuristic 1 on the summary graph;

(3) Use the results on the summary graph to compute the scores of the corresponding subgraphs in the original graph.

Repeat until k results are found.
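Step (1) of this procedure, building the summary graph, might look like the sketch below. It assumes, hypothetically, that candidates are grouped by their query node and their 0/1 conversion-feature vector, since candidates with equal vectors have equal node cost. It also assumes that `edge_cost` returns None when the data graph has no edge between two nodes. Following the text, each summary edge takes the largest cost (the upper bound) among the data edges it stands for. It is marked None when no member pair is connected at all.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

SummaryNode = Tuple[Hashable, Tuple[int, ...]]  # (query node, conversion-feature vector)


def build_summary(
    query_edges: List[Tuple[Hashable, Hashable]],
    node_feats: Dict[Hashable, Dict[Hashable, Sequence[int]]],
    edge_cost: Callable[[Hashable, Hashable], Optional[float]],
):
    # Candidates of the same query node with the same conversion features have
    # the same node cost, so they collapse into one summary node.
    members: Dict[SummaryNode, List[Hashable]] = defaultdict(list)
    for v, cands in node_feats.items():
        for u, f in cands.items():
            members[(v, tuple(f))].append(u)
    # Connect summary nodes whenever their query nodes are adjacent, and give
    # the summary edge the largest cost among the data edges it stands for.
    summary_edges: Dict[Tuple[SummaryNode, SummaryNode], Optional[float]] = {}
    for v1, v2 in query_edges:
        left = [s for s in members if s[0] == v1]
        right = [s for s in members if s[0] == v2]
        for s1, s2 in product(left, right):
            costs = [edge_cost(a, b) for a in members[s1] for b in members[s2]]
            present = [c for c in costs if c is not None]
            summary_edges[(s1, s2)] = max(present) if present else None
    return dict(members), summary_edges
```

In the small example used in the test, five data candidates shrink to three summary nodes. Any per-node scoring pass, such as heuristic 1, then runs over fewer nodes.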

 

The above is my personal understanding of the paper "Schemaless and Structureless Graph Querying" (VLDB 2014). It covers only the main content of the paper; for detailed explanations, see the paper and the accompanying slides at http://download.csdn.net/detail/woniu317/7.

 

