Experimental comparison between graph databases and relational databases: the InfoCamere case of the Italian Chambers of Commerce

Abstract: InfoCamere is the IT company of the Italian Chambers of Commerce. It designs and develops up-to-date, innovative IT solutions and services, connecting the chambers through a network and making their databases accessible to the public. Through InfoCamere, Italian and foreign companies, public institutions, industry associations, professional groups and private citizens can easily access up-to-date official information and economic data on all businesses registered and operating in Italy. The Italian Chambers of Commerce serve Italian companies and facilitate the exchange of information with public institutions through more than 300 branches throughout the country. InfoCamere is the service arm of the Chambers of Commerce and helps them pursue their business interests. It has also played a key role in implementing the Italian Digital Agenda, which involves the digital transformation of the national production system, in particular by supporting the digitization of small and medium-sized enterprises.

This case study was written by Luca Sinico, a software developer at InfoCamere.

Experiment Overview

In the second half of 2016, InfoCamere carried out an investigation into graph databases. The aims of the work were to study the main characteristics of the technology, to compare some of the products available on the market with a relational solution in terms of concepts and performance, and to assess whether graph databases could be adopted in some InfoCamere applications. The work is based on a dataset derived from the summary records of the Italian business register, and it includes data on companies' equity participations. A node of the graph is either a natural person or a company, and carries data such as the denomination, the company's share capital, the country of registration and the unique fiscal identifier. An edge of the graph represents an equity participation.
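
The dataset described above can be pictured with a small in-memory sketch. This is only an illustration: the class names, the sample identifiers and the `share_pct` edge property are assumptions made for the example, not the schema used in the experiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    fiscal_id: str                          # unique fiscal identifier
    kind: str                               # "person" or "company"
    denomination: str
    country: str
    share_capital: Optional[float] = None   # meaningful for companies only


@dataclass
class Participation:
    holder_id: str      # node that owns the stake
    company_id: str     # company in which the stake is held
    share_pct: float    # hypothetical edge property, assumed for illustration


@dataclass
class EquityGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Participation] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.fiscal_id] = node

    def add_participation(self, holder_id: str, company_id: str, share_pct: float) -> None:
        # Refuse edges whose endpoints are not in the graph yet.
        if holder_id not in self.nodes or company_id not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Participation(holder_id, company_id, share_pct))

    def holders_of(self, company_id: str) -> List[Participation]:
        # Direct (depth 1) participations held in the given company.
        return [e for e in self.edges if e.company_id == company_id]


# Invented sample data: one natural person and two companies.
graph = EquityGraph()
graph.add_node(Node("P-001", "person", "Mario Rossi", "IT"))
graph.add_node(Node("C-001", "company", "Alfa S.r.l.", "IT", share_capital=10000.0))
graph.add_node(Node("C-002", "company", "Beta S.p.A.", "IT", share_capital=50000.0))
graph.add_participation("P-001", "C-001", 60.0)
graph.add_participation("C-001", "C-002", 25.0)
graph.add_participation("P-001", "C-002", 10.0)

for edge in graph.holders_of("C-002"):
    print(graph.nodes[edge.holder_id].denomination, edge.share_pct)
```

Keeping persons and companies in one node collection, distinguished by a `kind` field, mirrors the statement that a node can be either; a real graph DBMS would offer its own ways to model this.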

In our work, we examined the two main graph data models, the "property graph" and "RDF". Although RDF is an effective way to implement linked data and the Semantic Web, and although it does organize data as a graph, we found that the property graph model (an "industry standard") meets our requirements better. In particular, it allows properties to be defined on edges, which RDF does not directly support. In addition, the standard query language for RDF (SPARQL) has some limitations compared with the query languages typically provided by DBMSs that support the property graph model. Two simple examples are the lack of a shortest-path function and of a way to express a maximum depth for variable-length path searches.
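
The difference regarding edge properties can be illustrated with plain Python structures. This is a rough sketch, not actual RDF syntax or any product's API: the identifiers, the predicate names and the `share_pct` value are invented, and the intermediate "participation" node is just one common way of working around the limitation.

```python
# Property-graph style: the edge itself carries the property.
edge = {
    "from": "P-001",
    "to": "C-001",
    "label": "PARTICIPATES_IN",
    "share_pct": 60.0,   # hypothetical edge property
}

# Triple style: every statement is (subject, predicate, object), so a value
# that belongs to the relationship needs an extra node standing for the
# relationship itself (here "part-1", an invented identifier).
triples = [
    ("part-1", "holder", "P-001"),
    ("part-1", "company", "C-001"),
    ("part-1", "sharePct", 60.0),
]


def share_from_triples(triples, holder, company):
    """Find the share for a holder/company pair by regrouping the triples."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    for props in by_subject.values():
        if props.get("holder") == holder and props.get("company") == company:
            return props.get("sharePct")
    return None


print(edge["share_pct"])                              # direct lookup on the edge
print(share_from_triples(triples, "P-001", "C-001"))  # lookup via the extra node
```

In the triple form, what was one edge becomes three statements, and a traversal has to hop through the intermediate node.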

The data reaches the graph database in a two-step process, as shown in the figure. Starting from the complex relational database that stores the Italian business register, a number of summary extracts (the "title searches") are continuously generated in response to user requests or update operations. These extracts hold summary data obtained by combining different records from different tables, which is useful for several applications. For this reason, the data is placed in a second relational database that supports those applications. Since this second relational database focuses mainly on the aspects related to company equity participation, the graph databases obtain their data from it.

Queries

The queries we developed are either ones that applications could run on this dataset, or ones designed to put some stress on the capabilities of the database management systems. In particular, we developed some standard queries and some more specific ones.
Given a particular company, identified by its fiscal identifier, we ask for its holders, for the companies it participates in, or for both, limiting the search to a single depth level. We also ask for the direct and indirect holders of a company (and likewise for its direct and indirect participations), which corresponds to exploring the graph with no depth limit. Moreover, because the dataset forms a graph (rather than a "simple" tree), there may be multiple paths between two companies. This lets us ask for the complete list of directed paths connecting two companies, or only for the shortest one. We also ask for the common holders (or the common participations) of two companies. The graph nature of the dataset led to two further queries: the first returns the retrieved participating nodes together with the depth at which each was found, in decreasing order of depth; the second counts the companies connected to a given node at each depth level, while avoiding counting any of them more than once.
These queries are useful for investigative purposes and for a better data exploration experience.
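
The query types listed above can be sketched over a toy adjacency map. This is an illustrative, in-memory version only: the node names are invented, the functions are plain breadth-first and depth-first searches, and none of it is the query text that was run against the databases.

```python
from collections import Counter, deque

# holder -> companies it holds a stake in (invented sample data)
HOLDS = {"P1": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def invert(graph):
    """Build the reverse map: company -> its direct holders."""
    inverse = {node: [] for node in graph}
    for holder, companies in graph.items():
        for company in companies:
            inverse.setdefault(company, []).append(holder)
    return inverse


HELD_BY = invert(HOLDS)


def levels(graph, start, max_depth=None):
    """BFS returning {node: depth}; each node appears once, at its minimum depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]
    return depth


def shortest_path(graph, src, dst):
    """BFS with parent pointers; returns one shortest directed path or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def all_simple_paths(graph, src, dst, path=None):
    """Depth-first enumeration of every directed path that repeats no node."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in graph.get(src, []):
        if nxt not in path:
            found.extend(all_simple_paths(graph, nxt, dst, path))
    return found


def common_holders(a, b):
    """Direct and indirect holders shared by two companies."""
    return set(levels(HELD_BY, a)) & set(levels(HELD_BY, b))


def count_by_depth(graph, start):
    """Number of connected nodes per depth level, each node counted once."""
    return dict(Counter(levels(graph, start).values()))


print(levels(HOLDS, "A", max_depth=1))   # direct participations of A
print(levels(HELD_BY, "D"))              # direct and indirect holders of D
print(all_simple_paths(HOLDS, "A", "D"))
print(shortest_path(HOLDS, "A", "D"))
print(common_holders("B", "C"))
print(count_by_depth(HOLDS, "A"))
```

The `nxt not in depth` and `nxt not in path` checks are what keep the searches finite when the graph contains cycles, which a tree-shaped dataset would not need.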

Comparison of graph databases and a relational database

We imported the dataset into three of the best-known graph databases, namely ArangoDB v3.0.10, Neo4j v3.0.6 and OrientDB v2.2.11 (all Community editions). We also imported the dataset into a well-known relational database, PostgreSQL v9.6.1. The choice of relational database was not a strong constraint, because performance is affected mainly by the SQL language itself rather than by the specific product. The products were installed on virtual servers without specially provisioned resources, so the results are also relevant to companies with similar hardware availability. For each query, we selected three nodes representing three different workloads for the DBMSs: one node represents a light case, with few returned results or a shallow exploration depth; one represents a medium case; and one represents a heavy case. We ran each query more than once, so we also examined the difference in performance between a cold cache and a warm cache.
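
The measurement procedure (three workloads per query, each executed several times) can be sketched as a small timing harness. Everything here is an assumption made for illustration: the `time_runs` helper, the workload sizes and the stand-in computation are invented, and a pure-Python loop will not reproduce the caching behaviour of a real DBMS.

```python
import time
from statistics import median


def time_runs(run_query, repeats=5):
    """Time repeated executions, keeping the first run separate from later ones."""
    if repeats < 2:
        raise ValueError("need at least two runs to compare first and later runs")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # The first run stands for the cold-cache case, later runs for the warm one.
    return {"first_run_s": timings[0], "later_median_s": median(timings[1:])}


def make_workload(size):
    """Stand-in for a query; a real test would call the database driver here."""
    def run():
        return sum(i * i for i in range(size))
    return run


# Hypothetical sizes standing in for the light, medium and heavy start nodes.
WORKLOADS = {"light": 1_000, "medium": 50_000, "heavy": 500_000}

results = {
    name: time_runs(make_workload(size), repeats=3)
    for name, size in WORKLOADS.items()
}

for name, timing in results.items():
    print(f"{name:>6}: first {timing['first_run_s']:.6f}s, "
          f"later median {timing['later_median_s']:.6f}s")
```

Reporting the first run separately from the median of the later runs is one simple way to make a cold-versus-warm difference visible without averaging it away.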

Since there is currently no standard query language for graph databases, each graph DBMS provides its own. This prompted us to also evaluate the expressiveness and usability of the various query languages.

Results

The results we collected can be summarized as follows:

    • Graph databases provide purpose-built query languages that make graph traversal queries much easier to express and help solve some of the typical computational problems in this field. Implementing the same queries efficiently with SQL or stored procedures is difficult.
    • Although the relational database performs well on simpler queries, the analysis shows that, for the heavy cases of graph traversal queries (that is, those with a large number of nodes to analyze and many levels to traverse), the three graph databases are typically one or two orders of magnitude faster than the relational database.
    • ArangoDB shows good import and query performance, especially for light and medium workloads.
    • One concern about the tested ArangoDB version is that it is greedy in its use of RAM. However, ArangoDB states that this "problem" has been solved in the new version 3.2 with the new RocksDB storage engine.
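
To give an idea of what a traversal looks like on the relational side, here is a minimal sketch of a recursive common table expression. It uses Python's built-in `sqlite3` module instead of PostgreSQL so that it is self-contained, and the `participation` table, its columns and the sample rows are invented; it is not the SQL used in the experiment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participation (holder TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO participation VALUES (?, ?)",
    [("P1", "A"), ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
)

# Direct and indirect participations of one holder. The depth cap keeps the
# recursion finite if the data contains cycles.
QUERY = """
WITH RECURSIVE reach(company, depth) AS (
    SELECT company, 1 FROM participation WHERE holder = ?
    UNION
    SELECT p.company, r.depth + 1
    FROM participation AS p
    JOIN reach AS r ON p.holder = r.company
    WHERE r.depth < ?
)
SELECT company, MIN(depth) AS min_depth
FROM reach
GROUP BY company
ORDER BY min_depth, company
"""

rows = conn.execute(QUERY, ("A", 10)).fetchall()
for company, min_depth in rows:
    print(company, min_depth)
```

Even this simple reachability query needs an explicit recursion, deduplication and a depth guard, while each level of the traversal is resolved through a join on the participation table.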

Experimental conclusion

Thanks to the good impression it made during the research, its good import and query execution times, its good documentation, its ease of use and its reasonable commercial pricing, ArangoDB showed great potential for some InfoCamere applications. In the end, we decided to use ArangoDB in the demo application we are developing.

Some additional details about the comparison experiment can be found here.
