Neo4j getting started (iv): Cypher Query Optimization

Source: Internet
Author: User
Tags neo4j

As before, start by clearing everything in the current database so that the new chapter begins from a clean slate.

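  // Delete every relationship together with the nodes at both ends
  MATCH (n)-[r]-(n1)
  DELETE r, n, n1;
  // Delete any remaining nodes that had no relationships
  MATCH (n)
  DELETE n;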
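On Neo4j 2.3 or later, the same cleanup can be sketched as a single statement with DETACH DELETE, which removes each node together with its relationships. Whether it is available depends on the installed version, so treat it as an alternative rather than the method used in this series:

```cypher
// Assumes Neo4j 2.3 or later; deletes every node and all attached relationships
MATCH (n)
DETACH DELETE n;
```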
Next, recreate the people and relationships used in the second post:
  // Nodes (the names must match the MATCH statements below exactly)
  CREATE (bradley:MALE:TEACHER {name: 'Bradley', surname: 'Green', age: 24, country: 'US'})
  CREATE (matthew:MALE:STUDENT {name: 'Matthew', surname: 'Cooper', age: 36, country: 'US'})
  CREATE (lisa:FEMALE {name: 'Lisa', surname: 'Adams', age: 15, country: 'Canada'})
  CREATE (john:MALE {name: 'John', surname: 'Goodman', age: 24, country: 'Mexico'})
  CREATE (annie:FEMALE {name: 'Annie', surname: 'Behr', age: 25, country: 'Canada'})
  CREATE (ripley:MALE {name: 'Ripley', surname: 'Aniston', country: 'US'});
  // Relationships
  MATCH (bradley:MALE {name: "Bradley"}), (matthew:MALE {name: "Matthew"}) WITH bradley, matthew CREATE (bradley)-[:FRIEND]->(matthew), (bradley)-[:TEACHES]->(matthew);
  MATCH (bradley:MALE {name: "Bradley"}), (matthew:MALE {name: "Matthew"}) WITH bradley, matthew CREATE (matthew)-[:FRIEND]->(bradley);
  MATCH (bradley:MALE {name: "Bradley"}), (lisa:FEMALE {name: "Lisa"}) WITH bradley, lisa CREATE (bradley)-[:FRIEND]->(lisa);
  MATCH (lisa:FEMALE {name: "Lisa"}), (john:MALE {name: "John"}) WITH lisa, john CREATE (lisa)-[:FRIEND]->(john);
  MATCH (annie:FEMALE {name: "Annie"}), (ripley:MALE {name: "Ripley"}) WITH annie, ripley CREATE (annie)-[:FRIEND]->(ripley);
  MATCH (ripley:MALE {name: "Ripley"}), (lisa:FEMALE {name: "Lisa"}) WITH ripley, lisa CREATE (ripley)-[:FRIEND]->(lisa);
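As a quick sanity check after loading the data, a query along the following lines lists every relationship with the names at both ends. It is only a sketch and not part of the original walkthrough; the aliases source, rel_type and target are made up for illustration. If it returns no rows, the names in the MATCH statements did not match the names stored by the CREATE statements (the comparison is case-sensitive and includes any stray spaces).

```cypher
// List each relationship as: start node name, relationship type, end node name
MATCH (a)-[r]->(b)
RETURN a.name AS source, type(r) AS rel_type, b.name AS target
ORDER BY source, target;
```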


I. Index
Neo4j 2.0 introduced a label-based schema that supports both constraints and indexes on labels. This helps with data integrity checks and with Cypher optimization. The previous post covered constraints; this one focuses on how indexes are used and what they do.
Neo4j indexes are similar to indexes in relational databases and are mainly used to speed up node lookups. Indexes are updated automatically whenever the underlying data changes. If an error occurs and an index becomes invalid, you need to resolve the error and create the index again.
Cypher queries use indexes automatically. Cypher has a query planner and a query optimizer, which evaluate each query and try to choose the plan with the shortest execution time given the available indexes.
Creating an index is straightforward.
First, use the following statement to create an index on the name property of nodes labeled MALE:
  CREATE INDEX ON :MALE(name)
To verify that the index was created, run the schema ls command in neo4j-shell. When I ran this command in the web interface, however, it returned an error.
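In the web interface, one possible alternative is the built-in procedure for listing indexes. This is a sketch that assumes Neo4j 3.x, where db.indexes() is available; the original post does not use it.

```cypher
// Assumes Neo4j 3.x; lists each index along with its current state
CALL db.indexes();
```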
To delete an index, run the following statement:
  DROP INDEX ON :MALE(name)
Once an index exists, it is used automatically whenever the indexed property appears in a WHERE clause, whether in a simple equality comparison or in another kind of condition. You can also request a specific index explicitly with the USING clause. For example:
  CREATE INDEX ON :MALE(name);
  MATCH (n:MALE)
  USING INDEX n:MALE(name)
  WHERE n.name = "Matthew"
  RETURN n;
Unfortunately, the query time is not displayed outside the terminal. Although I am doing all of this learning and testing on Windows 10, a Cypher shell is also available for Windows; see the separate article Neo4j getting started (V): Windows Shell for Cypher.

Note: A single query can contain several USING clauses, giving the Cypher query optimizer more than one index hint. You can also use USING SCAN to make the query planner scan all nodes with a given label and filter them afterwards. This often performs well, because nodes without the label never have to be considered. For example:
  MATCH (n:MALE)
  USING SCAN n:MALE
  WHERE n.name = "Matthew"
  RETURN n
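This query returns the same result as the earlier one, and it performs better than scanning every node in the database. Note that USING SCAN takes only the label (n:MALE), whereas USING INDEX takes the label and the property (n:MALE(name)); be careful not to mix them up.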
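To illustrate the first half of the note, here is a sketch of a query that supplies two index hints at once. It assumes that indexes exist on both :MALE(name) and :FEMALE(name) (the latter is only created later in this post), and it relies on the FRIEND relationship from Bradley to Lisa in the sample data.

```cypher
// Assumes indexes on :MALE(name) and :FEMALE(name) already exist
MATCH (m:MALE)-[:FRIEND]->(f:FEMALE)
USING INDEX m:MALE(name)
USING INDEX f:FEMALE(name)
WHERE m.name = "Bradley" AND f.name = "Lisa"
RETURN m, f;
```

Several hints may force the planner to join two index lookups, which is not always cheaper than one lookup followed by a traversal, so it is worth checking the resulting plan with PROFILE.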


II. Index Sampling
This term does not translate well, so I use the English phrase directly.
The first step in running any Cypher query is to build an efficient execution plan. Neo4j builds the plan itself, but to do so it needs key information about the current database, such as the number of nodes and relationships and which indexes exist. With this information Neo4j can design an efficient plan, so that queries are answered faster. The next section describes execution plans in detail; one ingredient of an effective plan is index sampling. Index sampling is the process of regularly analyzing and sampling indexes so that index statistics stay up to date as data is added, deleted, and modified and the corresponding indexes change.
Automatic index sampling is controlled by the following Neo4j properties (on Linux they live in the file neo4j.properties; I could not find this file on Windows):
  • index_background_sampling_enabled: a boolean property, set to false by default. Change it to true to enable automatic sampling.
  • index_sampling_update_percentage: the percentage of the index that must change before sampling is triggered.
You can also trigger sampling manually by entering the following commands in the terminal:
  • schema sample -a: triggers sampling on all indexes.
  • schema sample -l MALE -p name: triggers sampling only on the index defined by the label (-l) and the property (-p).

III. Understanding the Execution Plan
Neo4j uses a cost-based optimizer (CBO) to generate the execution plan for every query. You can inspect how it works in two ways:
  • EXPLAIN: shows the plan only. A Cypher statement prefixed with this keyword lets you preview the execution plan, but the statement is not actually executed, so no results are produced.
  • PROFILE: profiles the query. A query prefixed with this keyword is executed, and you see both the details of the execution plan and the query results.
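As a minimal sketch of the first option, a simple lookup by name can be prefixed with EXPLAIN to inspect its plan without running it:

```cypher
// Plan only: the statement is not executed and returns no rows
EXPLAIN MATCH (n)
WHERE n.name = 'Annie'
RETURN n;
```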
Example:
  PROFILE MATCH (n)
  WHERE n.name = 'Annie'
  RETURN n
The web interface shows the resulting execution plan, and clicking expand reveals more detail.

In shell mode, I also found that the EXPLAIN and PROFILE output is less informative than in the web interface. I do not know whether this is a version issue or a configuration setting.



IV. Analyzing and Optimizing Queries
We continue with the Annie example. The original query is inefficient because AllNodesScan has to check every node one by one. But do not forget the labels that Neo4j introduced; they are there for a reason. Add a label to the previous example:
  PROFILE MATCH (n:FEMALE)
  WHERE n.name = 'Annie'
  RETURN n

With the label in place, the operator changes from AllNodesScan to NodeByLabelScan, and the filtering process shrinks from 6-1-1 to 2-1-1. This can be optimized further, though: that's right, with an index.
  CREATE INDEX ON :FEMALE(name)
Everything else stays the same; only the index is created beforehand. Now the optimizer can use NodeIndexSeek, and the whole process shrinks from 2-1-1 to 1-1. In other words, no unnecessary work is done at all.
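Spelled out, the final step looks like the sketch below: create the index, then profile the same query again. Nothing here is new except the index; the operator names to look for in the output are the ones described above.

```cypher
// Create the index first (run as its own statement)
CREATE INDEX ON :FEMALE(name);

// Same query as before; the plan should now start from NodeIndexSeek
PROFILE MATCH (n:FEMALE)
WHERE n.name = 'Annie'
RETURN n;
```

An index is populated in the background after it is created, so on a large database the new plan may only appear once the index is online.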
The metrics to watch when optimizing are estimated rows and db hits; the smaller the values, the better. The former is the estimated number of rows to be scanned, and the latter is the number of hits (I/O) recorded when the query actually runs. There is no universal standard that applies everywhere (otherwise the developers would simply have built it into Neo4j), so you need to build up experience by regularly analyzing and optimizing how your Cypher queries execute.



Wuyue's Summit
May 29, 2017 (Memphis time)

Final draft in Dorsy Ave, Memphis, TN
