Introduction to Using the Elastic Stack: Elasticsearch (II)

Source: Internet
Author: User

I. Preface

Keep writing blog posts, and keep working at writing them well!

II. Introduction to mapping

A mapping is similar to a table definition in a relational database. Consider what a table definition has to specify:

1. Fields and field types. In Elasticsearch this is the structure of the index: the mapping defines the name and type of each field. The previous article briefly described the available field types;

2. Indexes. In a database we can define an index on a column. The Elasticsearch equivalent is whether a field is analyzed (split into terms) and, if so, which analyzer does the splitting;

First, let's use our trusty tool to define a custom mapping:

Next, query the mapping to see its structure:
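The two steps above can be sketched as plain Python dictionaries. This is only an illustration: the index name person, the type name doc, and the name and age fields are assumptions borrowed from the search responses later in this article. The mapping is nested under a type name, as in Elasticsearch 6.x; from 7.x onward the type level is omitted.

```python
import json


def build_create_index_request(index, doc_type, properties):
    """Return (method, path, body) for creating an index with a custom mapping.

    Assumes the Elasticsearch 6.x layout, where the mapping sits under a type name.
    """
    body = {"mappings": {doc_type: {"properties": properties}}}
    return "PUT", "/" + index, body


def build_get_mapping_request(index):
    """Return (method, path) for reading the mapping back."""
    return "GET", "/" + index + "/_mapping"


# Hypothetical fields, for illustration only.
person_properties = {
    "name": {"type": "text"},
    "age": {"type": "integer"},
}

method, path, body = build_create_index_request("person", "doc", person_properties)
print(method, path)
print(json.dumps(body, indent=2))
print(*build_get_mapping_request("person"))
```

The printed method, path, and body can be pasted into whichever HTTP client or console you use to talk to Elasticsearch.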

III. Introduction to common mapping parameters

type: Specifies the data type of the field;

analyzer: Specifies the analyzer used to split the field into terms;

boost: Specifies the weight of the field in relevance scoring;

copy_to: Copies the values of several fields into one combined field;

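dynamic: Controls whether new fields can be added dynamically. There are 3 values:

true: No limit; new fields are added to the mapping;

false: The document can still be written, but new fields are not added to the mapping and cannot be searched;

strict: The write is rejected and an exception is thrown;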

format: "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis". The format parameter lists the accepted date formats; in this example any of the 3 formats is accepted;

index_options: Controls what is recorded in the inverted index. There are 4 settings:

docs: Only the document number is recorded;

freqs: Document number + term frequency;

positions: Document number + term frequency + position;

offsets: Document number + term frequency + position + character offsets;

index: Specifies whether the field is indexed;

fields: Provides several ways of indexing the same field (multi-fields);

null_value: By default, a null value is ignored by ES and cannot be indexed or searched. Setting null_value replaces an explicit null with the given default value so that the field can still be indexed and searched;

properties: Defines the sub-fields of an object or nested field;

search_analyzer: Specifies the analyzer used at query time;

similarity: Specifies the document scoring model. There are 2 main settings:

classic (called default before Elasticsearch 5.0): the TF/IDF algorithm, originally the default in Elasticsearch and Lucene;

BM25: the Okapi BM25 algorithm, the default since Elasticsearch 5.0;

These are the most commonly used parameters; for anything not covered here, refer to the official documentation;
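To see how the parameters above fit together, here is a small sketch of a mapping that uses several of them at once. All field names are hypothetical; only the parameter names and values come from the list above.

```python
import json

# Hypothetical field names; each entry exercises one or two of the parameters above.
properties = {
    # copy_to: both name parts are also copied into full_name
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text", "analyzer": "standard", "search_analyzer": "standard"},
    # index_options + fields: full-text field with an extra exact-match sub-field
    "title": {
        "type": "text",
        "index_options": "offsets",
        "fields": {"raw": {"type": "keyword"}},
    },
    # null_value: an explicit null is indexed as the string "NULL"
    "status": {"type": "keyword", "null_value": "NULL"},
    # format: any of the three date formats is accepted
    "created": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
    # index: stored in _source but not searchable
    "internal_note": {"type": "keyword", "index": False},
}

# dynamic: strict rejects documents that contain fields missing from the mapping.
mapping = {"dynamic": "strict", "properties": properties}
print(json.dumps(mapping, indent=2))
```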

IV. Field data types

The previous article introduced the simple data types, which the official documentation calls core data types, so they are not repeated here. This section covers three other groups: complex data types, geographic data types, and specialized data types;

Complex data types

1. Array data type (Array datatype)

Elasticsearch has no dedicated array type. By default every field can hold zero or more values, but all values in the array must be of the same type;

2. Object data type (Object datatype)

A submitted document is a JSON document, and its fields can themselves be nested JSON objects;

3. Nested data type (Nested datatype)

The nested data type combines 1 and 2: an array whose elements are JSON objects, with each object indexed so that it can be queried independently of the others;
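A small sketch may make the three complex types easier to picture. The field names (tags, manager, comments) are hypothetical; only the nested type name comes from the text above.

```python
import json

# Hypothetical mapping fragment showing the three complex shapes.
mapping_properties = {
    # Array: no special type, the field is simply declared once.
    "tags": {"type": "keyword"},
    # Object: a JSON object with its own properties.
    "manager": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
        }
    },
    # Nested: an array of objects, each indexed independently.
    "comments": {
        "type": "nested",
        "properties": {
            "author": {"type": "keyword"},
            "stars": {"type": "integer"},
        },
    },
}

# A document that fits the mapping fragment above.
document = {
    "tags": ["search", "elastic"],
    "manager": {"name": "ww", "age": 30},
    "comments": [
        {"author": "a", "stars": 4},
        {"author": "b", "stars": 5},
    ],
}

print(json.dumps(mapping_properties, indent=2))
print(json.dumps(document, indent=2))
```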

Geographic data types

1. Geo-point data type (Geo-point datatype)

Used to search by latitude and longitude;

2. Geo-shape data type (Geo-shape datatype)

Used to query against shapes such as polygons;

Specialized data types

1. IP data type (IP datatype)

2. Completion data type (Completion datatype)

Provides search-as-you-type (autocomplete) suggestions. This one is quite useful; refer to the official documentation for details;

The remaining types are not used much, so they are not covered here; refer to the official documentation;

V. Search API Introduction

The Search API queries and analyzes the data stored in Elasticsearch through the _search endpoint. There are basically 4 cases:

1. Query all data in Elasticsearch without specifying an index;

2. Specify a single index to query;

3. Specify several indexes;

4. Match indexes with a wildcard pattern;
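The four cases differ only in the request path. A minimal sketch, in which the helper name search_path and the index names are made up for illustration:

```python
def search_path(indices=None):
    """Build the _search path for zero, one, or several index names or patterns."""
    if not indices:
        return "/_search"
    if isinstance(indices, str):
        indices = [indices]
    return "/" + ",".join(indices) + "/_search"


print(search_path())                    # 1. all indexes
print(search_path("person"))            # 2. a single index
print(search_path(["person", "test"]))  # 3. several indexes
print(search_path("per*"))              # 4. a wildcard pattern
```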

Search API queries take two main forms: URL search and request body search. Both are described below:

URL search:

The search is performed by passing query parameters in the URL. We start with a demo and then introduce the common parameters:

First, add some data:

Next, run the query:

That completes a query against ES. Now for the common parameters:

q: The query string, that is, the content we are searching for;

df: The default field to search when the query string does not name one;

sort: The field to sort by;

from: The offset of the first result to return;

size: The number of results to return per page;

For the rest, check the official documentation;
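The parameters above can be combined into a single URL. A minimal sketch using only the Python standard library; the index name person and the values are assumptions for illustration:

```python
from urllib.parse import urlencode


def url_search(index, q, df=None, sort=None, from_=0, size=10):
    """Build a URL search path with the common query parameters."""
    params = {"q": q}
    if df is not None:
        params["df"] = df
    if sort is not None:
        params["sort"] = sort
    params["from"] = from_
    params["size"] = size
    # urlencode percent-encodes reserved characters such as ":" in "age:desc".
    return "/" + index + "/_search?" + urlencode(params)


url = url_search("person", "ww", df="name", sort="age:desc", from_=0, size=5)
print(url)
```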

Request body search

This is the main topic and the form we use most often. A JSON request is sent in the HTTP request body to query and analyze the data stored in Elasticsearch. The more formal name for this is the Query DSL. It has two kinds of queries, field queries and compound queries, which we introduce in turn:

Field queries:

Field queries fall into two categories: full-text matching and term-level matching;

Full-text matching:

Performs full-text retrieval, mainly on text fields. The query string is analyzed into terms first. Examples are match, match_phrase, and so on;

match query

This is the one we use most often. Let's use our tool to show how it works;

First, look at which fields person has:

Next, see how to use it:
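As a rough sketch, a match request body can be built like this. The field name name and the query text are assumptions for illustration; the original query is not reproduced in this article.

```python
import json


def match_query(field, text):
    """Request body for a basic match query on one field."""
    return {"query": {"match": {field: text}}}


# Hypothetical query text; it is analyzed into the terms "ww" and "waa".
body = match_query("name", "ww waa")
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the index being queried.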

Then look at the results. Because there are many matches, we show them as code:

{
    "took": 12,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
        "total": 4,               // total number of matching documents
        "max_score": 0.80259144,  // best match score
        "hits": [                 // the documents returned
            { "_index": "person", "_type": "doc", "_id": "1", "_score": 0.80259144, "_source": { "name": "WW", "age": 18 } },  // _score is the document score
            { "_index": "person", "_type": "doc", "_id": "7", "_score": 0.5377023, "_source": { "name": "WWCC Waa", "age": 29 } },
            { "_index": "person", "_type": "doc", "_id": "8", "_score": 0.32716757, "_source": { "name": "WW Waa", "age": 29 } },
            { "_index": "person", "_type": "doc", "_id": "9", "_score": 0.32716757, "_source": { "name": "WW Waa", "age": 29 } }
        ]
    }
}

From the documents returned above we can infer how the query works: the query text is first analyzed into terms, and each term is then matched against the documents. No document whose name is just Waa appears because none exists; if you are not convinced, add such a document and run the query again to see the result:

The operator parameter controls the relationship between the analyzed terms. The default is or; when it is set to and, a document must contain every term in the query to match;

The minimum_should_match parameter sets the minimum number of terms that must match;
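A sketch of how these two parameters sit inside a match request body. The field name and query text are again assumptions:

```python
import json


def match_with_options(field, text, operator="or", minimum_should_match=None):
    """Request body for a match query with operator and minimum_should_match."""
    clause = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}


# Every term must be present.
all_terms = match_with_options("name", "ww waa", operator="and")
# At least 2 of the 3 terms must be present.
two_of_three = match_with_options("name", "ww waa sdsfds", minimum_should_match=2)

print(json.dumps(all_terms, indent=2))
print(json.dumps(two_of_three, indent=2))
```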

Next, let's focus on the score. We said there are 2 document scoring models, TF/IDF and BM25, but before introducing them we need 4 concepts:

1. Term Frequency (TF): how often a term occurs in a document. This should be familiar from the previous article, which introduced it while explaining the inverted index;

2. Document Frequency (DF): the number of documents in which the term appears;

3. Inverse Document Frequency (IDF): the opposite of document frequency, roughly 1/DF. The fewer documents a term appears in, the higher its relevance;

4. Field-length norm: the shorter the field, the higher the relevance;

TF/IDF model:

BM25 model:
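The four concepts above can be made concrete with a simplified sketch of both models for a single term. These are the commonly cited textbook forms, not Elasticsearch's exact implementation: the precise formulas differ between Lucene versions, and the classic model's query-level factors (coord, queryNorm, boosts) are left out. The defaults k1 = 1.2 and b = 0.75 are the usual BM25 defaults.

```python
import math


def classic_tf(freq):
    """TF in the classic model: square root of the term's frequency in the field."""
    return math.sqrt(freq)


def classic_idf(doc_count, doc_freq):
    """IDF in the classic model: rarer terms get a larger weight."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


def classic_norm(field_length):
    """Field-length norm: shorter fields get a larger weight."""
    return 1.0 / math.sqrt(field_length)


def classic_term_score(freq, field_length, doc_count, doc_freq):
    """Simplified per-term TF/IDF score (query-level factors omitted)."""
    idf = classic_idf(doc_count, doc_freq)
    return classic_tf(freq) * idf * idf * classic_norm(field_length)


def bm25_idf(doc_count, doc_freq):
    """IDF as commonly used with BM25."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, field_length, avg_field_length, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Textbook per-term BM25 score with term-frequency saturation."""
    length_norm = 1.0 - b + b * field_length / avg_field_length
    tf_part = freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return bm25_idf(doc_count, doc_freq) * tf_part


# Hypothetical corpus statistics: 1000 documents, the term occurs in 10 of them.
print(classic_term_score(freq=2, field_length=4, doc_count=1000, doc_freq=10))
print(bm25_term_score(freq=2, field_length=4, avg_field_length=8,
                      doc_count=1000, doc_freq=10))
```

One visible difference between the models: in the classic model the score keeps growing with term frequency, while in BM25 the term-frequency part levels off below k1 + 1.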

We recommend reading up on the TF/IDF scoring algorithm. In addition, the explain parameter shows how a score was calculated;

match_phrase query

All the terms in the query must be present in the field, and in the same order;
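A sketch of a match_phrase request body. The field name and phrase are assumptions; slop is the optional parameter that allows a number of position moves between the terms.

```python
import json


def match_phrase_query(field, phrase, slop=None):
    """Request body for a match_phrase query, optionally with slop."""
    if slop is None:
        return {"query": {"match_phrase": {field: phrase}}}
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


exact = match_phrase_query("name", "ww waa")
loose = match_phrase_query("name", "ww waa", slop=1)
print(json.dumps(exact, indent=2))
print(json.dumps(loose, indent=2))
```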

{
    "took": 5,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
        "total": 3,
        "max_score": 0.6135754,
        "hits": [
            { "_index": "test", "_type": "doc", "_id": "8", "_score": 0.6135754, "_source": { "name": "WW Waa", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "3", "_score": 0.51623213, "_source": { "name": "WW Waa", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "9", "_score": 0.50104797, "_source": { "name": "WW waa Sdsfds", "age": 25 } }
        ]
    }
}

query_string query

query_string uses the Lucene query syntax, which is strict: a syntactically invalid query returns an error. It allows special operator keywords such as AND, OR, and NOT inside the query string and can search several fields. A quick look at usage:

You can specify a default field:

Or query several fields:
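A sketch of both variants as request bodies. The field names and query text are assumptions; default_field and fields are the parameters that select a single default field or several fields.

```python
import json


def query_string_default_field(query, default_field):
    """query_string restricted to one default field."""
    return {"query": {"query_string": {"default_field": default_field, "query": query}}}


def query_string_fields(query, fields):
    """query_string searched across several fields."""
    return {"query": {"query_string": {"fields": list(fields), "query": query}}}


single = query_string_default_field("ww AND waa", "name")
multi = query_string_fields("ww OR waa", ["name", "user"])
print(json.dumps(single, indent=2))
print(json.dumps(multi, indent=2))
```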

simple_query_string query

The simple_query_string query never throws an exception; it simply discards the invalid parts of the query. It uses +, |, and - in place of AND, OR, and NOT;
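A sketch of a simple_query_string request body using the three operators. The field name and terms are assumptions:

```python
import json


def simple_query_string(query, fields):
    """Request body for a simple_query_string query over the given fields."""
    return {"query": {"simple_query_string": {"query": query, "fields": list(fields)}}}


# "+" means AND, "|" means OR, "-" negates a term.
body = simple_query_string("ww +waa -sdsfds", ["name"])
print(json.dumps(body, indent=2))
```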

match and match_phrase are the ones we use most often. I rarely use the others, so you can explore them yourselves;

Term-level matching:

The query text is treated as a single term and is not analyzed. The main queries are term, terms, range, prefix, and so on;

term and terms queries

To query a single term:

You can also pass in several terms at once:
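A sketch of both request bodies. The field name and values are assumptions. Because the query text is not analyzed, it has to match the indexed term exactly, which for an analyzed text field usually means lowercase.

```python
import json


def term_query(field, value):
    """Match documents whose field contains exactly this term."""
    return {"query": {"term": {field: value}}}


def terms_query(field, values):
    """Match documents whose field contains any of the given terms."""
    return {"query": {"terms": {field: list(values)}}}


one = term_query("name", "ww")
many = terms_query("name", ["ww", "waa"])
print(json.dumps(one, indent=2))
print(json.dumps(many, indent=2))
```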

range query

Mainly used to match a range of values, such as dates or numbers. Our documents have an age field, so let's query for people aged 20 to 30:
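A sketch of the request body for this age range, assuming both ends are inclusive (gte and lte). The small in_range helper only mirrors the same condition locally to show which ages would qualify.

```python
import json


def range_query(field, gte=None, lte=None):
    """Request body for a range query with optional inclusive bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"query": {"range": {field: bounds}}}


def in_range(value, gte=None, lte=None):
    """Local check that mirrors the inclusive bounds of the query."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)


body = range_query("age", gte=20, lte=30)
print(json.dumps(body, indent=2))

ages = [18, 25, 29, 30, 31]
print([age for age in ages if in_range(age, gte=20, lte=30)])
```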

prefix query

Finds documents in which a field contains a term that begins with the given prefix, for example documents whose user field begins with ki;
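A sketch of the request body for the example just described:

```python
import json


def prefix_query(field, prefix):
    """Request body for a prefix query on one field."""
    return {"query": {"prefix": {field: prefix}}}


body = prefix_query("user", "ki")
print(json.dumps(body, indent=2))
```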

These are the most commonly used; for the rest, consult the official documentation;

Compound queries:

A compound query combines several queries or changes how a query behaves. The most commonly used are the constant_score query and the bool query. For the rest, see the official documentation and pick whatever suits your needs;

constant_score query

Wraps another query and returns every matching document with the same score;
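A sketch of a constant_score request body. The inner match query is an assumption; the boost of 1.2 is chosen because every document in the response that follows has a score of 1.2, and with constant_score the score equals the boost.

```python
import json


def constant_score_query(inner_query, boost=1.0):
    """Wrap a query so that every matching document gets the score `boost`."""
    return {"query": {"constant_score": {"filter": inner_query, "boost": boost}}}


body = constant_score_query({"match": {"name": "ww waa"}}, boost=1.2)
print(json.dumps(body, indent=2))
```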

{
    "took": 4,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
        "total": 3,
        "max_score": 1.2,
        "hits": [
            { "_index": "test", "_type": "doc", "_id": "8", "_score": 1.2, "_source": { "name": "WW Waa", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "9", "_score": 1.2, "_source": { "name": "WW waa Sdsfds", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "3", "_score": 1.2, "_source": { "name": "WW Waa", "age": 25 } }
        ]
    }
}

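bool query

Consists of one or more Boolean clauses, mainly of the following 4 types:

must: the document must match, and the clause contributes to the score;

must_not: the document must not match;

should: the document may match, and matching raises the score;

filter: the document must match, but the clause does not contribute to the score;

Usage is as follows: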
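A sketch of a bool request body that uses all four clause types. The inner queries and field values are assumptions for illustration; they are not the query that produced the response shown in this article.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter=None):
    """Request body for a bool query; empty clause lists are left out."""
    clauses = {}
    for name, value in (("must", must), ("must_not", must_not),
                        ("should", should), ("filter", filter)):
        if value:
            clauses[name] = list(value)
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"match_phrase": {"name": "ww waa"}}],       # required, scored
    must_not=[{"term": {"age": 18}}],                  # excluded
    should=[{"match": {"name": "sdsfds"}}],            # optional, raises the score
    filter=[{"range": {"age": {"gte": 20, "lte": 30}}}],  # required, not scored
)
print(json.dumps(body, indent=2))
```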

{
    "took": 6,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
        "total": 3,
        "max_score": 1.6135753,
        "hits": [
            { "_index": "test", "_type": "doc", "_id": "8", "_score": 1.6135753, "_source": { "name": "WW Waa", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "3", "_score": 1.5162321, "_source": { "name": "WW Waa", "age": 25 } },
            { "_index": "test", "_type": "doc", "_id": "9", "_score": 1.501048, "_source": { "name": "WW waa Sdsfds", "age": 25 } }
        ]
    }
}

A note on the minimum_should_match parameter: when a bool query contains only should clauses, a document must satisfy at least that number of them to match. When the query contains both should and must clauses, a document does not have to satisfy any should clause, but each one it does satisfy raises its relevance score:
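A sketch of a should-only bool query with minimum_should_match, plus a small local helper that mimics the counting rule on an in-memory document. The helper satisfies and the sample conditions are made up for illustration and only compare exact values.

```python
import json


def should_only_query(should, minimum_should_match=1):
    """bool query with only should clauses and an explicit minimum."""
    return {"query": {"bool": {"should": list(should),
                               "minimum_should_match": minimum_should_match}}}


def satisfies(doc, conditions, minimum_should_match):
    """Local illustration: count exact field matches and compare to the minimum."""
    matched = sum(1 for field, value in conditions if doc.get(field) == value)
    return matched >= minimum_should_match


body = should_only_query(
    [{"term": {"name": "ww"}}, {"term": {"age": 30}}],
    minimum_should_match=2,
)
print(json.dumps(body, indent=2))

doc = {"name": "ww", "age": 25}
conditions = [("name", "ww"), ("age", 30)]
print(satisfies(doc, conditions, 1), satisfies(doc, conditions, 2))
```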

One point to emphasize: a filter clause only filters for documents that satisfy its conditions and does not contribute to the relevance score;

VI. Conclusion

This article took a long time to write. In the next one we will have a good chat about the search mechanism. You are welcome to join group 438836709 and to follow my public account:
