Elasticsearch: What is a document? Index a document

What is a document?

Most entities or objects in a program can be serialized into a JSON object made up of key-value pairs. The key is the name of a field or property, and the value can be a string, a number, a Boolean, another object, an array of values, or some other specialized type, such as a string representing a date or an object representing a geographic location.

{
    "name":         "John Smith",
    "age":          42,
    "confirmed":    true,
    "join_date":    "2014-06-01",
    "home": {
        "lat":      51.5,
        "lon":      0.1
    },
    "accounts": [
        {
            "type": "facebook",
            "id":   "johnsmith"
        },
        {
            "type": "twitter",
            "id":   "johnsmith"
        }
    ]
}
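As a rough illustration of the serialization described above, the following Python sketch turns an in-memory object into a JSON document. The User class, its field names, and the sample values are hypothetical and only mirror the example; nothing here is specific to Elasticsearch beyond producing JSON.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class User:
    """Hypothetical application object mirroring the example document."""
    name: str
    age: int
    confirmed: bool
    join_date: str                                 # dates travel as strings in JSON
    home: dict = field(default_factory=dict)       # nested object (a geo location)
    accounts: list = field(default_factory=list)   # array of nested objects


def to_document(user: User) -> str:
    """Serialize the root object to the JSON string that would be stored."""
    return json.dumps(asdict(user))


# Illustrative values only.
user = User(
    name="John Smith",
    age=42,
    confirmed=True,
    join_date="2014-06-01",
    home={"lat": 51.5, "lon": 0.1},
    accounts=[
        {"type": "facebook", "id": "johnsmith"},
        {"type": "twitter", "id": "johnsmith"},
    ],
)
document = to_document(user)
```

The root object (the user) becomes the document; the home object and the account entries are ordinary nested objects inside it.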

In general, we can treat the terms object and document as interchangeable. There is a distinction, however. An object is simply a JSON object, similar to what is known as a hash, hashmap, dictionary, or associative array, and objects may contain other objects. In Elasticsearch, the term document has a specific meaning: it refers to the top-level, or root, object that is serialized into JSON and stored in Elasticsearch under a unique ID.

Document metadata

A document does not consist only of its data. It also has metadata: information about the document. The three required metadata elements are:

Element   Description
_index    Where the document is stored
_type     The class of object that the document represents
_id       The unique identifier for the document

_index

An index is similar to a "database" in a relational database: it is where we store and index related data.

Tip:

In reality, our data is stored and indexed in shards, and an index is just a logical namespace that groups together one or more shards. This is an internal detail, though: our application does not need to care about shards at all. As far as the application is concerned, documents live in an index. Elasticsearch takes care of the rest.

We'll cover how to create and manage indices in the index management section, but for now we'll let Elasticsearch create the index for us. All we have to do is choose an index name. The name must be all lowercase, cannot begin with an underscore, and cannot contain commas. Let's use website as our index name.
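The naming rules above can be sketched as a small check. The function name is hypothetical, and it tests only the three rules stated in this article; Elasticsearch itself may enforce further restrictions that are not modeled here.

```python
def is_valid_index_name(name: str) -> bool:
    """Check only the three rules stated in the text, nothing more."""
    if not name:
        return False
    return (
        name == name.lower()            # must be all lowercase
        and not name.startswith("_")    # cannot begin with an underscore
        and "," not in name             # cannot contain commas
    )
```

For example, is_valid_index_name("website") holds, while "Website", "_website", and "web,site" are all rejected.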

_type

In applications, we use objects to represent "things" such as a user, a blog post, a comment, or an email. Each object belongs to a class, which defines the properties or data associated with the object. Objects of the user class may have a name, a gender, an age, and an email address.

In a relational database, we usually store objects of the same class in the same table, because they share the same structure. For the same reason, in Elasticsearch we use the same type for documents that represent the same kind of "thing", because they share the same data structure.

Every type has its own mapping, or schema definition, much like the column definitions of a table in a traditional database. Documents of all types can be stored in the same index, but the mapping for a type tells Elasticsearch how documents of that type should be indexed. We'll discuss how to define and manage mappings in the mapping section, but for now we'll rely on Elasticsearch to detect our documents' data structure automatically.

A _type name can be uppercase or lowercase, but cannot begin with an underscore or contain commas. We will use blog as our type name.

_id

The ID is just a string that, when combined with _index and _type, uniquely identifies a document in Elasticsearch. When creating a document, you can supply your own _id or let Elasticsearch generate one for you.

Other metadata

There are several other metadata elements, which we'll cover in the mapping section. With the elements listed above, we can already store a document in Elasticsearch and retrieve it by ID. In other words, we can use Elasticsearch as a document store.



Index a document

Documents are indexed, that is, stored and made searchable, by using the index API. But first we need to decide where the document lives. As we just discussed, a document is uniquely identified by its _index, _type, and _id. We can either provide our own _id or let the index API generate one for us.

Use your own ID

If your document has a natural identifier (such as a user_account field or other value representing the document), you can provide your own _id, using this form of the index API:

PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}

For example, if our index is called "website", our type is called "blog", and we choose the ID "123", then the index request looks like this:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out ...",
  "date":  "2014/01/01"
}
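For readers working from code rather than a console, here is a minimal sketch of the same request built with Python's standard library. The base URL is an assumption (a local node on port 9200, as in the curl examples in this article), and build_index_request is a hypothetical helper, not part of any Elasticsearch client. The request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a local node listening on the port used by the curl examples.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_type: str, doc_id: str,
                        document: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that indexes `document` under `doc_id`."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("website", "blog", "123", {
    "title": "My first blog entry",
    "text": "Just trying this out ...",
    "date": "2014/01/01",
})
# Sending it would be urllib.request.urlopen(request), which needs a running node.
```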

Elasticsearch's response:

{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "123",
   "_version":  1,
   "created":   true
}
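A client would typically read a response like this one field by field. The sketch below shows one way to do that; summarize_index_response is a hypothetical helper, and the response text is the example from this article rather than output captured from a live node.

```python
import json


def summarize_index_response(raw: str) -> str:
    """Describe what an index response says about the document."""
    response = json.loads(raw)
    location = f'/{response["_index"]}/{response["_type"]}/{response["_id"]}'
    action = "created" if response["created"] else "replaced"
    return f'{location} {action}, now at version {response["_version"]}'


raw_response = """
{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "123",
   "_version":  1,
   "created":   true
}
"""
summary = summarize_index_response(raw_response)
```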

The response indicates that the index request succeeded and the document was created. It includes the _index, _type, and _id metadata, plus a new element: _version.

Every document in Elasticsearch has a version number. Each time a document changes (including when it is deleted), its _version is incremented. In the versioning section we'll look at how to use the _version number to make sure that one part of your application does not overwrite changes made by another part.

Auto-generated ID

If our data has no natural ID, we can let Elasticsearch generate one for us. The structure of the request changes: instead of the PUT method ("store this document at this URL"), we use the POST method ("store this document under this type"). (Translator's note: previously the document was stored at the location identified by an ID; now it is simply added under a _type.)

The URL now contains just the _index and the _type:

POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out ...",
  "date":  "2014/01/01"
}
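The choice between the two request forms can be summarized in a few lines. This is only a sketch: index_endpoint is a hypothetical helper that returns the HTTP method and path described in this section, and it performs no network I/O.

```python
from typing import Optional, Tuple


def index_endpoint(index: str, doc_type: str,
                   doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for an index request.

    With an ID we PUT to the document's own URL; without one we POST
    to the type and let Elasticsearch generate the ID.
    """
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}/"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"
```

Calling index_endpoint("website", "blog", "123") gives the PUT form, and leaving out the ID gives the POST form.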

The response is similar to the one before, except that the _id field now holds an automatically generated value:

{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "Wm0osfhdqxgzawdf0-drsa",
   "_version":  1,
   "created":   true
}

Auto-generated IDs are 22-character, URL-safe, Base64-encoded universally unique identifiers (UUIDs).
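To see why such an ID comes out at 22 characters, the sketch below encodes a random 128-bit UUID as URL-safe Base64 and drops the padding. It illustrates the arithmetic only and is an assumption about the general shape of the ID; it is not a description of how Elasticsearch actually generates its IDs.

```python
import base64
import uuid


def url_safe_uuid() -> str:
    """Encode a random 128-bit UUID as URL-safe Base64 without padding."""
    raw = uuid.uuid4().bytes                      # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(raw)       # 24 bytes, the last two are "=="
    return encoded.rstrip(b"=").decode("ascii")   # 22 characters remain
```

Sixteen bytes need 22 Base64 characters (6 bits each), and the URL-safe alphabet uses "-" and "_" in place of "+" and "/", so the result can appear in a URL path without escaping.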


Retrieving documents

To get a document out of Elasticsearch, we use the same _index, _type, and _id, but the HTTP method changes to GET:

GET /website/blog/123?pretty

The response includes the by-now-familiar metadata elements, plus the _source field, which contains the original document that we sent to Elasticsearch when we indexed it.

{
  "_index":   "website",
  "_type":    "blog",
  "_id":      "123",
  "_version": 1,
  "found":    true,
  "_source":  {
      "title": "My first blog entry",
      "text":  "Just trying this out ...",
      "date":  "2014/01/01"
  }
}

pretty

Adding pretty to the query string of any request, as in the example above, makes Elasticsearch pretty-print the JSON response so that it is easier to read. The _source field, however, is not pretty-printed: it comes back exactly as we sent it.

The response to the GET request includes {"found": true}, which confirms that the document was found. If we request a document that doesn't exist, we still get a JSON response, but found is set to false.

In addition, the HTTP response code is '404 Not Found' instead of '200 OK'. We can see this by passing the -i argument to curl, which makes it display the response headers:

curl -i -XGET http://localhost:9200/website/blog/124?pretty

The response is now similar to this:

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index": "website",
  "_type":  "blog",
  "_id":    "124",
  "found":  false
}
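In client code, the status code and the found flag described above can be handled together. The following sketch assumes the status and body have already been read from the HTTP response; extract_source is a hypothetical helper and the sample bodies are shortened versions of the responses in this article.

```python
import json
from typing import Optional


def extract_source(status: int, body: str) -> Optional[dict]:
    """Return the document's _source, or None when the document was not found."""
    payload = json.loads(body)
    if status == 404 or not payload.get("found", False):
        return None
    return payload["_source"]


found_body = '{"_id": "123", "found": true, "_source": {"title": "My first blog entry"}}'
missing_body = '{"_index": "website", "_type": "blog", "_id": "124", "found": false}'
```

With these inputs, extract_source(200, found_body) returns the stored document and extract_source(404, missing_body) returns None.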

Retrieve part of a document

By default, a GET request returns the whole document, as stored in the _source field. But perhaps the only field you are interested in is title. Individual fields can be requested by using the _source parameter. Multiple fields can be specified as a comma-separated list:

GET /website/blog/123?_source=title,text

The _source field now contains just the fields that we requested, and the date field has been filtered out:

{
  "_index":   "website",
  "_type":    "blog",
  "_id":      "123",
  "_version": 1,
  "found":    true,
  "_source": {
      "title": "My first blog entry",
      "text":  "Just trying this out ..."
  }
}
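The sketch below shows how a client might build the filtered request path, together with a local model of what the filter does to the document. Both function names are hypothetical, and filter_source only imitates the effect for top-level field names; the real filtering happens inside Elasticsearch.

```python
from urllib.parse import urlencode


def source_filter_path(index: str, doc_type: str, doc_id: str, fields: list) -> str:
    """Build the GET path with a comma-separated _source parameter."""
    query = urlencode({"_source": ",".join(fields)}, safe=",")  # keep commas literal
    return f"/{index}/{doc_type}/{doc_id}?{query}"


def filter_source(source: dict, fields: list) -> dict:
    """Local model of the effect: keep only the requested top-level fields."""
    return {name: source[name] for name in fields if name in source}


document = {
    "title": "My first blog entry",
    "text": "Just trying this out ...",
    "date": "2014/01/01",
}
path = source_filter_path("website", "blog", "123", ["title", "text"])
filtered = filter_source(document, ["title", "text"])
```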

Or, if you want just the _source field without any metadata, you can use the _source endpoint:

GET /website/blog/123/_source

It simply returns:

{
   "title": "My first blog entry",
   "text":  "Just trying this out ...",
   "date":  "2014/01/01"
}


Check if a document exists

If all you want to do is check whether a document exists, and you're not interested in its content, use the HEAD method instead of GET. A HEAD request doesn't return a body, just HTTP headers:

curl -i -XHEAD http://localhost:9200/website/blog/123

Elasticsearch returns a 200 OK status code if the document exists:

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

And a 404 Not Found if it doesn't:

curl -i -XHEAD http://localhost:9200/website/blog/124

HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

Of course, the fact that a document didn't exist when you checked doesn't mean that it won't exist a few milliseconds later. Another process might create the document in the meantime.
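A minimal sketch of the same existence check from Python's standard library follows. The function names are hypothetical, and document_exists needs a reachable node, so only the request construction can be checked offline. As noted above, the answer is only valid for the moment at which the request was served.

```python
import urllib.error
import urllib.request


def build_exists_request(url: str) -> urllib.request.Request:
    """Build a HEAD request: headers only, no response body."""
    return urllib.request.Request(url, method="HEAD")


def document_exists(url: str) -> bool:
    """Send the HEAD request and map the status code to a boolean.

    Requires a running Elasticsearch node at `url`.
    """
    try:
        with urllib.request.urlopen(build_exists_request(url)) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:      # urlopen raises on 404 instead of returning
            return False
        raise                      # any other error is a real failure


head_request = build_exists_request("http://localhost:9200/website/blog/123")
```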


Update the entire document

Documents in Elasticsearch are immutable: we cannot change them. Instead, if we need to update an existing document, we reindex or replace it, which we can do using the same index API that we discussed in the Index a document section.

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this ...",
  "date":  "2014/01/02"
}

In the response, we can see that Elasticsearch has incremented the _version number.

{
  "_index":   "website",
  "_type":    "blog",
  "_id":      "123",
  "_version": 2,
  "created":  false <1>
}
<1> The created flag is set to false because a document with the same index, type, and ID already existed.

Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you can no longer access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
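The behaviour described here can be imitated with a toy in-memory model. TinyStore is entirely hypothetical and says nothing about how Elasticsearch is implemented; it only reproduces the observable effects: the version increments, created turns false, and the old version lingers unreachable until a cleanup runs.

```python
class TinyStore:
    """Toy model of replace-on-index. Not how Elasticsearch works internally."""

    def __init__(self):
        self.live = {}       # (index, type, id) -> (version, document)
        self.deleted = []    # old versions marked as deleted, no longer reachable

    def index(self, index, doc_type, doc_id, document):
        key = (index, doc_type, doc_id)
        previous = self.live.get(key)
        if previous is not None:
            self.deleted.append(previous)          # mark the old version as deleted
        version = previous[0] + 1 if previous else 1
        self.live[key] = (version, dict(document)) # add an entirely new document
        return {"_index": index, "_type": doc_type, "_id": doc_id,
                "_version": version, "created": previous is None}

    def cleanup(self):
        """Stand-in for the later cleanup of deleted documents."""
        self.deleted.clear()


store = TinyStore()
first = store.index("website", "blog", "123", {"title": "My first blog entry"})
second = store.index("website", "blog", "123", {"title": "My first blog entry",
                                                 "date": "2014/01/02"})
```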

Later in this chapter, we'll introduce the update API in the partial updates section. That API appears to let you change part of a document in place, but in fact Elasticsearch follows exactly the same process as described above:

1. Retrieve the JSON from the old document
2. Change it
3. Delete the old document
4. Index a new document

The only difference is that the update API achieves this with a single client request, instead of requiring separate get and index requests.
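The retrieve, change, delete, and index sequence can be sketched against a plain dictionary standing in for the index. The update function and the store layout are hypothetical and only trace the steps named above; this is not the update API itself.

```python
def update(store: dict, key: tuple, changes: dict) -> dict:
    """Apply `changes` to the document at `key` in one call."""
    version, old_doc = store[key]          # retrieve the JSON from the old document
    new_doc = {**old_doc, **changes}       # change it (on a copy, the original is untouched)
    del store[key]                         # delete the old document
    store[key] = (version + 1, new_doc)    # index a new document
    return {"_version": version + 1, "_source": new_doc}


# Hypothetical stand-in for an index: (index, type, id) -> (version, document)
store = {
    ("website", "blog", "123"): (1, {
        "title": "My first blog entry",
        "text": "Just trying this out ...",
    }),
}
result = update(store, ("website", "blog", "123"),
                {"text": "I am starting to get the hang of this ..."})
```

The caller makes a single call, yet the stored document is a complete new copy with an incremented version, which mirrors the point made above.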


From:https://es.xiaoleilu.com/030_data/05_document.html
