Elasticsearch is a distributed document store. It can store and retrieve complex data structures, serialized as JSON documents, in real time. In other words, once a document has been stored in Elasticsearch, it can be retrieved from any node in the cluster.
Of course, we need not only to store data, but also to query it quickly and in bulk. While many NoSQL solutions already let us store objects as documents, they still require us to think about how we want to query the data and which fields need to be indexed for fast retrieval.
Most entities or objects in a program can be serialized as JSON objects that contain key-value pairs. A key is the name of a field or property. A value can be a string, a number, a Boolean, another object, an array of values, or some other special type, such as a string that represents a date or an object that represents a geographic location.
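As a minimal sketch of that serialization in Python (the field names and values below are made up for illustration and do not come from Elasticsearch), the standard json module can turn a nested object into a JSON document and back:

```python
import json

# A hypothetical user object; every field name here is illustrative.
user = {
    "name": "John Smith",                 # string
    "age": 42,                            # number
    "confirmed": True,                    # Boolean
    "join_date": "2014/05/01",            # a date represented as a string
    "home": {"lat": 51.5, "lon": 0.1},    # nested object (a geographic location)
    "interests": ["dolphins", "whales"],  # array of values
}

document = json.dumps(user)      # serialize the object to a JSON string
restored = json.loads(document)  # parse the JSON back into Python objects
```

The round trip preserves the structure, which is what makes JSON a convenient exchange format for documents.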
Document metadata
A document is not just data. It also contains metadata: information about the document. The three required metadata elements are:
Element | Description
_index | Where the document is stored
_type | The class of object that the document represents
_id | The unique identifier of the document
_index
An index is similar to a database in a relational database system: it is where we store and index related data.
In fact, our data is stored and indexed in shards, and an index is simply a logical namespace that groups one or more shards together. However, this is an internal detail: our program does not need to care about shards at all. As far as our program is concerned, documents are stored in an index. Elasticsearch takes care of the rest.
We will explore how to create and manage indices later, but for now we will let Elasticsearch create the index for us. The only thing we need to do is choose an index name. The name must be all lowercase, cannot begin with an underscore, and cannot contain commas. Let's use website as the index name.
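The three naming rules just stated can be checked with a small helper. This is only a sketch: is_valid_index_name is a hypothetical function, not part of any Elasticsearch client, and Elasticsearch itself may enforce further restrictions that are not covered here.

```python
def is_valid_index_name(name: str) -> bool:
    """Check only the three rules stated in the text (hypothetical helper)."""
    return (
        bool(name)                      # an empty name is rejected
        and name == name.lower()        # must be all lowercase
        and not name.startswith("_")    # cannot begin with an underscore
        and "," not in name             # cannot contain commas
    )
```

For example, is_valid_index_name("website") is true, while "Website", "_website", and "web,site" are all rejected.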
_type
In our applications, we use objects to represent things such as a user, a blog post, a comment, or an email. Each object belongs to a class, which defines the properties or data associated with the object. Objects in the user class may have a name, a gender, an age, and an email address.
In a relational database, we usually store objects of the same class in the same table because they share the same data structure. For the same reason, in Elasticsearch we use the same type for documents that represent the same kind of thing.
Each type has its own mapping, or schema definition, which describes the fields of the documents much as columns describe a table in a traditional database. Documents of all types can be stored in the same index, but the mapping of each type tells Elasticsearch how documents of that type should be indexed. We will explore how to define and manage mappings in the mapping section, but for now we will rely on Elasticsearch to detect the data structure automatically.
A _type name can be uppercase or lowercase, but it cannot begin with an underscore or contain commas. We will use blog as the type name.
_id
The ID is a string that, when combined with _index and _type, uniquely identifies a document in Elasticsearch. When creating a document, you can supply your own _id or let Elasticsearch generate one for you.
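One way to picture this uniqueness is a dictionary keyed by the three metadata values together. The snippet below is a toy in-memory stand-in, not how Elasticsearch stores documents, and the comment type is a hypothetical second type used only for contrast:

```python
# Toy stand-in for a cluster: documents keyed by (_index, _type, _id).
store = {}
store[("website", "blog", "123")] = {"title": "My blog entry"}
store[("website", "comment", "123")] = {"text": "Nice post"}  # same _id, different _type

# The two entries do not collide, because the full key differs.
blog_doc = store[("website", "blog", "123")]
```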
Note: there are other metadata elements as well. They will be introduced later.
Use your own ID
If your document has a natural identifier (such as a user_account field or some other value that identifies the document), you can provide your own _id, using this form of the index API:
PUT /{index}/{type}/{id}
{"key": "value"...}
For example:
PUT /website/blog/123
{
  "title": "My blog entry",
  "text": "Chinese You can.",
  "date": "2015/07/16"
}
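Elasticsearch responds with the document's metadata. In this run the document had already been indexed before, so _version is 5 and created is false:
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 5,
  "created": false
}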
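An indexing request of the form PUT /{index}/{type}/{id} can also be built from Python with the standard library alone. This is a sketch under stated assumptions: the base URL http://localhost:9200 and the helper name build_index_request are not from the text, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a node listening on localhost:9200 (not stated in the text).
BASE_URL = "http://localhost:9200"

def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT /{index}/{type}/{id} request."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

request = build_index_request(
    "website", "blog", "123",
    {"title": "My blog entry", "date": "2015/07/16"},
)
# To send it against a running node:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Keeping construction separate from sending makes the request easy to inspect before anything reaches the cluster.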
Every document in Elasticsearch has a version number. Each time the document changes (including when it is deleted), _version is incremented. Later we will explore how to use the _version number to make sure that one part of your program does not overwrite changes made by another part.
Auto-generated ID
If our data does not have a natural ID, we can let Elasticsearch generate one for us. The structure of the request changes: instead of the PUT method ("store this document at this URL"), we use the POST method ("store this document under this URL"). (Note: previously the document was saved to the location identified by a specific ID; now it is added under the _type.)
The URL now contains only the _index and the _type:
POST /website/blog/
{
  "title": "My second blog entry",
  "text": "Still trying this out...",
  "date": "2015/07/16"
}
The response is similar to the previous one, except that the _id field is now an automatically generated value:
{
  "_index": "website",
  "_type": "blog",
  "_id": "AU6Vi9GsUzILmCnC2hkX",
  "_version": 1,
  "created": true
}
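The choice between the two request forms can be summarized in a few lines of Python. The function index_request_line is a hypothetical helper written for this illustration. It only computes the HTTP method and path and does not talk to Elasticsearch.

```python
def index_request_line(index, doc_type, doc_id=None):
    """Return the HTTP method and path for indexing a document.

    With an ID we PUT to a specific URL. Without one we POST under the
    type and leave ID generation to Elasticsearch.
    """
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}/"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"

with_id = index_request_line("website", "blog", "123")
without_id = index_request_line("website", "blog")
```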
Update an entire document
Documents are immutable in Elasticsearch: we cannot modify them. If we need to update an existing document, we reindex or replace it, using the same index API described earlier:
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "I am starting to get the hang of this...",
  "date": "2014/01/02"
}
In the response, we can see that Elasticsearch has incremented _version:
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "created": false
}
The created flag is false because a document with the same index, type, and ID already exists.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document does not disappear immediately, although you can no longer access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
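The replace-by-reindexing behavior can be modeled with a toy in-memory store. This is an illustration of the idea only, not Elasticsearch internals. TinyStore and its attributes are hypothetical names.

```python
class TinyStore:
    """Toy model of whole-document replacement (not Elasticsearch internals)."""

    def __init__(self):
        self.live = {}      # doc_id -> (version, document)
        self.deleted = []   # old copies marked as deleted, awaiting cleanup

    def index(self, doc_id, document):
        created = doc_id not in self.live
        version = 1
        if not created:
            old_version, old_doc = self.live[doc_id]
            # Mark the old copy as deleted instead of editing it in place.
            self.deleted.append((doc_id, old_version, old_doc))
            version = old_version + 1
        # Add a complete new document under the same ID.
        self.live[doc_id] = (version, dict(document))
        return {"_id": doc_id, "_version": version, "created": created}

store = TinyStore()
first = store.index("123", {"title": "My blog entry"})
second = store.index("123", {"title": "My first blog entry"})
```

After the second call the old copy sits in store.deleted, unreachable through store.live, which mirrors the point that the old version lingers until it is cleaned up.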
Later we will discuss the update API, which can be used to make partial changes to a document. This API appears to modify the document in place, but in fact Elasticsearch follows exactly the same process as described above:
- Retrieve the JSON from the old document
- Modify it
- Delete the old document
- Index a new document
The only difference is that the update API completes this process with a single client request, instead of requiring separate get and index requests.
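The four steps above can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the documents dictionary, the index and update functions, and the views field are all hypothetical, and none of this calls the real update API.

```python
documents = {}   # doc_id -> {"_version": int, "_source": dict}

def index(doc_id, source):
    """Replace the whole document and bump its version."""
    previous = documents.get(doc_id)
    version = previous["_version"] + 1 if previous else 1
    documents[doc_id] = {"_version": version, "_source": dict(source)}
    return version

def update(doc_id, changes):
    """Partial update expressed as retrieve, modify, reindex."""
    source = dict(documents[doc_id]["_source"])  # 1. retrieve the old JSON
    source.update(changes)                       # 2. modify it
    return index(doc_id, source)                 # 3-4. the old copy is replaced by a new document

index("123", {"title": "My first blog entry", "views": 0})
new_version = update("123", {"views": 1})
```

From the caller's point of view update is one call, even though a full document is rebuilt and reindexed behind it.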
Delete a document
The syntax for deleting a document follows the same pattern as before, except that it uses the DELETE method:
DELETE /website/blog/1234
If the document is found, Elasticsearch returns a 200 OK status code and the following response body. Note that the _version number has been incremented.
{
  "found": true,
  "_index": "website",
  "_type": "blog",
  "_id": "1234",
  "_version": 3
}
If the document is not found, we get a 404 Not Found status code and a response body like this:
{
  "found": false,
  "_index": "website",
  "_type": "blog",
  "_id": "1234",
  "_version": 4
}
Even though the document does not exist (found is false), _version has still been incremented. This is part of the internal bookkeeping, which ensures that changes are applied in the correct order across multiple nodes. Deleting a document does not remove it from disk immediately; it is only marked as deleted. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
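The delete behavior, including the version bump for a missing document, can be modeled in a few lines. This is a toy sketch rather than the real implementation. The docs and versions dictionaries and the delete function are hypothetical, and the starting version of 2 is chosen only so the numbers line up with the responses shown above.

```python
docs = {}       # doc_id -> document, for live documents only
versions = {}   # doc_id -> last version number, kept even after deletion

def delete(doc_id):
    """Return (status_code, body) for a toy DELETE request."""
    found = doc_id in docs
    docs.pop(doc_id, None)                           # the document is no longer retrievable
    versions[doc_id] = versions.get(doc_id, 0) + 1   # bumped whether or not it was found
    status = 200 if found else 404
    return status, {"found": found, "_id": doc_id, "_version": versions[doc_id]}

docs["1234"] = {"title": "My blog entry"}
versions["1234"] = 2
first = delete("1234")    # the document exists
second = delete("1234")   # the document is already gone
```

Here first carries status 200 with version 3, and second carries status 404 with version 4, matching the two responses in the text.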
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.