Environment:
Python: 2.7
Elasticsearch client package: pyelasticsearch
Elasticsearch: 5.5.1/6.0.1
Operating system: Windows 10/CentOS 7
This article summarizes the basic CRUD operations in ES. The official site lists many Python client libraries, e.g. pyelasticsearch, ESClient, elasticutils, pyes, rawes, Surfiki Refine and so on. I have only used pyelasticsearch at work, so this article covers that package only; details of the other clients can be found on the official website.
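Installation command for the package: pip install elasticsearch (note that this installs the official client, which is imported in code as elasticsearch; the examples below use its Elasticsearch class)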
The package does not expose many interfaces. The discussion below is split into two categories: single operations and bulk operations.
Single Operations
Insert
create: you must specify the index, type, ID, and the document body to store.
index: more flexible than create. The ID is optional: if specified, the document gets that ID; if not, a globally unique ID is generated automatically and assigned to the document.
e.g.
from elasticsearch import Elasticsearch
body = {'name': 'lucy', 'sex': 'female', 'age': 10}  # the age is an example value
es = Elasticsearch(['localhost:9200'])
es.index(index='indexname', doc_type='typeName', body=body, id=None)
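The difference between create and index can be made concrete with a small wrapper. This is only a sketch: save_document is a hypothetical helper name, and es is assumed to be a client object exposing index() and create() with the keyword arguments used in this article.

```python
def save_document(es, index, doc_type, body, doc_id=None, overwrite=True):
    """Store one document and return the ID reported in the response.

    save_document is a hypothetical helper; `es` is assumed to be a client
    exposing index() and create() with the keyword arguments shown here.
    """
    if doc_id is not None and not overwrite:
        # create needs an explicit ID and is expected to fail if it already exists.
        response = es.create(index=index, doc_type=doc_type, id=doc_id, body=body)
    else:
        # index accepts id=None, in which case the server generates the ID.
        response = es.index(index=index, doc_type=doc_type, body=body, id=doc_id)
    return response['_id']
```

Calling save_document(es, 'indexname', 'typeName', body) lets the server pick the ID, while passing doc_id='idValue' together with overwrite=False routes the call through create instead.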
Delete
delete: deletes the document with the specified index, type, and ID.
es.delete(index='indexname', doc_type='typeName', id='idValue')
Find
get: retrieves the document with the specified index, type, and ID.
es.get(index='indexname', doc_type='typeName', id='idValue')
Update
update: updates the document with the specified index, type, and ID. The fields to change go under the 'doc' key of the body.
es.update(index='indexname', doc_type='typeName', id='idValue', body={'doc': {'age': 11}})  # 'age': 11 is an example field to update
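For a partial update, the changed fields are wrapped in a 'doc' entry rather than passed as the body directly. A minimal sketch, assuming es is a client whose update() accepts the keyword arguments used in this article; update_fields is a hypothetical helper name.

```python
def update_fields(es, index, doc_type, doc_id, fields):
    """Partially update one document with the given field/value pairs.

    update_fields is a hypothetical helper; `es` is assumed to be a client
    whose update() takes index, doc_type, id and body keyword arguments.
    """
    # Only the keys present in `fields` are changed; other fields are kept.
    body = {'doc': dict(fields)}
    return es.update(index=index, doc_type=doc_type, id=doc_id, body=body)
```

For example, update_fields(es, 'indexname', 'typeName', 'idValue', {'age': 11}) would send {'doc': {'age': 11}} as the request body.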
Bulk Operations
Conditional Query
search: returns all documents that match the query. It takes no id argument, and index, doc_type, and body may all be None.
The body must follow the query DSL (Domain Specific Language) format.
query = {'query': {'match_all': {}}}  # find all documents
query = {'query': {'term': {'name': 'jack'}}}  # find all documents named jack
query = {'query': {'range': {'age': {'gt': 11}}}}  # find all documents with age greater than 11
alldoc = es.search(index='indexname', doc_type='typeName', body=query)
print(alldoc['hits']['hits'][0])  # the first matching document
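The matching documents sit under hits.hits in the search response, and each hit carries the stored document in its _source field. The sketch below pulls the documents out of a response; extract_sources is a hypothetical helper, and sample_response is hand-written sample data that only mimics the shape of a real response.

```python
def extract_sources(response):
    """Return the _source dict of every hit in a search response."""
    return [hit['_source'] for hit in response['hits']['hits']]


# Hand-written sample data shaped like a search response (not real output).
sample_response = {
    'hits': {
        'total': 2,
        'hits': [
            {'_id': '1', '_source': {'name': 'jack', 'age': 12}},
            {'_id': '2', '_source': {'name': 'lucy', 'age': 15}},
        ],
    }
}

names = [doc['name'] for doc in extract_sources(sample_response)]
```

With a live cluster, the same helper could be applied to the return value of es.search.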
Conditional Deletion
delete_by_query: deletes all documents that match the query; the query must follow the DSL format.
query = {'query': {'match': {'sex': 'female'}}}  # delete all documents whose sex is female
query = {'query': {'range': {'age': {'lt': 11}}}}  # delete all documents with age less than 11
es.delete_by_query(index='indexname', body=query, doc_type='typeName')
Conditional Update
update_by_query: updates all documents that match the query; usage is the same as for delete_by_query and search.
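Since no example is given for update_by_query, here is a hedged sketch of how its body might be assembled. Besides the query, the body usually carries a script that describes the change. build_update_by_query_body is a hypothetical helper, and the script text and field values are illustrative. As far as I know, Elasticsearch 5.5 expects the script text under the 'inline' key while later versions accept 'source', so the key is left configurable; check the documentation for your version.

```python
def build_update_by_query_body(query, script_source, params=None, script_key='source'):
    """Assemble a request body for update_by_query (hypothetical helper).

    `query` is the inner query clause, e.g. {'term': {...}}.
    `script_key` is assumed to be 'inline' on older 5.x servers.
    """
    script = {script_key: script_source, 'lang': 'painless'}
    if params:
        script['params'] = params
    return {'query': query, 'script': script}


# Illustrative values: set sex to 'unknown' for every document with age < 11.
body = build_update_by_query_body(
    query={'range': {'age': {'lt': 11}}},
    script_source='ctx._source.sex = params.sex',
    params={'sex': 'unknown'},
)
# With a client at hand, the body would be sent like this:
# es.update_by_query(index='indexname', doc_type='typeName', body=body)
```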
Bulk Insert, Delete, Update
bulk: This is the method I want to focus on. All of the previous methods are simple, but bulk took me a lot of time when I first used it. It performs multiple operations in a single request, which greatly reduces overhead for batch workloads. Moreover, a bulk request is not limited to one kind of operation: inserts, deletes, and updates can be mixed in the same request.
Note, however, that every operation has a fixed document format and succeeds only if the format is followed exactly. Without further ado, here is the code:
# Example 1: bulk insert (the first age is an example value)
doc = [
    {'index': {}},
    {'name': 'jackaaa', 'age': 2000, 'sex': 'female', 'address': u'Beijing'},
    {'index': {}},
    {'name': 'jackbbb', 'age': 3000, 'sex': 'male', 'address': u'Shanghai'},
    {'index': {}},
    {'name': 'jackccc', 'age': 4000, 'sex': 'female', 'address': u'Guangzhou'},
    {'index': {}},
    {'name': 'jackddd', 'age': 1000, 'sex': 'male', 'address': u'Shenzhen'},
]
# Example 2: index, delete, create and update in one request (ages are example values)
doc = [
    {'index': {'_index': 'indexname', '_type': 'typeName', '_id': 'idValue'}},
    {'name': 'jack', 'sex': 'male', 'age': 10},
    {'delete': {'_index': 'indexname', '_type': 'typeName', '_id': 'idValue'}},
    {'create': {'_index': 'indexname', '_type': 'typeName', '_id': 'idValue'}},
    {'name': 'lucy', 'sex': 'female', 'age': 20},
    {'update': {'_index': 'indexname', '_type': 'typeName', '_id': 'idValue'}},
    {'doc': {'age': '100'}},
]
es.bulk(index='indexname', doc_type='typeName', body=doc)
As the two examples above show, when using bulk, every operation must be preceded by a header entry for its operation type (e.g. {'index': {}}, {'delete': {...}}, ...); otherwise a TransportError(u'illegal_argument_exception') is raised.
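Because a malformed body is only rejected once the request reaches the server, it can help to check the header/document pairing locally first. The following is a sketch of such a check, not part of any client library: split_bulk_body is a hypothetical helper that assumes index, create, and update headers are each followed by one data entry and delete headers by none.

```python
ACTIONS_WITH_SOURCE = ('index', 'create', 'update')
ACTIONS_WITHOUT_SOURCE = ('delete',)


def split_bulk_body(body):
    """Pair each action header with its data entry (None for delete).

    Raises ValueError when an entry in header position is not a
    single-key dict naming a known action, or when data is missing.
    """
    pairs = []
    i = 0
    while i < len(body):
        header = body[i]
        if not isinstance(header, dict) or len(header) != 1:
            raise ValueError('item %d is not an action header' % i)
        action = list(header)[0]
        if action in ACTIONS_WITHOUT_SOURCE:
            pairs.append((action, None))
            i += 1
        elif action in ACTIONS_WITH_SOURCE:
            if i + 1 >= len(body):
                raise ValueError('action %r at item %d has no data entry' % (action, i))
            pairs.append((action, body[i + 1]))
            i += 2
        else:
            raise ValueError('item %d is not an action header' % i)
    return pairs
```

Running split_bulk_body(doc) before es.bulk would surface a missing header as a local ValueError instead of a server-side error.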
In practice, you will often need to assemble such a list of dictionaries yourself. Consider the following scenario:
If you want to bulk insert a batch of data, as in the first example above, a simple approach is to build the required list from the existing dataset by interleaving header entries and documents at the even and odd positions of a list. A handy Python trick for this is slice assignment with [::2] and [1::2]. For details, see my blog post: Python programming tips.
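The interleaving trick can be sketched as follows. build_bulk_index_body is a hypothetical helper name and the sample documents are illustrative; the idea is to allocate a list twice as long as the dataset, then fill the even positions with headers and the odd positions with the documents.

```python
def build_bulk_index_body(docs):
    """Interleave an {'index': {}} header before every document."""
    body = [None] * (2 * len(docs))
    body[::2] = [{'index': {}} for _ in docs]  # even positions: headers
    body[1::2] = docs                          # odd positions: documents
    return body


# Illustrative dataset.
dataset = [
    {'name': 'jackaaa', 'sex': 'female'},
    {'name': 'jackbbb', 'sex': 'male'},
]
bulk_body = build_bulk_index_body(dataset)
# With a client at hand, the body would be sent like this:
# es.bulk(index='indexname', doc_type='typeName', body=bulk_body)
```

Each header is a separate dict, so changing one of them later (for example to add an '_id') does not affect the others.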
The complete example for this article is available on my GitHub.