Distributed search Elasticsearch source code analysis 2: a brief analysis of the index process source code

Last Update:2018-12-03 Source: Internet

Author: User


This article gives a brief analysis of Elasticsearch's indexing logic. It only traces the main flow; some details will be elaborated in future articles. If you call the Elasticsearch index interface through the Java API, you first construct a JSON string (represented in ES as XContent, an abstraction of the content to be processed) and then, in an IndexRequest, specify the index the document will be written to, its type, and the document ID. If no document ID is specified, ES automatically generates one with its UUID utility; the code is in the process method of IndexRequest.

    if (allowIdGeneration) {
        if (id == null) {
            id(UUID.randomBase64UUID());
            opType(IndexRequest.OpType.CREATE);
        }
    }
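
The ID-defaulting step can be tried in isolation with the sketch below. It is not the Elasticsearch code: the `IdDefaultingSketch` class, its `Request` type, and the `randomBase64Uuid` helper are hypothetical stand-ins built on the JDK's `UUID` and `Base64` classes, and the real `UUID.randomBase64UUID()` may encode its bytes differently.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical stand-in for the ID-defaulting step; not the Elasticsearch classes.
class IdDefaultingSketch {
    enum OpType { INDEX, CREATE }

    static final class Request {
        String id;                    // null means the caller did not supply an ID
        OpType opType = OpType.INDEX; // default operation type
    }

    // URL-safe Base64 encoding (no padding) of a random 128-bit UUID.
    // Assumption: the real utility may use a different byte layout or alphabet.
    static String randomBase64Uuid() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Mirrors the quoted logic: generate an ID only when generation is allowed
    // and no ID was given, then switch the operation type to CREATE.
    static void process(Request request, boolean allowIdGeneration) {
        if (allowIdGeneration && request.id == null) {
            request.id = randomBase64Uuid();
            request.opType = OpType.CREATE;
        }
    }

    public static void main(String[] args) {
        Request request = new Request();
        process(request, true);
        System.out.println(request.id + " " + request.opType);
    }
}
```

A request that already carries an ID passes through unchanged and keeps its original operation type.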

The request is then sent to the ES server over TCP by TransportService, which wraps Netty (REST requests go over HTTP instead). The server obtains the TransportAction that handles the index request (TransportShardReplicationOperationAction), and its AsyncShardOperationAction.start() method begins the shard-level operation. It first reads the cluster state and extracts the target index and its shard information, then computes a hash modulo the number of shards, based on the document ID, the type, and the index's shard information, to determine which shard the data is allocated to.

    private int shardId(ClusterState clusterState, String index, String type, @Nullable String id, @Nullable String routing) {
        if (routing == null) {
            if (!useType) {
                return Math.abs(hash(id) % indexMetaData(clusterState, index).numberOfShards());
            } else {
                return Math.abs(hash(type, id) % indexMetaData(clusterState, index).numberOfShards());
            }
        }
        return Math.abs(hash(routing) % indexMetaData(clusterState, index).numberOfShards());
    }
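
To make the hash-modulo step concrete, here is a minimal standalone sketch. The DJB2-style `hash` function is an assumption chosen for illustration, since the article does not show which hash function ES uses, and the `useType` variant is left out. `ShardRoutingSketch` and its method signatures are hypothetical.

```java
// Hypothetical sketch of hash-modulo shard selection; not the Elasticsearch code.
class ShardRoutingSketch {
    // DJB2-style string hash (an assumption; the real hash function is not shown in the article).
    static int hash(String value) {
        int h = 5381;
        for (int i = 0; i < value.length(); i++) {
            h = ((h << 5) + h) + value.charAt(i); // h * 33 + next character
        }
        return h;
    }

    // An explicit routing value, when present, replaces the document ID as the hash key.
    // The remainder is taken before Math.abs, so the result always lies in [0, numberOfShards).
    static int shardId(String id, String routing, int numberOfShards) {
        String key = (routing != null) ? routing : id;
        return Math.abs(hash(key) % numberOfShards);
    }

    public static void main(String[] args) {
        System.out.println(shardId("doc-1", null, 5));
        System.out.println(shardId("doc-1", "user-42", 5));
    }
}
```

Because the shard is a pure function of the key and the shard count, every node computes the same target shard for a given document, and documents that share a routing value land on the same shard.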

Next, find the primary shard the data is assigned to and submit the index request to it for processing (TransportIndexAction.shardOperationOnPrimary). The primary first checks whether a routing value must be specified:

    MappingMetaData mappingMd = clusterState.metaData().index(request.index()).mappingOrDefault(request.type());
    if (mappingMd != null && mappingMd.routing().required()) {
        if (request.routing() == null) {
            throw new RoutingMissingException(request.index(), request.type(), request.id());
        }
    }

It then determines the operation type. There are two types of index operation. The first is index: when the document ID to be indexed already exists, the existing document is updated with the new content. The second is create: when the document ID already exists, an error is thrown indicating that the document already exists.

if (request.opType() == IndexRequest.OpType.INDEX)
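
The difference between the two operation types can be modeled with a plain map. This is a toy model, not the ES implementation: `OpTypeSketch` and `DocumentExistsException` are hypothetical names, and an update is modeled simply as replacing the stored source for that ID.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of INDEX vs CREATE semantics; names are hypothetical.
class OpTypeSketch {
    enum OpType { INDEX, CREATE }

    static final class DocumentExistsException extends RuntimeException {
        DocumentExistsException(String id) {
            super("document already exists: " + id);
        }
    }

    private final Map<String, String> docs = new HashMap<>();

    void apply(OpType opType, String id, String source) {
        // CREATE refuses to touch an existing document.
        if (opType == OpType.CREATE && docs.containsKey(id)) {
            throw new DocumentExistsException(id);
        }
        // INDEX (or CREATE on a new ID) stores the source, updating any previous entry.
        docs.put(id, source);
    }

    String get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        OpTypeSketch shard = new OpTypeSketch();
        shard.apply(OpType.INDEX, "1", "{\"v\":1}");
        shard.apply(OpType.INDEX, "1", "{\"v\":2}");
        System.out.println(shard.get("1"));
    }
}
```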

Call InternalIndexShard to perform the index operation:

    Engine.Index index = indexShard.prepareIndex(sourceToParse)
            .version(request.version())
            .versionType(request.versionType())
            .origin(Engine.Operation.Origin.PRIMARY);
    indexShard.index(index);

InternalIndexShard looks up the mapping that matches the type of the document being indexed, parses the JSON string, and converts it into a ParsedDocument according to that mapping.

    public Engine.Index prepareIndex(SourceToParse source) throws ElasticSearchException {
        long startTime = System.nanoTime();
        DocumentMapper docMapper = mapperService.documentMapperWithAutoCreate(source.type());
        ParsedDocument doc = docMapper.parse(source);
        return new Engine.Index(docMapper, docMapper.uidMapper().term(doc.uid()), doc).startTime(startTime);
    }

Finally, the engine's add or update method is called to operate on the underlying Lucene index; the document is written to Lucene's in-memory index (in the engine's innerIndex method).

    if (currentVersion == -1) {
        // document does not exist, we can optimize for create
        if (index.docs().size() > 1) {
            writer.addDocuments(index.docs(), index.analyzer());
        } else {
            writer.addDocument(index.docs().get(0), index.analyzer());
        }
    } else {
        if (index.docs().size() > 1) {
            writer.updateDocuments(index.uid(), index.docs(), index.analyzer());
        } else {
            writer.updateDocument(index.uid(), index.docs().get(0), index.analyzer());
        }
    }
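
The branch on the current version can be illustrated with a small in-memory model. In Lucene, `updateDocument` deletes the documents matching the given term and then adds the new one, so a plain add is cheaper when the document is known not to exist. Everything else in this sketch is an assumption made for illustration: the `EngineWriteSketch` class, the version map, and the numbering scheme (1 for a new document, incremented on each update).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the add-or-update decision; not the ES engine.
class EngineWriteSketch {
    static final long NOT_FOUND = -1;

    private final List<String[]> storedDocs = new ArrayList<>(); // each entry is {uid, source}
    private final Map<String, Long> versions = new HashMap<>();  // assumed version bookkeeping

    long index(String uid, String source) {
        long currentVersion = versions.getOrDefault(uid, NOT_FOUND);
        if (currentVersion == NOT_FOUND) {
            // Document does not exist: a plain add, no delete needed.
            addDocument(uid, source);
            versions.put(uid, 1L);
        } else {
            // Document exists: delete by uid, then add the new content.
            updateDocument(uid, source);
            versions.put(uid, currentVersion + 1);
        }
        return versions.get(uid);
    }

    private void addDocument(String uid, String source) {
        storedDocs.add(new String[] {uid, source});
    }

    private void updateDocument(String uid, String source) {
        storedDocs.removeIf(doc -> doc[0].equals(uid));
        addDocument(uid, source);
    }

    int docCount() {
        return storedDocs.size();
    }

    public static void main(String[] args) {
        EngineWriteSketch engine = new EngineWriteSketch();
        System.out.println(engine.index("type#1", "first"));
        System.out.println(engine.index("type#1", "second"));
        System.out.println(engine.docCount());
    }
}
```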

After the in-memory index is written, the operation is also written to the translog (an operation log that records index operations that have not yet been persisted) to prevent data loss if power fails before a flush.

Translog.Location translogLocation = translog.add(new Translog.Create(create));
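
Why the translog prevents data loss can be seen in a minimal write-ahead-log sketch. It assumes a simplified model in which the in-memory index is lost on a crash while flushed data and the log survive; `TranslogSketch`, `flush`, and `crashAndRecover` are hypothetical names, not the ES API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of an in-memory index backed by an operation log; names are hypothetical.
class TranslogSketch {
    static final class Operation {
        final String id;
        final String source;

        Operation(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    private Map<String, String> memoryIndex = new HashMap<>();        // lost on crash
    private final Map<String, String> flushedIndex = new HashMap<>(); // assumed durable
    private final List<Operation> translog = new ArrayList<>();       // assumed durable

    // Every write goes to the in-memory index and is also appended to the log.
    void index(String id, String source) {
        memoryIndex.put(id, source);
        translog.add(new Operation(id, source));
    }

    // A flush persists the in-memory data, so the logged operations are no longer needed.
    void flush() {
        flushedIndex.putAll(memoryIndex);
        memoryIndex.clear();
        translog.clear();
    }

    // After a crash the in-memory index is gone; replaying the log rebuilds it.
    void crashAndRecover() {
        memoryIndex = new HashMap<>();
        for (Operation op : translog) {
            memoryIndex.put(op.id, op.source);
        }
    }

    String get(String id) {
        String value = memoryIndex.get(id);
        return value != null ? value : flushedIndex.get(id);
    }

    public static void main(String[] args) {
        TranslogSketch shard = new TranslogSketch();
        shard.index("1", "flushed");
        shard.flush();
        shard.index("2", "not yet flushed");
        shard.crashAndRecover();
        System.out.println(shard.get("1") + " / " + shard.get("2"));
    }
}
```

Without the log, document "2" in the usage above would be lost, because it was written after the last flush.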

After the index request completes on the primary shard, it is sent to the replicas for indexing. Finally, a success response is returned to the client.
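
The ordering described here, primary first and replicas afterwards, can be sketched as follows. The sketch applies the operation to the replicas one after another before returning, which is an assumption made for simplicity; the article does not say whether the real replication is synchronous. `ReplicationSketch` and `ShardCopy` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of primary-then-replica indexing; not the ES transport code.
class ReplicationSketch {
    static final class ShardCopy {
        final String name;
        final Map<String, String> docs = new HashMap<>();

        ShardCopy(String name) {
            this.name = name;
        }

        void index(String id, String source) {
            docs.put(id, source);
        }
    }

    // Applies the operation to the primary, then to each replica,
    // and returns the names of the copies in the order they were written.
    static List<String> index(ShardCopy primary, List<ShardCopy> replicas, String id, String source) {
        List<String> acknowledged = new ArrayList<>();
        primary.index(id, source);
        acknowledged.add(primary.name);
        for (ShardCopy replica : replicas) {
            replica.index(id, source);
            acknowledged.add(replica.name);
        }
        return acknowledged;
    }

    public static void main(String[] args) {
        ShardCopy primary = new ShardCopy("primary");
        List<ShardCopy> replicas = Arrays.asList(new ShardCopy("replica-1"), new ShardCopy("replica-2"));
        System.out.println(index(primary, replicas, "1", "{\"title\":\"hello\"}"));
    }
}
```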

Address: http://blog.csdn.net/laigood12345/article/details/8450331

References: http://www.searchtech.pro/articles/2013/02/15/1360941961206.html

