Splunk indexing process


Terminology:

Event: events are records of activity in log files, stored in Splunk indexes. Simply put, each record (a row) produced by processing a log is an event;
Source Type: identifies the format of the data; simply stated, a particular log format can be defined as a source type. Splunk by default provides more than 500 source types for identifying data formats, including Apache logs, logs of common operating systems, logs of network devices such as Cisco, and so on;
Index: the index is the repository for Splunk Enterprise data. Splunk transforms incoming data into events, which it stores in indexes. The term has two layers of meaning: it names the physical storage of the data, and it also describes a processing action: Splunk indexes your data, and this process produces two types of data:
The raw data in compressed form (rawdata)
Indexes that point to the raw data, plus some metadata files (index files)
Indexer: an indexer is a Splunk Enterprise instance that indexes data. Besides the general concept of indexing, "Indexer" is also the name of this particular Splunk role: a Splunk Enterprise instance that performs the indexing;
Bucket: an index stores the two types of data above in directories organized by the age of the data; these directories are called buckets;
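To make these terms concrete, here is a hypothetical example (all values invented, not taken from this article): one line of an Apache access log becomes one event, the log's format is identified by a source type, and the event is stored in an index:

    A raw Apache log line (one event after processing):
        127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326

    Default fields attached to the event:
        host       = web01
        source     = /var/log/apache2/access.log
        sourcetype = access_combined
        index      = web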

Roles and responsibilities:

Search Head: the front end for searching;
Deployment Server: equivalent to a configuration management center; it provides unified management of the other nodes;

Forwarder: responsible for collecting, preprocessing, and forwarding data to the indexers (which consume and index it), a mechanism similar to the agent-and-collector pairing in Flume; its capabilities include:
· Tagging of metadata (source, sourcetype, and host)
· Configurable buffering
· Data compression
· SSL Security
· Use of any available network ports
· Running scripted inputs locally
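As a rough sketch of how these capabilities map to forwarder configuration (the host names, paths, and certificate locations below are placeholders, not taken from this article), metadata tagging is done in inputs.conf, while compression and SSL are enabled in outputs.conf:

    # inputs.conf on the forwarder: tag metadata (source, sourcetype, host)
    [monitor:///var/log/apache2/access.log]
    sourcetype = access_combined
    index = web
    host = web01

    # outputs.conf on the forwarder: compress and encrypt the stream to the indexers
    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997
    compressed = true
    sslCertPath = $SPLUNK_HOME/etc/auth/client.pem
    sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem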

Note: Forwarders can transmit three types of data: raw, unparsed, and parsed. The type of data a forwarder can send depends on the type of forwarder and how it is configured. Universal forwarders and light forwarders can send raw or unparsed data. A heavy forwarder can send raw or parsed data.

Indexer: responsible for "indexing" the data, that is, the indexing process, also known as event processing, which includes:
· Separating the datastream into individual, searchable events. (event breaking)
· Creating or identifying timestamps. (timestamp recognition)
· Extracting fields such as host, source, and sourcetype. (default field extraction)
· Performing user-defined actions on the incoming data, such as identifying custom fields, masking sensitive data, writing new or modified keys, applying breaking rules for multi-line events, filtering unwanted events, and routing events to specified indexes or servers. (See the configuration sketch after this list.)
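For instance, masking sensitive data and filtering unwanted events are typically configured through props.conf and transforms.conf on the indexer (or on a heavy forwarder); the stanza names and regular expressions below are illustrative assumptions:

    # props.conf: mask card-like numbers and attach a filtering transform
    [access_combined]
    SEDCMD-mask-cards = s/\d{4}-\d{4}-\d{4}-\d{4}/xxxx-xxxx-xxxx-xxxx/g
    TRANSFORMS-drop-debug = drop_debug_events

    # transforms.conf: send DEBUG events to the null queue (discard them)
    [drop_debug_events]
    REGEX = \bDEBUG\b
    DEST_KEY = queue
    FORMAT = nullQueue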

Parts of an indexer cluster--distributed deployment

An indexer cluster is a group of Splunk Enterprise instances, or nodes, that, working in concert, provide redundant indexing and searching capability. Each cluster has three types of nodes:

    • A single master node to manage the cluster.
    • Several to many peer nodes to index and maintain multiple copies of the data and to search.
    • One or more search heads to coordinate searches across the set of peer nodes.

The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of the peer nodes and orchestrates remedial activities if a peer goes down.

The peer nodes receive and index incoming data, just like non-clustered, stand-alone indexers. Unlike stand-alone indexers, however, peer nodes also replicate data from other nodes in the cluster. A peer node can index its own incoming data and simultaneously store copies of data from the other nodes. You must have at least as many peer nodes as the replication factor. That is, to support a replication factor of 3, you need a minimum of three peer nodes.

The search head runs searches across the set of peer nodes. You must use a search head to manage searches across indexer clusters. It sends the search request to the peer nodes and merges the results.

For most purposes, it is recommended that you use forwarders to get data into the cluster.
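A minimal sketch of how the three node types are declared in server.conf (the host names, ports, and shared key below are placeholders):

    # server.conf on the master node
    [clustering]
    mode = master
    replication_factor = 3
    search_factor = 2
    pass4SymmKey = changeme

    # server.conf on each peer node
    [replication_port://9887]

    [clustering]
    mode = slave
    master_uri = https://master.example.com:8089
    pass4SymmKey = changeme

    # server.conf on the search head
    [clustering]
    mode = searchhead
    master_uri = https://master.example.com:8089
    pass4SymmKey = changeme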

Here is a diagram of a basic, single-site indexer cluster, containing three peer nodes and supporting a replication factor of 3:

This diagram shows a simple deployment, similar to a small-scale non-clustered deployment, with some forwarders sending load-balanced data to a group of indexers (peer nodes), and the indexers sending search results to a search head. There are a few additions that you don't find in a non-clustered deployment:

    • The indexers stream copies of their data to other indexers.
    • The master node, while it doesn't participate in any data streaming, coordinates a range of activities involving the search peers and the search head.
How Indexing Works

Splunk Enterprise can index any type of time-series data (data with timestamps). When Splunk Enterprise indexes data, it breaks it into events, based on the timestamps.

Event Processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk Enterprise enters through the parsing pipeline as large chunks. During parsing, Splunk Enterprise breaks these chunks into events, which it hands off to the indexing pipeline, where final processing occurs.

While parsing, Splunk Enterprise performs a number of actions, including:

    • Extracting a set of default fields for each event, including host, source, and sourcetype.
    • Configuring character set encoding.
    • Identifying line termination using linebreaking rules. While many events are short and take up only a line or two, others can be long.
    • Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.
    • Splunk can be set to mask sensitive event data (such as credit card or Social Security numbers) at this stage. It can also be configured to apply custom metadata to incoming events. (A props.conf sketch covering these settings follows this list.)
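Most of these parsing actions correspond to per-source-type settings in props.conf. The sketch below assumes an Apache-style log whose timestamp looks like [10/Oct/2023:13:55:36 +0000]; the stanza name and values are illustrative, not from this article:

    [access_combined]
    # character set encoding
    CHARSET = UTF-8
    # line termination: treat each line as one event
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    # timestamp recognition: the timestamp follows the first "["
    TIME_PREFIX = \[
    TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
    MAX_TIMESTAMP_LOOKAHEAD = 30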

In the indexing pipeline, Splunk Enterprise performs additional processing, including:

    • Breaking all events into segments that can then be searched upon. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression.
    • Building the index data structures.
    • Writing the raw data and index files to disk, where post-indexing compression occurs.
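To make the last step concrete, this is roughly what lands on disk inside a single hot bucket (the index and bucket names are examples; the exact file set varies by Splunk version):

    $SPLUNK_HOME/var/lib/splunk/defaultdb/db/hot_v1_0/
        rawdata/journal.gz                            <- compressed raw data
        *.tsidx                                       <- index files pointing into the raw data
        Hosts.data, Sources.data, SourceTypes.data    <- metadata files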

The breakdown between the parsing and indexing pipelines is of relevance mainly when deploying forwarders. Heavy forwarders can parse data and then forward the parsed data on to indexers for final indexing. Some source types, those that reference structured data, require configuration on the forwarder prior to indexing. See "Extract data from files with headers".

For more information on events and what happens to them during the indexing process, see the chapter "Configure event processing" in the Getting Data In manual.

Note: Indexing is an I/O-intensive process.

This diagram shows the main processes inherent in indexing:

Note: This diagram represents a simplified view of the indexing architecture. It provides a functional view of the architecture and does not fully describe Splunk Enterprise internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. The distinction can matter during troubleshooting, but it does not generally affect how you configure or deploy Splunk Enterprise.

How indexer acknowledgment works

In brief, indexer acknowledgment works like this: the forwarder sends data continuously to the receiving peer, in blocks of approximately 64 kB. The forwarder maintains a copy of each block in memory until it gets an acknowledgment from the peer. While waiting, it continues to send more data blocks.

If all goes well, the receiving peer:

1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.

2. streams copies of the raw data to each of its target peers.

3. sends an acknowledgment back to the forwarder.

The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.

If the forwarder does not receive the acknowledgment, that means there was a failure along the way: either the receiving peer went down, or that peer was unable to contact its set of target peers. The forwarder then automatically resends the block of data. If the forwarder is using load balancing, it sends the block to another receiving node in the load-balanced group. If the forwarder is not set up for load balancing, it attempts to resend data to the same node as before.

Important: To ensure end-to-end data fidelity, you must explicitly enable indexer acknowledgment for each forwarder that's sending data to the cluster, as described earlier in this topic. If end-to-end data fidelity is not a requirement for your deployment, you can skip this step.
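Enabling acknowledgment is a one-line addition in the forwarder's outputs.conf (the group name and server addresses below are placeholders):

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997
    # ask the receiving peer to acknowledge each block before it is released from memory
    useACK = true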

For more information about how indexer acknowledgment works, read "Protect against loss of in-flight data" in the Forwarding Data manual.
