MongoDB index concepts and usage

Source: Internet
Author: User

An index lets you quickly access specific information in a database table. It is a structure that sorts the values of one or more columns of a table, such as the name column of an employee table. If you search for a specific employee by name, the index helps you find the information faster than scanning every row in the table.


Advantages of indexes:

With an index, you do not need to perform a full table scan; you only need to scan the index. An index stores only a small part of the table's data, and that small part is enough to answer a query quickly, so only it has to be scanned. If that small part is loaded into memory, access is faster still.

· Greatly reduces the amount of data to be scanned by the server

· Indexing helps the server avoid sorting or using temporary tables

· The index can convert random I/O to sequential I/O.
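As a rough illustration of the first point, the sketch below compares a full scan with a lookup through a sorted index. The table contents and the names `employees`, `full_scan`, and `index_lookup` are made up for this example; a real database index is a disk-based structure, not a Python list.

```python
import bisect

# Hypothetical employee table: (name, department) rows in insertion order.
employees = [
    ("Wang", "Sales"),
    ("Adams", "Ops"),
    ("Lopez", "Dev"),
    ("Chen", "Dev"),
    ("Meyer", "Sales"),
]

def full_scan(rows, name):
    """Check every row in turn; returns (matching positions, rows examined)."""
    hits = [pos for pos, row in enumerate(rows) if row[0] == name]
    return hits, len(rows)

# The index stores only the name column plus each row's position, kept sorted.
name_index = sorted((row[0], pos) for pos, row in enumerate(employees))

def index_lookup(index, name):
    """Binary-search the sorted index; returns matching row positions."""
    start = bisect.bisect_left(index, (name,))
    hits = []
    while start < len(index) and index[start][0] == name:
        hits.append(index[start][1])
        start += 1
    return hits

print(full_scan(employees, "Lopez"))      # examines all 5 rows
print(index_lookup(name_index, "Lopez"))  # binary search over the small index
```

The full scan always touches every row, while the index lookup needs only a logarithmic number of comparisons on a much smaller structure.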

 

Disadvantages of indexing:

Indexes store a small part of the table's data, so that data requires additional storage. When data in the table is updated, the corresponding index data must be updated too. Searches are accelerated, but you still have to evaluate whether the slower writes are worth the faster searches. For example, if we create an index on age but most queries search by name, the index has no effect: an index only makes sense when it matches the search conditions actually used. Most searches are also not limited to a single field, which means an index often has to contain multiple fields. You can build a composite index for multiple conditions, and you need to consider how that index is constructed. Index design therefore takes considerable skill.
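The write cost and the "wrong field" problem can be sketched with a toy table. The `Table` class and its methods are hypothetical and exist only for this illustration; they are not a MongoDB or MySQL API.

```python
import bisect

class Table:
    """Toy table that keeps one sorted index per indexed field."""

    def __init__(self, indexed_fields):
        self.rows = []
        self.indexes = {field: [] for field in indexed_fields}
        self.index_writes = 0  # extra writes caused by index maintenance

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        # Every index on the table must be updated for every inserted row.
        for field, index in self.indexes.items():
            bisect.insort(index, (row[field], pos))
            self.index_writes += 1

    def can_use_index(self, field):
        """A query on `field` benefits only if an index exists for that field."""
        return field in self.indexes

table = Table(indexed_fields=["age"])
table.insert({"name": "User1", "age": 30})
table.insert({"name": "User2", "age": 25})

print(table.index_writes)            # one extra write per row per index
print(table.can_use_index("name"))   # the age index does not help name queries
```

Here the age index costs one extra write per insert, yet a query by name still cannot use it.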

An index is not automatically an advantage. If a table has many indexes, the impact on overall system performance may be very large. If a table has only a dozen rows, creating an index can actually slow things down, because a full table scan does not take long anyway.

However, if the table is very large, an index is very useful. If the data volume is too large, though, an index alone may no longer be enough. For example, if a table holds terabytes of data, consider how large its indexes would be. In that case you can only split the large table into smaller ones and distribute them across different physical nodes. In MySQL this is called partitioning; in MongoDB it is called sharding.

 

Index level:

An index can earn up to three stars:

1 star: the index places relevant records next to each other, greatly reducing I/O

2 stars: the order of the data in the index matches the order required by the search (as long as the index is well designed)

3 stars: the index contains all the data required by the query (a covering index)
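The three stars can be seen together in a small sketch. The rows and the function names below are assumptions made for the example; the composite index is modeled as a sorted list of tuples.

```python
import bisect

# Hypothetical rows: (name, age, city).
rows = [
    ("User3", 41, "Lyon"),
    ("User1", 35, "Oslo"),
    ("User2", 35, "Rome"),
]

# Composite index on (age, name); each entry also carries the row position.
age_name_index = sorted(
    (age, name, pos) for pos, (name, age, _city) in enumerate(rows)
)

def names_with_age(index, age):
    """Answered from the index alone: entries are adjacent, sorted, and covering."""
    start = bisect.bisect_left(index, (age,))
    names = []
    while start < len(index) and index[start][0] == age:
        names.append(index[start][1])
        start += 1
    return names

def cities_with_age(index, table, age):
    """'city' is not in the index, so every hit needs an extra row fetch."""
    start = bisect.bisect_left(index, (age,))
    cities = []
    while start < len(index) and index[start][0] == age:
        cities.append(table[index[start][2]][2])
        start += 1
    return cities

print(names_with_age(age_name_index, 35))
print(cities_with_age(age_name_index, rows, 35))
```

The first query never touches `rows`, which is what the third star describes; the second query has to go back to the table for each match.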

 

Index category:

· Ordered index
· Hash index

In a hash index, each index key is mapped to a hash bucket. The mapping is performed by a hash function.

 

Index Evaluation Criteria:

1. Access type (a hash index is better for equality lookups; an ordered index is better for range searches)

2. Access time (the time needed to complete one access may differ depending on the index type)

3. Insertion time (when the table is updated, maintaining the index can be costly. With a hash index, the hash function is simply executed again for the new value. With an ordered index, existing index entries may have to be moved to make room for the new one)

4. Deletion duration

5. Space overhead
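The first criterion, access type, can be sketched by answering the same queries with both kinds of index. The column values and function names are invented for the example, with a `dict` standing in for a hash index and a sorted list for an ordered index.

```python
import bisect

ages = [31, 25, 47, 25, 52, 38]  # hypothetical column values; position = row id

# Hash index: value -> list of row ids.
hash_index = {}
for row_id, age in enumerate(ages):
    hash_index.setdefault(age, []).append(row_id)

# Ordered index: sorted (value, row id) pairs.
ordered_index = sorted((age, row_id) for row_id, age in enumerate(ages))

def equal_hash(age):
    """Equality lookup: a single probe into the hash index."""
    return hash_index.get(age, [])

def range_ordered(low, high):
    """Row ids with low <= age <= high, found with two binary searches."""
    start = bisect.bisect_left(ordered_index, (low,))
    stop = bisect.bisect_right(ordered_index, (high, len(ages)))
    return [row_id for _age, row_id in ordered_index[start:stop]]

def range_hash(low, high):
    """A hash index has no order, so a range query must look at every key."""
    return sorted(
        row_id
        for age, ids in hash_index.items()
        if low <= age <= high
        for row_id in ids
    )

print(equal_hash(25))
print(range_ordered(25, 38))
print(range_hash(25, 38))
```

Both range functions return the same rows, but the ordered index reads one contiguous slice while the hash index has to inspect all of its keys.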

 

Index type:

· Ordered index: the most common index type. If the records of the indexed file are stored in search key order, the file is called an index-sequential file; otherwise it is a heap file. A file organized by a clustered index is an index-sequential file.

· Clustered index: if the records in the file are sorted by the corresponding search key, the index is a clustered index, also called the primary index.

· Non-clustered index: the order specified by the search key differs from the order of the records in the file

 

Indexes can also be classified by whether there is an index entry for every record:

· Dense index (every search key value has a corresponding index entry)

· Sparse index (not every record has an index entry)

· Multi-level index (indexes point to other indexes, and so on, with the final level pointing to the data)

Indexes other than the primary index are called secondary indexes. Only the primary index can be sparse; all secondary indexes must be dense.
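The difference between dense and sparse indexes, and the reason a sparse index needs the file to be sorted by the same key, can be sketched as follows. The block size, keys, and names are assumptions chosen for the example.

```python
import bisect

# A file sorted by key, split into blocks of 3 records (hypothetical sizes).
records = [(k, f"row-{k}") for k in (2, 5, 8, 11, 14, 17, 20, 23)]
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per key, pointing at (block number, offset in block).
dense_index = {
    key: (b, i)
    for b, block in enumerate(blocks)
    for i, (key, _value) in enumerate(block)
}

# Sparse index: one entry per block, holding only that block's first key.
sparse_keys = [block[0][0] for block in blocks]

def sparse_lookup(key):
    """Find the last block whose first key <= key, then scan inside that block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(len(dense_index), len(sparse_keys))
print(sparse_lookup(14))
```

The sparse lookup only works because the records are physically ordered by the key, which is why a sparse index is limited to the primary (clustered) index.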

· B+ tree index:

· A balanced tree index

· Every leaf node is the same distance from the root, which is why it is called a balanced tree.

· Levels are created dynamically as the data volume grows

· A B+ tree is an ordered index
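How the number of levels grows with the data volume can be estimated with a small calculation. This is a simplified model that assumes every node holds exactly `fanout` entries; `levels_needed` is a hypothetical helper, and real B+ trees keep nodes only partly full.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels so that fanout ** levels >= num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

for n in (100, 10_000, 1_000_000):
    print(n, levels_needed(n, fanout=100))
```

Under this model the tree gains one level each time the data grows by a factor of `fanout`, so even large tables stay only a few levels deep.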

 

 

Hash index:

The hash function computes the location of the data directly, so the data can be loaded in as few as two I/O operations.

I/O accounts for the largest share of access time. A hash index is faster for exact matches because it needs far fewer I/O operations: the hash function lets us avoid traversing an index structure.

The disadvantage of a hash index is that it can become skewed. Over time, some hash buckets fill up while others stay empty, so the load on each node becomes uneven. A hash function that is not random enough causes this skew.

So the hash function needs to do the following:

· Distribute keys randomly

· Distribute keys evenly

Applicable scenarios of hash indexes: exact value matching, that is, equality comparisons: =, IN (), <=>
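Skew is easy to reproduce with a poor hash function. In the sketch below, the key set, the bucket count, and both hash functions are assumptions chosen for the demonstration; SHA-256 simply stands in for "a well-mixing hash function".

```python
import hashlib
from collections import Counter

def bucket_sizes(keys, num_buckets, hash_fn):
    """Count how many keys land in each bucket."""
    counts = Counter(hash_fn(key) % num_buckets for key in keys)
    return [counts[b] for b in range(num_buckets)]

keys = list(range(0, 400, 4))  # 100 hypothetical keys, all multiples of 4

def identity_hash(key):
    """A poor hash: the key itself, which preserves any pattern in the keys."""
    return key

def mixing_hash(key):
    """A well-mixing hash built from the first 4 bytes of a SHA-256 digest."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big")

skewed = bucket_sizes(keys, 4, identity_hash)
spread = bucket_sizes(keys, 4, mixing_hash)

print(skewed)  # every key falls into bucket 0
print(spread)  # keys are spread over the buckets
```

With the identity function, every multiple of 4 maps to bucket 0 and the other three buckets stay empty, which is the kind of skew described above.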

 

Full Text Index:

By default, an ordered index can only index a limited number of leading bytes of a field. If a field named test stores a large amount of text, it is impossible to store all of that data in the index; only the leading bytes are extracted. The search condition must therefore be a leftmost prefix and cannot match text anywhere inside the field. To match keywords anywhere in the text, you have to use a full-text index. (In MySQL versions before 5.6, only the MyISAM engine supports full-text indexes; with InnoDB you can use an external indexing tool such as Sphinx.)

If you must implement full-text indexing, Sphinx is a good choice.
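The idea behind a full-text index can be sketched as an inverted index that maps each word to the documents containing it. This is only a minimal illustration with made-up documents; it does not reflect how Sphinx or MySQL implement full-text search internally.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
docs = {
    1: "MongoDB index concepts",
    2: "hash index and ordered index",
    3: "full text search with an index",
}

# Inverted index: word -> set of ids of the documents that contain it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[word].add(doc_id)

def search(word):
    """Return the ids of documents containing the word anywhere in the text."""
    return sorted(inverted.get(word.lower(), set()))

def prefix_only_match(text, query):
    """What a leftmost-prefix index can check: does the text start with query?"""
    return text.lower().startswith(query.lower())

print(search("ordered"))                       # found inside document 2
print(prefix_only_match(docs[2], "ordered"))   # a prefix match misses it
```

The keyword "ordered" sits in the middle of document 2, so a leftmost-prefix comparison cannot find it, while the inverted index can.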

 

Spatial Index:

The data in a spatial index cannot be searched with ordinary comparisons. You must use the spatial index functions to obtain the corresponding search results.

 

Index features:

· Full value matching: matches the complete value, for example Name = "User12"

· Leftmost prefix matching: matches the beginning of the value

Valid: Name LIKE "User1%"

Invalid: Name LIKE "%User1%"

 

· Column prefix matching: same as the leftmost prefix (valid: Name LIKE "User1%"; invalid: Name LIKE "%User1%"). If a composite index is created on two fields (Name, Age), it can only be used starting from the leftmost field, so a condition on Age > 80 alone cannot use the index. For such a query, an index created in the reverse order (Age, Name) is very useful.

· Range value matching: exactly matches one column and matches another column by range, for example Name = "User12" and Age > 80.
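The leftmost-field rule for composite indexes can be sketched with sorted tuples. The data and function names are invented for the example, and ages are assumed to be integers so that "Age > n" can be written as "Age >= n + 1".

```python
import bisect

# Hypothetical (name, age) pairs.
people = [("User12", 85), ("User12", 40), ("User7", 90), ("User3", 81)]

# Composite index on (Name, Age): sorted by name first, then by age.
name_age_index = sorted(people)

def name_and_age_over(name, min_age):
    """Name = name AND Age > min_age: one contiguous slice of the index."""
    start = bisect.bisect_left(name_age_index, (name, min_age + 1))
    stop = bisect.bisect_right(name_age_index, (name, float("inf")))
    return name_age_index[start:stop]

def age_over_scan(min_age):
    """Age > min_age alone: matches are scattered, so every entry is checked."""
    return [entry for entry in name_age_index if entry[1] > min_age]

# The reverse order (Age, Name) turns the Age condition into a contiguous slice.
age_name_index = sorted((age, name) for name, age in people)

def age_over(min_age):
    start = bisect.bisect_left(age_name_index, (min_age + 1,))
    return age_name_index[start:]

print(name_and_age_over("User12", 80))
print(age_over_scan(80))
print(age_over(80))
```

In the (Name, Age) index, the rows with Age > 80 are scattered among the names, so the index gives no shortcut; in the (Age, Name) index they form the tail of the list.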

 

Index-only queries:

Assume the ordered index has three levels. To find a row without a covering index, several I/O operations are required. First the root index block is located, then the next-level index block. If that block is on disk, it has to be loaded, which is one I/O. The level below it costs another I/O, and loading the row data from disk is one more. If the root index is not already in memory, at least four I/O operations are needed to find the data. With a covering index, the query is answered from the index itself, so the final I/O for the row data is avoided.
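The I/O count described here can be written down as a simple model. It assumes one block read per index level and one read for the row, which is a simplification; `lookup_io` is a hypothetical helper, not a database function.

```python
def lookup_io(index_levels, root_cached=False, covering=False):
    """Count block reads for one lookup, assuming one read per index level."""
    index_reads = index_levels - (1 if root_cached else 0)
    row_reads = 0 if covering else 1  # a covering index skips the row fetch
    return index_reads + row_reads

print(lookup_io(3))                                   # nothing cached
print(lookup_io(3, root_cached=True))                 # root already in memory
print(lookup_io(3, root_cached=True, covering=True))  # index-only query
```

Under these assumptions, a three-level index costs four reads when nothing is cached, and caching the root or covering the query each remove one read.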

Primary keys and unique keys are both ordered indexes. The difference is that a primary key cannot contain duplicate values and cannot be null, while a unique key also rejects duplicate values but can be null.
