An index lets you quickly access specific information in a database table. It is a structure that sorts the values of one or more columns in the table, such as the name column of an employee table. If you want to find a specific employee by name, the index gets you to the information faster than searching every row in the table.
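To make the difference concrete, here is a minimal sketch in Python. The employee rows, the `full_scan` and `index_lookup` helpers, and the use of a sorted list as the "index" are all assumptions made for illustration; a real database keeps its index in an on-disk structure, not a Python list.

```python
import bisect

# Hypothetical employee table: (name, department) rows in arbitrary order.
employees = [
    ("Mallory", "Ops"),
    ("Alice", "Engineering"),
    ("Trent", "Sales"),
    ("Bob", "Engineering"),
    ("Eve", "Finance"),
]

def full_scan(rows, name):
    """Check rows one by one; returns (row, number of rows examined)."""
    for examined, row in enumerate(rows, start=1):
        if row[0] == name:
            return row, examined
    return None, len(rows)

# The "index": name values in sorted order, each paired with its row position.
name_index = sorted((row[0], pos) for pos, row in enumerate(employees))
index_keys = [key for key, _ in name_index]

def index_lookup(rows, name):
    """Binary-search the sorted keys, then fetch the row by its position."""
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return rows[name_index[i][1]]
    return None

print(full_scan(employees, "Eve"))     # examines every row before finding Eve
print(index_lookup(employees, "Eve"))  # a few comparisons in the sorted keys
```

With five rows the difference is negligible, but the scan grows linearly with the table while the binary search grows only logarithmically.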
Advantages of indexes:
You do not need to perform a full table scan; you only need to scan the index. An index stores only a small part of the table's data, and that small part is enough to answer the query quickly, so only it has to be scanned. If that part is loaded into memory, the lookup is faster still.
· Indexes greatly reduce the amount of data the server has to scan
· Indexes help the server avoid sorting or using temporary tables
· Indexes can turn random I/O into sequential I/O
Disadvantages of indexes:
An index stores a small part of the table's data, so it needs extra storage. Whenever data in the table is updated, the corresponding index data must be updated too. Searches get faster, but you still have to evaluate whether the slower writes are worth the faster searches. For example, if we create an index on age but most queries search by name, the index has no effect. An index only makes sense when it matches the search conditions. Keep in mind as well that most searches are not limited to a single field, which means an index may need to contain several fields; for queries with multiple conditions you can create a composite index. Index design therefore takes real skill.
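The age-versus-name case can be checked with a small experiment. The sketch below uses SQLite from Python's standard library as a stand-in (it is not MongoDB); the `people` table, the index names, and the `plan` helper are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, age, city) VALUES (?, ?, ?)",
    [("user%d" % i, 20 + i % 50, "city%d" % (i % 10)) for i in range(1000)],
)

def plan(sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan detail

# An index on age alone does not help a search by name.
conn.execute("CREATE INDEX idx_age ON people (age)")
by_name_plan = plan("SELECT * FROM people WHERE name = ?", ("user7",))

# A composite index matches a query that filters on both fields.
conn.execute("CREATE INDEX idx_name_age ON people (name, age)")
composite_plan = plan(
    "SELECT * FROM people WHERE name = ? AND age = ?", ("user7", 27)
)

print(by_name_plan)    # a table scan; idx_age is not mentioned
print(composite_plan)  # a search that uses idx_name_age
```

The first plan shows that the age index is paid for on every write yet never used by the name query, which is exactly the trade-off to evaluate before creating an index.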
An index is not automatically an advantage. If a table has many indexes, the impact on overall system performance can be very large. If a table has only a dozen rows, creating an index slows things down, because a full table scan does not take long anyway.
If the table is very large, on the other hand, an index is very useful. But if the data volume is too large, even an index may stop being meaningful. For example, if a table holds terabytes of data, imagine how large its indexes would have to be. In that case the only option is to cut the large table into smaller tables and distribute them across different physical nodes. In MySQL this is called partitioning; in MongoDB it is called sharding.
Index rating:
The three-star system
1 star: the index places related records next to each other, greatly reducing I/O
2 stars: the data in the index is stored in the same order the query needs (as long as the index is well designed)
3 stars: the index contains all the data the query requires (a covering index)
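The third star, the covering index, can be observed directly in a query plan. This is a minimal sketch using SQLite from Python's standard library as a stand-in; the `employee` table, the `idx_name_dept` index, and the `plan` helper are assumptions for illustration, and other databases report covering indexes differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_name_dept ON employee (name, dept)")

def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Every selected column lives in the index, so the table is never touched.
covered = plan("SELECT name, dept FROM employee WHERE name = 'Alice'")

# salary is not in the index, so each match needs an extra lookup in the table.
not_covered = plan("SELECT name, salary FROM employee WHERE name = 'Alice'")

print(covered)
print(not_covered)
```

SQLite labels the first plan with the words COVERING INDEX, while the second uses the same index without that label.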
Index category:
· Ordered index
· Hash index
Index entries are mapped to hash buckets; the mapping is performed by a hash function.
Index Evaluation Criteria:
1. Access type (a hash index is better for equality lookups; an ordered index is better for range searches)
2. Access time (the time needed to complete one access may differ depending on the index type)
3. Insertion time (when the table is updated, maintaining the index can itself be costly. With a hash index, the hash function only has to be executed again; with an ordered index, the entries after the insertion point may have to be moved)
4. Deletion duration
5. Space overhead
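The access-type criterion can be sketched in a few lines of Python. Here a dict stands in for a hash index and a sorted list for an ordered index; the `ages` column and both lookup helpers are hypothetical names chosen for this example.

```python
import bisect

ages = [34, 21, 45, 29, 21, 52, 38]  # hypothetical age column; position = row id

# Hash-style index: key -> list of row ids. Good for equality, useless for ranges.
hash_index = {}
for row_id, age in enumerate(ages):
    hash_index.setdefault(age, []).append(row_id)

# Ordered index: (key, row id) pairs kept sorted by key.
ordered_index = sorted((age, row_id) for row_id, age in enumerate(ages))

def equal_lookup(age):
    """One hash probe answers an equality search."""
    return hash_index.get(age, [])

def range_lookup(low, high):
    """Row ids with low <= age <= high, found by two binary searches."""
    start = bisect.bisect_left(ordered_index, (low, -1))
    end = bisect.bisect_right(ordered_index, (high, len(ages)))
    return [row_id for _, row_id in ordered_index[start:end]]

print(equal_lookup(21))      # [1, 4]
print(range_lookup(25, 40))  # [3, 0, 6]
```

Answering the range query with the hash index alone would mean probing every possible key or scanning all buckets, which is why ordered indexes are preferred for ranges.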
Index type:
· Ordered index: the most common index type. If the records of the indexed file are stored in order, the file is an index-sequential file (this is how a file with a clustered index is stored); otherwise it is a heap file.
· Clustered index: if the records in a file are ordered by the index's search key, the index is a clustered index, also called the primary index.
· Non-clustered index: the order specified by the search key differs from the order of the records in the file
Indexes classified by whether every record has a corresponding index entry:
· Dense index (every search-key value has a corresponding index entry)
· Sparse index (only some search-key values have an index entry)
· Multi-level index (an index points to another index, and so on, until the last level points to the data)
Indexes other than the primary index are called secondary indexes. Only the primary index can be sparse; all secondary indexes must be dense.
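The difference between dense and sparse indexes, and the reason a sparse index only works on a file sorted by the search key, can be illustrated with a small sketch. The records, the block size of 3, and the helper names are assumptions made for this example.

```python
import bisect

# Hypothetical data file: records sorted by search key, stored in blocks of 3.
records = [(k, "row-%d" % k) for k in (3, 8, 12, 17, 21, 25, 30, 34, 40)]
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search-key value -> (block number, offset in block).
dense = {
    key: (b, o)
    for b, block in enumerate(blocks)
    for o, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding only the first key of that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    """Jump straight to the record through its own index entry."""
    location = dense.get(key)
    if location is None:
        return None
    b, o = location
    return blocks[b][o][1]

def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(len(dense), len(sparse_keys))          # 9 entries versus 3 entries
print(dense_lookup(21), sparse_lookup(21))   # both find row-21
```

The sparse lookup relies on the records being stored in key order; on an unordered file there would be no way to know which block to scan, which is why secondary indexes have to be dense.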
B+ tree index:
· A balanced tree index
· Every leaf node is the same distance from the root, which is why it is called a balanced tree
· Levels are created dynamically as the data volume grows
· A B+ tree is an ordered index
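How the number of levels grows with the data volume can be estimated with a rough calculation. The function below is only a back-of-the-envelope sketch: `btree_levels` is a hypothetical helper, and it assumes every node is completely full with the same fanout, which real B+ trees do not guarantee.

```python
def btree_levels(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

# With an assumed fanout of 100 entries per node:
for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, btree_levels(n, 100))
```

Under these assumptions a thousand keys need 2 levels, a million need 3, and a billion need 5, so the tree stays shallow even as the table grows by orders of magnitude.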
Hash index:
The hash function turns the key directly into a pointer, so the data can be loaded with about two I/O operations.
I/O takes up the largest share of the time, and a hash index is faster for exact matches because it needs far fewer I/O operations: it lets us avoid walking through an index structure.
The disadvantage of the hash index: it can become skewed. Over time some hash buckets fill up while others stay empty, so the load on each node may be uneven. If the hash function is not random enough, it will cause skew.
So the hash function needs to produce a distribution that is:
· Random
· Uniform
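Skew is easy to reproduce. The sketch below compares two hypothetical hash functions over the same made-up keys: `poor_hash` looks only at the last character, while `better_hash` uses a SHA-256 digest of the whole key. The bucket count and the key pattern are assumptions chosen to make the effect visible.

```python
from collections import Counter
import hashlib

NUM_BUCKETS = 8
# Hypothetical keys whose ids are multiples of 8, so they end in few distinct digits.
keys = ["user%04d" % (i * 8) for i in range(1000)]

def poor_hash(key):
    """Uses only the last character, so many keys land in the same bucket."""
    return ord(key[-1]) % NUM_BUCKETS

def better_hash(key):
    """Spreads keys using a SHA-256 digest of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def bucket_sizes(hash_fn):
    """Number of keys that fall into each bucket, in bucket order."""
    counts = Counter(hash_fn(k) for k in keys)
    return [counts.get(b, 0) for b in range(NUM_BUCKETS)]

poor = bucket_sizes(poor_hash)
better = bucket_sizes(better_hash)
print(poor)    # half of the buckets stay empty
print(better)  # sizes close to 1000 / 8 each
```

With the poor function, four buckets hold all 1000 keys and the other four are empty; the digest-based function fills every bucket to a similar level.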
Applicable scenarios for hash indexes: exact-value matching, that is, equality comparisons: =, IN (), <=>
Full Text Index: