Spatial index
Index we have used, it is a special storage structure, like the library of books in the classification of storage strategy or Modern Library book query system, can help us quickly find the book we need. In the database, the storage of the index is usually implemented by using B-tree or + + trees, and the binary method is used to locate the data position quickly.
Normal indexes are drawbacks for one-dimensional data (key->data), but there is little you can do to face spatial data (Lon,lat-a) If you are looking at a point near (116.27636, 40.041285):
We create a normal index on the LON or LAT column, assuming that it is a LON column, and then, after finding the data in the same longitude through the Lon column, we also filter out data that is too large in latitude.
If you create a multicolumn index on Lon,lat, it is fast to query for data that is similar to the same longitude and latitude, but the nearby points are not just the same longitude.
In this way, the spatial index will be used. Spatial index through the data structure such as Quadtree, R-tree, and the Geohash algorithm, the two-dimensional data is transformed into one-dimension using ordinary B-tree index, which can realize the fast searching in the space scope.
However, today's theme is not here, our primary goal is to solve the problem, the implementation of these spatial indexes to write a special article to achieve. This article is to say that the existing database of spatial index support, I hope to help the same as my GIS small white technology selection.
Set up to switch the POI data storage database, took a week to install the configuration of various databases to test the efficiency of the spatial index, testing the Redis, Mongo, PostgreSQL, Mysql, the several well-known support spatial index database, technology selection basically completed, But in the middle of the pit and configuration experience can not be lost, the details are as follows:
About Redis
Redis, a powerful, highly efficient cache database (perhaps not just a cache database), has become an integral component of relational storage-based databases in every project. First of all, it is because of its efficiency is guaranteed, and the project is almost necessary, operation and maintenance cost is very low. The Redis spatial index uses the Geohash principle, with the collection storage, the query efficiency is close to log (N).
Redis 3.0 and above support spatial index, new projects do not have to consider these, and the general old project may need to upgrade Redis, in addition, PHP may also upgrade the extension of Redis to support the spatial indexing function of Redis.
Use
Redis installation configuration no longer mentioned here, here is a brief introduction to Redis's GEO-series functions.
GEOADD key longitude latitude member [longitude latitude member ...]
Geoadd add elements to the collection, you can add more than one element at a time, the corresponding PHP function prototype is: geoadd($key, $lon, $lat, $member)
;
GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count]
Georadius the elements within a circle that are centered at the target point and radius in the collection. Its PHP function prototype is georadius($key, $lon, $lat, $radius, $unit, $options)
; its $options is similar toarray(‘count‘ => $count, ‘WITHDIST‘ ...);
Conclusion
Redis is really efficient and easy to use, but there is an insurmountable problem, that is, you can't implement a multi-criteria query. Redis is impeccable just by querying nearby points, but what if the demand is to inquire about nearby restaurants? Or what about ' Wanda ' near the demand query?
It is not possible to implement:
Storing the details of each location in a relational database, the member in Redis stores the primary key ID of each location in the relational database, queries the ID of the location, and then takes the details of the location to filter.
Additional network overhead and IO overhead are caused by unnecessary library access.
In a certain rule stitching member value, for example $memeber = $name.‘,‘.$category;
, after the query to the location after the resolution member filter.
Compared to the above method, the network cost is saved, but not flexible enough, if added to the ' city ' limit, then the entire library of data will be cleaned.
Reference: Redis command reference? Geo (geographical location)
MongoDB Introduction
MongoDB is a well-established database that supports spatial indexes, and as a document database, it works fine when storing logs or static data. It provides two types of spatial indexes:
- The 2d index supports the index of the platform's normal coordinates before the 2.4 release; we no longer consider that there is a greater error in efficiency when storing and computing on a large scale.
- 2dsphere indexes support queries perform geometric calculations on a globe-like sphere, storing data in the form of Geojson objects or normal coordinate pairs.
Both the 2d index and the 2dsphere index are implemented with the B + tree using the Geohash algorithm.
Use
Mongo the way to create a spatial index is simple: db.collection.createIndex( { field : "2dsphere" } );
.
Query statements are similar (the following are the locations within 3000 meters from the target point):
db.poi.find( { loc : { $near : { $geometry : { type : "Point" , coordinates : [ 113.965355, 23.782865] } , $maxDistance : 3000 } } } )
The use of Mongo should be noted as follows:
Mongo's PHP extension has been updated, the old extension has been discarded, operations to use MONGODB\DRIVER\XXX and other classes to do, the specific method or the official document is relatively clear.
The Mongo 2dsphere index requires indexed fields to store the data as GeoJSON objects, which are constructed in PHP in a similar manner:
$document = [ ‘loc‘ => [ ‘type‘ => ‘Point‘, ‘coordinates‘ => [$lon, $lat], ], ‘name‘ => $name ];
MONGO need to use the RunCommand command when querying the return distance, whose syntax is similar to db.runCommand({"geoNear":"collection", "near":[lon, lat], "num":count, query:{other condition}})
;
Conclusion
MONGO's spatial index is more flexible, GeoJSON objects are a bit, line, polygon, multiple line segments, multipoint, multiple polygons. Supports the inclusion, intersection, and proximity of queries, and it also solves the problem of multi-conditional queries for Redis.
However, the test found that MONGO has the following problems:
- Performance can drop sharply when large amounts of data are being carried out, especially when the results are many, and the query time is simply impossible to read.
- Mongo support for Word segmentation fuzzy query is not very good, it is necessary to find a way to make fuzzy query by site name.
- The security configuration of the Mongo is a problem.
Reference: MongoDB geospatial Indexes and Queries (geospatial Indexes)
Mongodb? GeoJSON
PostgreSQL Introduction
PostgreSQL is a well-known relational database, built on the space object Extension Module PostGIS makes it a real large-scale spatial database. It achieves a common spatial index through R-Tree or GIST tree index, and the query efficiency is very high. At the same time, it supports the fuzzy query of Word segmentation, and can also solve the demand of query by location name.
PostGIS is an open source program that provides storage geospatial geo-data support for object-relational database PostgreSQL, making PostgreSQL a spatial database capable of spatial data management, quantitative measurement, and geometric topology analysis. PostGIS realizes the SQL implementation reference of the Basic feature class (Point, Line, Polygon, multipoint, multi-line, multi-plane, etc.) proposed by Open Geospatial Consortium.
Use
The use of PostgreSQL, compared to other databases, is more cumbersome.
- To use the spatial index of PostgreSQL, you need to install PostGIS, because it relies on a lot of complexity, can use yum,apt-get,homebrew tools such as priority use;
- Initializes a database with the INITDB command after the database is complete;
- Use a non-root user to
postgres -D datadir
open the service;
- Use
CREATE EXTENSION postgis;
installation extensions;
- Use
CREATE INDEX idx_name ON table USING gist(field);
;
You can then build a table to guide the data.
The following is a typical query statement (query with the location name and distance within 3000 meters of the target point):
SELECT name, ST_Distance(loc, ST_GeomFromText(‘POINT(113.2242 23.323234)‘, 4326)) as dist from test_table WHERE ST_DWithin(loc, ST_GeomFromText(‘POINT(113.2242 23.323234)‘, 4326), 0.03) ORDER BY dist limit 200
You also need to be aware of the use:
- If a Chinese word segmentation query is required, an option is added to specify the character set when initializing the database
-E UTF8
;
- PostgreSQL can not use the root user login, the control of the authority is relatively strict, the switch needs to give permission;
- When you build a table, you specify its SRID (spatial reference identifier, a unique identifier associated with a particular coordinate system, tolerance, and resolution) that is stored in latitude and longitude for 4326; For example, it is
loc geometry(point, 4326)
also important to use srid:4326 when converting data to gometry points;
- The primary key index data type can be specified as
serial
, type in the MySQL int auto increment;
- Use
\timing on
and \timing off
to toggle whether the command execution time is displayed;
Conclusion
PostgreSQL support for spatial queries is very flexible enough to support a variety of complex spatial queries, PostGIS can calculate the real space distance in different projected coordinate systems, and the query efficiency is very high, in a large amount of data will not be like MONGO performance drops sharply.
At the same time, its relational database features enable us to make multi-conditional query, and finally it can also use the zhparser
extension to Chinese word segmentation to support the location name fuzzy query.
Although it is slow to write when there are complex indexes, it is irrelevant for storing the infrequently changing location information.
Reference: Pgsql function Analysis · Advantages of PostGIS in the application of applications
The millisecond-level efficient implementation of full-field fuzzy query for PostgreSQL full table
MySQL Introduction
The importance and power of Mysql is needless to say, and its storage engine, MyISAM, supports spatial indexing very early. InnoDB, however, adds support for spatial indexes in the 5.7.4 Labs version.
They all use the R-tree to achieve spatial indexing.
Use
When using a spatial index in Mysql, be aware that:
- The field of the spatial index must first be set to
field geometry NOT NULL
;
- Create a column of spatial indexes using the build spatial index
SPATIAL KEY `idx_fld` (`geom`)
;
- The conversion function of string and geometry in SQL statement
POINTFROMTEXT(‘POINT(lon lat)‘)
;
- To make a range query, you first construct the space area:
GEOMFROMTEXT(‘Polygon((lon1 lat1,lon2 lat2,lon3 lat3,lon4 lat4 ...))‘
The following is a typical spatial query (the query is from a point within 3km of the target point):
SELECT id, ST_Distance_Sphere(Point(-73.951368, 40.716743), geom) as dist, tags, ST_AsText(loc)FROM nodesWHERE ST_Contains( ST_MakeEnvelope( Point((-73.951368+(3/111)), (40.716743+(3/111))), Point((-73.951368-(3/111)), (40.716743-(3/111))) ), loc ) ORDER BY dist LIMIT 10
Conclusion
Because Innodb is much more powerful than MyISAM and is irreplaceable for functions such as transactions, row locks, B + Tree indexes, MyISAM is no longer discussed here.
Mysql's spatial index query efficiency is not low. As a traditional relational database, its multi-conditional support and participle are well supported.
Although there is confidence in the spatial index of InnoDB, but also slightly expected, but for a long time existence of the system, the database version of the upgrade is really not a simple thing.
Reference: MySQL blog-mysql support for GIS spatial data
Summarize
I tested it with 1.26 million POI data, querying for points within the range of 3km (up to 200). System Information: macos10.12 (x86_64), Core: 2 GHz Intel Core i5, Memory: 8 GB 1867 MHz LPDDR3;
The following is a comparison of each database:
Database |
Time Consuming |
Area Query |
Multi-Conditional support |
Word breaker Support |
operation and maintenance of complexity |
Notes |
Redis (3.2.8) |
1-10ms |
Not supported |
Not supported |
Not supported |
Low |
Simple but functionally single |
MONGO (3.4.4) |
10-50ms |
Support |
Support |
Not supported |
In |
When the data is large, the performance decreases obviously. |
PostgreSQL (9.6.2) |
3-8ms |
Support |
Support |
Support |
In |
Slow Data Write |
MySQL (5.7.18 Innodb) |
8-15ms |
Support |
Support |
Support |
Low |
Version upgrade too difficult |
There may be some errors in the test operation, there is a good ability to use these databases can be commented on the exchange.
The database is not necessarily good, as long as the scene can be.
If you feel that this article is helpful to you, you can click on the recommendation below to support me. Blog has been updated, welcome attention .
Spatial index-usage reports for each database spatial index