A localization algorithm based on naive Bayes


1 Positioning Background

When it comes to positioning, most people think of GPS. But GPS has several drawbacks: the first fix is slow (see the earlier blog post "LBS positioning technology"), it does not work indoors, and it consumes a lot of power. These drawbacks greatly limit its use. Most mobile internet applications, such as Google Maps and Baidu Maps, therefore often locate the user based on WiFi and cell base stations instead.

A typical app reports the WiFi and base station signals it detects when it requests a location. Taking WiFi as an example, the phone measures the signal strength (RSSI) of each surrounding WiFi access point (identified by its MAC address), producing a signal vector (<wf1, rssi1> <wf2, rssi2> ... <wfn, rssin>). When the server receives the client's request, it passes the signal vector to the positioning engine, which returns the location result (x, y), the positioning accuracy, and so on; the server then returns the result to the app.

The positioning engine relies on two things: 1) large-scale data collection; 2) a carefully designed algorithmic model.

When using an app, especially a map app, users often manually turn on GPS positioning. The app then collects the base station and WiFi signals together with the GPS coordinates and sends them to the server, which stores the GPS coordinates (x, y) together with the associated signal vector (<wf1, rssi1> <wf2, rssi2> ... <wfn, rssin>) in a database.
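A minimal sketch of what one such "location-signal" record might look like. The `Observation` class, its field names, and the `store` helper are hypothetical and chosen only for illustration; the original post does not describe a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    """One crowd-sourced "location-signal" record (hypothetical layout)."""
    x: float                 # GPS coordinate reported by the app
    y: float
    signals: Dict[str, int]  # MAC address (or base station id) -> RSSI in dBm

def store(database: List[Observation], x: float, y: float,
          signals: Dict[str, int]) -> None:
    """Append a GPS fix together with the signal vector seen at that moment."""
    database.append(Observation(x, y, dict(signals)))
```

Each call to `store` adds one training sample that can later be assigned to a grid cell.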

As long as there are enough users, a large amount of "location-signal" data accumulates quickly. The challenge is how to use this data to achieve positioning based on base stations and WiFi.

Based on a reading of the literature, this article designs a localization model based on naive Bayes.

2 Localization Model Based on Naive Bayes

We look at the location problem from the perspective of probability.

Our goal is to find the position p that is most probable given the observed signal vector m (<wf1, rssi1> <wf2, rssi2> ... <wfn, rssin>). The objective is:

$$\max_{p} P(p \mid m)$$

Estimating this probability directly is very difficult, but we can rewrite it using Bayes' theorem:

$$P(p \mid m) = \frac{P(m \mid p)\,P(p)}{P(m)}$$

The denominator P(m) is the probability that the signal vector m appears. It is constant for a given user request and can therefore be ignored, so the objective becomes:

$$\max_{p} P(m \mid p)\,P(p)$$

where P(p) is the prior probability of position p and P(m|p) is the probability of the signal vector m appearing at position p. To simplify, we assume that every position p is equally probable. (In reality the priors are unequal: for example, a user is less likely to be in the middle of a lake than on a street in the city center. This can be refined later.) The objective then becomes:

$$\max_{p} P(m \mid p)$$

max P(m|p) means finding the point p in geographic space at which the signal vector m is most likely to appear. We could do this exhaustively, computing the probability of m at every point in space and picking the most probable one. But geographic space is a two-dimensional plane with infinitely many points, so this computation is infeasible. We simplify by discretizing the space into a grid of fixed-size rectangular cells (which can be encoded with Geohash; see the earlier blog post "Geohash"). max P(m|p) then becomes finding the grid cell in which the signal vector m is most likely to appear.
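The discretization step can be sketched as follows. This is a simplified illustration that assumes planar coordinates in meters and square cells, rather than the Geohash encoding mentioned in the text; `CELL_SIZE_M`, `cell_of`, and `cell_center` are hypothetical names.

```python
import math

CELL_SIZE_M = 10.0  # assumed cell edge length in meters

def cell_of(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def cell_center(cell, cell_size=CELL_SIZE_M):
    """Return the coordinates of the center of a grid cell."""
    col, row = cell
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)
```

Every stored observation is mapped to a cell with `cell_of`, and the final answer to a request is the `cell_center` of the winning cell.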


How do we calculate the probability of the signal vector m appearing in a grid cell p?

By cleaning the "location-signal" database, we can count how many times each signal vector appears in each grid cell, i.e. build a histogram.

Given the histogram of signal vectors for cell p, we can estimate P(m|p), the probability of the signal vector m appearing in cell p.

In practice, however, counting whole signal vectors is almost impossible. Although many Baidu positioning requests yield results that fall within cell p, the signal vectors they carry are almost all different, so a statistically meaningful histogram cannot be obtained.

We therefore simplify the formula further by assuming that the WiFi signals are independent of one another given the position, which is a reasonable simplification. The formula then becomes:

$$P(m \mid p) = \prod_{i=1}^{n} P(wf_i = rssi_i \mid p)$$

The problem is now to estimate P(wfi = rssii | p), the probability that, in cell p, the access point with MAC address wfi is observed with signal strength rssii. We can build a histogram for each WiFi access point in cell p in advance; these per-access-point histograms reach statistical significance much more easily.
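The per-access-point histograms and the naive Bayes product can be sketched as below. This is an illustrative sketch, not the author's implementation: the record format, the function names, and the use of raw integer RSSI values as histogram bins are all assumptions. Probabilities are summed in log space to avoid underflow when many factors are multiplied.

```python
import math
from collections import Counter, defaultdict

def build_histograms(records):
    """Count RSSI readings per (cell, MAC address).

    records: iterable of (cell, {mac: rssi}) pairs (hypothetical input format).
    Returns hist[cell][mac] = Counter mapping an RSSI value to its count.
    """
    hist = defaultdict(lambda: defaultdict(Counter))
    for cell, signals in records:
        for mac, rssi in signals.items():
            hist[cell][mac][rssi] += 1
    return hist

def log_likelihood(hist, cell, signals):
    """Sum of log P(wf_i = rssi_i | cell); -inf if any factor is zero."""
    total = 0.0
    per_mac = hist.get(cell, {})
    for mac, rssi in signals.items():
        counts = per_mac.get(mac)
        if not counts or counts[rssi] == 0:
            return -math.inf
        total += math.log(counts[rssi] / sum(counts.values()))
    return total

def locate(hist, signals):
    """Return the cell that maximizes the likelihood of the signal vector."""
    return max(hist, key=lambda cell: log_likelihood(hist, cell, signals))
```

Note that an unseen MAC address or RSSI value makes the whole likelihood zero here; this sketch deliberately leaves that case unsmoothed.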

This model requires us to divide the geographic space into a grid, precompute a signal histogram for every WiFi access point in every cell, and store it. In a real application, however, the number of cells is very large and storing histograms is expensive, so the information stored per cell should be kept as small as possible. One idea is to approximate each histogram with a Gaussian distribution, so that a cell only needs to store a few Gaussian parameters (the mean and the standard deviation) for each access point.
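A minimal sketch of the Gaussian approximation, using only the standard library. The function names and the `MIN_SIGMA` floor (which keeps the density finite when all samples are identical) are assumptions added for illustration.

```python
import math
from statistics import fmean, pstdev

MIN_SIGMA = 1.0  # assumed lower bound on the standard deviation, in dB

def fit_gaussian(rssi_samples, min_sigma=MIN_SIGMA):
    """Summarize the RSSI samples of one access point in one cell as (mu, sigma)."""
    mu = fmean(rssi_samples)
    sigma = max(pstdev(rssi_samples), min_sigma)
    return mu, sigma

def gaussian_log_pdf(rssi, mu, sigma):
    """Log of the Gaussian density at rssi, usable in place of a histogram lookup."""
    z = (rssi - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
```

With this representation each (cell, access point) pair costs two numbers instead of a full histogram.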

Of course, not every histogram is well approximated by a Gaussian curve. Some literature indicates that other shapes, such as bimodal curves, may occur, so kernel density estimation can be used instead.
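As an illustration of the kernel density alternative, here is a small Gaussian-kernel estimator in pure Python. The function name and the default bandwidth of 2 dB are assumptions; in practice the bandwidth would need tuning.

```python
import math

def kde_pdf(rssi, samples, bandwidth=2.0):
    """Gaussian kernel density estimate of the RSSI density at `rssi`.

    samples: RSSI readings of one access point in one cell.
    bandwidth: kernel width in dB (assumed value, needs tuning).
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((rssi - s) / bandwidth) ** 2) for s in samples
    )
```

Unlike a single Gaussian, this estimate keeps both peaks of a bimodal sample, at the cost of storing the samples themselves (or a compressed summary of them).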

With the method above, we can find the cell in which the signal vector m is most probable and return the coordinates of its center. Sometimes the best cell is hard to single out: if cells A and B have almost equal probability, returning either A or B alone is not appropriate, and weighted interpolation is a more reasonable choice. A relatively robust approach is to take the top K cells and interpolate between them.
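One possible form of the top-K interpolation is sketched below. Weighting each cell center by its likelihood is an assumption; the text only says "weighted interpolation" without fixing the weights. The function name and input format are hypothetical.

```python
import heapq
import math

def interpolate_top_k(scored_cells, k=3):
    """scored_cells: list of (log_prob, (x, y)) pairs, one per candidate cell.

    Returns the likelihood-weighted average of the k best cell centers.
    """
    top = heapq.nlargest(k, scored_cells, key=lambda item: item[0])
    best = top[0][0]
    if best == -math.inf:
        raise ValueError("no cell has a non-zero likelihood")
    # Subtract the best log-probability before exponentiating to avoid underflow.
    weights = [math.exp(log_p - best) for log_p, _ in top]
    total = sum(weights)
    x = sum(w * center[0] for w, (_, center) in zip(weights, top)) / total
    y = sum(w * center[1] for w, (_, center) in zip(weights, top)) / total
    return x, y
```

When two cells score equally, the result falls midway between their centers; a cell with a much lower score contributes almost nothing.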

3 Problems and Solutions

1) How should the grid cell size be determined?

If the cells are too small, storage is huge and many cells have no signal histogram at all; if they are too large, accuracy cannot be guaranteed. One approach is a fixed cell size chosen from experimental results. A better approach is an adaptive cell size: small cells where signals are dense (urban areas) and large cells where signals are sparse (suburban areas).
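One way to obtain an adaptive cell size is a quadtree-style subdivision, sketched here. The post does not name a specific method, so the quadtree itself, the `subdivide` function, and the `max_points` and `min_size` thresholds are all assumptions.

```python
def subdivide(points, x0, y0, size, max_points=4, min_size=10.0):
    """Split a square cell into four while it holds too many observations.

    points: list of (x, y) observation coordinates.
    Returns the leaf cells as (x0, y0, size) tuples.
    """
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    # Stop when the cell is sparse enough or its children would be too small.
    if len(inside) <= max_points or size / 2 < min_size:
        return [(x0, y0, size)]
    half = size / 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += subdivide(inside, x0 + dx, y0 + dy, half,
                                max_points, min_size)
    return leaves
```

Dense areas end up covered by small cells and sparse areas by large ones, which matches the urban and suburban cases described above.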

2) What if there are some new MAC address signals in the signal vector requested by the user?

If the product formula P(m|p) = P(wf1 = rssi1 | p) × ... × P(wfn = rssin | p) is applied directly, the result will be 0: because the probabilities are multiplied together, a single factor of 0 makes the whole product 0. A simple way to remove zero probabilities is add-one smoothing, also known as Laplace smoothing.
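A minimal sketch of additive smoothing applied to one histogram lookup. The function name and the choice of `num_bins` (the number of distinct RSSI values the histogram can hold) are assumptions; `alpha=1.0` gives add-one smoothing.

```python
def smoothed_prob(count, total, num_bins, alpha=1.0):
    """Additive (Laplace) smoothing of a histogram probability.

    count: times this RSSI value was seen for the access point in the cell.
    total: all readings of the access point in the cell (0 if never seen).
    num_bins: number of possible RSSI values (assumed, e.g. 100 integer dBm levels).
    """
    return (count + alpha) / (total + alpha * num_bins)
```

For a MAC address never seen in a cell, `count` and `total` are both 0 and the result is the uniform value 1 / `num_bins`, so the product no longer collapses to 0.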

3) How to improve computational efficiency?

In theory, for each user request we need to compute the probability for every cell in the database and return the center of the most probable cell. Assuming cells of 10 × 10 meters, Beijing alone has about 160 million cells, so the cost of traversing them all is enormous.
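As a rough check of that figure, assuming an administrative area for Beijing of about 16,410 km² (a value not given in the original post):

$$\frac{16{,}410 \times 10^{6}\ \text{m}^2}{10\ \text{m} \times 10\ \text{m}} \approx 1.64 \times 10^{8}\ \text{cells}$$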

One way to improve efficiency is to first determine an approximate spatial range from the user's signal vector, and then compute the probability only for the cells within that range.
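One way to obtain such a range is an inverted index from MAC address to the cells where that address has been observed. The post does not say how the range is found, so the index, the function names, and the `min_shared` threshold are assumptions.

```python
from collections import defaultdict

def build_inverted_index(hist):
    """hist: mapping cell -> {mac: histogram}. Returns mac -> set of cells."""
    index = defaultdict(set)
    for cell, per_mac in hist.items():
        for mac in per_mac:
            index[mac].add(cell)
    return index

def candidate_cells(index, signals, min_shared=1):
    """Cells that share at least `min_shared` MAC addresses with the request."""
    votes = defaultdict(int)
    for mac in signals:
        for cell in index.get(mac, ()):
            votes[cell] += 1
    return {cell for cell, n in votes.items() if n >= min_shared}
```

Only the returned candidates need to be scored, which is typically a tiny fraction of all cells because a WiFi access point is only visible over a short distance.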

4 References

Practical Metropolitan-Scale Positioning for GSM Phones

CellSense: A Probabilistic RSSI-Based GSM Positioning System

An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning

Topical Issues: Location Fingerprinting

Location Method of Wireless Sensor Network Based on Interval Number Clustering

Research on Application of KNN Algorithm Based on Optimization in Indoor Positioning
