Vernacular Spatial Statistics 24: Geographically Weighted Regression (IV)


Originally I planned to use this chapter to write up (that is, copy) the ArcGIS Help document, describe how to use the Geographically Weighted Regression tool, and then wrap up geographically weighted regression directly. But recently I have received a lot of e-mail from students, many of whom fell into the big pits that Shrimp God dug: for example, describing a method without listing its formula, or writing down a formula without any derivation (... as for Shrimp God and advanced mathematics: it knows me, but I do not know it).

So this time, in writing about GWR, I will try to dig as few pits as possible and write down everything that ought to be written. This is partly so that passing students fall into fewer pits, and partly because a good memory is no match for a worn-out pen (or a worn-out keyboard ...): once finished, it can serve as reading notes or a memory index.

So geographically weighted regression may still need several more chapters on the principles. Students who want to fast-forward can go directly to the ArcGIS Help document, in the section Spatial Statistics toolbox > Modeling Spatial Relationships > Geographically Weighted Regression. If you have ArcGIS for Desktop installed, you can open the Help document directly; you can also check this address:

http://pro.arcgis.com/zh-cn/pro-app/tool-reference/spatial-statistics/geographically-weighted-regression.htm
If you find the Help document too obscure, then you can only be patient and wait while Shrimp God slowly works through it ...

Today we cover how the spatial weights in the spatial weights matrix are chosen in geographically weighted regression. After reading this, you should be able to explain the meaning of two important parameters of the GWR tool in ArcGIS.

As I wrote in the previous section, the most important part of geographically weighted regression is the spatial weights matrix, which is computed from a conceptualization of spatial relationships. In ArcGIS there are seven types of conceptualization of spatial relationships, as follows:



From the earlier analysis we know that the neighbor-based (proximity) method and the contiguity (contact) method both cause trouble for local regression: the calculation regions differ, so the number of samples changes from one regression point to the next, and if every sample is included in the calculation, the result becomes a global regression. So in GWR, the only methods we can actually choose are the distance-based ones.

First, the distance threshold. The concept of a distance threshold is:




As the military saying goes: within striking distance, everyone is an enemy.

So if you are holding a 40-metre-long sword, you can generously say: "Go ahead, I'll let you run 39 metres first" ... because anything within 40 metres is inside the attack range and counts as 1, while anything beyond 40 metres counts as 0.
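The 40-metre rule can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_weight` is hypothetical, and treating a point exactly on the threshold as "inside" is an assumption, since the text does not say which side the boundary falls on.

```python
def threshold_weight(d: float, threshold: float) -> float:
    """Return 1.0 if distance d is within the threshold, otherwise 0.0."""
    if d < 0 or threshold < 0:
        raise ValueError("distances must be non-negative")
    # Assumption: a point exactly on the threshold counts as inside.
    return 1.0 if d <= threshold else 0.0


distances = [0.0, 10.0, 39.0, 40.0, 41.0, 100.0]
weights = [threshold_weight(d, 40.0) for d in distances]
print(weights)  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

The weight jumps abruptly from 1 to 0 at the threshold, so the set of samples used changes with every regression point.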

So the distance threshold method is also a local regression with the same problem; for the problems this brings, see the previous article, or the first few paragraphs above.

What remains is the inverse distance method. "Inverse" means that the greater the distance, the smaller the weight. This seems reasonable, because it conforms to the first law of geography: the closer things are, the more related they are, and the farther apart they are, the less related. So we have an inverse distance formula:
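A form consistent with the surrounding description is given here, assuming w denotes the weight and d the distance between the regression point and an observation point (both symbols are labels chosen for illustration; α is the constant discussed in the text):

```latex
w = \frac{1}{d^{\alpha}}
```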

where α is a constant that can take the value 1 or 2 (larger values are possible too; it controls how strongly changes in distance are emphasized, refer to the following image). According to the official ArcGIS rule of thumb, this value is best kept between 0 and 3 (preferably not equal to 0, because when α equals 0 the distance term becomes a constant).


However, this also has a problem: when a regression point is also a sample point, the distance is 0 and the weight of the observation at that point becomes infinite. Excluding such points from the sample data in each calculation brings a series of other problems, such as reduced precision. So the inverse distance method cannot be used directly in GWR and needs some correction.
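The problem is easy to see numerically. Below is a minimal sketch, assuming the inverse distance weight has the form w = 1 / d^α; the function name `inverse_distance_weight` is hypothetical and not part of ArcGIS.

```python
import math


def inverse_distance_weight(d: float, alpha: float = 1.0) -> float:
    """Inverse distance weight w = 1 / d**alpha (assumed form)."""
    if d < 0 or alpha < 0:
        raise ValueError("d and alpha must be non-negative")
    if d == 0 and alpha > 0:
        # The regression point coincides with a sample point:
        # the weight is unbounded, which is the problem described above.
        return math.inf
    return 1.0 / d ** alpha


distances = [0.0, 0.5, 1.0, 2.0, 4.0]
for alpha in (1.0, 2.0):
    weights = [inverse_distance_weight(d, alpha) for d in distances]
    print(f"alpha={alpha}: {weights}")
```

With a larger α the weight falls off faster as the distance grows, and at d = 0 the weight is infinite whatever positive α is chosen.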

The improvement on the inverse distance method most commonly used in GWR is to choose a continuous, monotonically decreasing function to represent the relationship between the weight w and the distance d, which overcomes the shortcoming of the inverse distance method. Many functions satisfy this idea, but two kinds are the most widely used because of their generality:

1. Gaussian function method

The Gaussian function is expressed as follows:
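One commonly used form is sketched here, assuming w is the weight, d the distance between the regression point and an observation point, and b the bandwidth. Some references put an extra factor of 1/2 in the exponent, so treat the exact constant as an assumption:

```latex
w = \exp\!\left[-\left(\frac{d}{b}\right)^{2}\right]
```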


The graph of the function is as follows:

The bandwidth b is a non-negative attenuation parameter of the function relating weight to distance. As the figure above shows, the larger the bandwidth, the more slowly the weight decays as the distance increases; the smaller the bandwidth, the faster it decays. This parameter plays much the same role as the power α in the inverse distance method. What differs from the plain inverse distance formula is this: when the bandwidth is 0, only the regression point itself has weight 1 and the weights of all other observation points approach 0, so the regression merely re-expresses the data. When the bandwidth is infinite, the weights of all observation points approach 1, so it becomes a global regression.

Once the data are plugged in, for any given bandwidth, the weight is w = 1 when the distance d is 0, which is its maximum. As the distance increases, w gradually decreases, and when the distance is large enough, w approaches 0. Such points are far enough away to be regarded as having almost no effect on the parameter estimates at the regression point.
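These properties can be checked numerically. The sketch below assumes the Gaussian weight has the form w = exp(-(d / b)^2), where b is the bandwidth (some references add a factor of 1/2 in the exponent); the function name `gaussian_weight` is hypothetical.

```python
import math


def gaussian_weight(d: float, bandwidth: float) -> float:
    """Gaussian weight w = exp(-(d / b)**2) (assumed form)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if bandwidth <= 0:
        # A bandwidth of 0 is only meaningful as a limit, so reject it here.
        raise ValueError("bandwidth must be positive")
    return math.exp(-((d / bandwidth) ** 2))


distances = [0.0, 10.0, 20.0, 50.0]
for b in (5.0, 20.0, 1e6):
    row = [round(gaussian_weight(d, b), 4) for d in distances]
    print(f"bandwidth={b}: {row}")
```

At d = 0 the weight is 1 for every bandwidth. With a small bandwidth the weights drop to nearly 0 within a short distance, while with a very large bandwidth every weight stays close to 1, which is the global regression case.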

However, if the data are very dispersed, a large number of points lie far away, and this so-called "long tail effect" brings a lot of computational overhead. So in practice, a near-Gaussian function is used in place of the Gaussian function, cutting off the points that have no (or very little) influence in order to improve efficiency. Professor Fotheringham's 1998 paper also proposed using the bi-square function for this calculation.

2. Bi-square function method

The bi-square function is expressed as follows:
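The usual form of the bi-square kernel is sketched here, again assuming w is the weight, d the distance and b the bandwidth (the symbols are labels chosen for illustration):

```latex
w =
\begin{cases}
\left[1 - \left(\dfrac{d}{b}\right)^{2}\right]^{2}, & d \le b \\
0, & d > b
\end{cases}
```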




As the figure above shows, the bi-square function is in effect a combination of the distance threshold method and the Gaussian function method. For data points within the bandwidth of the regression point, the weights are calculated with a continuous, monotonically decreasing function similar to the Gaussian; for data points beyond the bandwidth, the weights are all set to 0.
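A minimal sketch of this "threshold plus smooth decay" behaviour, assuming the usual bi-square form w = (1 - (d / b)^2)^2 inside the bandwidth b and 0 outside it; the function name `bisquare_weight` is hypothetical.

```python
def bisquare_weight(d: float, bandwidth: float) -> float:
    """Bi-square weight: smooth decay inside the bandwidth, 0 beyond it."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if d >= bandwidth:
        # Beyond the bandwidth the point is cut off entirely,
        # like the distance threshold method.
        return 0.0
    return (1.0 - (d / bandwidth) ** 2) ** 2


distances = [0.0, 5.0, 9.0, 10.0, 15.0]
weights = [bisquare_weight(d, 10.0) for d in distances]
print(weights)
```

Unlike the Gaussian function, the weight here reaches exactly 0 at the bandwidth, so distant points can be skipped and the long tail disappears.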

In practical GWR calculations, these two functions are the two most commonly used.

Next time: the bandwidth is an important parameter in calculating the spatial weights. The next section will briefly discuss two methods of bandwidth selection, and once the concepts are covered, we will formally move on to the software operation.

To be continued.
