In the previous articles, we covered many of the basic principles of distance and clustering. Starting with this chapter, we turn to specific tools and algorithms.

Previously, using things like Moran's I, p-values, and Z-scores, we could determine whether a dataset is dispersed, random, or clustered. But if more than one dataset is clustered, which one is the most clustered? That requires a specific value to quantify it.

Of course, the Z-score reflects the degree of clustering to some extent, but it is not a purely spatial measure of clustering. That brings us to today's algorithm: the average nearest neighbor (in ArcGIS, the "Average Nearest Neighbor" tool, found in the "Analyzing Patterns" toolset of the Spatial Statistics toolbox).

The average nearest neighbor method produces an index of the degree of clustering; with this index, we can compare several datasets and see which one is the most clustered.

Here are two datasets that both show a clustered distribution, but which one is more clustered? In particular, attributes are ignored entirely (a rather shameless use of the purely spatial clustering pattern).

Using this method, we can calculate the specific degree of clustering for each dataset. The results are as follows:

Here's a comparison:

Looking at the average observed distance and the average expected distance, the differences are not large: the average observed distance of dataset one is greater than that of dataset two, while its expected distance is smaller.

The expected distance is related to the overall extent of the data, that is, the area over which it is distributed. The resulting nearest neighbor indexes are as follows:

The nearest neighbor index of both datasets is less than 1, so their pattern is clustered; conversely, an index greater than 1 means the pattern tends toward dispersion (or competition).

The smaller the index, the stronger the clustering, so dataset two is more clustered than dataset one.

How is this index actually calculated? Keep reading.

The Average Nearest Neighbor tool first assumes a random distribution within the study area and computes the average distance expected under that assumption (denoted de). It then measures, for each feature's centroid, the distance to the centroid of its nearest neighboring feature, and averages these measured distances (denoted do). Finally, do/de gives the average nearest neighbor index.

If do > de, the calculated index is **greater than 1**, and the pattern of the data tends toward **dispersion**.

If do < de, the calculated index is **less than 1**, and the pattern of the data tends toward **clustering**.

The closer the index is to 1, the more likely it is that the pattern is random.
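The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: it uses a brute-force nearest-neighbour search over raw (x, y) coordinates, whereas the real tool works on feature centroids and projected distance units.

```python
import math

def average_nearest_neighbor(points, area):
    """Return (do, de, ann) for a list of (x, y) points in a study
    region of the given area. `points` and `area` are hypothetical
    inputs for illustration only."""
    n = len(points)
    # do: mean distance from each point to its nearest neighbour
    nn_dists = []
    for i, (xi, yi) in enumerate(points):
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        nn_dists.append(d)
    do = sum(nn_dists) / n
    # de: expected mean distance under a random (Poisson) pattern
    de = 0.5 / math.sqrt(n / area)
    return do, de, do / de
```

As described above, an index below 1 indicates clustering and an index above 1 indicates dispersion; for example, two tight pairs of points in a large study area give an index well below 1.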

The method is calculated as follows:

First, suppose there are n points in the study area and that the area of the study region is A. The average expected distance under a random distribution is:

de = 0.5 / sqrt(n / A)

For example, suppose we have 3 points (only three here, to keep the arithmetic easy to follow) and the area of the study region is 60. Then:

de = 0.5 / sqrt(3 / 60) ≈ 2.2361

Then the average observed distance of the actual data is calculated:

do = (d1 + d2 + … + dn) / n

where di is the distance from each feature to its nearest feature. For example:

do = (4 + 6 + 7) / 3 ≈ 5.6667

Then calculate the average nearest neighbor index:

ANN = do / de = 5.6667 / 2.2361 ≈ 2.5342
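The arithmetic of this worked example can be checked in a few lines of Python (all the input values come from the text above):

```python
import math

# Worked example: n = 3 features, study area A = 60,
# nearest-neighbour distances of 4, 6 and 7.
n, area = 3, 60
de = 0.5 / math.sqrt(n / area)   # expected mean distance under randomness
do = (4 + 6 + 7) / n             # observed mean nearest-neighbour distance
ann = do / de                    # average nearest neighbour index
```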

This value is far greater than 1 (more than double it), which indicates a dispersed pattern.

Of course, you also need to calculate the Z-score, which in this case is:

z = (do − de) / SE

where SE is given by:

SE = 0.26136 / sqrt(n² / A)

For the data above:

SE = 0.26136 / sqrt(3 × 3 / 60) ≈ 0.6748

z = (5.6667 − 2.2361) / 0.6748 ≈ 5.08

According to the earlier article on p-values and Z-scores, a Z-score between −1.65 and +1.65 indicates a statistically random distribution. Here z ≈ 5.08 falls far outside that range, so the dispersion suggested by the index of 2.53 is also statistically significant. In any case, the point of this toy example is simply to walk through the average nearest neighbor calculation step by step.
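The Z-score arithmetic for the same toy example (n = 3, A = 60, distances 4, 6, 7) can also be verified in Python:

```python
import math

# z = (do - de) / SE, with SE = 0.26136 / sqrt(n^2 / A)
n, area = 3, 60
do = (4 + 6 + 7) / n                       # observed mean distance
de = 0.5 / math.sqrt(n / area)             # expected mean distance
se = 0.26136 / math.sqrt(n * n / area)     # standard error
z = (do - de) / se
```

Note that 0.26136 / sqrt(9 / 60) on its own is about 0.6748; that is the standard error, not the Z-score itself.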

From the calculations above, we can see that the average nearest neighbor is **very sensitive to the area of the study region: a small change in the area has a significant impact on the results** (the p-values and Z-scores in particular can swing drastically). So we had better **specify a fixed area value** before the calculation.

If you do not specify an area value, the tool defaults to the minimum bounding rectangle of your data to determine the study area, which adds uncertainty to the reliability of the results. As shown below:

Without a fixed area, the situation above can occur: the region changes, and the calculated results change with it.
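The effect of the default study area is easy to demonstrate. In this sketch the point coordinates are made up for illustration: adding a single outlying point balloons the minimum bounding rectangle, which in turn shifts the expected distance de (and therefore the index):

```python
import math

def mbr_area(points):
    """Area of the minimum bounding rectangle of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def expected_distance(n, area):
    """Expected mean nearest-neighbour distance: 0.5 / sqrt(n / A)."""
    return 0.5 / math.sqrt(n / area)

pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
de_before = expected_distance(len(pts), mbr_area(pts))    # MBR area = 1

# One far-away outlier inflates the MBR area from 1 to 400,
# and de jumps by more than an order of magnitude.
pts_with_outlier = pts + [(20, 20)]
de_after = expected_distance(len(pts_with_outlier),
                             mbr_area(pts_with_outlier))  # MBR area = 400
```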

Therefore, the average nearest neighbor tool is best suited to **comparing different features in a fixed study area**. For example: studying the distributions of different types of enterprises within the same city, or studying how the distribution of one type of enterprise changes over different years within a fixed area.

If you have questions, follow the Shrimp God WeChat official account:

Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.

Vernacular Spatial Statistics, Part 6: The Average Nearest Neighbor