A basic introduction to the kriging interpolation can be found in the ArcGIS Help documentation [1]. Its essence is to determine the value of its surrounding points (prediction points) based on the values of the known points. The most intuitive method is to find the relationship between the known points and the predicted point values, thus predicting the value of the predicted point. The IDW interpolation method, for example, assumes that the values of the known and predicted points are inversely proportional to their relative distances. The subtlety of kriging is that it takes into account not only the distance between known and predicted points, but also the autocorrelation relationship between these known points.
How do you measure the autocorrelation relationship between known points? The semivariogram is commonly used, and its formula is as follows [1]:
Semivariogram (distance h) = 0.5 * Average ((value I–value j) 2)
This is where the kriging interpolation differs from the core of the other interpolation methods, and by calculating the squared sum of the deviations of all the paired points within each distance range, you can plot the variance values at different distances, as follows [1]. We know that the sum of squared deviations is a measure of the degree of change in a set of data, and here, by calculating the sum of squares of the distance between all known points and the neighbor points in different distance ranges, we can roughly measure the degree of variation between the known points and the neighbors of different distance ranges, usually by curve fitting, commonly including exponential fitting, Spherical fitting, Gaussian fitting, and so on. And the result of this fitting is that we need to interpolate the value of the nugget, the base value and so on.
Note that, so far only the kriging interpolation is about how to parse known point information and how to construct relationships between these known points, the entire interpolation (prediction) process is to assume the relationship between this known point: the distance relationship and the autocorrelation relationship, as well as for the prediction point!
However, in the absence of more information, how do we know if this interpolation is credible? At present, a more reasonable method is to cross-validate the crosses validation. The essence is to come up with some known points as prediction points, these points are taken out of the exploration process of the known point relationship, but as a validation data to measure whether our predictions are reasonable. For example, each time we come up with a known point as the validation data to verify the predicted value of this point, we can get the deviation between all known points and their predicted values, and the deviation of all points provides us with a reasonable basis for the whole prediction method in some way.
An example of an R-Gstat spatial interpolation package:
For example, suppose we have a raster diagram of a data point we know that needs to interpolate values from other predictors (white transparent areas):
First, we need to read this raster data into R (we need to install the raster package)
Install.packages ("Raster")
Library (Raster)
data.observed <-Raster ("c:/users/workspace/allmax.img")
Since Gstat data is processed in data frame format, we first write a function that transforms this raster data into data frame:
raster_to_dataframe<-function (inraster, name) # { rp<-rastertopoints (inraster) df = Data.frame (RP) colnames (DF) = C ("X", "Y", "VALUE") df$name<-NAME return (DF)}
The name parameter can be a random string that is convenient for identifying data when merging with other data frames, and we call this function as follows:
Data.observed <-raster_to_dataframe (data.observed, "observed")
To view the results converted to data frame, you can use:
Head (data.observed)
X Y VALUE name1-318466.5 3794841 observed2-304466.5 3794841 observed3-303466.5 3794841 9 observed4-302466.5 3794841 observed5-301466.5 3794841 7 observed6-297466.5 3794841 Observed
Well, next is to analyze this write known point, Gstat provides a semivariogram drawing function Variogram:
v<-Variogram (object = Value~1,data = Df.wg,locations =~x+y, width=) plot (v)
By observing, the autocorrelation relationship of the known points semivariance with the increase of the distance distance, which shows the weakening of the exponential form (remember?). Sum of squared deviations? The smaller the value, the stronger the representation of the relationship, so by observing the distribution of the variance of the Semivariogram, determine the use of an exponential model to fit:
V.fit = Fit.variogram (V, VGM (model = "EXP", psill=, range = 20000, Kappa = ten), Fit.sills =) plot (V, Model=v.fit)
V.fit
How to determine the initial value of Pstill, range and so on, also by observing the approximate given can, the Fit.variogram () function will automatically calculate the correct result, the initial value of the existence of the function is only auxiliary calculation, not very correct, the results such as:
Model Psill Range1 EXP 61.39472 5326.663
As you can see, R calculates the new Psill and range values.
Next, we can use the information from the above analysis to perform kriging interpolation. The Krige function is used here:
kingnewdata<-raster_to_dataframe2 (MAX.ALLGF, "Interpolate location") aa<-Krige (formula = Value~1,locations =~x +y, model = VGM (61.39472, "Exp", 5326.663, ten), data = data.observed, Newdata= kingnewdata, nmax=12, nmin=10)
The parameter formula indicates the relationship between known points and other information (if this example is known to represent the soil moisture content, we also know the rainfall information for these points, then the formula here is a simple linear relationship between the soil water content and rainfall), and here we have no other information, that is, ordinary kriging interpolation. The parameter location is the coordinates of the known point, where x represents longitude, and the Y value represents latitude. Parameter model is the exponential model we analyzed above, using the fitted parameters. The data frame that represents the known point to use. NewData represents the position of the point to interpolate, note that to include the same variable name as the dataframe used by the data parameter (in this case, the position is uppercase X, Y), Nmax and nmin are the maximum and minimum number of searches, other parameters can refer to the Help document, the results of the interpolation are as follows:
Well, we've done all the kriging tasks and we've got the interpolation graphs we need, but how do we know if our interpolation results are credible or not? Of course, if we have the actual data of the prediction point, we can evaluate the accuracy of the interpolation results, but in many cases, we do not have this data, cross validaion came into being. For kriging interpolation, the cross validation function provided by Gstat is KRIGE.CV (), and for this article, cross validation can be done with the following code:
kriging<-KRIGE.CV (formula = value~1, locations = ~x+y, model = VGM (61.39472, "Exp", 5326.663, ten), data = Data.observ Ed, nfold= Nrow (data.observed))
Head (kriging)
Note that the KRIGE.CV () function essentially carries out many kriging values, and then we can analyze the values and predictions of the known points that are taken out, estimating the trustworthiness of the kriging interpolation. Here we take a point each time, so the value of Nfold is set to the same number of rows as data.observed, you can see the result:
Var1.pred Var1.var observed residual zscore fold X Y1 19.020820 31.29921 12-7.020820-1.2549348 1-318466.5 37948412 11.220626 27.62193 10-1.220626-0.2322499 2-304466.5 37948413 10.758057 24.82611 9-1.758057-0.3528407 3-303466.5 37948414 9.200462 24.85426 one 1.799538 0.3609613 4-302466.5 37948415 10.840395 27.07824 7-3.840395-0.7380158 5-301466.5 37948416 12.446044 29.40208< c18/>14 1.553956 0.2865826 6-297466.5 3794841
The results include predicted values, variance of predicted values, known values, residuals, z-statistic values, etc., and we can plot the Z-statistic:
Ggplot (data = kriging, AES (x=x,y=y)) + Geom_raster (Aes (fill= Zscore)) + Scale_fill_gradient2 (name= "Zscore" , low = "green", mid = "Grey", high = "red", midpoint = 0, space = "RGB", Na.value = "grey50")
The results are as follows:
or its histogram to evaluate the interpolation scheme:
Par (Mar=c (2,2,2,2)) hist (Kriging$zscore)
For example, we can draw a similar conclusion: since most of the values are within the 1 standard deviation range, we can make the assumption that the interpolation scheme is feasible in the prediction area because the interpolation scheme is feasible.
Finally, you are welcome to leave a message to discuss, reproduced please indicate the source .
Reference documents:
[1] http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analyst-toolbox/how-kriging-works.htm
about R Basic Raster data processing I recommend a blog post: http://rstudio-pubs-static.s3.amazonaws.com/1057_1f7e9ac569644689b7e4de78c1fece90.html
Analysis of r:kriging interpolation and cross validation kriging interpolation and crossover verification