This topic introduces a generalized workflow for geostatistical studies and identifies its main steps. As described in What is geostatistics?, geostatistics is a class of statistics used to analyze and predict the values associated with spatial or spatiotemporal phenomena. ArcGIS Geostatistical Analyst provides a set of tools that allow you to build models that use spatial (and temporal) coordinates. These models can be applied in a wide variety of situations and are typically used to generate predictions for unsampled locations, as well as measures of uncertainty for those predictions.
As with almost any data-driven study, the first step is to examine the data carefully. This usually begins by mapping the dataset, using a classification and color scheme that makes important features apparent: for example, a strong increase (trend) in values from north to south, a mixture of high and low values with no particular spatial arrangement (possibly a sign that the data were collected at a scale that does not show spatial correlation), or areas that were sampled more densely than others (preferential sampling), which might lead you to apply declustering weights in the analysis.

The second step is to build the geostatistical model. This process can involve several steps, depending on the objectives of the study (the type of information the model is expected to provide) and the features of the dataset that are deemed important enough to include. At this stage, the information gathered during a rigorous exploration of the dataset, together with prior knowledge of the phenomenon, determines how complex the model is and how accurate the interpolated values and their measures of uncertainty will be. Building the model can involve preprocessing the data to remove a spatial trend, which is modeled separately and added back in the final step of the interpolation process; transforming the data so that it more closely follows a Gaussian distribution (required by some methods and model outputs); and declustering the dataset to compensate for preferential sampling. While a great deal of information can be gathered by examining the dataset, it is also important to include any knowledge you have of the phenomenon. The modeler cannot rely solely on the dataset to reveal every important feature; features that do not appear in the data can still be represented in the model by adjusting parameter values to reflect an expected outcome. It is important that the model be as realistic as possible so that the interpolated values and associated uncertainties are accurate representations of the real phenomenon.
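To illustrate the preprocessing described above, the following minimal sketch, written in plain NumPy rather than with Geostatistical Analyst tools, log-transforms skewed values and removes a first-order (planar) trend fitted by least squares. The coordinates and values are made-up sample data used only for illustration; the residuals are what the spatial model would be fitted to, and the trend is added back at the end of interpolation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1000, 200)                      # sample x coordinates (made up)
y = rng.uniform(0, 1000, 200)                      # sample y coordinates (made up)
z = np.exp(0.002 * y + rng.normal(0, 0.3, 200))    # skewed values with a north-south trend

# Log transform so the values more closely follow a Gaussian distribution.
z_log = np.log(z)

# Fit a first-order trend surface z = b0 + b1*x + b2*y by least squares.
A = np.column_stack([np.ones_like(x), x, y])
coeffs, *_ = np.linalg.lstsq(A, z_log, rcond=None)

# The spatial model is fitted to the residuals; the trend is added back after interpolation.
residuals = z_log - A @ coeffs
print("trend coefficients:", coeffs)
print("residual mean (should be near 0):", residuals.mean())
```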
In addition to preprocessing the data, you might also need to model the spatial structure (spatial dependence) in the dataset. Some methods, such as kriging, require this structure to be modeled explicitly through semivariogram or covariance functions, while other methods, such as inverse distance weighting, rely on an assumed spatial structure that the modeler must supply based on prior knowledge of the phenomenon.
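To make explicit structure modeling concrete, the sketch below computes an empirical semivariogram: pairwise distances are binned into lags, and half the mean squared difference of the values is computed for each lag. The function name, lag size, and number of lags are illustrative choices, not part of any ArcGIS API, and the usage comment assumes the x, y, and residuals arrays from the preprocessing sketch above. The resulting points would then be fitted with a semivariogram model such as a spherical or exponential function.

```python
import numpy as np

def empirical_semivariogram(x, y, values, lag_size, n_lags):
    """Return lag centers and semivariance gamma(h) for each lag bin."""
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dist = np.hypot(dx, dy)
    sqdiff = (values[:, None] - values[None, :]) ** 2

    # Keep each pair of points once (upper triangle, excluding the diagonal).
    iu = np.triu_indices(len(x), k=1)
    dist, sqdiff = dist[iu], sqdiff[iu]

    edges = np.linspace(0.0, n_lags * lag_size, n_lags + 1)
    centers, gamma = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (dist >= lo) & (dist < hi)
        if mask.any():
            centers.append((lo + hi) / 2)
            gamma.append(0.5 * sqdiff[mask].mean())  # semivariance for this lag
    return np.array(centers), np.array(gamma)

# Example (using arrays from the preprocessing sketch):
# lags, gamma = empirical_semivariogram(x, y, residuals, lag_size=50.0, n_lags=12)
```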
The final component of the model is the search strategy, which defines how many data points are used to generate a value for an unsampled location, as well as their spatial configuration (their positions relative to one another and to the unsampled location). Both factors affect the interpolated values and their associated uncertainties. For many methods, a search ellipse is defined, along with the number of sectors the ellipse is divided into and the number of points taken from each sector to make a prediction.
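The following sketch shows one way such a search strategy can work, simplified here to a circular search radius divided into equal sectors with a cap on the number of neighbors per sector; a full implementation would use an ellipse with an orientation angle. The helper name and parameters are hypothetical and are not part of Geostatistical Analyst.

```python
import numpy as np
from scipy.spatial import cKDTree

def search_neighbors(px, py, x, y, radius, n_sectors=4, max_per_sector=5):
    """Return indices of the data points used to predict at (px, py)."""
    tree = cKDTree(np.column_stack([x, y]))
    candidates = tree.query_ball_point([px, py], r=radius)

    # Walk candidates from nearest to farthest, filling each sector up to its cap.
    selected = []  # list of (point index, sector index)
    for i in sorted(candidates, key=lambda i: np.hypot(x[i] - px, y[i] - py)):
        angle = np.arctan2(y[i] - py, x[i] - px) % (2 * np.pi)
        sector = int(angle // (2 * np.pi / n_sectors))
        if sum(1 for _, s in selected if s == sector) < max_per_sector:
            selected.append((i, sector))
    return [i for i, _ in selected]
```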
Once the model has been fully specified, it can be used together with the dataset to generate interpolated values for all unsampled locations within the area of interest. The output is typically a map showing the values of the variable being modeled. The effect of outliers can be studied at this stage, because they may change the model's parameter values and, with them, the interpolated map. Depending on the interpolation method, the same model can also be used to generate measures of uncertainty for the interpolated values. Not all models have this capability, so it is important to define the required measures of uncertainty at the outset. As with all modeling efforts, the output should be checked to ensure that the interpolated values and their associated measures of uncertainty are reasonable and match your expectations.
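As a sketch of how predictions and uncertainties can come out of the same model, the ordinary kriging example below solves the kriging system for a single location and returns both the prediction and the kriging variance. The spherical semivariogram parameters (nugget, partial sill, range) are illustrative defaults, not fitted values, and the code is plain NumPy rather than an ArcGIS API.

```python
import numpy as np

def spherical(h, nugget=0.0, psill=1.0, a=300.0):
    """Spherical semivariogram model gamma(h) with range a."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    g = np.where(h >= a, nugget + psill, g)
    return np.where(h == 0, 0.0, g)

def ordinary_kriging(px, py, x, y, z, **model):
    """Predict at (px, py) from sample points (x, y, z); return (prediction, kriging variance)."""
    n = len(x)
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    d0 = np.hypot(x - px, y - py)

    # Kriging system: semivariances plus the unbiasedness constraint (row/column of ones).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, **model)
    A[n, n] = 0.0
    b = np.append(spherical(d0, **model), 1.0)

    lam = np.linalg.solve(A, b)
    weights, mu = lam[:n], lam[n]
    prediction = weights @ z
    variance = weights @ spherical(d0, **model) + mu  # kriging variance at (px, py)
    return prediction, variance
```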
Once the model has been built and adjusted and its output checked, the results can be used for risk analysis and decision making.
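One simple form of risk analysis, under the assumption that the prediction errors are approximately Gaussian, is to convert a prediction and its kriging variance into the probability that the true value exceeds a decision threshold. The threshold and the prediction and variance values below are illustrative only.

```python
from scipy.stats import norm

prediction, variance = 42.0, 25.0   # illustrative values, e.g., from the kriging sketch above
threshold = 50.0                    # hypothetical regulatory or decision threshold

# Probability that the true value exceeds the threshold, assuming Gaussian errors.
prob_exceed = 1 - norm.cdf(threshold, loc=prediction, scale=variance ** 0.5)
print(f"P(value > {threshold}) = {prob_exceed:.2f}")
```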
ArcGIS Tutorial: geostatistical workflow