Geostatistical Analysis Notes (I): Exploring the Data

Source: Internet
Author: User

Before performing geostatistical analysis, it is essential to browse, get familiar with, and examine your data. Plotting and examining the data is a necessary stage of geostatistical analysis, and the prior knowledge gained from it guides the follow-up work.


Stage 1 Plotting the data

Drawing the data with the layer rendering schemes in ArcMap gives a first impression of it.

For example, single-symbol rendering shows how densely the sample points are distributed, and classified rendering shows where the high and low values of the sample points occur.


Stage 2 Examining the data

After plotting the data, use the Exploratory Spatial Data Analysis (ESDA) tools to carry out the second stage of data exploration. These tools examine the data more quantitatively than plotting does, helping us better understand the phenomenon under study and make more informed decisions about how to build the interpolation model.

ESDA tools include:


Ⅰ Does the data follow a normal distribution?

Histogram

The histogram displays the frequency distribution of the dataset of interest and calculates summary statistics. How should the graph and the statistics be interpreted?

    • If the data is normally distributed, the mean is close to the median, the skewness is close to 0, and the kurtosis is close to 3.
      • The mean is the arithmetic average of the data and measures the center of the distribution. The median corresponds to a cumulative proportion of 0.5: with the data sorted in ascending order, 50% of the values lie below the median and 50% lie above it. The median is another measure of the center of the distribution. The first and third quartiles correspond to cumulative proportions of 0.25 and 0.75, respectively: with the data sorted in ascending order, 25% of the values lie below the first quartile and 25% lie above the third quartile. Quartiles are special cases of quantiles.
      • The skewness coefficient measures the symmetry of the distribution. For a symmetric distribution the skewness is zero. A distribution with a long right tail of large values is positively skewed, and one with a long left tail of small values is negatively skewed. For a positively skewed distribution the mean is greater than the median; for a negatively skewed distribution the mean is less than the median.
      • Kurtosis depends on the size of the tails of the distribution and indicates how likely the distribution is to produce outliers. The kurtosis of a normal distribution is 3. A distribution with heavier tails is called leptokurtic and has kurtosis greater than 3. A distribution with thinner tails is called platykurtic and has kurtosis less than 3.
    • The variance of the data is the average squared deviation of the values from the mean, and it is usually sensitive to unusually high or low values. The standard deviation is the square root of the variance and describes how widely the data is spread around the mean. The smaller the variance and standard deviation, the more tightly the measurements cluster around the mean.
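As a rough illustration of the statistics listed above, the following Python sketch computes them with the standard library only. The function name `summary_stats` and the sample values are made up for this example, and the moments are population moments (divided by n), which is an assumption; a given software package may apply a different sample correction.

```python
import math
import statistics

def summary_stats(values):
    """Return the histogram summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (divide by n): with this convention a
    # normal sample gives skewness near 0 and kurtosis near 3.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Three cut points: first quartile, median, third quartile.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "variance": m2,
        "std_dev": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical sample: one large value stretches the right tail.
sample = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 20.0]
stats = summary_stats(sample)
print(stats["mean"] > stats["median"], stats["skewness"] > 0)
```

For this positively skewed sample the mean exceeds the median and the skewness is positive, which matches the interpretation given in the list.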


Normal QQ plot

The points on a normal QQ plot indicate how close the univariate distribution of the dataset is to normal. If the data is normally distributed, the points fall on the 45-degree reference line. If it is not, the points deviate from the reference line.
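The points of such a plot can be sketched in a few lines of Python. This is only an illustration: the function name `normal_qq_points` is hypothetical, the plotting position (i - 0.5)/n is one common convention assumed here, and the values are standardized so that normal data lines up with the 45-degree line y = x.

```python
import random
from statistics import NormalDist, fmean, pstdev

def normal_qq_points(values):
    """Return (theoretical quantile, standardized sample value) pairs."""
    n = len(values)
    mean, sd = fmean(values), pstdev(values)
    points = []
    for i, v in enumerate(sorted(values), start=1):
        p = (i - 0.5) / n  # assumed plotting position, always inside (0, 1)
        points.append((NormalDist().inv_cdf(p), (v - mean) / sd))
    return points

rng = random.Random(0)  # fixed seed so the illustration is repeatable
sample = [rng.gauss(50.0, 10.0) for _ in range(500)]
worst = max(abs(y - x) for x, y in normal_qq_points(sample))
print(f"largest departure from the 45-degree line: {worst:.3f}")
```

A small largest departure suggests the sample is close to normal; strongly skewed data would bend away from the line, mostly in the tails.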


General QQ plot

The general QQ plot is used to assess how similar the distributions of two datasets are. It is built in much the same way as the normal QQ plot, except that the second distribution does not have to be normal: any dataset can be used. If the two datasets have the same distribution, the points in the general QQ plot fall on a 45-degree line.
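A minimal sketch of the idea, assuming the two datasets are compared at the same set of quantile levels (the function name `general_qq_points` and the choice of 20 quantile groups are illustrative, not taken from the tool):

```python
from statistics import quantiles

def general_qq_points(a, b, n_quantiles=20):
    """Pair the matching quantiles of two datasets (sizes may differ)."""
    qa = quantiles(a, n=n_quantiles, method="inclusive")
    qb = quantiles(b, n=n_quantiles, method="inclusive")
    return list(zip(qa, qb))

# Hypothetical data: the second dataset is the first one shifted by 5.
a = [float(v) for v in range(1, 101)]
shifted = [v + 5.0 for v in a]

# Identical distributions: every point lies on the line y = x.
print(all(abs(x - y) < 1e-9 for x, y in general_qq_points(a, a)))
# Shifted distribution: points lie on a line parallel to y = x.
print(all(abs((y - x) - 5.0) < 1e-9 for x, y in general_qq_points(a, shifted)))
```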



About data transformations

Some interpolation methods in Geostatistical Analyst require the data to be normally distributed. If the data is skewed (asymmetrically distributed), you may need to transform it to bring it closer to a normal distribution.

Box-Cox transformation (also called power transformation)
If the counts are small in one part of the study area, the variability in that part is lower than in regions where the counts are larger. In this case, a square-root transformation helps make the variance more constant across the whole study area, and it often brings the data closer to a normal distribution as well. The square-root transformation is the special case of the Box-Cox transformation with λ = 1/2.

Logarithmic transformation
The logarithmic transformation is the special case of the Box-Cox transformation with λ = 0. It is typically used for positively skewed data, in which a few values are very large. If those large values are located within the study area, the logarithmic transformation helps make the variance more constant and brings the data closer to a normal distribution.
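The relationship between these transformations can be sketched with the usual textbook form of the Box-Cox transformation, (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0. The function name `box_cox` is illustrative, and the restriction to strictly positive values is an assumption of this sketch.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or ln(y) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical positively skewed data.
data = [1.0, 4.0, 9.0, 100.0]
print(box_cox(data, 0.5))  # square-root case: equals 2 * (sqrt(y) - 1)
print(box_cox(data, 0))    # logarithmic case
```

With λ = 1/2 the result is the square root up to a linear rescaling, which does not change the shape of the distribution.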

(Figures not reproduced: a histogram of an example dataset, and a comparison of the data before and after transformation.)

Arcsine transformation
The arcsine transformation can be used for data that represents proportions or percentages. For proportion data the variance is usually smallest near 0 and 1 and largest near 0.5. The arcsine transformation helps make the variance more constant across the whole study area, and it often brings the data closer to a normal distribution.
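As an illustration, the sketch below uses the classical variance-stabilizing form arcsin(√p) for proportions. This exact form is an assumption: the text does not say which formula Geostatistical Analyst applies, and the function name `arcsine_sqrt` is made up for the example.

```python
import math

def arcsine_sqrt(proportions):
    """Apply arcsin(sqrt(p)) to proportions in the range [0, 1]."""
    if any(p < 0 or p > 1 for p in proportions):
        raise ValueError("proportions must lie between 0 and 1")
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical proportions; percentages would be divided by 100 first.
print(arcsine_sqrt([0.0, 0.05, 0.5, 0.95, 1.0]))
```

The transform stretches the scale near 0 and 1, where proportion data varies least, relative to the middle of the range.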


Ⅱ Are there outliers?

A global outlier is a measured sample point whose value is very high or very low relative to all the values in the dataset.
A local outlier is a measured sample point whose value lies within the normal range of the entire dataset but is unusually high or low compared with the points around it.

If an outlier is a real anomaly in the phenomenon, it may be the most important point for studying and understanding that phenomenon. If the outlier is caused by an error during data entry, it should be corrected or removed before the surface is created.

Histogram

If the histogram shows an isolated bar at its far left (minimum) or far right (maximum), the points represented by that bar may be outliers. The more isolated the bar is from the main group of bars, the more likely the points are outliers.
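The visual check can be approximated in code. The sketch below is a simple heuristic written for this note, not the tool's method: it bins the values into equal-width bars and reports whether the leftmost or rightmost occupied bar is separated from the next one by more than `min_gap` bar widths. The function names, the bin count and the gap threshold are all assumptions.

```python
def histogram_counts(values, n_bins=10):
    """Count values in n_bins equal-width bins between the min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts

def isolated_end_bars(counts, min_gap=2):
    """Flag an end bar that sits more than min_gap bins from its neighbor."""
    occupied = [i for i, c in enumerate(counts) if c > 0]
    result = {"left": False, "right": False}
    if len(occupied) >= 2:
        result["left"] = occupied[1] - occupied[0] > min_gap
        result["right"] = occupied[-1] - occupied[-2] > min_gap
    return result

# Hypothetical measurements with one suspiciously large value.
values = [10.0, 11.0, 12.0, 12.0, 13.0, 13.0, 14.0, 15.0, 60.0]
counts = histogram_counts(values)
print(counts, isolated_end_bars(counts))
```

A flagged bar is only a candidate: as noted above, the point may be a genuine anomaly rather than an error.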


Voronoi map

The Voronoi map is a map of the Thiessen polygons formed around the sample points.

When viewing a Voronoi map, check whether any polygon's symbol color differs markedly from the colors of the polygons around it.

For example, a red polygon surrounded by polygons of other colors has a value markedly different from its neighbors.


Semivariogram/Covariance Cloud

The Semivariogram/Covariance Cloud tool can be used to examine the local characteristics of spatial autocorrelation in a dataset and to look for local outliers.

Each point in the cloud represents a pair of points in the dataset: the x-axis shows the distance between the two locations, and the y-axis shows the squared difference between their values. A point in the semivariogram cloud therefore stands for a pair of locations, not a single location on the map, so the number of points in the cloud grows rapidly as the number of points in the dataset increases. If the dataset has n points, the Semivariogram/Covariance Cloud displays n(n-1)/2 points. For this reason, it is not recommended for datasets with more than a few thousand points. If your dataset is larger than that, use the Subset Features tool to select points at random and then use the subset in the Semivariogram/Covariance Cloud.
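The construction of the cloud can be sketched as follows. The function name `semivariogram_cloud` and the sample coordinates are invented for illustration; the y value is the plain squared difference, as described in the text (the semivariance convention would halve it).

```python
import math
from itertools import combinations

def semivariogram_cloud(points):
    """points: list of (x, y, z). Return one (distance, squared diff) per pair."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        cloud.append((distance, (z1 - z2) ** 2))
    return cloud

# Hypothetical sample points: (x, y, measured value).
points = [(0.0, 0.0, 1.0), (3.0, 4.0, 3.0), (6.0, 8.0, 2.0), (0.0, 5.0, 7.0)]
cloud = semivariogram_cloud(points)
n = len(points)
print(len(cloud) == n * (n - 1) // 2)  # one cloud point per pair of locations
```

Because the pair count grows with the square of n, 5,000 sample points already produce about 12.5 million cloud points, which is why subsetting is suggested.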

The Semivariogram/Covariance Cloud tool is particularly useful for detecting local outliers. They show up as pairs of points that are close together (low values on the x-axis) but high on the y-axis, meaning the values of the two points in the pair differ greatly. This is the opposite of what is expected, namely that points close to each other have similar values.
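In the tool this check is done interactively by selecting points in the cloud. As a rough, non-authoritative stand-in, the heuristic below counts how often each sample point takes part in a pair that is close in distance but far apart in value; a point that appears in many such pairs is a local-outlier candidate. The function name and both thresholds are assumptions chosen for the example.

```python
import math
from collections import Counter
from itertools import combinations

def suspicious_pair_counts(points, max_distance, min_sq_diff):
    """Count, per point index, the close pairs with a large squared difference."""
    counts = Counter()
    for (i, (x1, y1, z1)), (j, (x2, y2, z2)) in combinations(enumerate(points), 2):
        close = math.hypot(x2 - x1, y2 - y1) <= max_distance
        different = (z1 - z2) ** 2 >= min_sq_diff
        if close and different:
            counts[i] += 1
            counts[j] += 1
    return counts

# Hypothetical 3 x 3 grid with value 10 everywhere except the center point.
points = [(float(x), float(y), 10.0) for x in range(3) for y in range(3)]
points[4] = (1.0, 1.0, 30.0)  # index 4 is the grid center

counts = suspicious_pair_counts(points, max_distance=1.5, min_sq_diff=100.0)
print(counts.most_common(1))
```

The center point takes part in every suspicious pair, while each neighbor appears in only one, so the center stands out as the candidate.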


Ⅲ Is there a trend?

Trend Analysis

The Trend Analysis tool provides a three-dimensional perspective of the data. The locations of the sample points are plotted on the x,y plane, and the z value represents the attribute of interest. The tool projects the points as scatter plots onto the x,z and y,z planes and fits a polynomial curve to each projection.

Look at the thick lines on the vertical walls of the graph. These lines represent trends. One line shows the trend along the x-axis (usually the east-west, or longitudinal, trend) and the other shows the trend along the y-axis (usually the north-south, or latitudinal, trend). If the curve through the projected points is flat, there is no trend. If the polynomial curve has a definite pattern, such as a steady slope or a U shape, there is a trend in the data.

It is also useful to change the order of the polynomial when checking for trends. To look for trends in directions other than the standard N-S and E-W, rotate the trend axes.
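The projection-and-fit idea can be sketched in plain Python. Everything here is illustrative: `polyfit` is a small least-squares fit via the normal equations (adequate for low orders only), and `axis_coordinate` is a hypothetical helper that gives a point's coordinate along a trend axis rotated by a chosen angle.

```python
import math

def axis_coordinate(x, y, angle_deg=0.0):
    """Coordinate of (x, y) along an axis rotated counter-clockwise by angle_deg."""
    t = math.radians(angle_deg)
    return x * math.cos(t) + y * math.sin(t)

def polyfit(xs, zs, order):
    """Least-squares coefficients [c0, c1, ...] of z = c0 + c1*x + c2*x**2 ..."""
    m = order + 1
    # Augmented normal equations: sum(x**(i+j)) * c = sum(z * x**i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(z * x ** i for x, z in zip(xs, zs))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

# Hypothetical surface that rises from west to east only: z = 2 + 3x.
points = [(float(x), float(y), 2.0 + 3.0 * x) for x in range(5) for y in range(5)]
zs = [z for _, _, z in points]
along_x = [axis_coordinate(x, y, 0.0) for x, y, _ in points]
along_y = [axis_coordinate(x, y, 90.0) for x, y, _ in points]
print(polyfit(along_x, zs, 1))  # sloped line: a trend along x
print(polyfit(along_y, zs, 1))  # nearly flat line: no trend along y
```

Raising `order` to 2 fits a curve that can reveal a U-shaped trend, and other values of `angle_deg` test directions between the two standard axes.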


Ⅳ Is there spatial autocorrelation?

We can explore the spatial autocorrelation of the data by examining pairs of sampled values at different locations, again using the Semivariogram/Covariance Cloud ESDA tool described earlier.


Semivariogram/Covariance Cloud

If there is spatial dependence, pairs of points that are close together (at the far left of the x-axis) should differ little (low values on the y-axis). As the distance between the points grows (moving right along the x-axis), the squared difference should generally grow too (moving up the y-axis). Usually the squared difference levels off beyond a certain distance. Pairs of locations farther apart than this distance are considered uncorrelated.

If the pairs in the semivariogram cloud form a horizontal line, there may be no spatial autocorrelation in the data, and interpolating the data would be meaningless.
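One way to read the cloud numerically is to average the squared differences within distance bins and see whether the averages rise with distance or stay flat. The sketch below does this; the function name `binned_squared_differences`, the bin width and the transect data are assumptions made for the example.

```python
import math
from itertools import combinations

def binned_squared_differences(points, lag_width, n_lags):
    """Average squared difference of pairs, grouped by distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) / lag_width)
        if k < n_lags:  # pairs beyond the last bin are ignored
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    # None marks a bin that contains no pairs.
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical transect where the value changes smoothly with position.
smooth = [(float(i), 0.0, float(i)) for i in range(10)]
print(binned_squared_differences(smooth, lag_width=1.0, n_lags=4))

# Hypothetical transect with the same value everywhere: a flat result.
constant = [(float(i), 0.0, 5.0) for i in range(10)]
print(binned_squared_differences(constant, lag_width=1.0, n_lags=4))
```

Averages that increase with the bin index point to spatial dependence, while averages that stay level across bins correspond to the horizontal pattern described above.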

A basic assumption of geostatistical methods is that the squared difference should be similar for any two pairs of locations separated by the same distance and direction. This property is called stationarity. If spatial autocorrelation depends only on the distance between two locations, it is called isotropic. If values are more alike in some directions than in others, so that the semivariogram and covariance show a directional effect, it is called anisotropy.


Crosscovariance Cloud

The Crosscovariance Cloud tool can be used to study the cross-correlation between two datasets. It shows the empirical cross-covariance for all pairs of locations between the two datasets, plotted as a function of the distance between the two locations. Like the Semivariogram/Covariance Cloud above, it also provides a covariance surface with search-direction capability.
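A minimal sketch of such a cloud follows, assuming the empirical cross-covariance of a pair is the product of each value's deviation from its own dataset mean. That definition is a common one but is not spelled out in the text, and the function name and sample data are invented.

```python
import math
from statistics import fmean

def crosscovariance_cloud(a, b):
    """a, b: lists of (x, y, z). Return (distance, cross-covariance) per pair."""
    mean_a = fmean([z for _, _, z in a])
    mean_b = fmean([z for _, _, z in b])
    return [
        (math.hypot(xb - xa, yb - ya), (za - mean_a) * (zb - mean_b))
        for xa, ya, za in a
        for xb, yb, zb in b
    ]

# Hypothetical datasets measured at different locations.
dataset_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
dataset_b = [(0.0, 1.0, 10.0), (2.0, 1.0, 30.0)]
cloud = crosscovariance_cloud(dataset_a, dataset_b)
print(len(cloud))  # one entry for every pairing of a point in a with a point in b
```

Positive values at short distances suggest that high values in one dataset tend to occur near high values in the other.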


With a first impression of the data and a closer examination using the ESDA tools, we now have some prior knowledge of the data under study. The next step is to choose an interpolation method and create the surface, which the next article covers.
