ArcGIS Tutorial: Generating signature files, classes, and cluster analysis

Source: Internet
Author: User

With the ArcGIS Spatial Analyst extension, you can create classifications by grouping raster cells into classes or clusters. A class usually corresponds to a known category, such as a forest, a residential area, or a water body, while a cluster is a grouping of cells based on the statistics of their attributes. A signature is a subset of cells that represents a class or cluster. The statistics for the signatures are stored in a signature file, which is used to classify all cells in the intersection of the input bands.

  What is a class?

A class corresponds to a meaningful grouping of locations. For example, forests, water bodies, and wheat-producing areas are all classes.

Each location can be characterized by a set, or vector, of values, with one value for each variable, or input band. Each location can be visualized as a point in a multidimensional attribute space whose axes correspond to the variables represented by the input bands. A grouping of points in this multidimensional attribute space is called a cluster. When a cluster corresponds to some meaningful object, it can also be treated as a class. If the attributes of two locations (their vectors of band values) are similar, the two locations belong to the same cluster.
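As a minimal sketch of the idea above, the snippet below turns two small band grids into per-cell vectors and compares them in attribute space. The function name `stack_bands` and the sample values are hypothetical and chosen only for illustration; they are not part of ArcGIS.

```python
import math

def stack_bands(bands):
    """Map each (row, col) to a tuple holding one value per band,
    i.e. one point in the multidimensional attribute space."""
    n_rows = len(bands[0])
    n_cols = len(bands[0][0])
    points = {}
    for r in range(n_rows):
        for c in range(n_cols):
            points[(r, c)] = tuple(band[r][c] for band in bands)
    return points

# Hypothetical 2x2 rasters for two input bands.
band1 = [[10, 12], [80, 82]]
band2 = [[20, 21], [90, 95]]
cell_vectors = stack_bands([band1, band2])

# Cells with similar band values lie close together in attribute space.
near = math.dist(cell_vectors[(0, 0)], cell_vectors[(0, 1)])
far = math.dist(cell_vectors[(0, 0)], cell_vectors[(1, 0)])
```

Here the two cells in the top row are much closer to each other than to the cells in the bottom row, so they would be expected to fall in the same cluster.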

If classes can be delimited or differentiated by their attribute values, the known classes also form clusters in attribute space. Locations that correspond to natural clusters in attribute space can be interpreted as naturally occurring classes of strata.

  Identifying classes for supervised classification

In a supervised classification, you know in advance which classes the study site is to be divided into, and you have sample locations in the study site that represent each class. For example, if you are creating a land-use map from satellite imagery, you might divide the map into the following classes: urban, water, forest, wilderness, and road. The goal is to assign each location in the study area to a known class. The more sample locations that can be identified as belonging to a class, and the more similar the cell values within the class, the better the resulting classification will be. The actual locations used to identify a known class are called training samples.

You can identify training samples on a polygon layer or a raster. When defining training samples, you can display an existing raster as a reference. Typically, a color composite of the first three layers of the raster is shown as a backdrop and used as a reference for identifying the areas to delineate when the training samples are created.
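To make the notion of a training sample concrete, here is a small sketch that gathers the band values of labeled cells, grouped by class. It assumes the training areas have already been rasterized into a label grid aligned with the bands, with `None` marking cells that are not part of any training sample. The function name `collect_training_samples` and the data are hypothetical.

```python
from collections import defaultdict

def collect_training_samples(bands, labels):
    """Group the band-value vectors of labeled cells by class name.
    Cells whose label is None are not training samples and are skipped."""
    samples = defaultdict(list)
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            if label is not None:
                samples[label].append(tuple(band[r][c] for band in bands))
    return dict(samples)

# Hypothetical two-band raster and a matching grid of training labels.
band1 = [[10, 12, 80], [11, 81, 82]]
band2 = [[20, 21, 90], [22, 91, 95]]
labels = [["water", None, "forest"], ["water", "forest", None]]

training = collect_training_samples([band1, band2], labels)
```

Each class then has its own list of band-value vectors, from which per-class statistics can be computed.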

  Creating clusters during unsupervised classification

The first step in unsupervised classification is to create clusters. From a statistical point of view, a cluster is a naturally occurring grouping in the data. The Iso Cluster tool requires the input raster bands, the number of classes, the name of the output signature file, the number of iterations, the minimum class size, and the sample interval at which the cells used to compute the clusters are extracted (the last three parameters are described below).
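The sample interval can be pictured with a short sketch. It assumes that an interval of n means one cell is taken from every n-by-n block of cells. This is an illustration of that assumption in plain Python, not the tool's actual sampling code, and `sample_cells` is a hypothetical helper.

```python
def sample_cells(grid, sample_interval):
    """Return every sample_interval-th cell along rows and columns,
    i.e. one cell per sample_interval x sample_interval block (assumed behavior)."""
    return [
        grid[r][c]
        for r in range(0, len(grid), sample_interval)
        for c in range(0, len(grid[0]), sample_interval)
    ]

# Hypothetical 4x4 band whose cell value encodes its position (row * 4 + col).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
sampled = sample_cells(grid, sample_interval=2)
```

A larger interval means fewer sample cells, so the clusters are computed faster but from less data.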

The tool returns a signature file containing multivariate statistics for the subset of cells identified for each cluster. The calculations determine which cell location belongs to which cluster, the mean value of each cluster, and the variance-covariance matrix. This information is stored in an ASCII signature file, which is required when classifying the remaining cells that were not sampled.

  Storing class or cluster statistics: signature files

A signature file is an ASCII file that stores multivariate statistics for each class or cluster of interest. The file includes the mean of each class or cluster, the number of cells in the class or cluster, and its variance-covariance matrix.

You can use any text editor to display the signature file.

For any class or cluster, the values on the diagonal of the variance-covariance matrix, running from the upper-left corner to the lower-right corner, are the variances of the variables corresponding to each input raster band (identified by the row/column intersection in the matrix). All other values in the matrix are covariances.
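The statistics described above can be sketched in a few lines. The example computes the cell count, the per-band means, and the variance-covariance matrix for one class from its sample cells. It assumes the sample covariance (dividing by n - 1); the function name `signature` and the returned dictionary layout are hypothetical and do not reproduce the actual signature file format.

```python
def signature(samples):
    """Compute count, per-band means, and the variance-covariance matrix
    for one class or cluster. Uses the sample covariance (n - 1), an assumption."""
    n = len(samples)
    n_bands = len(samples[0])
    means = [sum(s[b] for s in samples) / n for b in range(n_bands)]
    covariance = [
        [
            sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
            for j in range(n_bands)
        ]
        for i in range(n_bands)
    ]
    return {"count": n, "means": means, "covariance": covariance}

# Hypothetical two-band sample cells for a single class.
stats = signature([(10, 20), (12, 21), (14, 25)])
# stats["covariance"][0][0] is the variance of band 1,
# stats["covariance"][1][1] is the variance of band 2,
# and the off-diagonal entries are the covariance between the two bands.
```

The matrix is symmetric, so the entry for bands (1, 2) equals the entry for bands (2, 1).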

  How clusters are determined in unsupervised classification

An algorithm named Iso Cluster creates the clusters during unsupervised classification. The Iso prefix comes from the ISODATA clustering algorithm and stands for iterative self-organizing, a method of performing clustering. Clusters are calculated from a subset of the cells in the study area. All cluster calculations are performed on cell values in the multivariate attribute space, not on any spatial characteristic. That is, the means are calculated from the attribute values of the different input bands, and the variance and covariance values are calculated from the variation within each band and between pairs of bands.

The ISO clustering method, a K-means-style approach, is illustrated conceptually below using a two-band raster. The same method applies to any number of input bands, that is, in n-dimensional space.

    • Create an empty diagram with the range of values in the first band on the x-axis and the range of values in the second band on the y-axis.
    • Draw a 45-degree line and divide it into as many segments as the number of classes specified. The center point of each segment is the initial mean of a class.
    • Plot a sample cell on the diagram and determine the distance from that point to each mean center point on the 45-degree line. In attribute space, the distance is calculated with the Pythagorean theorem. Assign the sample point to the cluster represented by the nearest mean center point.
    • Plot the next sample point and repeat the previous step until all sample points are assigned.
    • Iterate the procedure. Before each new iteration, compute a new mean center point for each cluster from the values of the cells assigned to that cluster in the previous iteration, then repeat the assignment steps using the new mean center points.
    • Continue updating the means and reassigning the sample points until the user-defined number of iterations is reached, or until fewer than 2 percent of the cells move from one cluster to another in an iteration.
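The steps above can be sketched in plain Python. This is a simplified illustration, not the Iso Cluster tool's implementation: the initial means are placed along the diagonal of the data's bounding box (which is a 45-degree line only when the band ranges match), empty clusters simply keep their previous mean, and the names `iso_cluster`, `max_iterations`, and `change_threshold` are hypothetical.

```python
import math

def iso_cluster(samples, n_classes, max_iterations=20, change_threshold=0.02):
    """Migrating-means clustering of band-value vectors (conceptual sketch)."""
    n_bands = len(samples[0])
    lows = [min(s[b] for s in samples) for b in range(n_bands)]
    highs = [max(s[b] for s in samples) for b in range(n_bands)]
    # Initial means: midpoints of n_classes equal segments along the diagonal.
    means = [
        tuple(lo + (hi - lo) * (k + 0.5) / n_classes for lo, hi in zip(lows, highs))
        for k in range(n_classes)
    ]
    assignment = [None] * len(samples)
    for _ in range(max_iterations):
        # Assign each sample to the nearest mean (Euclidean distance).
        new_assignment = [
            min(range(n_classes), key=lambda k: math.dist(s, means[k]))
            for s in samples
        ]
        changed = sum(a != b for a, b in zip(assignment, new_assignment))
        assignment = new_assignment
        # Recompute each cluster mean from the samples now assigned to it.
        for k in range(n_classes):
            members = [s for s, a in zip(samples, assignment) if a == k]
            if members:
                means[k] = tuple(
                    sum(m[b] for m in members) / len(members) for b in range(n_bands)
                )
        # Stop early once fewer than 2 percent of the samples changed cluster.
        if changed / len(samples) < change_threshold:
            break
    return means, assignment

# Hypothetical two-band samples forming two obvious groups.
samples = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 9), (9, 10)]
means, assignment = iso_cluster(samples, n_classes=2)
```

With this input the first three samples end up in cluster 0 and the last three in cluster 1, and the loop stops on the second pass because no sample changes cluster.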

The clustering process is sensitive to the range of values within each band, because those ranges set the scale of the x- and y-axes on which the Euclidean distance between a mean point and a sample point is calculated. For the attributes of each band to be weighted about equally, in both supervised and unsupervised classification, the value ranges of the bands should be similar. When the value range of one band is small relative to the other bands, Euclidean distances in the multivariate space may become so small that the means of several clusters are equal to 0. If the mean of any cluster is 0, the final classification, and any other multivariate analysis tool that relies on the signature file, will fail. Ideally, all bands should be normalized to the same range of values.
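One simple way to bring bands to the same range is a linear min-max rescale, sketched below. This is an assumption about how the normalization could be done, not a method prescribed by the tutorial; `rescale_band` and the target range of 0 to 100 are hypothetical.

```python
def rescale_band(values, new_min=0.0, new_max=255.0):
    """Linearly rescale a band's values to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant band carries no range to stretch; map it to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

# Hypothetical bands with very different value ranges.
narrow_band = [2, 3, 4]
wide_band = [0, 1000, 2000]

narrow_scaled = rescale_band(narrow_band, 0.0, 100.0)
wide_scaled = rescale_band(wide_band, 0.0, 100.0)
```

After rescaling, both bands span 0 to 100, so neither dominates the Euclidean distance.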

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
