Working with NC data using Python

Source: Internet
Author: User

Objective

For the past two days I have been helping a friend process some NC data. I thought it would be a simple job, but it turned out to involve a lot of details and pitfalls. Neither "understanding is hard, doing is easy" nor "understanding is easy, doing is hard" fully captures the problem; the unity of knowing and doing, understanding both the theory and how to implement it, is more reliable. So I wrote up this modest study to document how to process NC data with Python.

First, NC Data Introduction

NC is short for NetCDF (the Network Common Data Form), a format for storing a series of arrays; it is that simple (see https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_introduction.html).

Because NC can store a series of arrays, it is often used to hold scientific observation data, especially long time series.

Imagine a scientist collecting experimental readings every minute and saving them. Without a format like this, a long experiment might require creating a whole series of CSV or TXT files, whereas with NC a single file is enough. Convenient, isn't it?

Even more convenient: if the experiment touches on geographic information such as meteorology, hydrology, or temperature, the results can also be stored in NC. A GeoTIFF can hold at most a few bands (where a band can be thought of as one signal, e.g. a meteorological or hydrological variable), while NC can store long-time-series observations of many such bands.
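Conceptually, each such variable is a three-dimensional cube: time × rows × columns (the gdalinfo listing below shows shapes like [696x130x120]). A minimal NumPy sketch of that layout, with synthetic values standing in for real observations:

```python
import numpy as np

# Synthetic stand-in for one NC variable: 696 hourly steps on a 130x120 grid.
# (Shape borrowed from the [696x130x120] T2 subdataset in the gdalinfo
#  listing; the values themselves are made up for illustration.)
t2 = np.random.default_rng(0).normal(loc=290.0, scale=5.0, size=(696, 130, 120))

snapshot = t2[0]        # the whole grid at the first time step -> shape (130, 120)
series = t2[:, 64, 59]  # one pixel through time                -> shape (696,)

print(snapshot.shape)  # (130, 120)
print(series.shape)    # (696,)
```

A pile of per-minute CSVs collapses into one indexable cube: slice along axis 0 for a map at one moment, along the other axes for a time series at one location.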

You can use GDAL to view the file's information:

gdalinfo name.nc

You can get the following information:

Driver: netCDF/Network Common Data Format
Files: test.nc
Size is 512, 512
Coordinate System is `'
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"test.nc":T2
  SUBDATASET_1_DESC=[696x130x120] T2 (32-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"test.nc":PSFC
  SUBDATASET_2_DESC=[696x130x120] PSFC (32-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"test.nc":Q2
  SUBDATASET_3_DESC=[696x130x120] Q2 (32-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"test.nc":U10
  SUBDATASET_4_DESC=[696x130x120] U10 (32-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"test.nc":V10
  SUBDATASET_5_DESC=[696x130x120] V10 (32-bit floating-point)
  SUBDATASET_6_NAME=NETCDF:"test.nc":RAINC
  SUBDATASET_6_DESC=[696x130x120] RAINC (32-bit floating-point)
  SUBDATASET_7_NAME=NETCDF:"test.nc":SWDOWN
  SUBDATASET_7_DESC=[696x130x120] SWDOWN (32-bit floating-point)
  SUBDATASET_8_NAME=NETCDF:"test.nc":GLW
  SUBDATASET_8_DESC=[696x130x120] GLW (32-bit floating-point)
  SUBDATASET_9_NAME=NETCDF:"test.nc":LAT
  SUBDATASET_9_DESC=[130x120] LAT (32-bit floating-point)
  SUBDATASET_10_NAME=NETCDF:"test.nc":LONG
  SUBDATASET_10_DESC=[130x120] LONG (32-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

Each subdataset represents one variable (meteorology, hydrology, and so on). To see the details of a particular subdataset, run:

gdalinfo NETCDF:name.nc:SUBDATASET_NAME

Here SUBDATASET_NAME is one of the names above (T2, PSFC, and so on); you then get information like the following:

Driver: netCDF/Network Common Data Format
Files: test.nc
Size is 120, 130
Coordinate System is `'
Metadata:
  LAT#description=LATITUDE, SOUTH IS NEGATIVE
  LAT#FieldType=104
  LAT#MemoryOrder=XY
  LAT#stagger=
  LAT#units=degree_north
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  130.0)
Upper Right (  120.0,    0.0)
Lower Right (  120.0,  130.0)
Center      (   60.0,   65.0)
Band 1 Block=120x1 Type=Float32, ColorInterp=Undefined
  NoData Value=9.96920996838686905e+36
  Unit Type: degree_north
  Metadata:
    description=LATITUDE, SOUTH IS NEGATIVE
    FieldType=104
    MemoryOrder=XY
    NETCDF_VARNAME=LAT
    stagger=
    units=degree_north

This particular subdataset has only one band; in general, each band records the grid at one point in time (or whatever the distinguishing dimension is), i.e. one array.
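If you want to script over all subdatasets instead of inspecting them one by one, the SUBDATASET_n_NAME lines printed by gdalinfo are easy to parse with Python's standard library. A small sketch, using a fragment of the listing above:

```python
import re

# A fragment of gdalinfo's output for the sample file.
gdalinfo_output = '''
SUBDATASET_1_NAME=NETCDF:"test.nc":T2
SUBDATASET_1_DESC=[696x130x120] T2 (32-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"test.nc":PSFC
SUBDATASET_2_DESC=[696x130x120] PSFC (32-bit floating-point)
'''

# Each NAME line carries a full NETCDF:"file":VARIABLE identifier that can
# be passed straight back to gdalinfo or gdalwarp, one subdataset at a time.
names = re.findall(r'SUBDATASET_\d+_NAME=(\S+)', gdalinfo_output)
print(names)  # ['NETCDF:"test.nc":T2', 'NETCDF:"test.nc":PSFC']
```

In practice you would capture the real gdalinfo output with subprocess instead of pasting it in; the parsing is the same.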

Having seen this, you should already realize that you can process NC data with GDAL directly, for example converting a subdataset to GeoTIFF with gdalwarp. I will not expand on that here; a look at the gdalwarp manual tells you how.

Once you understand the information above, it is basically clear how to process this data.

Second, Data Processing

Python is very widely used, so naturally its libraries of every kind are plentiful; in professional terms, it has a rich ecosystem.

2.1 NetCDF4

This library reads NC data directly into arrays (for details see https://github.com/Unidata/netcdf4-python). Reading works as follows:

import netCDF4

dataset = netCDF4.Dataset('name.nc')  # open the dataset

This reads the information of the whole NC file. If you need a particular subdataset, simply use dataset[SUBDATASET_NAME], which returns a three-dimensional array holding the data for the different time steps (or other distinguishing dimension).

We can run all sorts of operations on this array, such as the mean and the variance, which reminds me of a pile of tedious but interesting lab courses at university. And of course, if you process it with the NumPy framework, you get twice the result with half the effort. For example, averaging over a long time series:

import numpy as np

avg_arr = np.average(np_arr, axis=0)  # average over the time axis
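Put together, a self-contained sketch of such per-pixel statistics over the time axis (a small synthetic cube stands in here for the array read from the NC file):

```python
import numpy as np

# Synthetic (time, rows, cols) cube standing in for a subdataset.
cube = np.random.default_rng(1).random((10, 4, 5)).astype(np.float32)

mean_map = np.average(cube, axis=0)  # per-pixel mean over time     -> (4, 5)
var_map = np.var(cube, axis=0)       # per-pixel variance over time -> (4, 5)

print(mean_map.shape, var_map.shape)  # (4, 5) (4, 5)
```

Because axis=0 is the time axis, each output is a single grid of the same width and height as the input, ready to be written out as one band later.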

Readers with a GIS or remote-sensing background will spot a problem here: this library only handles the data itself and cannot perform location-aware operations, so the results cannot be turned directly into a map visualization. In fact, all data is interlinked: we can write the processed result to GeoTIFF this way, or we can simply work through a GeoTIFF-oriented workflow from the start.

2.2 Rasterio

Rasterio is an open-source spatial data processing framework from Mapbox. It is very powerful; I will not describe it in detail here, only how to handle our NC data with it.

The first option, of course, is to process with netCDF4 and then use this framework to write the GeoTIFF, but that is less elegant: using two frameworks is obviously too cumbersome. Instead, we use this framework from the reading step onward.

There is a trick to reading: just as gdalinfo inspects a subdataset, we can open the subdataset directly, as follows:

import rasterio as rio

with rio.open('NETCDF:name.nc:SUBDATASET_NAME') as src:
    print(src.meta)
    dim = int(src.meta['count'])
    data = src.read(range(1, dim + 1))

That is, pass NETCDF:name.nc:SUBDATASET_NAME to the open function; src.read(range(1, dim + 1)) then reads all bands (time steps) in that range at once. You can set the range yourself (note that band indexes start from 1, not 0), or of course read just a single band.
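One off-by-one worth remembering: rasterio numbers bands from 1, while the NumPy cube that read() returns is indexed from 0, so band b ends up at cube[b - 1]. A NumPy-only sketch of the mapping:

```python
import numpy as np

# Stand-in for the (bands, height, width) cube that src.read(range(1, 4))
# would return: three 2x2 bands.
cube = np.arange(12).reshape(3, 2, 2)

band = 3                   # rasterio's band numbering starts at 1 ...
band_arr = cube[band - 1]  # ... but the returned array is 0-indexed
print(band_arr.tolist())   # [[8, 9], [10, 11]]
```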

src.meta records the metadata of this subdataset, essentially the same information gdalinfo shows.

This lets us continue processing the data with frameworks such as NumPy and, more importantly, write the result back to GeoTIFF when we are done (to put it bluntly, to attach the spatial information).

Also very simple, as follows:

with rio.open(newfile, 'w', **out_meta) as dst:
    dst.write_band(1, res_arr)

Here newfile is the output path and res_arr is the array of computed results; note that its dimensions must remain unchanged (width × height). out_meta is the metadata describing the target file, which can be obtained from src.meta above with some simple adjustments:

out_meta = src.meta.copy()
out_meta.update({
    'driver': 'GTiff',
    'dtype': 'float32',
    'count': 1,
    'crs': '+proj=longlat +datum=WGS84 +no_defs',
    'transform': rasterio.transform.from_bounds(west, south, east, north, width, height),
})

Here crs is the spatial projection of the target data (a Proj4 string in this case), and transform describes the spatial extent of the target file; it can be computed from the latitude/longitude information and the image size.
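rasterio.transform.from_bounds itself is simple arithmetic: it builds an affine transform from the bounding box and the image size. A pure-Python sketch of that computation (the bounds below are made-up example values), so you can see what the six coefficients mean without installing anything:

```python
def transform_from_bounds(west, south, east, north, width, height):
    """Return the six affine coefficients (a, b, c, d, e, f) mapping a
    (col, row) pixel index to (lon, lat):
        lon = a * col + b * row + c
        lat = d * col + e * row + f
    e is negative because row 0 sits at the northern edge."""
    pixel_width = (east - west) / width
    pixel_height = (south - north) / height  # negative
    return (pixel_width, 0.0, west,
            0.0, pixel_height, north)

# Made-up bounding box for a 120x130 grid.
a, b, c, d, e, f = transform_from_bounds(100.0, 20.0, 112.0, 33.0, 120, 130)
print(a, e)  # per-pixel steps: 0.1 -0.1
print(c, f)  # origin at the upper-left corner: 100.0 33.0
```

This mirrors the coefficient order of rasterio's Affine objects; in real code you would simply call rasterio.transform.from_bounds as shown above.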

dst.write_band writes the data to the corresponding band; you can also write multiple bands, depending on the shape of your result. Band numbering here likewise starts from 1.

Third, Summary

This article briefly introduced the characteristics of NC data and how to process it with Python. Every goal can be reached in a number of ways; what matters is finding the road you like and that suits you. Then again, even if the road is not the one you wanted, you can still reach the goal, so the key is to find your own goal.
