13.4.2.2 Formatting data from the World Bank
The readValues function reads values from an XML document; its last argument is a parsing function that converts each data point to a value of the appropriate type. The array we downloaded contains three data sets with areas in square kilometers and three data sets with forest coverage. Listing 13.16 shows how to convert the raw documents into a data structure from which the important information can be extracted easily.
Listing 13.16 Converting raw data to a typed structure (F#)
let areas =
    Seq.concat (data.[0..2])                            // [1]
    |> readValues (fun a -> float (a) * 1.0<km^2>)      // [2]
    |> Map.ofSeq                                        // [3]

let forests =
    Seq.concat (data.[3..5])
    |> readValues (fun a -> float (a) * 1.0<percent>)
    |> Map.ofSeq
Before the pipeline starts, the pages holding the data for the first indicator are concatenated [1]. Each value is then converted from a string to a number in square kilometers [2], and a map is built from the result [3]. The second command, which handles forest coverage, works the same way.
The main part of the data processing is written as a pipeline, but it also uses one feature that we haven't covered yet to take the first three elements from the data set. This feature is called slicing: the syntax data.[0..2] yields the array entries at indices 0 through 2 [1]. The returned sequences are joined with Seq.concat to produce a single sequence containing the data for all years. The next step in the pipeline reads the values and converts them to the appropriate type with a unit of measure [2]. The conversion itself is the easy part: it is just a simple lambda expression. Note that the World Bank uses a point as the decimal separator, so numbers look like 1.0. The built-in float function always uses a fixed (invariant) culture, so it parses these strings correctly on any system, whatever the regional settings are.
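The slicing, concatenation, and unit conversion steps can be tried in isolation. The following is a minimal sketch, not the listing's actual data: the pages array and its string values are hypothetical stand-ins for the downloaded pages, and the km measure is declared here only so that the snippet is self-contained.

```fsharp
// Unit of measure declared locally so the sketch is self-contained.
[<Measure>] type km

// Hypothetical stand-in for the downloaded data: an array of pages,
// each page holding raw values as strings.
let pages = [| [ "10.5"; "20.0" ]; [ "30.25" ]; [ "40.0" ]; [ "1.5" ] |]

// Slicing: takes the entries at indices 0, 1 and 2 (both bounds inclusive).
let firstThree = pages.[0..2]

// Seq.concat flattens the pages into a single sequence of values.
let raw = Seq.concat firstThree |> List.ofSeq

// float parses with the invariant culture, so "." is always the decimal
// separator; multiplying by 1.0<km^2> attaches the unit of measure.
let typed = raw |> List.map (fun a -> float a * 1.0<km^2>)
```

Here firstThree holds three pages, raw holds the four strings from those pages in order, and typed has type float<km^2> list. The fourth page is left out because the upper bound of the slice is index 2.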
We use the Map.ofSeq function to build an F# map from the data [3]. Its argument is a sequence of tuples in which the first element is the key and the second element is the value. In listing 13.16, the key has type int * string and holds the year and the region name. In the first case, the value is of type float<km^2>.
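To see how a map with a tuple key behaves, here is a small sketch under stated assumptions: the region names and area values are invented for illustration, and the km measure is declared locally. Only Map.ofSeq and Map.tryFind from the F# core library are used.

```fsharp
// Unit of measure declared locally so the sketch is self-contained.
[<Measure>] type km

// Hypothetical sample data in the shape Map.ofSeq expects:
// the key (year, region) comes first, the value second.
let sample =
    [ ((1990, "Region A"), 100.0<km^2>)
      ((2000, "Region A"), 110.0<km^2>)
      ((1990, "Region B"), 250.0<km^2>) ]

// Builds a Map<int * string, float<km^2>>.
let areasByYearAndRegion = Map.ofSeq sample

// Lookup with a tuple key; Map.tryFind returns an option,
// so a missing key yields None instead of raising an exception.
let found = Map.tryFind (2000, "Region A") areasByYearAndRegion
let missing = Map.tryFind (2005, "Region A") areasByYearAndRegion
```

Using Map.tryFind rather than direct indexing is a design choice for data that may have gaps, such as a region with no value for a given year.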