Visual Framework Design: Data Types
Data Type
- Data Classification Method
- Design of Data Types-Measurement
- More
Introduction
The essence of data visualization is mapping data to graphics, and different data types map to different graphic attributes. This chapter describes how data is classified and how G2 models those classes. Graphic attributes are explained in the next chapter, on visual channels.
Data Type
The data type can be categorized in two ways:
* Natural data classification
* Whether the data is continuous
Natural Data Classification
By nature, data can be divided into four types:
* Nominal: plain categories with no inherent order, such as country names.
* Ordinal: ordered categories, such as alarm levels from low to high: yellow warning, orange warning, and red warning.
* Interval: numbers with meaningful intervals but no meaningful zero. For example, a temperature of 0 degrees does not mean there is no temperature.
* Ratio: numbers whose ratios are meaningful, so 0 must be meaningful.
Data continuity
Based on whether the data is continuous, it is divided into:
+ Categorical (qualitative) data, which is further divided into ordered and unordered categories.
+ Continuous (quantitative) data, whose values are continuous. Time is also a continuous data type.
First, take a look at the following data:
    [
      { "month": "January", "temperature": 7, "city": "tokyo" },
      { "month": "February", "temperature": 6.9, "city": "newYork" },
      { "month": "March", "temperature": 9.5, "city": "tokyo" },
      { "month": "April", "temperature": 14.5, "city": "tokyo" },
      { "month": "May", "temperature": 18.2, "city": "berlin" }
    ]
Here, month is the month, temperature is the temperature, and city is the city.
- In the data above, month and city are both discrete categories, but they differ: month is an ordered category, while city is an unordered category.
- temperature is a continuous number.
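The distinction between continuous and categorical fields can be sketched in a few lines. The helper `inferFieldKind` below is hypothetical and not part of G2. It assumes that a field whose values are all finite numbers is continuous and that anything else is categorical. Whether a category is ordered (like month) cannot be inferred this way and still has to be declared.

```javascript
// Hypothetical helper, not a G2 API: guess a field's kind from its values.
function inferFieldKind(data, field) {
  const values = data.map((row) => row[field]);
  // Assumption: only finite numbers count as continuous data.
  const allNumbers = values.every(
    (v) => typeof v === 'number' && Number.isFinite(v)
  );
  return allNumbers ? 'continuous' : 'categorical';
}

const sample = [
  { month: 'January', temperature: 7, city: 'tokyo' },
  { month: 'February', temperature: 6.9, city: 'newYork' },
  { month: 'March', temperature: 9.5, city: 'tokyo' },
];

console.log(inferFieldKind(sample, 'temperature')); // continuous
console.log(inferFieldKind(sample, 'month'));       // categorical
console.log(inferFieldKind(sample, 'city'));        // categorical
```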
How to design data types: scales
In G2, data types are divided based on whether they are continuous, and a different Scale is designed for each type. Each scale must contain the following information:
- Domain: the categories of a categorical scale, or the minimum and maximum values of a continuous scale.
- Range: the output interval that categorical and continuous data are mapped to. The default is 0-1.
- Ticks: the values displayed on a legend or axis. For a categorical scale, the ticks are the categories themselves. For a continuous scale, human-friendly tick values and spacing must be calculated, for example:
  - 1, 2, 3, 4, 5
  - 0, 5, 10, 15, 20
  - 0.001, 0.005, 0.010
  rather than:
  - 1.1, 2.1, 3.1, 4.1
  - 12, 22, 32, 42, 52
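To make the domain and range concrete, here is a minimal sketch of a continuous scale. `createLinearScale` is a hypothetical name and this is not G2's implementation. The `invert` method is an added convenience that the text above does not describe.

```javascript
// Hypothetical minimal scale, for illustration only (not G2's implementation).
function createLinearScale({ min, max, range = [0, 1] }) {
  const [r0, r1] = range;
  return {
    // Map a data value from the domain [min, max] into the range.
    scale(value) {
      if (max === min) return r0; // degenerate domain: avoid dividing by zero
      return r0 + ((value - min) / (max - min)) * (r1 - r0);
    },
    // Map a position in the range back to a data value (assumed extra).
    invert(position) {
      if (r1 === r0) return min;
      return min + ((position - r0) / (r1 - r0)) * (max - min);
    },
  };
}

const temperatureScale = createLinearScale({ min: 0, max: 20 });
console.log(temperatureScale.scale(5));    // 0.25
console.log(temperatureScale.invert(0.5)); // 10
```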
Supported scales
G2 provides the following scales:
- identity: a constant value, that is, a field of the data is a constant;
- linear: continuous numbers, [1, 2, 3, 4, 5];
- cat: categories, ['male', 'female'];
- time: continuous time;
- log: continuous non-linear log data, converts [,] to [,];
- pow: continuous non-linear pow data, converts [2, 4, 8, 16, 32] to [1, 2, 3, 4, 5];
- timeCat: non-continuous time. For example, stock trading dates, which exclude weekends and days when the market is closed.
Attribute and interface design
Common attributes of a scale:

| Attribute | Description |
| --- | --- |
| type | Scale type. |
| range | Output range of the scale conversion. The default is [0, 1]. |
| alias | Display name. Field names in most datasets are in English; an alias lets you define a localized name (for example, Chinese) for display in legends and tooltips. |
| ticks | The tick values shown on legends and axes. How ticks are calculated is described in detail later. |
| tickCount | The number of ticks. The default differs by scale type. |
| formatter | Function that formats field values for output. It affects how data is displayed on axes, legends, and tooltips. |
Linear
The base class for continuous data types. It has the following specific attributes:

| Attribute | Description |
| --- | --- |
| min | Minimum value of the domain. |
| max | Maximum value of the domain. |
| tickCount | The number of ticks to generate. For continuous scales the default is 5. |
| tickInterval | The distance between adjacent ticks on the axis, expressed as a difference in the original data. tickCount and tickInterval cannot be declared at the same time. |
| nice | Whether to adjust min and max to numbers that are easy for people to read. For example, with min: 3, max: 97, and nice: true, they are automatically adjusted to min: 0, max: 100. |
Cat
Specific attributes of a category scale:

| Attribute | Description |
| --- | --- |
| values | The category values of the current field. |
When creating a chart in G2, values is generally obtained from the data automatically, but it must be specified manually in the following two cases.
First, when the order of the categories needs to be specified. For example, the type field has three values, 'maximum', 'minimal', and 'moderate', and we want to control their order on the axis or legend:

    var data = [
      { a: 'a1', b: 'b1', type: 'minimal' },
      { a: 'a2', b: 'b2', type: 'maximum' },
      { a: 'a3', b: 'b3', type: 'moderate' }
    ];
    var defs = {
      type: { type: 'cat', values: ['minimal', 'moderate', 'maximum'] }
    };

If values is not declared for the scale, the default order follows the data: 'minimal', 'maximum', 'moderate'.
Second, when the categories in the data are represented by enumeration values, values must also be specified:

    var data = [
      { a: 'a1', b: 'b1', type: 0 },
      { a: 'a2', b: 'b2', type: 2 },
      { a: 'a3', b: 'b3', type: 1 }
    ];
    var defs = {
      type: { type: 'cat', values: ['minimal', 'moderate', 'maximum'] }
    };

In this case the 'cat' type must be specified explicitly, and each enumeration value corresponds to the element of values at that index.
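The two uses of values can be sketched with a small category scale. `createCatScale` is a hypothetical helper, not G2's implementation. It assumes that numeric inputs are enumeration indexes and that categories are spread evenly across the range.

```javascript
// Hypothetical category scale, for illustration only (not G2's implementation).
function createCatScale(values, range = [0, 1]) {
  const [r0, r1] = range;
  return {
    // Assumption: a number is an enumeration index; anything else is a category name.
    indexOf(value) {
      return typeof value === 'number' ? value : values.indexOf(value);
    },
    // Text to show on an axis or legend.
    label(value) {
      return values[this.indexOf(value)];
    },
    // Assumption: categories are placed evenly across the range, in `values` order.
    scale(value) {
      if (values.length === 1) return r0;
      return r0 + (this.indexOf(value) / (values.length - 1)) * (r1 - r0);
    },
  };
}

const typeScale = createCatScale(['minimal', 'moderate', 'maximum']);
console.log(typeScale.label(2));          // maximum
console.log(typeScale.scale('moderate')); // 0.5
console.log(typeScale.scale(0));          // 0
```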
Time
Time is a special kind of continuous value, so the time scale is defined as a subclass of linear. Besides all common attributes and the linear attributes, it has its own specific attribute:

| Attribute | Description |
| --- | --- |
| mask | Display format of the data. Default: 'yyyy-mm-dd'. |

Currently, two kinds of time values are supported:
- A numeric timestamp, for example 1436237115500 // new Date().getTime()
- A time string: '2017-03-01', '2017-03-01 12:01:40', '2017/05', '2017-03-01T16:00:00.000Z'
Placeholders used in the mask when formatting a date:
- y: year
- m: month
- d: day
- H: hour
- M: minute
- s: second
Log
A log scale maps data spanning a very large range onto a uniform range. It is a subclass of linear. Besides all common attributes and the linear attributes, it has its own specific attribute:

| Attribute | Description |
| --- | --- |
| base | The base of the logarithm. Default: 2. |

We recommend using a log scale in the following scenarios:
- In a scatter chart, the data is spread across several orders of magnitude. For example, if the data falls in 0-100, 10000-100000, and 1 million-100 million, a log scale is appropriate.
- In a heat map with unevenly distributed data, only the areas near very high data points show color. In this case, a log scale should be used to process the data.
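The effect of a log scale on widely spread data can be sketched as follows. `createLogScale` is a hypothetical helper, not G2's implementation. It assumes that min and max are both greater than 0, and the outputs are approximate because of floating-point rounding.

```javascript
// Hypothetical log scale, for illustration only (not G2's implementation).
// Assumption: min and max are both greater than 0.
function createLogScale({ min, max, base = 2 }) {
  const log = (x) => Math.log(x) / Math.log(base);
  const lo = log(min);
  const hi = log(max);
  return {
    // Normalize the logarithm of the value into [0, 1].
    scale(value) {
      return (log(value) - lo) / (hi - lo);
    },
  };
}

const wide = { min: 1, max: 1000000 };
const logScale = createLogScale({ ...wide, base: 10 });
const linearPosition = (v) => (v - wide.min) / (wide.max - wide.min);

// On a linear scale 1000 is squeezed next to the minimum;
// on a log scale it sits near the middle of the range.
console.log(linearPosition(1000)); // roughly 0.001
console.log(logScale.scale(1000)); // roughly 0.5
```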
Pow
The pow scale is also a subclass of linear. Besides all common attributes and the linear attributes, it has its own specific attribute:

| Attribute | Description |
| --- | --- |
| exponent | The exponent. Default: 2. |
TimeCat
timeCat data is date data whose dates are not consecutive, for example the dates on which stock trades took place. If the time type were used, holidays would have no data and line charts would show gaps. The timeCat scale therefore treats dates as categories, and the data is sorted by default.

| Attribute | Description |
| --- | --- |
| tickCount | The number of ticks. |
| mask | Display format of the data. |
Calculating scale ticks
When scale information is shown on a legend or axis, it is impossible to display every value, so a representative subset, the ticks, must be selected. Different scale types use different algorithms to calculate ticks. Three kinds of tick calculation are described here:
- Categorical scales: cat and timeCat
- Continuous scales: linear, log, and pow
- Time scales: time
Calculating ticks for categorical scales
Generally, a categorical scale does not need calculated ticks: all categories can be displayed directly on the legend or axis.
However, when there are too many category values and the categories are ordered, some of them can be omitted.
Attributes used in the calculation:
- values: the category values of the scale. If not specified, they are extracted directly from the data source.
- tickCount: the number of ticks to keep.
The calculation for categorical scales is simple:
- Pick tickCount ticks evenly from values.
- To ensure that the first and last elements of values are both in ticks, the step between picked values is (values.length - 1) / (tickCount - 1).
Evenly divisible case:

    var values = ["week 1", "week 2", "week 3", "week 4", "week 5", "week 6", "week 7", "week 8", "week 9"];
    var tickCount = 5;
    // values.length = 9, so the average step is (9 - 1) / (5 - 1) = 2
    var ticks = ["week 1", "week 3", "week 5", "week 7", "week 9"];

Not evenly divisible case:

    var values = ["week 1", "week 2", "week 3", "week 4", "week 5"];
    var tickCount = 4;
    // values.length = 5, so the average step is (5 - 1) / (4 - 1) = 4/3; the integer step is 1
    // "week 4" is dropped
    var ticks = ["week 1", "week 2", "week 3", "week 5"];
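Both cases above can be reproduced with one small function. `catTicks` is a hypothetical name and a sketch, not G2's implementation. It assumes the step is rounded down to an integer and that the last category is always appended, which matches the two examples.

```javascript
// Hypothetical sketch of categorical tick selection (not G2's implementation).
function catTicks(values, tickCount) {
  // Assumption: with too few categories (or a tickCount below 2), show everything.
  if (tickCount < 2 || tickCount >= values.length) return values.slice();
  // Integer step between picked categories.
  const step = Math.floor((values.length - 1) / (tickCount - 1));
  const ticks = [];
  for (let i = 0; i < tickCount - 1; i++) {
    ticks.push(values[i * step]);
  }
  // Always keep the last category so the axis ends on a labelled value.
  ticks.push(values[values.length - 1]);
  return ticks;
}

const nineWeeks = ['week 1', 'week 2', 'week 3', 'week 4', 'week 5',
  'week 6', 'week 7', 'week 8', 'week 9'];
console.log(catTicks(nineWeeks, 5));
// [ 'week 1', 'week 3', 'week 5', 'week 7', 'week 9' ]

const fiveWeeks = nineWeeks.slice(0, 5);
console.log(catTicks(fiveWeeks, 4));
// [ 'week 1', 'week 2', 'week 3', 'week 5' ]
```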
Calculating ticks for continuous scales
Calculating the ticks of a continuous scale must account for the following:
- Ticks must be nice numbers, so the range cannot simply be divided evenly.
For example, with min: 3, max: 97, tickCount: 6, an even split produces ticks: [3, 21.8, 40.6, 59.4, 78.2, 97], whereas the ideal result is ticks: [0, 20, 40, 60, 80, 100].
- The magnitude of the values is not known in advance. It may be 0, 100, 1000, 0.01, or 0.02.
The ticks of continuous data are calculated as follows:
- Specify a snap array [1, 2, 5, 10], which is used to find a human-friendly tickInterval.
- Calculate a raw tickInterval from min, max, and tickCount, then scale it into the range 1-10 and keep the conversion factor. For example, with min: 0, max: 9003, tickCount: 4, the raw tickInterval is 3001, which becomes 3.001 with a factor of 1000. Then find an approximate value in the snap array. Taking 3.001 as an example:
  - Snap upward (the final number of ticks is smaller than tickCount) to get 5.
  - Or snap downward (the final number of ticks is greater than tickCount) to get 2.
  - Or round to the nearest value (there may be more ticks or fewer), which gives 2.
- Multiply the snapped value by the factor 1000:
  - If the snapped value is 5, tickInterval = 1000 * 5 = 5000. Align min and max to multiples of tickInterval, which gives ticks: [0, 5000, 10000].
  - If the snapped value is 2, tickInterval = 1000 * 2 = 2000. Align min and max to multiples of tickInterval, which gives ticks: [0, 2000, 4000, 6000, 8000, 10000].
The pseudocode is as follows:

    var snapArray = [1, 2, 5, 10];
    var min = 0;
    var max = 9003;
    var tickCount = 4;
    var tickInterval = (max - min) / (tickCount - 1); // 3001
    // getFactor: if the value is >= 10, divide it by 10 until 1 <= value < 10;
    // if the value is < 1, multiply it by 10 until 1 <= value < 10
    var factor = getFactor(tickInterval); // 1000
    var snapValue = snap(snapArray, tickInterval / factor, 'ceil'); // snap upward: 5
    tickInterval = snapValue * factor; // 5000
    min = snapMultiple(tickInterval, min, 'floor'); // largest multiple of tickInterval not above min: 0
    max = snapMultiple(tickInterval, max, 'ceil'); // smallest multiple of tickInterval not below max: 10000
    var ticks = [];
    for (var i = min; i <= max; i += tickInterval) {
      ticks.push(i);
    }
    return ticks;
Notes
* snapArray can be adjusted. The more candidate values the array contains, the smaller the spacing between them, and the closer the resulting number of ticks is to the requested tickCount.
* min must be a multiple of tickInterval, and max must also be a multiple of tickInterval.
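The steps for continuous data can be put together in a runnable sketch. The function names (`getFactor`, `snap`, `linearTicks`) mirror the description but are hypothetical and not G2's implementation. The sketch assumes the snap array [1, 2, 5, 10] and supports only upward and downward snapping.

```javascript
// Hypothetical sketch of continuous tick calculation (not G2's implementation).
const SNAP_ARRAY = [1, 2, 5, 10];

// Power of ten that brings a positive value into [1, 10).
function getFactor(value) {
  if (!(value > 0) || !Number.isFinite(value)) return 1; // guard against endless loops
  let factor = 1;
  while (value / factor >= 10) factor *= 10;
  while (value / factor < 1) factor /= 10;
  return factor;
}

// Snap a value in [1, 10) to a candidate: 'ceil' rounds up, 'floor' rounds down.
function snap(value, mode) {
  if (mode === 'floor') {
    let result = SNAP_ARRAY[0];
    for (const candidate of SNAP_ARRAY) {
      if (candidate <= value) result = candidate;
    }
    return result;
  }
  for (const candidate of SNAP_ARRAY) {
    if (candidate >= value) return candidate;
  }
  return SNAP_ARRAY[SNAP_ARRAY.length - 1];
}

function linearTicks(min, max, tickCount, mode = 'ceil') {
  if (max <= min || tickCount < 2) return [min];
  const rawInterval = (max - min) / (tickCount - 1);
  const factor = getFactor(rawInterval);
  const tickInterval = snap(rawInterval / factor, mode) * factor;
  // Align both ends to multiples of tickInterval.
  const niceMin = Math.floor(min / tickInterval) * tickInterval;
  const niceMax = Math.ceil(max / tickInterval) * tickInterval;
  // Count steps instead of accumulating, to limit floating-point drift.
  const steps = Math.round((niceMax - niceMin) / tickInterval);
  const ticks = [];
  for (let i = 0; i <= steps; i++) {
    ticks.push(niceMin + i * tickInterval);
  }
  return ticks;
}

console.log(linearTicks(0, 9003, 4));          // [ 0, 5000, 10000 ]
console.log(linearTicks(0, 9003, 4, 'floor')); // [ 0, 2000, 4000, 6000, 8000, 10000 ]
console.log(linearTicks(3, 97, 6));            // [ 0, 20, 40, 60, 80, 100 ]
```

For fractional intervals the results can carry ordinary floating-point error, so a real implementation would round the ticks before display.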
Calculating ticks for time scales
Time data is continuous, but the tick calculation for continuous scales does not suit it, because:
- Timestamp values are large and include milliseconds. A value that looks nice as a number is not necessarily nice as a time. For example, 1466677570000 is '2016-06-23 18:26:10' in UTC+8.
- For a dataset whose date interval is a month or a year or longer, a fixed tickInterval cannot be used, because months and years are not all the same length.
Therefore, the time scale needs its own algorithm, which works as follows:
- Calculate tickInterval based on min, max, and tickCount.
- Calculate the ratio of tickInterval to one year: yfactor = tickInterval / yms, where yms is the number of milliseconds in a year.
- If yfactor > 0.51, that is, the interval is longer than half a year, take the years of min and max and calculate ticks by year.
For example, min:, max:, tickCount = 6, ticks = [2001-01-01, 2004-01-01, 2007-01-01, 2010-01-01, 201, 2016-01-01]
- If 0.0834 < yfactor < 0.51, that is, the interval is longer than one month, ticks are calculated by month.
For example, min:, max:, tickCount = 5, ticks = [,]
- If the interval is longer than one day, ticks are calculated in multiples of days. If it is longer than one hour, they are calculated in multiples of hours, and so on down through minutes, seconds, and milliseconds.
Note:
- tickCount does not determine the exact number of ticks.
- Make sure that the first calculated tick is not greater than min and the last is not less than max.
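The unit selection described above can be sketched as follows. `pickTimeUnit` is a hypothetical helper, not G2's implementation. It assumes a 365-day year for yms and reuses the thresholds 0.51 and 0.0834 from the text. Generating the actual tick dates for each unit is left out.

```javascript
// Hypothetical sketch of choosing the tick unit for a time scale
// (not G2's implementation).
const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;
const YEAR = 365 * DAY; // assumption: yms is based on a 365-day year

function pickTimeUnit(min, max, tickCount) {
  const interval = (max - min) / (tickCount - 1);
  const yfactor = interval / YEAR;
  if (yfactor > 0.51) return 'year';    // longer than about half a year
  if (yfactor > 0.0834) return 'month'; // longer than about one month
  if (interval > DAY) return 'day';
  if (interval > HOUR) return 'hour';
  if (interval > MINUTE) return 'minute';
  if (interval > SECOND) return 'second';
  return 'millisecond';
}

// 15 years split into 5 intervals: about 3 years per tick.
console.log(pickTimeUnit(Date.UTC(2001, 0, 1), Date.UTC(2016, 0, 1), 6)); // year
// 11 months split into 4 intervals: about 83 days per tick.
console.log(pickTimeUnit(Date.UTC(2017, 0, 1), Date.UTC(2017, 11, 1), 5)); // month
// 10 days split into 5 intervals: 2 days per tick.
console.log(pickTimeUnit(Date.UTC(2017, 0, 1), Date.UTC(2017, 0, 11), 6)); // day
```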
More
This chapter described how data is classified, how G2 models those classes as scales, and how ticks are calculated. The legends and the text displayed on axes are all determined by the scales described in this chapter. The next chapter describes visual channels and how they relate to data classification.
G2 site: https://g2.alipay.com/