01 What is data visualization
Data visualization aims to convey and communicate information clearly and effectively through graphical means. In other words, visualization exists to help us transmit information better.
02 The general process of data visualization
First, analyze the data you already have, draw your conclusions, and clarify the message and topic you want to express (that is, what question the chart should answer). Then, with that purpose in mind, select a chart that serves your goal from the chart types you already know or from a chart library. Finally, make the chart, polish it, and check it until the final version is complete.
A common mistake here is to decide on the visual effect first and only then look for data to fit it. This often leads to the complaint: "The existing data can't produce the visual effect I envisaged, or I need to obtain more data before I can make the ideal chart."
03 Common data types
For better visualization, we divide the data into four categories: categorical data, time series data, spatial data, and multivariate data.
1. Categorical data
Categorical data is data that indicates which category something belongs to. For example, users can be divided by device into two types: iPhone users and Android users; payment methods can be divided into three types: Alipay, WeChat, and cash. Data obtained by this kind of classification is called categorical data.
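As a small illustration, categorical data is usually summarized by counting how many records fall into each category before it is charted. The sketch below uses a made-up list of device labels; the data and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical sample: one device label per user (illustrative data only).
devices = ["iPhone", "Android", "Android", "iPhone", "Android"]

# Counting records per category is the usual first step before charting.
device_counts = Counter(devices)

# Print the categories from most to least frequent.
for category, count in device_counts.most_common():
    print(f"{category}: {count}")
```

The resulting counts are what a bar chart of device types would display.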
2. Time series data
Time series data refers to a series of values of the same indicator recorded in chronological order, such as the number of new users each month or a company's annual GMV over the past ten years. Any indicator whose values are recorded in chronological order forms time series data.
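To make the "chronological order" part concrete, here is a minimal sketch that sorts monthly observations of one indicator by date and computes the month-over-month change. The figures are invented for illustration.

```python
from datetime import date

# Hypothetical monthly new-user counts, deliberately stored out of order.
new_users = {
    date(2016, 3, 1): 180,
    date(2016, 1, 1): 120,
    date(2016, 2, 1): 150,
}

# A time series is the same indicator ordered by time, so sort by date.
series = sorted(new_users.items())

# Change between each pair of consecutive months.
changes = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]

for month, value in series:
    print(month.isoformat(), value)
```

Once the values are ordered, a line chart over the date axis is the natural next step.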
3. Spatial data
Spatial data is data that represents the location, shape, size, and distribution of spatial entities. It can be used to describe objects in the real world, and it is characterized by location, attributes, time, and spatial relationships.
Spatial data uses basic spatial data structures such as points, lines, surfaces, and solids to represent the natural world in which people live.
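The basic structures mentioned above can be sketched with plain coordinate tuples. The example below assumes flat (planar) x/y coordinates rather than latitude and longitude, and the helper names `path_length` and `polygon_area` are hypothetical, chosen only for this illustration.

```python
import math

# A point is one coordinate pair, a line is an ordered list of points,
# and a surface (polygon) is a closed ring of points.
point = (2.0, 3.0)
line = [(0.0, 0.0), (3.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def path_length(vertices):
    """Total length of a line made of straight segments."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(path_length(line))
print(polygon_area(polygon))
```

Size and shape, two of the characteristics listed above, fall out directly from these coordinate lists.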
4. Multivariate data
Data usually appears in the form of a table with multiple columns, where each column represents a variable. Such data is called multivariate data. Multivariate data is often used to study the correlation between variables, that is, to find out which factors affect a certain indicator.
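One common way to measure the correlation between two columns is the Pearson correlation coefficient. The sketch below computes it by hand for a small made-up table; the column names and values are assumptions for illustration, and Pearson is only one of several possible correlation measures.

```python
import math

# Hypothetical table: each key is a column (variable), each index is a row.
table = {
    "study_hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 58.0, 65.0, 70.0, 78.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(table["study_hours"], table["exam_score"])
print(round(r, 3))
```

A value near 1 or -1 suggests a strong linear relationship, which is the kind of pattern a scatter plot of the two columns would reveal.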
04 Clarify what information the visualization should express
A visualization may express a conclusion: which region has the most users on the platform, who is the most authoritative person in the field of data analysis, or whether GMV in 2016 rose or fell compared with the previous year. Or it may explain a phenomenon: whether students' performance is related to family background, or whether the income of fresh graduates is related to the school they graduated from.
05 Choose a specific visualization form
Once we know what information the chart needs to convey, we can choose an appropriate chart. Here we follow the view put forward by the author of "The Beauty of Data": instead of listing specific charts such as bar charts and line charts, we introduce the parts that make up these charts. For example, a bar chart combines length with a rectangular (Cartesian) coordinate system. We only need to select the parts we require and combine them. Next, let's look at these parts in detail.
The components of a data chart are: visual cues, coordinate systems, rulers (scales), background information, and any combination of these four.
1. Visual cues:
Visual cues are the visual properties that let readers grasp, almost subconsciously, what a chart expresses simply by looking at it. Commonly used visual cues include: position (where a mark sits, high or low), length (how long), angle (how wide the angle is), direction (rising or falling), shape (different shapes represent different categories), area (how large the area is), volume (how large the volume is), saturation (the intensity of a hue, that is, how deep the color is), and hue (different colors).
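To show how a visual cue encodes data, the sketch below uses length: each value is turned into a row of characters whose length is proportional to the value. The function name `bar_lengths`, the `max_width` parameter, and the sample figures are assumptions for illustration, and the values are assumed to be positive.

```python
def bar_lengths(values, max_width=20):
    """Map each positive value to a bar length, scaled to the largest value."""
    largest = max(values.values())
    return {
        label: round(value / largest * max_width)
        for label, value in values.items()
    }


# Hypothetical number of payments per payment method.
payments = {"Alipay": 40, "WeChat": 30, "Cash": 10}
lengths = bar_lengths(payments)

# Draw a text bar chart: a longer bar means a larger value.
for label, width in lengths.items():
    print(f"{label:<8}{'#' * width}")
```

Swapping length for another cue, such as area or hue, changes only the mapping step; the data stays the same.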
2. Coordinate system:
The coordinate system here is the same as the coordinate system we learned in mathematics, but the meaning of the coordinate axis may be slightly different. Common types of coordinate systems are: Cartesian coordinate system, polar coordinate system and geographic coordinate system.
Everyone is familiar with the Cartesian and polar coordinate systems, so only the geographic coordinate system is described here.
The geographic coordinate system models the earth as a three-dimensional sphere and references points on its surface by latitude and longitude. When visualizing data, however, we generally use a map projection to transform these coordinates into a two-dimensional plane.
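As a rough sketch of what a projection does, the example below uses the equirectangular projection, one of the simplest ways to map latitude and longitude onto a flat canvas. It is chosen here only for illustration; real maps often use other projections. The function name and the default canvas size are assumptions.

```python
def equirectangular(lat, lon, width=360.0, height=180.0):
    """Project latitude/longitude in degrees onto a flat width x height canvas.

    Longitude -180..180 maps to x 0..width (left to right).
    Latitude 90..-90 maps to y 0..height (top to bottom, as on a screen).
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


# The point at latitude 0, longitude 0 lands in the center of the canvas.
print(equirectangular(0.0, 0.0))
```

This projection keeps the arithmetic simple but stretches areas near the poles, which is one reason the choice of projection depends on what the map is meant to show.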
3. Ruler (scale):
The three coordinate systems mentioned above only define the dimensions and directions in which data is displayed; the ruler measures magnitude along each of those directions and dimensions. It is very similar to the graduated markings on a familiar measuring scale.
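In code, a ruler is often expressed as a function that maps a data value to a position along an axis. The sketch below shows a linear version; the name `linear_scale` and its parameters are hypothetical, and other kinds of scale (logarithmic, categorical, time) would follow the same idea with a different mapping.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span more than a single value")

    def scale(value):
        # Fraction of the way through the domain, applied to the output range.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Map data values 0..100 onto an axis that is 500 pixels long.
to_pixels = linear_scale((0, 100), (0, 500))
print(to_pixels(50))
```

Reversing the output range, for example `(300, 0)`, flips the axis, which is handy for screen coordinates where y grows downward.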
4. Background information:
Background information here is the same concept as the background (context) we learned about in language class. It supplies the relevant information about the data (who, what, when, where, why) so that the data is clearer and easier for readers to understand.
5. Combined components:
A combined component brings the four kinds of components above together according to your purpose, producing the chart style that is finally presented. The specific combination depends on your goal.
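One way to picture this combination is as a small record with one field per component. The `ChartSpec` class below is purely hypothetical, not part of any charting library, and the two example charts are illustrative readings of how a bar chart and a pie chart might be described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical description of a chart as a combination of four components."""
    visual_cue: str
    coordinate_system: str
    ruler: str
    background: str


# A bar chart: length as the cue, drawn in a Cartesian coordinate system.
bar_chart = ChartSpec(
    visual_cue="length",
    coordinate_system="Cartesian",
    ruler="linear",
    background="New users per month",
)

# A pie chart: angle as the cue, drawn in a polar coordinate system.
pie_chart = ChartSpec(
    visual_cue="angle",
    coordinate_system="polar",
    ruler="percentage",
    background="Share of payment methods",
)

print(bar_chart)
print(pie_chart)
```

Changing a single field, such as the coordinate system, yields a different chart built from the same data.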