Seven Secrets of Data Visualization Experts


The road to data visualization is full of hidden traps and mazes. Recently, two visualization developers at ClearStory Data shared seven "non-secrets" from their data visualization work. Knowing them can broaden an ordinary developer's horizons and help avoid detours.

The era of data visualization, especially web-based data visualization, has arrived. JavaScript visualization libraries such as D3.js, Raphaël, and Paper.js, together with browser support for Canvas and SVG in the latest browsers, are making complex visualizations that in the past could only be built by computer experts and professional designers easier and easier to create.

Data visualization is now a must-have feature for many website projects. Startups such as Platfora, Datameer, ClearStory Data, and Chartio have attracted millions of dollars in investment with browser-based analytics platforms.

Data visualization is an important means of exploring and presenting data, yet developers of data visualizations still face many challenges. How to meet those challenges is something many professional data visualization developers keep to themselves. ClearStory Data's two visualization developers, Nate Agrin and Nick Rabinowitz, shared on netmagazine.com the seven secrets of data visualization development they had distilled, along with how they deal with them in practice.

Secret one: Real-world data is often ugly

Most data visualization tutorials start you off with a ready-made dataset. Whether you are learning a basic bar chart or a force-directed network diagram, your data is clean and well organized. These perfect JSON or CSV files are as neat and tidy as the pre-measured ingredients on a TV cooking show. In reality, when you work with real data, 80% of your time goes into searching for, acquiring, loading, cleaning, and converting it.

Parts of this process can sometimes be automated. However, almost any cleanup involving more than two datasets requires at least some manual work. Many tools can convert an XLS file to XML or a timestamp to a different date format. But matching the sales categories a company uses against those of its competitors, checking for input errors, or checking text produced by different encodings or by OCR can often only be done by hand.
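The mechanical part of this work, such as converting date formats, is easy to script and worth keeping for reuse. Below is a minimal Python sketch, assuming the handful of input formats listed in `KNOWN_FORMATS` (a hypothetical list; your sources will differ). Values that match no known format come back as `None` so they can be reviewed by hand.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats seen across source systems; extend as new sources appear.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y", "%Y%m%d"]


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    return None  # leave unparseable values for manual review


if __name__ == "__main__":
    samples = ["2013-05-02", "02/05/2013", "May 02, 2013", "20130502", "sometime in May"]
    for s in samples:
        print(repr(s), "->", normalize_date(s))
```

The order of `KNOWN_FORMATS` decides how an ambiguous value such as 02/05/2013 is read (here, day first), which is exactly the kind of judgment call that still needs a human check.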

Tools and approaches:

1) Allow enough time for data cleaning in a data visualization project. Set aside extra time when you have to deal with multiple data sources, manually entered or OCR data, matching between different categories, or non-standard formats.

2) Google Refine (editor's note: requires a proxy to access from mainland China) is a good data cleaning tool, although it has some shortcomings, especially with non-tabular data. Other data cleaning tools include Wrangler and Mr. Data Converter. Still, much data cleaning work requires you to know a scripting language such as Python, or to do some manual work in Excel. Remember to archive your scripts; you will certainly use them again.

3) Use simple scatter plots or histograms to find erroneous values that fall outside the normal range.
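Even a text histogram printed in a terminal is often enough to expose values that sit far from the rest. This minimal Python sketch bins values and separately lists anything outside a plausible range; the bin width and the `low`/`high` bounds are assumptions you would choose from domain knowledge, and the sample amounts are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def text_histogram(values: Iterable[float], bin_width: float) -> List[Tuple[float, int]]:
    """Count values per bin; returns (bin_start, count) pairs sorted by bin_start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())


def out_of_range(values: Iterable[float], low: float, high: float) -> List[float]:
    """Return the values that fall outside the plausible [low, high] range."""
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical order amounts; -5 and 99999 look like data-entry errors.
    amounts = [12.0, 15.5, 14.0, 13.2, -5.0, 16.8, 99999.0, 12.9]
    for start, count in text_histogram(amounts, bin_width=10):
        print(f"{start:>10.0f} | {'#' * count}")
    print("suspicious:", out_of_range(amounts, low=0, high=1000))
```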

Secret two: Bar charts tend to be better

Compared with the bar chart, a bubble chart can show more data in the same space, a pie chart shows part-to-whole relationships more clearly, and a tree diagram better represents hierarchical structure. However, none of these charts can match the bar chart for simplicity and clarity.

When considering a data visualization design, the first question we ask ourselves is: "Is this better than a bar chart?" If you need to visualize a quantitative dataset along a single dimension, few methods can match the bar chart. Similarly, a time series is usually best shown as a line chart, and a scatter plot is the usual way to show the correlation between two linear measures. These chart types have been in use since the 18th century and are the lowest-risk choices in visualization design. Bar charts are the best way to visualize comparisons between values, because the human eye is most practiced at comparing lengths side by side.

One of the biggest secrets behind the bar chart is that the coolest visualizations are often the least useful. The most novel and attractive visualizations often make the data harder to understand. Many alternatives to the bar chart force people to compare values in ways they are not good at, such as by area, angle, color, or transparency. At best, these comparisons make the chart harder to read; at worst, they distort the data and lead users to wrong conclusions.

Tools and approaches:

1) Do not lightly abandon traditional visualizations if they can represent your data. Try a bar chart or a line chart first, and move to other chart types only if your data really calls for them.

2) Understand the strengths of other chart types: for example, bubble charts handle a wider range of values, pie charts support part-to-whole comparisons, and tree diagrams can show hierarchical structure.

3) The bar chart is one of the easiest visualizations to build. You can hand-write a piece of HTML styled with CSS or a little JavaScript, or use a formula in Excel, and get a working bar chart.
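To show how little is needed, here is a minimal Python sketch that emits a horizontal bar chart as plain HTML, with each bar's length set by an inline CSS width. The function name `bar_chart_html`, the 300 px maximum bar length, and the styling are illustrative assumptions, not a standard API.

```python
import html
from typing import Dict


def bar_chart_html(data: Dict[str, float], max_width_px: int = 300) -> str:
    """Render one <div> row per category; bar length is proportional to value."""
    peak = max(data.values())  # longest bar gets max_width_px
    rows = []
    for label, value in data.items():
        width = round(value / peak * max_width_px) if peak > 0 else 0
        rows.append(
            '<div style="display:flex;align-items:center;margin:2px 0">'
            f'<span style="width:80px">{html.escape(label)}</span>'
            f'<span style="width:{width}px;height:14px;background:steelblue"></span>'
            f'<span style="margin-left:4px">{value:g}</span>'
            "</div>"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    sales = {"North": 120, "South": 80, "East": 150, "West": 60}  # hypothetical data
    print(bar_chart_html(sales))
```

Saving the printed output in an `.html` file and opening it in a browser shows the chart; no charting library is involved.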

Secret three: No substitute for real data

Cleaning and formatting a dataset is tedious, especially if you need to design a visualization based on multiple datasets. For example, you may need to visualize data from different departments of a company, each with its own database, and you don't have time to clean each dataset by hand. At that point, your first thought might be to grab some demo data to visualize, and your visualization library may even ship with some standard sample data.

Unfortunately, there is no substitute for real data. Demo data generally follows a normal distribution and has a limited volume; its purpose is to show off the visualization. A seemingly perfect bar chart built on demo data won't help you deal with missing data, anomalous values, or other real-world problems. If you rely too heavily on demo data, you will find when you switch to real data that your visualization design does not actually meet your analysis or presentation needs.

Tools and approaches:

1) If you can't access the entire dataset, start by taking a random sample from the real datasets.

2) Keep invalid and missing values. If your data will not be cleaned before it is visualized, do not clean the sample either.

3) The real dataset may be much larger than your sample. Once your design works with the sample, scale up to the full data volume before generating the final visualization.
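When the full dataset is too large to load or arrives as a stream, reservoir sampling (Algorithm R, a standard technique not named in the original) draws a uniform random sample in a single pass. This is a minimal Python sketch; the rows in the usage block are hypothetical stand-ins for records read from a real source. Rows are kept exactly as read, blanks included, so the sample carries the same problems as the real data.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k rows from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # replace with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    # Hypothetical rows; every 7th amount is missing, as in messy real data.
    rows = ({"id": i, "amount": "" if i % 7 == 0 else str(i * 1.5)} for i in range(10_000))
    sample = reservoir_sample(rows, k=100, seed=42)
    missing = sum(1 for r in sample if r["amount"] == "")
    print(len(sample), "rows sampled,", missing, "with a missing amount")
```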

Secret four: The details are the biggest headache

For example, when you lay data labels out horizontally, they may overlap and become unreadable; rotating them 90 degrees makes them legible but wastes a lot of space. Choosing an appropriate label format solves the problem for some visualizations, but not for every scenario.

Data labels, annotations, and axis labels are usually designed only after the initial visualization is built. However, these elements are important to a visualization, and doing them well can be difficult and time-consuming, especially if you can't predict your data in advance.

When designing your visualization, reserve considerable space for labels you may need to add, usually by leaving a relatively wide margin around the chart. Labels on the vertical axis must not overlap one another and must remain readable; rotate them if that improves readability. If labels are packed too densely and still need to be readable, you can place them a little farther from the elements they refer to and connect each label to its element with a leader line. Another option is to group labels together and reveal them through tooltips. If a label's text is too long, consider abbreviating it or truncating the excess.
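The choice between horizontal labels, rotated labels, and tooltips can be made mechanically once you can estimate label widths. This Python sketch does so for an evenly spaced categorical axis. It assumes a fixed average glyph width (`CHAR_PX`) and line height (`LINE_PX`), both hypothetical; a real implementation would measure rendered text instead.

```python
from typing import List, Tuple

CHAR_PX = 7.0   # assumed average glyph width in px; measure real text in production
LINE_PX = 14.0  # assumed line height in px


def labels_collide(labels: List[str], axis_px: float, padding_px: float = 4.0) -> bool:
    """Centered neighbors collide when their half-widths plus padding exceed the slot width."""
    slot = axis_px / len(labels)
    widths = [len(t) * CHAR_PX for t in labels]
    return any((a + b) / 2 + padding_px > slot for a, b in zip(widths, widths[1:]))


def choose_label_layout(labels: List[str], axis_px: float) -> Tuple[str, float]:
    """Return (layout, bottom margin in px) for a categorical x-axis."""
    longest = max(len(t) for t in labels) * CHAR_PX
    if not labels_collide(labels, axis_px):
        return "horizontal", LINE_PX
    if axis_px / len(labels) >= LINE_PX:
        return "rotated", longest  # rotated labels cost vertical space
    return "tooltip", LINE_PX  # too dense even rotated: group and show on hover


if __name__ == "__main__":
    print(choose_label_layout(["Q1", "Q2", "Q3", "Q4"], 400))
    print(choose_label_layout(["Northeast region"] * 10, 300))
```

The bottom margin returned for the rotated case makes the space cost of rotation explicit, so it can be reserved up front.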

Similarly, annotations need to be planned in advance. The simplest approach is to reserve part of the visualization area for annotations, but this shrinks the space available for the chart itself. To save space, place annotations on blank parts of the chart, or make them draggable so users can move an annotation aside to see what it covers.

Tools and approaches:

1) When designing, reserve space on the chart for data labels, axes, and annotations.

2) For data labels, define a maximum number of characters and truncate anything longer. Group similar labels together and display them when the user hovers over them.

3) For long annotations, consider scrolling or expandable text.

4) In any case, do not neglect these elements. Data labels may not be your main concern when you are focused on graphic design, but they matter a great deal to the people who use the visualization.
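The truncation rule in tip 2 is a few lines of code. This minimal Python sketch cuts labels at a maximum length, marks the cut with an ellipsis, and keeps the full text for a hover tooltip; the 12-character default and the function names are hypothetical choices.

```python
from typing import Dict


def truncate_label(text: str, max_chars: int) -> str:
    """Cut labels longer than max_chars (>= 1), using the last character for an ellipsis."""
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 1] + "…"


def label_with_tooltip(text: str, max_chars: int = 12) -> Dict[str, str]:
    """Short text for display on the chart; full text kept for the hover tooltip."""
    return {"display": truncate_label(text, max_chars), "tooltip": text}


if __name__ == "__main__":
    for name in ["Sales", "Quarterly revenue by region"]:
        print(label_with_tooltip(name))
```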

Secret five: Use animation when needed

Visualization designers often want to add animation to the final design. Animation is a very useful tool for showing connections in the data and changing trends. However, animation can also lead viewers to misread your data. Rather than simply tacking animation on at the end, evaluate how it will affect the final result. Animation is best suited to showing how data moves between different states or groupings, how it changes over time, or how elements affect one another.

The general design principle is that animations should be simple, predictable, and replayable. Let users play an animation multiple times so they can see where each animated element starts and stops. Avoid unpredictable motion, so that moving elements don't cover each other. For complex animations, research suggests breaking the animation into distinct stages and pausing at each one to give the user time to take it in; this improves understanding.

Tools and approaches:

1) Keep animation as simple as possible.

2) If the animation is complex or has many animated elements, consider staging it.

3) Animation often feels fresh at first, but it can quickly bore users. Don't add animation to a visualization just because you can.
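Staging can be planned as a simple timeline: each stage runs in turn, with a pause between stages, and every element's progress is a pure function of elapsed time, which makes replay trivial (restart the clock at zero). This minimal Python sketch assumes linear timing and a fixed pause; the stage names and durations are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    duration_s: float


def build_timeline(stages: List[Stage], pause_s: float = 0.5) -> List[Tuple[str, float, float]]:
    """Lay stages end to end with a pause after each; returns (name, start, end) in seconds."""
    timeline, t = [], 0.0
    for stage in stages:
        timeline.append((stage.name, t, t + stage.duration_s))
        t += stage.duration_s + pause_s
    return timeline


def progress_at(timeline: List[Tuple[str, float, float]], name: str, t: float) -> float:
    """Linear progress (0..1) of one stage at time t; held at 1.0 once the stage ends."""
    for stage_name, start, end in timeline:
        if stage_name == name:
            if t <= start:
                return 0.0
            if t >= end:
                return 1.0
            return (t - start) / (end - start)
    raise KeyError(name)


if __name__ == "__main__":
    tl = build_timeline([Stage("exit", 1.0), Stage("regroup", 2.0), Stage("enter", 1.0)])
    for name, start, end in tl:
        print(f"{name:>8}: {start:.1f}s -> {end:.1f}s")
```

Because nothing depends on hidden state, each element always starts and stops at the same points, which keeps the motion predictable across replays.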

Secret six: Data visualization is not analysis

Data visualization can produce some analytical results, but remember that visualization is a tool that supports analysis, not a substitute for data analysis, and not a substitute for statistics. Your charts may reveal apparent differences or correlations in the data, but you need statistical methods to conclude reliably that those differences and correlations really exist. Truly understanding your data requires analytical skills and domain knowledge; don't expect a visualization to supply them. So when you take on a visualization project, manage the expectations of your client or your CEO.
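As an illustration of the gap between "looks correlated" and "is correlated", here is a minimal Python sketch of one standard check, a permutation test on Pearson's r. This technique is our choice, not one the original names, and a statistician may prefer another test. It asks how often randomly shuffled data produces a correlation at least as strong as the one observed. The ad-spend data is hypothetical.

```python
import random
import statistics
from typing import Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (raises ZeroDivisionError for constant input)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs: Sequence[float], ys: Sequence[float],
                        n_perm: int = 10_000, seed: Optional[int] = None) -> float:
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real x-y relationship
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 so the p-value is never exactly zero


if __name__ == "__main__":
    # Hypothetical: weekly ad spend vs. sign-ups over eight weeks.
    spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    signups = [12.0, 15.0, 11.0, 18.0, 16.0, 14.0, 20.0, 17.0]
    r = pearson_r(spend, signups)
    p = permutation_p_value(spend, signups, n_perm=5000, seed=1)
    print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A chart of eight points can look convincing; a large p-value here would say the pattern could easily arise by chance.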

Tools and approaches:

1) Unless you are a data analyst, the conclusions you draw from visualized data are hard to judge. If you need to state a conclusion, it is best to have a statistician or domain expert verify it first.

2) Minor design changes, such as a different color palette or a different way of encoding a variable, can change the conclusion a viewer draws. If you use visualization for analysis, try several different visualizations rather than relying on just one.

3) Stephen Few's book "Now You See It" describes how to use visualization for business analysis, including suggestions on how developers can design visual tools for analytical use. Readers may find it a useful reference.

Secret seven: Data visualization is more than just programming

A large number of visualization libraries and tutorials now let ordinary developers build high-quality web-based visualizations. However, to design a visualization that truly provides insight or communicates clearly, you need many skills beyond programming: design, data analysis, interaction design, an understanding of people, and more. No visualization library can provide these skills.

The good news, though, is that if you stick to some basic principles of data visualization, you don't need to master all of these skills. For beginners, it is enough to follow the most basic principles: use bar charts, do not set a circle's radius with a linear scale (editor's note: viewers compare circle areas, so a linear radius misleads them), and keep designs simple (no 3D, little animation, no shadows). By following good visualization examples, beginners can also create good visualizations.
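The circle-radius rule follows from the area formula: area grows with the square of the radius, so a radius proportional to the value makes a value four times larger look sixteen times larger. Scaling the radius by the square root of the value keeps area proportional to value. The minimal Python sketch below compares the two; the function names and the 40 px maximum radius are hypothetical.

```python
import math


def radius_linear(value: float, max_value: float, max_radius: float) -> float:
    """Misleading: radius proportional to value, so area grows with value squared."""
    return value / max_value * max_radius


def radius_sqrt(value: float, max_value: float, max_radius: float) -> float:
    """Area proportional to value: radius scales with the square root of the value."""
    return math.sqrt(value / max_value) * max_radius


def area(r: float) -> float:
    return math.pi * r * r


if __name__ == "__main__":
    small, large = 25, 100  # the large value is 4x the small one
    for scale in (radius_linear, radius_sqrt):
        ratio = area(scale(large, 100, 40)) / area(scale(small, 100, 40))
        print(f"{scale.__name__}: large circle looks {ratio:.1f}x bigger")
```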
