Data mining: Visualize your data
Editor's note: In this article, the author, Octopus Meow, introduces data visualization and lists the ways data can be displayed. The hardest parts of data analysis are presenting the data, interpreting the relationships within it, and communicating the information clearly and effectively.
For data mining, the visualization cases in this article show methods and ideas for analyzing and displaying data.
Data visualization is a very interesting thing. Recently, while trying to process some data, I looked over the progress made in visualization. Besides IBM's famous Many-eyes, there is another good and interesting site, visualizing.org. The two are very much alike: both are community sites where users can register and upload, and both have accumulated a lot of data for users to work with.
Of course, I didn't write this post to introduce that website. A post is a note-taking process, and if I can't learn anything from it, it's a bit of a waste of time. To get to the point, below I try to summarize some usable lessons about visualization.
What form should be used to represent the data
Useful forms taken from the visualizing.org classification include the following (though honestly, the classification is not very easy to use):
- Chart
- Time series
- Map
- Flow
- Matrix
- Network
- Hierarchy
- Info-graphic
The data to be visualized can be divided into several categories (I don't think the list is comprehensive; additions are welcome, let's learn together).
A series of objects that are associated with each other
Write this as a↔b, where each bold Latin letter stands for a series of objects, such as a series of locations.
In this case, because you want to show the relationships between the data, the picture is essentially a network diagram, but a few techniques can turn a plain network diagram into a better form.
Method one: convert to a flow chart. By listing the objects twice, what would have been a complex, hard-to-read network becomes a clear, easy-to-follow flow.
The one I like best among diagrams of this kind is people moving flow.
This flow chart is a great showcase of migration from one country to another. Taking migration to Canada as an example, you can see how many people have emigrated there from China (CH) compared with other countries. With this flow, we can analyze the data easily and intuitively.
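The idea of listing the objects twice can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by the site mentioned above: the function name `network_to_flow`, the country codes, and the numbers are all made up for the example. Each object gets one copy on the left (as a source) and one copy on the right (as a destination), so every connection runs from left to right.

```python
from collections import defaultdict


def network_to_flow(edges):
    """Turn directed, weighted edges into a two-column flow layout.

    Returns the left column (sources), the right column (destinations),
    and the total weight of each source -> destination link.
    """
    outgoing = defaultdict(float)
    incoming = defaultdict(float)
    links = defaultdict(float)
    for source, target, weight in edges:
        outgoing[source] += weight
        incoming[target] += weight
        links[(source, target)] += weight
    # Largest totals first; ties are broken alphabetically.
    left = sorted(outgoing, key=lambda n: (-outgoing[n], n))
    right = sorted(incoming, key=lambda n: (-incoming[n], n))
    return left, right, dict(links)


# Hypothetical migration counts (from, to, people), not real data.
edges = [
    ("CN", "CA", 5.0),
    ("IN", "CA", 4.0),
    ("CN", "US", 7.0),
    ("CA", "US", 2.0),
]
left, right, links = network_to_flow(edges)
print(left, right)
```

Drawing `left` as one column, `right` as another, and one band per entry in `links` gives the flow layout. Note that "CA" appears in both columns, because it both receives and sends people.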
Method two: a ring-shaped network diagram. Why make it a circle? Because a ring lets the connections run through the inside of the circle, which reduces crossings between the lines. With interactive design, you can even show the connections without any crossings. For example, this migrants moving money:
This shows remittances from overseas Chinese, that is, the money Chinese emigrants send back to China. We can see that, excluding Hong Kong, the United States is the largest source.
In fact, this method is essentially the same as the first.
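To make the ring idea concrete, here is a small Python sketch under my own assumptions (the names `circular_layout` and `count_crossings` are invented for this note). It places the objects evenly on a circle and counts how many connections cross. Two chords of a circle cross exactly when their endpoints alternate around the ring, so the order in which objects are placed decides how tangled the picture looks.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle, starting at angle 0."""
    step = 2 * math.pi / len(nodes)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(nodes)
    }


def count_crossings(order, edges):
    """Count pairs of edges whose chords cross for a given ring order."""
    index = {node: i for i, node in enumerate(order)}

    def crosses(e1, e2):
        a, b = sorted((index[e1[0]], index[e1[1]]))
        c, d = sorted((index[e2[0]], index[e2[1]]))
        if len({a, b, c, d}) < 4:
            return False  # edges sharing an endpoint do not cross
        # Crossing means exactly one endpoint of e2 lies between a and b.
        return (a < c < b) != (a < d < b)

    return sum(
        crosses(edges[i], edges[j])
        for i in range(len(edges))
        for j in range(i + 1, len(edges))
    )


edges = [("A", "C"), ("B", "D")]
positions = circular_layout(["A", "B", "C", "D"])
crossings_before = count_crossings(["A", "B", "C", "D"], edges)
crossings_after = count_crossings(["A", "C", "B", "D"], edges)
print(crossings_before, crossings_after)
```

Reordering the ring so that connected objects sit next to each other removes the crossing in this toy case; real data sets usually need a heuristic to find a good order.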
Method three: a network diagram, with objects associated through points and lines. An example is Attractions of Councils: WEF GAC Interlink Survey.
But that picture is actually not very good. And sometimes the lines can be removed altogether, as in this visualization of international flights:
Click a nation to see all connected nations via flights. Click again to see arranged nations based on the distance. Double-click the background to reset.
Method four: use a table. To make it more intuitive, however, use area to represent the size of the data.
For example, suppose 10 people each score their goodwill toward every other person. To show the mutual goodwill between any two people A and B, you can use a color bar: one part shows the level of goodwill the two have in common, and the color above it shows whether A's goodwill toward B exceeds B's toward A, or vice versa.
Here is an example for a council:
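One possible reading of this color-bar idea, sketched in Python: for every pair, split the two scores into the part they share and the gap between them, and record who holds the extra goodwill. The function name `pairwise_summary`, the field names, and the scores are assumptions made for this illustration, not part of any example shown here.

```python
def pairwise_summary(scores):
    """Summarize asymmetric scores for every unordered pair of people.

    scores[(a, b)] is how much a likes b. For each pair the result holds
    the shared level, the gap, and who likes the other more (None on a tie).
    """
    people = sorted({person for pair in scores for person in pair})
    cells = {}
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            a_to_b = scores[(a, b)]
            b_to_a = scores[(b, a)]
            if a_to_b > b_to_a:
                stronger = a
            elif b_to_a > a_to_b:
                stronger = b
            else:
                stronger = None
            cells[(a, b)] = {
                "shared": min(a_to_b, b_to_a),  # height of the common bar
                "gap": abs(a_to_b - b_to_a),    # height of the colored cap
                "stronger": stronger,           # decides the cap's color
            }
    return cells


# Hypothetical goodwill scores on a 0-10 scale.
scores = {
    ("A", "B"): 8, ("B", "A"): 5,
    ("A", "C"): 6, ("C", "A"): 6,
    ("B", "C"): 2, ("C", "B"): 9,
}
cells = pairwise_summary(scores)
for pair, cell in cells.items():
    print(pair, cell)
```

Each table cell can then be drawn as a bar of height `shared` with a cap of height `gap`, colored according to `stronger`.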
Hierarchical data, that is, data that can be divided into several levels
This is the Hierarchy form, although sometimes the connecting lines can be omitted.
Like this soft drink hierarchy figure:
From it you can immediately see how vast Coca-Cola and Pepsi are. On the original page you can zoom freely to see each company's different products.
Such a hierarchy chart is more informative than a neat but monotonous list of items side by side, because the size of each circle can represent one dimension of the data, and color can be introduced to represent further dimensions.
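When circle size carries a data dimension, it is usually the circle's area, not its radius, that should be proportional to the value; otherwise large values look far bigger than they are. A minimal Python sketch, with a hypothetical helper `radius_for_value` and made-up sales figures:

```python
import math


def radius_for_value(value, max_value, max_radius=50.0):
    """Return a radius whose circle area is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so the radius grows with the square root.
    return max_radius * math.sqrt(value / max_value)


# Hypothetical sales figures, not real brand data.
sales = {"brand_a": 100, "brand_b": 25, "brand_c": 4}
largest = max(sales.values())
radii = {name: radius_for_value(value, largest) for name, value in sales.items()}
print(radii)
```

With this scaling, a value one quarter as large gets a radius half as large, so its circle covers one quarter of the area.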
Simple two-dimensional data, such as how often a phenomenon occurs
Method one: use a histogram. This is the more classic choice: the length of a rectangle or line represents the size of the data. For example, this visualization on energy:
Method two: use a tree map, where area represents the size of the data. Here is an example, the UN Global Pulse visualization:
Method three: use scatter points, with attributes such as the size or color of each point representing the size of the data.
A good example is this one on students' seating habits:
In fact, tag pages belong to this category too: the size, color, and so on of each tag can indicate the size of the data.
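The tree map idea from method two can be sketched with the simplest possible layout, often called "slice": cut one rectangle into vertical strips whose widths, and therefore areas, are proportional to the values. The function name `slice_layout` and the category counts below are assumptions for illustration; real tree maps usually use more elaborate layouts to keep the tiles closer to squares.

```python
def slice_layout(values, x=0.0, y=0.0, width=100.0, height=100.0):
    """Split a rectangle into vertical strips, largest value first.

    Returns {name: (x, y, width, height)} with each strip's area
    proportional to its value.
    """
    total = sum(values.values())
    rects = {}
    offset = x
    for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
        strip_width = width * value / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects


# Hypothetical frequencies for three categories.
counts = {"a": 50, "b": 30, "c": 20}
rects = slice_layout(counts)
for name, rect in rects.items():
    print(name, rect)
```

Applying the same function again inside each strip, with the roles of width and height swapped, is one way to extend this to nested data.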
Coordinate data
In addition to the methods mentioned above, coordinate data can be drawn on a map, and a map can be combined with other forms, such as flow. A good example is this picture of flights in the US:
The upper map in the picture shows the flights' departure cities, and the lower map shows the destination cities. More can be seen at this UCSB site, which provides demo software.
Combining different visualizations
Some time ago I read a paper by the astronomer Goodman on visualizing high-dimensional astronomical data. It mentions that linked views are important, that is, we want to combine multiple visualizations to display the data. I have taken a figure from the paper to illustrate:
Combining different visualizations to render the data from multiple angles gives us a deeper understanding of it. So data mining really has very broad applications. For astronomy, a field now being bombarded by huge amounts of data (one paper calls this the data tsunami era), a student who specializes in data mining is a real treasure.
Another good example is a historical data visualization that brings a timeline and a map together, a scheme that is really a deeper kind of linked views: conflict history of the
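The core of linked views is that a selection made in one view is highlighted in all the others. Here is a minimal Python sketch of that mechanism, without any drawing; the class names `View` and `LinkedSelection` and the records are invented for this note and are not taken from the paper.

```python
class View:
    """One visualization of the shared records, e.g. a timeline or a map."""

    def __init__(self, name, records):
        self.name = name
        self.records = records
        self.highlighted = []

    def highlight(self, selected_ids):
        # Keep this view's own ordering, but mark only the selected records.
        self.highlighted = [r for r in self.records if r["id"] in selected_ids]


class LinkedSelection:
    """Broadcasts one selection to every attached view."""

    def __init__(self):
        self.views = []

    def attach(self, view):
        self.views.append(view)

    def select(self, ids):
        selected = set(ids)
        for view in self.views:
            view.highlight(selected)


# Hypothetical records shared by both views.
records = [
    {"id": 1, "year": 1990, "place": "north"},
    {"id": 2, "year": 1985, "place": "south"},
    {"id": 3, "year": 2001, "place": "east"},
]
timeline = View("timeline", sorted(records, key=lambda r: r["year"]))
map_view = View("map", sorted(records, key=lambda r: r["place"]))

link = LinkedSelection()
link.attach(timeline)
link.attach(map_view)
link.select([2, 3])
print([r["id"] for r in timeline.highlighted])
print([r["id"] for r in map_view.highlighted])
```

Each view keeps its own arrangement of the data, while the shared selection ties them together, which is what lets the reader follow the same records across several pictures.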
A few useful tools
1, http://en.wikipedia.org/wiki/Data_visualization Naturally, you must look at Wikipedia first, la la la~
2, Visualizing.org has a list:
3, http://selection.datavisualization.ch/ lists a lot of useful tools.
4, https://github.com/blprnt/Kepler-Visualization This is a Processing sketch to visualize data from NASA's Kepler mission.
5, http://flowingmedia.com/timeflow.html TimeFlow is an open-source timeline built to help journalists analyze temporal data. The application offers several view modes (timeline, calendar, list, table) to help explore thousands of data points.
6, http://mapbox.com/ Mapbox is a tool for map making.
Data Visualization Agencies/organizations/communities
1, http://envisioningtech.com/
Some good data visualizations, for example (image from envisioningtech.com):
2, IBM's many-eyes.com
Mentioned at the start of this post; this is a visualization community.
3, http://datavisualization.ch/
Its list of tools was mentioned above. About this site:
Datavisualization.ch is the premier news and knowledge resource for data visualization and infographics.
4, http://visual.ly/
A site similar to a data visualization community.
5, http://visualization.geblogs.com/
An example from GE.
6, http://oicweave.org/
Web-based Analysis and Visualization Environment
The data shown in this post comes from visualizing.org and is used under the CC BY-NC-SA license. Except where clearly specified, all images are from visualizing.org.
Well, when you're done reading, you can play with the exoplanets.org data.
Resources:
The importance of big data and data visualization: http://www.ciotimes.com/bigdata/110469.html
21 cool data visualization tools to take away!: http://www.woshipm.com/xiazai/216656.html
China cloud data Mirror: http://www.moojnn.com/index.html
Counting: 55 Most practical big Data visualization analysis tools: http://tech.it168.com/a2015/0318/1712/000001712286.shtml
Data mining: Visualize your data with visualizations: http://www.leiphone.com/news/201406/warlial-visualization.html
30 Best Data visualization tools recommended: http://www.iteye.com/news/28936/
The world's 28 largest data visualization application case (iv): http://mt.sohu.com/20160226/n438541718.shtml
"Data Visualization Reference"