Python Data Visualization in Practice: The Pygal Module (Hands-On Part)

Source: Internet
Author: User


Preface


In the previous article, on the basics of the Pygal module, we got a first look at how Pygal is used. This article works through a practical project to deepen that understanding. Population data in JSON format can be downloaded from the web and processed with the json module, and Pygal provides a beginner-friendly map-making tool, which we will use to visualize the population data and explore how the world's population is distributed. The JSON population data file is available in the supporting resources of the Matplotlib article (hands-on part) in this Python data visualization series. I will work through the problems I ran into while learning and coding one by one.


A short aside on execution efficiency


While studying, I browsed various forums and found an interesting post that explores Python's execution speed: "Add one line of code to make Python run 100 times faster". What code is that powerful? Let's test it by adding up the integers from 1 up to 100 million.
(1) Original code:


import time

def foo(x, y):
    tt = time.time()  # time.time() returns the current timestamp (floating-point seconds since the 1970 epoch)
    s = 0
    for i in range(x, y):
        s += i
    print('Time used: {} sec'.format(time.time() - tt))
    return s

print(foo(1, 100000000))


What is a timestamp? A timestamp is the offset in seconds from 00:00:00 on January 1, 1970 (time.gmtime(0)). The functions in the time module may not handle dates and times before 1970 or too far in the future (the limit depends on the C library; for 32-bit systems it is typically 2038).
Running the code gives the following result:

(2) Add a line of code and look at the results:


from numba import jit  # added code
import time

@jit  # added code
def foo(x, y):
    tt = time.time()  # time.time() returns the current timestamp (floating-point seconds since the 1970 epoch)
    s = 0
    for i in range(x, y):
        s += i
    print('Time used: {} sec'.format(time.time() - tt))
    return s

print(foo(1, 100000000))


Running the code gives the following result:

Summary: the original code took 23 sec in my test, and with the added code the time dropped to 0.25 sec, so it really does appear to be nearly 100 times faster. How this works internally seems fairly complex; I will study the underlying principle once I have broader knowledge.
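
One caveat when reproducing this measurement: the timer above runs inside the decorated function. The following is a minimal sketch of timing from the outside instead, using a hypothetical helper `timed` and the renamed loop function `sum_range` (neither is part of the original post). It rests on the assumption that recent Numba releases compile `@jit` functions in nopython mode by default, where calls such as `time.time()` and `str.format` inside the function body may not be supported. The sketch itself is plain Python and does not require Numba.

```python
import time

def sum_range(x, y):
    """Sum the integers in range(x, y) with a plain loop."""
    s = 0
    for i in range(x, y):
        s += i
    return s

def timed(func, *args):
    """Hypothetical helper: call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

total, seconds = timed(sum_range, 1, 1001)
print('Result: {}, time used: {:.6f} sec'.format(total, seconds))
```

The same helper could wrap a `@jit`-decorated version of `sum_range`; in that case the first call would presumably include compilation time as well, so timing a second call gives a fairer comparison.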


JSON-formatted data


JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for people to read and write. For example:


[
  {
    "Country Name": "Arab World",
    "Country Code": "ARB",
    "Year": "1960",
    "Value": "96388069"
  },
  {
    "Country Name": "Arab World",
    "Country Code": "ARB",
    "Year": "1961",
    "Value": "98882541.4"
  },
....
....
....


As you can see, this file is really one long Python list in which each element is a dictionary with four keys: Country Name, Country Code, Year, and Value (the population).
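
To make that structure concrete, here is a small sketch that parses a two-record excerpt with `json.loads()`. The excerpt is copied from the sample above; the variable names `sample` and `records` are only illustrative.

```python
import json

# A two-record excerpt in the same shape as the sample above
sample = '''
[
  {"Country Name": "Arab World", "Country Code": "ARB", "Year": "1960", "Value": "96388069"},
  {"Country Name": "Arab World", "Country Code": "ARB", "Year": "1961", "Value": "98882541.4"}
]
'''

records = json.loads(sample)               # parse a JSON string into Python objects
print(type(records).__name__)              # list
print(type(records[0]).__name__)           # dict
print(sorted(records[0].keys()))           # the four keys
print(type(records[0]['Value']).__name__)  # str: the numbers are stored as strings
```

Note that both the year and the population arrive as strings, which is why later code compares against '2010' and converts Value before using it.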


Extracting data from JSON-formatted files


In the project directory, create a file named world_population.py and place the population_data.json file in the same directory. Then write the following code to extract the data once the json module has converted it:


# Import the json module to parse the JSON file
import json

filename = 'population_data.json'
with open(filename) as f:
    # json.load() converts the data in the file object into a format Python can work with
    pop_data = json.load(f)  # pop_data is a list; each element is a dictionary with four keys

for pop_dict in pop_data:
    # Select only the 2010 population of each country
    if pop_dict['Year'] == '2010':
        # Store and print each country's name and population
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        print(country_name + ":" + str(population))


Running the code gives the following result:

Note that the value of pop_dict['Value'] obtained by the code above is a string, while the visualization needs numbers, so we first convert it to float and then to int. Why not convert directly to int? Because some population values are strings containing a decimal point (for example '1127437398.85751'), and Python cannot convert such a string directly to an integer; when the for loop reaches one of them, int() raises a ValueError similar to the following:

To avoid this error, first convert the string '1127437398.85751' to a float (1127437398.85751) and then convert that to an int (1127437398).
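
The two-step conversion can be checked in isolation. This is a small sketch; the helper name `to_population` is hypothetical and not part of the original code.

```python
def to_population(value):
    """Convert a population string such as '1127437398.85751' to an int."""
    return int(float(value))

try:
    int('1127437398.85751')  # direct conversion of a decimal string fails
except ValueError as err:
    print('ValueError:', err)

print(to_population('1127437398.85751'))  # 1127437398
print(to_population('96388069'))          # 96388069
```

Keep in mind that int() truncates the fractional part rather than rounding it.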


Get a two-letter country code


Pygal's map-making tool requires the data in a specific format: country codes to identify the countries and numbers for the populations. The main problem is that population_data.json contains three-letter country codes, whereas Pygal uses two-letter country codes. Those are stored in the i18n module, in a dictionary named COUNTRIES whose keys are two-letter country codes and whose values are country names. So the problem to solve is how to get the two-letter country code from COUNTRIES given a country name. The world map can then use the two-letter codes from COUNTRIES together with the populations from population_data.json. To begin, we use the i18n module to read the keys and values of COUNTRIES; first create a country_codes.py file in the project directory.
Note that the import method given in section 16.2.4 of the book (page 327) no longer works. If the import is written as "from pygal.i18n import COUNTRIES", the following error is reported:

It should be changed to "from pygal_maps_world.i18n import COUNTRIES".
The code is as follows:


# Return the country code for a given country name, using the COUNTRIES dictionary in the i18n module
from pygal_maps_world.i18n import COUNTRIES

def get_country_code(country_name):
    for code, name in COUNTRIES.items():  # Iterate over all key-value pairs in the dictionary
        if name == country_name:  # Return the two-letter country code that matches the country name
            return code
    return None  # Return None if the name is not found
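
Because get_country_code() scans the whole dictionary on every call, one alternative is to invert the mapping once and look names up directly. The sketch below uses a small hand-written stand-in for COUNTRIES (three entries chosen only for illustration) so that it runs without pygal installed; the names `SAMPLE_COUNTRIES`, `NAME_TO_CODE`, and `get_country_code_fast` are hypothetical.

```python
# Stand-in for pygal_maps_world.i18n.COUNTRIES (assumed shape: code -> name)
SAMPLE_COUNTRIES = {'ca': 'Canada', 'mx': 'Mexico', 'us': 'United States'}

# Invert the mapping once: name -> code
NAME_TO_CODE = {name: code for code, name in SAMPLE_COUNTRIES.items()}

def get_country_code_fast(country_name):
    """Return the two-letter code for country_name, or None if it is unknown."""
    return NAME_TO_CODE.get(country_name)

print(get_country_code_fast('Canada'))      # ca
print(get_country_code_fast('Arab World'))  # None
```

For a dictionary of a few hundred countries the difference is small, so the loop version above remains a reasonable choice for this project.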


Modify the code for the world_population.py file as follows:


# Import the json module to parse the JSON file
import json
from country_codes import get_country_code

filename = 'population_data.json'
with open(filename) as f:
    # json.load() converts the data in the file object into a format Python can work with
    pop_data = json.load(f)  # pop_data is a list; each element is a dictionary with four keys

for pop_dict in pop_data:
    # Select only the 2010 population of each country
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        # Pass the country name from population_data.json to the function; it returns the matching country code if one exists
        code = get_country_code(country_name)
        if code:  # If a code was found, print it with the population
            print(code + ":" + str(population))
        else:
            print('ERROR - ' + country_name)


Running the code gives the following result:

As can be seen, a large number of entries have no matching country code and produce error messages, for two reasons. First, not every entry is a country; some are regions or economic groups. Second, some entries use a different form of the full country name, so they are not recognized.
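
To inspect which names fail to match, the lookup results can be collected rather than printed. This is a minimal sketch under stated assumptions: the function `split_by_match`, the tiny `name_to_code` table, and the population values in `sample_records` are all made up for illustration and do not come from population_data.json.

```python
def split_by_match(records, name_to_code):
    """Return ({code: population}, [unmatched names]) for the given records."""
    matched = {}
    unmatched = []
    for rec in records:
        code = name_to_code.get(rec['Country Name'])
        if code:
            matched[code] = int(float(rec['Value']))
        else:
            unmatched.append(rec['Country Name'])
    return matched, unmatched

# Made-up values, used only to exercise the function
sample_records = [
    {'Country Name': 'Arab World', 'Value': '1000.5'},  # a regional group, not a country
    {'Country Name': 'Canada', 'Value': '2000'},
]
name_to_code = {'Canada': 'ca'}  # hypothetical stand-in for a name -> code table

matched, unmatched = split_by_match(sample_records, name_to_code)
print(matched)    # {'ca': 2000}
print(unmatched)  # ['Arab World']
```

Reviewing the unmatched list makes it easier to tell regional groups, which can be ignored, from countries whose names are merely spelled differently.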


Make a map of the world


With the two-letter country codes in hand, we can proceed in the following steps:
1. Build a world map from made-up data to see how countries with a given country code and population value are displayed, and to get a feel for the world map.
2. Draw a complete world population map using the real data for 2010.
3. Group the countries by population.
4. Style the world population map.
(1) Making a world map with simulated data
Before starting, note that the way of creating a world map shown in section 16.2.5 of the book (page 329) also no longer works. If the code is written as "wm = pygal.WorldMap()", the following error is reported:

Changing the code to "wm = pygal.maps.world.World()" makes it work. The code is as follows:


import pygal

wm = pygal.maps.world.World()  # create an instance
wm.title = 'North, Central America'

# add() takes a label and the data. If the second argument is a list of country codes (no population values), every country gets the default value of 1, so all of them are drawn in the same shade.
# If the second argument is a dictionary mapping country codes to population values, the countries share one base color, but the shade depends on the population.
wm.add('North America', {'ca': 10000, 'mx': 20000, 'us': 30000})
wm.add('Central America', {'bz': 40000, 'cr': 50000, 'gt': 60000, 'hn': 70000, 'ni': 80000, 'pa': 90000, 'sv': 100000})

wm.render_to_file('americas.svg')


Open americas.svg in a browser; the result is as follows:

The population figures above are purely fictitious. As you can see, hovering the mouse over a country shows its name and population. The three North American countries share the same base color in different shades: the larger the population, the darker the shade. The same is true of the Central American countries.
(2) Drawing a complete map of the world population
To plot the population of the other countries, the data processed earlier (two-letter country codes and the corresponding populations) needs to be converted into the dictionary format Pygal requires, that is, the second argument passed to add(). The code is as follows:


import json
import pygal
from country_codes import get_country_code

filename = 'population_data.json'
with open(filename) as f:
    # json.load() converts the data in the file object into a format Python can work with
    pop_data = json.load(f)  # pop_data is a list; each element is a dictionary with four keys

# Dictionary mapping two-letter country codes to populations
cc_populations = {}
for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        code = get_country_code(country_name)
        if code:
            cc_populations[code] = population

wm = pygal.maps.world.World()
wm.title = 'World Population in 2010, by Country'
wm.add('2010', cc_populations)

wm.render_to_file('world_population.svg')


Running the code gives the following result:

From the above we can conclude: if the country codes and populations of all countries are put into one dictionary and added to the world map with a single add() call, the dictionary is treated as one group, so the map shows every country in the same base color (red) in different shades, with darker shades for larger populations. On reflection, this is not ideal, because overall it is hard to see the differences in population. The root cause is that we used only one dictionary and only one add() call for the whole world, which leaves a single color. The solution is to split the countries into three groups: large, medium, and small populations.


Group countries according to population size


Following the conclusion of the previous section, this section uses grouping to bring out the differences in population. The countries are divided into three groups by population: fewer than 10 million, from 10 million to under 1 billion, and 1 billion or more.
The code is as follows:


import json
import pygal
from country_codes import get_country_code

filename = 'population_data.json'
with open(filename) as f:
    # json.load() converts the data in the file object into a format Python can work with
    pop_data = json.load(f)  # pop_data is a list; each element is a dictionary with four keys

# Dictionary mapping two-letter country codes to populations
cc_populations = {}
for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        code = get_country_code(country_name)
        if code:
            cc_populations[code] = population

# Divide all countries into three groups based on population
cc_pops_1, cc_pops_2, cc_pops_3 = {}, {}, {}
for cc, pop in cc_populations.items():
    if pop < 10000000:
        cc_pops_1[cc] = pop
    elif pop < 1000000000:
        cc_pops_2[cc] = pop
    else:
        cc_pops_3[cc] = pop

wm = pygal.maps.world.World()  # create an instance
wm.title = 'World Population in 2010, by Country'
wm.add('0-10m', cc_pops_1)
wm.add('10m-1bn', cc_pops_2)
wm.add('>1bn', cc_pops_3)

wm.render_to_file('world_population.svg')


Running the code gives the following result:

As can be seen, the world map now uses three different colors, which makes the differences in population easier to see. Within each group, countries are shaded from light to dark as population increases. China and India are the countries with more than 1 billion people.
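
The if/elif chain used for grouping can also be written so that the thresholds are data rather than code. The following is a sketch of one way to do that with the standard bisect module; the function name `group_by_population` and the population figures in `sample` are hypothetical and not taken from the data file.

```python
import bisect

def group_by_population(cc_populations, thresholds=(10_000_000, 1_000_000_000)):
    """Split {code: population} into len(thresholds) + 1 dictionaries by size."""
    groups = [{} for _ in range(len(thresholds) + 1)]
    for cc, pop in cc_populations.items():
        # bisect_right counts how many thresholds are <= pop,
        # which matches the "pop < threshold" tests in the if/elif chain
        index = bisect.bisect_right(thresholds, pop)
        groups[index][cc] = pop
    return groups

# Fictitious figures, used only to exercise the function
sample = {'bz': 300_000, 'ca': 34_000_000, 'cn': 1_300_000_000}
small, medium, large = group_by_population(sample)
print(small)   # {'bz': 300000}
print(medium)  # {'ca': 34000000}
print(large)   # {'cn': 1300000000}
```

With this shape, trying a finer split (for example adding a 100 million boundary) only means passing a different thresholds tuple and adding one more add() call per group.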


Styling the world population map


The default colors used in the previous example are not very attractive, so we use Pygal's style settings to adjust them. Pygal styles are stored in the style module. We import the RotateStyle class from this module; when creating an instance of it, we must provide one argument, an RGB color in hexadecimal format. Such a color is a string that starts with a pound sign (#) followed by 6 characters: the first two represent the red component, the next two the green component, and the last two the blue component. Each component ranges from 00 (none of that color) to FF (the maximum amount of that color). Pygal uses a fairly dark color theme by default; LightColorizedStyle is used here to lighten the map's color theme.
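
To make the hexadecimal format concrete, the sketch below splits a color string into its three components. The helper `hex_to_rgb` is hypothetical and is not part of Pygal; it only illustrates how the 6 characters are read.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to a tuple of three ints in the range 0..255."""
    digits = color.lstrip('#')
    if len(digits) != 6:
        raise ValueError('expected 6 hex digits, got {!r}'.format(digits))
    # Characters 0-1 are red, 2-3 are green, 4-5 are blue
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb('#336699'))  # (51, 102, 153)
print(hex_to_rgb('#000000'))  # (0, 0, 0)
print(hex_to_rgb('#FFFFFF'))  # (255, 255, 255)
```

For '#336699', blue is the largest component, so the base color is a shade of blue.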
The code is as follows:


import json
import pygal
from country_codes import get_country_code
# Import RotateStyle and LightColorizedStyle under aliases, which makes the later calls more convenient
from pygal.style import LightColorizedStyle as LCS, RotateStyle as RS

filename = 'population_data.json'
with open(filename) as f:
    # json.load() converts the data in the file object into a format Python can work with
    pop_data = json.load(f)  # pop_data is a list; each element is a dictionary with four keys

# Dictionary mapping two-letter country codes to populations
cc_populations = {}
for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        code = get_country_code(country_name)
        if code:
            cc_populations[code] = population

# Divide all countries into three groups based on population
cc_pops_1, cc_pops_2, cc_pops_3 = {}, {}, {}
for cc, pop in cc_populations.items():
    if pop < 10000000:
        cc_pops_1[cc] = pop
    elif pop < 1000000000:
        cc_pops_2[cc] = pop
    else:
        cc_pops_3[cc] = pop

wm_style = RS('#336699', base_style=LCS)  # A style object; the argument is a hexadecimal RGB color
wm = pygal.maps.world.World(style=wm_style)  # Create an instance and pass in the style object wm_style
wm.title = 'World Population in 2010, by Country'
wm.add('0-10m', cc_pops_1)
wm.add('10m-1bn', cc_pops_2)
wm.add('>1bn', cc_pops_3)

wm.render_to_file('world_population.svg')


Running the code gives the following result:


