A tour of the waterfall chart: using Python to plot data

Source: Internet
Author: User

Introduction

The waterfall chart is a useful tool for plotting certain types of data. Not surprisingly, we can use pandas and matplotlib to create a repeatable waterfall chart.

Before going any further, I want to be clear about what kind of chart I'm referring to. I will build the 2D waterfall chart described in the Wikipedia article.

A typical use of this chart is to show the positive and negative values that "bridge" a starting value and an ending value. For this reason, finance people sometimes call it a bridge. As with other examples I have used before, this type of chart is not easy to generate in Excel. There are certainly ways to do it, but they are not easy to remember.

The key thing to remember about a waterfall chart is that it is essentially a stacked bar chart. The special twist is that it has a blank bottom bar, so the top bar "floats" in the air. So, let's get started.
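
To make the "floating bar" idea concrete, here is a minimal sketch in plain matplotlib, separate from the pandas approach used in the rest of this article. The heights (300 and 50) are made-up values, and the Agg backend line is only an assumption so the snippet can run without a display; drop that line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove this in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# An invisible "blank" bar whose only job is to take up space below...
blank_height = 300  # made-up value
ax.bar([0], [blank_height], color="none")

# ...and a visible bar that starts where the blank bar ends.
visible = ax.bar([0], [50], bottom=[blank_height])

rect = visible.patches[0]
print("visible bar starts at", rect.get_y(), "and is", rect.get_height(), "tall")
```

The visible bar occupies only the range from 300 to 350 on the y-axis, which is the "floating" effect a waterfall chart relies on.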

Create a chart

First, do the standard imports and make sure IPython can display matplotlib plots.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

Set up the data we want to plot as a waterfall chart and load it into a DataFrame.

The data needs to start with your starting value, but you should leave out the final total. We'll calculate it below.

index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
trans = pd.DataFrame(data=data, index=index)

I use the handy display function in IPython to make it easier to control what I want to show.

from IPython.display import display
display(trans)

The big trick with a waterfall chart is calculating the values of the bottom stacked bar. On this point, I learned a lot from a discussion on Stack Overflow.

First, we get the cumulative sum.

display(trans.amount.cumsum())

sales           350000
returns         320000
credit fees     312500
rebates         287500
late charges    382500
shipping        375500
Name: amount, dtype: int64

This looks good, but we need to shift the data by one place, so that each row holds the running total before its own amount is applied.

blank = trans.amount.cumsum().shift(1).fillna(0)
display(blank)

sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
Name: amount, dtype: float64
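
As a quick sanity check on why the shift works, the following sketch rebuilds the same data and confirms that each bar, drawn from its blank value, ends exactly at the running total. The names running_total, bar_end and check are introduced here only for illustration.

```python
import pandas as pd

index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
trans = pd.DataFrame({'amount': [350000, -30000, -7500, -25000, 95000, -7000]},
                     index=index)

running_total = trans.amount.cumsum()
blank = running_total.shift(1).fillna(0)

# Each bar starts at the previous running total (blank) and spans its own
# amount, so it must end at the new running total.
bar_end = blank + trans.amount
assert (bar_end == running_total).all()

# Lay the columns side by side to read the relationship row by row.
check = pd.DataFrame({'blank': blank, 'amount': trans.amount, 'bar_end': bar_end})
print(check)
```

A negative amount simply makes the bar extend downward from its blank value, which is why returns runs from 350000 down to 320000.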

We need to add a net total to both trans and blank.

total = trans.sum().amount
trans.loc['net'] = total
blank.loc['net'] = total
display(trans)
display(blank)

sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
net             375500
Name: amount, dtype: float64

Create the steps that we use to show the changes.

step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan
display(step)

0         0
0       NaN
0    350000
1    350000
1       NaN
1    320000
2    320000
2       NaN
2    312500
3    312500
3       NaN
3    287500
4    287500
4       NaN
4    382500
5    382500
5       NaN
5    375500
6    375500
6       NaN
6       NaN
Name: amount, dtype: float64
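
The step series is easier to follow once you see which line segments it produces. The sketch below is an illustration rather than part of the original script: it rebuilds step from the blank values above and lists the neighbouring non-NaN pairs, which are the only points a matplotlib line plot will join. The connectors name is hypothetical, and .iloc is used here to make the positional assignment explicit.

```python
import numpy as np
import pandas as pd

# Bar bases from the walkthrough, including the net total as the last entry.
blank = pd.Series([0, 350000, 320000, 312500, 287500, 382500, 375500], dtype=float)

step = blank.reset_index(drop=True).repeat(3).shift(-1)
step.iloc[1::3] = np.nan  # positional: blank out the middle value of each triple

# A line plot only joins neighbouring points that are both non-NaN,
# so collect those pairs to see what actually gets drawn.
x = step.index.to_numpy()
y = step.to_numpy()
connectors = [
    (int(x[i]), int(x[i + 1]), float(y[i]))
    for i in range(len(y) - 1)
    if not np.isnan(y[i]) and not np.isnan(y[i + 1])
]

for x_start, x_end, level in connectors:
    print(f"bar {x_start} -> bar {x_end} at {level:,.0f}")
```

Each surviving pair is a short horizontal line at the level where one bar ends and the next one begins, and the NaN values are what keep those lines from being joined into one zigzag.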

For the "net" row, we need to make sure that the blank value is 0 so that we don't double stack.

blank.loc['net'] = 0

Then, plot it and see what it looks like.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')

It looks pretty good, but let's try formatting the y-axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ syntax to drop the decimal places and add commas to the format.

def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)

from matplotlib.ticker import FuncFormatter
formatter = FuncFormatter(money)
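
To see what the format string does on its own, outside of matplotlib, here is a small sketch that calls money directly. The sample values are arbitrary, and None is passed for pos because the function does not use the tick position.

```python
def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)

# Arbitrary sample values; pos is unused, so None is fine here.
for value in (350000, 1234.56, -30000):
    print(money(value, None))
# prints:
# $350,000
# $1,235
# $-30,000
```

Note that the .0f part rounds to the nearest whole number rather than cutting the decimals off, and that the minus sign lands after the dollar sign for negative values.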

Then, put it all together.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')
my_plot.set_xlabel("Transaction Types")
my_plot.yaxis.set_major_formatter(formatter)

Full Script

The basic chart works, but I want to add some labels and make some minor formatting changes. Here's my final script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

#Use Python 2.7+ syntax to format currency
def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)
formatter = FuncFormatter(money)

#Data to plot. Don't include a total, it'll be calculated
index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

#Store data and create a blank series to use for the waterfall
trans = pd.DataFrame(data=data, index=index)
blank = trans.amount.cumsum().shift(1).fillna(0)

#Get the net total number for the final element in the waterfall
total = trans.sum().amount
trans.loc['net'] = total
blank.loc['net'] = total

#The steps graphically show the levels as well as being used for label placement
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan

#When plotting the last element, we want to show the full bar,
#Set the blank to 0
blank.loc['net'] = 0

#Plot and label
my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, figsize=(10, 5), title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')
my_plot.set_xlabel("Transaction Types")

#Format the axis for dollars
my_plot.yaxis.set_major_formatter(formatter)

#Get the y-axis position for the labels
y_height = trans.amount.cumsum().shift(1).fillna(0)

#Get an offset so labels don't sit right on top of the bar
#(use the scalar column maximum and avoid shadowing the built-in max)
max_amount = trans.amount.max()
neg_offset = max_amount / 25
pos_offset = max_amount / 50
plot_offset = int(max_amount / 15)

#Start label loop
loop = 0
for index, row in trans.iterrows():
    # For the last item in the list, we don't want to double count
    if row['amount'] == total:
        y = y_height.iloc[loop]
    else:
        y = y_height.iloc[loop] + row['amount']
    # Determine if we want a neg or pos offset
    if row['amount'] > 0:
        y += pos_offset
    else:
        y -= neg_offset
    my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
    loop += 1

#Scale up the y axis so there is room for the labels
my_plot.set_ylim(0, blank.max() + int(plot_offset))
#Rotate the labels
my_plot.set_xticklabels(trans.index, rotation=0)
my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')

Running the script will generate the following beautiful chart:

Final Thoughts

If you weren't familiar with waterfall charts, hopefully this example shows you how useful they can be. Some people might think it is a bit much to need so much scripting code for one chart. In some ways, I agree. If you're only making one waterfall chart and will never touch it again, you might as well stick with an Excel solution.

However, what if the waterfall chart is really useful and you need to reproduce it for 100 customers? What would you do next? Using Excel at that point would be a challenge, whereas using the script in this article to create 100 different charts would be fairly easy. Again, the real value of this approach is that it gives you an easily repeatable process when you need to scale the solution up.
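
One way to get that repeatability is to wrap the steps in a function and loop over the customers. The following is only a sketch under stated assumptions: the waterfall_chart function, the customers dictionary, its figures and the output file names are all hypothetical, the labels from the full script are left out for brevity, and the Agg backend is chosen so the loop can write files without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: write image files without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter


def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)


def waterfall_chart(amounts, title, filename):
    """Draw one waterfall chart from a Series of signed amounts and save it."""
    trans = pd.DataFrame({'amount': amounts})
    blank = trans.amount.cumsum().shift(1).fillna(0)

    # Net total for the final bar
    total = trans.amount.sum()
    trans.loc['net'] = total
    blank.loc['net'] = total

    # Connector lines between bars
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan

    # The net bar should start from zero
    blank.loc['net'] = 0

    ax = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title=title)
    ax.plot(step.index, step.values, 'k')
    ax.yaxis.set_major_formatter(FuncFormatter(money))
    ax.get_figure().savefig(filename, dpi=100, bbox_inches='tight')
    plt.close(ax.get_figure())
    return total


# Hypothetical per-customer data; in practice this might come from a file or database.
customers = {
    'customer_a': pd.Series([350000, -30000, -7500],
                            index=['sales', 'returns', 'credit fees']),
    'customer_b': pd.Series([120000, -5000, 8000],
                            index=['sales', 'returns', 'late charges']),
}

out_dir = tempfile.mkdtemp()  # hypothetical output location
totals = {}
for name, amounts in customers.items():
    path = os.path.join(out_dir, name + "_waterfall.png")
    totals[name] = waterfall_chart(amounts, name + " waterfall", path)

print(totals)
```

With the logic in one function, going from two customers to 100 only means adding entries to the input data.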

I really enjoy learning more about pandas, matplotlib and IPython. I'm glad this approach worked out, and I hope others can learn from it and apply this lesson to their daily work.
