Introduction
Waterfall charts are a useful tool for plotting certain types of data. Not surprisingly, we can use pandas and matplotlib to create a repeatable waterfall chart.
Before going any further, I want to be clear about which type of chart I mean. I will build the 2D waterfall chart described in the Wikipedia article.
A typical use of this chart is to show how positive and negative values "bridge" the gap between a starting value and an ending value. For this reason, finance people sometimes call it a bridge chart. Like the other examples I have used before, this type of chart is not easy to produce in Excel. There are certainly ways to do it, but they are not easy to remember.
The key thing to keep in mind about a waterfall chart is that it is essentially a stacked bar chart. The special part is that the bottom bar is blank, so the top bar "floats" in the air. So, let's get started.
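To make the floating-bar idea concrete, here is a minimal sketch that uses matplotlib directly. The labels and numbers are made up for illustration, and the `Agg` backend is only an assumption so the snippet runs without a display. Each bar is drawn from an invisible `bottom` offset rather than from zero.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical values, for illustration only
positions = [0, 1, 2]
heights = [100, -30, 70]   # visible height of each bar (negative means a drop)
bottoms = [0, 100, 0]      # blank offset that each bar floats on

fig, ax = plt.subplots()
bars = ax.bar(positions, heights, bottom=bottoms)
ax.set_xticks(positions)
ax.set_xticklabels(["start", "change", "end"])

# Each bar spans from its bottom to bottom + height
spans = [(bar.get_y(), bar.get_y() + bar.get_height()) for bar in bars]
plt.close(fig)
```

The middle bar starts at 100 and reaches down to 70, which is exactly where the final bar tops out. That is the "bridge" effect in miniature.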
Create a chart
First, do the standard imports and make sure IPython will display matplotlib plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Set up the data we want to plot as a waterfall chart and load it into a DataFrame.
The data needs to begin with your starting value, but you should leave out the final total. We will calculate it below.
index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
trans = pd.DataFrame(data=data, index=index)
I am using the handy display function in IPython to make it easier to control what I want to show.
from IPython.display import display
display(trans)
The biggest trick with a waterfall chart is working out what goes into the blank stacked bar at the bottom. I learned a lot about this from a discussion on StackOverflow.
First, we get the cumulative sum.
display(trans.amount.cumsum())
sales           350000
returns         320000
credit fees     312500
rebates         287500
late charges    382500
shipping        375500
Name: amount, dtype: int64
This looks good, but we need to shift the data one position so that each bar starts where the previous running total ended.
blank = trans.amount.cumsum().shift(1).fillna(0)
display(blank)
sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
Name: amount, dtype: float64
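As a quick sanity check on why the shift works, the sketch below rebuilds the same data and confirms that each blank offset plus its bar height lands on the running total. The names `running_total` and `tops` are introduced here only for illustration.

```python
import pandas as pd

index = ["sales", "returns", "credit fees", "rebates", "late charges", "shipping"]
trans = pd.DataFrame(
    {"amount": [350000, -30000, -7500, -25000, 95000, -7000]}, index=index
)

running_total = trans.amount.cumsum()
# Each bar starts where the previous running total ended; the first starts at 0
blank = running_total.shift(1).fillna(0)

# Blank offset + bar height should equal the running total for every row
tops = blank + trans.amount
all_match = bool((tops == running_total).all())
print(all_match)
```

For example, the "returns" bar floats on 350000 and has a height of -30000, so it ends at 320000, which is where the "credit fees" bar begins.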
We need to add a net total to both the trans DataFrame and the blank Series.
total = trans.sum().amount
trans.loc["net"] = total
blank.loc["net"] = total
display(trans)
display(blank)
sales                0
returns         350000
credit fees     320000
rebates         312500
late charges    287500
shipping        382500
net             375500
Name: amount, dtype: float64
Next, create the steps we use to show the change from one bar to the next.
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan
display(step)
0         0
0       NaN
0    350000
1    350000
1       NaN
1    320000
2    320000
2       NaN
2    312500
3    312500
3       NaN
3    287500
4    287500
4       NaN
4    382500
5    382500
5       NaN
5    375500
6    375500
6       NaN
6       NaN
Name: amount, dtype: float64
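The repeat/shift/NaN pattern can be hard to read at first, so here is a small sketch of what it produces. It uses a shortened, hypothetical series of three bar bottoms and `.iloc` for the positional assignment (my choice for this sketch). The `segments` list is only there to show which points a line plot would join, since matplotlib breaks the line wherever it meets a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical shortened series of bar bottoms, for illustration only
blank = pd.Series([0.0, 350000.0, 320000.0])

# Three points per bar, shifted up by one, with every middle point blanked out
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step.iloc[1::3] = np.nan

xs = step.index.to_numpy()
ys = step.to_numpy()

# Neighbouring non-NaN points are the pieces of line that actually get drawn
segments = [
    ((int(xs[i]), float(ys[i])), (int(xs[i + 1]), float(ys[i + 1])))
    for i in range(len(xs) - 1)
    if not np.isnan(ys[i]) and not np.isnan(ys[i + 1])
]
print(segments)
```

Every surviving segment is a horizontal line that runs from one bar position to the next at the height where the next bar starts. These are the connectors between bars.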
For the "net" row, we need to make sure the blank value is 0 so the bar is not stacked on top of the running total and counted twice.
blank.loc["net"] = 0
Then plot it and see what it looks like.
my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')
It looks pretty good, but let's format the y-axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ format syntax to drop the decimals and add a thousands separator.
def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)
from matplotlib.ticker import FuncFormatter
formatter = FuncFormatter(money)
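To see what the formatter does to individual tick values, the function can be called on its own, outside matplotlib. This is just a quick sketch; the sample values are arbitrary and the `pos` argument is ignored, as it is in the function above.

```python
def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)

# Arbitrary sample values; pos is unused by the function
print(money(350000, 0))    # $350,000
print(money(1234567.8, 1)) # $1,234,568
print(money(-7500, 2))     # $-7,500
```

Note that negative values keep the minus sign after the dollar sign. That is fine for this chart because the y-axis never goes below zero.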
Then pull it all together.
my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')
my_plot.set_xlabel("Transaction Types")
my_plot.yaxis.set_major_formatter(formatter)
Full Script
The basic graphics work, but I want to add some labels and make some minor formatting changes. Here's my final script:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

#Use Python 2.7+ syntax to format currency
def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)
formatter = FuncFormatter(money)

#Data to plot. Don't include a total, it'll be calculated
index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

#Store data and create a blank series to use for the waterfall
trans = pd.DataFrame(data=data, index=index)
blank = trans.amount.cumsum().shift(1).fillna(0)

#Get the net total number for the final element in the waterfall
total = trans.sum().amount
trans.loc["net"] = total
blank.loc["net"] = total

#The steps graphically show the levels as well as used for label placement
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step[1::3] = np.nan

#When plotting the last element, we want to show the full bar,
#Set the blank to 0
blank.loc["net"] = 0

#Plot and label
my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, figsize=(10, 5), title="2014 Sales Waterfall")
my_plot.plot(step.index, step.values, 'k')
my_plot.set_xlabel("Transaction Types")

#Format the axis for dollars
my_plot.yaxis.set_major_formatter(formatter)

#Get the y-axis position for the labels
y_height = trans.amount.cumsum().shift(1).fillna(0)

#Get an offset so labels don't sit right on top of the bar
max_amount = trans.amount.max()
neg_offset = max_amount / 25
pos_offset = max_amount / 50
plot_offset = int(max_amount / 15)

#Start label loop
loop = 0
for index, row in trans.iterrows():
    # For the last item in the list, we don't want to double count
    if row['amount'] == total:
        y = y_height.iloc[loop]
    else:
        y = y_height.iloc[loop] + row['amount']
    # Determine if we want a neg or pos offset
    if row['amount'] > 0:
        y += pos_offset
    else:
        y -= neg_offset
    my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
    loop += 1

#Scale up the y axis so there is room for the labels
my_plot.set_ylim(0, blank.max() + int(plot_offset))
#Rotate the labels
my_plot.set_xticklabels(trans.index, rotation=0)
my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')
Running the script will generate the following beautiful chart:
Final Thoughts
If you weren't familiar with waterfall charts, hopefully this example shows you how useful they can be. I suspect some people will think this is an awful lot of scripting for one chart. In some ways, I agree. If you are only making one waterfall chart and will never touch it again, you are probably better off sticking with Excel.
However, what if the chart is really useful and you now need to replicate it for 100 customers? What would you do next? Doing that in Excel would be a challenge, while using the script in this article to create 100 different charts would be fairly easy. Once again, the real value of this process is that it gives you an easily repeatable procedure when you need to scale the solution.
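As a rough sketch of what that replication could look like, the core steps can be wrapped in a function and called once per customer. Everything here is hypothetical: the helper name `plot_waterfall`, the customer names, and the amounts are invented for illustration. The `Agg` backend is assumed so the loop runs without a display, and the labels and connector lines from the full script are left out to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for batch runs
import matplotlib.pyplot as plt
import pandas as pd

def plot_waterfall(amounts, title):
    """Hypothetical helper: draw a basic waterfall for a Series of signed amounts."""
    trans = pd.DataFrame({"amount": amounts})
    blank = trans.amount.cumsum().shift(1).fillna(0)
    total = trans.amount.sum()
    trans.loc["net"] = total
    blank.loc["net"] = 0  # the net bar is drawn from zero, not stacked
    ax = trans.plot(kind="bar", stacked=True, bottom=blank.to_numpy(),
                    legend=False, title=title)
    return ax, total

# Made-up customers and amounts, for illustration only
customers = {
    "Customer A": pd.Series([1000, -200, 50], index=["sales", "returns", "late charges"]),
    "Customer B": pd.Series([500, -75, -25], index=["sales", "returns", "shipping"]),
}

totals = {}
for name, amounts in customers.items():
    ax, totals[name] = plot_waterfall(amounts, "{} Waterfall".format(name))
    # A real run could save each figure here with ax.get_figure().savefig(...)
    plt.close(ax.get_figure())

print(totals)
```

With the chart logic in one function, going from 2 customers to 100 is only a matter of feeding in more data.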
I really enjoy learning more about pandas, matplotlib and IPython. I am glad this approach worked out, and I hope others can learn something from it and apply the lesson to their daily work.