Introduction
Waterfall charts are a useful tool for plotting certain types of data. Not surprisingly, we can use pandas and matplotlib to create a repeatable waterfall chart.
Before going further, I want to be clear about which kind of chart I'm referring to. I will build the 2D waterfall chart described in the Wikipedia article.
A typical use of this kind of chart is to show the positive and negative values that "bridge" a starting value and an ending value. For this reason, finance people sometimes call it a bridge chart. Like the other examples I have used before, this type of chart is not easy to generate in Excel. There are certainly ways to do it, but they are not easy to remember.
The key thing to remember about a waterfall chart is that it is essentially a stacked bar chart with one special feature: the bottom bar is blank, so the top bar appears to "float" in the air. Let's get started.
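To make the "stacked bar with a blank bottom" idea concrete, here is a minimal sketch using plain matplotlib and three made-up values (the labels, numbers and the file name floating_bars.png are illustrative assumptions, not data from this article). The invisible bottom segment is drawn with no fill and no edge, and the visible segment is placed on top of it with the `bottom` argument of `Axes.bar`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

# Illustrative numbers only: a starting value, one decrease, and one increase
labels = ["start", "decrease", "increase"]
heights = [100, -30, 50]
# Each floating bar starts where the previous running total ended
bottoms = [0, 100, 70]

fig, ax = plt.subplots()
# The "blank" bottom segment: no fill and no edge, so it is invisible
ax.bar(labels, bottoms, color="none", edgecolor="none")
# The visible segment sits on top of the blank one via the `bottom` argument
visible = ax.bar(labels, heights, bottom=bottoms)
fig.savefig("floating_bars.png")
```

Strictly speaking, `bottom` alone is enough to float the visible bars; the invisible bars are drawn only to mirror the stacked-bar picture described above.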
Create a chart
First, do the standard imports and make sure IPython will display matplotlib plots.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
Set up the data we want to plot as a waterfall and load it into a DataFrame.
The data needs to start with your starting value, but leave out the final total. We will calculate it below.
    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
    trans = pd.DataFrame(data=data, index=index)
I am using the handy display function in IPython to make it easier to control what gets shown.
    from IPython.display import display
    display(trans)
The biggest trick with a waterfall chart is figuring out what the bottom (blank) part of each stacked bar should be. I learned a lot from a discussion on StackOverflow about this.
First, we get the cumulative sum.
    display(trans.amount.cumsum())

    sales           350000
    returns         320000
    credit fees     312500
    rebates         287500
    late charges    382500
    shipping        375500
    Name: amount, dtype: int64
This looks good, but we need to shift the data one place to the right, so that each bar starts where the previous running total ended.
    blank = trans.amount.cumsum().shift(1).fillna(0)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    Name: amount, dtype: float64
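Why does shifting the running total give the blank bar? The bottom of bar i is the cumulative sum through bar i-1, and its top is that bottom plus the bar's own amount, which is the cumulative sum through bar i. The short check below, using the same numbers as this article, confirms that `blank + amount` reproduces the running total.

```python
import pandas as pd

index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
trans = pd.DataFrame({'amount': [350000, -30000, -7500, -25000, 95000, -7000]}, index=index)

running_total = trans.amount.cumsum()
# Bottom of each bar = running total *before* that transaction (0 for the first bar)
blank = running_total.shift(1).fillna(0)
# Top of each bar = bottom + the transaction amount, i.e. the running total again
top = blank + trans.amount

print(pd.DataFrame({'bottom': blank, 'amount': trans.amount, 'top': top}))
```

For negative amounts the "top" is actually the lower edge of the bar, since matplotlib draws a bar with a negative height downward from its bottom.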
We need to add a net total row to both trans and blank.
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total
    display(trans)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    net             375500
    Name: amount, dtype: float64
Create the steps we use to show the changes between bars.
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan
    display(step)

    0         0
    0       NaN
    0    350000
    1    350000
    1       NaN
    1    320000
    2    320000
    2       NaN
    2    312500
    3    312500
    3       NaN
    3    287500
    4    287500
    4       NaN
    4    382500
    5    382500
    5       NaN
    5    375500
    6    375500
    6       NaN
    6       NaN
    Name: amount, dtype: float64
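The `repeat(3).shift(-1)` trick is easier to follow on a tiny series. Each bar contributes three points: its own level, a `NaN`, and the next bar's level shifted into its slot. Because matplotlib breaks a line wherever it meets a `NaN`, what remains is one short horizontal segment from each bar to the next. The three levels below are made up (the blank values of a start bar, a decrease of 30, and a net bar); the sketch uses `.iloc` for the positional assignment, which avoids any ambiguity with the repeated integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical blank values for three bars: start (0), a decrease from 100, and the net total (70)
levels = pd.Series([0.0, 100.0, 70.0])

step = levels.repeat(3).shift(-1)
# Every third point (positions 1, 4, 7) becomes NaN to break the line between bars
step.iloc[1::3] = np.nan

# Pair up x (bar position) and y (level) to see which line segments get drawn
points = list(zip(step.index, step.values))
print(points)
```

The surviving pairs (0, 100) to (1, 100) and (1, 70) to (2, 70) are exactly the connector lines from the top of one bar to the next.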
For the "net" line, to not double the stack, we need to make sure that the blank value is 0.
blank.loc["NET"] = 0
Then, draw it and see what it looks like.
My_plot = Trans.plot (kind= ' bar ', Stacked=true, Bottom=blank,legend=none, title= "Sales Waterfall") My_plot.plot ( Step.index, Step.values, ' K ')
It looks pretty good, but let's try formatting the y-axis to make it more readable. To do this, we use funcformatter and some python2.7+ syntax to truncate the decimal and add a comma to the format.
def money (x, POS): ' The both args is the value and tick position ' return ' ${:,.0f} '. Format (x) from Matplotlib.tick ER import funcformatterformatter = funcformatter (Money)
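A quick way to sanity-check the formatter is to call it directly, since matplotlib formatters are callable as `(value, position)`. As a swapped-in alternative that is not what this article's script uses, matplotlib also provides `StrMethodFormatter`, which takes a format string in which the tick value is named `x`; the sketch below shows both produce the same label.

```python
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

def money(x, pos):
    'The two args are the value and tick position'
    return '${:,.0f}'.format(x)

formatter = FuncFormatter(money)
# Equivalent formatter built from a format string; the tick value is passed as `x`
str_formatter = StrMethodFormatter('${x:,.0f}')

print(formatter(350000.4, 0))      # $350,000
print(str_formatter(350000.4, 0))  # $350,000
```

Note that negative values come out as `$-7,500` with this format string; a custom function like `money` is the easier place to change that if you prefer `-$7,500`.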
Then pull it all together:
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=False, title="Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")
    my_plot.yaxis.set_major_formatter(formatter)
Full Script
The basic graph works, but I wanted to add labels and make some minor formatting changes. Here is my final script:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    # Use Python 2.7+ format syntax to format currency
    def money(x, pos):
        'The two args are the value and tick position'
        return '${:,.0f}'.format(x)

    formatter = FuncFormatter(money)

    # Data to plot. Don't include a total; it will be calculated
    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

    # Store data and create a blank series to use for the waterfall
    trans = pd.DataFrame(data=data, index=index)
    blank = trans.amount.cumsum().shift(1).fillna(0)

    # Get the net total number for the final element in the waterfall
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total

    # The steps graphically show the levels and are also used for label placement
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan

    # When plotting the last element, we want to show the full bar,
    # so set its blank to 0
    blank.loc["net"] = 0

    # Plot and label
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=False,
                         figsize=(10, 5), title="Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")

    # Format the axis for dollars
    my_plot.yaxis.set_major_formatter(formatter)

    # Get the y-axis position for the labels
    y_height = trans.amount.cumsum().shift(1).fillna(0)

    # Get an offset so labels don't sit right on top of the bar
    max_amount = trans.amount.max()
    neg_offset = max_amount / 25
    pos_offset = max_amount / 50
    plot_offset = int(max_amount / 15)

    # Start label loop
    for loop, (label, row) in enumerate(trans.iterrows()):
        # For the net bar, we don't want to double count
        if label == "net":
            y = y_height.iloc[loop]
        else:
            y = y_height.iloc[loop] + row['amount']
        # Determine if we want a negative or positive offset
        if row['amount'] > 0:
            y += pos_offset
        else:
            y -= neg_offset
        my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")

    # Scale up the y axis so there is room for the labels
    my_plot.set_ylim(0, blank.max() + plot_offset)
    # Rotate the labels
    my_plot.set_xticklabels(trans.index, rotation=0)
    my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')
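The label-placement logic in the script's loop is easier to test when it is pulled out into a small function. The sketch below is a hypothetical refactoring (the name `label_positions` and its `net_label` parameter are mine, not part of the script): it puts each label just above a positive bar's top or just below a negative bar's lower edge, using the same offsets as the script, and identifies the net row by its label rather than by comparing amounts.

```python
import pandas as pd

def label_positions(amounts, net_label="net"):
    """Hypothetical helper: y position for each bar's label.

    `amounts` is the 'amount' column including the final net row.
    """
    max_amount = amounts.max()
    pos_offset = max_amount / 50
    neg_offset = max_amount / 25
    # Running total before each bar, i.e. the bottom of each floating bar
    y_height = amounts.cumsum().shift(1).fillna(0)

    positions = []
    for i, (label, amount) in enumerate(amounts.items()):
        if label == net_label:
            # The net bar starts at 0, so its top is the running total before it
            y = y_height.iloc[i]
        else:
            y = y_height.iloc[i] + amount
        # Nudge the label up for increases and down for decreases
        y += pos_offset if amount > 0 else -neg_offset
        positions.append(y)
    return pd.Series(positions, index=amounts.index)

index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping', 'net']
amounts = pd.Series([350000, -30000, -7500, -25000, 95000, -7000, 375500],
                    index=index, name='amount')
print(label_positions(amounts))
```

Checking the label for the net row by name avoids a subtle issue: a test like `amount == total` would also match any ordinary transaction that happened to equal the net total.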
Running the script generates the finished chart and saves it as waterfall.png.
Final Thoughts
If you were not familiar with waterfall charts, hopefully this example shows you how useful they can be. I imagine some people may think that this is a lot of code for one chart. In some ways, I agree. If you are only making one waterfall chart and will never touch it again, stick with an Excel solution.
However, what if the chart is really useful and you need to replicate it for 100 customers? What would you do then? Doing it in Excel would be a challenge, while using the script in this article to create 100 different charts would be fairly easy. The real value of this process is that it gives you an easily repeatable program when you need to scale the solution.
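To make the "100 customers" scenario concrete, here is a minimal sketch that wraps the steps from this article into a function and calls it once per data set. The function name `plot_waterfall`, the customer names and their numbers are all hypothetical; a real version would load each customer's transactions from wherever they are stored. It uses the non-interactive Agg backend so it can run as a plain script, and it leaves out the step line and the value labels for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

def money(x, pos):
    return '${:,.0f}'.format(x)

def plot_waterfall(amounts, title, filename):
    """Hypothetical helper: save a basic waterfall chart for one series of amounts."""
    trans = pd.DataFrame({'amount': amounts})
    blank = trans.amount.cumsum().shift(1).fillna(0)
    total = trans.amount.sum()
    trans.loc['net'] = total
    blank.loc['net'] = 0  # the net bar is drawn from zero
    ax = trans.plot(kind='bar', stacked=True, bottom=blank, legend=False, title=title)
    ax.yaxis.set_major_formatter(FuncFormatter(money))
    ax.get_figure().savefig(filename, bbox_inches='tight')
    plt.close(ax.get_figure())  # free memory when producing many charts
    return trans, blank

# Made-up example data for two customers
customers = {
    'customer_a': pd.Series([350000, -30000, 95000], index=['sales', 'returns', 'late charges']),
    'customer_b': pd.Series([120000, -5000, -2500], index=['sales', 'returns', 'shipping']),
}
for name, amounts in customers.items():
    plot_waterfall(amounts, f"{name} waterfall", f"waterfall_{name}.png")
```

Closing each figure after saving matters once the loop runs over many customers, because pyplot otherwise keeps every figure open in memory.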
I really enjoy learning more about pandas, matplotlib and IPython. I'm glad this approach worked for me, and I hope others can learn something from it and apply the lessons to their daily work.