How to use Python to create a waterfall chart of data

Source: Internet
Author: User
This article describes how to use Python to draw data waterfall diagrams. the Pandas and matplotlib libraries are used in this tutorial. For more information, see Introduction

Waterfall chart is a useful tool for drawing certain types of data. Not surprisingly, we can use Pandas and matplotlib to create a reusable waterfall chart.

Before proceeding, I want to tell you which type of chart I refer. I will create a 2D waterfall chart described in Wikipedia.

A typical use of this type of chart is to display the + and-values that act as a "bridge" between the start value and the end value. For this reason, financial personnel sometimes call it a bridge. Similar to other examples I used earlier, this type of drawing is not easy to generate in Excel. of course, there must be a way to generate it, but it is not easy to remember.

The key point to remember about a waterfall chart is that it is essentially a stacked bar chart, but it has a blank bottom bar, so the top bar will be "suspended" in the air. Let's get started.
Create a chart

First, execute the standard input and ensure that IPython can display the matplot diagram.

import numpy as npimport pandas as pdimport matplotlib.pyplot as plt %matplotlib inline

Set the data we want to plot the waterfall chart and load it into the data frame (DataFrame.

Data needs to start with your starting value, but you need to give the final total number. We will calculate it below.

index = ['sales','returns','credit fees','rebates','late charges','shipping']data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}trans = pd.DataFrame(data=data,index=index)

I used the convenient display function in IPython to control the content that I want to display more easily.

from IPython.display import displaydisplay(trans)

The greatest skill of a waterfall chart is to calculate the content of the stacked bar chart at the bottom. I have learned a lot from the stackoverflow discussion.

First, we get the sum.

display(trans.amount.cumsum())sales      350000returns     320000credit fees   312500rebates     287500late charges  382500shipping    375500Name: amount, dtype: int64

This looks good, but we need to move data from one location to the right.

blank=trans.amount.cumsum().shift(1).fillna(0)display(blank) sales        0returns     350000credit fees   320000rebates     312500late charges  287500shipping    382500Name: amount, dtype: float64

We need to add a net total to the trans and blank data frames.

total = trans.sum().amounttrans.loc["net"] = totalblank.loc["net"] = totaldisplay(trans)display(blank)

sales        0returns     350000credit fees   320000rebates     312500late charges  287500shipping    382500net       375500Name: amount, dtype: float64

Create steps for displaying changes.

step = blank.reset_index(drop=True).repeat(3).shift(-1)step[1::3] = np.nandisplay(step) 0     00    NaN0  3500001  3500001    NaN1  3200002  3200002    NaN2  3125003  3125003    NaN3  2875004  2875004    NaN4  3825005  3825005    NaN5  3755006  3755006    NaN6    NaNName: amount, dtype: float64

For "net" rows, in order not to double the stack, we need to ensure that the blank value is 0.

blank.loc["net"] = 0

Then, draw a picture and see what it looks like.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, title="2014 Sales Waterfall")my_plot.plot(step.index, step.values,'k')

It looks pretty good, but let's try to format the y axis to make it more readable. To this end, we use FuncFormatter and Python2.7 + syntax to cut off decimals and add a comma to the format.

def money(x, pos):  'The two args are the value and tick position'  return "${:,.0f}".format(x) from matplotlib.ticker import FuncFormatterformatter = FuncFormatter(money)

Then, combine them.

my_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, title="2014 Sales Waterfall")my_plot.plot(step.index, step.values,'k')my_plot.set_xlabel("Transaction Types")my_plot.yaxis.set_major_formatter(formatter)

Complete Script

The basic graph works normally, but I want to add some labels and make some small format modifications. The following is my final script:

import numpy as npimport pandas as pdimport matplotlib.pyplot as pltfrom matplotlib.ticker import FuncFormatter #Use python 2.7+ syntax to format currencydef money(x, pos):  'The two args are the value and tick position'  return "${:,.0f}".format(x)formatter = FuncFormatter(money) #Data to plot. Do not include a total, it will be calculatedindex = ['sales','returns','credit fees','rebates','late charges','shipping']data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]} #Store data and create a blank series to use for the waterfalltrans = pd.DataFrame(data=data,index=index)blank = trans.amount.cumsum().shift(1).fillna(0) #Get the net total number for the final element in the waterfalltotal = trans.sum().amounttrans.loc["net"]= totalblank.loc["net"] = total #The steps graphically show the levels as well as used for label placementstep = blank.reset_index(drop=True).repeat(3).shift(-1)step[1::3] = np.nan #When plotting the last element, we want to show the full bar,#Set the blank to 0blank.loc["net"] = 0 #Plot and labelmy_plot = trans.plot(kind='bar', stacked=True, bottom=blank,legend=None, figsize=(10, 5), title="2014 Sales Waterfall")my_plot.plot(step.index, step.values,'k')my_plot.set_xlabel("Transaction Types") #Format the axis for dollarsmy_plot.yaxis.set_major_formatter(formatter) #Get the y-axis position for the labelsy_height = trans.amount.cumsum().shift(1).fillna(0) #Get an offset so labels don't sit right on top of the barmax = trans.max()neg_offset = max / 25pos_offset = max / 50plot_offset = int(max / 15) #Start label looploop = 0for index, row in trans.iterrows():  # For the last item in the list, we don't want to double count  if row['amount'] == total:    y = y_height[loop]  else:    y = y_height[loop] + row['amount']  # Determine if we want a neg or pos offset  if row['amount'] > 0:    y += pos_offset  else:    y -= neg_offset  my_plot.annotate("{:,.0f}".format(row['amount']),(loop,y),ha="center")  loop+=1 #Scale up the y axis so there is room for the labelsmy_plot.set_ylim(0,blank.max()+int(plot_offset))#Rotate the labelsmy_plot.set_xticklabels(trans.index,rotation=0)my_plot.get_figure().savefig("waterfall.png",dpi=200,bbox_inches='tight')

Run the script to generate the following beautiful chart:

Last thought

If you are not familiar with the waterfall chart before, I hope this example can show you how useful it is. I think some people may think it is a bit bad to need so much script code for a chart. In some respects, I agree with this idea. If you just make a waterfall chart and will not touch it again later, you should continue to use the Excel method.

However, what if a waterfall chart is really useful and you need to copy it to 100 customers? What will you do next? Using Excel is a challenge, and using the scripts in this article to create 100 different tables is quite easy. Once again, the true value of this program is that it facilitates you to create a program that is easy to copy when you need to expand this solution.

I really like learning more about Pandas, matplotlib, and IPothon. I'm glad that this method can help you, and hope other people can learn some knowledge from it and apply it to their daily work.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.