Creating a waterfall chart with Python

Source: Internet
Author: User
Introduction

Waterfall charts are a useful tool for plotting certain types of data. Not surprisingly, we can use pandas and matplotlib to create a repeatable waterfall chart.

Before going any further, I want to be clear about what kind of chart I mean. I will build the 2D waterfall chart described in the Wikipedia article.

A typical use of this kind of chart is to show the positive and negative values that "bridge" a start value and an end value. For this reason, finance people sometimes call it a bridge chart. As with the other examples I have used before, this type of chart is not easy to generate in Excel. There is certainly a way to do it, but it is not easy to remember.

The key thing to remember about a waterfall chart is that it is essentially a stacked bar chart. What makes it special is that the bottom bar is blank, so the top bar appears to "float" in the air. Let's get started.
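To make the floating-bar idea concrete, here is a minimal sketch using plain matplotlib. The values and labels are hypothetical, and it uses the `bottom` argument of `bar` in place of drawing an invisible lower bar, which gives the same visual result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical values: start at 100, then drop by 30.
labels = ["start", "change"]
heights = [100, 30]  # visible length of each bar
bottoms = [0, 70]    # where each visible bar begins

fig, ax = plt.subplots()
bars = ax.bar(labels, heights, bottom=bottoms)

# The second bar covers 70..100, so it hangs level with the top of the first.
second = bars[1]
print(second.get_y(), second.get_y() + second.get_height())
plt.close(fig)
```

Everything that follows in the walkthrough is about computing the right `bottom` values from the data.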
Create a chart

First, run the standard imports and make sure IPython can display matplotlib plots.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

Set up the data we want to plot in the waterfall chart and load it into a DataFrame.

The data needs to start with your starting value, but you do not need to supply the final total. We will calculate it below.

    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
    trans = pd.DataFrame(data=data, index=index)

I use the convenient display function in IPython to more easily control what I want to display.

    from IPython.display import display
    display(trans)

The main trick with a waterfall chart is working out what the bottom bar of the stack should be. I learned a lot about this from a discussion on Stack Overflow.

First, we get the cumulative sum.

    display(trans.amount.cumsum())

    sales           350000
    returns         320000
    credit fees     312500
    rebates         287500
    late charges    382500
    shipping        375500
    Name: amount, dtype: int64

This looks good, but we need to shift the data by one position, so that each row holds the running total before that row's amount is applied.

    blank = trans.amount.cumsum().shift(1).fillna(0)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    Name: amount, dtype: float64
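As a quick sanity check on the shift, here is a minimal sketch that rebuilds the same series and confirms that each bar, drawn from its blank bottom for the length of its amount, ends exactly on the running total. The names `amounts` and `running_total` are mine and do not appear in the walkthrough.

```python
import pandas as pd

amounts = pd.Series(
    [350000, -30000, -7500, -25000, 95000, -7000],
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
    name="amount",
)

running_total = amounts.cumsum()

# shift(1) moves every value down one row; the first row becomes NaN,
# so fillna(0) makes the first bar start from zero.
blank = running_total.shift(1).fillna(0)

# Each bar starts at `blank` and is `amounts` long, so it ends on the running total.
assert (blank + amounts == running_total).all()
print(blank)
```

A positive amount grows upward from the blank value and a negative amount hangs down from it, which is why the same bottom works for both.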

We need to add a net total row to both trans and blank.

    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total
    display(trans)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    net             375500
    Name: amount, dtype: float64

Create the steps we use to show the changes.

    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan
    display(step)

    0         0
    0       NaN
    0    350000
    1    350000
    1       NaN
    1    320000
    2    320000
    2       NaN
    2    312500
    3    312500
    3       NaN
    3    287500
    4    287500
    4       NaN
    4    382500
    5    382500
    5       NaN
    5    375500
    6    375500
    6       NaN
    6       NaN
    Name: amount, dtype: float64
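It may not be obvious why this series draws connector lines. The sketch below, on a hypothetical three-bar example, lists the line segments that a line plot would actually draw: only consecutive points that are both non-NaN get joined. I use `.iloc` for the positional assignment, and the `segments` list is my own illustration rather than part of the walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical bar bottoms: start at 0, a drop that hangs from 100, net total 70.
blank = pd.Series([0.0, 100.0, 70.0], index=["start", "drop", "net"])

# Triple every value, shift up by one, then blank out the middle of each triple.
step = blank.reset_index(drop=True).repeat(3).shift(-1)
step.iloc[1::3] = np.nan

x = step.index.to_numpy()
y = step.to_numpy()

# A line plot joins neighbouring points only when neither of them is NaN.
segments = [
    ((int(x[i]), float(y[i])), (int(x[i + 1]), float(y[i + 1])))
    for i in range(len(y) - 1)
    if not np.isnan(y[i]) and not np.isnan(y[i + 1])
]
print(segments)
```

Each surviving segment is horizontal and runs from one bar position to the next at the level where the next bar begins, which is the connector you see between bars in the chart.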

For the "net" row, we need to make sure the blank value is 0 so that the bar is not stacked on top of the total a second time.

    blank.loc["net"] = 0

Then, draw it and see what it looks like.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')

It looks pretty good, but let's format the y-axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ format syntax to drop the decimals and add thousands separators.

    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)

    from matplotlib.ticker import FuncFormatter
    formatter = FuncFormatter(money)
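To see what the formatter produces without drawing anything, you can call it directly. This is a small sketch with sample values of my own choosing; note that the `.0f` format rounds to the nearest whole number, it does not truncate.

```python
from matplotlib.ticker import FuncFormatter


def money(x, pos):
    """Format a tick value as whole dollars; `pos` is the tick position (unused)."""
    return "${:,.0f}".format(x)


formatter = FuncFormatter(money)

# A FuncFormatter is callable with (value, position), the way matplotlib calls it.
print(formatter(350000, 0))   # $350,000
print(formatter(-7500.4, 1))  # $-7,500
print(formatter(1999.6, 2))   # $2,000
```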

Then, put it all together.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None, title="Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")
    my_plot.yaxis.set_major_formatter(formatter)

Full Script

The basic chart works fine, but I want to add some labels and make some minor formatting changes. Here's my final script:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    # Use Python 2.7+ syntax to format currency
    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)
    formatter = FuncFormatter(money)

    # Data to plot. Don't include a total, it will be calculated
    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

    # Store data and create a blank series to use for the waterfall
    trans = pd.DataFrame(data=data, index=index)
    blank = trans.amount.cumsum().shift(1).fillna(0)

    # Get the net total number for the final element in the waterfall
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total

    # The steps graphically show the levels and are also used for label placement
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan

    # When plotting the last element, we want to show the full bar,
    # so set the blank to 0
    blank.loc["net"] = 0

    # Plot and label
    # (the figsize width of 10 is assumed; only the height of 5 is legible in the source)
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         figsize=(10, 5), title="Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")

    # Format the axis for dollars
    my_plot.yaxis.set_major_formatter(formatter)

    # Get the y-axis position for the labels
    y_height = trans.amount.cumsum().shift(1).fillna(0)

    # Get an offset so labels don't sit right on top of the bar
    # (use the scalar column maximum, and avoid shadowing the built-in max)
    max_amount = trans.amount.max()
    neg_offset = max_amount / 25
    pos_offset = max_amount / 50
    plot_offset = int(max_amount / 15)

    # Start label loop
    loop = 0
    for index, row in trans.iterrows():
        # For the last item in the list, we don't want to double count
        if row['amount'] == total:
            y = y_height.iloc[loop]
        else:
            y = y_height.iloc[loop] + row['amount']
        # Determine if we want a neg or pos offset
        if row['amount'] > 0:
            y += pos_offset
        else:
            y -= neg_offset
        my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
        loop += 1

    # Scale up the y axis so there is room for the labels
    my_plot.set_ylim(0, blank.max() + plot_offset)

    # Rotate the labels
    my_plot.set_xticklabels(trans.index, rotation=0)
    my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')

Running the script will generate the following beautiful chart:
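The label placement in the full script is probably its least obvious part, so here is a small sketch of that rule as a standalone function. The function name `label_height` and its parameters are hypothetical, introduced only for illustration. The offsets use the script's ratios of 1/50 and 1/25 of the largest amount, which is the net total of 375,500 in this data.

```python
def label_height(amount, running_before, is_net, pos_offset, neg_offset):
    """Return the y position for a bar's label.

    running_before is the running total before this row is applied.
    For the net row it already equals the total, so the amount is not added again.
    """
    bar_end = running_before if is_net else running_before + amount
    # Positive bars get their label just above the top,
    # negative bars get it a little further below the bottom.
    if amount > 0:
        return bar_end + pos_offset
    return bar_end - neg_offset


max_amount = 375500
pos_offset = max_amount / 50  # 7510.0
neg_offset = max_amount / 25  # 15020.0

print(label_height(350000, 0, False, pos_offset, neg_offset))      # sales
print(label_height(-30000, 350000, False, pos_offset, neg_offset)) # returns
print(label_height(375500, 375500, True, pos_offset, neg_offset))  # net
```

The negative offset is twice the positive one because text is anchored at its baseline: a label below a bar needs extra room for its own height.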

Final thoughts

If you were not familiar with waterfall charts, hopefully this example shows you how useful they can be. I imagine some people will think this is an awful lot of scripting for a single chart. In some ways, I agree. If you are only making one waterfall chart and will never touch it again, stick with Excel.

However, what if the chart is really useful and you need to replicate it for 100 customers? What would you do next? Using Excel at that point would be a challenge, whereas using the script in this article to create 100 different charts would be fairly easy. Again, the real value of this process is that it gives you an easily repeatable procedure when you need to scale the solution.

I really enjoy learning more about pandas, matplotlib and IPython. I'm glad this approach worked, and I hope others can learn something from it and apply it to their daily work.
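As a rough sketch of what that scaling could look like, the steps from this article can be wrapped in a function and called once per customer. The function name `waterfall`, the customer names, and the amounts below are all hypothetical, and the files are written to a temporary directory only so the example runs anywhere; labels and axis formatting are left out to keep it short.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def waterfall(amounts, title):
    """Draw a waterfall chart for a Series of signed amounts and return the Axes."""
    trans = amounts.rename("amount").to_frame()
    blank = trans.amount.cumsum().shift(1).fillna(0)

    total = trans.amount.sum()
    trans.loc["net"] = total
    blank.loc["net"] = total

    # Connector lines between bars, built before the net bottom is reset.
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan

    blank.loc["net"] = 0  # the net bar starts from the axis

    ax = trans.plot(kind="bar", stacked=True, bottom=blank, legend=False, title=title)
    ax.plot(step.index, step.values, "k")
    return ax


# Hypothetical per-customer data.
customers = {
    "customer_a": pd.Series([1000, -200, 50], index=["sales", "returns", "late charges"]),
    "customer_b": pd.Series([500, -100, -25], index=["sales", "returns", "shipping"]),
}

out_dir = Path(tempfile.mkdtemp())
saved = []
for name, amounts in customers.items():
    ax = waterfall(amounts, title="{} waterfall".format(name))
    path = out_dir / "{}_waterfall.png".format(name)
    ax.get_figure().savefig(path, dpi=100, bbox_inches="tight")
    plt.close(ax.get_figure())  # free the figure before drawing the next one
    saved.append(path)

print(saved)
```

Going from two customers to 100 is then only a matter of how the `customers` mapping is filled, for example from a grouped DataFrame.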
