8 Productivity Hacks for Data Scientists & Business Analysts
Introduction
I was catching up with one of my friends from a past organization. She had always been interested in data science, but was only able to break into it some months ago. She had joined an organization as a data scientist and is clearly learning a lot in her (relatively) new role. Over our conversation, she mentioned something which has stuck with me since then. She said that irrespective of how well she performs, she ends up doing every project/analysis multiple times before it is satisfactory to her manager. She also mentioned that these iterations cause her work to take a lot more time than, in hindsight, it should actually require!
Does that sound familiar? Do you repeat your analysis multiple times before it becomes presentable and answers the required questions? Do you end up writing code for similar activities again and again? If so, you are in the right place. I'll share a few ways in which you can increase your productivity and kill these unwanted iterations.
P.S. Don't get me wrong here. I am not saying that iteration is bad in its entirety. In fact, data science as a subject requires you to do things in iterations at times. Iteration is healthy; it's the unhealthy iterations we need to avoid, and those will be the focus of this article.
What causes these iterations?
I am defining healthy/unhealthy iterations using the following rule: any iteration in an analysis which happens for any reason apart from the flow of new information is an unhealthy iteration (there is one exception to this, which is mentioned below). Let me explain a few such scenarios:
- The business problem was not laid out correctly. The problem the customer wanted to solve is different from the one you have been working on.
- Iteration because you need to collect more variables, which you did not think you would need upfront.
- Iteration because you did not think about the biases or the assumptions impacting your analysis.
On the other hand, if your iteration is happening because you built a model 6 months back and you now have new information, it is a healthy iteration. Another scenario for healthy iteration is when you deliberately start with simple models to develop better understanding and then build complex models.
Now, I am sure I have not covered all possible scenarios here, but these examples should be good enough for you to judge whether an iteration in your analysis is healthy or unhealthy.
Impact of these productivity killers
Let's get this clear: no one wants unhealthy iterations and productivity killers in their analysis. Missing out on a few variables initially and then running the entire analysis again after collecting them would not interest any data scientist. Also, there is no fun in doing the same analysis again!
This productivity loss and these iterations create frustration and dissatisfaction among analysts/data scientists, and hence should be avoided at all costs.
Tips to avoid unhealthy iterations and increase productivity
Tip 1: Focus on big problems (and big problems only)
I am sure every organization has a lot of small problems which can be solved using data. But they are not the best use of its data scientists. Focus on just those 3-4 problems which can have a huge impact on the organization. These problems will be challenging and will give you the maximum leverage for your analysis as well. You should not try to solve a smaller problem while the bigger problem remains unsolved.
This might sound trivial, but the number of organizations which make this mistake is non-trivial! I see banks working on marketing analytics when their risk scoring could be improved, or insurance companies trying to build a reward program for agents when their customer retention could be improved using analytics.
Tip 2: Create a presentation of your analysis before you start (with possible layouts and branches)
I do this all the time and I can't tell you how beneficial it is. The first thing you should do as soon as you start a project is lay out the presentation of your analysis. This might sound counter-intuitive to start with, but once you develop this habit, it can reduce your project turnaround time to a fraction of what it takes otherwise.
So, how do you do this?
You lay out the story in the form of a presentation, a Word document, or just a story on pen and paper. The actual form is immaterial. What's important is that you lay out all possible outcomes at the start of the journey. For example, if you are looking to reduce charge-offs, a structure laid out in your presentation would look something like this:
Next, you can take up each factor and define what you would need to see in order to conclude whether it has driven the increase in charge-offs, and how you would go about doing this. For example, if the charge-offs for the bank have increased because of an increase in the credit limit of customers, you would:
- First, need to ascertain that the customers who were not offered a credit limit increase did not worsen on charge-offs.
- Next, put together a mathematical equation to size the effect.
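As a sketch of what "sizing the effect" might look like, you could compare charge-off rates between customers who did and did not receive a limit increase and multiply the lift by the treated balances. All figures and names below are invented for illustration, not from any real analysis:

```python
# Hypothetical sizing sketch: every number here is made up for illustration.
def size_charge_off_effect(treated_rate, control_rate, treated_balance):
    """Estimate incremental charge-off dollars attributable to the
    credit-limit increase, assuming the control group is comparable."""
    lift = treated_rate - control_rate  # incremental charge-off rate
    return lift * treated_balance       # incremental dollars charged off

# Customers with a limit increase charge off at 4.0%, others at 2.5%,
# on $10M of treated balances -> roughly $150k attributable to the increase.
effect = size_charge_off_effect(0.040, 0.025, 10_000_000)
print(round(effect))  # 150000
```

The point is not the arithmetic; it is that writing the equation down upfront tells you exactly which data (treated/control rates and balances) you must collect, which feeds directly into the next tip.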
Once you have done this for every possible branch of your analysis, you have created a good starting point for yourself.
Tip 3: Define data requirements upfront
This flows directly from the last step. If you have laid out the analysis comprehensively, you will know the data requirements by the end of it. Here are a few tips to help you out:
- Try and put a structure to your data requirements: Instead of just putting down a list of variables, you should design the tables you would want for the analysis. In the case above (increased charge-offs), you will need a customer demographic table, a table for past marketing campaigns, transactions done by customers over recent months, credit policy changes for the bank, etc.
- Collect all the data you might need: Even if you aren't 100% sure whether you will need all the variables in the data set, you should go ahead and collect them at this stage. Normally, it is very little incremental work to include additional variables now, compared to re-asking for variables to be collected at a later point in the analysis.
- Define the time period of the data you are interested in.
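One way to make such a requirement concrete is to write it down as a small machine-readable spec before requesting anything. The table names, columns, and dates below are hypothetical placeholders, not from the article:

```python
# Hypothetical data-requirement spec for the charge-off analysis above.
# All table and column names are illustrative placeholders.
data_requirements = {
    "customer_demographics": ["customer_id", "age", "income", "segment"],
    "marketing_campaigns": ["campaign_id", "customer_id", "offer_type", "sent_date"],
    "transactions": ["customer_id", "txn_date", "amount", "merchant_category"],
    "credit_policy_changes": ["policy_id", "effective_date", "old_limit_rule", "new_limit_rule"],
}
# Define the time window upfront, per the tip above (dates are made up).
time_period = ("2015-01-01", "2015-12-31")

for table, columns in data_requirements.items():
    print(f"{table}: {len(columns)} columns requested")
```

A spec like this doubles as the request you hand to the data engineering team, so nothing gets lost between "what I need" and "what I asked for".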
Tip 4: Make sure your analysis is reproducible
Again, this might sound like a simple tip, but you see both beginners and advanced people falter on it. Beginners perform steps in Excel, which can include copy-pasting of data. For advanced users, work done ad hoc through the command line interface might not be reproducible.
Similarly, you need to be extra cautious when working with notebooks. You should control your urge to go back and change a previous step which uses a data set computed later in the flow. Notebooks can be very powerful if the flow is maintained. If the flow isn't maintained, they can be very messy as well.
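A minimal sketch of what "reproducible" means in practice: fix the random seed and run the whole pipeline top to bottom as one function, so two independent runs produce identical output. The sampling step here is just a stand-in for a real analysis:

```python
import random

def run_analysis(seed=42):
    """Toy end-to-end pipeline: a fixed seed makes every re-run identical."""
    rng = random.Random(seed)            # seeded generator, no global state
    population = list(range(1000))
    sample = rng.sample(population, 10)  # stand-in for the real analysis step
    return sum(sample) / len(sample)     # some summary statistic

# Two independent runs give identical results.
print(run_analysis() == run_analysis())  # True
```

The same principle applies to notebooks: if restarting the kernel and running all cells top to bottom gives a different answer, the analysis is not reproducible.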
Tip 5: Keep standard libraries of code ready and accessible
There is no point in re-writing code for simple operations again and again. Not only does it take extra time, it might also lead to syntax errors. Another tip to make the most of this is to create a library of these common operations and share it across your entire team.
This will not only ensure the entire team uses the same code, but also make them more efficient.
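A shared team library can be as small as one module of well-tested helpers. A sketch, where the function names and definitions are my own illustration rather than any real library:

```python
# team_utils.py -- hypothetical shared helper module for the whole team.

def charge_off_rate(charged_off, total_accounts):
    """The one definition of charge-off rate everyone on the team uses."""
    if total_accounts <= 0:
        raise ValueError("total_accounts must be positive")
    return charged_off / total_accounts

def month_over_month_change(current, previous):
    """Relative change, computed identically in every report."""
    return (current - previous) / previous

# Every analyst imports these instead of re-writing them:
print(charge_off_rate(30, 1000))          # 0.03
print(month_over_month_change(110, 100))  # 0.1
```

Besides saving typing, a shared module means a metric definition gets fixed in one place when it changes, instead of in a dozen copied snippets.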
Tip 6: Similarly, keep a library of intermediate datamarts
A lot of times, you need the same piece of information again and again. For example, you will need total customer spend on a credit card for several analyses and reports. While you can calculate it every time you need it from the transaction tables, it is much better to create intermediate datamarts of these tables once, to save the time and effort spent in re-creating them. Similarly, think of summary tables for marketing campaigns. There is no point in re-inventing the wheel every time.
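The "total customer spend" datamart could be built once from the raw transaction table and then reused by every downstream report. A pure-Python sketch with invented data (in practice this would typically be a SQL summary table):

```python
from collections import defaultdict

# Raw transaction rows: (customer_id, amount) -- illustrative data only.
transactions = [
    ("C1", 120.0), ("C2", 40.0), ("C1", 80.0), ("C3", 15.0), ("C2", 60.0),
]

def build_spend_datamart(txns):
    """Aggregate the raw transactions once; downstream reports read this."""
    spend = defaultdict(float)
    for customer_id, amount in txns:
        spend[customer_id] += amount
    return dict(spend)

spend_datamart = build_spend_datamart(transactions)
print(spend_datamart["C1"])  # 200.0
```

Every report that needs customer spend now does a cheap lookup instead of re-scanning the full transaction table.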
Tip 7: Always use a holdout sample/cross-validation to avoid over-fitting
A lot of beginners under-estimate the power of a holdout sample or cross-validation. Many tend to believe that if the training set is sufficiently large, there is hardly any chance of over-fitting, and hence cross-validation or a holdout sample is not required.
More often than not, this turns out to be a blooper in the end. Don't believe me? Check out the Kaggle public and private leaderboards for any competition. You will always find a few entries in the top ten whose ranks drop because they ended up overfitting their solutions. And you would expect these to be the more advanced data scientists.
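A minimal holdout sketch: a "model" that simply memorizes its training data looks perfect on the training set and falls apart on the holdout, which is exactly what the split exists to expose. Toy data, pure stdlib, everything invented for illustration:

```python
import random

random.seed(0)
# Toy data: x in [0, 1), label is a threshold on x with 20% label noise.
xs = [random.random() for _ in range(200)]
data = [
    (x, int(x > 0.5) if random.random() > 0.2 else 1 - int(x > 0.5))
    for x in xs
]

split = int(0.8 * len(data))
train, holdout = data[:split], data[split:]

def predict(x):
    """'Model' that memorizes training points: 1-nearest-neighbour lookup."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

print(accuracy(train))    # 1.0 -- memorization looks perfect on train
print(accuracy(holdout))  # noticeably lower: the over-fit is exposed
```

With noisy labels, perfect training accuracy is a warning sign, not an achievement; only the holdout number tells you how the model will behave on data it has not seen.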
Tip 8: Work in chunks and take breaks regularly
When do I work best? It's when I give myself a 2-3 hour window to work on a problem/project. You can't multi-task as a data scientist. You need to focus on a single problem at a time to make sure you get the best out of yourself. 2-3 hour chunks work best for me, but you can decide what works for you.
End Notes:
So, those were some of the hacks I use to increase my productivity. I can't emphasize enough the importance of getting things right the first time. You have to get into the habit of getting it right every time; that is what will make you an awesome data scientist.
Do you have any tips which make you more productive? If yes, share them with us in the comments below.
If you liked what you just read & want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.