11 Facts about Data Science You Must Know

Source: Internet
Author: User


Call it statistics, machine learning, data science, or analytics: this discipline has been on the rise over the last quarter of a century, primarily owing to growing data-collection abilities and an exponential increase in computational power. The field draws from a pool of engineers, mathematicians, computer scientists, and statisticians, and it increasingly demands a multi-faceted approach for successful execution. In fact, no branch of engineering, science, or business in any industry is beyond the reach of analytics. Perhaps you, too, are interested in becoming a data scientist, or already are one.

However, as one journeys through a career in analytics, some truths become evident over time. None of them is earth-shattering, but they often surprise newcomers to the field. So it is worthwhile to know these 11 absolute facts of data science.

1. Data is never clean

Analytics without real data is a mere collection of hypotheses and theories. Data helps test them and find the one that suits the end use at hand. However, in the real world, data is never clean. Even in organizations that have had well-established data science centers for decades, data isn't clean. Apart from missing or wrong values, one of the biggest problems is joining multiple datasets into a coherent whole. The join keys may not be consistent, or the granularity or format may not be suitable. And this isn't intentional. Enterprise data storage is designed around, and tightly integrated with, the front-end software and the users who generate the data, and it is often created independently of other systems. The data scientist enters the scene quite late and is often just a "taker" of the data as-is, not part of its design.
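The join-key problem can be illustrated in a few lines. The snippet below is a minimal sketch in plain Python, assuming two hypothetical sources (a CRM table and a billing feed) whose customer IDs were formatted differently by different systems; the records, the function names, and the normalization rule are all made up for the example. A naive exact-match join finds nothing, while normalizing the keys first recovers most matches and surfaces the leftovers for review.

```python
import re

# Hypothetical sample records: the same customers, keyed by two systems
# that format IDs differently (case, dashes, stray whitespace).
crm_records = {"C-001": "Alice", "c-002 ": "Bob", "C003": "Carol"}
billing_records = [("c001", 120.0), ("C-002", 75.5), ("C-004", 30.0)]


def normalize_key(raw: str) -> str:
    """Uppercase, trim, and drop every character that is not A-Z or 0-9."""
    return re.sub(r"[^A-Z0-9]", "", raw.strip().upper())


def join_on_normalized_key(left: dict, right: list):
    """Inner-join `left` (key -> name) with `right` (key, amount) pairs.

    Returns the joined rows and the raw right-side keys with no match.
    Raises ValueError if two distinct left keys normalize to the same value.
    """
    left_by_key = {}
    for raw_key, value in left.items():
        key = normalize_key(raw_key)
        if key in left_by_key:
            raise ValueError(f"keys collide after normalization: {key}")
        left_by_key[key] = value

    joined, unmatched = [], []
    for raw_key, amount in right:
        key = normalize_key(raw_key)
        if key in left_by_key:
            joined.append((key, left_by_key[key], amount))
        else:
            unmatched.append(raw_key)
    return joined, unmatched


# A naive exact-match join finds nothing, because no raw keys agree.
naive_matches = [k for k, _ in billing_records if k in crm_records]
joined, unmatched = join_on_normalized_key(crm_records, billing_records)

print(naive_matches)  # []
print(joined)         # [('C001', 'Alice', 120.0), ('C002', 'Bob', 75.5)]
print(unmatched)      # ['C-004']
```

Normalization is a judgment call: an aggressive rule can merge IDs that are genuinely different, which is why the sketch refuses to proceed when two distinct raw keys collapse to the same normalized key.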

2. You'll spend most of your time cleaning and preparing data

A corollary of the above is that a large part of your time will be spent just cleaning and processing data for model consumption. This usually annoys people new to the industry. With a brilliant mind bursting with sophisticated machine learning methods, spending three-fourths of your time on data wrangling seems a waste of talent and time. This often leads to dissatisfaction and lack of attention, and the resulting errors can come back to bite even the fanciest of algorithms. If you cannot do this with equanimity and a focus on the big picture, then perhaps you should aim for a career in statistics rather than in data science.

3. There is no fully automated data science. You need to get your hands dirty

Since data is not clean and requires quite a lot of processing, there is no ready-made set of scripts or buttons to push to develop an analytic model. Every dataset and every problem is different. There is no substitute for exploring the data, testing models, and validating them against business sense and domain experts. Depending on the problem and your prior experience, you may dirty your hands less, but dirty them you will. The only exception is if you receive data in a specific format and do the same thing over and over, but that already sounds boring, doesn't it?

4. 95% of the tasks do not require deep learning

95% is obviously a made-up number, but the idea is that most real-life problems don't require advanced analytic capabilities. Solving real-world problems involves a lot more understanding of the real world, the problem domain, decision makers, and end users than understanding of the latest and greatest discoveries in statistics. What moves the needle, and moves it quickly, is much more valuable than what is rigorous and pure. Often, the simplest models, such as linear regression, logistic regression, and k-means clustering, work wonders as long as the problem is well formulated. Even for complex problems, simple models can provide large gains that complex models can only improve on marginally. That's not to say that complicated models have no place. In fact, depending on how much money is riding on it, a 0.1% increase in prediction accuracy may be worth millions of dollars.
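To show how little machinery a "simplest model" needs, here is a minimal sketch of one-feature linear regression fitted by ordinary least squares in plain Python, compared against a no-model baseline that always predicts the average. The ad-spend and sales figures and the function names are hypothetical, made up purely for illustration.

```python
from statistics import mean

# Hypothetical data: weekly ad spend (x) and sales (y), both in $1000s.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_squared_error(ys, predictions):
    """Average squared gap between observed values and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


intercept, slope = fit_simple_linear_regression(ad_spend, sales)
model_mse = mean_squared_error(sales, [intercept + slope * x for x in ad_spend])
# Baseline with no model at all: always predict the average sales figure.
baseline_mse = mean_squared_error(sales, [mean(sales)] * len(sales))

print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.09, slope=1.97
print(f"model MSE={model_mse:.4f}, baseline MSE={baseline_mse:.4f}")
```

On this toy data the two-parameter model cuts the squared error dramatically relative to the baseline, which is the kind of large, cheap gain the paragraph above describes.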

5. Big Data is just a tool

With the hype around Big Data getting louder every day, I won't blame you for being enamored of the idea. However, the key thing to remember is that Big Data is just a collection of tools for working with large volumes of data in reasonable time and on commodity-grade computer hardware. Underlying analytic problem design, modeling best practices, and the scrutinizing eyes of an astute analyst are not replaceable by Big Data. That isn't to say that competency in Big Data techniques isn't handy; it is, all the more so since the world is moving towards Big Data and there may not be any "small" data left in a couple of years. But tools will come and go; your machine learning experience will persist. Big Data is analogous to an AK-47 rifle for a policeman rather than a flintlock carbine. Sure, a better tool is preferable to an inferior one, but being trained in policing is more important than the rifle.

6. You should embrace the Bayesian approach

Data science is a sequence of hypothesis tests. You have a going-in belief that you want to prove right or wrong based on observations from data. The stronger your going-in belief, the more counter-evidence you need to prove it wrong. That, in essence, is the Bayesian approach. But while proving your hypothesis right through data is important, proving the alternative hypothesis wrong is equally important. Take this fun puzzle from the New York Times to figure out how to think Bayesian.
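The idea that a stronger going-in belief needs more counter-evidence can be made concrete with the Beta-Binomial conjugate update, one standard Bayesian model for a rate. The sketch below is not the puzzle mentioned above; it assumes two hypothetical priors that both put the rate at about 20%, one weak and one strong, and feeds them the same made-up evidence observed at a 60% rate. The function names are illustrative.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, successes, failures):
    """Mean of the Beta(alpha + successes, beta + failures) posterior."""
    return Fraction(alpha + successes, alpha + beta + successes + failures)


def batches_to_shift_belief(alpha, beta, threshold):
    """Count batches of evidence (3 successes, 2 failures each, i.e. a 60%
    observed rate) needed to pull the posterior mean up to `threshold`."""
    successes = failures = batches = 0
    while posterior_mean(alpha, beta, successes, failures) < threshold:
        successes += 3
        failures += 2
        batches += 1
    return batches


# Two hypothetical going-in beliefs that the rate is about 20%:
weak_prior = (2, 8)        # Beta(2, 8): mean 0.2, worth ~10 observations
strong_prior = (200, 800)  # Beta(200, 800): mean 0.2, worth ~1000 observations

# The same evidence, 30 successes in 50 trials, moves them very differently.
print(float(posterior_mean(*weak_prior, 30, 20)))    # ~0.533
print(float(posterior_mean(*strong_prior, 30, 20)))  # ~0.219

# Trials of 60%-rate evidence needed to move each belief up to 40%.
print(5 * batches_to_shift_belief(*weak_prior, Fraction(2, 5)))    # 10
print(5 * batches_to_shift_belief(*strong_prior, Fraction(2, 5)))  # 1000
```

Exact fractions are used so the threshold comparison is not disturbed by floating-point rounding; the hundredfold stronger prior needs a hundredfold more evidence to reach the same revised belief.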

The alternative to Bayesian thinking is to let your data tell you stories. This can be problematic because, sliced and diced enough, data will always tell a story. But without an a priori belief, the story may not be true in reality. This is often a case of hindsight bias, and often a staple of motivational and self-help books. If you want to find differences between two groups (successful businesses versus unsuccessful ones, athletes versus slobs, rich versus poor), you can always find some. There are hundreds of thousands of human characteristics, so some will come out different just by chance. That doesn't mean those characteristics made someone different from others. On the other hand, if you have a reasonable hypothesis about what could be causing the difference, you can verify whether you are right or not. In the end, either you explain the model's results based on your understanding, or you modify your understanding. You cannot claim that the length of a person's nose hair predicts their income at age fifty just because the model says so.
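The claim that some characteristics will differ just by chance is the multiple-comparisons problem, and a quick simulation shows its size. The sketch below assumes two groups of 100 people drawn from the same population, 1,000 hypothetical yes/no characteristics, and a conventional two-sided z-test for a difference in proportions at the 5% level (|z| > 1.96); all of these numbers and the function name are illustrative choices.

```python
import math
import random


def count_spurious_differences(num_characteristics=1000, group_size=100,
                               z_threshold=1.96, seed=0):
    """Compare two groups on many yes/no characteristics and count how many
    look 'significantly' different. Both groups are drawn from the SAME
    distribution, so every flagged characteristic is a chance finding."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(num_characteristics):
        # Every trait has a 50% rate in both groups: no real difference exists.
        a = sum(rng.random() < 0.5 for _ in range(group_size))
        b = sum(rng.random() < 0.5 for _ in range(group_size))
        pooled = (a + b) / (2 * group_size)
        # Standard error of the difference in proportions (pooled, equal n).
        se = math.sqrt(pooled * (1 - pooled) * 2 / group_size)
        if se > 0 and abs(a - b) / group_size / se > z_threshold:
            flagged += 1
    return flagged


spurious = count_spurious_differences()
print(f"{spurious} of 1000 characteristics differ 'significantly' by chance")
```

Roughly 5% of the characteristics, on the order of 50, should be flagged even though no real difference exists; the exact count depends on the seed. Testing a single characteristic chosen in advance, as the paragraph above recommends, keeps the chance of a false alarm near 5% instead of making dozens of them all but certain.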

7. No one cares how you do it

The consumers of data science models are decision makers and executives, and they want a workable and useful model. While it's tempting for data scientists to explain the technical expertise behind the model and show off the analytic rigor, this is often counter-productive. Your audience cares about the outcome and end use and isn't bothered about the decision engine you have put together. In fact, complicated explanations of the model's mathematics are sure to bore your users and intimidate them out of using it. Save your expertise for technical discussions with your data science peers.

8. Academia and business are different worlds

This applies to almost all disciplines, and analytics is no exception. The focus in academia is on discovering new methods and proving new theorems. The focus in business is on solving a problem and making money. It doesn't matter whether the analytics behind the solution is fancy or not, and no one cares about that anyway. Speed is often of more essence than accuracy. Every business analytic solution should solve a real-life problem and should contribute, directly or indirectly, to the bottom line.

9. Presentation is key

Since the end user and decision maker is often a non-mathematical person, selling an analytic solution is no different from any other sale. You can sell on quality (analytic accuracy), but you can also sell on emotions, aesthetics, stories, the human angle, and money. Being able to explain your method in simple terms and align it with the end users' interests is an art that every data scientist who wants to make a significant non-theoretical mark on the world must master. At least for a while, this means that storytelling through PowerPoint should remain a key weapon in your arsenal.

10. All models are wrong, but some are useful

Models, by definition, model some "truth" about the world. Since the world is infinitely complex (think quantum mechanics!), models are approximations of reality. Some models are more wrong than others, but all are wrong. However, they can be, and often are, useful, since they are better than the alternative of no model and no prediction. Realizing what we are aiming for and what we are competing against can be important in shaping our analytic design process, and in checking our egos.

11. Just because an analytic model is great doesn't mean it'll see the light of day

As fun as data science is, there is more to the world than your analytical model. If you see a third or more of your work implemented or used, consider yourself lucky. Regardless of their analytic merits, analytic projects get shelved all the time for various reasons, including: the data changed, the problem changed, no one was interested in the solution, implementation was too expensive, the benefit wasn't worth the cost, someone else did it first, or the solution was too advanced for its time. Keep calm and carry on.

I realize that there are perhaps more than 11 facts, and perhaps some of these could be clubbed together. The point was not the count, but the importance of internalizing these realities of the industry we want to be part of. Different companies and industries may sit at different points on the spectrum of these facts, but collectively knowing and understanding these "facts" will make one a more satisfied, broad-minded, and better data scientist.

(Did I miss any fundamental fact of data science? Share it in the comments below.)
Most of these facts were picked from reddit.com.
