A self-taught data scientist shares 5 right ways to learn big data

Source: Internet
Author: User

Data science is in a golden age. The field is new but growing fast, and the shortage of data scientists is large; average annual salaries are reportedly around $100,000. High salaries attract people, but the data science skills gap means that many of them will need to study hard.

The first step, of course, is to ask "How do I learn data science?" The answer to this question is often a long list of courses and books, from linear algebra to statistics. I faced this question myself when I started learning some years ago. I had no programming background, but I knew I liked working with data.

I don't see the point of handing a long list of books or skills to someone who lacks the background for it. It's like a teacher handing you a pile of textbooks and telling you to "read all this." I struggled with this approach when I was in school, and I did not want to learn data science that way.

Some people learn well from books, but I think the best way to learn is through practice: you find out what you really need to learn and, most importantly, you pick up skills you can use immediately. I have found that many beginners learn the same way.

That's why I don't think your first goal should be to learn linear algebra or statistics. If you want to learn big data, your first goal should be to learn to love data.

1 Learning to love data

Nobody talks about motivation when it comes to learning. Data science is a broad and fuzzy field, which makes it hard to learn. Without motivation, you end up stopping halfway and losing faith in yourself.

You need something that keeps you learning even when it is the middle of the night, the formulas are starting to blur, and you still want to work out what neural networks mean. You need motivation to find the links between statistics, linear algebra, and neural networks, and to keep going when you are wondering "what should I learn next?"

My way in was using data to predict the stock market, even though I knew nothing about it at the time. The first programs I wrote to predict stocks used almost no statistics, and I knew they were not performing well, so I worked day and night to make them better.

I was obsessed with improving the performance of my programs. I was obsessed with the stock market. I was learning to love data. So I set out to learn every skill that would make the project produce better results.

Not everyone will be obsessed with predicting the stock market, but it's important to find the thing that makes you want to learn.

(Figure: a map of mobile device usage around the world)

You might use data to work out new and interesting things about your city, map every device on the Internet, find the positions NBA players really play, map where refugees are this year, or something else entirely. The great thing about data science is that there are endless interesting things to work on: it is all about asking questions and finding a way to get answers.

2 Learning in practice

Learning about neural networks, image recognition, and other cutting-edge technologies is important, but most data science work does not involve these:

    • 90% of the work will be data cleanup.

    • Knowing a few algorithms really well is better than knowing a little about many algorithms.

    • If you know linear regression, k-means clustering, and logistic regression well, can explain and interpret their results, and can use them to complete a project, you will be better off than if you know every algorithm but cannot use any of them.

    • Most of the time, when you use an algorithm, it will be a version from a library (you will rarely code your own support vector machine implementation; it takes too long).

All of this means that the best way to learn is to work on projects, because projects give you skills you can use right away.
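Since most of the work is data cleanup, a first project usually starts with something like the sketch below. It is only an illustration: the sample CSV, the column names, and the clean_rows helper are made up for this example, and the cleanup rules (trim whitespace, normalize symbols, drop missing prices and exact duplicates) are just common choices, not a fixed recipe.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, a missing price, a duplicate row,
# and inconsistent capitalization of the symbol.
RAW_CSV = """date,symbol,close
2016-01-04, ABC ,10.50
2016-01-05,ABC,
2016-01-05,ABC,10.75
2016-01-05,ABC,10.75
2016-01-06,abc,11.00
"""


def clean_rows(text):
    """Return de-duplicated (date, symbol, close) tuples that have a usable price."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        date = row["date"].strip()
        symbol = row["symbol"].strip().upper()
        raw_close = (row["close"] or "").strip()
        if not raw_close:
            continue  # drop rows with a missing price
        record = (date, symbol, float(raw_close))
        if record in seen:
            continue  # drop exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW_CSV):
        print(record)
```

Whether a missing price should be dropped or filled in depends on the question you are asking, which is exactly the kind of decision a real project forces you to make.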

One way to start is to find a data set you like and use it to answer an interesting question.

Here are some good places to look:

    • 100+ interesting data sets for statistics http://rs.io/100-interesting-data-sets-for-statistics/

    • The datasets subreddit https://www.reddit.com/r/datasets

    • UCI Machine Learning Repository http://archive.ics.uci.edu/ml/

Another approach is to pick a deep problem, such as predicting the stock market, and break it down into small steps. I first connected to the Yahoo Finance API and pulled down daily price data. Then I created some indicators, such as the average price over the past few days, and used them to predict the future (no real algorithms here, just technical analysis). This did not work very well, so I learned some statistics and then used linear regression. Then I connected to another API, pulled minute-by-minute data, and stored it in a SQL database, and so on until the algorithm worked well.

The advantage of this approach is that I learned in context. I did not just learn SQL syntax; I used it to store price data, and so I learned ten times more than I would have by studying syntax alone. Knowledge learned without application is hard to retain and will not be ready when you do real work.
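To make the middle steps concrete, here is a minimal sketch of an "average price over the past few days" indicator fed into a one-variable linear regression. It is not the code from the original project: the function names, the window size, and the toy prices are all assumptions for illustration, and the regression is plain ordinary least squares written out by hand so that nothing beyond the standard library is needed.

```python
def moving_average(prices, window):
    """Element k is the mean of prices[k : k + window]."""
    return [
        sum(prices[i:i + window]) / window
        for i in range(len(prices) - window + 1)
    ]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(prices, window=3):
    """Regress each day's price on the average of the `window` days before it,
    then apply the fitted line to the most recent average."""
    averages = moving_average(prices, window)
    features = averages[:-1]    # averages that have a known next-day price
    targets = prices[window:]   # the price on the day after each average
    slope, intercept = fit_line(features, targets)
    return slope * averages[-1] + intercept


if __name__ == "__main__":
    toy_prices = [10.0, 10.4, 10.1, 10.8, 11.0, 10.9, 11.3, 11.6]  # made-up data
    print(predict_next(toy_prices, window=3))
```

A model this simple is unlikely to predict real markets; the point is that each small step (indicator, then regression) is something you can build, test, and improve.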
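As an illustration of learning SQL by using it, the sketch below stores price data in SQLite through Python's built-in sqlite3 module and reads an aggregate back. The table layout, the column names, and the sample rows are assumptions made up for this example; the original project's schema and database are not described in the article.

```python
import sqlite3


def store_prices(conn, rows):
    """Create the table if needed and upsert (symbol, ts, close) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "symbol TEXT NOT NULL, "
        "ts TEXT NOT NULL, "
        "close REAL NOT NULL, "
        "PRIMARY KEY (symbol, ts))"
    )
    # INSERT OR REPLACE keeps re-downloaded data from creating duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO prices (symbol, ts, close) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def average_close(conn, symbol):
    """Return the mean closing price for one symbol, or None if there is no data."""
    row = conn.execute(
        "SELECT AVG(close) FROM prices WHERE symbol = ?", (symbol,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    store_prices(conn, [
        ("ABC", "2016-01-04T09:30", 10.0),  # made-up minute bars
        ("ABC", "2016-01-04T09:31", 10.2),
    ])
    print(average_close(conn, "ABC"))
```

Writing even this much raises practical questions that a syntax tutorial rarely does, such as what the primary key should be and what to do when the same minute is downloaded twice.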

3 Learning to communicate

Data scientists constantly need to present their findings. How well you do this can set you apart from other data scientists.

Part of communicating is understanding the subject and the theory behind it. Another part is knowing how to organize your results. The last part is being able to explain your analysis clearly.

Communicating concepts effectively is hard to get good at, but here are some things you should try:

      • Start a blog. Post the results of your data analysis.

      • Try to teach data science concepts to people who know little about technology, such as your friends and family. Teaching can help you understand the concepts yourself.

      • Try to give a talk at a meetup.

      • Use GitHub to host all your analysis.

      • Be active in communities such as Quora, DataTau, and the machine learning subreddit.

4 Learning from peers

You can't imagine how much you'll learn from your peers, and teamwork is important in data work.

    • Find people to work with at meetups.

    • Contribute to open source packages.

    • Message people who write interesting data analysis blogs and ask whether you can collaborate.

    • Try a Kaggle competition and see whether you can find teammates.

5 Increasing the difficulty of learning

Are you completely comfortable with the project you are working on? Was the last time you used a new concept a week ago? Then it's time to take on something harder. If you stop climbing, you fall behind.

If you find yourself too comfortable, here are some suggestions:

    • Handle larger datasets. Learn to use Spark.

    • See if you can make your algorithm faster.

    • How would you scale your algorithm to multiple processors? Can you do it?

    • Understand more of the theory behind the algorithm you are using. Does this change your assumptions?

    • Try to teach a novice to do the same thing you are doing right now.
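As one small example of the "make your algorithm faster" suggestion above, the sketch below computes the same rolling average two ways: a straightforward version that re-sums every window, and a running-sum version that does constant work per step. Both functions are hypothetical helpers written for this illustration, not part of any library.

```python
def rolling_mean_naive(values, window):
    """Re-sum every window: roughly len(values) * window additions."""
    if window <= 0 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def rolling_mean_fast(values, window):
    """Keep a running total: one addition and one subtraction per step."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        means.append(total / window)
    return means


if __name__ == "__main__":
    data = list(range(1, 11))
    # With integer inputs the two versions agree exactly; with floats the
    # running total can differ from a fresh sum by small rounding errors.
    print(rolling_mean_naive(data, 4) == rolling_mean_fast(data, 4))
```

Checking a faster version against a slow but obviously correct one, as the last line does, is a useful habit whenever you optimize.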

6 Summary

These are at least some ideas about what to do when you start learning data science. If you do all of this well, you will find that your skills grow naturally.

I don't like the "list" approach because it makes it hard to know what to do next. I have seen many people give up halfway through a book or a MOOC. I personally believe that anyone can learn data science if they have the right goals.

I am also the founder of Dataquest, a website that helps you learn big data and builds in many of the learning lessons discussed here. You can analyze interesting datasets, such as CIA files and player stats. You can also complete projects and build up a portfolio. If you don't know how to code, that's not a problem: we'll teach you Python. We teach Python because it is the most beginner-friendly language, is used for production data science work, and can be used in a wide variety of applications.

-end-
