The Complete Learning Path for Data Science

Reference Link: https://www.tuicool.com/articles/QBZzquY

The journey from Python novice to Kaggler (Kaggle is a competition platform for data modeling and data analysis)

If you want to become a data scientist, or you already are one and want to expand your skills, you have come to the right place. This article lays out a complete learning path for Python novices in data analysis, covering every step you need to learn in order to use Python for data analysis. If you already have some background, or do not need everything on the path, adapt it to your own needs and let others know how you adapted it.

Step 0: Warm up

Before you begin, answer one question first: why use Python, and how is it useful?
Watch the 30-minute talk by Jeremy, founder of DataRobot, at PyCon Ukraine 2014 to see how useful Python is.

Step 1: Set up your machine environment

Now that you have made up your mind, it is time to set up your machine. The easiest way is to download Anaconda from Continuum.io. Anaconda bundles most of the packages you will need later. The main drawback of this approach is that you have to wait for Continuum to update the Anaconda distribution, even when newer versions of the underlying libraries are already available. For a beginner, this should hardly matter.

If you run into problems during installation, more detailed installation instructions for different operating systems are available here.
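Once the installation finishes, a quick check like the one below can confirm that the libraries used later in this path are importable. This is only a minimal sketch: the module list and the helper name `check_environment` are assumptions made for illustration, not part of Anaconda.

```python
import importlib.util
import sys

# Import names of the libraries covered later in this path
# (an assumed list; note that scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print("Python", sys.version.split()[0])
for name, found in check_environment().items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any line marked MISSING points to a package that still needs to be installed before you reach the step that uses it.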

Step 2: Learn the basics of the Python language

You should first understand the basics, libraries, and data structures of the Python language. The Python course on Codecademy is one of the best places to start. After completing it, you should be able to write small Python scripts comfortably and understand classes and objects in Python.

Specific topics: lists, tuples, dictionaries, list comprehensions, and dictionary comprehensions.
Task: Solve some of the Python tutorial problems on HackerRank; they will help you get used to thinking through problems in Python.
Alternative resource: If interactive coding is not your preferred way to learn, try Google's Python class instead. This two-day class covers the Python topics mentioned above as well as some of the material discussed later in this path.
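For orientation, the data structures named in this step can be sketched in a few lines. The variable names and values below are made up purely for illustration.

```python
# A list is an ordered, mutable sequence.
scores = [72, 88, 95, 61]
scores.append(79)

# A tuple is an ordered, immutable sequence, handy for fixed records.
point = (3, 4)
x, y = point  # tuple unpacking

# A dictionary maps keys to values.
ages = {"ann": 31, "bob": 25, "cy": 42}

# List comprehension: build a new list by filtering an existing one.
passed = [s for s in scores if s >= 75]

# Dictionary comprehension: build a new dict from key/value pairs.
over_30 = {name: age for name, age in ages.items() if age > 30}

print(passed)
print(over_30)
```

Comprehensions do the same job as a loop that appends to a list or fills a dictionary, but in a single expression, which is why they show up so often in data analysis scripts.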

Step 3: Learn regular expressions in the Python language

You will use regular expressions often for data cleaning, especially when working with text data. The best way to learn them is to go through Google's Python class, which makes regular expressions easy to pick up.

Task: Do the regular expression exercise on baby names.

If you need more practice, work through this text cleaning tutorial. It will challenge you on the various steps involved in data preprocessing.
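As a small illustration of cleaning text with regular expressions, the sketch below uses only the standard `re` module. The sample string is invented, and the e-mail pattern is deliberately loose rather than a complete validator.

```python
import re

# Made-up messy input of the kind found in scraped text.
raw = "  Call   us at (555) 123-4567 or e-mail INFO@Example.com!!  "

# Collapse runs of whitespace into a single space and trim both ends.
clean = re.sub(r"\s+", " ", raw).strip()

# Find anything that looks like an e-mail address, then normalize the case.
emails = [e.lower() for e in re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", clean)]

# Keep only the digits of a phone number by deleting every non-digit.
digits = re.sub(r"\D", "", "(555) 123-4567")

print(clean)
print(emails)
print(digits)
```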

Step 4: Learn the scientific libraries in Python: NumPy, SciPy, Matplotlib, and pandas

This is where the fun begins. Below is a brief introduction to each library, along with some common operations to practice:

• Work through the NumPy tutorial thoroughly, and practice arrays in particular. This lays a good foundation for everything that follows.
• Next, look at the SciPy tutorials. Read the introduction and the basics, then cover the remaining topics as your own needs dictate.
• There is no need to go through the Matplotlib tutorials at this stage; they are too comprehensive for our needs here. Instead, work through the first 68 lines of this notebook.
• Finally, learn pandas. pandas provides DataFrame functionality (similar to R) for Python. This is where you should spend the most time practicing, because pandas will be your most effective tool for all medium-sized data analysis. Start with the brief 10-minute introduction to pandas, then move on to a more detailed pandas tutorial.
You can also read two blog posts: Exploratory Data Analysis with pandas and Data Munging with pandas.
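To give a feel for the two libraries you will use most, here is a minimal sketch of NumPy array arithmetic and a pandas DataFrame. The city names and heights are toy values invented for this example.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, no explicit loop.
heights_cm = np.array([160.0, 172.5, 181.0, 168.0])
heights_m = heights_cm / 100.0
mean_height = heights_m.mean()
tall_mask = heights_cm > 170  # boolean array, one entry per element

# pandas: a DataFrame is a table with labeled columns.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "height_cm": heights_cm,
})
df["tall"] = tall_mask

# Group rows by city and average the numeric column.
mean_by_city = df.groupby("city")["height_cm"].mean()
print(mean_by_city)
```

The same pattern (load a table, add derived columns, group and aggregate) covers a large share of everyday data analysis work.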

Additional Resources:
• If you need a book on pandas and NumPy, "Python for Data Analysis" by Wes McKinney is recommended.
• There are also a number of pandas tutorials in the pandas documentation, which you can view here.

Task: Try to solve this assignment from the Harvard CS109 course.

Step 5: Effective data visualization

Go through this lecture from CS109. You can skip the first 2 minutes, but what follows is packed with useful material. Follow the lecture up with this assignment.

Step 6: Learn scikit-learn and machine learning

Now we come to the core of the whole process. scikit-learn is the most useful Python library for machine learning. Here is a brief overview of the library. Go through lectures 10 to 18 of the Harvard CS109 course. They cover an overview of machine learning, supervised learning algorithms such as regression, decision trees, and ensemble models, and unsupervised learning algorithms such as clustering. Follow each lecture with its assignment.
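The typical scikit-learn workflow (split the data, fit a model, evaluate it on held-out data) can be sketched as follows. The choice of the bundled iris dataset, a decision tree, and the specific parameter values are assumptions for illustration; they are not taken from the CS109 material.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a shallow decision tree; max_depth=3 is an arbitrary illustrative choice.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Compare predictions on the held-out rows with the true labels.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` and `predict` interface, so swapping in a different model usually means changing only the line that constructs it.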

Additional Resources:

• If there is one book you must read, it is "Programming Collective Intelligence". Although a little dated, it is still one of the best books in the field.
• In addition, you can take the machine learning course by Yaser Abu-Mostafa, one of the best machine learning courses available. If you want a more accessible explanation of the techniques, choose Andrew Ng's machine learning course and do the course exercises in Python.
• The scikit-learn tutorials

Task: Try this challenge on Kaggle.

Step 7: Practice, practice, practice

Congratulations, you have made it through the whole learning path.

You now have all the technical skills you need. What remains is practice, and there is no better way to practice than competing with fellow data scientists on Kaggle. Dive into one of the live Kaggle competitions and try to apply everything you have learned.

Step 8: Deep learning

Now that you have learned most of the machine learning techniques, it is time to try deep learning. You most likely already know what deep learning is, but if you still need a brief introduction, you can look here.

I am also new to deep learning, so take the following suggestions with a grain of salt. deeplearning.net has the most comprehensive collection of deep learning resources; you will find everything there: lectures, datasets, challenges, tutorials, and more. You can also take Geoff Hinton's course to learn the basics of neural networks.

P.S. If you need big data libraries, try Pydoop and PyMongo. A big data learning path is outside the scope of this article, since it is a complete subject in its own right.
