Complete Data Science Learning Path (Python Version)


Reprinted from: http://python.jobbole.com/80981/

Original article: https://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-data-science-python/

(A journey from Python newbie to Python Kaggler)

If you want to become a data scientist, or you already are one and want to expand your skills, you have come to the right place. This article provides a complete learning path for Python beginners who want to do data analysis, with an overview of every step involved in using Python for data analysis. If you already have some background, or don't need everything in the path, feel free to adapt it, and let everyone know how you adapted it.

Step 0: Warm up

Before you start, answer the first question: Why use Python? In other words, how can Python be useful to you?
Watch the 30-minute talk by DataRobot founder Jeremy at PyCon Ukraine 2014 to see how useful Python can be.

Step 1: Set up your machine

Now that you have decided to commit to learning, set up your machine. The simplest method is to download the Anaconda distribution from Continuum.io. Anaconda bundles most of the tools you are likely to need later. The main disadvantage of this approach is that you have to wait for Continuum to update Anaconda, even when newer versions of the underlying libraries are already available. If you are a beginner, this should not be a problem.

If you encounter any problems during the installation process, you can find more detailed installation instructions for different operating systems.
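Once the installation finishes, a short script like the one below can confirm that the libraries used later in this path are importable. This is a sketch: the package list is our assumption based on the libraries named in this article (Scikit-learn is imported as `sklearn`), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Hypothetical list of the libraries this learning path relies on
PACKAGES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_environment(packages=PACKAGES):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy libraries.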

Step 2: Learn the basics of the Python language

Start with the basics of the Python language: its libraries and data structures. The Python course on Codecademy is one of your best options. After completing it, you will be able to write small Python scripts with ease and understand classes and objects in Python.

Specifically: lists, tuples, dictionaries, list comprehensions, and dictionary comprehensions.
Task: Solve some of the Python tutorial problems on HackerRank. They will help you get better at thinking through problems in Python.
Alternative resources: If you do not like interactive coding, you can take Google's Python course instead. This two-day course covers not only the Python topics mentioned above but also some of the topics discussed later.
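The data structures listed above fit together in a few lines. This is an illustrative sketch; the variable names and values are made up.

```python
# Lists are mutable, ordered sequences
scores = [72, 88, 95, 60]
scores.append(81)

# Tuples are immutable; handy for fixed records
point = (3, 4)
x, y = point  # tuple unpacking

# Dictionaries map keys to values
ages = {"alice": 31, "bob": 25}
ages["carol"] = 28

# List comprehension: build a new list by filtering an existing one
passed = [s for s in scores if s >= 70]

# Dictionary comprehension: build a new dict from an existing one
age_next_year = {name: age + 1 for name, age in ages.items()}

print(passed)         # [72, 88, 95, 81]
print(age_next_year)  # {'alice': 32, 'bob': 26, 'carol': 29}
```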

Step 3: Learn Regular Expressions in Python

You will use regular expressions a lot for data cleaning, especially with text data. The best way to learn them is through Google's Python course, which makes regular expressions easy to pick up.

Task: Do the regular expression exercise on baby names.

If you want more practice, work through this text-cleaning tutorial. It will challenge you with the many processing steps involved in data preprocessing.
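A typical cleaning pass might look like the sketch below, which uses only Python's built-in `re` module. The sample string, the helper names `clean_text` and `extract_years`, and the specific patterns are illustrative assumptions. They are not taken from the Google course or from the tutorial above.

```python
import re


def clean_text(raw):
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep letters, digits, whitespace
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def extract_years(raw):
    """Find standalone four-digit years from 1900 to 2099."""
    return re.findall(r"\b(?:19|20)\d{2}\b", raw)


sample = "Python rocks!!  See https://example.com (since 1991; v3 in 2008)."
print(clean_text(sample))     # python rocks see since 1991 v3 in 2008
print(extract_years(sample))  # ['1991', '2008']
```

The group `(?:...)` is non-capturing, so `re.findall` returns whole matches instead of just the group contents.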

Step 4: Learn the Python scientific libraries: NumPy, SciPy, Matplotlib, and Pandas

From this step on, the journey gets interesting. Below is a brief introduction to each library, along with some common operations to practice:

• Work through the NumPy tutorial thoroughly, especially the section on arrays. This will lay a solid foundation for the rest of the journey.
• Next, go through the SciPy tutorial. After reading the introduction and the basics, you can study the rest as needed.
• Skip the full Matplotlib tutorials here; they are too extensive for our needs. Instead, go through the first 68 lines of this notebook.
• Finally, learn Pandas. Pandas brings R-style DataFrames to Python, and this is where you should spend the most time practicing, because Pandas will become your most effective tool for medium-scale data analysis. Start with the short "10 Minutes to pandas" introduction, then move on to a more detailed Pandas tutorial.
You can also read the blog posts Exploratory Data Analysis with Pandas and Data Munging with Pandas.
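To make these libraries more concrete, here is a small sketch that uses three of them: NumPy vectorized arithmetic and axis-wise reduction, a SciPy two-sample t-test via `scipy.stats.ttest_ind`, and a Pandas `groupby` aggregation. The data are invented for illustration. The choice of operations is ours and is not taken from the tutorials linked above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# --- NumPy: vectorized arithmetic on arrays (no explicit Python loops) ---
heights_cm = np.array([160.0, 172.5, 181.0, 168.0, 175.5])
heights_m = heights_cm / 100  # element-wise division
print(heights_m.mean(), heights_m.std())

matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(matrix.sum(axis=0))            # column sums -> [3 5 7]

# --- SciPy: compare the means of two samples with a t-test ---
group_a = np.array([5.1, 4.9, 5.6, 5.8, 6.0])
group_b = np.array([6.2, 6.8, 6.4, 7.1, 6.9])
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# --- Pandas: a DataFrame with a group-by aggregation ---
df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A"],
    "sales": [100, 150, 120, 130, 80],
})
print(df.groupby("city")["sales"].mean())
```

The `groupby` pattern (split, apply an aggregate, combine) is the one you will use most often when exploring a dataset with Pandas.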

Additional resources:
• If you want a book on Pandas and NumPy, we recommend "Python for Data Analysis" by Wes McKinney.
• The Pandas documentation also contains many tutorials, which you can view here.

Task: Try to solve this assignment from Harvard's CS109 course.

Step 5: Effective data visualization

Watch this CS109 lecture. You can skip the first two minutes, but everything after that is full of substance. Follow the lecture with this assignment.
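Here is a minimal Matplotlib sketch, not taken from the CS109 lecture. It draws a scatter plot with a trend line next to a histogram of residuals, using synthetic data. The `Agg` backend and the output file name `visualization_sketch.png` are our assumptions, chosen so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=2.0, size=x.size)  # noisy linear data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: raw observations plus the line that generated them
ax1.scatter(x, y, label="observations")
ax1.plot(x, 2 * x + 1, color="red", label="true line y = 2x + 1")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title("Scatter plot with trend line")
ax1.legend()

# Right panel: how far the observations fall from the line
ax2.hist(y - (2 * x + 1), bins=10)
ax2.set_xlabel("residual")
ax2.set_ylabel("count")
ax2.set_title("Distribution of residuals")

fig.tight_layout()
fig.savefig("visualization_sketch.png")
```

Using the explicit `fig`/`ax` objects instead of the global `plt.plot` state makes multi-panel figures easier to control.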

Step 6: Learn Scikit-learn and machine learning

Now we come to the core of the whole path. Scikit-learn is the most useful Python library for machine learning. Here is a brief overview of the library. Work through lectures 10 to 18 of Harvard's CS109 course. They give an overview of machine learning and introduce supervised algorithms such as regression, decision trees, and ensemble models, as well as unsupervised algorithms such as clustering. Complete the assignment that accompanies each lecture.
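The main benefit of Scikit-learn is that every model shares the same `fit` / `predict` / `score` interface. The sketch below shows this with the iris dataset bundled with scikit-learn. The dataset and hyperparameters are our choices, not CS109's. Because iris is a classification problem, logistic regression stands in for "regression" here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Supervised learning: one loop works for every estimator
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest (ensemble)": RandomForestClassifier(n_estimators=100, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name:26s} accuracy = {scores[name]:.3f}")

# Unsupervised learning: cluster the same data without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```

Evaluating on a held-out test split, rather than on the training data, is what makes the accuracy numbers meaningful.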

Additional resources:

• If there is one book you must read, it is Programming Collective Intelligence. Although it is a bit dated, it is still one of the best books in the field.
• You can also take Yaser Abu-Mostafa's machine learning course, one of the best available. For a more accessible explanation of machine learning techniques, try Andrew Ng's machine learning course and do its exercises in Python.
• The Scikit-learn tutorial

Task: Try this challenge on Kaggle.

Step 7: Practice, practice, and practice

Congratulations, you have completed the entire learning journey.

You now have all the skills you need; the question is how to practice them. What better way to practice than competing with other data scientists on Kaggle? Dive into one of the competitions currently running on Kaggle and try to apply everything you have learned.
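Most competitions follow the same loop: train on the provided data, check the score locally with cross-validation, then write predictions to a submission file. The sketch below walks through that loop. It is built on assumptions. The synthetic `train`/`test` frames stand in for a competition's CSV files, which you would normally load with `pd.read_csv`. The column names and the `id,target` submission layout are also hypothetical, because each competition defines its own format.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for a competition's train/test files
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "id": range(200),
    "feature_1": rng.normal(size=200),
    "feature_2": rng.normal(size=200),
})
train["target"] = (train["feature_1"] + train["feature_2"] > 0).astype(int)
test = pd.DataFrame({
    "id": range(200, 250),
    "feature_1": rng.normal(size=50),
    "feature_2": rng.normal(size=50),
})

features = ["feature_1", "feature_2"]
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Estimate out-of-sample performance locally before submitting
cv_scores = cross_val_score(model, train[features], train["target"], cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Fit on all training data and write predictions in a hypothetical id,target layout
model.fit(train[features], train["target"])
submission = pd.DataFrame({"id": test["id"], "target": model.predict(test[features])})
submission.to_csv("submission.csv", index=False)
```

A reliable local cross-validation score lets you compare ideas without using up the competition's limited daily submissions.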

Step 8: Deep Learning

Now that you have learned most machine learning techniques, it's time to look at deep learning. You may already know what deep learning is; if you still need a brief introduction, you can find one here.

I am also a beginner in deep learning, so take the following suggestions with that in mind. Deeplearning.net has the most comprehensive collection of deep learning resources: lectures, datasets, challenges, tutorials, and more. You can also take Geoff Hinton's course to learn about neural networks.
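Deep learning frameworks automate the details, but it helps to see once what they do under the hood. The NumPy sketch below shows a forward pass, backpropagation, and gradient descent on a tiny two-layer network that learns XOR. It is a generic illustration, not material from Deeplearning.net or Hinton's course. The layer size, learning rate, and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR: not linearly separable, so a hidden layer is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)    # hidden-layer activations
    out = sigmoid(h @ W2 + b2)  # predicted probability of class 1
    return h, out


def bce_loss(out, y):
    out = np.clip(out, 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.mean(y * np.log(out) + (1 - y) * np.log(1 - out)))


# Parameters: 2 inputs -> 8 hidden units -> 1 output
W1 = rng.normal(size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1))
b2 = np.zeros((1, 1))
learning_rate = 0.5
losses = []

for step in range(5000):
    h, out = forward(X, W1, b1, W2, b2)
    losses.append(bce_loss(out, y))

    # Backpropagation. With a sigmoid output and mean binary cross-entropy,
    # the gradient w.r.t. the output pre-activation is (out - y) / n.
    d_z2 = (out - y) / len(X)
    dW2 = h.T @ d_z2
    db2 = d_z2.sum(axis=0, keepdims=True)
    d_z1 = (d_z2 @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # Plain full-batch gradient descent step
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

_, final_out = forward(X, W1, b1, W2, b2)
print("loss: first =", round(losses[0], 4), "last =", round(losses[-1], 4))
print("predictions:", (final_out > 0.5).astype(int).ravel())
```

Libraries such as those covered in deep learning courses compute the backward pass automatically, but the structure (forward, loss, gradients, update) stays the same.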

Note: If you need big data libraries, try Pydoop and PyMongo. A big data learning path is beyond the scope of this article, since it is a complete topic of its own.

 

