Comprehensive learning path – Data Science in Python

Last updated: 2015-03-13. Source: Internet


http://blog.csdn.net/pipisorry/article/details/44245575

A very good article on how to learn Python and apply it to data science, data analysis and machine learning.

Journey from a Python noob to a Kaggler on Python

So, you want to become a data scientist, or maybe you are already one and want to expand your tool repository. You have landed at the right place. The aim of this page is to give people new to Python a comprehensive overview of the steps needed to learn Python for data analysis. If you already have some background, or don’t need all the components, feel free to adapt the path to your own needs and let us know what changes you made.

Step 0: Warming up

Before starting your journey, the first question to answer is:

Why use Python?

How would Python be useful?

Watch the first 30 minutes of this talk by Jeremy, founder of DataRobot, at PyCon 2014, Ukraine, to get an idea of how useful Python could be.

Step 1: Setting up your machine

Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update to the underlying libraries is already available. If you are a beginner, that should hardly matter.
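
Once the installation has finished, a quick sanity check can confirm that the libraries used later in this path are importable. The snippet below is only a sketch: the list of import names and the helper check_libraries are assumptions made for illustration, not part of Anaconda itself.

```python
import importlib.util

# Import names (not package names) of the libraries used later in this path.
# This list is an illustrative assumption; adjust it to your own needs.
LIBRARIES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_libraries(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_libraries(LIBRARIES).items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
```

Anything reported as missing can usually be installed through the package manager that came with your distribution.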

If you face any challenges in installing, you can find more detailed instructions for various operating systems here.

Step 2: Learn the basics of Python language

You should start by understanding the basics of the language, libraries and data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and should also understand classes and objects.

Specifically learn: Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions
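
To give a flavour of these building blocks, here is a minimal sketch. All variable names and values are made up for illustration.

```python
# A list is ordered and mutable; a tuple is ordered but immutable.
scores = [72, 88, 95, 61]
point = (3, 4)

# A dictionary maps keys to values.
ages = {"ann": 31, "bob": 27, "cy": 45}

# List comprehension: squares of the even numbers in 0..9.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# Dictionary comprehension: keep only the people aged 30 or over.
over_30 = {name: age for name, age in ages.items() if age >= 30}

# Tuple unpacking assigns each element to its own name.
x, y = point
distance_sq = x * x + y * y

# Lists can grow in place; tuples cannot.
scores.append(80)

print(even_squares)
print(over_30)
print(distance_sq)
```

Comprehensions replace the common pattern of creating an empty container and filling it in a loop, which is why they show up so often in data-cleaning scripts.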

Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.

Alternate resources: If interactive coding is not your style of learning, you can also look at the Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.

Step 3: Learn Regular Expressions in Python

You will need to use them a lot for data cleansing, especially if you are working on text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
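
As a small taste of regular expressions used for cleaning text, consider the sketch below. It uses only the standard re module, and the sample string is invented for illustration.

```python
import re

# A messy, made-up line of text to clean.
raw = "  Order #1234:   shipped on 2015-03-13, ref. AB-77!! "

# Collapse runs of whitespace into one space and strip both ends.
clean = re.sub(r"\s+", " ", raw).strip()

# Pull out an ISO-style date (YYYY-MM-DD) using named groups.
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", clean)
date_parts = match.groupdict() if match else {}

# Find every run of digits in the text.
numbers = re.findall(r"\d+", clean)

print(clean)
print(date_parts)
print(numbers)
```

The three calls shown here (re.sub, re.search and re.findall) cover a large share of everyday cleaning tasks: substitution, extraction of one match, and extraction of all matches.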

Assignment: Do the baby names exercise

If you still need more practice, follow this tutorial for text cleaning. It will challenge you on various steps involved in data wrangling.

Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas

This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.

Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this IPython notebook up to Line 68 (i.e. up to animations).
Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become the most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.
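
To make the NumPy and Pandas items above concrete, the sketch below shows vectorised array arithmetic and a basic DataFrame filter and group-by. It assumes numpy and pandas are installed (both ship with Anaconda); the data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, no explicit loop.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100.0
mean_height = heights_m.mean()

# Pandas: a DataFrame is a table with labelled columns.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "sales": [10, 20, 30, 40],
})

# Boolean filtering keeps only the rows where the condition holds.
big_sales = df[df["sales"] > 15]

# Group rows by city and sum the sales within each group.
total_by_city = df.groupby("city")["sales"].sum()

print(mean_height)
print(big_sales)
print(total_by_city)
```

Filtering and group-by aggregation are the two operations you will probably use most often during exploratory analysis.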

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas.

Additional Resources:

If you need a book on Pandas and NumPy, read “Python for Data Analysis” by Wes McKinney.
There are a lot of tutorials as part of the Pandas documentation. You can have a look at them here.

Assignment: Solve this assignment from the CS109 course from Harvard.

Step 5: Effective Data Visualization

Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.

Step 6: Learn Scikit-learn and Machine Learning

Now, we come to the meat of this entire process. Scikit-learn is the most useful library in Python for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 from the CS109 course from Harvard. You will go through an overview of machine learning, supervised learning algorithms like regressions, decision trees and ensemble modeling, and unsupervised learning algorithms like clustering. Follow individual lectures with the assignments from those lectures.
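
The sketch below shows roughly what supervised and unsupervised learning look like in Scikit-learn. It assumes a reasonably recent version of the library (where train_test_split lives in sklearn.model_selection) and uses the built-in iris dataset. The model choices and parameters are illustrative assumptions, not recommendations from the lectures.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Supervised learning: fit a decision tree on the training split only.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))

# Unsupervised learning: group the same rows into 3 clusters, ignoring y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"decision tree test accuracy: {accuracy:.2f}")
print(f"distinct clusters found: {len(set(cluster_labels))}")
```

Most Scikit-learn estimators follow this same pattern of constructing an object, calling fit, and then calling predict, so swapping in a different algorithm usually changes only a line or two.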

Additional Resources:

If there is one book you must read, it is Programming Collective Intelligence, a classic but still one of the best books on the subject.
Additionally, you can follow one of the best courses on the subject, the Machine Learning course from Yaser Abu-Mostafa. If you need a more lucid explanation of the techniques, you can opt for the Machine Learning course from Andrew Ng and follow the exercises in Python.
Tutorials on Scikit-learn

Assignment: Try out this challenge on Kaggle

Step 7: Practice, practice and Practice

Congratulations, you made it!

You now have all the technical skills you need. It is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give everything you have learnt a try!

Step 8: Deep Learning

Now that you have learnt most machine learning techniques, it is time to give Deep Learning a shot. There is a good chance that you already know what Deep Learning is, but if you still need a brief intro, here it is.

I am myself new to deep learning, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything here: lectures, datasets, challenges, tutorials. You can also give the course from Geoff Hinton a try in a bid to understand the basics of Neural Networks.

P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here, as the Big Data learning path is an entire topic in itself.


ref:http://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-data-science-python/
