http://blog.csdn.net/pipisorry/article/details/44245575
A good article on how to learn Python and use it for data science, data analysis and machine learning
Comprehensive Learning Path – Data Science in Python
Journey from a Python noob to a Kaggler on Python
So, you want to become a data scientist, or maybe you are already one and want to expand your tool repository. You have landed at the right place. The aim of this page is to provide a comprehensive learning path to people new to Python for data analysis. This path provides a comprehensive overview of the steps you need to learn to use Python for data analysis. If you already have some background, or don't need all the components, feel free to adapt your own paths and let us know how you made changes in the path.
Step 0: Warming up
Before starting your journey, the first question to answer is:
Why use Python?
Or
How would Python be useful?
Watch the first minutes of this talk from Jeremy, founder of DataRobot, at PyCon Ukraine to get an idea of how useful Python could be.
Step 1: Setting up your machine
Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you'll ever need. The major downside of taking this route is that you'll need to wait for Continuum to update their packages, even when there might be an update available to the underlying libraries. If you are a starter, that should hardly matter.
If you face any challenges in installing, you can find more detailed instructions for various operating systems here.
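Once the installation finishes, it can help to confirm that the libraries used later in this path actually import. The sketch below is a hypothetical helper (`check_packages` is not part of Anaconda); it assumes only the standard library's `importlib`. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib


def check_packages(names):
    """Hypothetical helper: map each package name to its version string,
    'unknown' if it imports but exposes no __version__, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # The libraries this learning path relies on (import names, not package names)
    wanted = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]
    for pkg, version in check_packages(wanted).items():
        print(f"{pkg:12} {version if version else 'NOT INSTALLED'}")
```

A missing entry usually means the package was not part of your distribution and can be added with your package manager.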
Step 2: Learn the basics of the Python language
You should start by understanding the basics of the language, libraries and data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python, and also understand classes and objects.
Specifically learn: lists, tuples, dictionaries, list comprehensions and dictionary comprehensions.
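As a quick self-check on those topics, here is a small illustrative snippet covering each structure plus a minimal class. The data and the `Student` class are made up purely for illustration.

```python
# Lists: mutable, ordered sequences
scores = [72, 88, 95, 61]
scores.append(79)

# Tuples: immutable, handy for fixed records; unpack into variables
point = (3, 4)
x, y = point

# Dictionaries: map keys to values
ages = {"alice": 31, "bob": 25}
ages["carol"] = 40

# List comprehension: square every even score
even_squares = [s * s for s in scores if s % 2 == 0]

# Dictionary comprehension: invert name -> age into age -> name
by_age = {age: name for name, age in ages.items()}


# Classes and objects: a hypothetical record type with one method
class Student:
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def passed(self, threshold=65):
        return self.score >= threshold


students = [Student("alice", 72), Student("dan", 61)]
passing = [s.name for s in students if s.passed()]
```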
Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.
Alternate Resources: If interactive coding isn't your style of learning, you can also look at the Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.
Step 3: Learn regular expressions in Python
You'll need to use them a lot for data cleansing, especially if you are working on text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
Assignment: Do the baby names exercise
If you still need more practice, follow this tutorial for text cleaning. It'll challenge you on the various steps involved in data wrangling.
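A minimal sketch of the kind of cleaning and extraction these exercises involve, using only the standard `re` module. The sample strings, including the HTML table row, are invented for illustration; the real baby names files may be formatted differently.

```python
import re

raw = "  Call me at 555-123-4567!!  Visit   http://example.com NOW.  "


def clean_text(text):
    """Illustrative cleaner: drop URLs, strip punctuation, normalize spacing/case."""
    text = re.sub(r"https?://\S+", " ", text)     # remove URLs
    text = re.sub(r"[^A-Za-z0-9\s-]", " ", text)  # keep letters, digits, spaces, hyphens
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace
    return text.strip().lower()


cleaned = clean_text(raw)

# findall with a pattern returns every non-overlapping match
phones = re.findall(r"\b\d{3}-\d{3}-\d{4}\b", raw)

# With capture groups, findall returns one tuple of groups per match
row_html = "<tr><td>1</td><td>Michael</td><td>Jessica</td></tr>"
ranked_names = re.findall(r"<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>", row_html)
```

Applying substitutions in a fixed order (URLs before punctuation) matters: stripping punctuation first would break the URL pattern.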
Step 4: Learn scientific libraries in Python – NumPy, SciPy, matplotlib and Pandas
This is where the fun begins! Here is a brief introduction to the various libraries. Let's start practicing some common operations.
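To give a first taste of those common operations, the snippet below shows NumPy array creation, axis-wise aggregation, broadcasting and boolean masking on a small toy array.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3x4 array holding 0..11 row by row

col_means = a.mean(axis=0)  # aggregate down each column
row_sums = a.sum(axis=1)    # aggregate across each row

# Broadcasting: the length-4 vector is subtracted from every row
centered = a - col_means

# Boolean masking: select elements that satisfy a condition
big = a[a > 6]
```

These vectorized operations replace explicit Python loops, which is what makes NumPy (and Pandas, built on it) fast.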
- Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
- Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
- If you guessed matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this IPython notebook up to the part on animations.
- Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become the most effective tool for all mid-size data analysis. Start with a short introduction, 10 Minutes to pandas. Then move on to a more detailed tutorial on pandas.
You can also look at Exploratory Data Analysis with Pandas and Data Munging with Pandas.
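The everyday Pandas workflow those tutorials build up to (create a DataFrame, impute a missing value, group and aggregate, sort) can be sketched as follows. The city/sales table is toy data, and median imputation is just one illustrative choice.

```python
import pandas as pd

# Toy data with one missing value (None becomes NaN in a numeric column)
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
    "sales": [250, 400, 150, None, 300],
})

# Data munging: fill the missing value with the column median
df["sales"] = df["sales"].fillna(df["sales"].median())

# Exploratory summary: per-city count and mean
summary = df.groupby("city")["sales"].agg(["count", "mean"])

# The two largest sales rows
top = df.sort_values("sales", ascending=False).head(2)
```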
Additional Resources:
- If you need a book on Pandas and NumPy, read "Python for Data Analysis" by Wes McKinney.
- There are a lot of tutorials as part of the Pandas documentation. You can have a look at them here.
Assignment: Solve this assignment from the CS109 course from Harvard.
Step 5: Effective data visualization
Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.
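A common first visualization step is to look at a variable's distribution from two angles. The sketch below assumes matplotlib with the non-interactive `Agg` backend (so it writes a PNG instead of opening a window); `save_overview`, the file name and the random data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only; no display needed
import matplotlib.pyplot as plt
import numpy as np


def save_overview(values, path="overview.png"):
    """Hypothetical helper: histogram and box plot side by side, saved to path."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

    ax_hist.hist(values, bins=30, edgecolor="black")
    ax_hist.set_title("Histogram")
    ax_hist.set_xlabel("value")
    ax_hist.set_ylabel("count")

    ax_box.boxplot(values)
    ax_box.set_title("Box plot")
    ax_box.set_ylabel("value")

    fig.tight_layout()
    fig.savefig(path, dpi=100)
    return fig


rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)  # synthetic sample data
fig = save_overview(values)
```

The histogram shows shape and skew, while the box plot makes the median, spread and outliers easy to read.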
Step 6: Learn scikit-learn and machine learning
Now we come to the meat of this entire process. Scikit-learn is the most useful library in Python for machine learning. Here is a brief overview of the library. Go through lectures … to … of the CS109 course from Harvard. You'll go through an overview of machine learning, supervised learning algorithms like regression, decision trees and ensemble modeling, and unsupervised learning algorithms like clustering. Follow the individual lectures with the assignments from those lectures.
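The algorithm families listed above all share scikit-learn's `fit`/`predict` interface. The sketch below uses the bundled Iris dataset (no download needed) and synthetic regression data; the hyperparameters are arbitrary illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Supervised classification: hold out 30% of the data for evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)  # ensemble
tree_acc = accuracy_score(y_test, tree.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))

# Supervised regression on synthetic data: target = 3x + 2 + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
target = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(x, target)

# Unsupervised clustering: labels are ignored; KMeans groups the raw features
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
cluster_labels = kmeans.labels_
```

Evaluating on a held-out split, rather than on the training data, is what gives an honest estimate of how a model generalizes.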
Additional Resources:
- If there is one book you must read, it's Programming Collective Intelligence, a classic but still one of the best books on the subject.
- Additionally, you can follow the machine learning course from Yaser Abu-Mostafa, one of the best courses on the subject. If you need more explanation of the techniques, you can opt for the machine learning course from Andrew Ng and do the exercises in Python.
- Tutorials on scikit-learn
Assignment: Try out this challenge on Kaggle
Step 7: Practice, practice and practice
Congratulations, you made it!
You now have all the technical skills you need. It is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give everything you have learnt a try!
Step 8: Deep learning
Now that you have learnt most of the machine learning techniques, it is time to give deep learning a shot. There is a good chance that you already know what deep learning is, but if you still need a brief intro, here it is.
I am myself new to deep learning, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You'll find everything there: lectures, datasets, challenges and tutorials. You can also give the course from Geoff Hinton a try in a bid to understand the basics of neural networks.
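To make "the basics of neural networks" concrete, here is a teaching sketch (not taken from the resources above) of a tiny two-layer network trained on XOR with hand-written backpropagation in plain NumPy. The layer size, learning rate and step count are arbitrary assumptions; real deep learning work uses a dedicated framework rather than manual gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

params = {
    "W1": rng.normal(0.0, 1.0, size=(2, 8)), "b1": np.zeros((1, 8)),
    "W2": rng.normal(0.0, 1.0, size=(8, 1)), "b2": np.zeros((1, 1)),
}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, p):
    h = np.tanh(X @ p["W1"] + p["b1"])         # hidden activations
    out = sigmoid(h @ p["W2"] + p["b2"])       # predicted probabilities
    return h, out


def bce(prob, target, eps=1e-9):
    """Mean binary cross-entropy; eps guards against log(0)."""
    return -np.mean(target * np.log(prob + eps) + (1 - target) * np.log(1 - prob + eps))


lr = 0.5
losses = []
n = X.shape[0]
for step in range(2000):
    h, out = forward(X, params)
    losses.append(bce(out, y))

    # Backpropagation: for sigmoid + mean BCE, dLoss/dlogits = (out - y) / n
    d_out = (out - y) / n
    d_h = (d_out @ params["W2"].T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grads = {
        "W2": h.T @ d_out, "b2": d_out.sum(axis=0, keepdims=True),
        "W1": X.T @ d_h, "b1": d_h.sum(axis=0, keepdims=True),
    }
    for name in params:  # plain gradient descent update
        params[name] -= lr * grads[name]

_, out = forward(X, params)
predictions = (out > 0.5).astype(int).ravel()
```

Comparing `losses[0]` with `losses[-1]` shows the training signal; frameworks automate exactly this forward/backward/update loop at much larger scale.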
P.S. If you need to use Big Data libraries, give Pydoop and PyMongo a try. They aren't included here, as the Big Data learning path is an entire topic in itself.
ref: http://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-data-science-python/