# In statistics and (theoretical or applied) econometrics, can Python now fully substitute for R and Stata?

Source: Internet
Author: User
Tags: statsmodels

Update: Thank you for the upvotes, thanks, and comments. I'll add an example of data processing done in Python: defining a fairly complex new variable, which is a simple piece of feature engineering. This task would be a headache in Stata. The example also gives a taste of the IPython Notebook (view it in a desktop browser; it displays poorly on mobile).
－－－－－－
Original answer: I'd like to share my own experience with Python and Stata; I have rarely used R, so I won't discuss it. I should stress that I am only a beginner in both Stata and Python, so my comparison is limited by my own level and may not be accurate. Corrections are welcome.

First, the conclusion: for applied data analysis, moving from using only Stata to using Python fluently is likely to pay off, and it comes with the pleasure of an eye-opening experience. Python skills are more widely applicable than Stata, and learning the basics is not too hard as long as you are willing to put in some effort. Add the huge boost to learning efficiency from communities like Stack Overflow, and learning Python offers a high return on investment.

I work in applied microeconomics. Most of my research projects don't involve any advanced econometric methods: the work is basically asking questions with care and then collecting data the hard way (for my economic history research, the data are collected from original historical records; for a project closer to management science, the data are performance appraisals in which a company's employees rate one another). So what I mainly need from software is data cleaning, transformation, visualization, and so on.

I originally used Stata. At the time I found Stata very convenient, especially for defining new variables (syntax such as `bysort: gen` is very handy), running OLS/logit regressions, and exporting the tables to LaTeX. These basic tasks are very easy in Stata. The fly in the ointment is writing your own functions: I never got used to Stata's programming style, so my code was hard to reuse, my do-files grew long, and things gradually became messy. Stata's matrix operations and general computing capabilities were also not very good.
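For readers making the same switch, here is a minimal sketch of how a few common `bysort` patterns translate to pandas. The panel and its column names (`firm`, `year`, `sales`) are hypothetical, and this is one reasonable mapping rather than the only one.

```python
import pandas as pd

# Hypothetical firm-year panel
df = pd.DataFrame({
    "firm": ["a", "a", "a", "b", "b"],
    "year": [2001, 2002, 2003, 2001, 2002],
    "sales": [10.0, 12.0, 15.0, 7.0, 9.0],
})

# Stata: bysort firm: egen mean_sales = mean(sales)
df["mean_sales"] = df.groupby("firm")["sales"].transform("mean")

# Stata: bysort firm (year): gen lag_sales = sales[_n-1]
df = df.sort_values(["firm", "year"])
df["lag_sales"] = df.groupby("firm")["sales"].shift(1)

# Stata: bysort firm (year): gen t = _n
df["t"] = df.groupby("firm").cumcount() + 1

print(df)
```

Like `sales[_n-1]` in Stata, `shift(1)` takes the previous row within the group, which is not necessarily the previous year if the panel has gaps.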

Later, as I became interested in data science and machine learning, I took some Python-based courses on edX, Coursera, Udacity and similar platforms. The most useful ones for me were MIT's two Python courses on edX (6.00.1x and 6.00.2x) and Intro to Data Science on Udacity. After these courses I did some small machine learning projects. The goal was not to apply them to my own economic research: at that point, apart from a game-theory model I could not solve analytically, for which I used Python to run a small agent-based simulation depicting the properties of the equilibrium, I had not really used Python to complete a project.
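The answer does not say which game-theory model was simulated. Purely as a hypothetical illustration of the idea, here is a sketch of agent-based best-response dynamics in a 2x2 coordination game; the payoffs, population size, and update rule are all assumptions made for the example.

```python
import random

# Hypothetical 2x2 coordination game: both playing A pays 2 each,
# both playing B pays 1 each, and a mismatch pays 0.
PAYOFF = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 1}


def best_response(others_a, n_others):
    """Best reply when others_a of the n_others opponents play A (None on a tie)."""
    others_b = n_others - others_a
    # Total payoff from being matched once with every other agent
    payoff_a = others_a * PAYOFF[("A", "A")] + others_b * PAYOFF[("A", "B")]
    payoff_b = others_a * PAYOFF[("B", "A")] + others_b * PAYOFF[("B", "B")]
    if payoff_a > payoff_b:
        return "A"
    if payoff_b > payoff_a:
        return "B"
    return None


def simulate(n_agents=100, share_a0=0.5, seed=0, max_sweeps=1000):
    """Asynchronous best-response dynamics; stops once no agent wants to switch."""
    rng = random.Random(seed)
    agents = ["A" if rng.random() < share_a0 else "B" for _ in range(n_agents)]
    for sweep in range(max_sweeps):
        changed = False
        order = list(range(n_agents))
        rng.shuffle(order)  # agents revise one at a time in random order
        for i in order:
            others_a = agents.count("A") - (agents[i] == "A")
            reply = best_response(others_a, n_agents - 1)
            if reply is not None and reply != agents[i]:
                agents[i] = reply
                changed = True
        if not changed:  # nobody wants to deviate: a pure Nash equilibrium
            return agents, sweep
    return agents, max_sweeps


agents, sweeps = simulate()
print(f"share playing A: {agents.count('A') / len(agents):.2f} after {sweeps} sweeps")
```

Because A is a strict best reply only when more than a third of the other agents play A, the population settles on one of the two pure equilibria depending on where it starts, which is the kind of property such a simulation makes visible.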

Interestingly, a few months later I started a new project. Although it still didn't require advanced statistics or econometrics, the data processing was more complicated than before: I needed to aggregate the data into transition matrices, do some calculations with them, and produce a lot of visualizations. So instead of starting the new project in Stata, I tried using Python's pandas for data manipulation and matplotlib for plotting. Another reason was that once I started using the IPython Notebook, I couldn't stop: code and results (charts) are integrated into a single document, with each piece of code followed by its output, which is ideal for organizing and sharing work. Try it and you'll see.
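The transition-matrix step described above can be done with a shifted column plus a row-normalized cross-tabulation. This is a minimal sketch, not the author's code; it assumes a long panel with one row per unit and period, and the columns `unit`, `year`, and `state` are hypothetical.

```python
import pandas as pd

# Hypothetical panel: the state of each unit in consecutive periods
df = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "year": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "state": ["low", "low", "high", "high", "high", "low", "low", "high", "high"],
})

# Pair each observation with the same unit's state in the next period
df = df.sort_values(["unit", "year"])
df["next_state"] = df.groupby("unit")["state"].shift(-1)
pairs = df.dropna(subset=["next_state"])  # the last period has no successor

# Row-normalized counts: entry (i, j) estimates P(next_state = j | state = i)
transition = pd.crosstab(pairs["state"], pairs["next_state"], normalize="index")
print(transition)
```

Each row of the result sums to one, so it can be used directly in matrix calculations (for example with `transition.to_numpy()`).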

When I first switched from Stata to Python, I was not yet used to the pandas DataFrame, especially functions for reshaping, MultiIndex, pivot_table and the like, so I still missed Stata. Then I gradually came to appreciate how powerful pandas' data manipulation features are.
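Since reshaping, MultiIndex, and `pivot_table` were the sticking points, here is a minimal sketch of how Stata's `reshape long` and `reshape wide` map onto pandas. The data and the column names (`region`, `pop2000`, `pop2010`) are hypothetical.

```python
import pandas as pd

# Hypothetical wide data: one row per region, one column per year
wide = pd.DataFrame({
    "region": ["north", "south"],
    "pop2000": [100, 80],
    "pop2010": [120, 90],
})

# Stata: reshape long pop, i(region) j(year)
long = pd.wide_to_long(wide, stubnames="pop", i="region", j="year").reset_index()

# A MultiIndex (region, year) Series; unstack() takes it back to wide,
# roughly like Stata's: reshape wide pop, i(region) j(year)
indexed = long.set_index(["region", "year"])["pop"]
back = indexed.unstack("year")

# pivot_table aggregates (here a mean over years) instead of just reshaping
summary = long.pivot_table(index="region", values="pop", aggfunc="mean")

print(long)
print(back)
print(summary)
```

Unlike `unstack`, `pivot_table` tolerates duplicate index/column combinations because it aggregates them with `aggfunc`.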

In short, the efficiency gain I value most from Python is that the whole analysis is automated, from raw data to the final charts and results, with no semi-automated manual adjustments in between. The amount of reusable code has also increased significantly. In addition, thanks to the stronger data manipulation capabilities, I visualize data much more often than before, and for almost every regression analysis I also do the corresponding descriptive analysis and visualization.
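One way to get a fully automated path from raw data to results is to write each stage as a function and chain them with `DataFrame.pipe`. This is a minimal sketch; the column names, the cleaning rules, and the "high score" threshold of 4 are hypothetical.

```python
import pandas as pd


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Trim and lowercase names, coerce scores to numbers, drop bad rows and duplicates."""
    out = raw.copy()
    out["name"] = out["name"].str.strip().str.lower()
    out["score"] = pd.to_numeric(out["score"], errors="coerce")
    return out.dropna(subset=["score"]).drop_duplicates()


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical derived variable: flag scores of 4 or more
    return df.assign(high=df["score"] >= 4)


def describe(df: pd.DataFrame) -> pd.DataFrame:
    return df.groupby("name").agg(
        n=("score", "size"),
        mean_score=("score", "mean"),
        share_high=("high", "mean"),
    )


raw = pd.DataFrame({
    "name": [" Ann", "ann ", "Bob", "Bob", "Cy"],
    "score": ["3", "5", "4", "x", "2"],
})

# The whole analysis is a single call, so it can be rerun end to end
table = raw.pipe(clean).pipe(add_features).pipe(describe)
print(table)
```

Keeping each stage as a plain function also makes the pieces reusable across projects, which addresses the do-file sprawl described earlier.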

Finally, I have to mention the strength of the Python community. Whenever I don't know how to do something, I Google it, and the Stack Overflow questions and technical blog posts that come up can basically solve the problem. With Stata, by contrast, I often felt powerless: once stuck, I stayed stuck, and after struggling for a long time all I could do was read the documentation and work it out on my own.

－－－
Addendum: a friend asked what I use to make plots. I use Matplotlib. It is not the most convenient tool, but its basic features are more or less enough. Here are some of the figures from my economic history research. They are very basic; I'm just showing interested friends how I use it. Don't laugh :)
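The original figures are not reproduced here. As a hypothetical stand-in, this is a minimal sketch of the basic Matplotlib workflow (object-oriented API, saved straight to a file); the data and the output file name are made up.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

# Hypothetical series
years = [1900, 1910, 1920, 1930]
values = [3.1, 3.4, 2.9, 3.8]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, values, marker="o", label="series")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_title("A basic Matplotlib line chart")
ax.legend()
fig.tight_layout()

out_path = Path("figure.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)  # free the figure when generating many plots in a script
```

In a notebook the `matplotlib.use("Agg")` line can be dropped so the figure is shown inline under the code.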

You are a statistician, not a programmer. You are a statistician, not a programmer. You are a biostatistician, not a programmer.

A programming language is a tool for implementing your ideas, but what underpins those ideas is not Python or R; it is probability, statistics, and mathematics.

I once had the same question, so I asked a professor, and this is the answer I got.

Of course, I'm not just making excuses for myself (I've been saying every year that I'll learn Python); R really is great. One advantage of R is that it was written by statisticians, and one disadvantage of R is that it was written by statisticians.
In my view, R, Python, and MATLAB can basically substitute for one another; the harder it is to choose among them, the more that shows any of them will do. When I was taking a machine learning course, I asked the instructor which one was most suitable and got the same reply. Whether for statistics, econometrics, or time series, I have always used R and am quite satisfied with it; after all, I do statistics for my own use, I know what I need, and R is specialized enough.
As for Stata, I put it in the same category as SPSS and EViews, which I call econometrics software; it is fundamentally different from a statistical language like R.

Thanks for the invite. I have only scratched the surface of bioinformatics, so this is a tentative answer.
Some bioinformatics work is also specialized data analysis, and Python can handle it; after all, it is convenient for data mining.
Python can do everything, but it is not the best at everything (of course, there is no best, only better).
As for replacing R, probably not; after all, R and similar languages are specialized for this. I think the situation differs between academia and industry.
In academia, as the author of the current top-voted answer says, R or Python is just a tool; the thinking matters more. So the arrival of Python simply gives researchers some new tools. It seems the operations research professors use Python more (another professor in that field uses C...). This is probably decided largely by each professor's own style and research direction. So as long as Python is not clearly superior to the other languages, R should not be replaced.

Industry is different. There, Python has a chance to replace R because it is easy to get started with, readable, and so on. If you just want to do data processing, R is fine.

A quick take: compared with Python, R is a higher-level, more user-friendly language, but it is also more limited. If your programming skills are strong enough, you can of course use Python to do everything R can do, and do it faster. In the end it is just a tool: master one, and the rest is not hard. For what it's worth, here is what a statsmodels developer has said:

> I can see that. Much of Python stats strikes me as poor imitation of R. Like matplotlib : MATLAB, OO : MS Office.
>
> Referring to statsmodels:
>
> I'm not sure whether the implied criticism is on "poor" or "imitation".
>
> I would "officially" correct this :)
>
> Statsmodels isn't only a poor imitation of R, it's also a poor imitation of Stata. It is in some parts a poor imitation of SAS, and maybe even in some parts a poor imitation of MATLAB or GAUSS or ..., and maybe in some parts it's even a good imitation.
>
> But I think it was a good imitation of statsmodels, although with still some very important gaps in coverage of statistics and econometrics.
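The resemblance to R is most visible in statsmodels' formula interface, which accepts R-style formulas. Here is a minimal sketch of the OLS/logit-then-LaTeX workflow on simulated data; the variable names and the data-generating process are made up for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): y depends on x plus a shift for group "b"
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["a", "b", "c"], size=n),
})
df["y"] = 1.0 + 2.0 * df["x"] + 0.5 * (df["group"] == "b") + rng.normal(scale=0.5, size=n)
df["d"] = (df["y"] > df["y"].median()).astype(int)

# R-style formulas; C(group) adds dummies, much like i.group in Stata.
# cov_type="HC1" gives the same robust SEs as Stata's ", robust" option.
ols = smf.ols("y ~ x + C(group)", data=df).fit(cov_type="HC1")
logit = smf.logit("d ~ x", data=df).fit(disp=0)  # disp=0 silences iteration output

print(ols.summary())
latex = ols.summary().as_latex()  # table as a LaTeX string
```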
