Can Python now fully replace R and Stata in statistics or in (theoretical/applied) econometrics?

Source: Internet
Author: User
Tags: statsmodels
Update: Thank you for the upvotes, thanks, and comments. I am also posting a piece of data processing I did earlier in Python, which defines a fairly complicated new variable; it is a simple feature-engineering task. Doing the same thing in Stata would have been a headache. The example is also a chance to try out IPython Notebook (if it does not display properly on mobile, view it in a desktop browser).
GitHub link: Machine-Learning-Mini-Project/Feature Engineering.ipynb
------
Original answer: I would like to share my experience with Python and Stata (I know only a little R, so I will not discuss it). I should stress that I am only at an introductory level in both Stata and Python, so the comparison is limited by my own level and may not be entirely fair. Corrections are welcome.

Conclusion: for applied data analysis, moving from using only Stata to being fluent in Python is likely to pay off considerably, and it is an eye-opening and enjoyable experience. The skills transfer far more widely than Stata skills do, and the basics are not hard to learn if you are willing to put in some effort. Communities such as Stack Overflow also make learning much more efficient, so the return on investment for learning Python is very high.

I work in applied microeconomics. Most of my research projects do not involve any advanced econometric methods. Basically, you think hard to come up with a good question and then do the tedious work of collecting first-hand data (for research in economic history, data gathered from original historical archives; for research in management science, for example, performance appraisal data on the employees of a company). So what I mainly need from software is data cleaning, transformation, visualization, and so on.

I started with Stata. At the time I found it quite convenient, especially for defining new variables (syntax such as bysort: gen is very useful), running OLS/Logit regressions, and exporting tables to LaTeX. These basic tasks are all easy to do in Stata. The fly in the ointment is that once I had to write functions of my own, I never got used to Stata's programming model, so my code was hard to reuse. Do-files also get messy as they grow long. Matrix operations are not very convenient either.
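For readers coming from Stata, the `bysort: gen` idiom mentioned above maps fairly directly onto Pandas `groupby`. The following is only a sketch under assumptions: the firm/year panel and its column names are invented for illustration, and the Stata commands in the comments are approximate equivalents, not the author's code.

```python
import pandas as pd

# Hypothetical panel (made up for illustration): one row per firm and year.
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "year": [2001, 2002, 2003, 2001, 2002],
    "sales": [10.0, 12.0, 15.0, 7.0, 9.0],
})

# Sort first, as `bysort firm (year)` would.
df = df.sort_values(["firm", "year"]).reset_index(drop=True)

# Roughly: bysort firm (year): gen obs = _n
df["obs"] = df.groupby("firm").cumcount() + 1

# Roughly: bysort firm: egen mean_sales = mean(sales)
df["mean_sales"] = df.groupby("firm")["sales"].transform("mean")

# Roughly: bysort firm (year): gen growth = sales - sales[_n-1]
# The first observation of each firm has no predecessor, so it becomes NaN.
df["growth"] = df.groupby("firm")["sales"].diff()

print(df)
```

`transform` returns a result aligned with the original rows, which is what makes it behave like a Stata `gen`/`egen` rather than a collapse.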

Later I became interested in data science and machine learning and took Python-based courses on platforms such as edX, Coursera, and Udacity. The most useful were the two MIT Python courses on edX (6.00.1x and 6.00.2x) and Intro to Data Science on Udacity. After these courses I did a few small machine learning projects. At that point I was not studying with the aim of applying any of this to my own economics research. The one exception was a game theory model that I could not work out analytically, so I used Python to run an agent-based simulation to characterize its equilibrium properties; I did not really use Python for a complete project.

Interestingly, a few months later I started a new project. It still did not require advanced statistics or econometrics, but its data processing was more complex than before: the data had to be summarized into some transformation matrices, followed by some calculations and a large amount of data visualization. Because Stata felt awkward for this, I tried Pandas for data manipulation and Matplotlib for plotting when I started the project. Another reason is that once I started using IPython Notebook I could not stop: code and analysis results (charts) live together in one document (a piece of code followed by its output), which is ideal for organizing and sharing work. You have to try it to appreciate it.
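The answer does not show how those matrices were built. As one guess at the kind of task involved, here is a minimal sketch that summarizes row-level records into a matrix of counts and then into row-normalized shares (a transition matrix). The data, the state labels, and the choice of a transition matrix are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical records: the state each unit was in at the start and end of a period.
moves = pd.DataFrame({
    "start": ["low", "low", "low", "mid", "mid", "high"],
    "end":   ["low", "mid", "mid", "mid", "high", "high"],
})
states = ["low", "mid", "high"]

# Count moves between states; reindex so every state appears as both a row
# and a column, in a fixed order, even if it was never observed.
counts = (
    pd.crosstab(moves["start"], moves["end"])
    .reindex(index=states, columns=states, fill_value=0)
)

# Divide each row by its total so rows give the share moving to each state.
transition = counts.div(counts.sum(axis=1), axis=0)

print(transition)
```

From here the matrix is an ordinary DataFrame, so further calculations (`transition.to_numpy()` for matrix algebra) or a Matplotlib heatmap can follow in the same notebook.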

When I first switched from Stata to Python, I was not used to many Pandas DataFrame features, such as reshaping, MultiIndex, and pivot_table, and I rather missed Stata. Only later did I come to appreciate how powerful Pandas's data manipulation is.
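For readers who have not met these features, a small sketch may help show how they fit together. The region/year data below is invented for illustration; only the Pandas calls themselves (`pivot_table`, `groupby`, `unstack`) are real API.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation.
long = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "year": [2001, 2002, 2001, 2002, 2002],
    "output": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# pivot_table: long -> wide, averaging the duplicate (south, 2002) rows.
wide = long.pivot_table(index="region", columns="year",
                        values="output", aggfunc="mean")

# MultiIndex: grouping on two keys gives a Series indexed by (region, year).
by_cell = long.groupby(["region", "year"])["output"].mean()

# Reshaping: unstack moves the inner index level (year) into the columns,
# giving the same layout as `wide`.
wide_again = by_cell.unstack("year")

print(wide)
print(by_cell.loc[("south", 2002)])
```

This is roughly the territory Stata covers with `reshape wide`/`reshape long` and `collapse`, though the mapping is not one to one.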

Simply put, the improvement I am most satisfied with after moving to Python is that the whole analysis is automated: from raw data to the final charts and results, no manual adjustment is needed. Code reusability is also much better. And because data manipulation has become so much easier, I visualize data far more often; I now do descriptive analysis and visualization for almost every regression analysis.

Finally, I have to mention the strength of the Python community. When I do not know how to do something, I Google it, and the Stack Overflow Q&A and technical blog posts that turn up basically solve the problem. With Stata, by contrast, when I got stuck I often stayed stuck for a long time, and the only option was to read the documentation and figure it out on my own.

---
Supplement: A friend asked what I use for plotting. I use Matplotlib. I do not find it especially easy to use, but the basic functionality is mostly sufficient. I am posting a few figures from my economic history research. They are quite basic and are only meant to show interested readers how I use it. Apologies for how simple they are :)
You are a statistician, not a programmer.

A computer language is a tool for implementing your ideas, but what supports you is not Python or R; it is probability, statistics, and mathematics.

I once had the same question and went to talk to my professor about it. The above is the answer I got.

Of course, this is not an excuse for the fact that I announce every year that I want to learn Python and still have not. As for R: its advantage is that it was written by statisticians, and its drawback is also that it was written by statisticians.
In my view R, Python, and MATLAB can basically substitute for one another. When I was taking a machine learning course I asked the teacher which was most suitable, and the answer was that any of them would do. Whether for statistics, econometrics, or time series, I have always used R and found it quite satisfactory. After all, it was written by statisticians for themselves; they know what they need, and it is professional enough.
As for Stata, I group it with SPSS and EViews as econometrics software, which is essentially different from a statistical language like R.
Thanks for the invitation. I have only scratched the surface of this through bioinformatics, so here is a brief answer.
Some of the tools I know from bioinformatics are designed specifically for data analysis. Using Python is fine too; after all, it makes mining data easier.
Python can be used for everything, but it is not the best at everything (there is no best, only better).
It probably will not be replaced. After all, R is the specialist at this. I think the situation differs between academia and industry.
In academia, as the top answers say, R or Python is only a tool; what matters more is the idea. So the arrival of Python simply gives researchers some new tools. My former advisor in operations research seemed to use Python (another operations research professor uses C ...). This is probably determined largely by each professor's style and research direction. So as long as Python offers no advantage that the other languages lack, R is unlikely to be replaced.

Industry is different. Python has a chance to replace R there because it is easy to pick up and readable. If you only want to process data, R is fine.
If you also expect to crawl web pages and mine data alongside processing it, it is easier to learn Python from the start.

I will venture an answer, focusing on statistics, big data, and data science. Stata is simply not comparable here. And not just Stata: SAS is gradually falling behind too. Python and R each have their own strengths. Put simply, both are very popular in CS and in statistics. Python better reflects a CS way of thinking, while R is essentially statistics carried from theory through to implementation. This has a lot to do with the history of the two languages: Python is closely tied to C, and R is based on the S language. Both can call other lower-level languages, but these historical roots give each its own character.

Python is more of a full-lifecycle tool. Writing a UI or the like is no problem (which is not to say R cannot do a UI, only that it is awkward). Python is also much more compatible with big data tooling, and its NLP support is a great advantage.

R's obvious advantage comes from its statistical mindset. As other answers have said, it was written by statisticians for their own use. That is why R has so many analysis packages: whenever there is a small breakthrough in statistics, the developer writes a package that turns the theory into an easy-to-use function, that is, an implementation of the theoretical algorithm. And whenever something from another field turns out to be useful, someone writes an R package for it too. If you have not followed statistics for a long time, or have not used both Python and R in depth, this may be hard to appreciate. Python has some of this character as well, but not to the degree R does, which again reflects the fact that traditional Python users come from a CS background rather than a statistical analysis background.

The ease with which Python interacts with lower-level languages is another major feature, which is why much of quantitative trading chooses Python rather than R (of course, lower-level languages keep an unshakable position in that field). By comparison, I personally think R suits strategy and model development, while Python suits implementing your model end to end. In general the two languages keep learning from each other. Both are likely to be around for a long time and to keep eating into the market share of other analysis languages such as SAS and MATLAB.

The strategy of commercializing an analysis language and then promoting it at scale no longer fits the big data era. Open source has become a great advantage, and sharing knowledge and results matters a great deal; working behind closed doors is simply too slow. As for the older languages, SAS in particular, there is, bluntly, one big reason they persist: a large base of traditional users and enterprise users. But it is hard for those traditional users to create much value today, especially the enterprises typified by many of the top 500 companies in corporate America. Data science also involves a lot of lower-level development beyond analysis, so knowing one or two lower-level languages is a great advantage. Scala also looks very promising for the future.

Let me push back a little. Compared with Python, R is a higher-level language: relatively user-friendly, but with more limitations. If your programming is strong enough, you can of course do in Python whatever R can do, and it will run faster. In the end each is just a tool; master one, and the rest are not hard.

A statsmodels developer once said:

"From twitter:

'I can see that. much of python stats strikes me as poor imitation of R. like matplotlib: matlab, OO: MS Office'

Referring to statsmodels

I'm not sure whether the implied criticism is on "poor" or "imitation"


I would like to "officially" correct this :)

Statsmodels is not only a poor imitation of R, it is also a poor imitation of Stata. It is in some parts a poor imitation of SAS, and maybe even in some parts a poor imitation of Matlab or GAUSS or ...., and maybe in some parts it's even a good imitation.

But I think it is a good imitation of statsmodels,
although with still some very important gaps in coverage of statistics and economics."
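For readers who have not used statsmodels, the following is a minimal sketch of what a basic regression looks like with its formula interface, which is probably the closest analogue to Stata's `regress y x`. The data is synthetic and the variable names `x` and `y` are invented for illustration; it says nothing about the coverage gaps the developer mentions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: y = 1 + 2*x + noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(0, 0.5, size=200)

# Fit OLS from a formula; the intercept is added automatically.
result = smf.ols("y ~ x", data=df).fit()

print(result.params)     # estimated intercept and slope
print(result.summary())  # full regression table, similar in spirit to Stata output
```

`smf.logit` accepts the same kind of formula for a binary outcome, so the OLS/Logit workflow described in the first answer carries over with little change.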
