I already have daily bar data for all A-shares over the last 10+ years (probably a few hundred MB in total) and want to backtest some strategies to see their returns, maximum drawdown and so on. I'm self-taught in Python, have been studying for about a month, and can do some basic programming. I have never used MATLAB. Which tool is more efficient for strategy backtesting?
--- --- ---
To add: people below mentioned that a few hundred MB of data is not big, but when I process this data with Python I often get a MemoryError. How should I solve this problem?
Reply content:
Ten-plus years of daily data really is small, and Python should not raise a MemoryError on it; the cause is more likely something in your code that needs attention. At Ricequant our minute-level data is about 200-300 GB, and we process it with Python and Java working together.
A language is just a language. The syntax differs, but when we talk about a language we need to understand the toolbox and the community behind it, and why it handles some tasks better than other languages.
In its early days Python was a poor choice for financial backtesting and for strategy development, because development with MATLAB's matrix operations was so much more convenient. But then Python gained a series of powerful libraries such as pandas. The pandas syntax basically "shamelessly" imitates MATLAB and R, and pandas was originally developed at AQR, a well-known US hedge fund, so it is well suited to data crunching and convenient data operations. In addition, Python wraps a large number of mathematical and scientific computing libraries from the open-source community, and can also handle all kinds of machine learning tasks.
From the perspective of how scientific computing languages developed: it started with Fortran, created to meet the demand for floating-point calculation, and the tools have kept getting easier to use ever since. (Python also wraps many foundational math libraries that early mathematicians wrote in Fortran, which have gone through decades of testing and tuning.)
Let's look at Python's current technology stack:
- NumPy: basic array manipulation
- SciPy: scientific computing in Python, including signal processing and optimization
- Matplotlib: visualization and plotting; a few lines of code produce a chart
- IPython: write and run Python code interactively in a shell or a notebook; this interactive environment is essential for replacing MATLAB, because you type a line of code and see the result right away, which helps learning and iteration
- pandas: data manipulation; the most important one, covering matrix-style operations on tabular data
- scikit-learn: machine learning
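As a rough illustration of how this stack covers the asker's goal (returns and maximum drawdown), here is a minimal pandas sketch. The price series is made up and the function names are hypothetical; it assumes a single series of closing prices.

```python
import pandas as pd

def total_return(prices: pd.Series) -> float:
    """Return over the whole period; 0.25 means +25%."""
    return prices.iloc[-1] / prices.iloc[0] - 1.0

def max_drawdown(prices: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    running_peak = prices.cummax()           # highest price seen so far
    drawdown = prices / running_peak - 1.0   # 0 at a new high, negative below it
    return drawdown.min()

# Made-up closing prices, for illustration only.
close = pd.Series([100.0, 110.0, 99.0, 120.0, 90.0, 125.0])
print(total_return(close))   # 0.25
print(max_drawdown(close))   # -0.25
```

The worst drawdown here is the fall from 120 to 90, which is 25% below the running peak.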
As it develops, Python's open-source nature will make it more and more powerful, letting more people enjoy its convenience and contribute back. Quantopian, for example, released the Zipline Python framework, which can run a backtest with nothing more than Yahoo data. Python can also be very fast thanks to its good integration with C, and later on it can easily be integrated with other systems and connected to real trading interfaces.
Many investment banks and hedge funds in Europe and the US have moved toward Python's technology stack. Python is the key tool, and it does not tie you to a proprietary vendor.
Finally, of course, you can do all of this Python backtesting on Ricequant-beta. We support a large amount of market and financial data, and keep partnering with big data companies to add public-sentiment data and more. Strategy backtests can also be run as real-time simulated trading with live data. Almost all Python scientific computing libraries are supported on the cloud platform, so you don't need to spend time installing and testing them.
I have no money, so I support free and open source.
Copyright aside, MATLAB is very convenient for initial strategy testing and data analysis.
However, once the testing method and framework are clear and you want to run regular backtests, Python is more convenient. "Regular" here means strictly event-driven: slower, but it avoids look-ahead bias ("future functions") and stays close to live-trading logic. Python has many libraries in this area. Quantopian's Zipline is probably the originator; in China, Uqer (优矿) and Ricequant are similar to Zipline. There are also vn.py and PyAlgoTrade, written by real experts. When I need to work with other languages, I dump intermediate files as HDF5 or CSV, and switch to R when I need time series analysis. After all, Python's statsmodels (sm) library is still poor, but for PCA and many multi-factor calculations Python, R and MATLAB are about the same.
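To make "strictly event-driven" concrete, here is a minimal sketch of a bar-by-bar loop. It is not how Zipline or any platform named above is implemented. The trading rule, the Portfolio class and the fill-at-next-close assumption are all hypothetical simplifications chosen to show how look-ahead is avoided.

```python
from dataclasses import dataclass, field

@dataclass
class Portfolio:
    cash: float
    shares: int = 0
    equity_curve: list = field(default_factory=list)

def run_backtest(closes, window=3, cash=10_000.0):
    """Feed bars one at a time; the strategy only ever sees past closes.

    Hypothetical rule: be long when the latest close is above the average
    of the previous `window` closes. An order decided on bar t is filled
    at the close of bar t+1, so no future data leaks into a decision.
    """
    pf = Portfolio(cash=cash)
    history = []
    pending = None  # order decided on the previous bar: "buy", "sell" or None
    for price in closes:
        # 1) Fill the order that was decided on the previous bar.
        if pending == "buy" and pf.shares == 0:
            pf.shares = int(pf.cash // price)
            pf.cash -= pf.shares * price
        elif pending == "sell" and pf.shares > 0:
            pf.cash += pf.shares * price
            pf.shares = 0
        # 2) Record the bar, then decide the next order from data seen so far.
        history.append(price)
        pending = None
        if len(history) > window:
            avg = sum(history[-window - 1:-1]) / window
            pending = "buy" if price > avg else "sell"
        # 3) Mark the portfolio to market at this bar's close.
        pf.equity_curve.append(pf.cash + pf.shares * price)
    return pf

pf = run_backtest([10, 10, 10, 11, 12, 13, 9, 8], window=3, cash=1000.0)
print(pf.cash, pf.shares)  # 668.0 0
```

A vectorized backtest would be much faster, but the explicit loop makes it hard to use tomorrow's price by accident.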
Python is more efficient than MATLAB for parameter tuning, some Monte Carlo methods, and multi-parameter backtests, where concurrency pays off.
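A parameter sweep of this kind might be sketched as below. The backtest function is a toy stand-in on made-up prices, and all names are hypothetical. The sketch uses a thread pool so it runs anywhere; for CPU-bound pure-Python backtests you would normally pass ProcessPoolExecutor instead (under an if __name__ == "__main__" guard), since threads share one interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

PRICES = [10, 11, 12, 11, 10, 9, 10, 12, 13, 12]  # made-up closes

def backtest(params):
    """Toy moving-average crossover; returns (params, profit per share)."""
    fast, slow = params
    profit, holding, entry = 0.0, False, 0.0
    for t in range(slow, len(PRICES)):
        # Averages use bars strictly before t; the trade happens at bar t.
        fast_ma = sum(PRICES[t - fast:t]) / fast
        slow_ma = sum(PRICES[t - slow:t]) / slow
        if fast_ma > slow_ma and not holding:
            holding, entry = True, PRICES[t]
        elif fast_ma <= slow_ma and holding:
            holding = False
            profit += PRICES[t] - entry
    if holding:  # close any open position at the last bar
        profit += PRICES[-1] - entry
    return params, profit

def sweep(grid, executor_cls=ThreadPoolExecutor, workers=4):
    """Run one backtest per parameter set and collect {params: profit}."""
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(backtest, grid))

grid = [(f, s) for f, s in product([1, 2, 3], [3, 4, 5]) if f < s]
results = sweep(grid)
best = max(results, key=results.get)
print(best, results[best])
```

Each parameter set is independent of the others, which is why this kind of job splits so easily across workers.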
Overall, don't take the language too seriously. It's like eating with chopsticks, a spoon or a fork: it depends on the dish. A few hundred MB of data is nothing... 64-bit Python using a few gigabytes of memory is no big deal... If you really do get the error, remember to check your code and algorithm for problems. If the raw data is a few hundred MB but the algorithm's space complexity is orders of magnitude larger, that would be hopeless, right?
By the same token, if MATLAB can run it, Python generally can too; if it won't run, split the job and run the pieces in parallel, which both can do. I recommend MATLAB for people who don't know programming and don't intend to learn, and Python for those who are interested in programming. Whichever you use, data of this size is very small. Unless, of course, you read the entire file into memory in one go...
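One way to avoid reading the entire file at once is to stream it in chunks, as in this minimal pandas sketch. The column names "code" and "close" and the function name are assumptions about the asker's CSV layout, not something stated in the thread.

```python
import io
import pandas as pd

def mean_close_by_code(source, chunksize=100_000):
    """Average close per stock code without loading the whole file.

    `source` is a path or file-like object; the column names "code" and
    "close" are assumptions about the CSV layout.
    """
    totals = {}  # code -> (running sum, running count)
    reader = pd.read_csv(
        source,
        usecols=["code", "close"],                # skip columns we don't need
        dtype={"code": str, "close": "float32"},  # smaller than the float64 default
        chunksize=chunksize,                      # yields DataFrames piece by piece
    )
    for chunk in reader:
        stats = chunk.groupby("code")["close"].agg(["sum", "count"])
        for code, row in stats.iterrows():
            s, n = totals.get(code, (0.0, 0))
            totals[code] = (s + float(row["sum"]), n + int(row["count"]))
    return {code: s / n for code, (s, n) in totals.items()}

# Tiny in-memory stand-in for a large CSV file.
demo = io.StringIO(
    "code,date,close\n"
    "000001,2016-01-04,10.0\n"
    "000002,2016-01-04,20.0\n"
    "000001,2016-01-05,12.0\n"
)
result = mean_close_by_code(demo, chunksize=2)
print(result)
```

Only one chunk is held in memory at a time, so peak usage depends on chunksize rather than on the file size.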
The simplest approach is to import the data into a database such as MySQL or PostgreSQL; after that everything is easy.
I have two employees who use MATLAB, but I myself generally use Python and R. If you have the money to buy a license, you can choose MATLAB; if not, I recommend against using a pirated copy.
Python is very good and will do too, but it comes with no data, so use Tinysoft (天软)!
As for the original question, I suggest choosing Python:
First, you are already learning Python and have not learned MATLAB.
Second, Python is open source, and many quantitative trading platforms use Python, such as the one I'm using:
https://www.joinquant.com
When I used to run backtests locally, I also often hit the MemoryError from the question. Now I no longer need to worry: JoinQuant's engineers have solved it for me, backtests there have no memory limit, and it has my favorite feature, simulated-trading alerts.
Third, it has to be said that Python is a powerful weapon for quantitative investing; its advantages are introduced step by step below.
The asker's main goal is to backtest strategies and see their returns and maximum drawdown. This is really the work of a quant researcher, for whom programming and quantitative strategy are both indispensable. Unlike a quant developer, a researcher spends more time interpreting and analyzing quantitative strategies and developing new ones.
The asker also needs to strengthen their Python programming skills; most fundamentally, data analysis is a must. Here are some reference materials to share:
Recommended quantitative investment learning materials (continuously updated...)
Recommended introductory Python learning materials (continuously updated...)
"Data sharing" Python, research reports, econometrics, investment books, R language, etc. (Book+video)
"Scattered sand" Python scientific computing series
I moved over from MATLAB as a junior quant researcher, so let me venture to summarize and share my own Python learning path.
First, the basics of Python:
You must master the basics of Python, especially its very useful iterators and comprehensions.
"Quantitative investment weapon Python" basic syntax: data type 1, lists
"Quantitative investment weapon Python" basic syntax: data type 2, dictionaries
"Quantitative investment weapon Python" basic syntax: data type 3, tuples and sets
"Quantitative investment weapon Python" conditions and loops: if, while, for
"Quantitative investment weapon Python" magic iterators and comprehensions
"Quantitative investment weapon Python" basic syntax: functions
Next are the commonly used base libraries
time library
There are three commonly used time representations, and you often need to convert between them.
1) Timestamp:
Typically, a timestamp is an offset in seconds counted from January 1, 1970 00:00:00. Running "type(time.time())" returns the float type. The main function that returns a timestamp is time(); clock(), which older tutorials list alongside it, was removed in Python 3.8.
2) Formatted time string, such as "%Y-%m-%d %H:%M:%S"
3) Tuple (struct_time)
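The conversions between the three representations above can be sketched with the standard library alone. This example works in UTC so the result does not depend on the local time zone; the variable names are just for illustration.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"

print(type(time.time()))          # <class 'float'>

ts = 0.0                          # 1) timestamp: seconds since the epoch
st = time.gmtime(ts)              # timestamp -> 3) struct_time (UTC)
text = time.strftime(FMT, st)     # struct_time -> 2) formatted string
back = time.strptime(text, FMT)   # formatted string -> struct_time
ts_again = calendar.timegm(back)  # struct_time (UTC) -> timestamp

print(text)      # 1970-01-01 00:00:00
print(ts_again)  # 0
```

For local time, time.localtime() and time.mktime() play the roles of time.gmtime() and calendar.timegm().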
"Quantitative investment weapon Python" basic Class library-time
NumPy
NumPy is an open-source numerical computing extension for Python.
NumPy (Numeric Python) offers many advanced numerical programming tools, such as matrix data types, vector processing, and a sophisticated library of operations. It is designed for rigorous numerical work.
Numpy-numpy
1. NumPy Basics: Arrays and Vectorized Computation
For reference, see the NumPy chapter of the book "Python Scientific Computing".
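As a small illustration of the vectorized computation mentioned above, the following sketch computes returns without any explicit Python loop. All numbers are made up.

```python
import numpy as np

close = np.array([100.0, 102.0, 101.0, 105.0])   # made-up closing prices

# Element-wise arithmetic on whole arrays replaces a Python loop.
simple_ret = close[1:] / close[:-1] - 1.0
log_ret = np.diff(np.log(close))

# Matrix-style use: rows are days, columns are two hypothetical stocks.
returns = np.array([[0.01, 0.03],
                    [-0.02, 0.00]])
weights = np.array([0.5, 0.5])
portfolio_ret = returns @ weights                # daily portfolio returns

print(simple_ret)
print(portfolio_ret)
```

Log returns add up over time, so log_ret.sum() equals the log of the last price divided by the first.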
SciPy
SciPy is a handy, easy-to-use Python toolkit designed for science and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ordinary differential equation solvers, and more.
scipy.org-scipy.org
For reference, see the SciPy chapter of the book "Python Scientific Computing".
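As one example of SciPy's optimization module in a quant setting, here is a minimal sketch that finds minimum-variance weights for two assets. The covariance matrix is made up, and the long-only, fully-invested constraints are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up covariance matrix of two uncorrelated assets.
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])

def variance(w):
    """Portfolio variance w' * cov * w."""
    return w @ cov @ w

result = minimize(
    variance,
    x0=np.array([0.5, 0.5]),                  # start from equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * 2,                  # long-only
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # fully invested
    tol=1e-12,
)
weights = result.x
print(weights)  # analytically the answer is [0.2, 0.8]
```

With uncorrelated assets the optimal weights are proportional to the inverse variances, which gives a quick sanity check on the solver's output.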
Pandas
pandas (the Python Data Analysis Library) is a NumPy-based tool created for data analysis tasks. It brings together a number of libraries and standard data models and provides the tools needed to manipulate large datasets efficiently, along with many functions and methods for processing data quickly and easily. The reference book is "Python for Data Analysis", which you can download from "Data sharing" Python, research reports, econometrics, investment books, R language, etc. (Book+video).
"Quantitative investment weapon Python" base library: pandas getting started 1, data structures
"Quantitative investment weapon Python" base library: pandas getting started 2, data processing
"Quantitative investment weapon Python" base library: pandas advanced
After working through the three tutorials above you will be able to use most of pandas; for anything else, use Ctrl+F search on the official site, pandas: powerful Python data analysis toolkit.
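For daily bars of many stocks, a few pandas operations cover most of the routine work. The sketch below uses made-up data; the column names "date", "code" and "close" are assumptions about how such a table might be laid out.

```python
import pandas as pd

# Made-up daily bars for two stocks in "long" format.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"] * 2),
    "code": ["000001"] * 3 + ["000002"] * 3,
    "close": [10.0, 11.0, 9.9, 20.0, 20.0, 22.0],
})
df = df.sort_values(["code", "date"])

# Daily return computed separately within each stock.
df["ret"] = df.groupby("code")["close"].pct_change()

# Reshape to one column per stock, then take a 2-day moving average.
wide = df.pivot(index="date", columns="code", values="close")
ma2 = wide.rolling(window=2).mean()

print(df)
print(ma2)
```

The first return of each stock is NaN because there is no previous close to compare against.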
TA-Lib
TA-Lib (Technical Analysis Library) is a function library widely used in programmatic trading for technical analysis of financial market data. It provides a wide variety of technical analysis functions and makes quantitative investment programming much more convenient.
TA-Lib usage introduction!
Programming tool for indicator calculation and pattern recognition: TA-Lib
Specific examples of using TA-Lib in quantitative investment
Quantitative investment learning "TA-Lib": MACD
Quantitative investment learning "TA-Lib": Bollinger Bands
Quantitative investment learning "TA-Lib": STOCH (KD indicator)
Quantitative investment learning "TA-Lib": ATR
Quantitative investment learning "TA-Lib": RSI
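To show what one of these indicators computes, here is a minimal Bollinger Bands sketch written in plain pandas. It does not use TA-Lib's API, so its output may differ from TA-Lib's at the edges; the function name and the use of the population standard deviation are assumptions.

```python
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0):
    """Return (upper, middle, lower) bands around a simple moving average."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)   # population standard deviation
    return middle + num_std * std, middle, middle - num_std * std

# Made-up closing prices, for illustration only.
close = pd.Series([10.0, 12.0, 14.0, 12.0, 10.0])
upper, middle, lower = bollinger_bands(close, window=3, num_std=2.0)
print(pd.DataFrame({"upper": upper, "middle": middle, "lower": lower}))
```

The first window - 1 rows are NaN because a full window of prices is not yet available.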
Scikit-learn
scikit-learn is a Python machine learning module released under the open-source BSD license. Its basic functionality falls into six parts: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. It offers a rich set of machine learning models, including SVM, decision trees, GBDT, KNN and more, so you can choose a suitable model for the type of problem.
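As a small taste of the regression and model selection parts, the sketch below fits a linear model on synthetic data. The two "factors" and their true coefficients are invented for the example and say nothing about real markets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # two made-up factor exposures
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.01, size=200)

# shuffle=False keeps the time order: train on the past, test on the future.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)   # R^2 on the held-out part

print(model.coef_)   # close to [0.5, -0.2]
print(r2)
```

Splitting without shuffling matters for time series, because a shuffled split lets the model train on data from after the test period.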
Specific examples of use in quantitative investments
"Machine learning" time series volatility estimation
The ten-year trend of "machine learning" index
The problem of parameter optimization in trading strategy
Linear regression in "machine learning" wrapping theory
"Machine learning" non-parametric clustering analysis
Introduction to Deep learning
SVR predicts stock opening price
"Research on machine learning methods"--thinking arrangement, support vector machine
Some other specific quantitative strategies and research applications of the material above:
"QLS" Linear regression
"QLS" Linear correlation analysis
"QLS" Spearman rank correlation coefficient
"QLS-6" Overfitting
"QLS7" Instability of parameter estimation
"QLS-8" Model specification
"QLS9" Violations of regression model assumptions
"QLS10" Regression analysis
"QLS12" Arbitrage pricing theory
"QLS15" Maximum likelihood estimation (MLE)
"QLS16" ARCH and GARCH
"QLS17" Long-short strategy
"QLS19" Momentum trading strategy
"QLS20" Measuring momentum
Pairs trading strategy
Introduction to convex optimization!
Finally, I would like to say that Python is just a tool; what matters is the strategy thinking, which you can build up by reading a lot of material, running your own experiments and analysis, and gradually summarizing what you learn. I recommend some classic books and research reports, available from "Data sharing" Python, research reports, econometrics, investment books, R language, etc. (Book+video).
Quant Interview Books
- Frequently Asked Questions on Quant Interviews
- [Mark Joshi] Quant Job Interview Questions and Answers
- [Xinfeng Zhou] A Practical Guide to Quantitative Finance Interviews
- Frequently-asked-questions-quant-interview
- Heard on the Street: Quantitative Questions from Wall Street Job Interviews
- The Investment Banking Interview Questions & Answers You Need to Know
Investment books
- Algorithmic Trading: Winning Strategies and Their Rationale
- Barra Handbook US
- The Encyclopedia of Trading Strategies
- Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading (2nd edition)
- Nassim Taleb - Dynamic Hedging
- Options, Futures, and Other Derivatives, 8th edition - John Hull
- Quantitative Trading Strategies
- Quantitative Equity Portfolio Management: Modern Techniques and Applications
- Quantitative Trading: How to Build Your Own Algorithmic Trading Business
- "New Trading Systems and Methods", Perry J. Kaufman, 4th edition
- "Professional speculation Principle" full version, Victor Sperandeo (United States)
- Capital protected investment law non-falling Stocks (HD)
- Open the black box of quantitative investment
- Stock Market Trend Technical Analysis (original book 9th edition-collector edition)
- Turtle Trading Rules
- Interpreting quantitative investing: the story of Simons beating the market with formulas
- Deciphering hedge fund indices and strategies
- Smart Trader-Kaufman
- Quantitative trading: how to build your own algorithmic trading business (HD)
- Quantitative trading strategies-using quantitative analysis techniques to create profitable trading procedures
- Quantitative data analysis examines ideas through social research
- Quantitative investment strategy-how to achieve excess returns alpha
- Quantitative investment strategy and technical revision
- Option Investment Strategy 4th edition (HD)
- Quantitative investment: System and strategy
- The path to freedom in the financial kingdom
- Statistical arbitrage (Chinese version)
- Grid Trading Method Math + traditional wisdom beats Wall Street
- I Am a High-Frequency Trading Engineer: Zhihu selected works of Dong Keren ("Salt" series)
- Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk (original book 2nd edition) (HD)
- Out of illusion to mature financial empire anthology
Econometrics
- Financial econometrics: from beginner to advanced modeling techniques
- Harvard textbook: Applied Econometrics with Stata
- Advanced Econometrics, Li Zinai
- Tsay - Analysis of Financial Time Series - financial econometrics (2002)
- Phoebus J. Dhrymes, Mathematics for econometrics, 4e
- Osborne, Rubinstein - A Course in Game Theory
- Model Building in Mathematical Programming (5e)
- Hayashi-econometrics
- Gujarati - Essentials of Econometrics
- Akira Takayama - Mathematical Economics
- A Handbook of Time-Series Analysis, Signal Processing, and Dynamics (1999)
- 2013 Financial Mathematics
- Angel de la Fuente economic mathematical methods and Models (Sufe edition 2003)
- The structure of economics-the method of mathematical Analysis (Tsinghua Edition) Eugene Silberberg, Wing Suen
- Kamien & Schwartz, Dynamic optimization (2ed,1991)
- CSZ - An Introduction to Mathematical Analysis for Economic Theory and Econometrics (draft version)
Research Reports
- Guosen Securities financial engineering
- 2016 annual investment strategy reports from the major brokers
- Everbright Securities
- Haitong Securities
- Shenwan Master Series
- "Stones from Other Hills" series
- Citic Securities
- GF Securities
MATLAB makes matrix computation extremely convenient, almost as simple as writing equations by hand. You hardly need to learn much about programming languages, and plotting all kinds of 2D and 3D curves is very simple, so it is an ideal choice for algorithm research.
But MATLAB is painfully expensive, fewer people actually use it, and open-source resources are harder to find.
Python is easy to get and has plenty of open-source resources, but its syntax is much more complicated than MATLAB's, you have to implement many algorithms yourself, and plotting is a weak spot.
In practice it might be better to use Python + MATLAB together.
Python can be used for acquiring and preprocessing raw data.
MATLAB can be used for data analysis and algorithm research.