Machine Learning: R Language – Python
Although this article was published a few years ago, the information it contains still strikes us as excellent, and it can help clarify the ground we tread when we trade in the financial markets. Today the machines run the show, which is why the market sometimes seems more erratic than ever; in reality, automated systems simply execute so fast that they give that impression, and in the end supply and demand still rule. The article was translated by our team; the transcription follows, and we hope you enjoy it as much as we did:
Over the past year we have written several stories about using machine learning for trading. Developing trading programs with artificial intelligence is not easy, not only because building something predictive is hard work, but also because even the best trading platforms offer limited tools for machine learning.
Other programming languages have tools for time series forecasting and machine learning. The two main ones are the R language and Python. R does not have a good interface for developing and testing trading systems the way traders would like, although many libraries attempt to enable backtesting of trading systems and charting of market data. Backtesting in Python is somewhat better, and Python also has many machine learning libraries, but its statistical forecasting libraries are not as good as R's.
One solution is that several trading platforms have added a native interface to the R language (information can also always be passed through files). One company with a native R interface is the InfoReach platform; another is TradersStudio, whose Turbo 2017 version provides it through a plugin.
In upcoming articles we will show you how R can be used to develop machine learning and trade modeling technologies, and use them to backtest and trade trading systems.
The end of average traders
In 2015, Ray Dalio’s $165 billion hedge fund, Bridgewater Associates, launched a six-person AI unit. The team is led by David Ferrucci, who joined Bridgewater in late 2012 after leading the International Business Machines (IBM) engineers who developed Watson, the computer that beat human players on “Jeopardy.” This new Bridgewater unit will have extensive resources to apply forecasting technology that adapts to changing market conditions; it is a new paradigm that could spell the end of the average trader.
Developing adaptive, machine-learning trading systems is very expensive for two reasons. First, you need to code the machine learning method itself, as with neural network, wavelet, and deep learning algorithms. This software requires high-level math and programming skills to write. Developing an Excel or TradeStation plugin from an algorithm in the scientific literature could take several months of work by world-class minds. A single algorithm could cost up to $100,000 to develop, and it might not even work for you. Ideally you would have a collection of dozens of such tools, which would cost millions. This approach is ideal for large hedge funds like Bridgewater because it allows better integration between learning algorithms and trading signals.
Quantitative investment firms such as the $24 billion Two Sigma Investments and the $25 billion Renaissance Technologies are increasingly hiring programmers and engineers to augment their AI workforces. Machine learning gives hedge funds a competitive advantage in markets where trading has been hurt by rich asset prices, according to Gustavo Dolfino, CEO of trading firm WhiteRock Group. Dolfino says: “Machine learning is the new wave of investment for the next 20 years and smart players are focusing on it.”
The other expensive resource is a knowledge engineer with experience in the market and in trading. They must be able to use the tools and understand the algorithms, though not at the level of the developers who wrote the code.
The cost of creating these algorithms has slowed research, and this is why R has become so valuable. R has thousands of open-source libraries built by a community of developers. If a company had to develop these libraries itself, it would cost tens of millions of dollars; with R you can take advantage of them and worry only about knowledge engineering and system development.
The R programming language is now one of the most popular programming languages for machine learning and trading.
R offers libraries for machine learning, statistical analysis, fractal and wavelet analysis, natural language processing, and much more. R has a learning curve, but given the advanced modeling tools it provides, it is worth the effort. Backtesting in R is very primitive; because of this, the best use of R for traders is analysis that supports system development. The output of market studies done in R can be used as input data in a strategy. Some trading platforms can run R scripts. R is currently available for TradersStudio as a plugin; at the time of writing it is in beta, with a release planned for 2017.
There are many free sources for learning R, as well as plenty of paid courses; DataCamp, for example, is an inexpensive service with many good courses on R and machine learning.
Trader tools for everyone
Now we can all use the trading tools of the major hedge funds via R and, to some extent, Python, although R has the better interface to external programs, allowing tighter integration between trading strategies and intelligence. These tools range from state-of-the-art time series analysis tools, such as hybrid ARIMA/GARCH models, through wavelets and machine learning methods, to a host of neural network algorithms, rule induction, and evolutionary algorithms, from genetic programming to swarm technology. In addition, there are chaos theory and game theory modeling tools, as well as hidden Markov models.
Even advanced time series modeling tools can take trading where it has not been before. TradersStudio has developed a hybrid ARIMA/GARCH system. This model predicts the next day's close of the S&P 500 minus the current day's close. This works because futures trade after 4:00 p.m. and the SPY is liquid until about 6:00 p.m., so it is possible to get a signal on the close and place a trade a minute or two later. The raw predictions profited by over 2,800 points, showing that this type of technology is important and valuable to traders. Here is how machine learning methods that build a recursive tree can be used in trading.
Money grows on trees
Recursive partitioning is a statistical method for multivariate analysis. Recursive partitioning creates a decision tree that tries to correctly classify the members of the population by dividing it into subpopulations based on several dichotomous independent variables. The process is called recursive because each subpopulation can be divided in turn an indefinite number of times.
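To make the idea concrete: the article's examples use R's rpart, but the same recursive splitting can be sketched in Python with scikit-learn (the toy data, variable names, and the simple two-feature rule below are our own illustration, not the article's model):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy population: two continuous predictors and a binary class that
# follows a simple rule the tree should be able to recover.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

# Recursive partitioning: each split divides a subpopulation in two,
# and each child can be split again in turn.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
```

The printed tree shows each dichotomous split and the class assigned at each leaf, which is the same kind of structure rpart reports in R.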
Recursive partitioning methods have been developed since the 1980s. Among the best-known are Ross Quinlan’s ID3 algorithm and its successors, C4.5 and C5.0, and classification and regression trees. Ensemble methods, such as random forests, help to overcome the usual problem with these methods (their vulnerability to overfitting the data) by building multiple trees and combining their results.
A variation is the Cox linear recursive partition. These methods can be used to choose stocks based on fundamentals and judge whether a stock is overvalued, correctly valued, or undervalued (we will only use undervalued stocks).
These methods can also be used to predict the market’s return in the next period as a categorical variable, with classes such as big rise, rise, stability, fall, and big decline. Let’s look at a simple example. The goal is not to hand you the holy grail, but to show how you can start building these models yourself. R is a great language for data preprocessing, but it requires a deeper understanding of R’s data types: some libraries use arrays or data frames, while others use xts objects (built on the zoo library), and you need to pass the right type. There are functions to do this conversion on the fly. Another option, if you are not an R expert, is to do your preprocessing in TradersStudio or TradeStation; we feel more confident doing data processing on those platforms, and we know how to call all the indicators we need as well as write new ones as they come to mind. We will create a CSV file that we will load into R to test these tree algorithms.
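The bucketing of next-period returns into those five classes can be sketched in a few lines. The thresholds below (±0.5% and ±2%) are our own illustrative choices; the article does not specify them:

```python
# Sketch: bucket a weekly return into the five output classes named above.
# The 0.5% / 2% cutoffs are assumptions for illustration only.
def label_return(weekly_ret):
    if weekly_ret > 0.02:
        return "big rise"
    if weekly_ret > 0.005:
        return "rise"
    if weekly_ret >= -0.005:
        return "stability"
    if weekly_ret >= -0.02:
        return "fall"
    return "big decline"

labels = [label_return(r) for r in (0.031, 0.01, 0.0, -0.01, -0.04)]
print(labels)  # ['big rise', 'rise', 'stability', 'fall', 'big decline']
```

A labeled series like this is exactly the kind of categorical target a recursive tree is trained to predict.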
We will make life easy for ourselves and use TradersStudio’s print terminal to output the data, then save the print terminal to a file. Our analysis uses weekly data from the S&P 500: in addition to price data, we use S&P 500 earnings, S&P dividends, and the Dow Transports. The goal is to predict how the S&P 500 will move one week into the future, using a regression tree built with the rpart library in R. We originally tried to predict the direction of the S&P 500 from a series of inputs, and we failed a lot. The tree can exploit any split that adds information about the output class, so the key in developing these tree algorithms is to define the predictor variable intelligently.
The problem is noise. Predicting one-day direction is very noisy, and we can suffer whipsaw trades, which cost additional slippage and commissions. We really don’t want to trade when the market may move only slightly against us, so we want to filter out the noise. These factors trade off against profit, so we should test our target as if we could predict it perfectly to see how profitable it would have been. We have chosen the following target.
The ForClose series is the closing-price series shifted one bar into the future. Notice that we do not change position when the price does not move against us by more than 10% of the range one bar into the future.
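The article does not print the target's code, so here is a sketch of our reading of it (the article's version is in a TradersStudio/R environment; the function name and the exact hold-previous-position logic are our own interpretation):

```python
import numpy as np

def target(close, high, low):
    """Sketch of the noise-filtered target: ForClose is the close shifted
    one bar into the future; if the forward move is within 10% of the
    bar's range, keep the previous position instead of switching."""
    close, high, low = map(np.asarray, (close, high, low))
    bar_range = high - low
    move = np.roll(close, -1) - close        # ForClose minus current close
    sig = np.zeros(len(close), dtype=int)
    for i in range(len(close) - 1):          # last bar has no future close
        if move[i] > 0.10 * bar_range[i]:
            sig[i] = 1                       # long
        elif move[i] < -0.10 * bar_range[i]:
            sig[i] = -1                      # short
        elif i > 0:
            sig[i] = sig[i - 1]              # small move: hold prior position
    return sig
```

The small-move branch is what removes the whipsaws discussed above: a bar whose forward move is inside the 10%-of-range band inherits the previous signal rather than generating a reversal.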
Thanks to this noise reduction, the target is much easier to predict, and it would still make an enormous amount of money if we could predict it perfectly. In a backtest from January 10, 1980 to December 9, 2016, without slippage or commissions, we would have earned more than 23,000 points by knowing this target perfectly. Knowing the one-bar-ahead price change perfectly makes more than 26,700 points, approximately 12% more, but it is much more difficult to predict. We will use the new target for the rest of this review.
We use weekly data from the S&P 500, S&P 500 Earnings, S&P 500 Dividends, and DJ Transports for our model. We call this file DataWkPred.csv. We’ll use it together with R’s RPart package to implement a recursive tree. We will use 80% of our data for training and 20% for testing, outside of the sample data. (Both codes are available online.)
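The article's actual code is in R with the rpart package; as a rough parallel, here is what the same 80/20 fit looks like in Python with scikit-learn. We do not have DataWkPred.csv, so the synthetic stand-in data and the feature rule below are our own assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for DataWkPred.csv: four weekly features
# (think price, earnings, dividends, transports) and a buy/sell label.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# 80% training / 20% out-of-sample test; shuffle=False keeps the most
# recent data as the out-of-sample block, as you would for a time series.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

model = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X_tr, y_tr)
print(f"in-sample accuracy:     {model.score(X_tr, y_tr):.3f}")
print(f"out-of-sample accuracy: {model.score(X_te, y_te):.3f}")
```

With real market data you would load the CSV in place of the synthetic block; the train/test discipline is the same.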
This code produces a lot of information. We will study the most interesting and useful information in this output (see “Explanation of the output” below).
The complexity parameter gives the amount by which the relative error will change if a split is performed. The number of splits starts at zero, so the total number of nodes is 0 + 1 = 1, the parent node; nsplit = 1 indicates that one split has been performed, that is, the parent node has been divided and the total number of nodes becomes 1 + 1 = 2. The relative error of the splits is measured relative to the parent node, which is assigned an error of 1.000, representing the largest number of misclassifications.
The table shows nsplit = 0, indicating the parent node, where the relative error is 1.0. The complexity parameter value of 0.18969849 indicates that when one split is performed (nsplit = 1), the relative error becomes 1 - 0.18969849 = 0.8103015, which appears on the second line. The output is shown up to the point where the complexity parameter becomes very small, in this case at nsplit = 4, because there is only a marginal reduction of the error. The ten-fold cross-validation error (xerror) is obtained by splitting the data set into 10 samples: each sample is divided into training and test sets in a 90:10 ratio, the model is trained, the error is calculated, and the results are averaged across samples.
In this case, the error is calculated at each split, also measured relative to the parent node. The standard error is generally used to find the optimal number of splits: the point at which the standard error is minimal and 1-SE is maximal.
This procedure is followed because the standard error initially decreases with the number of splits, but after a certain point it increases again; that split count can be taken as optimal. There is also another rule of thumb for determining the optimal number of splits and pruning the tree: relative error + SE < xerror.
It follows from the above that, according to the “1-SE” rule, the optimal number of splits is 2, since the value of SE at nsplit = 3 is greater than the value of SE at nsplit = 2.
By the other rule of thumb, the optimal number of splits should be taken as 3, because at this stage the relative error + SE is less than the xerror. Variable importance is also output when we run the R code for the recursive tree.
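The cp table and its pruning rules come from R's printcp output, but scikit-learn exposes the analogous machinery through its minimal cost-complexity pruning path. A sketch, again with synthetic data of our own choosing:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data so the fully grown tree overfits and the
# pruning path has several steps to show.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = ((X[:, 0] + 0.3 * rng.normal(size=300)) > 0).astype(int)

# Each alpha on the path prunes away the weakest link in the tree,
# playing the same role as rpart's complexity parameter (cp).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
for alpha, impurity in zip(path.ccp_alphas, path.impurities):
    print(f"alpha={alpha:.5f}  total leaf impurity={impurity:.5f}")
```

In practice you would pick the alpha (or, in R, the cp) by cross-validation, just as the 1-SE rule does with the xerror column.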
We also have a lot of information about how the nodes are partitioned and using which critical values. This information tells us how much each division helps to separate the output class and at the leaf level it gives us the precision of the system. Each variable can be used multiple times in the same tree. We also performed an analysis of the output distribution of our data. For example, 57% of the cases correspond to the “buy” output class. It also creates an output that can be translated into a tree (see “Money Tree” below).
The output of the code also gives us all kinds of information about the prediction of each split and the amount of information that is obtained. This code uses the rpart.plot function, which outputs a nice tree, with more details (see “Tree Pruning” below).
Pruning is a technique used to reduce the size of a decision tree while preserving the parts that contribute most to classifying instances. Too large a tree can overfit the data and misclassify new observations; too small a tree may not capture all the structural information. The R code includes the pruning step, which reduces the tree from seven leaves to five and improves its robustness.
A confusion matrix, or error matrix, is a table that ranks predictions based on whether they match actual values. One dimension of the table lists the possible categories for the predicted values, and the other lists the categories for the actual values. When the predicted value matches the actual value, it is a correct classification, which falls on the diagonal of the confusion matrix. Off-diagonal elements indicate incorrect predictions (see “Prediction Statistics,” below).
For the Buy class: the correct classifications (true positive rate) = 174/205 = 0.8487. This is also known as sensitivity. The misclassification rate is the false positive rate = 71/158 = 0.4493.
For the Sell class: the correct classifications (true negative rate) = 87/158 = 0.5507. This is also known as specificity. The misclassification rate is the false negative rate = 31/205 = 0.1513.
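These rates follow directly from the counts quoted above, and it is worth computing them yourself when you read a confusion matrix. A short check (which reproduces the figures above, up to rounding):

```python
# Counts from the confusion matrix discussed above.
tp, fn = 174, 31    # Buy class: correctly predicted / missed
tn, fp = 87, 71     # Sell class: correctly predicted / missed

sensitivity = tp / (tp + fn)        # true positive rate
specificity = tn / (tn + fp)        # true negative rate
false_pos_rate = fp / (fp + tn)     # = 1 - specificity

print(f"sensitivity:     {sensitivity:.4f}")
print(f"specificity:     {specificity:.4f}")
print(f"false pos. rate: {false_pos_rate:.4f}")
```

Note that sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate.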
Machine learning in trading is not just the latest novelty but an essential tool for traders who want to succeed in today’s markets. Here we have shown how recursive trees can be used; in future installments we will cover methods that will help you maintain your edge in trading the markets.
This article was written by MURRAY A. RUGGIERO JR.
and published in Futures magazine on February 19, 2017
Quantified Models Youtube Channel
On our YouTube channel we have several videos available that you may find very useful for developing trading systems. To access, click this link: Quantified Models YouTube Channel
We hope this information has been useful to you.