MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
Trading Strategies

Data Science and ML (Part 43): Hidden Patterns Detection in Indicators Data Using Latent Gaussian Mixture Models (LGMM)

Last updated: June 19, 2025 4:39 pm
Published: 8 months ago

Latent Gaussian Mixture Model alongside a classifier model

Almost all trading strategies we use as traders are based on some form of pattern identification and detection. We examine indicators for patterns and confirmations, and sometimes we even draw objects such as support and resistance lines to identify the market's state.

While pattern detection in financial markets is an easy task for us humans, it is challenging to program and automate because markets are noisy and chaotic by nature.

Some traders have adopted Artificial Intelligence (AI) and machine learning for this particular task, using various computer-vision techniques that process image data much as humans do, as we discussed in one of the previous articles.

In this article, we will discuss a probabilistic model named Latent Gaussian Mixture Model (LGMM), which is capable of detecting patterns. Given the indicators data, we will explore this model’s effectiveness in detecting hidden patterns and making accurate predictions in the financial markets.

Latent Gaussian Mixture Model is a probabilistic model that assumes data is generated from a mixture of multiple Gaussian distributions, each associated with a latent (hidden) variable.

It is an extension of the Gaussian Mixture Model (GMM) that incorporates latent variables that explain cluster assignment for each observation.

The probability distribution of the data is a weighted sum of several Gaussian distributions.
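Concretely, with K components and mixture weights π_k, the density can be written as:

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,
```

where the latent variable z ∈ {1, …, K} indicates which Gaussian component generated the observation x.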

Training uses the Expectation-Maximization (EM) algorithm, which alternates between two steps: Expectation and Maximization.

Step 1: Expectation (E-step). Estimate the posterior probability (responsibility) that each data point belongs to each Gaussian component.

Step 2: Maximization (M-step). Update the component parameters (mixture weights, means, and covariances) using the responsibilities computed in the E-step.

During training, both steps are repeated until the model converges.
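The EM loop above can be sketched with scikit-learn's `GaussianMixture`, whose `fit` method runs both steps to convergence. The three-component setup mirrors the article; the synthetic two-feature data is only a stand-in for real indicator values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic 2-feature "indicator" data drawn from three regimes
X = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(3.0, 0.5, size=(100, 2)),
    rng.normal(-3.0, 0.5, size=(100, 2)),
])

# fit() alternates the E-step and M-step until convergence
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)

# E-step output: posterior probability of each point under each component
resp = gmm.predict_proba(X)   # shape (300, 3), rows sum to 1
labels = gmm.predict(X)       # argmax of the responsibilities

print(resp.shape)             # (300, 3)
```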

We know that inside the indicators data, there are patterns that, as traders, we use in making informed trading decisions. Our goal is to use LGMM to detect those patterns first.

We start by collecting the indicators data from MetaTrader 5 using the MQL5 language.

Inside a Python script (Jupyter Notebook), the first thing we do is load this data shortly after importing the dependencies and initializing the MetaTrader 5 desktop app.

Filename: main.ipynb

We import the data from the common path (folder), which is where we saved it using MQL5.
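A hypothetical sketch of that import step with pandas. The file name `indicators.csv` and the column names are assumptions, and the MetaTrader 5 common folder is simulated with a temporary directory so the snippet is self-contained; point `csv_path` at your actual common folder instead.

```python
import os
import tempfile
import pandas as pd

# Stand-in for the MetaTrader 5 common folder (replace with the real path)
common_dir = tempfile.mkdtemp()
csv_path = os.path.join(common_dir, "indicators.csv")

# Stand-in for the data exported by the MQL5 script
pd.DataFrame({
    "time": ["2025.01.01 00:00", "2025.01.01 01:00"],
    "rsi": [55.2, 48.7],
    "macd": [0.12, -0.05],
    "close": [2650.1, 2648.9],
}).to_csv(csv_path, index=False)

df = pd.read_csv(csv_path)
print(df.shape)  # (2, 4)
```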

Let’s prepare the target variable for a classification problem for later usage in classifier machine learning models. We drop non-indicator features along the way.
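The target construction can be sketched as follows: compare each close with the close LOOKAHEAD bars ahead to label the move bullish or bearish, then drop non-indicator columns. All column names here are illustrative.

```python
import pandas as pd

LOOKAHEAD = 1  # matches the lookahead value discussed later in the article
df = pd.DataFrame({
    "close": [100.0, 101.0, 100.5, 102.0, 101.5, 103.0],
    "time": range(6),                 # non-indicator column, will be dropped
    "rsi": [50, 55, 53, 60, 58, 62],
})

# 1 = bullish (future close above current close), 0 = bearish
future_close = df["close"].shift(-LOOKAHEAD)
df["target"] = (future_close > df["close"]).astype(int)
df = df.iloc[:-LOOKAHEAD]             # drop rows with no future price

X = df.drop(columns=["target", "time", "close"])  # indicator features only
y = df["target"]
print(y.tolist())  # [1, 0, 1, 0, 1]
```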

LGMM produces an array of 3 elements for every row of predictions, each column representing the probability that the received input belongs to one of the 3 clusters. The probabilities across the 3 columns sum to 1 on every row.

Since this is challenging to interpret as it stands, let’s convert this model into ONNX format, visualize the clusters in MQL5, and see what conclusions we can draw upon the outputs produced by this probabilistic model.

This model has a strange architecture with two outputs in the final node, one for the predicted label and the other for the probabilities. We need to keep this in mind when implementing the code for loading this model in MQL5.

We need the output structure that takes multiple arrays of values to accommodate two output nodes, each with an array of outputs.

We made the predict method of this class return two variables, the predicted label and a probability vector, packed in a structure.

Let’s call the predict function inside the main function of an indicator to provide us with latent features.

Filename: LGMM Indicator.mq5

Inside the getX() function, we have to collect all indicator buffers in the same way as we did in the script when collecting the data for training.

Note: All indicators were initialized inside the Init function right after the model was initialized from the common folder, which is where we saved it using Python.

Finally, we run this indicator on the XAUUSD chart and the same timeframe that the model was trained on.

This indicator is still hard to interpret, but one pattern seems dominant: the component shown in red. This pattern appears to emerge when the market is volatile (volatility is high), on either an uptrend or a downtrend. The remaining components are not yet clear; this could be because we are not certain of the number of components we used for this model, so let us find the best number of components.

The Mixture Model offered by Scikit-Learn produces information criterion values: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Let's plot these values against the component range in one plot and spot the elbow point(s).
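A sketch of that scan, assuming synthetic stand-in data: `GaussianMixture` exposes `aic()` and `bic()` directly, and the minimum (or elbow) suggests a reasonable component count.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data with three well-separated regimes
X = np.vstack([rng.normal(m, 0.4, size=(80, 2)) for m in (-3.0, 0.0, 3.0)])

component_range = range(1, 8)
aics, bics = [], []
for k in component_range:
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    aics.append(gm.aic(X))   # lower is better
    bics.append(gm.bic(X))   # penalizes extra components more heavily

best_k = list(component_range)[int(np.argmin(bics))]
print(best_k)  # minimum-BIC component count
```

In practice you would plot `aics` and `bics` against `component_range` and look for the elbow rather than trusting the raw minimum blindly.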

Now that we have 5 components instead of 3, meaning the model produces 5 probabilities we can plot, we have to increase the number of colors in the indicator to 5 for the colored histogram and handle 5 different cases for the predicted labels.

It looks great but is still difficult to read, as we are used to dealing with simple oscillators that show oversold and overbought regions. Don't hesitate to explore this indicator and let us know your thoughts in the discussion section.

Now, let’s use the LGMM alongside a machine learning model.

We've now seen how to use LGMM to produce latent features that represent the probability of an observation belonging to a certain cluster. Since these features are difficult to interpret on their own, let's use them in a Random Forest classifier alongside the indicator features, hoping that the machine learning model can figure out how the latent features affect the trading signals.

Filename: main.ipynb

We already created the target variable before when splitting the training and testing data, here it is again for reference.

After training the LGMM, we used it to make predictions on the training and testing data.

Since this data is difficult to read, let’s add some feature names to it, making the features identifiable.

Let us stack these features alongside primary indicators data.

Outputs.

Let’s pass this combined data to a random forest classifier.
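The stacking and training steps described above can be sketched as follows. All data here is synthetic and names like `latent_0` are illustrative; the structure (indicator columns plus per-cluster probabilities, fed to a `RandomForestClassifier`) follows the article.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
indicators = pd.DataFrame(rng.normal(size=(200, 3)),
                          columns=["rsi", "macd", "stoch"])
# Synthetic target loosely tied to one indicator
y = (indicators["rsi"] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Latent features: per-cluster membership probabilities from the mixture model
gmm = GaussianMixture(n_components=5, random_state=1).fit(indicators)
latent = pd.DataFrame(gmm.predict_proba(indicators),
                      columns=[f"latent_{i}" for i in range(5)])

# Stack latent features alongside the primary indicator data
combined = pd.concat([indicators, latent], axis=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(combined, y)
importances = pd.Series(rf.feature_importances_, index=combined.columns)
print(importances.sort_values(ascending=False).head())
```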

The resulting model performs poorly on the validation sample. There is a lot we can do to improve it, but for now, let's observe the feature importance plot produced by the model.

Latent features are proving to be important to the model, meaning they carry some patterns and information that contribute to the model’s predictions.

The reason for this underperformance might be the nature of the target variable used; the lookahead value of 1 might be wrong.

Now, evaluating the model on both training and validation data produces a different outcome.

The model had an overall accuracy of 54%: not a good result, but decent enough to make us trust what we are seeing on the feature importance plot.

Given this information, you now have a starting point for understanding the indicator.

The arrangement of the colors resembles the latent features.

Inside the Expert Advisor (EA), we start by importing necessary libraries.

Filename: LGMM BASED EA.mq5

Again, we have to ensure that we are using the same symbol and timeframe as the one used in the training data.

We initialize both models, the LGMM and the Random forest classifier model, inside the OnInit function.

Inside the getX function, we call LGMM to prepare latent features that can be used alongside the indicators data for final inputs of the Random forest classifier model.

Finally, we make a simple trading strategy that relies on trading signals produced by the random forest classifier model.

We close trades after a LOOKAHEAD number of bars has passed on the timeframe the model was trained on. The LOOKAHEAD value must match the one used when making the target variable in the training script.

Read more on mql5.com

This news is powered by mql5.com

© Market Alert News. All Rights Reserved.