
In the previous article, we became familiar with the theoretical aspects of a multi-task learning framework based on the ResNeXt architecture, proposed for building financial market analysis systems. Multi-Task Learning (MTL) uses a single encoder to process the input data and multiple specialized “heads” (outputs), each designed to solve a specific task. This approach offers a number of advantages.
First, the use of a shared encoder facilitates the extraction of the most robust and universal patterns in the data that prove useful across diverse tasks. Unlike traditional approaches, where each model is trained on a separate subset of data, a multi-task architecture forms representations that capture more fundamental regularities. This makes the model more general-purpose and more resilient to noise in the raw data.
Second, joint training of multiple tasks reduces the likelihood of model overfitting. If one of the subtasks encounters low-quality or weakly informative data, the other tasks compensate for this effect through the shared encoder structure. This improves the stability and reliability of the model, especially under the highly volatile conditions of financial markets.
Third, this approach is more efficient in terms of computational resources. Instead of training several independent models that perform related functions, multi-task learning enables the use of a single encoder, reducing computational redundancy and accelerating the training process. This is particularly important in algorithmic trading, where model latency is critical for making timely trading decisions.
In the context of financial markets, MTL provides additional benefits by enabling the simultaneous analysis of multiple market factors. For example, a model can concurrently forecast volatility, identify market trends, assess risk, and incorporate the news background. The interdependence of these aspects makes multi-task learning a powerful tool for modeling complex market systems and for more accurate price dynamics forecasting.
One of the key advantages of multi-task learning is its ability to dynamically shift priorities among different subtasks. This means that the model can adapt to changes in the market environment, focusing more on the aspects that have the greatest impact on current price movements.
The ResNeXt architecture, chosen by the framework authors as the basis for the encoder, is characterized by its modularity and high efficiency. It uses grouped convolutions, which significantly improve model performance without a substantial increase in computational complexity. This is especially important for processing large streams of market data in real time. The flexibility of the architecture also allows model parameters to be tailored to specific tasks: varying network depth, convolutional block configurations, and data normalization methods, making it possible to adapt the system to different operating conditions.
The combination of multi-task learning and the ResNeXt architecture yields a powerful analytical tool capable of efficiently integrating and processing diverse information sources. This approach not only improves forecast accuracy but also allows the system to rapidly adapt to market changes, uncovering hidden dependencies and patterns. Automatic extraction of significant features makes the model more robust to anomalies and helps minimize the impact of random market noise.
In the practical part of the previous article, we examined in detail the implementation of the key components of the ResNeXt architecture using MQL5. During this work, a grouped convolution module with a residual connection was created, implemented as the CNeuronResNeXtBlock object. This approach ensures high system flexibility, scalability, and efficiency in processing financial data.
In the present work, we move away from creating the encoder as a monolithic object. Instead, users will be able to construct the encoder architecture themselves, using the already implemented building blocks. This will not only provide greater flexibility but will also expand the system’s ability to adapt to various types of financial data and trading strategies. Today, the primary focus will be on the development and training of models within the multi-task learning framework.
Model Architecture
Before proceeding with the technical implementation, it is necessary to define the key tasks solved by the models. One of them will perform the role of an Agent responsible for generating the parameters of trading operations, similar to the architectures discussed earlier. This approach helps avoid excessive duplication of computations, improves the consistency of forecasts, and establishes a unified decision-making strategy.
However, such a structure does not fully employ the potential of multi-task learning. To achieve the desired effect, an additional model will be added to the system, trained to forecast future market trends. This predictive block will improve forecast accuracy and enhance the model’s resilience to sudden market changes. Under conditions of high market volatility, this mechanism enables the model to quickly adapt to new information and make more precise trading decisions.
Integrating multiple tasks into a single model will create a comprehensive analytical system capable of accounting for numerous market factors and interacting with them in real time. This approach is expected to provide a higher degree of knowledge generalization, improve forecast accuracy, and minimize risks associated with erroneous trading decisions.
The architecture of the trained models is defined in the CreateDescriptions method. The method parameters include two pointers to dynamic array objects, into which the model architectures will be written.
bool CreateDescriptions(CArrayObj *&actor, CArrayObj *&probability)
  {
   CLayerDescription *descr;
   if(!actor)
     {
      actor = new CArrayObj();
      if(!actor)
         return false;
     }
   if(!probability)
     {
      probability = new CArrayObj();
      if(!probability)
         return false;
     }
A key implementation feature is the creation of two specialized models: the Actor and a predictive model responsible for the probabilistic assessment of the upcoming price movement direction. The environment state Encoder is integrated directly into the Actor architecture, allowing it to form rich representations of market data and capture complex dependencies. In turn, the second model receives its input from the Actor’s latent space, using its learned representations to generate more accurate predictions. This approach not only improves forecasting efficiency but also reduces computational load, ensuring coordinated operation of both models within a unified system.
In the method body, we first verify the validity of the received pointers and, if necessary, create new instances of the dynamic array objects.
Next, we proceed to build the architecture of the Actor, starting with the environment encoder. The first component is a base neural layer used to record the raw input data. The size of this layer is determined by the volume of the analyzed data.
   actor.Clear();
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   int prev_count = descr.count = (HistoryBars * BarDescr);
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
No activation functions are applied, since, in essence, the output buffer of this layer directly stores the raw data obtained from the environment. In our case, these data are received directly from the terminal, which allows their original structure to be preserved. However, this approach has a significant drawback: the lack of preprocessing can negatively affect the model’s trainability, as the raw data contain heterogeneous values that differ in scale and distribution.
To mitigate this issue, a batch normalization mechanism is applied immediately after the first layer. It performs preliminary data standardization, bringing the inputs to a common scale and improving their comparability. This significantly enhances training stability, accelerates model convergence, and reduces the risk of gradient explosion or vanishing. As a result, even when working with highly volatile market data, the model gains the ability to form more accurate and consistent representations, which is critically important for subsequent multi-task analysis.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBatchNormOCL;
   descr.count = prev_count;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
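For readers who want to see what this standardization does numerically, below is a minimal C++ sketch of per-feature standardization. It is an illustration only: the actual defNeuronBatchNormOCL layer additionally maintains running statistics, learnable scale and shift parameters, and OpenCL buffers, none of which are shown here.

```cpp
#include <vector>
#include <cmath>

// Minimal sketch: shift a feature vector to zero mean and unit variance,
// the core of what batch normalization does to the raw inputs.
std::vector<double> Standardize(const std::vector<double> &x)
  {
   double mean = 0, var = 0;
   for(double v : x)
      mean += v;
   mean /= x.size();
   for(double v : x)
      var += (v - mean) * (v - mean);
   var /= x.size();
   double sd = std::sqrt(var + 1e-8);   // epsilon guards against zero variance
   std::vector<double> z;
   for(double v : x)
      z.push_back((v - mean) / sd);
   return z;
  }
```

After this transform every feature has approximately zero mean and unit variance, which is what makes heterogeneous market inputs comparable in scale.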
Next, we use a convolutional layer that transforms the feature space, bringing it to a standardized dimensionality. This makes it possible to create a unified data representation, ensuring consistency at subsequent processing stages. The Leaky ReLU (LReLU) activation function is used, which helps reduce the influence of minor fluctuations and random noise while preserving the important characteristics of the original data.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronConvOCL;
   descr.count = HistoryBars;
   descr.window = BarDescr;
   descr.step = BarDescr;
   descr.window_out = 128;
   descr.activation = LReLU;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
After completing the preliminary data preprocessing, we proceed to designing the architecture of the environment state Encoder, which plays a key role in analyzing and interpreting the raw input data. The primary objective of the Encoder is to identify stable patterns and hidden structures within the analyzed dataset, enabling the formation of an informative representation for subsequent processing by decision-making models.
Our Encoder is built from three sequential ResNeXt architecture blocks, each of which uses grouped convolutions for efficient feature extraction. In each block, a convolutional filter is applied with a window size of 3 elements of the analyzed multidimensional time series and a convolution stride of 2 elements. This ensures that the dimensionality of the original sequence is halved in each block.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronResNeXtBlock;
     {
      int temp[] = {128, 256};
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
     {
      int temp[] = {HistoryBars, 4, 32};
      if(ArrayCopy(descr.units, temp) < int(temp.Size()))
         return false;
     }
   descr.window = 3;
   descr.step = 2;
   descr.window_out = 1;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
   int units_out = (descr.units[0] - descr.window + descr.step - 1) / descr.step + 1;
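The `units_out` expression above is the standard formula for the output length of a strided convolution. A small C++ sketch makes the halving effect visible (HistoryBars = 48 is an assumed value, chosen purely for illustration):

```cpp
// Output length of a strided 1-D convolution, matching the integer
// arithmetic of the units_out line above (ceiling division).
int ConvOut(int units, int window, int step)
  {
   return (units - window + step - 1) / step + 1;
  }
```

With a window of 3 and a stride of 2, each application roughly halves the sequence length: for an assumed 48 input elements, successive blocks yield 24, then 12, then 6.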
In accordance with the principles of the ResNeXt architecture, the reduction in the dimensionality of the analyzed multidimensional time series is compensated by a proportional increase in feature dimensionality. This approach preserves the informativeness of the data while providing a more detailed representation of the structural characteristics of the time series.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronResNeXtBlock;
     {
      int temp[] = {256, 512};
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
     {
      int temp[] = {units_out, 4, 64};
      if(ArrayCopy(descr.units, temp) < int(temp.Size()))
         return false;
     }
   descr.window = 3;
   descr.step = 2;
   descr.window_out = 1;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
   units_out = (descr.units[0] - descr.window + descr.step - 1) / descr.step + 1;
In addition, as the dimensionality of the feature space increases, we proportionally expand the number of convolution groups while keeping the size of each group fixed. This allows the architecture to scale efficiently, maintaining a balance between computational complexity and the model's ability to extract complex patterns from the data.
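To see why grouped convolutions are cheaper, consider the weight count of a 1-D convolution: with g groups, each filter connects to only C_in / g input channels. Below is a hedged C++ sketch (biases ignored; the channel counts in the test are taken from the block configurations above, while the group count of 32 is an illustrative assumption):

```cpp
// Weight count of a 1-D convolution, standard vs grouped (biases ignored).
// With g groups each filter sees only c_in / g input channels, cutting the
// weight count by a factor of g -- the core efficiency gain of ResNeXt.
long ConvWeights(long c_in, long c_out, long kernel, long groups)
  {
   return c_out * (c_in / groups) * kernel;
  }
```

For 256 input and 512 output channels with a kernel of 3, splitting into 32 groups reduces the weights by exactly a factor of 32, which is why the feature space can be widened without a proportional rise in computational cost.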
After three ResNeXt blocks, the feature dimensionality increases to 1024, with a proportional reduction in the length of the analyzed sequence.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronResNeXtBlock;
     {
      int temp[] = {512, 1024};
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
     {
      int temp[] = {units_out, 4, 128};
      if(ArrayCopy(descr.units, temp) < int(temp.Size()))
         return false;
     }
   descr.window = 3;
   descr.step = 2;
   descr.window_out = 1;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
   units_out = (descr.units[0] - descr.window + descr.step - 1) / descr.step + 1;
Next, the ResNeXt architecture provides for compressing the analyzed sequence along the time dimension, retaining only the most significant characteristics of the analyzed environment state. For this, we first transpose the resulting data:
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronTransposeOCL;
   descr.count = units_out;
   descr.window = 1024;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
Then, we use a pooling layer, which reduces the dimensionality of the data while preserving the most important characteristics. This enables the model to focus on key features, eliminating unnecessary noise and providing a more compact representation of the original data.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronProofOCL;
   descr.count = 1024;
   descr.step = descr.window = units_out;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
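Conceptually, this step collapses the time axis of the transposed [features x time] tensor down to a single value per feature. The simplified C++ sketch below assumes max pooling purely for illustration, since the text does not restate which reduction defNeuronProofOCL applies:

```cpp
#include <vector>
#include <algorithm>

// Sketch: pool each feature row of a [features x time] matrix to one value.
// The pooling window spans the whole remaining sequence, as in the layer
// description above (descr.window = units_out).
std::vector<double> PoolTime(const std::vector<std::vector<double>> &m)
  {
   std::vector<double> out;
   for(const std::vector<double> &row : m)
      out.push_back(*std::max_element(row.begin(), row.end()));
   return out;
  }
```

The result is a fixed-size vector of 1024 values, one per feature, regardless of how long the analyzed sequence was.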
Remember the ordinal number of this layer. This is the final layer of our environment state Encoder, and it is from this layer that we will take the input data for the second model.
Next comes the Decoder of our Agent, consisting of two sequential fully connected layers.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   descr.count = 256;
   descr.activation = SIGMOID;
   descr.batch = 1e4;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   descr.count = NActions;
   descr.activation = SIGMOID;
   descr.batch = 1e4;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
Both layers use the sigmoid function as the activation function and gradually reduce the tensor dimensionality to the predefined action space of the Agent.
It should be noted here that the Agent created above analyzes only the raw environment state and is completely devoid of a risk management module. We compensate for this limitation by adding a risk management Agent layer, implemented within the MacroHFT framework.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronMacroHFTvsRiskManager;
     {
      int temp[] = {3, 15, NActions, AccountDescr};
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
   descr.count = 10;
   descr.window_out = 16;
   descr.step = 4;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
We also add a convolutional layer with a sigmoid activation function, which maps the Agent's outputs into the specified value space. We use a convolution window of size 3, which corresponds to the parameters of a single trade. This approach makes it possible to obtain consistent trade characteristics.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronConvOCL;
   descr.count = NActions / 3;
   descr.window = 3;
   descr.step = 3;
   descr.window_out = 3;
   descr.activation = SIGMOID;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
At the next stage, we move on to describing the model for forecasting the probabilities of upcoming price movements. As mentioned above, our predictive model receives its input data from the Agent's latent state. To ensure dimensional consistency between the latent state and the input layer of the second model, we decided to abandon manual architectural adjustments. Instead, we extract the description of the latent state layer from the Agent's architecture description.
   probability.Clear();
   CLayerDescription *latent = actor.At(LatentLayer);
   if(!latent)
      return false;
The parameters of the extracted latent state description are then transferred to the input layer of the new model.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   prev_count = descr.count = latent.count;
   descr.activation = latent.activation;
   descr.optimization = ADAM;
   if(!probability.Add(descr))
     {
      delete descr;
      return false;
     }
Using the latent state of another model as input data allows us to work with already processed and mutually comparable data. Consequently, there is no need to apply a batch normalization layer for primary input preprocessing. Moreover, the outputs of the ResNeXt blocks are already normalized.
To obtain predictive values for the forthcoming price movement direction, we use two sequential fully connected layers with a sigmoid activation function between them.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   descr.count = 256;
   descr.activation = SIGMOID;
   descr.batch = 1e4;
   descr.optimization = ADAM;
   if(!probability.Add(descr))
     {
      delete descr;
      return false;
     }
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   prev_count = descr.count = NActions / 3;
   descr.activation = None;
   descr.batch = 1e4;
   descr.optimization = ADAM;
   if(!probability.Add(descr))
     {
      delete descr;
      return false;
     }
The outputs of the fully connected layers are then mapped into a probabilistic space using the SoftMax function.
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronSoftMaxOCL;
   prev_count = descr.count = prev_count;
   descr.step = 1;
   descr.activation = None;
   descr.batch = 1e4;
   descr.optimization = ADAM;
   if(!probability.Add(descr))
     {
      delete descr;
      return false;
     }
   return true;
  }
It is important to note that our model predicts probabilities for only two directions of price movement: upward and downward. The probability of flat (sideways) movement is deliberately not considered, since even a sideways market in practice represents a sequence of short-term price fluctuations with approximately equal amplitude and opposite directions. This approach allows the model to focus on identifying fundamental dynamic market patterns without wasting computational resources on describing complex but less significant flat states.
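The final mapping into this two-class probability space can be sketched in a few lines of C++. This mirrors what the defNeuronSoftMaxOCL layer computes over the two logits (upward and downward), without its OpenCL machinery:

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Numerically stable SoftMax: converts raw logits into probabilities
// that are positive and sum to one.
std::vector<double> SoftMax(const std::vector<double> &logits)
  {
   double mx = logits[0];
   for(double v : logits)
      mx = std::max(mx, v);          // subtract max to avoid overflow in exp
   double sum = 0;
   std::vector<double> p;
   for(double v : logits)
     {
      p.push_back(std::exp(v - mx));
      sum += p.back();
     }
   for(double &v : p)
      v /= sum;
   return p;
  }
```

Subtracting the maximum logit before exponentiation leaves the result unchanged but prevents overflow for large logit values.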
After completing the description of the model architectures, all that remains is to return the logical result of the performed operations to the calling program and terminate the method execution.
Model Training
Now that we have defined the model architectures, we can move on to the next stage — training. For this purpose, we will use the training dataset collected during the development of the MacroHFT framework. The dataset construction process is described in detail in the corresponding article. Let me remind you that this training dataset was built using historical data of the EURUSD currency pair for the entire year 2024 on the M1 timeframe.
However, to train the models, we need to introduce several modifications to the Expert Advisor algorithm located at …\MQL5\Experts\ResNeXt\Study.mq5. Within the scope of this article, we will focus exclusively on the Train method, since it is where the entire training process is organized.
void Train(void)
  {
   vector probability = vector::Full(Buffer.Size(), 1.0f / Buffer.Size());
At the beginning of the training method, we usually compute probability vectors for selecting different trajectories based on their profitability. This makes it possible to correct the imbalance between profitable and unprofitable episodes, since in most cases the number of losing sequences significantly exceeds the number of profitable ones. However, in the present work, the models are planned to be trained on nearly ideal trajectories, where the sequence of the agent’s actions is formed in accordance with historical price movement data. As a result, the probability vector is filled with equal values, ensuring uniform representation of the entire training dataset. This approach allows the model to learn the key characteristics of market data without artificially biasing priorities toward certain scenarios at the expense of others. This improves generalization capability and model robustness.
Next, we declare a number of local variables required for temporary data storage during the execution of operations.
   vector result, target, state;
   matrix fstate = matrix::Zeros(1, NForecast * BarDescr);
   bool Stop = false;
   uint ticks = GetTickCount();
This concludes the preparatory work. We then proceed to create the system of training loops for the models.
It should be noted that the ResNeXt architecture itself contains no recurrent blocks, so in principle it could be trained with simple random sampling of individual states from the training dataset. However, we have added a risk management agent that uses memory modules storing past decisions and the resulting changes in account state. Training this module requires preserving the historical order of the input data.
In the body of the outer loop, we sample the initial state of a mini-batch historical sequence from the training dataset.
   for(int iter = 0; (iter < Iterations && !IsStopped() && !Stop); iter += Batch)
     {
      int tr = SampleTrajectory(probability);
      int start = (int)((MathRand() * MathRand() / MathPow(32767, 2)) * (Buffer[tr].Total - 2 - NForecast - Batch));
      if(start

