
In this series of articles, we are developing an automatic optimization system that finds good combinations of parameters of a single trading strategy without human intervention. These combinations are then merged into one final EA. The objective is described in more detail in parts 9 and 11. The search process itself is controlled by a single EA (the optimizing EA), and all data that needs to be saved during its operation is stored in the main database.
In the database, we have tables storing information about several classes of objects. Some of them have a status field that can take values from a fixed set (“Queued”, “Process”, “Done”), but not all classes use this field. More precisely, for now it is used only for optimization tasks (the tasks table). Our optimizing EA searches the tasks table for Queued tasks to select the next one to run. After a task is completed, its status in the database changes to Done.
Let’s try to implement status auto updates not only for tasks, but also for all other classes of objects (jobs, stages, projects) and arrange automatic execution of all necessary stages up to obtaining the final EA, which can work independently without connecting to the database.
First of all, we will take a close look at all classes of objects in the database that have a status, and formulate clear rules for changing the status. If this can be done, then we can implement these rules as calls to additional SQL queries either from the optimizing EA or from the stage EAs. Or it might be possible to implement them as triggers in the database that are activated when certain data change events occur.
Next, we need to agree on a method for determining the order in which tasks are executed. This was not much of an issue before: during development we worked each time with a new database and added project stages, jobs and tasks exactly in the order in which they had to be completed. But when we move to storing information on multiple projects in one database, or even to adding new projects automatically, it will no longer be possible to rely on insertion order alone. So let’s spend some time on this issue.
To test the operation of the entire conveyor, in which all the tasks of an auto optimization project are executed in turn, we need to automate a few more actions that we previously performed manually. For example, after completing the second optimization stage, we get the opportunity to select the best groups for use in the final EA. We performed this operation by running the third-stage EA manually, that is, outside the auto optimization conveyor. To set the launch parameters for this EA, we also manually selected the IDs of the second-stage passes with the best results, using a database access tool external to MQL5. We will try to do something about this as well.
So, after the changes are made, we expect to finally have a fully functional conveyor that performs the auto optimization stages and produces the final EA. Along the way, we will consider some other issues related to improving efficiency. For example, it seems that the EAs for the second and subsequent stages will be the same for different trading strategies. Let’s check whether this is true. We will also see what is more convenient: creating several smaller projects, or creating a larger number of stages or jobs within one larger project.
Let’s start by formulating the rules for changing statuses. As you might remember, our database contains information about the following objects that have a status field (status):

- projects (the projects table);
- stages (the stages table);
- jobs (the jobs table);
- optimization tasks (the tasks table).
The possible status values are the same for each of these four object classes and can be one of the following:

- Queued: the object is waiting to be processed;
- Process: the object is currently being processed;
- Done: the processing of the object is complete.
Let us describe the rules for changing the statuses of objects in the database in accordance with the normal cycle of the project auto optimization conveyor. The cycle begins when a project is queued for optimization, i.e. it is assigned the Queued status.
Thus, changing the project status to Queued will lead to a cascading update of the statuses of all stages, works and tasks of this project to Queued. All these objects will be in this status until the Optimization.ex5 EA is launched.
After the launch, at least one task in the Queued status must be found (we will deal with the sorting order for multiple tasks later). The status of the selected task changes to Process, which causes the following actions:

- the task start time is recorded, and the pass results left over from its previous run are cleared;
- the status of the job this task belongs to changes to Process;
- the Process status is cascaded further up to the stage and the project.
After this, tasks will be carried out sequentially within the framework of the project stages. Further status changes can only occur after the completion of the next task. At this point, the task status changes to Done and may cause this status to be cascaded to higher-level objects.
Thus, when the last task of the last job of the last stage is completed, the project itself will move to the completed state.
Now that all the rules are formulated, we can move on to creating triggers in the database that implement these actions.
Let’s start with the trigger for handling the change of the project status to Queued. Here is one possible way to implement it:
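The original MQL5 project keeps this trigger in the database itself; its logic can be sketched in SQLite (via Python's sqlite3 here, purely for illustration). The simplified table and column names below (projects, stages, id_project, status) are assumptions based on the description above, not the exact production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id_project INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE stages   (id_stage   INTEGER PRIMARY KEY,
                       id_project INTEGER, status TEXT);

-- When a project is queued, queue all of its stages as well
CREATE TRIGGER upd_project_status_queued
AFTER UPDATE OF status ON projects
WHEN NEW.status = 'Queued'
BEGIN
    UPDATE stages SET status = 'Queued'
     WHERE id_project = NEW.id_project;
END;

INSERT INTO projects VALUES (1, 'Done');
INSERT INTO stages   VALUES (1, 1, 'Done'), (2, 1, 'Done');
""")

conn.execute("UPDATE projects SET status = 'Queued' WHERE id_project = 1")
print([r[0] for r in conn.execute("SELECT status FROM stages")])
# ['Queued', 'Queued']
```

The WHEN clause makes sure the cascade only happens for the Queued transition, not for every status update.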
After it completes, the statuses of this project's stages are also set to Queued. This, in turn, fires the corresponding triggers for stages, jobs and tasks:
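These chained triggers can be sketched the same way (again with a simplified, assumed schema). In SQLite, the UPDATE statements inside a trigger body activate the triggers of the tables they touch, which is exactly what drives the cascade:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stages (id_stage INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE jobs   (id_job   INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks  (id_task  INTEGER PRIMARY KEY, id_job   INTEGER, status TEXT);

-- Queued cascades from a stage to its jobs ...
CREATE TRIGGER upd_stage_status_queued
AFTER UPDATE OF status ON stages WHEN NEW.status = 'Queued'
BEGIN
    UPDATE jobs SET status = 'Queued' WHERE id_stage = NEW.id_stage;
END;

-- ... and from each job to its tasks
CREATE TRIGGER upd_job_status_queued
AFTER UPDATE OF status ON jobs WHEN NEW.status = 'Queued'
BEGIN
    UPDATE tasks SET status = 'Queued' WHERE id_job = NEW.id_job;
END;

INSERT INTO stages VALUES (1, 'Done');
INSERT INTO jobs   VALUES (1, 1, 'Done');
INSERT INTO tasks  VALUES (1, 1, 'Done'), (2, 1, 'Done');
""")

conn.execute("UPDATE stages SET status = 'Queued' WHERE id_stage = 1")
print([r[0] for r in conn.execute("SELECT status FROM tasks")])
# ['Queued', 'Queued']
```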
The task launch will be handled by the following trigger, which sets the task start date, clears the pass data from the previous launch of the task, and updates the job status to Process:
Next, the stage and project statuses, within which this job is being performed, are cascaded to Process:
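Taken together, the Process cascade can be sketched as follows. The start_date column and the passes table layout are assumptions for the sketch; the real schema may name them differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id_project INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE stages   (id_stage INTEGER PRIMARY KEY, id_project INTEGER, status TEXT);
CREATE TABLE jobs     (id_job INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks    (id_task INTEGER PRIMARY KEY, id_job INTEGER,
                       status TEXT, start_date TEXT);
CREATE TABLE passes   (id_pass INTEGER PRIMARY KEY, id_task INTEGER);

-- Task start: record the start time, drop stale passes, push Process upward
CREATE TRIGGER upd_task_status_process
AFTER UPDATE OF status ON tasks WHEN NEW.status = 'Process'
BEGIN
    UPDATE tasks SET start_date = DATETIME('now') WHERE id_task = NEW.id_task;
    DELETE FROM passes WHERE id_task = NEW.id_task;
    UPDATE jobs SET status = 'Process' WHERE id_job = NEW.id_job;
END;

CREATE TRIGGER upd_job_status_process
AFTER UPDATE OF status ON jobs WHEN NEW.status = 'Process'
BEGIN
    UPDATE stages SET status = 'Process' WHERE id_stage = NEW.id_stage;
END;

CREATE TRIGGER upd_stage_status_process
AFTER UPDATE OF status ON stages WHEN NEW.status = 'Process'
BEGIN
    UPDATE projects SET status = 'Process' WHERE id_project = NEW.id_project;
END;

INSERT INTO projects VALUES (1, 'Queued');
INSERT INTO stages   VALUES (1, 1, 'Queued');
INSERT INTO jobs     VALUES (1, 1, 'Queued');
INSERT INTO tasks    VALUES (1, 1, 'Queued', NULL);
INSERT INTO passes   VALUES (100, 1);   -- leftover pass from a previous run
""")

conn.execute("UPDATE tasks SET status = 'Process' WHERE id_task = 1")
print(conn.execute("SELECT status FROM projects").fetchone()[0])  # Process
print(conn.execute("SELECT COUNT(*) FROM passes").fetchone()[0])  # 0
```

Note that the inner UPDATE of start_date does not re-activate the task trigger, because the trigger is declared for UPDATE OF status only.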
In the trigger activated when a task status is updated to Done, that is, when the task is completed, we update the task completion date. Then, depending on whether other tasks are still queued for execution within the current job, we update the job status to either Process or Done:
Let’s do the same with the stage and project statuses:
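The conditional completion logic at the job and stage levels can be sketched like this (finish_date and the other names are assumptions; the real triggers may differ in details, and the same pattern would repeat once more for the project level):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stages (id_stage INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE jobs   (id_job INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks  (id_task INTEGER PRIMARY KEY, id_job INTEGER,
                     status TEXT, finish_date TEXT);

-- Task completion: stamp the finish time, then mark the job Done only when
-- none of its tasks are still waiting or running
CREATE TRIGGER upd_task_status_done
AFTER UPDATE OF status ON tasks WHEN NEW.status = 'Done'
BEGIN
    UPDATE tasks SET finish_date = DATETIME('now') WHERE id_task = NEW.id_task;
    UPDATE jobs SET status = CASE WHEN EXISTS (
               SELECT 1 FROM tasks
                WHERE id_job = NEW.id_job AND status IN ('Queued', 'Process'))
           THEN 'Process' ELSE 'Done' END
     WHERE id_job = NEW.id_job;
END;

-- Same pattern one level up: completing a job may complete the stage
CREATE TRIGGER upd_job_status_done
AFTER UPDATE OF status ON jobs WHEN NEW.status = 'Done'
BEGIN
    UPDATE stages SET status = CASE WHEN EXISTS (
               SELECT 1 FROM jobs
                WHERE id_stage = NEW.id_stage AND status IN ('Queued', 'Process'))
           THEN 'Process' ELSE 'Done' END
     WHERE id_stage = NEW.id_stage;
END;

INSERT INTO stages VALUES (1, 'Process');
INSERT INTO jobs   VALUES (1, 1, 'Process');
INSERT INTO tasks  VALUES (1, 1, 'Process', NULL), (2, 1, 'Queued', NULL);
""")

conn.execute("UPDATE tasks SET status = 'Done' WHERE id_task = 1")
print(conn.execute("SELECT status FROM jobs").fetchone()[0])    # Process

conn.execute("UPDATE tasks SET status = 'Done' WHERE id_task = 2")
print(conn.execute("SELECT status FROM stages").fetchone()[0])  # Done
```

Since the trigger runs AFTER the status change, the completed task no longer matches the Queued/Process check, so it never blocks its own job from completing.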
We will also provide the ability to transfer all project objects to the Done state when setting this status to the project itself. We did not include this scenario in the list of rules above, since it is not a mandatory action in the normal course of auto optimization. In this trigger, we set the status of all unexecuted or ongoing tasks to Done, which will result in setting the same status for all project jobs and stages:
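A sketch of such a forced-completion trigger (simplified, assumed schema; in the full database the task-level Done triggers described above would then propagate the status up through jobs and stages):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id_project INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE stages   (id_stage INTEGER PRIMARY KEY, id_project INTEGER, status TEXT);
CREATE TABLE jobs     (id_job INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks    (id_task INTEGER PRIMARY KEY, id_job INTEGER, status TEXT);

-- Forcing a project to Done completes every task that is still pending
CREATE TRIGGER upd_project_status_done
AFTER UPDATE OF status ON projects WHEN NEW.status = 'Done'
BEGIN
    UPDATE tasks SET status = 'Done'
     WHERE status IN ('Queued', 'Process')
       AND id_job IN (SELECT j.id_job
                        FROM jobs j
                        JOIN stages s ON s.id_stage = j.id_stage
                       WHERE s.id_project = NEW.id_project);
END;

INSERT INTO projects VALUES (1, 'Process');
INSERT INTO stages   VALUES (1, 1, 'Process');
INSERT INTO jobs     VALUES (1, 1, 'Process');
INSERT INTO tasks    VALUES (1, 1, 'Done'), (2, 1, 'Process'), (3, 1, 'Queued');
""")

conn.execute("UPDATE projects SET status = 'Done' WHERE id_project = 1")
print([r[0] for r in conn.execute("SELECT status FROM tasks ORDER BY id_task")])
# ['Done', 'Done', 'Done']
```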
Once all these triggers are created, let’s figure out how to determine the task execution order.
So far we have only worked with one project in the database, so let’s start by looking at the rules for determining the order of tasks for this case. Once we have an understanding of how to determine the order of tasks for one project, we can think about the order of tasks for several projects launched simultaneously.
Obviously, optimization tasks related to the same job and differing only in the optimization criterion can be performed in any order: sequential launch of genetic optimization on different criteria does not use information from previous optimizations. Different optimization criteria are used to increase the diversity of good parameter combinations found. It has been observed that genetic optimizations with the same ranges of tried inputs, albeit with different criteria, converge to different combinations.
Therefore, there is no need to add any sorting field to the tasks table. We can use the order in which the tasks of one job were added to the database, that is, sort them by id_task.
If each job contains only one task, then the execution order will depend on the order of the jobs. Jobs were conceived to group or, more precisely, to divide tasks by combinations of symbols and timeframes. For example, if we have three symbols (EURGBP, EURUSD, GBPUSD), two timeframes (H1, M30) and two stages (Stage1, Stage2), then we can choose between two possible orders:
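To make the two orders concrete, here is a quick enumeration for the example above (a throwaway sketch, not part of the project code):

```python
from itertools import product

symbols = ["EURGBP", "EURUSD", "GBPUSD"]
periods = ["H1", "M30"]
stages  = ["Stage1", "Stage2"]

# Order 1: group by symbol and timeframe; both stages of EURGBP H1 run
# before anything is done for EURGBP M30, and so on
by_symbol_tf = [(s, p, st) for s, p in product(symbols, periods) for st in stages]

# Order 2: group by stage; every Stage1 job runs before any Stage2 job
by_stage = [(s, p, st) for st in stages for s, p in product(symbols, periods)]

print(by_symbol_tf[:2])  # [('EURGBP', 'H1', 'Stage1'), ('EURGBP', 'H1', 'Stage2')]
print(by_stage[:2])      # [('EURGBP', 'H1', 'Stage1'), ('EURGBP', 'M30', 'Stage1')]
```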
With the first method of grouping (by symbol and timeframe), each completion of the second stage already gives us something usable, namely a final EA. It will include sets of single trading strategy instances for those symbols and timeframes that have already passed both optimization stages.
With the second method of grouping (by stage), the final EA cannot appear until all the jobs of the first stage and at least one job of the second stage are completed.
For jobs that only use the results of the previous steps for the same symbol and timeframe, there will be no difference between the two methods. But if we look a little ahead, there will be another stage where the results of the second stages for different symbols and timeframes will be combined. We have not reached its implementation as an automatic optimization stage yet, but we have already prepared a stage EA for it and even launched it, albeit manually. For this stage, the first grouping method is not suitable, so we will use the second one.
It is worth noting that if we still want to use the first method, then perhaps it will be enough for us to create several projects for each combination of symbol and timeframe. But for now the benefits seem unclear.
So, if we have several jobs within one stage, their execution order can be arbitrary, while for jobs of different stages, the order is determined by the priority of the stages. In other words, as with tasks, there is no need to add any sorting field to the jobs table. We can use the order in which the jobs of one stage were added to the database, that is, sort them by id_job.
To determine the order of stages, we can use the data already available in the stages table. I added the parent stage field (id_parent_stage) to this table at the very beginning, but it has not been used yet. Indeed, while the table contains only two rows for two stages, there is no difficulty in creating them in the right order: first the row for the first stage, then the one for the second. When there are more of them, and stages of other projects appear, maintaining the correct order manually becomes more difficult.
So let’s take the opportunity to build an execution hierarchy of stages, where each stage is executed after its parent stage completes. At least one stage must have no parent so that it can occupy the top position in the hierarchy. Let’s write a test SQL query that combines data from the tasks, jobs and stages tables and shows all tasks of the current stage. We will include all the fields in the column list of this query so that we can see the most complete information.
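The full query is not reproduced here, but its core logic can be sketched as follows (simplified schema with assumed column names; a self-join of stages resolves the parent status):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stages (id_stage INTEGER PRIMARY KEY, id_parent_stage INTEGER,
                     name TEXT, status TEXT);
CREATE TABLE jobs   (id_job INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks  (id_task INTEGER PRIMARY KEY, id_job INTEGER, status TEXT);

INSERT INTO stages VALUES (1, NULL, 'First',  'Done'),
                          (2, 1,    'Second', 'Queued'),
                          (3, 2,    'Third',  'Queued');
INSERT INTO jobs   VALUES (1, 2, 'Queued'), (2, 3, 'Queued');
INSERT INTO tasks  VALUES (1, 1, 'Queued'), (2, 2, 'Queued');
""")

# Unfinished tasks whose stage is ready to run: the stage either has no
# parent or its parent stage is already Done
rows = conn.execute("""
    SELECT t.id_task, j.id_job, s.id_stage, s.name
      FROM tasks t
      JOIN jobs j        ON j.id_job = t.id_job
      JOIN stages s      ON s.id_stage = j.id_stage
      LEFT JOIN stages p ON p.id_stage = s.id_parent_stage
     WHERE t.status IN ('Queued', 'Process')
       AND (s.id_parent_stage IS NULL OR p.status = 'Done')
     ORDER BY s.id_stage, j.id_job, t.id_task
""").fetchall()
print(rows)  # [(1, 1, 2, 'Second')], only the stage whose parent is Done
```

Stage 3 is excluded because its parent (stage 2) has not completed yet; exactly the ordering behavior the hierarchy is meant to enforce.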
Fig. 1. Results of a query to get tasks of the current stage after starting one task
Later on, we will reduce the number of columns displayed when we use a similar query to find the next task. In the meantime, let’s make sure that we correctly receive the next stage (along with its jobs and tasks). The results shown in Figure 1 correspond to the moment when the task with id_task=3 was started. This task belongs to id_job=10, which is part of id_stage=10. This stage is called “First”, belongs to the project with id_project=1 and has no parent stage (parent_stage=NULL). We can see that one running task is enough for the Process status to appear on both the job and the project within which this job is being performed. But the other job with id_job=5 still has the Queued status, since none of its tasks have been started yet.
Let’s now try to complete the first task (by simply setting the status field in the table to Done) and look at the results of the same query:
Fig. 2. Results of a query to get tasks of the current stage after the completion of a running task
As you can see, the completed task has disappeared from this list, and the top line is now occupied by another task, which can be launched next. So far everything is correct. Now let’s launch and complete the top two tasks from this list, and launch the third task with id_task=7 for execution:
Fig. 3. Results of a query to get tasks of the current stage after completing the tasks of the first job and starting the next task
Now the job with id_job=5 has received the Process status. Next, we will run and complete the three tasks that are now shown in the results of the last query. They will disappear from the query results one by one. After the last one is completed, run the query again and get the following:
Fig. 4. Results of a query to get tasks of the current stage after all tasks of the first stage are completed
Now the query results include tasks from jobs belonging to the next stages. id_stage=2 is the clustering of the first-stage results, while id_stage=3 is the second stage, where good instances of trading strategies obtained at the first stage are grouped. The latter stage does not use clustering, so it can be run immediately after the first stage, and its presence in this list is not a mistake. Both stages have a parent stage named First, which is now in the Done state.
Let’s simulate the launch and completion of the first two tasks and look at the query results again:
Fig. 5. Results of a query to get tasks after all clustering stage tasks are completed
The top lines of the results are expectedly occupied by two tasks of the second stage (named “Second”), but the last two lines now contain tasks of the second stage with clustering (named “Second with clustering”). Their appearance is somewhat unexpected, but it does not contradict the acceptable order. Indeed, once the clustering stage has completed, we may also launch the stage that uses the clustering results. The two stages shown in the query results are independent of each other, so they can be performed in any order.
Let’s run and complete each task again, selecting the top one in the results each time. The list of tasks received after each status change behaved as expected, the statuses of jobs and stages changed correctly. After the last task was completed, the query results were empty, since all assigned tasks of all jobs of all stages were completed, and the project moved to the Done state.
Let’s integrate this query into the optimizing EA.
We will need to modify the method that gets the ID of the next task for the optimizer, which already contains an SQL query doing this job. Let’s take the query developed above and remove the extra fields, leaving only id_task. We can also replace sorting by the pair of jobs table fields (j.symbol, j.period) with sorting by j.id_job, since each job has exactly one value for each of these two fields. Finally, we will add a limit on the number of rows returned, since we only need one row.
Now the GetNextTaskId() method looks like this:
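The MQL5 method body is not reproduced here, but the SQL it runs reduces the previous query to a single column and a single row. A sketch of that query against a simplified, assumed schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stages (id_stage INTEGER PRIMARY KEY, id_parent_stage INTEGER, status TEXT);
CREATE TABLE jobs   (id_job INTEGER PRIMARY KEY, id_stage INTEGER, status TEXT);
CREATE TABLE tasks  (id_task INTEGER PRIMARY KEY, id_job INTEGER, status TEXT);

INSERT INTO stages VALUES (1, NULL, 'Queued');
INSERT INTO jobs   VALUES (1, 1, 'Queued'), (2, 1, 'Queued');
INSERT INTO tasks  VALUES (1, 1, 'Queued'), (2, 1, 'Queued'), (3, 2, 'Queued');
""")

# Same join as before, but only id_task is selected, ordering is by
# j.id_job instead of (j.symbol, j.period), and LIMIT 1 returns one row
next_task = conn.execute("""
    SELECT t.id_task
      FROM tasks t
      JOIN jobs j        ON j.id_job = t.id_job
      JOIN stages s      ON s.id_stage = j.id_stage
      LEFT JOIN stages p ON p.id_stage = s.id_parent_stage
     WHERE t.status = 'Queued'
       AND (s.id_parent_stage IS NULL OR p.status = 'Done')
     ORDER BY s.id_stage, j.id_job, t.id_task
     LIMIT 1
""").fetchone()[0]
print(next_task)  # 1
```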
Since we have decided to work with this file, let’s make another change along the way: remove the status method parameter from the method that returns the number of tasks in the queue. Indeed, we never use this method to get the numbers of tasks with the Queued and Process statuses separately; we only ever need their sum. Therefore, let’s modify the SQL query in the TotalTasks() method so that it always returns the total number of tasks with these two statuses, and remove the status input from the method:
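The reworked counting query boils down to this (sketched in SQLite with an assumed minimal schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id_task INTEGER PRIMARY KEY, status TEXT);
INSERT INTO tasks (status) VALUES ('Queued'), ('Process'), ('Done'), ('Queued');
""")

# The queue length is simply the number of tasks that are waiting or running
total = conn.execute(
    "SELECT COUNT(*) FROM tasks WHERE status IN ('Queued', 'Process')"
).fetchone()[0]
print(total)  # 3
```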
Let’s save the changes to the Optimizer.mqh file of the current folder.
In addition to these modifications, we will also need to replace the old status name “Processing” with “Process” in several files, since that is the name we agreed on above.
It would also be useful to provide a way to obtain information about errors that may occur while executing a task that launches a Python program. Currently, when such a program terminates abnormally, the optimizing EA simply gets stuck waiting for the task to complete, or more precisely, for information about its completion to appear in the database. If the program ends with an error, it cannot update the task status in the database, so the conveyor cannot move past this point.
So far, the only way to overcome this obstacle is to manually re-run the Python program with the parameters specified in the task, analyze the causes of errors, eliminate them, and re-run the program.
Next, we planned to automate the third stage, where for each job of the second stage (which differ in the symbol and timeframe used) we select the best pass for inclusion in the final EA.
So far, the third-stage EA has taken a list of second-stage pass IDs as input, and we had to select those IDs from the database manually. Beyond that, this EA only created a group from these passes, assessed its drawdown, and saved it to the library. The final EA did not appear as a result of launching the third-stage EA, since a number of other actions still had to be performed. We will return to automating them later; for now, let’s work on modifying the third-stage EA.
There are different methods that can be used to automatically select pass IDs.
For example, from all the pass results obtained within one second-stage job, we can select the best one by normalized average annual profit. Each such pass is itself the result of a group of 16 single instances of trading strategies, so the final EA will include a group of several groups of single-strategy instances. If we take three symbols and two timeframes, then the second stage has 6 jobs, and at the third stage we get a group that includes 6 * 16 = 96 single strategy instances. This method is the easiest to implement.
An example of a more complex selection method: for each second-stage job, we take several of the best passes and try different combinations of all the selected passes. This is very similar to what we did in the second stage, only now we assemble a group not from 16 single instances but from 6 groups: the first of the six slots takes one of the best passes of the first job, the second takes one of the best passes of the second job, and so on. This method is more complicated, and there is no way to say in advance whether it will significantly improve the results.
Therefore, we will first implement the simpler method and postpone the complication until later.
At this stage, we no longer need to optimize the EA parameters; a single pass is enough. To achieve this, we specify the appropriate parameters in the stage settings in the database: the optimization column must be set to 0.
In the EA code, we will add the optimization task ID to the inputs so that this EA can be launched in the conveyor while correctly saving the results of the pass to the database:
The passes_ parameter could be removed, but I will leave it for now, just in case. Let’s write an SQL query that gets the list of the best pass IDs for the second-stage jobs. If the passes_ parameter is empty, we take the IDs of the best passes; if it contains specific IDs, we use those instead.
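When passes_ is empty, the per-job selection of the best pass can be sketched with a query like this (the profit column name is an assumption standing in for the normalized average annual profit; the real passes table stores it under its own name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks  (id_task INTEGER PRIMARY KEY, id_job INTEGER);
CREATE TABLE passes (id_pass INTEGER PRIMARY KEY, id_task INTEGER, profit REAL);

INSERT INTO tasks  VALUES (1, 1), (2, 2);
INSERT INTO passes VALUES (10, 1, 8.5), (11, 1, 12.0),
                          (20, 2, 7.0), (21, 2, 3.0);
""")

# SQLite guarantees that a bare column selected next to MAX() comes from
# the row holding the maximum, so this picks the best pass of each job
rows = conn.execute("""
    SELECT t.id_job, p.id_pass, MAX(p.profit)
      FROM passes p
      JOIN tasks t ON t.id_task = p.id_task
     GROUP BY t.id_job
     ORDER BY t.id_job
""").fetchall()
print(rows)  # [(1, 11, 12.0), (2, 20, 7.0)]
```

In the real database, an extra join or filter would restrict the selection to the passes of second-stage jobs only.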
This completes the modification of the third stage EA. Let’s move the project in the database to the Queued state and launch the optimizing EA.
Despite the fact that we have not yet implemented all the planned stages, we now already have a tool that automatically provides an almost ready final EA. After completing the third stage, we have two entries in the parameter library (strategy_groups table):
The first one contains the ID of the pass, in which the best groups of the second stage are combined without clustering. The second is the ID of the pass, in which the best groups of the second stage with clustering are combined. Accordingly, we can obtain initialization strings from the passes table for these pass IDs and look at the results of these two combinations.
Fig. 7. Results of the combined group of instances obtained without using clustering
Fig. 8. Results of the combined group of instances obtained using clustering
The variant without clustering shows a higher profit, while the variant with clustering has a higher Sharpe ratio and better linearity. But we will not analyze these results in detail for now, since they are not yet final.
The next step is to add the stages for assembling the final EA. We need to export the library to obtain the ExportedGroupsLibrary.mqh include file in the data folder, and then copy this file to the working folder. This operation can be performed either by a Python program or by system copy functions imported from a DLL. At the last stage, we just need to compile the final EA and launch the terminal with the new EA version.
All this will require a significant amount of time to implement, so we will continue its description in the next article.
So, let’s look at what we have achieved. We have sorted out the automatic execution of the first stages of the auto optimization conveyor and made sure they work correctly. We can now look at the intermediate results and decide, for example, to abandon the clustering stage. Or, on the contrary, keep it and remove the option without clustering.
Having such a tool will help us to conduct experiments in the future and try to answer difficult questions. For example, suppose we perform optimization on different ranges of inputs in the first stage. What is better – to combine them separately or together by the same symbols and timeframes?
By adding stages to the conveyor, we can implement the gradual assembly of increasingly complex EAs.
Finally, we can consider the issue of partial re-optimization and even continuous re-optimization by conducting an appropriate experiment. Re-optimization here means repeated optimization at a different time interval. But more about that next time.
