A comparison of evolutionary computation and deep reinforcement learning for portfolio optimization
In portfolio management, asset allocation is one of the most crucial and difficult challenges investors face. Asset allocation is the decision-making process of spreading available funds across various financial assets. The most famous and widely used models for tackling asset allocation problems are mean-variance, mean-value-at-risk, and the Sharpe ratio. These models are solvable by quadratic programming, and they all rely heavily on the mean and standard deviation, under the assumption that the data distribution is symmetrical. Unfortunately, the majority of real-world problems exhibit asymmetric distributions; as a result, the modified Sharpe ratio was introduced to include skewness and kurtosis, the third and fourth moments of return. This study uses the modified Sharpe ratio as its objective, and applies and compares a genetic algorithm, particle swarm optimisation, and deep deterministic policy gradient on the asset allocation problem. The first two algorithms (genetic algorithm and particle swarm optimisation) are widely employed to generate high-quality solutions in optimisation problems, whilst the third (deep deterministic policy gradient) has proved more effective on complex problems that cannot be solved by conventional techniques. The algorithms learn to evolve portfolio weights that maximise the modified Sharpe ratio. The dataset is drawn from the banking sector of the Johannesburg Stock Exchange and from well-known stocks on the United States stock exchange. To measure the performance of the three algorithms, a uniform allocation, which divides portfolio weights equally among the assets in a portfolio, is used as the baseline strategy. The results show that all three algorithms outperform the uniform allocation on numerous occasions. In general, the genetic algorithm and particle swarm optimisation provide better results than deep deterministic policy gradient.
The results are then tested under a buy-and-hold strategy. Even though the deep deterministic policy gradient did not perform well in evolving portfolio weights and was slow to train, it is comparable with the uniform allocation. The genetic algorithm outperforms the other algorithms, followed by particle swarm optimisation.
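The abstract does not state the exact formula for the modified Sharpe ratio, so the following is only a minimal sketch of one common formulation: excess mean return divided by a modified value-at-risk obtained from the Cornish-Fisher expansion, which adjusts the normal quantile for skewness and excess kurtosis. The function names `modified_sharpe` and `uniform_weights`, the 5% tail level `alpha`, the zero risk-free rate, and the synthetic return data are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from statistics import NormalDist


def uniform_weights(n_assets):
    """Baseline allocation: equal weight for every asset."""
    return np.full(n_assets, 1.0 / n_assets)


def modified_sharpe(weights, returns, risk_free=0.0, alpha=0.05):
    """Excess mean return over Cornish-Fisher modified VaR (assumed form).

    returns: array of shape (T, N) holding per-period asset returns.
    weights: array of shape (N,) that sums to 1.
    """
    port = returns @ weights                 # per-period portfolio returns
    mu = port.mean()
    sigma = port.std()
    centred = port - mu
    skew = np.mean(centred ** 3) / sigma ** 3
    ex_kurt = np.mean(centred ** 4) / sigma ** 4 - 3.0

    z = NormalDist().inv_cdf(alpha)          # normal quantile, negative for alpha < 0.5
    # Cornish-Fisher adjustment of the quantile for the third and fourth moments
    z_cf = (z
            + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * ex_kurt / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)
    mvar = -(mu + z_cf * sigma)              # modified VaR expressed as a positive loss
    # Note: the ratio is only meaningful when mvar is positive.
    return (mu - risk_free) / mvar


# Synthetic returns for illustration only; not the study's dataset.
rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(500, 4))
w_uniform = uniform_weights(4)
score = modified_sharpe(w_uniform, returns)
```

Under these assumptions, an optimiser such as a genetic algorithm or particle swarm optimisation would search over `weights` to maximise the value returned by `modified_sharpe`, with `score` for the uniform allocation serving as the baseline to beat.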