Reinforcement Learning for Option Hedging


Student thesis: Doctoral Thesis



Award date: 2 Aug 2022


Option hedging is one of the most fundamental and significant decision-making problems in finance. Traditional methods rely on strict restrictions and assumptions that are difficult to satisfy in real markets. With the development of machine learning, many researchers have turned to data-driven approaches to the option hedging problem. In this thesis, we formulate option hedging in a reinforcement learning framework, without making any assumptions on the underlying dynamics, by defining appropriate state and action spaces as well as effective reward functions. Model-free reinforcement learning relaxes the strict market constraints required by many traditional financial methods, which is a key advantage over them.
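To make the formulation concrete, the sketch below casts hedging a short call as a sequential decision problem: the state holds the underlying price, time to maturity, and current hedge position; the action is the new hedge position; the reward is the one-step hedged P&L net of proportional transaction costs. All names, parameter values, and the placeholder price dynamics are illustrative assumptions, not the thesis's exact specification.

```python
import random

class HedgingEnv:
    """Toy option-hedging environment (illustrative; dynamics are assumptions).

    State:  (underlying price, steps to maturity, current hedge position).
    Action: new number of shares held against a short call.
    Reward: one-step change in hedged portfolio value, net of trading costs.
    """

    def __init__(self, s0=100.0, strike=100.0, steps=30, sigma=0.2, cost=0.01):
        self.s0, self.strike, self.steps = s0, strike, steps
        self.sigma, self.cost = sigma, cost

    def reset(self):
        self.t, self.s, self.h = 0, self.s0, 0.0
        return (self.s, self.steps - self.t, self.h)

    def step(self, action):
        # Trade to the new hedge position, paying proportional costs.
        trade_cost = self.cost * abs(action - self.h) * self.s
        prev_s, self.h = self.s, action
        # One step of a simple lognormal-style price move (placeholder dynamics).
        self.s *= 1.0 + self.sigma * random.gauss(0.0, 1.0) / self.steps ** 0.5
        self.t += 1
        done = self.t == self.steps
        # Hedge P&L from the price move; the seller pays the payoff at expiry.
        reward = self.h * (self.s - prev_s) - trade_cost
        if done:
            reward -= max(self.s - self.strike, 0.0)
        return (self.s, self.steps - self.t, self.h), reward, done
```

Because no distributional assumption enters the agent's side of this interface, any model-free method can be trained against it; only the environment's internals would change when switching from synthetic to real data.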

Chapter 1 introduces the background of the option hedging problem, covering both traditional methods in finance and recent work based on machine learning approaches, particularly reinforcement learning. The main research methods and contributions of this thesis are also outlined in this chapter in light of the current state of research on option hedging.

In Chapter 2, we review in detail the papers most relevant to this thesis and trace the evolution of option hedging within the reinforcement learning framework over the past few years. Special attention is paid to convex risk measures, which can be utilized in our hedging criterion.
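As one concrete example of a convex risk measure usable in a hedging criterion, conditional value-at-risk (CVaR, also called expected shortfall) averages the worst tail of the loss distribution. The empirical estimator below is a minimal sketch; the function name and interface are assumptions for illustration.

```python
def cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk at level alpha.

    Returns the mean of the worst (1 - alpha) fraction of losses,
    i.e. the average loss given that the loss exceeds its alpha-quantile.
    """
    ordered = sorted(losses)
    # Number of tail samples; at least one so the estimator is defined.
    k = max(1, int(round((1.0 - alpha) * len(ordered))))
    worst = ordered[-k:]  # largest losses sit at the top of the sorted list
    return sum(worst) / len(worst)
```

For example, over the losses 1 through 100 at alpha = 0.95, the tail is {96, ..., 100} and the estimator returns 98.0. Unlike value-at-risk, CVaR is convex in the loss, which is what makes it attractive as a penalty inside a hedging objective.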

Classical and representative reinforcement learning methods, including value-based and policy-based algorithms, are reviewed in Chapter 3, with emphasis on policy gradient algorithms, which allow flexible parameterization of the policy function and are the primary methods of this thesis.

Chapter 4 presents a discrete formulation of the option hedging problem in the reinforcement learning framework. We design the reward to encourage a policy preferable from the viewpoint of an option seller. The Monte Carlo policy gradient method with a baseline is adopted to learn a stochastic policy over the discrete action space. Several sets of experiments on both synthetic and real data evaluate the performance of our method against the Black-Scholes model. We show that strict market assumptions, such as the absence of transaction costs in the Black-Scholes model, can be dropped in our method.
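The Monte Carlo policy gradient method with a baseline (REINFORCE with baseline) can be sketched as follows for a discrete action space. This is a deliberately simplified version with state-independent action preferences and a running-mean baseline; the function names and the `run_episode(probs)` interface are assumptions for illustration, not the thesis's implementation.

```python
import math, random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_with_baseline(run_episode, n_actions, episodes=2000, lr=0.1, lr_b=0.05):
    """Monte Carlo policy gradient (REINFORCE) with a scalar baseline.

    `run_episode(probs)` must sample actions from `probs` and return a list of
    (action, reward) pairs for one episode.
    """
    prefs = [0.0] * n_actions  # preferences -> softmax policy
    baseline = 0.0             # running average return, used to reduce variance
    for _ in range(episodes):
        probs = softmax(prefs)
        trajectory = run_episode(probs)
        returns, grads = 0.0, [0.0] * n_actions
        # Walk the trajectory backwards to accumulate returns-to-go.
        for action, reward in reversed(trajectory):
            returns += reward
            advantage = returns - baseline
            for a in range(n_actions):
                indicator = 1.0 if a == action else 0.0
                # Gradient of log softmax w.r.t. each preference.
                grads[a] += advantage * (indicator - probs[a])
        for a in range(n_actions):
            prefs[a] += lr * grads[a]
        baseline += lr_b * (returns - baseline)  # `returns` is now the full episode return
    return softmax(prefs)
```

Subtracting the baseline leaves the gradient estimate unbiased while lowering its variance, which is what makes Monte Carlo policy gradients practical on noisy hedging rewards.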

In Chapter 5, we extend the toy formulation of Chapter 4 to a more general and complex setting. The formulation is refined to accommodate a continuous action space and an extended state space, allowing it to target multiple options. We propose new reward functions based on hedging errors that lead to perfect hedging or profit-preferred hedging, and design a terminal reward combined with convex risk measures to control the risk of loss. We also examine the performance of our model under this formulation using a revised proximal policy optimization (PPO) method. In addition to synthetic data, we conduct extensive experiments on real data to demonstrate the practical value of our model.
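The core of standard PPO, on which such a revised method builds, is the clipped surrogate objective: a per-sample loss that removes the incentive to move the new policy's probability ratio more than a small epsilon away from the data-collecting policy. The sketch below shows the standard clipped loss only, not the thesis's specific revision; the function name is an assumption.

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one sample, as a loss to minimize.

    `ratio` is pi_new(a|s) / pi_old(a|s).  Clipping the ratio to
    [1 - eps, 1 + eps] caps how much a single update can exploit the
    advantage estimate, keeping updates close to the behavior policy.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Take the pessimistic (smaller) objective, then negate it for a loss.
    return -min(unclipped, clipped)
```

For instance, with a positive advantage of 1.0 and a ratio of 1.5, the clipped term caps the objective at 1.2, so further increasing the ratio yields no additional gain; with a negative advantage the minimum keeps the unclipped penalty, so harmful moves are never hidden by clipping.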