Stochastic and Dynamic Click-through and Sell-through Optimization: A Machine Learning and Optimal Control Approach

Project: Research



Product click-through and sell-through rates are two key indexes that e-retailers watch most closely. They measure, respectively, the number of customers who click on (show interest in) and purchase a particular product per unit of time. To improve these rates, an e-retailer can advertise the product and offer a price reduction. Both rates are dynamic and stochastic, and the effects of advertising and pricing are random as well. Taking these dynamic and stochastic factors into account, it is therefore challenging for the e-retailer to reach its sales target while balancing the gains and losses of advertising (for the click-through rate) and of price discounts (for the sell-through rate).

We have collaborated with e-retailers over the last few years and learned from these collaborations that many major e-retailers and e-commerce platforms record customer shopping behavior. This observation leads us to model the high and low levels of the click-through and sell-through rates as two independent two-state Markov chains, each with its own prevailing transition probabilities. Specifically, choosing a product to advertise is a multi-armed bandit problem with restless, multiple Markov chains, where a product's click-through rate increases while it is advertised; choosing when to reduce the product's price is an optimal control problem characterized by a two-state Markov chain. In the literature, studies of the multi-armed bandit problem are limited to single, rested Markov chains, and optimal control of Markov chains is limited to deterministic demand. More interestingly, in this modeling approach, the learning results of the MAB cascade into the above-mentioned optimal control model for price reduction.
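As a rough illustration of the second sub-problem, the discount-timing decision on a two-state demand chain can be solved by backward induction over a finite horizon. The sketch below is a minimal dynamic program under stated assumptions; the transition matrix, prices, and sell-through probabilities are illustrative placeholders, not project data:

```python
def optimal_pricing(T, P, sell_prob, prices):
    """Backward induction for the discount-timing decision.

    Demand alternates between a 'low' (0) and 'high' (1) state under the
    two-state transition matrix P. In each period the seller posts either
    the full price (action 0) or a discounted price (action 1);
    sell_prob[state][action] is the per-period sell-through probability,
    and expected per-period revenue is price * sell probability.
    Returns the value function at time 0 and the per-period policy.
    """
    n_states = 2
    V = [0.0] * n_states  # terminal value: no revenue after the horizon
    policy = []
    for t in range(T - 1, -1, -1):
        V_new = [0.0] * n_states
        act = [0] * n_states
        for s in range(n_states):
            best = None
            for a in (0, 1):  # 0 = full price, 1 = discount
                # Continuation value under the demand-state transition
                cont = sum(P[s][s2] * V[s2] for s2 in range(n_states))
                q = prices[a] * sell_prob[s][a] + cont
                if best is None or q > best:
                    best, act[s] = q, a
            V_new[s] = best
        V = V_new
        policy.append(act)
    policy.reverse()  # policy[t][s] = optimal action at period t, state s
    return V, policy
```

With illustrative numbers where discounting pays off only in the low-demand state (e.g. `prices = [10.0, 7.0]`, `sell_prob = [[0.1, 0.3], [0.4, 0.5]]`), the resulting policy discounts in the low state and holds the full price in the high state.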
To summarize, we plan to tackle this dynamic and stochastic problem with tools from machine learning and optimal control: we propose a multi-armed bandit (MAB) model with a Thompson sampling algorithm to select products for advertisement and to learn the random demand while tracking a pre-defined sales target, and we plan to use a stochastic optimal control approach to design an optimal pricing strategy. The results of this project will be documented for publication in top-tier journals, and we believe they will also provide useful guidelines for practical applications.
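A minimal sketch of the Thompson sampling idea for the advertisement sub-problem, assuming Bernoulli clicks with Beta(1, 1) priors; the click-through rates and horizon below are invented for illustration and the "true" rates are used only to simulate feedback:

```python
import random

def thompson_sampling(n_products, true_ctrs, horizon, seed=0):
    """Select one product to advertise per period via Thompson sampling.

    Each product's click-through rate is treated as a Bernoulli arm with
    a Beta(1, 1) prior; true_ctrs are the unknown ground-truth rates,
    used here only to simulate whether an advertisement is clicked.
    """
    rng = random.Random(seed)
    alpha = [1.0] * n_products  # posterior successes + 1 per product
    beta = [1.0] * n_products   # posterior failures + 1 per product
    clicks = 0
    for _ in range(horizon):
        # Sample a plausible CTR for each product from its posterior
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_products)]
        arm = max(range(n_products), key=lambda i: samples[i])
        # Advertise the chosen product and observe a (simulated) click
        click = 1 if rng.random() < true_ctrs[arm] else 0
        alpha[arm] += click
        beta[arm] += 1 - click
        clicks += click
    return clicks, alpha, beta
```

Because each period's choice is drawn from the posterior, the algorithm naturally balances exploring uncertain products against exploiting the product with the highest learned click-through rate, concentrating advertisement on the better arm as evidence accumulates.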


Project number: 9043246
Grant type: GRF
Status: Not started
Effective start/end date: 1/01/22 → …