Joint Advertising and Pricing Optimization for E-commerce Platform: Learning and Optimal Control with Long-run Average Objectives
Joint Pricing and Advertising Decisions on E-commerce Platforms: Learning and Optimal Control Policies under the Long-run Average Criterion (translated from Chinese)
Student thesis: Doctoral Thesis
Detail(s)
Award date | 5 Mar 2021 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(8f2c50ea-fd24-4b4c-906a-934723fdfe64).html |
Abstract
This thesis studies, for the first time, the joint pricing and advertising policy under unknown realized sales. Because advertising contracts are signed periodically, e-retailers can replace the advertised product only at discrete times, whereas they can change prices at any time. At the beginning of each period, the e-retailer chooses one product to advertise; during the period, it sets pricing policies for both the advertised and unadvertised products. If the retailer knows the mean demand rate of each product, the optimal joint pricing and advertising strategy is obtained by minimizing the long-run average cost.
In reality, e-retailers do not know the mean of realized sales in advance, so they face an exploration-exploitation trade-off. We propose a Thompson Sampling-based algorithm; we choose Thompson Sampling because its sampling and posterior-updating steps balance exploration and exploitation adaptively. At the beginning of each period, the e-retailer selects the advertised product via Thompson Sampling, and then chooses the pricing strategy for each product by minimizing the estimated optimal objective function.
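The sample-then-update loop described above can be sketched as follows. This is a minimal illustration, not the thesis model: it assumes Gaussian demand with a known noise variance and a conjugate normal prior per product, and all names and priors are hypothetical.

```python
import random

def thompson_select(posterior_means, posterior_vars):
    """Sample a mean demand rate from each product's posterior and
    advertise the product with the highest sampled value."""
    samples = [random.gauss(m, v ** 0.5)
               for m, v in zip(posterior_means, posterior_vars)]
    return samples.index(max(samples))

def update_posterior(mean, var, obs, noise_var=1.0):
    """Conjugate normal-normal update of one product's posterior
    after observing realized sales `obs` in the current period."""
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    new_mean = new_var * (mean / var + obs / noise_var)
    return new_mean, new_var
```

At the start of each period `thompson_select` picks the product to advertise; at the end of the period the observed sales feed back through `update_posterior`, which is how the algorithm explores products whose demand is still uncertain while exploiting products that look profitable.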
Previous literature on Markovian multi-armed bandits (MAB) assumes that the state of each product evolves as a Markov chain once the advertising selection is fixed. In this thesis, the state of each product evolves as a Markov Decision Process when the advertised product is fixed, with the pricing decisions for each product forming the action set. We show that the regret of our proposed algorithm is bounded above by Õ(√T). Numerically, we compare the mean and standard deviation of the average cost under the optimal policy, the UCB algorithm, the Thompson Sampling algorithm, and a greedy algorithm for choosing the advertisement.
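As a toy illustration of the numerical comparison above, the sketch below runs greedy, UCB1, and Thompson Sampling selection rules on a simple Bernoulli bandit and reports the average reward each attains. This abstracts away the pricing/MDP layer of the thesis entirely; the setup and reward model are illustrative assumptions only.

```python
import math
import random

def run(policy, probs, horizon=2000, seed=0):
    """Run a selection policy on a Bernoulli bandit with success
    probabilities `probs`; return the average reward over the horizon."""
    rng = random.Random(seed)
    counts = [0] * len(probs)        # pulls per arm
    sums = [0.0] * len(probs)        # total reward per arm
    total = 0.0
    for t in range(horizon):
        arm = policy(t, counts, sums, rng)
        r = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total / horizon

def greedy(t, counts, sums, rng):
    if 0 in counts:
        return counts.index(0)       # try each arm once, then exploit
    return max(range(len(counts)), key=lambda a: sums[a] / counts[a])

def ucb1(t, counts, sums, rng):
    if 0 in counts:
        return counts.index(0)
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
               + math.sqrt(2 * math.log(t + 1) / counts[a]))

def thompson(t, counts, sums, rng):
    # Beta(1 + successes, 1 + failures) posterior per arm
    samples = [rng.betavariate(1 + sums[a], 1 + counts[a] - sums[a])
               for a in range(len(counts))]
    return samples.index(max(samples))
```

Calling `run(thompson, [0.2, 0.8])` and its analogues for `greedy` and `ucb1` gives the kind of mean average-reward comparison the thesis carries out (there, with average cost and a full pricing MDP instead of a static Bernoulli arm).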