Researchers propose methods that theoretically guarantee minimal loss in worst-case scenarios for heavy-tailed reward distributions while requiring minimal prior information
Exploration algorithms for stochastic multi-armed bandits (MABs), a class of sequential decision-making problems in uncertain environments, typically assume light-tailed distributions for the reward noise. However, real-world datasets often exhibit heavy-tailed noise. In light of this, researchers from Korea propose algorithms that achieve minimax optimality (minimum loss under the worst-case scenario) with minimal prior information. Outperforming existing algorithms, the new methods have potential applications in autonomous trading and personalized recommendation systems.
Title: Set of cryptocurrencies on dollar bills.
Caption: The problem of sequential decision-making under noisy rewards in data science is usually studied assuming a light-tailed noise distribution. However, many real-world datasets exhibit heavy-tailed noise. To address this, researchers from Korea proposed a minimax optimal robust upper confidence bound (MR-UCB) method and tested its validity on cryptocurrency datasets. Their method could identify the most profitable cryptocurrency with the least time-averaged cumulative regret.
Credit: Marco Verch from flickr (https://www.flickr.com/photos/149561324@N03/32889807948)
License Type: CC BY 2.0
In data science, researchers typically deal with data that contain noisy
observations. An important problem explored in this context is sequential
decision making, commonly formulated as a “stochastic multi-armed bandit”
(stochastic MAB). Here, an intelligent agent sequentially explores and selects
actions based on noisy rewards in an uncertain environment. Its goal is to
minimize the cumulative regret, that is, the cumulative difference between the
maximum achievable expected reward and the expected reward of the actions
actually selected. A smaller regret implies more efficient decision making.
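In standard bandit notation (which may differ from the notation used in the paper), with \(\mu_k\) the mean reward of arm \(k\), \(\mu^\star = \max_k \mu_k\), and \(a_t\) the arm selected at round \(t\), the cumulative regret after \(T\) rounds can be written as

\[
R_T \;=\; T\,\mu^\star \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right].
\]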
Most existing studies on stochastic MABs have performed regret analysis under
the assumption that the reward noise follows a light-tailed distribution.
However, many real-world datasets in fact exhibit heavy-tailed noise
distributions. These include user behavioral data used for developing
personalized recommendation systems, stock price data used to build automated
trading systems, and sensor data for autonomous driving.
In a recent study, Assistant Professor Kyungjae Lee of Chung-Ang University
and Assistant Professor Sungbin Lim of the Ulsan National Institute of Science
and Technology (UNIST), both in Korea, addressed this issue. In their
theoretical analysis, they proved that existing algorithms for stochastic MABs
are sub-optimal for heavy-tailed rewards. More specifically, the methods
employed by these algorithms, namely the robust upper confidence bound (UCB)
and adaptively perturbed exploration (APE) with unbounded perturbation, do not
guarantee minimax optimality (minimization of the maximum possible loss).
“Based on this analysis, minimax optimal robust (MR) UCB and APE methods
have been proposed. MR-UCB utilizes a tighter confidence bound of robust mean
estimators, and MR-APE is its randomized version. It employs bounded
perturbation whose scale follows the modified confidence bound in MR-UCB,” explains Dr. Lee, speaking of their work, which was published in the IEEE Transactions on
Neural Networks and Learning Systems on 14 September 2022.
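To illustrate the general structure of such exploration rules, the following sketch implements a generic robust UCB-style arm selection in Python. It is only a schematic under assumed forms: the truncated-mean estimator, the confidence radius, and the parameters `p` (order of the bounded moment) and `tail` are illustrative placeholders, not the exact quantities defined in MR-UCB or MR-APE.

```python
import numpy as np

rng = np.random.default_rng(0)

def robust_mean(samples, p):
    # Illustrative truncated-mean estimator: clip each observation at a
    # level growing like n**(1/p), so that only the order p of the bounded
    # moment is needed (the exact estimator used by MR-UCB may differ).
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    b = n ** (1.0 / p)                     # hypothetical truncation level
    return np.clip(samples, -b, b).mean()

def ucb_index(samples, p, t):
    # Robust mean plus a confidence radius shrinking like n**(-(p-1)/p);
    # the constants and exact form are placeholders, not the paper's
    # tighter confidence bound.
    n = len(samples)
    radius = (np.log(t + 1.0) / n) ** ((p - 1.0) / p)
    return robust_mean(samples, p) + radius

# Toy run with K arms and heavy-tailed (Pareto) reward noise.
K, T = 5, 2000
p = 1.5                                    # assumed order of the bounded moment
tail = 2.0                                 # Pareto tail index > p, so E|X|^p is finite
means = rng.uniform(0.0, 1.0, K)
history = [[] for _ in range(K)]

for t in range(T):
    if t < K:                              # pull each arm once to initialize
        a = t
    else:
        a = int(np.argmax([ucb_index(history[k], p, t) for k in range(K)]))
    noise = rng.pareto(tail) - 1.0 / (tail - 1.0)   # centred heavy-tailed noise
    history[a].append(means[a] + noise)
```

Following the description quoted above, MR-APE would replace the deterministic confidence radius with a bounded random perturbation of comparable scale; those randomized details are omitted from this sketch.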
The researchers next derived gap-dependent and gap-independent upper bounds on
the cumulative regret. For both proposed methods, the gap-independent bound
matches the lower bound under the heavy-tailed noise assumption, thereby
achieving minimax optimality. Further, the new methods require minimal prior
information, depending only on the maximum order of the bounded moment of the
rewards. In contrast, existing algorithms additionally require an upper bound
on this moment a priori, information that may not be accessible in many
real-world problems.
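For context, in the heavy-tailed bandit literature, when the reward noise has a bounded moment of order \(1+\varepsilon\) with \(\varepsilon \in (0, 1]\), the worst-case (gap-independent) regret over \(T\) rounds with \(K\) arms is lower-bounded, up to constant factors, by

\[
R_T \;=\; \Omega\!\left(K^{\frac{\varepsilon}{1+\varepsilon}}\, T^{\frac{1}{1+\varepsilon}}\right),
\]

and a method is called minimax optimal when its gap-independent upper bound matches this rate. The exact constants and any logarithmic factors in the paper's theorems may differ from this schematic form.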
Having established their theoretical framework, the researchers tested
their methods by performing simulations under Pareto and Fréchet noises. They
found that MR-UCB consistently outperformed other exploration methods and was
more robust with an increase in the number of actions under heavy-tailed noise.
Further, the duo verified their approach on real-world data using a
cryptocurrency dataset, showing that MR-UCB and MR-APE, with their minimax
optimal regret bounds and minimal prior knowledge requirements, are effective
in tackling both heavy-tailed synthetic and real-world stochastic MAB problems.
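As a rough illustration of how such an evaluation can be set up, the snippet below computes a time-averaged cumulative regret from a log of selected actions, treating each asset's per-round return as an arm's reward. The function name and the choice of the empirically best arm as the benchmark are simplifying assumptions, not the paper's exact protocol.

```python
import numpy as np

def time_averaged_regret(rewards, chosen):
    # rewards: (T, K) array of per-round rewards (e.g., per-asset returns);
    # chosen:  length-T sequence of selected arm indices.
    # Regret is measured against the arm with the best empirical mean,
    # a simplified stand-in for the paper's evaluation protocol.
    T = rewards.shape[0]
    best = rewards.mean(axis=0).argmax()
    gap = rewards[:, best] - rewards[np.arange(T), chosen]
    return np.cumsum(gap) / np.arange(1, T + 1)
```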
“Being vulnerable to heavy-tailed noise, the existing MAB algorithms show
poor performance in modeling stock data. They fail to predict big hikes or
sudden drops in stock prices, causing huge losses. In contrast, MR-APE can be
used in autonomous trading systems with stable expected returns through stock
investment,” comments Dr. Lee, discussing the potential applications of the present work. “Additionally,
it can be applied to personalized recommendation systems since behavioral data
shows heavy-tailed noise. With better predictions of individual behavior, it is
possible to provide better recommendations than conventional methods, which can
maximize the advertising revenue,”
he concludes.
Reference
Authors: Kyungjae Lee¹, Sungbin Lim²
Title of original paper: Minimax Optimal Bandits for Heavy Tail Rewards
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOI:
Affiliations:
¹ Department of Artificial Intelligence, Chung-Ang University, Seoul, South Korea
² Artificial Intelligence Graduate School and the Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
Your Press Release Source
Chung-Ang University
About Chung-Ang University
Chung-Ang University is a private comprehensive research university located in
Seoul, South Korea. It started as a kindergarten in 1916 and attained
university status in 1953. It is fully accredited by the Ministry of Education
of Korea. Chung-Ang University conducts research activities under the slogan of
“Justice and Truth.” Its new vision as it completes its first 100 years is “The
Global Creative Leader.” Chung-Ang University offers undergraduate,
postgraduate, and doctoral programs, which encompass a law school, a management
program, and a medical school; it has 16 undergraduate schools and 16 graduate
schools. Chung-Ang University’s culture and arts programs are considered the
best in Korea.
Website: https://neweng.cau.ac.kr/index.do
About Assistant Professor Kyungjae Lee
Professor Kyungjae Lee received his B.S. and Ph.D. degrees in Electrical Engineering and Computer Engineering from Seoul National University, Korea, in 2015 and 2020, respectively. He is currently an Assistant Professor with the Department of Artificial Intelligence at Chung-Ang University, Seoul, Korea. His current research interests include multi-armed bandits, combinatorial bandits, reinforcement learning, and their applications.