A Multi-Armed Bandit is a mathematical model of a sequential decision-making problem in which a player repeatedly chooses one of several “arms” (options) with unknown reward probabilities and observes the resulting reward, with the goal of maximizing the total reward collected over time.
The player must balance exploration (choosing arms whose reward probabilities are still uncertain, to learn more about them) with exploitation (choosing the arm with the highest estimated reward probability based on the rewards observed so far).
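
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common strategy, epsilon-greedy, on a simulated bandit. Everything in it is an assumption made for illustration: the arms are taken to pay a reward of 1 or 0 (Bernoulli rewards), and the function name `epsilon_greedy`, the parameters `true_probs`, `steps`, `epsilon`, and `seed`, and their default values are hypothetical. It is one possible strategy among many, not the definitive way to solve the problem.

```python
import random


def epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on a Bernoulli bandit.

    true_probs: hidden success probability of each arm (known only to the
                simulator, never read directly by the player's decision rule).
    epsilon:    fraction of rounds spent exploring a uniformly random arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms       # how many times each arm was pulled
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick any arm at random to learn more about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the highest estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment pays 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the arm's mean observed reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated probabilities:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With `epsilon=0` the player never explores and can lock onto whichever arm happened to pay out first. With `epsilon=1` it never uses what it has learned. Values in between trade some short-term reward for better estimates.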
The Multi-Armed Bandit is used in various fields, including machine learning, online advertising, and recommendation systems.