What is a Multi-Armed Bandit?
A Multi-Armed Bandit is a mathematical model of a decision-making problem in which a player repeatedly chooses one of several “arms” (options) with unknown reward probabilities and observes the reward from each choice. The player must balance exploration (choosing arms with uncertain reward probabilities to learn more about them) with exploitation (choosing the arm […]
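To make the exploration and exploitation trade-off concrete, here is a minimal sketch of one common strategy, epsilon-greedy, on a simulated bandit. The strategy itself, the function name `run_epsilon_greedy`, the reward probabilities, and the values of `epsilon`, `steps`, and `seed` are all illustrative assumptions rather than anything specified in the text above. Each arm is assumed to pay a reward of 1 with a fixed hidden probability and 0 otherwise.

```python
import random


def run_epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on a Bernoulli bandit.

    true_probs holds the hidden reward probability of each arm
    (hypothetical values; the player never reads them directly).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random to learn about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment pays 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the running mean for the chosen arm incrementally.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


counts, estimates, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated reward probabilities:", [round(e, 3) for e in estimates])
print("total reward:", total_reward)
```

With a small `epsilon` the player spends most pulls on whichever arm currently looks best, while the occasional random pull keeps refining the estimates of the other arms. Setting `epsilon` to 0 removes exploration entirely, and setting it to 1 removes exploitation.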