Multi-Armed Bandit Algorithms for a Mobile Service Robot's Spare Time in a Structured Environment

13 pages • Published: September 17, 2018

Abstract

We assume that service robots will have spare time between scheduled user requests, which they could use to perform additional, unrequested services in order to learn a model of users' preferences and receive rewards. However, a mobile service robot is constrained by the need to travel through the environment to reach users and perform services for them, as well as by the need to carry out scheduled user requests. We assume service robots operate in structured environments composed of hallways and floors, resulting in scenarios where an office can be conveniently added to the robot's plan at low cost, which affects the robot's ability to plan and learn.

We present two algorithms, Planning Thompson Sampling and Planning UCB1, which are based on existing algorithms for multi-armed bandit problems but are modified to plan ahead, taking into account the time and location constraints of the problem. We compare them to existing versions of Thompson Sampling and UCB1 in two environments representative of the structures a robot encounters in an office building. We find that our planning algorithms outperform the original naive versions in simulation, both in reward received and in the effectiveness of the learned model. The difference in performance is partly due to the fact that the original algorithms frequently miss opportunities to perform services at low cost for convenient offices along their path, while our planning algorithms do not.

Keyphrases: multi-armed bandit, robot learning, robot planning, service robot

In: Daniel Lee, Alexander Steen and Toby Walsh (editors). GCAI-2018. 4th Global Conference on Artificial Intelligence, vol 55, pages 121-133.
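The abstract only gestures at how the planning variants differ from their naive counterparts. As a rough illustration of the general idea, the minimal sketch below implements standard Bernoulli Thompson Sampling and adds a travel-cost penalty when choosing an arm. The Beta-posterior sampling is the textbook algorithm; the cost penalty, and all names such as ThompsonSamplingPlanner, select_arm, and travel_cost, are illustrative assumptions, not the paper's actual Planning Thompson Sampling.

```python
import numpy as np

class ThompsonSamplingPlanner:
    """Bernoulli Thompson Sampling with a travel-cost penalty.

    The Beta-Bernoulli posterior sampling is standard Thompson Sampling;
    subtracting a normalized travel cost from the sampled values is an
    illustrative stand-in for the paper's planning modification.
    """

    def __init__(self, n_arms: int):
        # Beta(1, 1) priors over each office's probability of a useful service.
        self.alpha = np.ones(n_arms)
        self.beta = np.ones(n_arms)

    def select_arm(self, travel_cost: np.ndarray) -> int:
        # Sample one plausible success probability per arm from its posterior,
        # then penalize arms (offices) that are expensive to reach right now.
        theta = np.random.beta(self.alpha, self.beta)
        return int(np.argmax(theta - travel_cost))

    def update(self, arm: int, reward: int) -> None:
        # Conjugate Beta update for a 0/1 reward.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = rng.uniform(size=8)             # hidden per-office preferences
    planner = ThompsonSamplingPlanner(n_arms=8)
    for _ in range(500):
        costs = rng.uniform(0, 0.3, size=8)  # stand-in for path-dependent cost
        arm = planner.select_arm(costs)
        planner.update(arm, int(rng.random() < true_p[arm]))
    print(planner.alpha / (planner.alpha + planner.beta))  # posterior means
```

In this toy version the cost vector is redrawn each round; in the setting the abstract describes, it would presumably reflect how far each office lies from the robot's currently scheduled path, so that offices along the way get a near-zero penalty and are picked up opportunistically.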