Grounding Subgoals of Temporal Logic Tasks in Online Reinforcement Learning

EasyChair Preprint no. 12418
12 pages • Date: March 7, 2024

Abstract

Recently, there has been a surge of research investigating reinforcement learning (RL) algorithms for solving temporal logic (TL) tasks. These algorithms, however, are built on the assumption of a labeling function that maps raw observations to symbols for the subgoals of the TL task. In many practical applications, such a labeling function is not readily available. In this work, we propose an online RL algorithm, referred to as GSTLO, that takes non-symbolic raw observations from collected trajectories and learns to ground the subgoal symbols of TL tasks; in other words, it learns to label the important states associated with the subgoals of the TL task. Specifically, to associate an important state with one of the subgoals in the TL formula, the RL agent actively explores the environment by collecting trajectories and gradually reconstructs a finite state machine (FSM) of the TL task composed of the discovered important states. Then, by comparing the reconstructed FSM with the ground-truth FSM extracted from the task formula, the mapping from important states to subgoal symbols, i.e., the labeling function, is obtained. To discover these important states, GSTLO formulates a contrastive learning objective based on the first-occupancy representations (FR) of collected trajectories. To facilitate exploration, the first-occupancy feature (FF) of important states is also learned, driving the agent to visit any selected subgoal and complete unseen tasks without further training. The proposed GSTLO algorithm is evaluated in three environments, showing significant improvement over baseline methods.

Keyphrases: generalization, reinforcement learning, symbol grounding, temporal logic
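The contrastive objective above is built on first-occupancy representations. A minimal sketch of the empirical FR of a single trajectory, assuming the standard definition (a discounted indicator of the first visit to each state; the paper's exact formulation may differ):

```python
def first_occupancy(trajectory, states, gamma=0.9):
    """Empirical first-occupancy representation of one trajectory:
    maps each state to gamma ** (first-visit time step),
    or 0.0 if the state never occurs in the trajectory."""
    fr = {s: 0.0 for s in states}
    seen = set()
    for t, s in enumerate(trajectory):
        if s not in seen:
            seen.add(s)
            if s in fr:
                fr[s] = gamma ** t
    return fr

# Hypothetical toy trajectory over abstract state labels:
traj = ["a", "b", "b", "c", "a"]
fr = first_occupancy(traj, states=["a", "b", "c", "d"], gamma=0.5)
# "a" first visited at t=0, "b" at t=1, "c" at t=3, "d" never
```

Unlike the successor representation, the FR credits only the first visit to each state, which makes it sensitive to the order in which subgoal states are reached along a trajectory.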

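The FSM-comparison step can be illustrated on a linear sequential task. This is a hedged sketch only: the edge-by-edge alignment used here is an illustrative assumption for a chain-shaped FSM, not the paper's matching algorithm, and all state and symbol names are hypothetical.

```python
# Ground-truth FSM for the sequential task "reach A, then B",
# written as (source, edge label, target) triples.
ground_truth = [("q0", "A", "q1"), ("q1", "B", "q_acc")]

# Reconstructed FSM: same shape, but edges are labeled by discovered
# important states (identifiers for raw observations) instead of symbols.
reconstructed = [("u0", "s7", "u1"), ("u1", "s3", "u_acc")]

def ground_labels(gt_edges, rec_edges):
    """Align the two FSMs edge by edge (valid for a linear chain) and
    map each important state to the subgoal symbol on the matching edge,
    yielding the labeling function."""
    assert len(gt_edges) == len(rec_edges)
    return {rec[1]: gt[1] for gt, rec in zip(gt_edges, rec_edges)}

labeling = ground_labels(ground_truth, reconstructed)
```

For branching task formulas, the alignment would instead require a structure-matching procedure (e.g., graph isomorphism over the two FSMs) rather than this positional pairing.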