Abstract: Vision-and-Language Navigation (VLN) is a natural and important navigation task in human-robot interaction environments, requiring a robot to navigate according to natural language ...