Spatial-temporal-aware safe multi-agent reinforcement learning of connected autonomous vehicles in challenging scenarios

IEEE International Conference on Robotics and Automation (ICRA 2023)

Department of Computer Science and Engineering, University of Connecticut

Video

Intersection ((a), (b), (c)) and Highway ((d), (e), (f)) scenarios: one hazard vehicle runs the red light in the Intersection scenario, and one takes a sudden hard brake in the Highway scenario. (a), (d): scenario initialization; (b), (e): successful collaborative collision-avoidance cases from test runs of our method; (c), (f): collision cases from test runs of the baseline model. Connected autonomous vehicles (CAVs) are in green; unconnected vehicles (UCVs) are in red; unconnected hazard vehicles (HAZVs) are in red with yellow triangle marks. Without the safety shield or coordination, CAVs are likely to collide with the HAZV or other vehicles, as in (c) and (f).

Abstract

Communication technologies enable coordination among connected and autonomous vehicles (CAVs). However, it remains unclear how to utilize shared information to improve the safety and efficiency of the CAV system in dynamic and complicated driving scenarios. In this work, we propose a framework of constrained multi-agent reinforcement learning (MARL) with a parallel Safety Shield for CAVs in challenging driving scenarios that include unconnected hazard vehicles. The coordination mechanisms of the proposed MARL include information sharing and cooperative policy learning, with a Graph Convolutional Network (GCN)-Transformer as a spatial-temporal encoder that enhances the agents' environment awareness. The Safety Shield module, based on Control Barrier Function (CBF) safety checking, protects the agents from taking unsafe actions. We design a constrained multi-agent advantage actor-critic (CMAA2C) algorithm to train safe and cooperative policies for CAVs. With experiments deployed in the CARLA simulator, we verify the performance of the safety checking, the spatial-temporal encoder, and the coordination mechanisms through comparative experiments in several challenging scenarios with unconnected hazard vehicles. Results show that our proposed methodology significantly increases system safety and efficiency in these scenarios.
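As a rough illustration of the CBF-based safety-checking idea, the Python sketch below filters a policy's action with a discrete-time control-barrier-function condition. This is a minimal sketch under our own assumptions, not the paper's exact formulation: the one-step dynamics model `dynamics`, barrier function `h`, decay rate `alpha`, and the hard-braking fallback are all hypothetical placeholders.

    # Hypothetical sketch of a CBF-based safety check for a discrete action set.
    # h(x) >= 0 defines the safe set; an action a is accepted if the predicted
    # next state satisfies h(x') >= (1 - alpha) * h(x), with 0 < alpha <= 1.
    def safety_shield(state, proposed_action, candidate_actions,
                      dynamics, h, alpha=0.5):
        """Return proposed_action if it passes the CBF check; otherwise
        fall back to the certified-safe candidate with the largest barrier
        value, or to maximal braking if no candidate is certified safe."""
        def cbf_ok(a):
            return h(dynamics(state, a)) >= (1.0 - alpha) * h(state)

        if cbf_ok(proposed_action):
            return proposed_action
        safe = [a for a in candidate_actions if cbf_ok(a)]
        if safe:
            return max(safe, key=lambda a: h(dynamics(state, a)))
        return candidate_actions[0]  # assumed to be the hardest brake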

Contribution

  • We propose a Constrained Multi-Agent Advantage Actor-Critic (CMAA2C) method with a Safety Shield to improve the safety and efficiency of the CAV system in challenging scenarios. Its coordination mechanisms include information sharing and cooperative policy learning.
  • We design a GCN-Transformer encoder for the neural network structure of CMAA2C to utilize the shared spatial and temporal information among CAVs and improve the situation awareness of CAVs.
  • We validate with experiments that the proposed CMAA2C MARL framework significantly improves the collision-free rate and overall returns of the CAV system. Our results show that cooperation among CAVs, the Safety Shield, and the GCN-Transformer encoder design all contribute to the improvement.

Method



Model pipeline for a single agent. The state information, as a time series \( \{ s^{t-\tau} \}_{\tau} \), is first processed into graphs and then passes sequentially through the GCN-Transformer module and the Actor's policy network; meanwhile, \( s^t \) is input to the CBF safety-checking module for computing safe actions. During training, the outputs of the GCN-Transformer are input to the Critic and Cost networks for advantage, constraint, and TD-error calculation.
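A minimal PyTorch sketch of such a GCN-Transformer encoder follows, under stated assumptions: a single hand-rolled graph-convolution layer, mean pooling over vehicles, and using the last time-step embedding as the output. The layer sizes, pooling, and readout are illustrative choices, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        """One graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, h, a_hat):
            # h: (batch, N, in_dim); a_hat: (batch, N, N) normalized adjacency
            return torch.relu(a_hat @ self.linear(h))

    class GCNTransformerEncoder(nn.Module):
        """Spatial encoding per time step with a GCN, then temporal
        encoding over the last T steps with a Transformer encoder."""
        def __init__(self, feat_dim, hidden_dim=64, num_heads=4, num_layers=2):
            super().__init__()
            self.gcn = GCNLayer(feat_dim, hidden_dim)
            layer = nn.TransformerEncoderLayer(
                d_model=hidden_dim, nhead=num_heads, batch_first=True)
            self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, node_feats, adj):
            # node_feats: (batch, T, N, feat_dim); adj: (batch, T, N, N)
            b, t, n, d = node_feats.shape
            h = self.gcn(node_feats.reshape(b * t, n, d),
                         adj.reshape(b * t, n, n))   # spatial pass per step
            h = h.mean(dim=1).reshape(b, t, -1)      # pool over vehicles
            z = self.temporal(h)                     # temporal pass over T steps
            return z[:, -1]                          # embedding of latest step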


Quantitative Results

We trained our model (GCN-Transformer Constrained Advantage Actor-Critic, 'GT-CA2C' in Tables 1 and 2), a baseline that is our model without the Safety Shield ('w/o SS' in the tables), and another baseline, 'FC-CA2C', which replaces the GCN-Transformer with fully-connected layers while keeping the constrained advantage actor-critic and the Safety Shield, each on the Intersection and Highway scenarios. Our method and the baselines all follow the multi-agent framework in Alg. 1. Training and testing results are presented in Tables 1 and 2; our method achieves the best performance among all solutions. For each table entry, the left percentage is the collision-free rate in simulation, and the right number is the mean episode return, defined as the average over agents of each agent's sum of stepwise rewards, averaged over the \( m \) episodes: \( \frac{1}{m} \sum_{e=1}^{m} \mathrm{Avg}_i \left( \sum_{t} r_i^t \right) \).
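For concreteness, the mean episode return above can be computed as in the following sketch; the nested-list layout of the rewards is an assumption for illustration.

    import numpy as np

    def mean_episode_return(rewards):
        """rewards[e][i] holds the stepwise rewards r_i^t of agent i in
        episode e; returns (1/m) * sum_e Avg_i (sum_t r_i^t)."""
        per_episode = [np.mean([np.sum(r_i) for r_i in episode])
                       for episode in rewards]
        return float(np.mean(per_episode))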

BibTeX

@inproceedings{zhang2023spatial,
  title={Spatial-temporal-aware safe multi-agent reinforcement learning of connected autonomous vehicles in challenging scenarios},
  author={Zhang, Zhili and Han, Songyang and Wang, Jiangwei and Miao, Fei},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={5574--5580},
  year={2023},
  organization={IEEE}
}