How can deep reinforcement learning solve the "safety" problem in autonomous driving systems?


Original source: ArXiv

Authors: Aidin Ferdowsi, Ursula Challita, Walid Saad, Narayan B. Mandayam

"Lake World" compilation: Yes, it's Astro, Kabuda.

For autonomous vehicles (AVs) to operate in a truly autonomous way in future intelligent transportation systems, they must be able to process the data collected through a large number of sensors and communication links. This is essential for reducing the likelihood of vehicle collisions and improving traffic flow on the road. However, this dependence on communication and data processing leaves AVs vulnerable to cyber-physical attacks.


Recently, Aidin Ferdowsi and Professor Walid Saad of the Department of Electrical and Computer Engineering at Virginia Tech, together with Ursula Challita of Ericsson Research in Sweden and Professor Narayan B. Mandayam of Rutgers University, proposed a new adversarial deep reinforcement learning (RL) framework to address the "safety" problem in autonomous vehicle systems.


To operate effectively in future smart cities, autonomous vehicles (AVs) must rely on in-vehicle sensors such as cameras and radar, as well as on vehicle-to-vehicle communication. This reliance on sensors and communication links exposes AVs to cyber-physical (CP) attacks by adversaries who try to take control of the AVs by manipulating their data. Therefore, to ensure safe and optimal control of AV dynamics, the data processing functions in the AV must be robust against such CP attacks.


Therefore, this paper analyzes the problem of controlling the AV's dynamics in the presence of CP attacks and proposes a new adversarial deep reinforcement learning (RL) algorithm that maximizes the robustness of AV dynamics control against such attacks. The attacker's behavior and the AV's response to CP attacks are studied within a game-theoretic framework.


In the formulated game, the attacker attempts to inject erroneous data into the AV's sensor readings in order to manipulate the optimal safe spacing between vehicles, potentially increasing the risk of an accident or reducing the traffic flow on the road. The AV, acting as the defender, tries to minimize the spacing deviation so as to remain robust against the attacker's actions. Because the AV has no information about the attacker's behavior, and because the data values can be manipulated in infinitely many ways, the outcomes of the players' past interactions are fed into a long short-term memory (LSTM) block.


Each player's LSTM block learns the spacing deviation expected to result from its own actions and feeds it to that player's RL algorithm. The attacker's RL algorithm then chooses the action that maximizes the spacing deviation, while the AV's RL algorithm tries to find the action that minimizes it. Simulation results show that the proposed adversarial deep RL algorithm improves the robustness of AV dynamics control, since it minimizes the deviation from the safe spacing between AVs.
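To make the zero-sum structure of this interaction concrete, the payoff can be sketched in Python as follows (function and variable names are illustrative, not the paper's notation): whatever spacing deviation the attacker induces is counted as its gain and as the AV's loss.

```python
def spacing_rewards(actual_spacing: float, optimal_spacing: float) -> tuple[float, float]:
    """Zero-sum payoff sketch: the attacker gains what the AV loses.

    actual_spacing  -- spacing that results from the AV's control decision
                       under (possibly falsified) sensor readings
    optimal_spacing -- safe spacing the AV is trying to maintain
    """
    deviation = abs(actual_spacing - optimal_spacing)
    return deviation, -deviation  # (attacker reward, AV reward)

# Example: the AV ends up 4 m closer to the leader than it should be.
print(spacing_rewards(actual_spacing=26.0, optimal_spacing=30.0))  # (4.0, -4.0)
```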


The intelligent transportation system (ITS) of the future will include autonomous vehicles (AVs), roadside smart sensors (RSS), vehicle-to-vehicle communication, and even drones. To operate in a truly autonomous way in such an ITS, an AV must be able to process the large amounts of ITS data collected through a large number of sensors and communication links. The reliability of these data is critical for reducing the likelihood of vehicle collisions and improving traffic flow on the road. However, this dependence on communication and data processing leaves AVs vulnerable to cyber-physical attacks.


In particular, an attacker could intervene in the AV's data processing stage, injecting erroneous data to reduce the reliability of the measurements and eventually cause an accident or disrupt the traffic flow in the ITS. Such traffic disruptions can also propagate to other interdependent critical infrastructure, such as the power grid or the cellular communication systems that serve the ITS.


Figure 1: Architecture of the adversarial deep reinforcement learning framework presented in this paper


Recently, researchers have put forward a number of security solutions for in-vehicle safety problems. In their work on the security of the in-vehicle network in the connected car, P. Kleberger, T. Olovsson, and E. Jonsson identified the key vulnerabilities of vehicle controllers and proposed a number of intrusion detection algorithms to protect them. In addition, work on practical wireless attacks against connected cars and in-vehicle security protocols has pointed out that a remote wireless attack exploiting the current security protocols of AVs could disrupt their controller area network.


That work analyzed the vulnerability of the AVs' internal networks to external wireless attacks. Other authors have addressed the security challenges of plug-in electric vehicles, taking into account their impact on power systems. In addition, a survey of the security threats and protection mechanisms of embedded automotive networks has also been presented.


Researchers have also recently studied the security challenges of vehicular communication and their solutions, analyzing the security vulnerabilities of current vehicular communication architectures. Moreover, it has been shown that the computational overhead of beacon encryption can be mitigated by using short-term authentication schemes and cooperative vehicular computing architectures.


Figure 2: AV and attacker actions, regret, and spacing deviation under the proposed algorithm when the attacker attacks only the beacon information


However, some of these earlier architectures and solutions do not take into account the interdependence between the cyber layer and the physical layer of the AV when designing security solutions. In addition, existing work does not properly model the attacker's behavior and objectives. Modeling the attacker's behavior and objectives together with this cyber-physical interdependence would help in designing better security solutions.


Moreover, existing techniques do not provide solutions that enhance the robustness of AV dynamics control against attacks, yet designing an optimal and secure ITS requires robustness against attacks on both in-vehicle sensors and inter-vehicle communication. In addition, existing research on ITS security often assumes that the attacker's behavior is static, whereas in many real cases the attacker may adaptively change its strategy to amplify the impact of the attack on the ITS.


Therefore, the main contribution of this paper is a new adversarial deep reinforcement learning (RL) framework designed to provide robust AV control. In particular, the authors consider a car-following model, focusing on the control of an AV that immediately follows another AV. Such a model is appropriate because it captures the dynamics of AV control while accounting for the AV's sensor readings and received beacons.
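As a rough illustration of the kind of car-following dynamics such a model involves, here is a minimal sketch based on a generic constant-time-headway rule; the parameter values and the simple proportional controller are assumptions for illustration, not the dynamics used in the paper.

```python
def optimal_safe_spacing(follower_speed: float,
                         reaction_time: float = 1.5,
                         standstill_gap: float = 2.0) -> float:
    """Constant-time-headway spacing, a common car-following heuristic:
    keep `standstill_gap` metres plus `reaction_time` seconds of travel
    at the current speed. Values here are illustrative only."""
    return standstill_gap + reaction_time * follower_speed

def follower_update(spacing: float, follower_speed: float,
                    leader_speed: float, dt: float = 0.1,
                    gain: float = 0.5) -> tuple[float, float]:
    """One control step: accelerate toward the spacing target.

    A proportional controller on the spacing error; the AV's estimate of
    `leader_speed` is exactly what an attacker would try to corrupt."""
    target = optimal_safe_spacing(follower_speed)
    accel = gain * (spacing - target)            # close or open the gap
    new_speed = max(0.0, follower_speed + accel * dt)
    new_spacing = spacing + (leader_speed - new_speed) * dt
    return new_spacing, new_speed
```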


We consider an AV that collects information about the leading AV from four sources: in-vehicle sensors such as cameras and radar, roadside smart sensors (RSS), and beacons transmitted by the leading AV. The attacker can inject faulty data into these information sources in an attempt to increase the risk of accidents or reduce the traffic flow. In contrast, the AV's goal is to remain robust against the attacker's data injection attacks while maximizing its speed. To analyze the interaction between the AV and the attacker, we formulate a game-theoretic problem and analyze its Nash equilibrium (NE). However, because the attacker's and the AV's action sets are continuous, as are the AV's speed and spacing, deriving the AV's and the attacker's actions at the NE is challenging.
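The following sketch (all names and numbers are hypothetical) shows how injecting false data into one of the four sources can bias a naive, equal-weight fused estimate of the leading AV's speed, which is exactly the effect the AV's learned fusion is meant to resist.

```python
import numpy as np

rng = np.random.default_rng(0)
true_leader_speed = 20.0  # m/s

# Clean readings from the four sources: camera, radar, roadside sensor, beacon.
clean = true_leader_speed + rng.normal(0.0, 0.5, size=4)

# Attacker injects false data into a subset of the sources (here: the beacon).
injection = np.array([0.0, 0.0, 0.0, 8.0])
observed = clean + injection

# A naive equal-weight fusion is pulled away from the true speed ...
naive_estimate = observed.mean()

# ... whereas the AV would like fusion weights that down-weight the corrupted
# source; the adversarial RL framework tries to learn such weights online.
learned_weights = np.array([0.3, 0.3, 0.3, 0.1])
robust_estimate = learned_weights @ observed

print(f"true: {true_leader_speed:.1f}, naive: {naive_estimate:.1f}, "
      f"reweighted: {robust_estimate:.1f}")
```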


To solve this problem, we propose two deep neural networks (DNNs) based on long short-term memory (LSTM) blocks, which extract summaries of the past AV dynamics for the AV and the attacker and feed these summaries to each player's RL algorithm. On the one hand, the AV's RL algorithm tries to learn the best estimate of the leading AV's speed by fusing the sensor readings. On the other hand, the attacker's RL algorithm tries to deceive the AV into deviating from the optimal safe spacing between vehicles. Simulation results show that the proposed deep RL algorithms converge to a mixed-strategy Nash equilibrium and significantly improve the robustness of the AV against data injection attacks.
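A minimal PyTorch sketch of an LSTM block feeding a policy head of this kind is shown below; the layer sizes and the discrete action set are assumptions for illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Summarizes the history of observed spacing deviations with an LSTM
    and maps the summary to a distribution over a discrete action set
    (e.g., candidate fusion weights for the AV, or injection values for
    the attacker)."""

    def __init__(self, obs_dim: int = 1, hidden_dim: int = 32, n_actions: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, obs_dim) sequence of past deviations.
        summary, _ = self.lstm(history)
        logits = self.head(summary[:, -1])      # use the last hidden state
        return torch.softmax(logits, dim=-1)    # mixed strategy over actions

# Each player keeps its own network; e.g. a 10-step history of deviations:
av_policy = LSTMPolicy()
probs = av_policy(torch.randn(1, 10, 1))
action = torch.multinomial(probs, num_samples=1)
```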


The results also show that, using the proposed deep RL algorithm, the AV can effectively learn sensor fusion rules that minimize the speed estimation error and thereby reduce the deviation from the optimal safe spacing.


Figure 3: AV and attacker actions, regret, and spacing deviation when the attacker attacks all sensors


In summary, this paper proposes a new deep RL method that achieves robust control of AV dynamics when the sensor readings are subject to data injection attacks. To analyze the attacker's motivation for attacking AV data and to understand the AV's response to such attacks, we formulate a game between the attacker and the AV. We show that deriving the mixed strategies at the Nash equilibrium analytically is challenging.


Therefore, we use the proposed deep RL algorithm to learn the optimal sensor fusion for the AV at each time step. In this algorithm, LSTM blocks extract the temporal dependencies between the AV's and the attacker's actions and the resulting deviation values, and feed them to the reinforcement learning algorithm. Simulation results show that the proposed deep RL algorithm can mitigate the impact of data injection attacks on the sensor data and thus remain robust against these attacks.
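At the interface level, this adversarial training procedure can be sketched roughly as follows; `env`, `av_agent`, and `attacker_agent` are hypothetical objects standing in for the car-following simulation and the two LSTM-plus-RL players, and the loop shows only the zero-sum reward bookkeeping, not the paper's exact update rules.

```python
def adversarial_training(env, av_agent, attacker_agent,
                         episodes: int = 1000, horizon: int = 200):
    """Hypothetical outer loop: both players act on their LSTM summaries of
    past deviations, observe the resulting spacing deviation, and receive
    zero-sum rewards (attacker +|deviation|, AV -|deviation|)."""
    for _ in range(episodes):
        history = env.reset()                      # initial deviation history
        for _ in range(horizon):
            av_action = av_agent.act(history)      # e.g. fusion weights
            attack = attacker_agent.act(history)   # e.g. injected values
            deviation, history = env.step(av_action, attack)
            av_agent.update(reward=-abs(deviation))
            attacker_agent.update(reward=+abs(deviation))
    return av_agent, attacker_agent
```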


