In this paper, we consider multi-pursuer single-superior-evader pursuit-evasion differential games in which the evader's speed is similar to or higher than that of each pursuer. We propose a new fuzzy reinforcement learning algorithm. The algorithm uses the well-known Apollonius circle to define the capture region of each learning pursuer from its own location and the location of the superior evader. It combines the Apollonius circle with a developed formation control approach in the tuning mechanism of each learning pursuer's fuzzy logic controller (FLC), so that one or more of the learning pursuers can capture the superior evader. The formation control mechanism guarantees that the pursuers are distributed around the superior evader, which avoids collisions between pursuers. It also makes the Apollonius circles of every two adjacent pursuers intersect or at least be tangent to each other, so that capture of the superior evader can occur. The algorithm is decentralized: no communication among the pursuers is required, and the only information it needs is the position and speed of the superior evader. The algorithm is applied to several multi-pursuer single-superior-evader pursuit-evasion differential games, and the simulation results show its effectiveness.
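The Apollonius circle mentioned above can be illustrated with a short geometric sketch. This is not the authors' implementation; it only uses the standard definition of the circle as the set of points X with |X - P| = k |X - E|, where P and E are the pursuer and evader positions and k is the pursuer-to-evader speed ratio. The function names `apollonius_circle` and `circles_touch` and the parameter `speed_ratio` are hypothetical, chosen for this example. For a superior evader (k < 1) the circle encloses the pursuer, and the pursuer reaches every point inside it before the evader does. The case k = 1 degenerates to a straight line and is rejected here.

```python
import math


def apollonius_circle(pursuer, evader, speed_ratio):
    """Return (center, radius) of the locus |X - P| = k * |X - E|.

    pursuer, evader: (x, y) tuples. speed_ratio: k = v_pursuer / v_evader
    (hypothetical parameter name; k < 1 means the evader is faster).
    """
    k = speed_ratio
    if k <= 0 or math.isclose(k, 1.0):
        # k == 1 gives the perpendicular bisector of PE, not a circle.
        raise ValueError("speed_ratio must be positive and different from 1")
    px, py = pursuer
    ex, ey = evader
    denom = 1.0 - k * k
    center = ((px - k * k * ex) / denom, (py - k * k * ey) / denom)
    radius = k * math.hypot(px - ex, py - ey) / abs(denom)
    return center, radius


def circles_touch(c1, r1, c2, r2):
    """True if two circles intersect or are tangent (internally or externally)."""
    d = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
    return abs(r1 - r2) <= d <= r1 + r2


# Pursuer at the origin, evader at (3, 0), evader twice as fast (k = 0.5).
center, radius = apollonius_circle((0.0, 0.0), (3.0, 0.0), 0.5)
print(center, radius)
```

With these inputs the circle has center (-1, 0) and radius 2; the point (1, 0) lies on it, since it is 1 unit from the pursuer and 2 units from the evader. A check such as `circles_touch` is one way to test the intersect-or-tangent condition on adjacent pursuers' circles, though how the paper enforces that condition is not stated in the abstract.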
A Learning Invader for the “Guarding a Territory” Game
A Reinforcement Learning Problem
This paper explores the use of a learning algorithm in the “guarding a territory” game. The game takes place in continuous time: a single learning invader tries to get as close as possible to a territory before being captured by a guard. Previous research has approached the problem by letting only the guard learn. We examine the other case, in which only the invader learns. Furthermore, in our case the guard is superior to (faster than) the invader. We also consider models with non-holonomic constraints. A control system is designed and optimized for the invader to play the game and reach the Nash equilibrium. The paper shows how the learning system is able to adapt itself. The system’s performance is evaluated through different simulations and compared to the Nash equilibrium. Experiments with real robots were conducted and verified our simulations in a real-life environment. Our results show that the learning invader behaved rationally in different circumstances.
Deep learning has had a huge impact on computer science, making it possible to explore new frontiers of research and to develop amazingly useful products that millions of people use every day. Our internal deep learning infrastructure DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in our datacenters. We’ve used it to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won ImageNet’s Large Scale Visual Recognition Challenge in 2014, and drove our experiments in automated image captioning as well as DeepDream.
Genetic programming (GP) is an automated method for creating a working computer program from a high-level statement of a problem. It starts from a high-level statement of “what needs to be done” and automatically creates a computer program to solve the problem.
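As a rough illustration of the idea, the sketch below evolves small arithmetic expression trees to fit sample points of a target function. Everything in it is an assumption made for the example rather than something taken from the text: the primitive set (`add`, `sub`, `mul`, the variable `x`, the constant 1.0), truncation selection with elitism, subtree crossover, subtree mutation, the depth limit, and all function names and parameter values. Here the "high-level statement" is simply the squared-error measure in `error`.

```python
import operator
import random

FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMS = ["x", 1.0]
MAX_DEPTH = 6  # assumed limit that keeps trees (and values) bounded


def random_tree(rng, depth):
    """Grow a random expression tree: a terminal or (name, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(sorted(FUNCS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def error(tree, target, xs):
    """Sum of squared differences between the program and the target."""
    return sum((evaluate(tree, x) - target(x)) * (evaluate(tree, x) - target(x)) for x in xs)


def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def make_child(rng, a, b):
    """Subtree crossover of a with b, followed by occasional subtree mutation."""
    child = replace(a, rng.choice(list(paths(a))), subtree(b, rng.choice(list(paths(b)))))
    if rng.random() < 0.2:
        child = replace(child, rng.choice(list(paths(child))), random_tree(rng, 2))
    return child if depth(child) <= MAX_DEPTH else a  # reject oversized trees


def evolve(target, xs, pop_size=60, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda t: error(t, target, xs))
        history.append(error(pop[0], target, xs))
        survivors = pop[: pop_size // 4]  # elitism: the best trees are kept
        children = [make_child(rng, rng.choice(survivors), rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=lambda t: error(t, target, xs)), history


def target(x):
    return x * x + x


xs = [i / 4 for i in range(-8, 9)]
best, history = evolve(target, xs)
print(best, error(best, target, xs))
```

Because the best individuals survive each generation unchanged, the best error recorded in `history` never increases. Whether the run recovers an expression exactly equivalent to the target depends on the seed and settings.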