PROGRAM – 12th Workshop on Planning, Perception and Navigation of Intelligent Vehicles

Invited Keynotes

Title: Self-Supervised Learning for Perception Tasks in Automated Driving (slides, video) 8:00 AM (Las Vegas time)
Keynote speaker: Wolfram Burgard (University of Freiburg, Germany)

Abstract: At the Toyota Research Institute we are following the one-system-two-modes approach to building truly automated cars. More precisely, we simultaneously aim for the L4/L5 chauffeur application and the guardian system, which can be considered as a highly advanced driver assistance system of the future that prevents the driver from making any mistakes. TRI aims to equip more and more consumer vehicles with guardian technology and in this way to turn the entire Toyota fleet into a giant data collection system. To leverage the resulting data advantage, TRI performs substantial research in machine learning and, in addition to supervised methods, particularly focuses on unsupervised and self-supervised approaches. In this presentation, I will present three recent results regarding self-supervised methods for perception problems in the context of automated driving. I will present novel approaches to inferring depth from monocular images and a new approach to panoptic segmentation.

Biography: Wolfram Burgard is VP for Automated Driving Technology at the Toyota Research Institute. He is on leave from his professorship at the University of Freiburg where he heads the research group for Autonomous Intelligent Systems. Wolfram Burgard is known for his contributions to mobile robot navigation, localization and SLAM (simultaneous localization and mapping). He has published more than 350 papers in the overlapping area of robotics and artificial intelligence.
Title: Understanding Risk and Social Behavior Improves Decision Making for Autonomous Vehicles (slides, video) 8:45 AM (Las Vegas time)
Keynote speaker: Daniela Rus (MIT, USA)

Abstract: Deployment of autonomous vehicles on public roads promises increases in efficiency and safety, and requires evaluating risk, understanding the intent of human drivers, and adapting to different driving styles. Autonomous vehicles must also behave in safe and predictable ways without requiring explicit communication. This talk describes how to integrate risk and behavior analysis in the control loop of an autonomous car. I will describe how Social Value Orientation (SVO), which captures how an agent’s social preferences and cooperation affect their interactions with others by quantifying the degree of selfishness or altruism, can be integrated in decision making, and provide recent examples of developing and deploying self-driving vehicles with adaptation capabilities.
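A common way to quantify SVO in the driving literature is an angle φ that blends an agent's own reward with the reward of another agent: utility = cos(φ)·r_self + sin(φ)·r_other, so φ = 0 is egoistic, π/4 prosocial and π/2 altruistic. The minimal sketch below assumes that angle-based formulation; the function names, the two candidate maneuvers and their reward values are hypothetical and are not taken from the talk.

```python
import math


def svo_utility(r_self: float, r_other: float, phi: float) -> float:
    """Weight own reward against another agent's reward by the SVO angle phi
    (radians): 0 = egoistic, pi/4 = prosocial, pi/2 = altruistic."""
    return math.cos(phi) * r_self + math.sin(phi) * r_other


def choose_action(options, phi):
    """Return the name of the (name, r_self, r_other) option with the highest
    SVO-weighted utility."""
    return max(options, key=lambda o: svo_utility(o[1], o[2], phi))[0]


# Hypothetical merge scenario; the reward values are illustrative only.
options = [
    ("cut_in", 1.0, -0.8),  # faster for the ego car, costly for the other driver
    ("yield", 0.4, 0.6),    # slower for the ego car, better for the other driver
]
for label, phi in [("egoistic", 0.0), ("prosocial", math.pi / 4)]:
    print(label, choose_action(options, phi))
```

Changing only φ flips the decision in this toy example: the egoistic setting cuts in, the prosocial one yields. Estimating φ for surrounding drivers from their observed behavior is one way a planner could anticipate such differences.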

Biography: Daniela Rus is the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science, Director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, and Deputy Dean of Research in the Schwarzman College of Computing at MIT. She is also a visiting fellow at Mitre Corporation. Rus’s research interests are in robotics and artificial intelligence. The key focus of her research is to develop the science and engineering of autonomy. Rus is a Class of 2002 MacArthur Fellow, a fellow of ACM, AAAI and IEEE, and a member of the National Academy of Engineering and of the American Academy of Arts and Sciences. She is the recipient of the Engelberger Award for robotics. She earned her PhD in Computer Science from Cornell University.
Title: Safe Autonomous Driving and Humans: Perception and Transitions (slides, video) 9:30 AM (Las Vegas time)
Keynote speaker: Mohan M Trivedi (University of California, USA)

Abstract: These are truly exciting times, especially for researchers and scholars active in the robotics and intelligent systems fields. The fruits of their labor are enabling transformative changes in the daily lives of the general public. In this presentation we will focus on changes affecting our mobility on roads with highly automated intelligent vehicles. We specifically discuss issues related to understanding the human agents interacting with the automated vehicle, either as occupants of such vehicles or as people in the near vicinity of the vehicles: pedestrians, cyclists, or occupants of surrounding vehicles. These issues require deeper examination and careful resolution to assure the safety, reliability and robustness of these highly complex systems for operation on public roads. The presentation will highlight recent research dealing with understanding the activities, behavior, and intentions of humans, specifically in the context of autonomous driving and transition controls.

Biography: Mohan Trivedi is a Distinguished Professor of Engineering and founding director of the Computer Vision and Robotics Research Laboratory, as well as the Laboratory for Intelligent and Safe Automobiles (LISA) at the University of California San Diego. These labs have played significant roles in the development of human-centered safe autonomous driving, advanced driver assistance systems, vision systems for intelligent transportation, homeland security, assistive technologies and human-robot interaction fields. Trivedi has received the IEEE Intelligent Transportation Systems (ITS) Society’s Outstanding Researcher Award and LEAD Institution Award, as well as the Meritorious Service Award of the IEEE Computer Society. He is a Fellow of IEEE, SPIE, and IAPR. He regularly serves as a consultant to industry and government agencies in the USA and abroad. Trivedi frequently participates on panels dealing with technological, strategic, privacy, and ethical issues surrounding research areas he is involved in.

Links to Related Papers: http://cvrr.ucsd.edu/publications/index.html
Title: Decision Making Architectures for Safe Planning and Control of Agile Autonomous Vehicles (slides, video) 10:15 AM (Las Vegas time)
Keynote speaker: Evangelos Theodorou (Georgia Institute of Technology, USA)

Abstract: In this talk I will present novel algorithms and decision-making architectures for safe planning and control of terrestrial and aerial vehicles operating in dynamic environments. These algorithms incorporate different representations of robustness for high speed navigation and bring together concepts from stochastic contraction theory, robust adaptive control, and dynamic stochastic optimization using augmented importance sampling techniques. I will present demonstrations on simulated and real robotic systems and discuss future research directions.
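Sampling-based stochastic optimization of this kind is often illustrated with model predictive path integral (MPPI) control, where sampled control perturbations are weighted by the exponentiated negative cost of their rollouts (an importance-sampling step) and averaged into an update. The sketch below is a generic MPPI-style update under stated assumptions, not the robust architectures presented in the talk: the 1D double-integrator dynamics, cost weights, horizon `H`, noise level `sigma` and temperature `lam` are all hypothetical.

```python
import math
import random

DT, H = 0.1, 20   # hypothetical time step (s) and horizon (steps)
TARGET = 1.0      # hypothetical goal position (m)


def rollout_cost(controls):
    """Simulate a 1D double integrator (position, velocity) from rest at 0
    and return the accumulated quadratic cost of a control sequence."""
    x, v, cost = 0.0, 0.0, 0.0
    for a in controls:
        v += a * DT
        x += v * DT
        cost += (x - TARGET) ** 2 + 0.1 * v ** 2 + 0.01 * a ** 2
    return cost


def mppi_step(u, num_samples=200, sigma=1.0, lam=1.0, rng=random):
    """One MPPI-style update: perturb the nominal controls, weight each
    perturbation by exp(-cost / lam), and add the weighted mean perturbation."""
    eps = [[rng.gauss(0.0, sigma) for _ in u] for _ in range(num_samples)]
    costs = [rollout_cost([ui + ei for ui, ei in zip(u, e)]) for e in eps]
    c_min = min(costs)  # subtract the minimum for numerical stability
    w = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(w)
    return [ui + sum(wk * e[t] for wk, e in zip(w, eps)) / total
            for t, ui in enumerate(u)]


rng = random.Random(0)
u = [0.0] * H
initial = rollout_cost(u)
for _ in range(10):
    u = mppi_step(u, rng=rng)
final = rollout_cost(u)
print(f"cost: {initial:.2f} -> {final:.2f}")
```

Lowering `lam` concentrates the weights on the best rollouts, while raising it averages more broadly; the robustness extensions mentioned in the abstract would modify how samples are drawn and weighted.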

Biography: Evangelos Theodorou is an Associate Professor with the School of Aerospace Engineering, Georgia Institute of Technology, and is also the director of the Autonomous Control and Decision Systems (ACDS) laboratory. He is also affiliated with the Institute for Robotics and Intelligent Machines and the Center for Machine Learning Research at Georgia Tech. His interests are at the intersection of stochastic control and optimization, machine learning, statistical physics and dynamic systems theory. Applications of his research include robotic and aerospace systems, applied physics, networked systems and bio-engineering.

Accepted Papers

Title: Marker-Based Mapping and Localization for Autonomous Valet Parking paper, slides, video
Authors: Zheng Fang, Yongnan Chen, Ming Zhou, Chao Lu

Abstract: Autonomous valet parking (AVP) is one of the most important research topics in low-speed autonomous driving, and accurate mapping and localization are its key technologies. Traditional vision-based methods are prone to localization failure in long-term applications because illumination and scene appearance change over time. To solve this problem, we introduce visual fiducial markers as artificial landmarks for robust mapping and localization in parking lots. First, absolute scale information is acquired from the fiducial markers, and a robust and accurate monocular mapping method is proposed by fusing wheel odometry. Second, based on a map of fiducial markers sparsely placed in the parking lot, we propose a robust and efficient filtering-based localization method that achieves accurate real-time localization of vehicles in the parking lot. Compared with traditional visual localization methods, the artificial landmarks we adopt are highly stable and robust to illumination and viewpoint changes. Moreover, because the fiducial markers can be selectively placed on the columns and walls of the parking lot, they are less likely to be occluded than ground features, which ensures the reliability of the system. We have verified the effectiveness of our methods in real scenes. The experimental results show that the average localization error is about 0.3 m in a typical autonomous parking operation at a speed of 10 km/h.
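The abstract does not name the filter, so the following is only a minimal sketch assuming an extended Kalman filter (EKF) over a planar pose (x, y, θ), a wheel-odometry prediction step, and each marker observation reduced to a 2D position (forward, left) in the vehicle frame. The function names, noise matrices and the EKF itself are assumptions; the authors' filter may, for example, use the full 6-DoF marker pose.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def ekf_predict(mu, P, v, w, dt, Q):
    """Propagate the pose (x, y, theta) with odometry speed v and yaw rate w."""
    x, y, th = mu
    mu = [x + v * dt * math.cos(th), y + v * dt * math.sin(th), th + w * dt]
    F = [[1, 0, -v * dt * math.sin(th)],
         [0, 1, v * dt * math.cos(th)],
         [0, 0, 1]]
    return mu, madd(matmul(matmul(F, P), transpose(F)), Q)


def ekf_update(mu, P, marker_xy, z, R):
    """Correct the pose with a marker seen at z = (forward, left) in the vehicle
    frame; marker_xy is that marker's position in the prebuilt marker map."""
    x, y, th = mu
    c, s = math.cos(th), math.sin(th)
    dx, dy = marker_xy[0] - x, marker_xy[1] - y
    h = [c * dx + s * dy, -s * dx + c * dy]   # expected observation
    H = [[-c, -s, h[1]],
         [s, -c, -h[0]]]                      # Jacobian of h w.r.t. the pose
    PHt = matmul(P, transpose(H))
    S = madd(matmul(H, PHt), R)               # innovation covariance (2x2)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    K = matmul(PHt, S_inv)                    # 3x2 Kalman gain
    innov = [z[0] - h[0], z[1] - h[1]]
    mu = [m + k[0] * innov[0] + k[1] * innov[1] for m, k in zip(mu, K)]
    mu[2] = math.atan2(math.sin(mu[2]), math.cos(mu[2]))  # wrap the heading
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(3)] for i in range(3)]
    return mu, matmul(I_KH, P)


# Hypothetical example: true pose (0, 0, 0), perturbed estimate, one marker at (4, 0).
mu, P = [0.3, -0.2, 0.0], [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 0.1]]
R = [[0.01, 0], [0, 0.01]]
mu, P = ekf_update(mu, P, marker_xy=(4.0, 0.0), z=(4.0, 0.0), R=R)
print([round(m, 3) for m in mu])
```

With a single marker, part of the lateral error is attributed to heading because of the lever arm to the marker; observing several sparsely placed markers constrains all three pose components.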
Title: Parameter Optimization for Loop Closure Detection in Closed Environments paper, slides, video
Authors: Nils Rottmann, Ralf Bruder, Honghu Xue, Achim Schweikard, Elmar Rueckert

Abstract: Parameter tuning is crucial for the performance of localization and mapping algorithms. In general, tuning the parameters requires expert knowledge and is sensitive to information about the structure of the environment. To design truly autonomous systems, the robot has to learn the parameters automatically. Therefore, we propose a parameter optimization approach for loop closure detection in closed environments that requires neither prior information (e.g., robot model parameters) nor expert knowledge. It relies on several path traversals along the boundary line of the closed environment. We demonstrate the performance of our method in challenging real-world scenarios with limited sensing capabilities. These scenarios are exemplary for a wide range of practical applications, including lawn mowers and household robots.
Title: Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles paper, slides, video
Authors: Ramin Nabati, Hairong Qi

Abstract: In this paper we present a novel radar-camera sensor fusion framework for accurate object detection and distance estimation in autonomous driving scenarios. The proposed architecture uses a middle-fusion approach to fuse the radar point clouds and RGB images. Our radar object proposal network uses radar point clouds to generate 3D proposals from a set of 3D prior boxes. These proposals are mapped to the image and fed into a Radar Proposal Refinement (RPR) network for objectness score prediction and box refinement. The RPR network utilizes both radar information and image feature maps to generate accurate object proposals and distance estimations.
The radar-based proposals are combined with image-based proposals generated by a modified Region Proposal Network (RPN). The RPN has a distance regression layer for estimating the distance of every generated proposal. The radar-based and image-based proposals are merged and used in the next stage for object classification. Experiments on the challenging nuScenes dataset show that our method outperforms other existing radar-camera fusion methods in the 2D object detection task while accurately estimating objects’ distances.
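Mapping radar-based 3D proposals onto the image generally relies on a calibrated pinhole camera model. The sketch below assumes axis-aligned prior boxes centered on radar points that have already been transformed into the camera frame; the intrinsics, prior box sizes and helper names (`project_box`, `radar_proposals`) are hypothetical, and the RPR network that scores and refines these boxes is not shown.

```python
import math
from itertools import product


def project_box(center, dims, fx, fy, cx, cy):
    """Project an axis-aligned 3D box (camera frame: x right, y down,
    z forward) to a 2D image box (u_min, v_min, u_max, v_max) using a
    pinhole model with focal lengths fx, fy and principal point (cx, cy)."""
    x, y, z = center
    w, h, l = dims
    us, vs = [], []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):  # the 8 box corners
        X, Y, Z = x + sx * w, y + sy * h, z + sz * l
        if Z <= 0:
            raise ValueError("box corner is behind the camera")
        us.append(fx * X / Z + cx)
        vs.append(fy * Y / Z + cy)
    return min(us), min(vs), max(us), max(vs)


def radar_proposals(points, priors, intrinsics):
    """Place every prior box at every radar point and attach the radar range
    as the proposal's initial distance estimate."""
    proposals = []
    for p in points:
        dist = math.sqrt(sum(coord * coord for coord in p))
        for dims in priors:
            proposals.append({"box2d": project_box(p, dims, *intrinsics),
                              "distance": dist})
    return proposals


intrinsics = (1000.0, 1000.0, 640.0, 360.0)     # hypothetical fx, fy, cx, cy
priors = [(1.8, 1.5, 4.5), (0.6, 1.7, 0.6)]     # hypothetical car / pedestrian (w, h, l), m
points = [(1.0, 0.5, 15.0), (-3.0, 0.8, 30.0)]  # radar returns in the camera frame, m
for prop in radar_proposals(points, priors, intrinsics):
    print(prop)
```

Because each proposal carries the radar range from the start, a refinement stage only needs to correct it rather than regress distance from image features alone.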
Title: SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving paper, slides, video
Authors: Tiago Cortinhal, George Tzelepis, Eren Erdal Aksoy

Abstract: In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real time. SalsaNext is the next version of SalsaNet [1], which has an encoder-decoder architecture consisting of a set of ResNet blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields, and add a pixel-shuffle layer in the decoder. Additionally, we switch from strided convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with the Lovász-Softmax loss [2]. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each LiDAR point. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset [3], which demonstrates that SalsaNext outperforms the previous networks and ranks first on the Semantic-KITTI leaderboard.
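Per-point epistemic and aleatoric uncertainty is commonly estimated from several stochastic forward passes with dropout kept active (Monte Carlo dropout). The sketch below assumes the widely used entropy decomposition (total predictive entropy = expected per-pass entropy + mutual information) and operates on already computed softmax vectors for a single LiDAR point; SalsaNext's exact Bayesian treatment may differ, and the function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def decompose_uncertainty(samples):
    """samples: T softmax vectors for one LiDAR point, e.g. from T forward
    passes with dropout active. Returns (mean_probs, total, aleatoric,
    epistemic), where epistemic is the mutual information."""
    T = len(samples)
    mean = [sum(col) / T for col in zip(*samples)]
    total = entropy(mean)                              # predictive entropy
    aleatoric = sum(entropy(p) for p in samples) / T   # expected entropy
    epistemic = total - aleatoric                      # disagreement between passes
    return mean, total, aleatoric, epistemic


confident = [[0.9, 0.1], [0.88, 0.12], [0.91, 0.09]]
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
for name, s in [("confident", confident), ("disagreeing", disagreeing)]:
    _, total, alea, epi = decompose_uncertainty(s)
    print(f"{name}: total={total:.3f} aleatoric={alea:.3f} epistemic={epi:.3f}")
```

Passes that agree on an ambiguous distribution produce mostly aleatoric uncertainty, while confident but conflicting passes produce mostly epistemic uncertainty.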
Title: SDVTracker: Real-Time Multi-Sensor Association and Tracking for Self-Driving Vehicles paper, slides, video
Authors: Shivam Gautam, Gregory P. Meyer, Carlos Vallespi-Gonzalez, Brian C. Becker

Abstract: Accurate motion state estimation of Vulnerable Road Users (VRUs) is a critical requirement for autonomous vehicles that navigate in urban environments. Due to their computational efficiency, many traditional autonomy systems perform multi-object tracking using Kalman filters, which frequently rely on hand-engineered association. However, such methods fail to generalize to crowded scenes and multi-sensor modalities, often resulting in poor state estimates that cascade into inaccurate predictions. We present a practical and lightweight tracking system, SDVTracker, that uses a deep learned model for association and state estimation in conjunction with an Interacting Multiple Model (IMM) filter. The proposed tracking method is fast and robust, and generalizes across multiple sensor modalities and different VRU classes. In this paper, we detail a model that jointly optimizes both association and state estimation with a novel loss, an algorithm for determining ground-truth supervision, and a training procedure. We show that this system significantly outperforms hand-engineered methods on a real-world urban driving dataset while running in less than 2.5 ms on CPU for a scene with 100 actors, making it suitable for self-driving applications where low latency and high accuracy are critical.
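For readers unfamiliar with the IMM filter, the sketch below shows only its textbook probability bookkeeping: predicting model probabilities through a Markov switching matrix, computing mixing weights, reweighting models by measurement likelihood, and fusing the per-model estimates. It does not include SDVTracker's learned association or state estimation, nor the per-model Kalman filters and covariance mixing; the two motion models, switching matrix, likelihoods and function names are hypothetical.

```python
def imm_mix(mu, Pi):
    """IMM interaction step.
    mu[i]    : current probability of motion model i
    Pi[i][j] : probability of switching from model i to model j
    Returns the predicted model probabilities c[j] and the mixing weights
    w[i][j], the share of model i's estimate used to initialise filter j."""
    n = len(mu)
    c = [sum(Pi[i][j] * mu[i] for i in range(n)) for j in range(n)]
    w = [[Pi[i][j] * mu[i] / c[j] for j in range(n)] for i in range(n)]
    return c, w


def imm_update_probs(c, likelihoods):
    """Reweight the models by how well each model-matched filter explained
    the associated measurement, then renormalise."""
    unnorm = [cj * lj for cj, lj in zip(c, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def imm_combine(mu, estimates):
    """Output estimate: probability-weighted mean of the per-model states."""
    return [sum(m * x[k] for m, x in zip(mu, estimates))
            for k in range(len(estimates[0]))]


# Hypothetical pedestrian track with two models: constant velocity and turning.
mu = [0.9, 0.1]
Pi = [[0.95, 0.05],
      [0.10, 0.90]]
c, w = imm_mix(mu, Pi)
mu = imm_update_probs(c, likelihoods=[0.2, 1.5])   # turn model fits better
state = imm_combine(mu, [[1.0, 0.0], [0.8, 0.4]])  # per-model (x, y) estimates
print([round(m, 3) for m in mu], state)
```

A single well-explained measurement shifts most of the probability mass to the turning model here, which is how an IMM adapts when a VRU changes its motion pattern.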
Title: Situation Awareness at Autonomous Vehicle Handover: Preliminary Results of a Quantitative Analysis paper, slides, video
Authors: Tamas D. Nagy, Daniel A. Drexler, Nikita Ukhrenkov, Arpad Takacs, Tamas Haidegger

Abstract: Enforcing system-level safety is a key research domain within self-driving technology. Current general development efforts aim for Level 3+ autonomy, where the vehicle controls both the lateral and longitudinal motion of the dynamic driving task, while the driver is permitted to divert their attention as long as they are able to react properly to a handover request initiated by the vehicle. Consequently, the situation awareness of the human driver has become one of the most important metrics of handover safety. In this paper, the preliminary results of a user study are presented to quantitatively evaluate emergency handover performance, using a custom-designed experimental setup built upon the Master Console of the da Vinci Surgical System and the CARLA driving simulator. The measured control signals and the questionnaires filled out by participants were analyzed to gain further knowledge on the situation awareness of drivers during handover at Level 3 autonomy. The custom open-source platform developed to support this study is available at https://github.com/ABC-iRobotics/dvrk_carla.
Title: Towards Context-Aware Navigation for Long-Term Autonomy in Agricultural Environments paper, slides, video
Authors: Mark Hollmann, Benjamin Kisliuk, Jan Christoph Krause, Christoph Tieben, Alexander Mock, Sebastian Putz, Felix Igelbrink, Thomas Wiemann, Santiago Focke Martinez, Stefan Stiene, Joachim Hertzberg

Abstract: Autonomous surveying systems for agricultural applications are becoming increasingly important. Currently, most systems are remote-controlled or rely on a single global map representation. Over the last years, several use-case-specific representations for path and action planning in different contexts have been proposed. However, relying solely on fixed representations and action schemes limits the flexibility of autonomous systems. Especially in agriculture, the surroundings in which autonomous systems are deployed may change rapidly during vegetation periods, and the complexity of the environment may vary depending on farm size and season. In this paper, we propose a context-aware system implemented in ROS that allows the representation, planning strategy and execution logic to be changed based on a spatially grounded semantic context. Our vision is to build an autonomous system called Autonomous Robotic Experimental Platform (AROX) that is able to generate crop maps over a whole vegetation period without any user intervention. To this end, we built the hardware infrastructure for storing and charging the robot as well as the software needed to realize context-awareness using available ROS packages.
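One simple way to ground a semantic context spatially is to attach a map region to each context and look up which region contains the robot. The sketch below assumes polygonal regions and a ray-casting point-in-polygon test; the `Context` fields, region shapes, and representation and planner names are hypothetical, and the ROS integration described in the paper is omitted.

```python
from dataclasses import dataclass


@dataclass
class Context:
    name: str
    polygon: list         # [(x, y), ...] region in map coordinates (m)
    representation: str   # which map representation to use (hypothetical)
    planner: str          # which planning strategy to run (hypothetical)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray cast toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                 # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def active_context(x, y, contexts, default):
    """Return the first context whose region contains the robot position."""
    for ctx in contexts:
        if point_in_polygon(x, y, ctx.polygon):
            return ctx
    return default


field = Context("field", [(0, 0), (100, 0), (100, 50), (0, 50)],
                "crop_row_map", "row_following")
yard = Context("yard", [(-20, -20), (0, -20), (0, 0), (-20, 0)],
               "occupancy_grid", "point_to_point")
road = Context("road", [], "topological_map", "route_following")  # fallback

for pos in [(50, 25), (-10, -10), (200, 0)]:
    ctx = active_context(*pos, [field, yard], default=road)
    print(pos, ctx.name, ctx.representation, ctx.planner)
```

Keeping contexts as data rather than code means regions can be redrawn as fields change over a vegetation period without touching the planners themselves.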
Title: Exploiting Continuity of Rewards – Efficient Sampling in POMDPs with Lipschitz Bandits paper, slides, video
Authors: Ömer Sahin Tas, Felix Hauser, Martin Lauer

Abstract: Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP). Finding exact solutions of POMDPs is generally computationally intractable, but the solution can be approximated by sampling-based approaches. These approaches rely on multi-armed bandit (MAB) heuristics, which assume the outcomes of different actions to be uncorrelated. In some applications, like motion planning in continuous spaces, similar actions yield similar outcomes. In this paper, we use variants of MAB heuristics that make Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches. We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
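The core idea of a Lipschitz bandit can be shown with a UCB rule in which each arm's upper bound is tightened by its neighbours: if rewards are L-Lipschitz in the action, arm a can score at most UCB(b) + L·|a − b| for any sampled arm b. The sketch below assumes a 1D action discretized into 21 arms, a noise-free reward for reproducibility, and hypothetical names and constants; the specific heuristics used in the paper may differ.

```python
import math


def lipschitz_ucb(arms, reward_fn, L, budget, c=0.1):
    """UCB over a discretized continuous action set in which each arm's upper
    bound is tightened by its neighbours: U(a) = min_b [UCB(b) + L * |a - b|]."""
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(1, budget + 1):
        pulled = [b for b in arms if counts[b] > 0]

        def ucb(b):
            return sums[b] / counts[b] + c * math.sqrt(2 * math.log(t) / counts[b])

        def upper(a):
            # An unpulled arm has no bound of its own, but pulled neighbours
            # still bound it through the Lipschitz constant L.
            bounds = [ucb(b) + L * abs(a - b) for b in pulled]
            return min(bounds) if bounds else math.inf

        a = max(arms, key=upper)
        counts[a] += 1
        sums[a] += reward_fn(a)
    return max(arms, key=lambda a: counts[a]), counts


# Hypothetical 1D action (e.g. a normalized acceleration) with a reward that is
# 1-Lipschitz and peaks at 0.7; noise-free here for reproducibility.
arms = [i / 20 for i in range(21)]
best, counts = lipschitz_ucb(arms, lambda a: 1.0 - abs(a - 0.7), L=1.0, budget=500)
print(best, counts[best])
```

Because a single sample also bounds nearby actions, arms far from good samples are ruled out after very few pulls, which is the efficiency gain over treating outcomes as uncorrelated.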
Title: Impact of Traffic Lights on Trajectory Forecasting of Human-driven Vehicles Near Signalized Intersections paper, slides, video
Authors: Geunseob Oh, Huei Peng

Abstract: Forecasting trajectories of human-driven vehicles is a crucial problem in autonomous driving. Trajectory forecasting in urban areas is particularly hard due to complex interactions with cars, pedestrians, and traffic lights (TLs). Unlike interactions with cars and pedestrians, which have been widely studied, the impact of TLs on trajectory prediction has rarely been discussed. Our contribution is twofold. First, we identify the potential impact qualitatively and quantitatively. Second, we present a novel solution that is mindful of the impact, inspired by the fact that humans drive differently depending on signal phase and timing. Central to the proposed approach are Human Policy Models, which model how drivers react to various states of TLs by mapping a sequence of states of vehicles and TLs to a subsequent action of the vehicle. We then combine the Human Policy Models with a known transition function (system dynamics) to conduct a sequential prediction; thus, our approach can be viewed as Behavior Cloning. One novelty of our approach is the use of vehicle-to-infrastructure communications to obtain the future states of TLs. We demonstrate the impact of TLs and the proposed approach using an ablation study for longitudinal trajectory forecasting tasks on real-world driving data recorded near a signalized intersection. Finally, we propose probabilistic (generative) Human Policy Models, which provide probabilistic contexts and capture competing policies, e.g., pass or stop in the yellow-light dilemma zone.
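The sequential-prediction loop (policy produces an action, known dynamics produce the next state, repeat) can be sketched as below. A hand-written rule stands in for the learned Human Policy Models, the dynamics are a simple longitudinal point mass, and the future signal phases are supplied by a function `tl_phase_at(t)`, mimicking what V2I messages could provide. All thresholds, speeds and names are hypothetical.

```python
def policy(d, v, phase):
    """Hypothetical stand-in for a learned Human Policy Model: maps distance to
    the stop line d (m), speed v (m/s) and signal phase to an acceleration."""
    if phase == "green" or d <= 0.0:
        return 1.0 if v < 13.9 else 0.0      # proceed toward ~50 km/h
    if d > 1.0:
        return -min(v * v / (2.0 * d), 6.0)  # brake to stop at the line
    return -6.0                              # hold at the line


def forecast(d0, v0, tl_phase_at, dt=0.1, horizon=8.0):
    """Sequential prediction: apply the policy, then the known point-mass
    dynamics. tl_phase_at(t) gives the signal phase at future time t."""
    d, v, t, traj = d0, v0, 0.0, []
    for _ in range(int(round(horizon / dt))):
        a = policy(d, v, tl_phase_at(t))
        v = max(v + a * dt, 0.0)             # no reversing
        d -= v * dt                          # distance to the stop line shrinks
        t += dt
        traj.append((t, d, v))
    return traj


# Same initial state, different future signal timing (all values hypothetical).
stays_red = forecast(30.0, 10.0, lambda t: "red")
turns_green = forecast(30.0, 10.0, lambda t: "red" if t < 3.0 else "green")
print(f"final distance to line: red {stays_red[-1][1]:.1f} m, "
      f"green at 3 s {turns_green[-1][1]:.1f} m")
```

The two forecasts start from the same state and diverge only because of the future signal timing, which is the kind of information a predictor without TL input cannot exploit.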
Title: Semantic Grid Map based LiDAR Localization in Highly Dynamic Urban Scenarios paper, slides, video
Authors: Chenxi Yang, Lei He, Hanyang Zhuang, Chunxiang Wang, Ming Yang

Abstract: Objects that change over time, such as pedestrians and vehicles, remain challenging for scan-to-map pose estimation using 3D LiDAR in autonomous driving because they lead to incorrect data association and structural occlusion. This paper proposes a novel semantic grid map (SGM) and corresponding algorithms to estimate the pose of observed scans in such scenarios with improved robustness and accuracy. The algorithms consist of a Gaussian mixture model (GMM) to initialize the pose and a grid probability model to keep estimating the pose in real time. We evaluate our algorithm thoroughly in two scenarios. The first is an expressway with heavy traffic, to demonstrate performance under dynamic interference. The second is a factory, to confirm compatibility with static environments. Experimental results show that the proposed method achieves higher accuracy and smoothness than mainstream methods and is compatible with static environments.
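To illustrate how semantic labels can protect scan-to-map matching, the sketch below scores a candidate pose by transforming labeled scan points into a 2D probability grid, summing log-probabilities of the hit cells, and skipping points whose label marks them as dynamic; a small hill-climb then refines the pose. This is a simplified stand-in, not the paper's SGM, GMM initialization or grid probability model: the label set, default cell value, step sizes, toy map and function names are all hypothetical.

```python
import math

DYNAMIC_LABELS = {"car", "pedestrian", "cyclist"}  # hypothetical label set


def score_pose(pose, scan, grid, resolution):
    """Sum the log-probabilities of the grid cells hit by the scan points at
    this pose, skipping points whose semantic label marks them as dynamic.
    grid maps integer cell (i, j) -> probability that the cell is occupied."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    score = 0.0
    for px, py, label in scan:
        if label in DYNAMIC_LABELS:
            continue                             # avoid wrong associations
        mx, my = x + c * px - s * py, y + s * px + c * py
        cell = (math.floor(mx / resolution), math.floor(my / resolution))
        score += math.log(grid.get(cell, 0.01))  # unmapped cells: low value
    return score


def refine_pose(pose, scan, grid, resolution, step=0.1, dth=0.02):
    """Hill-climb over small pose offsets until no neighbour scores higher."""
    best, best_score = pose, score_pose(pose, scan, grid, resolution)
    while True:
        cands = [(best[0] + ddx, best[1] + ddy, best[2] + dd)
                 for ddx in (0.0, -step, step)
                 for ddy in (0.0, -step, step)
                 for dd in (0.0, -dth, dth)]
        top = max(cands, key=lambda p: score_pose(p, scan, grid, resolution))
        top_score = score_pose(top, scan, grid, resolution)
        if top_score <= best_score:
            return best
        best, best_score = top, top_score


# Toy map: a wall of occupied cells at x = 5.0..5.1 m (resolution 0.1 m).
grid = {(50, j): 0.9 for j in range(-50, 50)}
scan = [(5.05, 0.5 * k, "static") for k in range(-4, 5)] + [(2.0, 0.0, "car")]
print(refine_pose((0.1, 0.0, 0.0), scan, grid, resolution=0.1))
```

In this toy map only the distance to the wall is observable; a real scene with more structure constrains all three pose components, while the label filter keeps moving vehicles from pulling the estimate toward wrong cells.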