       AUTO-DISCERN: Autonomous Driving Using Common Sense Reasoning

            Suraj Kothawade, Vinaya Khandelwal, Kinjal Basu, Huaduo Wang, Gopal Gupta
                      Computer Science Department, The University of Texas at Dallas, Richardson, USA
                     {suraj.kothawade, vinaya.khandelwal, kinjal.basu, huaduo.wang, gupta}@utdallas.edu




                           Abstract

Driving an automobile involves the tasks of observing surroundings, then making a driving decision based on these observations (steer, brake, coast, etc.). In autonomous driving, all these tasks have to be automated. Autonomous driving technology thus far has relied primarily on machine learning techniques. We argue that the appropriate technology should be used for the appropriate task. That is, while machine learning technology is good for observing and automatically understanding the surroundings of an automobile, driving decisions are better automated via commonsense reasoning rather than machine learning. In this paper, we discuss (i) how commonsense reasoning can be automated using answer set programming (ASP) and the goal-directed s(CASP) ASP system, and (ii) develop the AUTO-DISCERN¹ system using this technology for automating decision-making in driving. The goal of our research, described in this paper, is to develop an autonomous driving system that works by simulating the mind of a human driver. Since driving decisions are based on human-style reasoning, they are explainable, their ethics can be ensured, and they will always be correct, provided the system modeling and system inputs are correct.

Figure 1: Overview of the AUTO-DISCERN system.

                     1    Introduction

Autonomous Vehicles (AVs) have been sought for a long time. With the availability of cheaper hardware (sensors, cameras, LIDAR) and the advent of advanced software technology (AI, machine/deep learning (ML/DL), computer vision) over the last decades, rapid advancements have been made in AV technology. However, no car has yet achieved full automation, or Society of Automotive Engineers (SAE) Level 5 (Blanco May, 2021). We believe that AV technology advancement has slowed due to over-reliance on ML/DL for automating all aspects of AVs. While ML technologies are important for developing AV technology, we believe that we can achieve better success by closely emulating how humans drive a car. Once a human driver has viewed their surroundings and processed a scene in their mind, they use their commonsense knowledge and commonsense reasoning to make driving decisions (e.g., if the traffic light is red, apply brakes and stop). Our goal in this paper is to develop an AV system that emulates the mind of a human: we will use ML/DL technology for tasks for which humans use pattern matching (vision and scene understanding) and automated commonsense reasoning for tasks for which humans perform mental reasoning (driving decision-making); see Fig. 1.
   To automate commonsense reasoning, we use ASP (Gelfond and Kahl 2014; Brewka et al. 2011; Gebser et al. 2014) and the goal-driven implementation of ASP called s(CASP) (Arias et al. 2018). A goal-driven implementation of ASP is important for automated commonsense reasoning, as SAT-solver-based implementations such as CLINGO (Gebser et al. 2014) face several practical issues (e.g., scalability, explainability) (Gupta et al. 2017) for applications such as autonomous driving.

Copyright © 2021, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   ¹AUTO-DISCERN: AUTOnomous DrivIng uSing Common sEnse ReasoniNg

            2    Autonomous Vehicle Technology

We express the various decision-making strategies that drivers use as commonsense rules in ASP, which are executed on the s(CASP) system. These rules capture various driving decisions regarding steering, turning, braking, accelerating, stopping, etc. We also report on a prototype system called AUTO-DISCERN that we have developed, which takes a scene description and sensor values as input and computes the driving decision at that instant using the rules. A use case is shown in Fig. 1. We expect that a scene description (perception) will be obtained via image processing techniques that use deep learning methods. Because our decision-making is based on automated commonsense reasoning, every decision can be explained, and, in theory, our system will never make a wrong decision, as long as the rules are correct and the system input is correct.
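To make the facts-plus-rules pipeline concrete, here is a minimal Python sketch of the idea (not the authors' code; the predicate names, intents, and rules are illustrative placeholders): scene predicates F and an intent X are matched against a small rule table to produce one decision Y per frame.

```python
# Hypothetical sketch of "facts F + intent X + rules R -> decision Y".
# All predicate names and rules below are illustrative, not the
# actual AUTO-DISCERN rule base.
def decide(facts, intent):
    """Return a driving decision for one frame.

    `facts` is a set of ground predicates (encoded as tuples)
    describing the scene; `intent` is the short-term navigation goal.
    """
    if ("traffic_light", "red") in facts:
        return "brake"                      # default: stop at a red light
    if ("obstacle_in_lane", True) in facts:
        # exception-style check: only change lanes when it is safe
        if ("left_lane_clear", True) in facts:
            return "change_lane_left"
        return "brake"
    if intent == "continue_in_lane":
        return "cruise"
    return "brake"                          # fail-safe default

print(decide({("traffic_light", "red")}, "continue_in_lane"))  # brake
```

In the actual system this table of `if` checks is replaced by declarative s(CASP) rules, so each decision also comes with a proof tree justifying it.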
   Our main contribution in this paper is to show how auto-
mated commonsense reasoning can be harnessed to achieve
an SAE Level 5 autonomous driving system.
   Autonomous vehicles (AVs) hold enormous promise: they can reduce the cost of transportation and increase convenience, as well as significantly reduce road accidents and traffic fatalities. AVs can also have a big environmental impact: private ownership of cars can become unnecessary. AVs can greatly aid the mobility of the elderly, the disabled, and the disadvantaged. The story of AVs has been one of great optimism: it was projected in 2017 that there would be 10 million AVs on the road by 2020 and that fully autonomous vehicles would be the norm in 10 years. Not only have we not reached the 10M-vehicle mark, but no vehicle has earned the fully automated (SAE Level 5) designation.
   The history of making vehicles autonomous began with the introduction of cruise control in 1948 (Wikipedia contributors 2021). In 1999, the US Federal Communications Commission allocated 75 MHz of spectrum dedicated to short-range communication. In the early 2000s, several teams developed and demonstrated autonomous cars in response to a DARPA grand challenge (DARPA 2014; Thrun 2010). In 2009, Google began its self-driving project, and in 2014 Google's AV passed a 14-mile driving test in Nevada. The US National Highway Transportation and Safety Administration (NHTSA) released its initial policy on AVs that year, and in 2015 Tesla released its Autopilot self-driving software. Since then, many other companies have entered the market, and many partnerships have been forged between them. The Society of Automotive Engineers developed a scale for vehicle automation: Level 0: no automation; Level 1: driver assistance (e.g., cruise control); Level 2: partial automation (steering and acceleration performed under human watch); Level 3: conditional automation (most tasks can be performed, but the human driver has control); Level 4: high automation (the car can perform all tasks under specific circumstances, e.g., in a geo-fenced area; human override is still possible); Level 5: full automation (all tasks automated; no human intervention required at all).
   No car has reached SAE Level 5 yet. The principal reason, we believe, is over-reliance on ML and DL technologies for most aspects of driving. Our goal here is to show that automated commonsense reasoning is essential for achieving SAE Level 5 automation.

           3    Machine Learning-based AV Systems

At present, machine learning plays a major role in AV technology. A significant amount of technology goes into an AV: radar (to detect cars and other large objects), ultrasonic sensors (to detect objects close by, e.g., the curb), lidar (to detect lane markings, the edge of a road, etc.), GPS (for directions to the destination and for knowing the AV's location), and video cameras (for obtaining the surrounding scene and analyzing it). All of these are connected to a central computer where processing takes place. Driving data is collected along with sensor readings, video images, lidar data, etc. In NVidia's PilotNet project (Bojarski et al. 2020), for example, a deep learning model is trained on this data to make one of three decisions: steering along a predicted trajectory, amount of braking, and amount of acceleration. Predicted trajectories are one of the following: (i) lane stable (keep driving in the same lane); (ii) change to left lane (first half of the left-lane-change maneuver); (iii) change to left lane (second half of the left-lane-change maneuver); (iv) change to right lane (first half of the right-lane-change maneuver); (v) change to right lane (second half of the right-lane-change maneuver); (vi) split right (e.g., take an exit ramp); (vii) split left (e.g., take the left branch of a fork in the road). An ML-based solution thus boils down to predicting the degree of steering, the amount of braking, and the amount of acceleration at every moment during driving. Of course, the last two are mutually exclusive, in that acceleration and braking are rarely needed at the same time.

Figure 2: Tesla self-driving fails to perform a lane merge right. The radar sensor detects a barrier to the left. However, the visual component is confused by the reflection on the barrier (likely mistaking it for a yellow dividing lane marking) and does not register that the lane is ending.

   Neural-based technology has a number of well-known issues with respect to accurate prediction of an outcome. An ML algorithm is a universal function approximator that learns an approximate mapping from an input to an output in accordance with the training data. This technology is fundamentally statistical. Edge cases, unusual circumstances that are uncommon in the training data, are not covered. Obviously, anything outside the training data is not learned. This is especially true for autonomous driving, where there are many situations that may never be encountered in the driving data collected for training. A good example of this is the flashing red and blue lights of police cars and fire trucks that the Tesla Autopilot system has had trouble with (Shen 2021). There are many such examples where an ML system gets confused by the noise present in the data and generates an erroneous model. Fig. 2 shows a scenario where the Tesla AV model fails to detect a barrier on the left. Fig. 3 shows an example where slight perturbations to traffic signs cause failure cases in ML models. There are many instances where an AV got into an accident, resulting in loss of life (Law 2021). We believe that our commonsense reasoning-based AUTO-DISCERN system can safely deal with situations where other ML-based systems have failed (see error mitigation in Sec. 6.3).

Figure 3: Partial modifications to traffic signs can cause ML models to misclassify the sign entirely. A stop sign can be classified as a speed limit sign. Speed limits can be misinterpreted as well: a small piece of tape can make a 35 be classified as an 85.

   Because of these issues, many companies have scaled down their ambitions from SAE Level 5. Some companies shut down (e.g., Starsky Robotics). Others (e.g., Waymo) revised their goal down to achieving SAE Level 4, while others have restricted autonomous behavior to limited circumstances (e.g., geo-fenced areas). So the goal of reaching SAE Level 5 seems elusive. In this paper, we argue that SAE Level 5 can be reached via automated commonsense reasoning. In fact, we believe that automated commonsense reasoning is indispensable for AV technology to reach SAE Level 5 (full automation).

       4    AV based on Commonsense Reasoning: Motivation

To drive a vehicle, a human driver must have the:
1. ability to control the vehicle, i.e., be able to steer, brake, accelerate, and signal for a turn.
2. ability to make visual deductive inferences, i.e., be able to see objects in front of or around the vehicle and make decisions, estimate the speed of objects, and project where they will be in the near future.
3. ability to make visual abductive inferences, i.e.: (i) be able to infer hidden or occluded parts of objects, e.g., a car will normally have four tires, even though only two are visible; (ii) be able to perform counterfactual ("what-if") reasoning.
4. ability to distinguish between various scenarios, e.g., be able to tell that a car on a billboard is not the same as a car on the street.

   Essentially, there are two types of tasks involved in driving: (i) visual scene processing and inferencing (tasks #2 through #4 above) and (ii) controlling the vehicle (task #1). Learning to control the vehicle is hard for humans, but visual inferencing comes naturally to us. For a machine the situation is reversed: controlling the vehicle is easy, but learning visual inferencing is significantly harder.
   To realize truly autonomous vehicles (SAE Level 5), we need to emulate the way humans drive cars, i.e., use ML for scene processing, while using automated commonsense reasoning for making inferences regarding driving actions. Once an inference is made (steer, accelerate, brake, etc.), it can easily be carried out by the machine. Our ideas are based on the insight that for tasks for which humans use pattern matching (e.g., picture recognition), we should use ML technology, while for tasks for which humans use deduction, we should use automated reasoning. The current practice of using ML for all AV tasks is overkill and is preventing us from reaching SAE Level 5.
   We envisage that the cameras will provide an image of the surroundings every second or so. This image will be processed using deep learning (object and lane detection, depth prediction, etc.) so that all the items present in the picture are labeled and their bounding boxes marked. The labels will be predicates extracted from the picture, along with the coordinates of each bounding box. A picture can be labeled with predicates with the help of datasets such as Visual Genome (Krishna, Zhu et al. 2016) and systems such as DenseCap (Johnson, Karpathy, and Fei-Fei 2016). These predicates, which describe the picture, constitute the input to the AUTO-DISCERN system. The commonsense rules that a human driver uses take this data (expressed as predicates that capture the position and spatial relationships among the various objects in the scene) as input to compute a driving decision. The question then arises: how does one automate commonsense reasoning?

                     5    Background

Commonsense Reasoning: As mentioned earlier, an autonomous driving system should be able to understand and reason like a human driver. If we examine how we humans reason, we fill a lot of the gaps in our understanding of a scene, a conversation, or a piece of text we read through our commonsense knowledge and reasoning (e.g., if we see a car moving fast on a road, we use our commonsense knowledge to infer that, normally, there must be a driver inside). Thus, to develop autonomous driving software, we need to automate commonsense reasoning, i.e., automate the human thought process. The human thought process is flexible and non-monotonic in nature, which means "what we believe now may become false in the future with new knowledge". It is well known that commonsense reasoning can be modeled with (i) defaults, (ii) exceptions to defaults, (iii) preferences over multiple defaults, and (iv) modeling multiple worlds (Gelfond and Kahl 2014; Brewka et al. 2011).
   Much of human knowledge consists of default rules; for example, "Normally, birds fly" is a default rule. However, there are exceptions to defaults; for example, penguins are exceptional birds that do not fly. Reasoning with default rules is non-monotonic, as a conclusion drawn using a default rule may have to be withdrawn if more knowledge becomes available and the exceptional case applies. For example, if we are told that Tweety is a bird, we will conclude it flies. Knowing later that Tweety is a penguin will cause us to withdraw our earlier conclusion. Similarly, if we see a car,
we know there must be a driver inside, normally, unless we realize it is a robo-taxi, in which case we withdraw that conclusion.
   Humans often make inferences in the absence of complete information. Such an inference may be revised later as more information becomes available. This human-style reasoning is elegantly captured by default rules and exceptions. Preferences are needed when there are multiple applicable default rules, in which case additional information gleaned from the context is used to resolve which rule to apply. One could argue that expert knowledge amounts to learning the defaults, exceptions, and preferences of the field that a person is an expert in.
   Also, humans can naturally deal with multiple worlds. These worlds may be consistent with each other in some parts, but inconsistent in other parts. For example, animals do not talk like humans in the real world; however, in the cartoon world, animals do talk like humans. So Nemo the fish may be able to swim in both the real world and the cartoon world, but it can talk only in the cartoon world. Similarly, we are able to distinguish between a car shown on a billboard on the road and an actual car on the road. Humans have no trouble distinguishing between multiple worlds (the world shown on the billboard vs. the real world) and can easily switch between them as the situation demands. Default reasoning, augmented with the ability to operate in multiple worlds, allows one to closely represent the human thought process. Default rules with exceptions and preferences and multiple worlds can be elegantly realized in the paradigm of ASP (Gelfond and Kahl 2014; Baral 2003; Brewka et al. 2011) and executed using the s(CASP) system (Arias et al. 2018).

ASP and s(CASP): ASP is a declarative paradigm that extends logic programming with negation-as-failure. We assume that the reader is familiar with ASP. Considerable research has been done on ASP since the inception, in the late 80s, of the stable model semantics that underlies it (Brewka et al. 2011). A major problem with ASP implementations is that programs have to be grounded, and SAT-solver-based implementations such as CLINGO (Gebser et al. 2014) are used to execute the propositionalized program to find the answer sets. There are multiple problems with this SAT-based implementation approach, including exponential blowup in program size, having to compute the entire model, and not being able to produce a justification for a conclusion (Gupta et al. 2017).
   Goal-directed implementations of ASP such as s(CASP) (Arias et al. 2018) work directly on predicate ASP programs (i.e., no grounding is needed) and are query-driven (similar to Prolog). The s(CASP) system only explores the parts of the knowledge base that are needed to answer the query, and it provides a proof tree that serves as a justification for the query. The s(CASP) system supports predicates with arbitrary terms as arguments, as well as constructive negation (Arias et al. 2018). It also supports abductive reasoning. Goal-directed implementations of ASP such as s(CASP) have been used for developing systems that emulate an expert. Chen et al. have used it to emulate a cardiologist's mind by automating the application of the guidelines cardiologists use for treating congestive heart failure (Chen, Marple et al. 2016). The system, reportedly, can outperform cardiologists (Chen, Salazar et al. 2018). In our project, we want to emulate the mind of an automobile driver.

          6    Commonsense Reasoning-based AV

  Step 1. Scenario representation: Extract entities from the images obtained from the environment using ML and computer vision models, along with sensor data. Convert the extracted information into a structured format F that can be processed by Prolog-like programs.

   self_speed(5, 1), self_lane(1, 1), speed_limit(10, 1),
   lanes([1,2,3], 1), traffic_signs(stop, 1),
   obj_meta(1, 2, car, ...), traffic_light(green, 1)

  Step 2. Commonsense reasoning: Apply the s(CASP)-based rules R to the facts F to obtain a primitive understanding of the involved entities and their relations at every frame. Build upon this primitive understanding to obtain high-level driving decisions Y.

   select_action(brake, T) :- brake_conditions(T),
                              obstacle_in_lane(...
   right_lane_clear(T) :- lanes(Lids, T) ...
                          not neg_lane_clear(...)
   obstacle_in_lane :- subclass(OType, automobile), ...
                       Depth #< Dist

  RESULT: A high-level action for each frame, along with a detailed trace of why the action was chosen for that frame.

   ..., action(cruise, 140), action(accelerate, 141),
   action(cruise, 142), action(change_lane_left, 143),
   ..., action(turn_left, 149), action(brake, 150), ...

Figure 4: Computation Steps of AUTO-DISCERN.

The commonsense rules that a human driver uses for driving are modeled in ASP, using defaults, exceptions, and preferences. The input to these rules is gleaned from the scene that the driver sees, where the scene is translated into a set of predicates that describe the objects and their placement in the scene. We assume that state-of-the-art ML technology is used to obtain these predicate labels. These predicates are represented as a set of facts in ASP that describe the environment and serve as input to the AUTO-DISCERN system. Formally, the facts F for a given frame at a timestamp T are combined with the (commonsense) driving rules R to make a driving decision Y, given an intent X. Fig. 4 shows the facts describing a scene and rule snippets that will be activated for this scenario to compute a driving decision at each timestamp. The ASP facts F contain the speed, lane, relative distance, and predicted trajectory of the AV and other detected objects, the lane structure, intersection information, visible traffic signs and lights, etc. An intent X describes the short-term goal that needs to be achieved by the AV in order to reach its destination. It is based on the instruction from the navigation system, for example, continue in lane, stay in leftmost
lane, turn left, etc. Based on the facts and the driving rules, a decision Y is taken, which is one of accelerate, brake, cruise, change lane left, change lane right, turn left, turn right, etc.

6.1   Driving Rules

The driving rules written in s(CASP) are rules that drivers use while driving (e.g., if behind a slow-moving vehicle, change lanes to go faster [default], unless the lanes are blocked [exception]). We refer to this collection of rules as the 'Driving Rules Catalog'. First, we describe this catalog using examples. We then show (in the experimentation and testing section) how the catalog examples can be converted into ASP rules and executed in s(CASP) to make decisions.
   At the topmost level, the catalog is categorized by the set of actions (brake, accelerate, change lane, turn, etc.). For each action, ASP rules have been developed that use knowledge of the scene to compute the driving action. Note that the number of commonsense rules that humans use while driving is not really that large. We have cataloged 35 rules at present. Next, we give some examples.

1. Change lane to left, if there is a non-automobile obstacle ahead in the lane within x meters, and the left lane is clear to perform the lane change.
2. Turn right, if the intent is to enter the right lane, if on the major lane of a T-junction intersection, and if the AV's predicted path does not intersect with any object (pedestrian, cyclist, . . . ).
3. Turn right, if the intent is to enter the right lane, if at a signalized 4-way intersection, if the traffic light is green, and if the AV's predicted path does not intersect with any object.
4. Brake, if there is an object ahead, within stopping distance of the AV and in the same lane.
5. Brake, if at an unsignalized 4-way intersection, and the AV is not the earliest to arrive at the intersection.

   To arrive at a decision, the AV evaluates all possible actions consistent with the current intent and decides on the best one. For each action, there is a set of default rules and exceptions. The default rules evaluate the conditions under which the AV should consider taking the action. The exceptions act as a filter, checking whether the action is safe to perform. This hierarchical logic and default reasoning have been encoded in s(CASP). Here is the code for change lane left:

select_action(change_lane_left, T) :-
    change_lane_left_conditions(T),
    not ab(d_select_action(change_lane_left, T)),
    not neg_select_action(change_lane_left, T).

The change_lane_left_conditions predicate defines the default rules for performing the action. The AV would consider changing to the left lane if the current intent is to stay in the leftmost lane (a precursor to performing a left turn), or if it needs to overtake a vehicle or avoid an obstacle ahead.

change_lane_left_conditions(T) :- self_lane(SLid, T),
    nonmv_ahead_in_lane(T, SLid, 20, OType),
    neg_can_drive_over(OType), can_swerve_around(OType).
change_lane_left_conditions(T) :-
    intent(stay_in_leftmost_lane, T).

However, it is not always possible to perform an action even if it is a short-term goal. The predicate neg_select_action encodes the exception rules for when performing the action is not safe or possible. The AV cannot change to the left lane if the lane is not clear of obstacles. Similarly, it cannot accelerate if it is approaching a red traffic light or if it is above the speed limit.

neg_select_action(accelerate, T) :-
    above_speed_limit(T);
    self_lane(SLid, T), neg_lane_clear(T, SLid, 10);
    traffic_light(red, T).
neg_select_action(change_lane_left, T) :-
    not left_lane_clear(T).

6.2   Experimentation & Testing

We demonstrate the usability and simplicity of our approach to autonomous driving. Further, we show selected scenarios where commonsense reasoning is essential to safe driving and how our approach achieves this. For each selected scenario, we show the relevant rules that lead to the action and describe the decision-making process.

System Testing: We have developed a large set of test scenarios to test our ASP-coded commonsense rules for driving. The test scenarios cover normal and adverse conditions encountered during driving. These scenarios cover general cases as well as corner-case situations that even humans would find challenging to drive in. In Fig. 5, we show some of the scenarios that are covered by our model.

Common Driving Scenarios: Once our rules were tested, we evaluated the AUTO-DISCERN system on real-world situations obtained from the KITTI (Geiger, Lenz, and Urtasun 2012) dataset. The experiment covers scenarios from a variety of traffic conditions that one may come across on a day-to-day basis. We test AUTO-DISCERN on a manually annotated subset of KITTI scenarios to obtain a runtime analysis.

Figure 5: Testing scenarios; the white vehicle represents our AV. (a) When obstructed by vehicles on all sides, the AV should brake. (b) When another object suddenly enters the lane, the AV should change lanes if possible, or brake. (c) On a lane merge right, the AV should slow down and give way to traffic in the target lane before merging.
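The default-rule/exception pattern behind select_action can be read procedurally. The following Python sketch is an illustrative analogue, not the authors' s(CASP) code: an action is chosen when its default conditions hold and no exception rule (neg_select_action) can be derived, crudely mimicking negation as failure.

```python
# Rough Python analogue of the select_action/2 pattern: a default
# rule fires unless some exception is derivable. The intent name and
# condition encoding are illustrative, not the actual rule base.
def select_action(default_holds, exceptions):
    """Return True iff the default fires and every exception fails."""
    return default_holds and not any(exceptions)

def change_lane_left(intent, left_lane_clear):
    # default: the current intent calls for being in the leftmost lane
    default_holds = (intent == "stay_in_leftmost_lane")
    # exception (neg_select_action): the left lane is not clear
    exceptions = [not left_lane_clear]
    return select_action(default_holds, exceptions)

print(change_lane_left("stay_in_leftmost_lane", True))   # True
print(change_lane_left("stay_in_leftmost_lane", False))  # False
```

In s(CASP), the same effect is obtained declaratively, and the system additionally returns a proof tree explaining why the action was or was not selected.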
We selected 3 representative frames from 2 scenarios per environment and show the runtime for each frame. Tab. 1 summarizes the results. The performance is largely determined by the number of objects in the frame and the complexity of the action performed.2 Note that, on average, our system computes a decision for a frame in half a second. Our system will take a snapshot once per second. We expect, on average, half of this one second to go into analyzing the picture, annotating it, and generating the input data, and the other half into computing the driving decision using s(CASP).

      KITTI          Runtime per frame (ms)
   Environment          Avg        Max
      City              413        873
      Road              285        635
   Residential          127        657
     Campus             106        469

      KITTI                  Runtime (ms)
    Scenario        Frame 1    Frame 2    Frame 3
     City 1           873        426         15
     City 2           507        525        262
     Road 1           160         26         32
     Road 2           635        631        342
  Residential 1       657         21         16
  Residential 2        25         24        159
    Campus 1           47         51         31
    Campus 2          469         84         16

Table 1: Run-times for AUTO-DISCERN on real-world environments in the KITTI dataset. The top table shows the average time taken for processing frames in the various environments.

   2 Experiment run on Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz 8-Core Processor, 12GB RAM.

change_lane_left_conditions(T) :-
    intent(merge_into_left_lane, T).
neg_select_action(change_lane_left, T) :-
    not left_lane_clear(T).
brake_conditions(T) :- intent(merge_into_left_lane, T),
    not left_lane_clear(T).

(a) The AV performs a change-lane-left to merge in heavy city traffic.

turn_right_conditions(T) :- intent(enter_right_lane, T).
neg_select_action(turn_right, T) :-
    self_pred_path(SPath, T),
    obj_pred_path(Oid, OPath, T),
    path_intersects(SPath, OPath).
brake_conditions(T) :- intent(enter_right_lane, T),
    intersection(_, _, at, T).

(b) The AV waits for pedestrians before performing a right turn.

Figure 6: Example frames from the KITTI experiments.

   Selected frames from the experiment, along with the rules that allowed AUTO-DISCERN to perform the required action, are shown in Fig. 6. Fig. 6a shows a situation where the AV has to merge into the left lane with heavy incoming traffic. As the AV approached the merge, it slowed down to a stop, performing the lane change when the left lane was clear. Fig. 6b is an example of waiting before performing a right turn. The rules build upon predicted object trajectories obtained from an ML model to realize that performing a right turn is unsafe. These rules were derived from examples ① and ② of the driving rules catalog, respectively.
   Finally, we tested the AUTO-DISCERN system on cases where an ML-based system failed. The AUTO-DISCERN system was able to arrive at a correct decision in all such scenarios.

6.3   Discussion
Our experiments indicate that commonsense knowledge about driving can be modeled with relative ease. This should not come as a surprise, as we use relatively few rules to chart a course through the various types of objects we encounter as we drive. If the rules are correct, and the input to our system is correct, then we are in effect modeling an unerring human driver. There are many advantages to developing an AV system based on commonsense reasoning:
Explainability: Every driving decision made is explainable, as it can be justified via rules. The justification is obtained by using the s(CASP) system's proof tree generation facility (Arias et al. 2020). An example proof tree fragment for the scenario in Fig. 6a is shown below:

QUERY:Does 'start_drive' holds (for 410, and 411)?
>'start_drive' holds (for 410, and 411) because
 >'suggest_action' holds (for change_lane_left) because
  >'action' holds (for change_lane_left) and
   >there is no evidence that 'neg_suggest_action'
    holds (for change_lane_left) and
   >'select_action' holds (for change_lane_left) because
    >'change_lane_left_conditions' holds because
     >'intent' holds (for merge_into_left_lane).
    >there is no evidence that 'neg_select_action'
     holds (for change_lane_left) because
     >'left_lane_clear' holds because
      ...
      >there is no evidence that 'neg_lane_clear'
       holds (for Lid 2, and StopDist 10) because
       >there is no evidence that 'class' holds,
        with Var0 not equal bicycle, bike, car,
        pedestrian
       ...
The global constraints hold.
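As a rough illustration of how such a "because"-style justification reads, the following Python sketch builds an indented proof trace with a tiny top-down prover. This is not the s(CASP) algorithm, and the rule/fact encoding below is a hypothetical simplification of the lane-change example:

```python
# Illustrative sketch (not the s(CASP) implementation): a tiny top-down
# prover that records *why* each goal holds, yielding an indented
# justification in the spirit of the proof tree above.

RULES = {
    "select_action(change_lane_left)": ["change_lane_left_conditions",
                                        "left_lane_clear"],
    "change_lane_left_conditions": ["intent(merge_into_left_lane)"],
}
FACTS = {"intent(merge_into_left_lane)", "left_lane_clear"}

def justify(goal, depth=0):
    """Return (holds, lines): proof status plus an indented explanation."""
    pad = " " * depth
    if goal in FACTS:
        return True, [f"{pad}>'{goal}' holds (fact)"]
    body = RULES.get(goal)
    if body is None:
        return False, [f"{pad}>no evidence that '{goal}' holds"]
    lines = [f"{pad}>'{goal}' holds because"]
    for subgoal in body:
        ok, sub = justify(subgoal, depth + 1)
        lines += sub
        if not ok:
            return False, lines
    return True, lines

ok, proof = justify("select_action(change_lane_left)")
print("\n".join(proof))
```

A real justification must also account for negation as failure ("there is no evidence that ..." branches) and global constraints, which s(CASP) produces automatically.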
Error Mitigation: A major problem with ML-based systems is that unusual and corner cases may be missed. For example, as shown earlier, a speed limit of 35 may be read as 85. If a human misinterprets a speed limit sign of 35 as 85, their commonsense knowledge that they are in a city tells them that an 85 speed limit seems too high. This type of commonsense reasoning can be performed in our AUTO-DISCERN system. Default rules with exceptions can be written in ASP to determine what a reasonable speed limit ought to be in each type of surrounding, which can then be used to perform sanity checks, as shown in the (self-explanatory) example below:

max_speed(Location, S) :- reasonable_speed(Location, S1),
    posted_speed_limit(Location, S2),
    minimum(S1, S2, S), not abnormal(Location, S).

   Commonsense knowledge can also be used to ensure that the frame corresponding to a scene is consistent with the information provided by the various sensors in the AV. If there is any inconsistency, the sensor information can be given priority, as the visual information is more likely to be erroneous. Consider the example shown in Fig. 2. The following rules allow AUTO-DISCERN to perform a change-lane-right based on sensor data, overriding the visual information.

change_lane_right_conditions(T) :-
    sensor(left, Dist, T),
    collision_distance(CD, T), Dist =< CD.

   Thus, ensuring that the AV system is safe is considerably easier, as we are emulating a human driver's mind, which takes inputs from various sources and tries to infer a consistent world view with respect to which a driving action is taken. A system like AUTO-DISCERN can also be used to aid ML-based AV systems by cross-checking the decisions they make. Additionally, it is provably ethical and explainable.

Handling Complex Scenarios: More complex scenarios can also be handled through the use of s(CASP)-based commonsense reasoning technology. For instance, flashing traffic lights require that a sequence of scenes be processed to recognize the flashing of the lights. Commonsense rules to perform this processing can easily be written in ASP/s(CASP). Similarly, predicting where the various objects in a scene will be a few seconds in the future can also be done by analyzing a sequence of temporally ordered scenes through commonsense reasoning. Incorporating more nuanced commonsense reasoning is part of our future work.
   An ML-based AV system has to be retrained if it is to be used in another country with slightly different conventions. In contrast, a commonsense reasoning-based AV system such as AUTO-DISCERN can easily be used in such situations, as the differences in conventions can be described as commonsense rules (the DL-based scene understanding system, of course, must be retrained if the traffic signs are different). Note that humans don't have to be heavily retrained when they drive in another country.

                  7   Related Work and Conclusions
The use of formal logic and ASP to model driving has been proposed in the past. Bhatt et al. have employed ASP and the CLINGO system for autonomous driving experiments (Suchan et al. 2018; Suchan, Bhatt, and Varadarajan 2019). They propose a framework that takes visual observations computed by deep learning methods as input and provides visuo-spatial semantics at each timestamp. These semantics help in reasoning about overall scene dynamics (e.g., the sudden occlusion of a motorcycle at a distance due to a car right in front). However, their work can only support decision-making via visual sense-making. Our AUTO-DISCERN system, on the other hand, focuses on "understanding" the scene through commonsense reasoning and then computing a driving decision. Additionally, the use of CLINGO for executing ASP poses some limitations, as discussed earlier (Gupta et al. 2017).
   Karimi and Duggirala (2020) have coded up rules from the California DMV handbook in ASP using CLINGO. Their goal is to verify the correctness of AV systems' behavior at intersections. In contrast, our approach is to use commonsense reasoning/ASP for actual autonomous driving. There are other works in this direction that apply formal logic/reasoning to verifying AV systems, particularly at unsignaled intersections (Hilscher and Schwammberger 2016; Azimi et al. 2011; Hafner et al. 2013; Loos and Platzer 2011), as well as to situations where an AV should hand over control to a human driver (McCall 2019).
   To conclude, in this paper we described our AUTO-DISCERN system, which employs commonsense reasoning (CSR) for autonomous driving. Commonsense knowledge about driving is represented as a predicate answer set program and executed on the s(CASP) goal-directed ASP system. While ML-based AVs have made significant advances, none of them has reached the level of full automation. We strongly believe that an approach based on automating CSR is indispensable for developing AV technology. The goal is to emulate human drivers, who use CSR to make decisions while driving. Rules for driving were developed for the AUTO-DISCERN system with the help of existing datasets such as KITTI, available driving manuals, and our own commonsense knowledge of driving. Our system is explainable and provably ethical. Input to the system consists of data from all the sensors as well as a description of the scene as predicates (obtained using ML technology). Given that CSR is used, ML errors in processing the scene can be compensated for.
   The main contribution of our work is to demonstrate how a complete decision-making system for autonomous driving can be developed by modeling commonsense reasoning in ASP and the s(CASP) system. The s(CASP) system is what makes AUTO-DISCERN possible; the absence of such goal-directed ASP technology is one reason why an AV system based entirely on commonsense reasoning has not been developed thus far. Future work includes refining and developing the AUTO-DISCERN infrastructure to make it work with the CARLA setup (https://carla.org), as well as developing an actual AV deployment with our industrial partner.
                      Acknowledgement
Authors are supported by NSF awards IIS 1718945, IIS 1910131, IIP 1916206, and by grants from Amazon and DoD.

                         References
Arias, J.; Carro, M.; Chen, Z.; and Gupta, G. 2020. Justifications for goal-directed constraint answer set programming. arXiv preprint arXiv:2009.10238.
Arias, J.; Carro, M.; Salazar, E.; Marple, K.; and Gupta, G. 2018. Constraint answer set programming without grounding. TPLP, 18(3-4): 337–354.
Azimi, S. R.; Bhatia, G.; Rajkumar, R. R.; and Mudalige, P. 2011. Vehicular networks for collision avoidance at intersections. SAE International Journal of Passenger Cars - Mechanical Systems, 4(2011-01-0573): 406–416.
Baral, C. 2003. Knowledge representation, reasoning and declarative problem solving. Cambridge Univ. Press.
Blanco, S. May, 2021. SAE Updates, Refines Official Names for 'Autonomous Driving' Levels. Car and Driver. https://www.caranddriver.com/news/a36364986/sae-updates-refines-autonomous-driving-levels-chart/.
Bojarski, M.; Chen, C.; Daw, J.; Değirmenci, A.; Deri, J.; Firner, B.; Flepp, B.; Gogri, S.; Hong, J.; Jackel, L.; et al. 2020. The NVIDIA PilotNet Experiments. arXiv preprint arXiv:2010.08776.
Brewka, G.; et al. 2011. Answer set programming at a glance. Commun. ACM, 54(12): 92–103.
Chen, Z.; Marple, K.; et al. 2016. A Physician Advisory System for Chronic Heart Failure management based on knowledge patterns. Theory Pract. Log. Program., 16(5-6): 604–618.
Chen, Z.; Salazar, E.; et al. 2018. An AI-Based Heart Failure Treatment Adviser System. IEEE Journal of Translational Engineering in Health and Medicine, 6: 1–10.
DARPA. 2014. The DARPA Grand Challenge: Ten Years Later. https://www.darpa.mil/news-events/2014-03-13.
Gebser, M.; Kaminski, R.; Kaufmann, B.; and Schaub, T. 2014. Clingo = ASP + Control: Preliminary Report. CoRR, abs/1405.3694.
Geiger, A.; Lenz, P.; and Urtasun, R. 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proc. IEEE CVPR.
Gelfond, M.; and Kahl, Y. 2014. Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach. Cambridge University Press.
Gupta, G.; Salazar, E.; et al. 2017. A Case for Query-driven Predicate Answer Set Programming. In ARCADE 2017, 1st International Workshop on Automated Reasoning: Challenges, Applications, Directions, Exemplary Achievements, Gothenburg, Sweden, volume 51 of EPiC Series in Computing, 64–68.
Hafner, M. R.; Cunningham, D.; Caminiti, L.; and Vecchio, D. D. 2013. Cooperative collision avoidance at intersections: Algorithms and experiments. IEEE Transactions on Intelligent Transportation Systems, 14(3): 1162–1175.
Hilscher, M.; and Schwammberger, M. 2016. An abstract model for proving safety of autonomous urban traffic. In Proc. International Colloquium on Theoretical Aspects of Computing, 274–292. Springer.
Johnson, J.; Karpathy, A.; and Fei-Fei, L. 2016. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. In Proc. IEEE CVPR, 4565–4574. IEEE Computer Society.
Karimi, A.; and Duggirala, P. S. 2020. Formalizing Traffic Rules for Uncontrolled Intersections. In 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), 41–50.
Krishna, R.; Zhu, Y.; et al. 2016. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. arXiv preprint arXiv:1602.07332.
Law, C. 2021. The Dangers of Driverless Cars. The National Law Review, 11(249).
Loos, S. M.; and Platzer, A. 2011. Safe intersections: At the crossing of hybrid systems and verification. In Proc. 14th IEEE Int'l Conference on Intelligent Transportation Systems (ITSC), 1181–1186.
McCall, R. 2019. A taxonomy of autonomous vehicle handover situations. Transportation Research Part A: Policy and Practice, 124: 507–522.
Shen, M. 2021. Tesla on autopilot slams into police car. USA Today. https://www.usatoday.com/story/money/business/2021/08/29/tesla-part-automated-drive-system-slams-into-police-car/5642789001/.
Suchan, J.; Bhatt, M.; and Varadarajan, S. 2019. Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving. In Proc. IJCAI, 1879–1885. ijcai.org.
Suchan, J.; Bhatt, M.; Walega, P. A.; and Schultz, C. P. L. 2018. Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning About Moving Objects. In Proc. AAAI 2018, 1965–1972.
Thrun, S. 2010. Toward Robotic Cars. Commun. ACM, 53(4): 99–106.
Wikipedia contributors. 2021. History of self-driving cars — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=History_of_self-driving_cars&oldid=1040733011. [accessed Jan. 6 '21].