=Paper=
{{Paper
|id=Vol-2699/paper09
|storemode=property
|title=An Adaptive Semantic Stream Reasoning Framework for Deep Neural Networks
|pdfUrl=https://ceur-ws.org/Vol-2699/paper09.pdf
|volume=Vol-2699
|authors=Danh Le-Phuoc,Thomas Eiter
|dblpUrl=https://dblp.org/rec/conf/cikm/PhuocE20
}}
==An Adaptive Semantic Stream Reasoning Framework for Deep Neural Networks==
Danh Le-Phuoc (Technical University Berlin) and Thomas Eiter (Technical University Vienna)

Abstract: Driven by deep neural networks (DNN), the recent development of computer vision makes visual sensors such as stereo cameras and Lidars ubiquitous in autonomous cars, robotics and traffic monitoring. However, due to operational constraints, a processing pipeline like object tracking has to hard-wire an engineered set of DNN models to a fixed processing logic. To overcome this, we propose a novel semantic reasoning approach that uses stream reasoning programs for incorporating DNN models with commonsense and domain knowledge using Answer Set Programming (ASP). This approach enables us to realize a reasoning framework that can adaptively reconfigure the reasoning plan in each execution step over incoming stream data.

Keywords: Semantic Reasoning, Neural-symbolic, Stream Reasoning, Stream Processing

===1. Motivation===

The recent development of computer vision (CV) driven by deep neural networks (DNN) makes visual sensors such as stereo cameras and Lidars ubiquitous in autonomous cars, robotics and traffic monitoring. In particular, many DNN models for object detection [1] and tracking [2] are available. However, making them work reliably in a real-life online processing pipeline, such as an Automated Driving System (ADS) or a traffic surveillance query engine, is still very challenging. For example, [2] reports that the most accurate DNN-driven multi-object tracking (MOT) pipelines can process only 4-5 frames/second. To make such a system work online, e.g. for an ADS where the processing delay must stay below 100 ms [3], one has to hard-wire a fixed set of DNN models and sacrifice some accuracy and robustness, because the design constraints of an ADS limit how much hardware can be put into a system [3]. For instance, an additional 400 W of power consumption translates to a 3.23% reduction in miles per gallon for a 2017 Audi A4 sedan; similarly, additional power consumption reduces the total driving range of electric vehicles.

Such a design-time trade-off often leads to unpredictable fatal errors in a real deployment. For example, the recent report on Uber's accident in Arizona [4] states: "...The ADS detected the pedestrian 5.6 seconds before impact. Although the ADS continued to track the pedestrian until the crash, it never accurately classified her as a pedestrian or predicted her path. By the time the ADS determined that a collision was imminent (1 second before impact), the situation exceeded the response specifications of the ADS braking system...".
This accident might not have happened if the ADS had been able to reconfigure the object tracking pipeline on the fly, e.g. by changing DNN models or using alternative sensor sources to improve detection and tracking accuracy.

This motivates us to propose an approach that combines stream reasoning with probabilistic inference to continuously configure such processing pipelines based on semantic information representing commonsense and domain knowledge. The use of semantic information together with DNNs has proved to be useful and has led to better accuracy in image understanding [5] and in object tracking [6]. Similarly to ours, these approaches use declarative means to represent the processing pipelines of visual data. However, none of them has considered how to deal with the aforementioned operational constraints in the context of stream processing. Our approach represents such constraints in an extension of Answer Set Programming (ASP).
This extension is obtained by leveraging LARS formulas [7] for expressing stream reasoning programs that incorporate the uncertainty of probabilistic inference operations under weighted rules similar to LP^MLN [8]; we call such rules semantic reasoning rules. As a result, we are able to dynamically express a visual sensor fusion pipeline, e.g. MOT over multiple cameras, by semantic reasoning rules that fuse probabilistic inference operations with ASP-based solving processes. Moreover, the expressive power of our approach enables us to express operational constraints together with optimisation goals as a probabilistic planning program similar to pBC+ [9], so that our reasoning framework can reconfigure the reasoning plan adaptively in each execution step over incoming stream data.

(Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Email: danh.lephuoc@tu-berlin.de (D. Le-Phuoc), thomas.eiter@tuwien.ac.at (T. Eiter). ORCID: 0000-0003-2480-9261 (D. Le-Phuoc), 0000-0001-6003-6345 (T. Eiter). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.)

Figure 1: A Semantic Visual Stream Snapshot. Three image frames at time points 1, 2, 3 with bounding boxes b1, ..., b6, annotated with atoms such as det(yl, b1, car) and trk(39, b2).

===2. Formalization of Semantic Stream Reasoning with DNN Models===
Semantically, a multi-sensor fusion data pipeline consumes the data observed by a Sensor as a stream of observations (each represented as an Observation), following the standardized W3C/OGC Semantic Sensor Network Ontology (SSN) [10]. For instance, the example in Figure 1 shows three image frames observed by a traffic camera. These observations are then fed into a probabilistic inference process such as a DNN model or a CV algorithm (represented as a Procedure) to provide derived stream elements, which in turn represent Sampling instances. In this example, det(yl, b1, car) represents the output bounding box b1 of the YOLO detector yl ∈ Dt, where Dt (short for Detector) denotes the set of supported detectors. Similarly, trk(39, b2) represents the tracking bounding box b2 generated by a tracking algorithm (a Tracker), which associates b2 with the tracklet 39 via the popular object tracking algorithm SORT [11].

The symbol FeatureOfInterest (FoI) is used to represent the domain of physical objects that are the subjects of the sensor observations, e.g. tracked objects and the fields of view (FoV) of the cameras. The relationship between a Result generated by probabilistic inference algorithms (e.g. a YOLO detection model or a Kalman filter algorithm) and such an object is represented by the predicate isSampleOf (denoted iSO). As such algorithms produce output with uncertainty, we use an abductive reasoning process to search for explainable evidence for iSO via rules driven by commonsense and domain knowledge, similar to [6].

To formalise the reasoning process over such a semantic representation of stream data, we need a temporal model that allows us to reason about the properties and features of objects from streams of sensor observations. This model must account for the laws of movement in the physical world and, in particular, be able to fill gaps of incomplete information (e.g., if objects do not appear in observations, or camera reads are missing) based on commonsense principles. We thus use the LARS framework [7] to represent a reasoning program over our semantic stream data which can be evaluated using an ASP solver.

The LARS framework provides formulas with Boolean connectives and temporal operators @_T α, □α, and ◇α to evaluate a formula α at a time point T (resp. at every, at some time point) in the current stream S; window operators ⊞^w α take for evaluating α a data snapshot (substream) S′ of S, obtained by applying a window function w to S. For example, the formula ⊞^{+5}◇det(yl, B, car) states that at evaluation time T, det(yl, B, car) holds at some time point T′ in the window [T−5, ..., T] selected by w = +5; in our representation, det(yl, B, car) is the matching condition for "a car was detected in bounding box B by the YOLO detector". The formula @_T α is aligned with a fluent F resp. an event E in Event Calculus (EC) [12], such that @_T F ≡ holdsAt(F, T) and @_T E ≡ happens(E, T). This will help us to employ ASP-based EC axioms for commonsense reasoning rules as proposed in [6].
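To make these operators concrete, their snapshot semantics can be mimicked in a few lines of Python (an illustrative toy interpreter, not part of the framework; the stream encoding and atom tuples are our own):

```python
def window_plus(stream, t, size):
    """⊞ with w = +size: snapshot of the stream over [t - size, t]."""
    return [(tp, atoms) for (tp, atoms) in stream if t - size <= tp <= t]

def diamond(snapshot, atom):
    """◇α: α holds at SOME time point of the snapshot."""
    return any(atom in atoms for (_, atoms) in snapshot)

def box(snapshot, atom):
    """□α: α holds at EVERY time point of the snapshot."""
    return all(atom in atoms for (_, atoms) in snapshot)

def at(snapshot, t, atom):
    """@_T α: α holds exactly at time point T."""
    return any(tp == t and atom in atoms for (tp, atoms) in snapshot)

# A toy stream: (time point, set of ground atoms), in the style of Figure 1.
stream = [
    (1, {("det", "yl", "b1", "car"), ("trk", 39, "b1")}),
    (2, {("det", "yl", "b3", "car"), ("trk", 39, "b3")}),
    (3, {("det", "yl", "b5", "car"), ("trk", 41, "b5")}),
]

# ⊞+5 ◇ det(yl, b1, car) evaluated at time point 3:
snap = window_plus(stream, 3, 5)
print(diamond(snap, ("det", "yl", "b1", "car")))  # True: b1 was seen at t = 1
```

The same snapshot fails the □ test, since det(yl, b1, car) does not hold at every time point of the window.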
To deal with uncertainty, we extend LARS with weighted rules under LP^MLN semantics as in [8]. In LP^MLN, the facts generated from feature extractors (DNN models or OpenCV algorithms), as well as abduction rules on top of them, can be annotated with uncertainty information given by a weight, which allows reasoning under certain levels of uncertainty.

For our concerns, a semantic reasoning program Π is a set of weighted rules r of the form

w : α ← β   (1)

where α, β are LARS formulas and w ∈ ℝ ∪ {x} is the weight of the rule. If w = x, then r is a hard rule, otherwise a soft rule; by Π_h and Π_s we denote the sets of hard and soft rules of Π, respectively. The semantics of Π is given by the answer streams [7] I of the LARS program Π_I obtained from Π by dropping the weights and each rule r that I violates; each such I gets assigned a probability Pr_Π(I) calculated from the weights of the rules retained in Π_I; for more information, we refer to [8].
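How the retained weights induce probabilities over answer streams can be sketched as follows (a simplified illustration of LP^MLN-style normalization; the exact definition over stable models is in [8], and the candidate streams and weights here are hypothetical):

```python
import math

def answer_stream_probabilities(candidates):
    """LP^MLN-style normalization: the unnormalized weight of an answer
    stream is exp(sum of the weights of the soft rules it keeps);
    dividing by the total yields Pr(I).  `candidates` maps a stream
    name to the list of retained soft-rule weights."""
    raw = {name: math.exp(sum(ws)) for name, ws in candidates.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# Hypothetical example: I1 retains both soft rules, I2 violates the
# rule with weight 0.5 and so drops it.
probs = answer_stream_probabilities({"I1": [2.0, 0.5], "I2": [2.0]})
print(probs["I1"] > probs["I2"])  # True: I1 keeps more weighted rules
```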
In Section 3 we will address how to translate restricted programs Π into ASP programs to be fed into an ASP solver, and how the weights w can be learned from training data.

To demonstrate how to build a semantic reasoning program, we emulate the DeepSORT tracking algorithm [13] via soft rules that search for supporting evidence to re-identify objects associated with the tracklets created by the Kalman filter above, using visual appearance associations. DeepSORT extends SORT with a DNN that is trained to discriminate targeted objects (e.g. pedestrians or vehicles) on a labelled re-identification dataset. Hence, we search for pairs of bounding boxes of two tracklets that are similar w.r.t. visual appearance. Due to the large search space of possible matches, we limit it by filtering the candidates based on their temporal and spatial properties. Therefore, we use rules with windows to reason about two disconnected tracklets that have two matching bounding boxes within a time window of δ_s time points, aligned with DeepSORT's gallery of associated appearance descriptors for each track. Based on this gallery of previously tracked boxes, the cosine distance is computed to compare appearance information, which is particularly useful to recover identities after long-term occlusions, when motion is less discriminative. Hence, for merging two adjacent tracklets with visual appearance matches, we use the parametrized soft rule (2) below. The pair of parameters (δ_s, vm_s) has to be specified for each reasoning step via the probabilistic planning component of our dynamic reasoning framework in Section 3; vm_s is one of the available visual matching models that represent the association metrics used to discriminate the compared bounding boxes.
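The appearance check behind the predicate vmatch can be approximated by a minimal cosine-distance test against a tracklet's descriptor gallery (a sketch with made-up descriptors and threshold; DeepSORT [13] obtains the descriptors from a re-identification DNN):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two appearance descriptors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def vmatch(gallery, descriptor, threshold=0.2):
    """A bounding box matches a tracklet if the smallest cosine distance
    between its descriptor and the tracklet's gallery of previous
    descriptors stays below a threshold."""
    return min(cosine_distance(g, descriptor) for g in gallery) < threshold

gallery_39 = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]  # tracklet 39's past boxes
candidate = [0.85, 0.15, 0.05]                    # new detection's descriptor
print(vmatch(gallery_39, candidate))  # True: close to the gallery
```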
w1_s : sim(B1, O) ← @_T trk(T1, B1), iso(B1, O2), traclt(O2, O), @_Te ends(O2), @_Te ⊞^{+δ_s}◇trk(T2, B2), T < Te + 3, vmatch(vm_s, B1, B2)   (2)

Similarly, we can define rules to trigger the object matching search based on the visual appearances of two tracklets from two adjacent cameras. We use @_T past(B1, C, Tt) to specify the time difference Tt from the candidate camera C at time point T to start the search for matches via the auxiliary predicate past. Also, C is filtered by the auxiliary predicate next, stating that C is adjacent to the camera where B1 was generated. lostFoV(O, C) represents "object O left the FoV of camera C".

w2_s : sim(B1, O) ← @_T trk(T1, B1), next(B1, C), @_T past(B1, C, Tt), @_{T+Tt} ⊞^{+δ_s}◇lostFoV(O, C), traclt(T2, O), trk(T2, B2), vmatch(vm_s, B1, B2)   (3)

===3. Dynamic Reasoning Framework===

To realize our reasoning approach of Section 2, we propose a dynamic reasoning framework as illustrated in Figure 2. The key components of the framework, the Reasoner and the Planner, are built on top of an ASP Solver and a Stream Processor, which are pluggable and generic modules. The control logic of the framework is governed by Algorithm 1.

Figure 2: Stream Reasoning Framework. The Reasoner and the Planner sit on top of an ASP Solver and a Stream Processor (with DNN models and a knowledge base KB), connected via the methods lars2asp, genSnapshot, genPlanPf and solve.

While any ASP Solver supporting weak constraints can be used in our framework, existing stream processors such as relational or graph stream processing engines need to be extended with some prerequisite features to connect with the rest of the framework. For instance, in our prototype under development, we extend CQELS [14] to enable DNN inference on GPUs via built-in functions of CQELS-QL, the graph-based continuous query language of CQELS. Via CQELS-QL, auxiliary predicates such as iso, next and past of Section 2 are expressed as continuous queries in order to delegate their processing to the stream processor. This mechanism also helps us to avoid grounding overhead in continuous solving via materialised views, similar to the over-grounding approach in [15]. In particular, we leverage the continuous multiway joins with windows of CQELS to delegate the processing of LARS formulas that do not occur in rule heads.
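The delegated processing can be pictured as a continuous multiway join over per-window contents, sketched below in Python (a toy illustration with made-up record fields; CQELS' actual join operators are described in [14]):

```python
def hash_join(left, right, key_l, key_r):
    """Join two lists of dict records on the given keys."""
    index = {}
    for row in right:
        index.setdefault(row[key_r], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key_l], [])]

def multiway_window_join(windows, keys):
    """Joins the window contents pairwise left-to-right: the result of
    joining windows[0] and windows[1] is joined with windows[2], etc."""
    result = windows[0]
    for win, (kl, kr) in zip(windows[1:], keys):
        result = hash_join(result, win, kl, kr)
    return result

# Toy windows: detections and tracklet assignments sharing a box id.
dets = [{"box": "b1", "cls": "car"}, {"box": "b5", "cls": "car"}]
trks = [{"box": "b1", "tracklet": 39}]
joined = multiway_window_join([dets, trks], [("box", "box")])
print(joined)  # [{'box': 'b1', 'cls': 'car', 'tracklet': 39}]
```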
Moreover, the visual stream data in different formats (e.g. RGB videos and Lidar point clouds), together with the knowledge base (KB), have to be normalized to the data model supported by the stream processor. For example, ontologies, metadata and the symbols extracted as outputs of DNN inference processes are represented as temporal graphs of CQELS. With these features, the stream processor is able to generate data snapshots and planning profiles in an ASP-readable format via the two methods genSnapshot and genPlanPf, respectively.

The Planner calls the method genPlanPf to prepare the input for the first reasoning step of each time point, which finds the optimal reasoning plan to achieve a certain goal Π_G under an operational constraint Π_C, following a probabilistic planning approach from [9] (lines (8) to (10) of Algorithm 1). The reasoning problem, formalised in (5) below, is to find a configuration (dt_s, vm_s, δ_s) that yields the highest probability for our tracking goal. For example, we can specify the goal of being able to track the objects that were tracked at the previous time point T−1 as

Π_G = ⋀_{O : @_{T−1} traclt(O, _)} @_T traclt(O, _)

Similarly, an operational constraint Π_C can be expressed by ASP rules. For instance, the example rule (4) below represents the constraint that limits the executable plans conf(D, V, δ) at time point T, based on the estimated execution times of a candidate detection model D and a visual matching model V, together with a candidate window parameter δ_s. The auxiliary predicates est and nObj provide the time estimates of the corresponding DNN operations and the number of objects tracked in camera C.

:∼ @_T est(D, e_D), @_T est(V, e_V), @_T conf(D, V, δ), @_{T−1} nObj(C, N), e_D + e_V ⋅ δ_s ⋅ N > maxExe   (4)
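Operationally, constraint (4) acts as a feasibility filter over candidate plans. The following Python sketch assumes the cost model e_D + e_V ⋅ δ ⋅ N from the rule; the numbers and field names are illustrative only:

```python
def feasible_plans(candidates, n_tracked, max_exec_ms):
    """Keeps only plans conf(D, V, delta) whose estimated execution
    time e_D + e_V * delta * N stays within the budget maxExe
    (cost model assumed from rule (4) for illustration)."""
    return [c for c in candidates
            if c["e_det"] + c["e_match"] * c["delta"] * n_tracked <= max_exec_ms]

plans = feasible_plans(
    [{"e_det": 30, "e_match": 2, "delta": 5},    # 30 + 2*5*4  =  70 ms: kept
     {"e_det": 60, "e_match": 4, "delta": 10}],  # 60 + 4*10*4 = 220 ms: pruned
    n_tracked=4, max_exec_ms=100)
print(len(plans))  # 1
```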
Algorithm 1: Semantic_Reasoning(S, t)
 Input: semantic stream S, new observation σ, time point t
 Output: optimal answer set I*
 1: i ← 0, Π̃_s ← ∅
 2: for {w : α ← β} ∈ Π_s do
 3:   i ← i + 1
 4:   Π̃_s ← Π̃_s ∪ {unsat(i) ← β, not α}
 5:   Π̃_s ← Π̃_s ∪ {α ← β, not unsat(i)}
 6:   Π̃_s ← Π̃_s ∪ {:∼ unsat(i) [w@0]}
 7: end for
 8: Π_P ← genPlanPf(S, σ, Π_C, t−1)
 9: Π̃_p ← lars2asp(Π_P ∪ Π_h ∪ Π_G, t)
 10: I_p ← Solve(Π̃_p)
 11: Π_D ← genSnapshot(S, I_p, t)
 12: Π̃ ← lars2asp(Π̃_s ∪ Π_h ∪ Π_D, t)
 13: I* ← Solve(Π̃)
 14: return I*

Lines (8) to (10) of Algorithm 1 carry out the first reasoning step, which solves the planning problem formalised by formula (5). To generate the LARS program from the soft rules Π_s, the algorithm first rewrites Π_s into LARS formulas with weak constraints, Π̃_s, in lines (1) to (7), extending the corresponding algorithm for LP^MLN in [8]. Then we use the incremental ASP encoding algorithm of Ticker [16] to rewrite the LARS formulas into an ASP program via the method lars2asp, whose optimal models (answer sets) I_p correspond to the solutions of

I*_p ∈ argmax_{I_p : Π̃_p(σ) ⊩ I_p} Pr_{Π̃_p}(Π_G | S, σ, Π_C)   (5)

With a chosen plan embedded in some I_p, the Reasoner calls the method genSnapshot to generate the input data of the reasoning program for the next step (line (11)) and carries out the second reasoning step in lines (12) to (14), producing the output of the whole pipeline as the optimal model I*. To specify the weights of the soft rules, we use the weight learning approach of [17], which fits the weights to the training data via gradient ascent. Training is done offline, but it uses Algorithm 1 to compute an optimal stable model in each weight-update step of the gradient method.
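The rewriting in lines (1) to (7) of Algorithm 1 can be sketched as a string-level transformation (rule syntax simplified and the example rule hypothetical; a real implementation would operate on parsed LARS rules):

```python
def rewrite_soft_rules(soft_rules):
    """Each weighted soft rule (w, alpha, beta) becomes two unweighted
    rules plus a weak constraint, so that violating the rule costs w
    instead of eliminating the model."""
    program = []
    for i, (w, alpha, beta) in enumerate(soft_rules, start=1):
        program.append(f"unsat({i}) :- {beta}, not {alpha}.")
        program.append(f"{alpha} :- {beta}, not unsat({i}).")
        program.append(f":~ unsat({i}). [{w}@0]")
    return program

# A hypothetical soft rule in the style of rule (2):
for line in rewrite_soft_rules([(3, "sim(b1,o)", "trk(t1,b1)")]):
    print(line)
```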
===4. Conclusion===

This position paper presented a novel semantic reasoning approach that enables probabilistic planning to adaptively optimize a sensor fusion pipeline under operational constraints expressed in ASP. The approach is realised with a dynamic reasoning mechanism that can integrate the uncertainty of DNN inference with semantic information, e.g. commonsense and domain knowledge, in conjunction with runtime information as input for operational constraints.

We are currently implementing an open-source prototype of the proposed reasoning framework in Java, to exploit the code bases of CQELS and Ticker. We use the Java Native Interface to wrap the C/C++ libraries of Clingo 5.4.0 as the ASP Solver and NVIDIA CUDA 10.2 as the DNN inference engine. The solving and inference tasks are coordinated in an asynchronous multi-threading fashion to exploit the massively parallel capabilities of CPUs and GPUs.

===Acknowledgments===

This work was funded in part by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (refs 01IS18025A and 01IS18037A) and supported by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under the program ICT of the Future (FFG-PNr.: 861263, project DynaCon).

===References===

[1] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen, Deep learning for generic object detection: A survey, IJCV (2019).

[2] G. Ciaparrone, F. L. Sánchez, S. Tabik, L. Troiano, R. Tagliaferri, F. Herrera, Deep learning in video multi-object tracking: A survey, Neurocomputing (2019). doi:10.1016/j.neucom.2019.11.023.
[3] S.-C. Lin, Y. Zhang, C.-H. Hsu, M. Skach, M. E. Haque, L. Tang, J. Mars, The architectural implications of autonomous driving: Constraints and acceleration, in: ASPLOS '18, 2018.

[4] NTSB, Collision between vehicle controlled by developmental automated driving system and pedestrian in Tempe, Arizona, https://www.ntsb.gov/news/events/Documents/2019-HWY18MH010-BMG-abstract.pdf, 2019. Accessed: 2020-01-15.

[5] S. Aditya, Y. Yang, C. Baral, Integrating knowledge and reasoning in image understanding, in: IJCAI, 2019. doi:10.24963/ijcai.2019/873.

[6] J. Suchan, M. Bhatt, S. Varadarajan, Out of sight but not out of mind: An answer set programming based online abduction framework for visual sensemaking in autonomous driving, in: IJCAI, 2019. doi:10.24963/ijcai.2019/260.

[7] H. Beck, M. Dao-Tran, T. Eiter, LARS: A logic-based framework for analytic reasoning over streams, Artif. Intell. 261 (2018) 16-70. doi:10.1016/j.artint.2018.04.003.

[8] J. Lee, Z. Yang, LP^MLN, weak constraints, and P-log, in: AAAI, 2017.
[9] J. Lee, Y. Wang, A probabilistic extension of action language BC+, TPLP 18 (2018) 607-622. doi:10.1017/S1471068418000303.

[10] K. Janowicz, A. Haller, S. J. D. Cox, D. Le Phuoc, M. Lefrançois, SOSA: A lightweight ontology for sensors, observations, samples, and actuators, J. Web Semant. 56 (2019) 1-10.

[11] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime tracking, in: ICIP, 2016, pp. 3464-3468.

[12] E. T. Mueller, Commonsense Reasoning: An Event Calculus Based Approach, 2nd ed., 2015.

[13] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a deep association metric, in: ICIP, 2017.

[14] D. Le-Phuoc, M. Dao-Tran, J. X. Parreira, M. Hauswirth, A native and adaptive approach for unified processing of linked streams and linked data, in: ISWC, 2011, pp. 370-388.

[15] F. Calimeri, G. Ianni, F. Pacenza, S. Perri, J. Zangari, Incremental answer set programming with overgrounding, TPLP 19 (2019).

[16] H. Beck, T. Eiter, C. Folie, Ticker: A system for incremental ASP-based stream reasoning, TPLP (2017). doi:10.1017/S1471068417000370.

[17] J. Lee, Y. Wang, Weight learning in a probabilistic extension of answer set programs, in: KR, 2018, pp. 22-31.