FORSETI: A Provenance-aware Visual Analysis Environment for the Lifecycle Management of E-autopsy Reports

Baoqing Wang1, Noboru Adachi2 and Issei Fujishiro1

1 Keio University, Graduate School of Science and Technology, Yokohama, Kanagawa 223-8522, Japan
2 University of Yamanashi, Graduate School of Medical Science, Chuo, Yamanashi 409-3898, Japan

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK
Email: wangbaoqing@keio.jp (B. Wang); fuji@ics.keio.ac.jp (I. Fujishiro)
URL: https://fj.ics.keio.ac.jp/en/member/baoqing-wang (B. Wang); https://fj.ics.keio.ac.jp/en/member/issei-fujishiro (I. Fujishiro)
ORCID: 0000-0002-6184-4245 (B. Wang); 0000-0002-8898-730X (I. Fujishiro)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
Autopsy reports are imperative for both medical and legal science. Medical examiners (MEs) and diagnostic radiologists (DRs) cross-reference autopsy findings, while judicial personnel derive legal documents. In a prior study, we proposed a visual analysis system named FORSETI (forensic autopsy system for e-court instruments) with x-LMML (extended legal medicine markup language) for MEs and DRs to author and review e-autopsy reports. In this paper, we outline our extended work in progress to introduce a provenance infrastructure for forensic data accountability to FORSETI, which can be characterized by two technical essences. The first is a provenance management mechanism that combines the forensic autopsy workflow management system (FAWfMS) and lmmlgit (a version control system for x-LMML files), allowing a large amount of provenance information about e-autopsy reports and their documented autopsy processes to be individually parsed. The second is authority management, which ensures the confidentiality of e-autopsy reports by deploying strict syntax-guided workflow controls and a custom-tailored tool.

Keywords
Computational forensics, Legal medicine, Accountability, Provenance, Authority

1. Introduction

In forensic science, the generation and utilization of forensic autopsy reports are intrinsically a collaborative data science activity. Usually, forensic autopsy reports are generated by medical examiners (MEs) collaboratively working with diagnostic radiologists (DRs); the reports then serve as underlying legal documents for MEs and DRs as well as for judicial personnel (JP). In these processes, a large amount of forensic data needs to be collected, visualized, analyzed, and annotated. Thus, much work has been done to develop computational tools and techniques for processing forensic data, including forensic autopsy assistance systems [1, 2], virtual autopsy platforms [3], and languages [4, 1]. However, the use of computational environments for forensic data has raised some critical issues, particularly how autopsy insights and results are obtained from forensic data, how the confidentiality of forensic data is handled, and how to ensure the trustworthiness of the autopsy results. We elaborate on these concerns in the following.

Forensic autopsy reports are commonly generated and used in physical autopsies (PAs) and virtual autopsies (VAs) to record autopsy results and to cross-reference PAs and VAs. Generally, in the refinement of PA (or VA) results, MEs (or DRs) with different levels of experience perform back-and-forth analyses of forensic data, while they expend substantial effort recording provenance information. This manual collection of provenance is time-consuming, laborious, and error-prone. In cross-referencing, MEs and DRs may make biased or inaccurate autopsy decisions. This is because the clues of autopsy insights, which serve as interpretative provenance information inspired by MEs' and DRs' experiences, are not well provided within the autopsy report. Thus, for MEs and DRs to effectively share knowledge and insights, the development of applications supporting the systematic management and analysis of provenance is necessary. In addition, JP finds existing autopsy reports cumbersome because of the deficiencies of non-derivability.

Multiple stakeholders (MEs, DRs, and JP) are involved in complicated pipelines for handling autopsy reports, where ethics and policies are commonly respected to protect postmortem privacy. These ideological and legal constraints are not sufficient for maintaining forensic information security. Clearly, computational tools and system mechanisms ensuring the confidentiality of autopsy reports are needed. The verifiability and confidentiality of data provenance in forensic autopsy workflows are crucial for establishing data accountability, with which e-autopsy can ensure that data contributors are committed to the truthfulness of the data.

In our prior research [1], we introduced a visual analysis system called FORSETI (forensic autopsy system for e-court instruments) with x-LMML (extended version of legal medicine mark-up language). The proposed prototype assists MEs and DRs in authoring and browsing e-autopsy reports using x-LMML, but the research was not targeted at the lifecycle management of e-autopsy reports in terms of data provenance.

In this paper, we outline our work in progress that addresses the aforementioned issues for establishing data accountability by extending the prior research to design a provenance-aware FORSETI, which can be characterized by two technical essences. The first is a provenance management mechanism that combines the forensic autopsy workflow management system (FAWfMS) and lmmlgit (a version control system for x-LMML files) to allow a large amount of provenance information about e-autopsy reports and their documented autopsy processes to be individually parsed. The second is authority management, which ensures the confidentiality of e-autopsy reports by deploying strict syntax-guided workflow controls and a custom-tailored tool. The paper concludes with directions for future work in pursuit of a provenance-aware FORSETI.

2. Related Work

This section reviews prior work on provenance systems and authority management, both of which are vital components for the core functionalities of the proposed extension to the current FORSETI system.

Provenance, also known as audit trail, lineage, and pedigree, refers to the entire amount of information composing all the elements and their relationships that contribute to the existence of a set of data [5]. Recently, systematic execution of tasks such as collecting, managing, and analyzing provenance information has received significant attention in a wide range of application fields (e.g., bioinformatics, astronomy, ecology, and geology). In this context, two basic types of systems are usually considered. One is the workflow-based system, generally known as the scientific workflow management system (SWfMS), which involves the linking of components as a task execution plan in the form of workflows whose computation is abstracted by directed acyclic graphs (DAGs) [6]. For defining task workflows, some SWfMSs, such as VisTrails [7], Swift [8], Kepler [9], and Taverna [10], use their own scripting languages, whose syntax is restricted to support the creation of specific types of DAGs. Thus, the SWfMS lacks the flexibility provided by general-purpose scripting languages. The other is the script-based system, which refers to the user's interaction with the data processing components through a sequence of commands entered in the shell interface to track provenance data. These kinds of systems, such as PASS [11], ES3 [12], noWorkflow [13], and Lancet [14], provide users with the flexibility to search for, derive, store, and share provenance information via designated commands. However, because these kinds of systems ignore the structure of the script, the user may find it difficult to link the provenance they have collected to the steps in the script.

Unfortunately, the FORSETI prototype [1] does not support provenance functionalities. We therefore extend our original research to introduce provenance awareness to FORSETI by combining the strengths of the workflow-based and script-based provenance approaches to lifecycle management of x-LMML files and their associated processes.

On the other hand, authority management refers to access control among users for preventing illegal information leaks. Authority management is essential to maintain the confidentiality and objectivity of collaborative data science activities. Our authority management design is mainly inspired by electronic health record systems (EHRS) [15], where authorized information providers can create and manage patients’ health information in a digital format (EHR) such that it can be shared with other authorized providers in more than one healthcare organization. Note that the syntax of the derivation relationship is the major difference between our system and EHRS. In contrast to EHRS used in medical organizations, our system needs to satisfy the usage of both medical and legal organizations. In addition, numerous compliance regulations require audit logs for electronic records. The Health Insurance Portability and Accountability Act (HIPAA) mandates proper logging of access and change histories for EHR [16]. However, this is still a “black box” for e-autopsy reports. Thus, establishing an accountability mechanism for forensic data is necessary to ensure the trustworthiness of autopsy reports.

To the best of our knowledge, there are few published works exploring the potential of data provenance with authority management for accountability of forensic data. In addition, the data accountability mechanism can provide some insights for addressing many big data challenges related to data quality and privacy.

3. Problem Statement

In this section, we identify three forensic autopsy goals and associated computational tasks in the processing flow of e-autopsy reports.

In our prior research [1], a general workflow for MEs and DRs to perform collaborative autopsy was identified, in which the use of the e-autopsy report is imperative. As delineated in Figure 1, for performing PA or VA (A1 or A2, respectively), autopsy reports generated from MEs’ or DRs’ work are integrated into a decision report that contains phased conclusions for the step-by-step refinement of autopsy results. For repetitive and detailed cross-referencing (a structure of A1 with B1 and A2 with B2), DRs’ (MEs’) autopsy reports are viewed as references for MEs’ (DRs’) work. In these processes, back-and-forth reviewing, verifying, and sharing of forensic data accompany the routine work of MEs and DRs. After processing in a forensic hospital, the final autopsy report is transmitted to JP (C), who extracts parts and modifies the form of the information for use in legal document generation and trials. The correctness of the practical workflow relies on the individual correctness of all stakeholders (MEs, DRs, and JP). However, involved stakeholders may act fallaciously in their own interest or make inaccurate decisions owing to oversights.

Figure 1: Operators and tasks in forensic activities.

We first identify three forensic goals (G) in terms of authoring and reviewing that describe the target problems. Then, we explicitly state three computational tasks (T) of provenance management that our functional design should address.

G1: Accountability and interoperability. The autopsy report is co-authored by multiple doctors (MEs and DRs), so each piece of diagnostic information should have a descriptive and trusted interpretation to allow for shared use.

G2: Reproducibility and traceability. The forensic report must be able to be distributed, reused, and retraced by MEs, DRs, and JP.

G3: Privacy security. There must be a concern for postmortem privacy in authoring autopsy reports using a computational environment.

For addressing these forensic goals, the following three computational tasks can be identified.

T1: Provenance information. The task enables users to reason about, verify, and refer to the results; share and reuse the knowledge; and assess data quality and validity.

T2: Lifecycle management. It is essential to facilitate efficient reuse of e-autopsy reports among stakeholders (MEs, DRs, and JP) by intelligently deriving the version, content, format, authoring manner, and viewing manner of autopsy reports based on the stakeholders’ duties.

T3: Authority management. The access control system containing tailored workflows and computational tools should be designed for MEs, DRs, and JP to author and reuse the e-autopsy reports.

Note that each of the computational tasks is specified by multiple forensic goals. These forensic goals and computational tasks can provide guidance for the design of provenance management in FORSETI.

4. Provenance-aware FORSETI

In this section, we give an overview of the provenance management in FORSETI, with a focus on its two core characteristics: the combination of FAWfMS and lmmlgit and authority management.

The provenance-aware FORSETI system supports the processing flows of the e-autopsy report in PA, VA, and e-court, enabling the computational tasks outlined in Section 3. Figure 2 (a) illustrates the overall picture of the provenance-aware FORSETI, where the input (A), manipulations (B, C, D), output (E), and data model of x-LMML (F1) are existing parts in the current version of FORSETI, while provenance (F2) is the primary component of this work. Fortunately, the original syntax of x-LMML in FORSETI has been well designed, facilitating the incorporation of data provenance functionalities. As shown in F2, a three-dimensional coordinate system is introduced to provide the underlying framework for the lifecycle management of e-autopsy reports in terms of “Time evaluation,” “Repository,” and “Computational forensics ontology.” On the “Time evaluation” axis, each node represents a version of x-LMML files for a different stakeholder, such as DRs, MEs, judges, jury, or the prosecution. These x-LMML files are gradually refined through stakeholders’ processing, achieving the global transition from e-autopsy reports to e-court documents. On the “Repository” axis, each node indicates an x-LMML file storing a forensic autopsy case. Note that the third axis, “Computational forensics ontology,” serves as the theoretical basis for support, organization, maintenance, specification, and extension of x-LMML files.

Figure 2: Overall picture of provenance-aware FORSETI system. (a) Functional components of provenance-aware FORSETI system. (b) Abstract structure of provenance functions in FORSETI.

As shown in Figure 2 (b), provenance functionalities in FORSETI consist of three parts: collection (T1, T2, T3), management (T2, T3), and analysis (T2, T3). In collection, the navigation interface and the FORSETI system capture mechanisms collect provenance data in x-LMML files at different granularities, such as activity duration, descriptive insights, and expertise explanation. To manage the collected provenance data, a version control system is tailored for the lifecycle management of e-autopsy reports. In analysis, by comparing the related x-LMML files, users can quickly view the differences among the autopsy results, and then utilize the process provenance of these results to reach a consensus. In the intersection of the three circles in Figure 2 (b), the core components of the three parts are positioned: x-LMML, FAWfMS and lmmlgit, and authority management. As shown in the bottom of Figure 2 (b), FAWfMS is defined under a hierarchical structure of workflow management.

4.1. Combination of FAWfMS and lmmlgit

In our design, FAWfMS and lmmlgit (T1, T2, T3) inherit the advantages of workflow- and script-based provenance management approaches, respectively. These advantages play an important role in the three components of the FORSETI system. In the following, we explain the roles FAWfMS and lmmlgit play in the provenance functionalities and how they are incorporated.

From the lower-left to the top-right corner of Figure 3, the three user interfaces of FAWfMS are shown: the navigation interface, the node editor interface, and the FORSETI interface. The navigation interface is for MEs and DRs to register personal information, clarify the status of the autopsy process, and navigate to their next task. As shown in the left of Figure 3, three branches (PA, Juxtaposition, and VA) with circled numbers are in place. Each circled number represents a set of x-LMML files with their major version number. Users (MEs, DRs, and JP) can click each circle to invoke the node editor interface, where each node graph links an x-LMML file for a particular author or browser, as shown in the middle of Figure 3. A node not only supports basic operations, such as move, add, delete, and modify, but also has some special features, such as comparison analysis. The node editor can reveal the pedigree of the x-LMML files and their status, such as derived, merged, locked, in-process, and out-process. By double-clicking on the selected node, the user can access the FORSETI interface for authoring and browsing x-LMML files, as shown in the top-right corner of Figure 3.

The lmmlgit is a version control system (VCS) mainly inspired by Git [17], and acts as an expert in processing granularity information, privacy security, and data accountability. All the functionalities in FAWfMS can be carried out by lmmlgit commands, but not vice versa. In particular, if a user needs to view a specific target stored in an x-LMML file, it is difficult to use FAWfMS because of its coarser granularity, but lmmlgit can be used to access, delete, edit, and check all the targets of an x-LMML file by simply typing designated commands into the shell interface. Thus, owing to its flexibility, lmmlgit works as the back-end of FAWfMS for finer handling of process provenance documented in x-LMML files.

4.2. Authority Management

To build an effective provenance-aware FORSETI system, authority management (T1, T2, T3) is proposed for e-autopsy confidentiality, which involves three components.

The first is strict workflow designs for various stakeholders. As shown in the lower-left corner of Figure 3, four steps for monitoring users’ processing are presented in the navigation interface. Through these four steps, the users’ personal information, including ID, e-mail, location, affiliation, and position, is stored and verified for assigning access rights. Then, the node editor allows users to author or browse the e-autopsy reports based on the user’s level of access.
The second is the locking tool installed in the node editor for giving the user control over their own node. Other users can view e-autopsy reports only after obtaining permission from authors or administrators. The third component is particularly important: a well-designed access control syntax supports the first two tasks on the back-end. In the future development plan, the syntactic structure for JP is going to be installed in the authority management to allow the e-autopsy report to be transformed into an e-court document.

Figure 3: FAWfMS, lmmlgit, and authority management in provenance-aware FORSETI system.

5. Concluding Notes

This paper is an initial report on the provenance-aware FORSETI, with an aim to empower MEs, DRs, and JP to accountably author and review e-autopsy reports using a combination of FAWfMS, lmmlgit, and authority management.

In the future, incorporation of the provenance functionalities into the autopsy juxtaposition methods, as shown in Figure 2 (a) C, should be a priority. Since the autopsy juxtaposition methods act as a “bridge” for cross-referencing between MEs and DRs, integrating provenance functionalities with these methods will enable a more effective manner of referencing. Indeed, the provenance-supported corpse model juxtaposition allows MEs to understand the insight processes of VA findings in the augmented reality setting, which leads to trustworthy planning of PA. Similar effects can occur in wound photograph juxtaposition and illustrative scene juxtaposition. The next issue is to evaluate the provenance-aware FORSETI with domain experts using reliable and real datasets, which can empirically prove the effectiveness of our system and provide useful feedback for further improvements. The final issue to be confronted is to complete the syntax of access control for authority management for JP.

Acknowledgments

This work has been supported in part by JSPS KAKENHI under the Grants-in-Aid for Scientific Research (A) No. 26240015, 17H00737, and 21H04916.

References

[1] B. Wang, Y. Asayama, M. O. Boussejra, H. Shojo, N. Adachi, I. Fujishiro, FORSETI: A visual analysis environment for authoring autopsy reports in extended legal medicine mark-up language, The Visual Computer 37 (2021) 2951–2963.
[2] Y. Asayama, B. Wang, M. Nakayama, H. Shohjoh, N. Adachi, Y. Kiyoki, I. Fujishiro, THEMIS: Context-sensitive similarity analysis for wound imagery using mathematical model of meaning, in: Proceedings of the 2021 International Conference on Cyberworlds, IEEE, 2021, pp. 129–132.
[3] C. Lundström, T. Rydell, C. Forsell, A. Persson, A. Ynnerman, Multi-touch table system for medical visualization: Application to orthopedic surgery planning, IEEE Transactions on Visualization and Computer Graphics 17 (2011) 1775–1784.
[4] M. O. Boussejra, N. Adachi, H. Shojo, R. Takahashi, I. Fujishiro, LMML: Initial developments of an integrated environment for forensic data visualization, in: Proceedings of the 2016 EuroVis Short Papers, 2016, pp. 31–35.
[5] Y. L. Simmhan, B. Plale, D. Gannon, A survey of data provenance in e-science, SIGMOD Record 34 (2005) 31–36.
[6] J. Cheney, A. Ahmed, U. A. Acar, Provenance as dependency analysis, Mathematical Structures in Computer Science 21 (2011) 1301–1337.
[7] S. P. Callahan, J. Freire, E. Santos, C. E. Scheidegger, C. T. Silva, H. T. Vo, Vistrails: Visualization meets data management, in: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, 2006, pp. 745–747.
[8] L. M. Gadelha Jr, B. Clifford, M. Mattoso, M. Wilde, I. Foster, Provenance management in Swift, Future Generation Computer Systems 27 (2011) 775–780.
[9] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, S. Mock, Kepler: An extensible system for design and execution of scientific workflows, in: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, IEEE, 2004, pp. 423–424.
[10] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, T. Oinn, Taverna: A tool for building and running workflows of services, Nucleic Acids Research 34 (2006) W729–W732.
[11] K.-K. Muniswamy-Reddy, D. A. Holland, U. Braun, M. Seltzer, Provenance-aware storage systems, in: Proceedings of the Usenix Annual Technical Conference, 2006, pp. 43–56.
[12] J. Frew, P. Slaughter, Es3: A demonstration of transparent provenance for scientific computation, in: Proceedings of the International Provenance and Annotation Workshop, Springer, 2008, pp. 200–207.
[13] L. Murta, V. Braganholo, F. Chirigati, D. Koop, J. Freire, noWorkflow: Capturing and analyzing provenance of scripts, in: Proceedings of the International Provenance and Annotation Workshop, Springer, 2014, pp. 71–83.
[14] J.-L. R. Stevens, M. Elver, J. A. Bednar, An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook, Frontiers in Neuroinformatics 7 (2013) 44.
[15] O. Can, D. Yilmazer, Improving privacy in health care with an ontology-based provenance management system, Expert Systems 37 (2020) e12427.
[16] R. Nosowsky, T. J. Giordano, The health insurance portability and accountability act of 1996 privacy rule: Implications for clinical research, Annual Review of Medicine 57 (2006) 575–590.
[17] J. Loeliger, M. McCullough, Version Control with Git: Powerful tools and techniques for collaborative software development, O’Reilly Media, Inc., 2012.
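
As a supplementary illustration of the locking tool described in Section 4.2, the following minimal Python sketch shows one way a locked node could gate viewing on permission from the author or an administrator while keeping an append-only access log of the kind that audit requirements for electronic records call for. It is not the FORSETI implementation: the class `ReportNode`, its fields, the action strings, and the user identifiers are all hypothetical names introduced only for this example, and the assumption that nodes are locked by default is ours.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReportNode:
    """Hypothetical stand-in for one node (an x-LMML file) in the node editor."""

    author: str
    locked: bool = True  # assumption: a node starts under its author's control
    permitted: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def grant(self, granter: str, reader: str, admins: frozenset = frozenset()) -> None:
        """Allow `reader` to view the node; only the author or an admin may grant."""
        if granter != self.author and granter not in admins:
            raise PermissionError(f"{granter} may not grant access to this node")
        self.permitted.add(reader)
        self._record(granter, f"grant:{reader}")

    def can_view(self, user: str) -> bool:
        """Check view access and log the attempt whether or not it succeeds."""
        allowed = (not self.locked) or user == self.author or user in self.permitted
        self._record(user, "view:allowed" if allowed else "view:denied")
        return allowed

    def _record(self, user: str, action: str) -> None:
        # Append-only entry: (UTC timestamp, acting user, action).
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), user, action))


if __name__ == "__main__":
    node = ReportNode(author="me_01")      # hypothetical ME account
    print(node.can_view("dr_02"))          # denied until permission is granted
    node.grant("me_01", "dr_02")
    print(node.can_view("dr_02"))
    for entry in node.audit_log:
        print(entry)
```

Logging denied attempts as well as granted ones is a deliberate choice in this sketch, since an accountability record is most useful when it also shows who tried to read a report without permission.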