=Paper= {{Paper |id=Vol-3135/bigvis_short1 |storemode=property |title=FORSETI: A Provenance-aware Visual Analysis Environment for the Lifecycle Management of E-autopsy Reports |pdfUrl=https://ceur-ws.org/Vol-3135/bigvis_short1.pdf |volume=Vol-3135 |authors=Baoqing Wang,Noboru Adachi,Issei Fujishiro |dblpUrl=https://dblp.org/rec/conf/edbt/WangAF22 }} ==FORSETI: A Provenance-aware Visual Analysis Environment for the Lifecycle Management of E-autopsy Reports== https://ceur-ws.org/Vol-3135/bigvis_short1.pdf
FORSETI: A Provenance-aware Visual Analysis Environment
for the Lifecycle Management of E-autopsy Reports
Baoqing Wang1 , Noboru Adachi2 and Issei Fujishiro1
1 Keio University, Graduate School of Science and Technology, Yokohama, Kanagawa 223-8522, Japan
2 University of Yamanashi, Graduate School of Medical Science, Chuo, Yamanashi 409-3898, Japan


Abstract
Autopsy reports are imperative for both medical and legal science. Medical examiners (MEs) and diagnostic radiologists (DRs) cross-reference autopsy findings, while judicial personnel derive legal documents. In a prior study, we proposed a visual analysis system named FORSETI (forensic autopsy system for e-court instruments) with x-LMML (extended legal medicine markup language) for MEs and DRs to author and review e-autopsy reports. In this paper, we outline our extended work in progress to introduce a provenance infrastructure for forensic data accountability to FORSETI, which can be characterized by two technical essences. The first is a provenance management mechanism that combines the forensic autopsy workflow management system (FAWfMS) and lmmlgit (a version control system for x-LMML files), allowing a large amount of provenance information about e-autopsy reports and their documented autopsy processes to be individually parsed. The second is authority management, which ensures the confidentiality of e-autopsy reports by deploying strict syntax-guided workflow controls and a custom-tailored tool.

Keywords
Computational forensics, Legal medicine, Accountability, Provenance, Authority



Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29-April 1, 2022), Edinburgh, UK
wangbaoqing@keio.jp (B. Wang); fuji@ics.keio.ac.jp (I. Fujishiro)
https://fj.ics.keio.ac.jp/en/member/baoqing-wang (B. Wang); https://fj.ics.keio.ac.jp/en/member/issei-fujishiro (I. Fujishiro)
ORCID: 0000-0002-6184-4245 (B. Wang); 0000-0002-8898-730X (I. Fujishiro)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction
In forensic science, the generation and utilization of forensic autopsy reports are intrinsically a collaborative data science activity. Usually, forensic autopsy reports are generated by medical examiners (MEs) working collaboratively with diagnostic radiologists (DRs); the reports then serve as underlying legal documents for MEs and DRs as well as for judicial personnel (JP). In these processes, a large amount of forensic data needs to be collected, visualized, analyzed, and annotated. Thus, much work has been done to develop computational tools and techniques for processing forensic data, including forensic autopsy assistance systems [1, 2], virtual autopsy platforms [3], and languages [4, 1]. However, the use of computational environments for forensic data has raised some critical issues, particularly how autopsy insights and results are obtained from forensic data, how the confidentiality of forensic data is handled, and how to ensure the trustworthiness of the autopsy results. We elaborate on these concerns in the following.
   Forensic autopsy reports are commonly generated and used in physical autopsies (PAs) and virtual autopsies (VAs) to record autopsy results and to cross-reference PAs and VAs. Generally, in the refinement of PA (or VA) results, MEs (or DRs) with different levels of experience perform back-and-forth analyses of forensic data, while they expend substantial effort recording provenance information. This manual collection of provenance is time-consuming, laborious, and error-prone. In cross-referencing, MEs and DRs may make biased or inaccurate autopsy decisions. This is because the clues of autopsy insights, which serve as interpretative provenance information inspired by MEs' and DRs' experiences, are not well provided within the autopsy report. Thus, for MEs and DRs to effectively share knowledge and insights, the development of applications supporting the systematic management and analysis of provenance is necessary. In addition, JP find existing autopsy reports cumbersome because of their lack of derivability.
   Multiple stakeholders (MEs, DRs, and JP) are involved in complicated pipelines for handling autopsy reports, where ethics and policies are commonly respected to protect postmortem privacy. These ideological and legal constraints are not sufficient for maintaining forensic information security. Clearly, computational tools and system mechanisms ensuring the confidentiality of autopsy reports are needed. The verifiability and confidentiality of data provenance in forensic autopsy workflows are crucial for establishing data accountability, with which e-autopsy can ensure that data contributors are committed to the truthfulness of the data.
   In our prior research [1], we introduced a visual analysis system called FORSETI (forensic autopsy system for e-court instruments) with x-LMML (extended version of
legal medicine mark-up language). The proposed prototype assists MEs and DRs in authoring and browsing e-autopsy reports using x-LMML, but the research was not targeted at the lifecycle management of e-autopsy reports in terms of data provenance.
   In this paper, we outline our work in progress that addresses the aforementioned issues for establishing data accountability by extending the prior research to design a provenance-aware FORSETI, which can be characterized by two technical essences. The first is a provenance management mechanism that combines the forensic autopsy workflow management system (FAWfMS) and lmmlgit (a version control system for x-LMML files) to allow a large amount of provenance information about e-autopsy reports and their documented autopsy processes to be individually parsed. The second is authority management, which ensures the confidentiality of e-autopsy reports by deploying strict syntax-guided workflow controls and a custom-tailored tool. The paper concludes with directions for future work in pursuit of a provenance-aware FORSETI.

2. Related Work
This section reviews prior work on provenance systems and authority management, both of which are vital components for the core functionalities of the proposed extension to the current FORSETI system.
   Provenance, also known as audit trail, lineage, and pedigree, refers to the entire amount of information composing all the elements and their relationships that contribute to the existence of a set of data [5]. Recently, systematic execution of tasks such as collecting, managing, and analyzing provenance information has received significant attention in a wide range of application fields (e.g., bioinformatics, astronomy, ecology, and geology). In this context, two basic types of systems are usually considered. One is the workflow-based system, generally known as the scientific workflow management system (SWfMS), which involves the linking of components as a task execution plan in the form of workflows whose computation is abstracted by directed acyclic graphs (DAGs) [6]. For defining task workflows, some SWfMSs, such as VisTrails [7], Swift [8], Kepler [9], and Taverna [10], use their own scripting languages, whose syntax is restricted to support the creation of specific types of DAGs. Thus, the SWfMS lacks the flexibility provided by general-purpose scripting languages. The other is the script-based system, in which the user interacts with the data processing components through a sequence of commands entered in the shell interface to track provenance data. These kinds of systems, such as PASS [11], ES3 [12], noWorkflow [13], and Lancet [14], provide users with the flexibility to search for, derive, store, and share provenance information via designated commands. However, because these kinds of systems ignore the structure of the script, the user may find it difficult to link the provenance they have collected to the steps in the script.
   Unfortunately, the FORSETI prototype [1] does not support provenance functionalities. We therefore extend our original research to introduce provenance awareness to FORSETI by combining the best of the workflow-based and script-based provenance approaches for the lifecycle management of x-LMML files and their associated processes.
   On the other hand, authority management refers to access control among users for preventing illegal information leaks. Authority management is essential to maintain the confidentiality and objectivity of collaborative data science activities. Our authority management design is mainly inspired by electronic health record systems (EHRS) [15], where authorized information providers can create and manage patients' health information in a digital format (EHR) such that it can be shared with other authorized providers in more than one healthcare organization. Note that the syntax of the derivation relationship is the major difference between our system and EHRS. In contrast to EHRS used in medical organizations, our system needs to satisfy the usage of both medical and legal organizations. In addition, numerous compliance regulations require audit logs for electronic records. The Health Insurance Portability and Accountability Act (HIPAA) mandates proper logging of access and change histories for EHR [16]. However, this is still a "black box" for e-autopsy reports. Thus, establishing an accountability mechanism for forensic data is necessary to ensure the trustworthiness of autopsy reports.
   To the best of our knowledge, there are few published works exploring the potential of data provenance with authority management for accountability of forensic data. In addition, the data accountability mechanism can provide some insights for addressing many big data challenges related to data quality and privacy.

3. Problem Statement
In this section, we identify three forensic autopsy goals and associated computational tasks in the processing flow of e-autopsy reports.
   In our prior research [1], a general workflow for MEs and DRs to perform collaborative autopsy was identified, in which the use of the e-autopsy report is imperative. As delineated in Figure 1, for performing PA or VA (A1 or A2, respectively), autopsy reports generated from MEs' or DRs' work are integrated into a decision report that contains phased conclusions for the step-by-step refinement of autopsy results. For repetitive and detailed cross-referencing (a structure of A1 with B1 and
A2 with B2), DRs' (MEs') autopsy reports are viewed as references for MEs' (DRs') work. In these processes, back-and-forth reviewing, verifying, and sharing of forensic data accompany the routine work of MEs and DRs. After processing in a forensic hospital, the final autopsy report is transmitted to JP (C), who extract parts and modify the form of the information for use in legal document generation and trials. The correctness of the practical workflow relies on the individual correctness of all stakeholders (MEs, DRs, and JP). However, involved stakeholders may act fallaciously in their own interest or make inaccurate decisions owing to oversights.

Figure 1: Operators and tasks in forensic activities.

   We first identify three forensic goals (G) in terms of authoring and reviewing that describe the target problems. Then, we explicitly state three computational tasks (T) of provenance management that our functional design should address.
G1: Accountability and interoperability. The autopsy report is co-authored by multiple doctors (MEs and DRs), so each piece of diagnostic information should have a descriptive and trusted interpretation to allow for shared use.
G2: Reproducibility and traceability. The forensic report must be able to be distributed, reused, and retraced by MEs, DRs, and JP.
G3: Privacy security. There must be a concern for postmortem privacy in authoring autopsy reports using a computational environment.
   For addressing these forensic goals, the following three computational tasks can be identified.
T1: Provenance information. The task enables users to reason about, verify, and refer to the results; share and reuse the knowledge; and assess data quality and validity.
T2: Lifecycle management. It is essential to facilitate efficient reuse of e-autopsy reports among stakeholders (MEs, DRs, and JP) by intelligently deriving the version, content, format, authoring manner, and viewing manner of autopsy reports based on the stakeholders' duties.
T3: Authority management. The access control system containing tailored workflows and computational tools should be designed for MEs, DRs, and JP to author and reuse the e-autopsy reports.
   Note that each of the computational tasks is specified by multiple forensic goals. These forensic goals and computational tasks can provide guidance for the design of provenance management in FORSETI.

4. Provenance-aware FORSETI
In this section, we give an overview of the provenance management in FORSETI, with a focus on its two core characteristics: the combination of FAWfMS and lmmlgit, and authority management.
   The provenance-aware FORSETI system supports the processing flows of the e-autopsy report in PA, VA, and e-court, enabling the computational tasks outlined in Section 3. Figure 2 (a) illustrates the overall picture of the provenance-aware FORSETI, where the input (A), manipulations (B, C, D), output (E), and data model of x-LMML (F1) are existing parts in the current version of FORSETI, while provenance (F2) is the primary component of this work. Fortunately, the original syntax of x-LMML in FORSETI has been well designed, facilitating the incorporation of data provenance functionalities. As shown in F2, a three-dimensional coordinate system is introduced to provide the underlying framework for the lifecycle management of e-autopsy reports in terms of "Time evaluation," "Repository," and "Computational forensics ontology." On the "Time evaluation" axis, each node represents a version of x-LMML files for a different stakeholder, such as DRs, MEs, judges, jury, or the prosecution. These x-LMML files are gradually refined through the stakeholders' processing, achieving the global transition from e-autopsy reports to e-court documents. On the "Repository" axis, each node indicates an x-LMML file storing a forensic autopsy case. Note that the third axis, "Computational forensics ontology," serves as the theoretical basis for support, organization, maintenance, specification, and extension of x-LMML files.
   As shown in Figure 2 (b), provenance functionalities in FORSETI consist of three parts: collection (T1, T2, T3), management (T2, T3), and analysis (T2, T3). In collection, the navigation interface and the FORSETI system capture mechanisms collect provenance data in x-LMML files at different granularities, such as activity duration, descriptive insights, and expertise explanation. To manage the collected provenance data, a version control system is tailored for the lifecycle management of e-autopsy reports. In analysis, by comparing the related x-LMML files, users can quickly view the differences among the autopsy results, and then utilize the process provenance of these results to reach a consensus. In the intersection of the three circles in Figure 2 (b), the core components of the three parts are positioned: x-LMML, FAWfMS and lmmlgit, and authority management. As shown at the bottom of Figure 2 (b), FAWfMS is defined under a hierarchical structure of workflow management.

Figure 2: Overall picture of provenance-aware FORSETI system. (a) Functional components of provenance-aware FORSETI system. (b) Abstract structure of provenance functions in FORSETI.

4.1. Combination of FAWfMS and lmmlgit
In our design, FAWfMS and lmmlgit (T1, T2, T3) inherit the advantages of workflow- and script-based provenance
management approaches, respectively. These advantages play an important role in the three components of the FORSETI system. In the following, we explain the roles FAWfMS and lmmlgit play in the provenance functionalities and how they are incorporated.
   From the lower-left to the top-right corner of Figure 3, the three user interfaces of FAWfMS are shown: the navigation interface, the node editor interface, and the FORSETI interface. The navigation interface is for MEs and DRs to register personal information, clarify the status of the autopsy process, and navigate to their next task. As shown on the left of Figure 3, three branches (PA, Juxtaposition, and VA) with circled numbers are in place. Each circled number represents a set of x-LMML files with their major version number. Users (MEs, DRs, and JP) can click each circle to invoke the node editor interface, where each node graph links an x-LMML file for a particular author or browser, as shown in the middle of Figure 3. A node not only supports basic operations, such as move, add, delete, and modify, but also has some special features, such as comparison analysis. The node editor can reveal the pedigree of the x-LMML files and their status, such as derived, merged, locked, in-process, and out-process. By double-clicking on the selected node, the user can access the FORSETI interface for authoring and browsing x-LMML files, as shown in the top-right corner of Figure 3.
   lmmlgit is a version control system (VCS) mainly inspired by Git [17], and acts as an expert in processing granularity information, privacy security, and data accountability. All the functionalities in FAWfMS can be carried out by lmmlgit commands, but not vice versa. In particular, if a user needs to view a specific target stored in an x-LMML file, it is difficult to use FAWfMS because of its coarser granularity, but lmmlgit can be used to access, delete, edit, and check all the targets of an x-LMML file by simply typing designated commands into the shell interface. Thus, owing to its flexibility, lmmlgit works as the back-end of FAWfMS for finer handling of process provenance documented in x-LMML files.

Figure 3: FAWfMS, lmmlgit, and authority management in provenance-aware FORSETI system.

4.2. Authority Management
To build an effective provenance-aware FORSETI system, authority management (T1, T2, T3) is proposed for e-autopsy confidentiality; it involves three components.
   The first is strict workflow designs for various stakeholders. As shown in the lower-left corner of Figure 3, four steps for monitoring users' processing are presented in the navigation interface. Through these four steps, the users' personal information, including ID, e-mail, location, affiliation, and position, is stored and verified for assigning access rights. Then, the node editor allows users to author or browse the e-autopsy reports based on the user's level of access. The second is the locking tool installed in the node editor for giving the user control over their own node. Other users can view e-autopsy reports only after obtaining permission from authors or administrators. The third component is particularly important: a well-designed access control syntax supports the first two components on the back-end. In the future development plan, the syntactic structure for JP is going to be installed in the authority management to allow the e-autopsy report to be transformed into an e-court document.
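The locking and permission rules described in this subsection can be illustrated with a small Python sketch. It is not the FORSETI implementation: the names `Role`, `User`, `ReportNode`, `lock`, `grant_view`, and `can_view` are hypothetical, and the sketch assumes that an unlocked node is visible to any verified user, which the text above does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Stakeholder roles named in the paper, plus an administrator role.
    ME = "medical examiner"
    DR = "diagnostic radiologist"
    JP = "judicial personnel"
    ADMIN = "administrator"


@dataclass(frozen=True)
class User:
    user_id: str  # hypothetical identifier assigned after verification
    role: Role


@dataclass
class ReportNode:
    """One node in the node editor, linked to a single x-LMML file."""
    xlmml_path: str
    author: User
    locked: bool = False
    granted: set = field(default_factory=set)  # user_ids allowed to view

    def lock(self, actor: User) -> None:
        # Only the author controls the lock on their own node.
        if actor != self.author:
            raise PermissionError("only the author may lock this node")
        self.locked = True

    def grant_view(self, actor: User, viewer: User) -> None:
        # Authors or administrators hand out view permission.
        if actor != self.author and actor.role is not Role.ADMIN:
            raise PermissionError("only the author or an administrator may grant access")
        self.granted.add(viewer.user_id)

    def can_view(self, user: User) -> bool:
        # Assumption: unlocked nodes are visible to every verified user.
        if user == self.author or not self.locked:
            return True
        return user.user_id in self.granted
```

In this sketch, a DR cannot view a locked node authored by an ME until the ME or an administrator calls `grant_view`; how FORSETI's access control syntax actually encodes such rules is not described in the paper.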

5. Concluding Notes
This paper is an initial report on the provenance-aware FORSETI, with an aim to empower MEs, DRs, and JP to accountably author and review e-autopsy reports using a combination of FAWfMS, lmmlgit, and authority management.
   In the future, incorporation of the provenance functionalities into the autopsy juxtaposition methods, as shown in Figure 2 (a) C, should be a priority. Since the autopsy juxtaposition methods act as a "bridge" for cross-referencing between MEs and DRs, integrating provenance functionalities with these methods will enable a more effective manner of referencing. Indeed, the provenance-supported corpse model juxtaposition allows MEs to understand the insight processes of VA findings in the augmented reality setting, which leads to trustworthy planning of PA. Similar effects can occur in wound photograph juxtaposition and illustrative scene juxtaposition. The next issue is to evaluate the provenance-aware FORSETI with domain experts using reliable and real datasets, which can empirically prove the effectiveness of our system and provide useful feedback for further improvements. The final issue to be confronted is to complete the syntax of access control for authority management for JP.

Acknowledgments
This work has been supported in part by JSPS KAKENHI under the Grants-in-Aid for Scientific Research (A) No. 26240015, 17H00737, and 21H04916.

References
[1] B. Wang, Y. Asayama, M. O. Boussejra, H. Shojo, N. Adachi, I. Fujishiro, FORSETI: A visual analysis environment for authoring autopsy reports in extended legal medicine mark-up language, The Visual Computer 37 (2021) 2951–2963.
[2] Y. Asayama, B. Wang, M. Nakayama, H. Shohjoh, N. Adachi, Y. Kiyoki, I. Fujishiro, THEMIS: Context-sensitive similarity analysis for wound imagery using mathematical model of meaning, in: Proceedings of the 2021 International Conference on Cyberworlds, IEEE, 2021, pp. 129–132.
[3] C. Lundström, T. Rydell, C. Forsell, A. Persson, A. Ynnerman, Multi-touch table system for medical visualization: Application to orthopedic surgery planning, IEEE Transactions on Visualization and Computer Graphics 17 (2011) 1775–1784.
[4] M. O. Boussejra, N. Adachi, H. Shojo, R. Takahashi, I. Fujishiro, LMML: Initial developments of an integrated environment for forensic data visualization, in: Proceedings of the 2016 EuroVis Short Papers, 2016, pp. 31–35.
[5] Y. L. Simmhan, B. Plale, D. Gannon, A survey of data provenance in e-science, SIGMOD Record 34 (2005) 31–36.
[6] J. Cheney, A. Ahmed, U. A. Acar, Provenance as dependency analysis, Mathematical Structures in Computer Science 21 (2011) 1301–1337.
[7] S. P. Callahan, J. Freire, E. Santos, C. E. Scheidegger, C. T. Silva, H. T. Vo, Vistrails: Visualization meets data management, in: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, 2006, pp. 745–747.
[8] L. M. Gadelha Jr, B. Clifford, M. Mattoso, M. Wilde, I. Foster, Provenance management in Swift, Future Generation Computer Systems 27 (2011) 775–780.
[9] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, S. Mock, Kepler: An extensible system for design and execution of scientific workflows, in: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, IEEE, 2004, pp. 423–424.
[10] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, T. Oinn, Taverna: A tool for building and running workflows of services, Nucleic Acids Research 34 (2006) W729–W732.
[11] K.-K. Muniswamy-Reddy, D. A. Holland, U. Braun, M. Seltzer, Provenance-aware storage systems, in: Proceedings of the Usenix Annual Technical Conference, 2006, pp. 43–56.
[12] J. Frew, P. Slaughter, Es3: A demonstration of transparent provenance for scientific computation, in: Proceedings of the International Provenance and Annotation Workshop, Springer, 2008, pp. 200–207.
[13] L. Murta, V. Braganholo, F. Chirigati, D. Koop, J. Freire, noWorkflow: Capturing and analyzing provenance of scripts, in: Proceedings of the International Provenance and Annotation Workshop, Springer, 2014, pp. 71–83.
[14] J.-L. R. Stevens, M. Elver, J. A. Bednar, An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook, Frontiers in Neuroinformatics 7 (2013) 44.
[15] O. Can, D. Yilmazer, Improving privacy in health care with an ontology-based provenance management system, Expert Systems 37 (2020) e12427.
[16] R. Nosowsky, T. J. Giordano, The health insurance portability and accountability act of 1996 privacy rule: Implications for clinical research, Annual Review of Medicine 57 (2006) 575–590.
[17] J. Loeliger, M. McCullough, Version Control with Git: Powerful tools and techniques for collaborative software development, O'Reilly Media, Inc., 2012.