
Errors and Artefacts in Histopathological Imaging

Antony Galton

apgalton@ex.ac.uk 0

Shereen Fouad

Gabriel Landini

David Randell

0 Department of Computer Science, University of Exeter, Exeter, UK
1 School of Dentistry, Institute of Clinical Sciences, University of Birmingham, UK

Fig. 1. The Histological Imaging Pipeline

INTRODUCTION

Preparation of histological samples for digital imaging, followed by image formation and capture, forms only the start of an extended pipeline running from biopsy to diagnosis or analysis. Artefacts arising at these early stages form the best-documented part of a heterogeneous catalogue of things that can go wrong along the pipeline. The literature on this subject mainly covers problems arising during specimen preparation and how these affect what the microscopist sees. With the advent of digital microscopy and telepathology, however, new kinds of digitisation artefacts or imaging errors can arise during slide image capture, and further types of error emerge when we process, interpret, and make inferences from the digital images. In this work we classify and explain such phenomena, showing how, where, and why they arise in the imaging pipeline, so that they can at least be mitigated at the point at which they are generated.

[Figure 1 diagram: Human patient → (slicing, mounting, staining) → Microscope slide → (image formation and capture) → Digital image (pixel array) → (segmentation) → Segmented image (potentially meaningful sets of pixels) → (interpretation and labelling) → Histological model (image segments interpreted as depicting entities in tissue sample) → (quantitation, evaluation) → Diagnosis → (treatment) → Human patient]

The histological imaging pipeline (Figure 1) comprises a sequence of stages. It leads from the extraction of biological tissue from an organism, yielding a tissue sample which is then prepared for imaging and segmentation, to the application of histological theory to interpret the segmented regions as depicting actual histological entities present in the original sample. This interpretation and labelling results in a histological model for the sample; only on the basis of such a model can diagnostic inferences be made, leading to the possibility of selecting the most appropriate treatment.

THE SYSTEM OF ONTOLOGICAL LEVELS

In order to classify the different types of error or artefact in the histological imaging pipeline, we adopt the ontological framework used in (Galton et al. 2016), according to which each stage of the pipeline is characterised by an ontologically distinct assemblage of entities that are handled at that stage. We refer to these assemblages as levels; they form a series as shown in Figure 2. The inhabitants of level 3 are image segments labelled with level 0 category names; as such, level 3 models are quite distinct from the level 0 entities they represent. Such models always represent simplified versions of reality, being two-dimensional representations of three-dimensional realities, capturing only a tiny fraction of the information that could potentially be extracted from those realities by a hypothetical “omniscient” histologist.

Level 0 comprises physical entities (real biological material), whereas levels 1 upwards comprise information entities, abstract patterns that may be instantiated in physical bearers (e.g., computer memory, screen display, hard-copy printout). We distinguish digital artefacts, arising from errors in the production of an information entity (at levels 1 and above), from non-digital artefacts, arising from errors occurring wholly within level 0.
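The distinction drawn above can be written down as a small data structure. The following is only an illustrative sketch: the names `Level`, `is_information_level`, and `artefact_kind` are hypothetical, and the sketch assumes that an artefact is classified solely by the level at which the faulty entity is produced.

```python
from enum import Enum


class Level(Enum):
    """Ontological levels as listed in Figure 2 (names are illustrative)."""
    BIOLOGICAL_REALITY = 0
    CAPTURED_IMAGE = 1
    SEGMENTED_IMAGE = 2
    HISTOLOGICAL_MODEL = 3


def is_information_level(level):
    """Levels 1 and above hold information entities; level 0 holds physical ones."""
    return level.value >= 1


def artefact_kind(produced_at):
    """Classify an artefact by the level at which the faulty entity is produced.

    Assumption: errors producing an information entity give digital artefacts;
    errors occurring wholly within level 0 give non-digital artefacts.
    """
    return "digital" if is_information_level(produced_at) else "non-digital"


print(artefact_kind(Level.BIOLOGICAL_REALITY))  # a tissue fold, for instance
print(artefact_kind(Level.CAPTURED_IMAGE))      # a "dead" pixel, for instance
```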

Level 3: Histological Models

Labelled image segments interpretable as model cells, model nuclei, etc

Level 2: Segmented Images

Image segments as candidate cells, candidate nuclei, etc

Level 1: Captured Images

Pixel arrays

Level 0: Biological Reality

Tissues, cells, nuclei, etc

Biopsy samples, histological preparations, etc

LEVEL-DEPENDENT ERROR-TYPES

Here we present a brief overview of the kinds of errors that are encountered during the transition from each level to the next.

Level 0 to level 0.5. Errors here include tissue sampling errors, arising from the process of extracting tissue samples from organisms (e.g., destruction or degradation of samples, incorrectly targeted sampling, crush, splits, fragmentation, haemorrhage, tears and missing parts, scratches from a damaged microtome blade, tissue sections too thick, ill-chosen cut direction (Fig. 3a)); and tissue preparation errors, occurring during slicing, staining, and mounting (e.g., fixation failure, tissue shrinkage, folds (Fig. 3b), contamination with foreign matter or air bubbles, over- or understaining, faded stain, lack of stoichiometry of certain dyes, immunohistochemistry-related issues such as background staining and antibody cross-reactivity, misplaced tissue micro-array cores).

Level 0.5 to level 1. These are imaging errors, relating to image formation in the imaging device (e.g., a microscope), or image capture in the capture device (e.g., a camera). In each case we distinguish device errors and deployment errors:

                    Image formation                             Image capture
Device errors       Chromatic aberration (Fig. 3d)              Bayer mask errors
                    Spatial distortion                          “Hot” and “dead” pixels (Fig. 3e)
Deployment errors   Coverslip scratches                         Thermal noise
                    Uneven background illumination (Fig. 3c)    Interference and banding
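As an illustration of mitigating one of these errors at the point where it arises, uneven background illumination is commonly handled by flat-field correction. This technique is not described in the text above; the sketch below assumes that a blank-field reference image `flat` (and optionally a dark frame `dark`) is available, and the function name `flat_field_correct` is hypothetical.

```python
def flat_field_correct(raw, flat, dark=None):
    """Standard flat-field formula: (raw - dark) / (flat - dark) * mean(flat - dark).

    Images are lists of rows of numbers; `flat` is an image of an empty field,
    `dark` an image taken with no light (assumed zero if not supplied).
    """
    rows, cols = len(raw), len(raw[0])
    if dark is None:
        dark = [[0.0] * cols for _ in range(rows)]
    gain = [[flat[r][c] - dark[r][c] for c in range(cols)] for r in range(rows)]
    mean_gain = sum(sum(row) for row in gain) / (rows * cols)
    corrected = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            if gain[r][c] <= 0:
                raise ValueError("flat field must exceed dark frame everywhere")
            out_row.append((raw[r][c] - dark[r][c]) * mean_gain / gain[r][c])
        corrected.append(out_row)
    return corrected


# Synthetic example: a uniform specimen seen under left-to-right shading.
illumination = [[0.5 + 0.1 * c for c in range(4)] for _ in range(3)]
raw = [[100.0 * v for v in row] for row in illumination]
flat = [[200.0 * v for v in row] for row in illumination]
corrected = flat_field_correct(raw, flat)
spread = max(map(max, corrected)) - min(map(min, corrected))
print(spread < 1e-9)  # the shading gradient has been removed
```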

Level 1 to level 2. These are image-processing errors that occur during the process of manipulating the initially captured image in order to enable discovery of relevant information from it.

Segmentation picks out some distinguished subset of pixels in the image and treats each of its connected components (segments) as an “object”. The goal is to find segments which depict level 0 entities.
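The definition of segmentation just given can be sketched in a few lines. This is a minimal illustration, not the authors' method: it assumes the distinguished subset is chosen by a global intensity threshold and that connectivity is 4-neighbour, and the function name `segment` is hypothetical.

```python
from collections import deque


def segment(image, threshold):
    """Return the 4-connected components of the pixels with value >= threshold.

    Each segment is a set of (row, col) coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = set()
    segments = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and image[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            segments.append(component)
    return segments


# Two bright blobs joined by one faint "bridge" pixel (value 4).
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 4, 9, 9, 0],
    [0, 9, 9, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
low = segment(image, 3)   # bridge included
high = segment(image, 5)  # bridge excluded
print(len(low), len(high))
```

In this toy image the threshold alone decides whether the two blobs are reported as one object or two, which is why the resulting segments need not correspond to entities in the tissue.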

Errors occur when the technique used leads to segmented images that fail to correspond to reality. These include oversegmentation, where disconnected image segments derive from a single connected object in reality, and undersegmentation, where one segment represents a group of distinct objects in reality (Fig. 3f).

Level 2 to level 3. These are interpretation errors, involving incorrect labelling of level 2 entities by level 0 categories.

Level 3 entities are histological models, represented as image segments labelled by histological categories in conformity with theoretical expectations (e.g., nuclei should be proper parts of their cell bodies). Often the segmentation must be manipulated before category labels can be conformably assigned; such resegmentation operations (Randell et al. 2013) may introduce other errors if not deployed carefully. Uncorrected errors from any earlier stage in the pipeline may result in histological models which, though theoretically acceptable, do not correspond to reality.
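The example expectation above, that nuclei should be proper parts of their cell bodies, can be checked mechanically on labelled segments. The sketch below is a simplification under stated assumptions: segments are plain sets of pixel coordinates, "proper part" is taken as strict set inclusion, and the names `is_proper_part` and `orphan_nuclei` are hypothetical. It does not reproduce the discrete mereotopology of Randell et al. (2013).

```python
def is_proper_part(part, whole):
    """True if every pixel of `part` lies in `whole` and `whole` has further pixels."""
    return part < whole  # strict subset on Python sets


def orphan_nuclei(labelled_segments):
    """Return nucleus segments that are not a proper part of any cell segment.

    `labelled_segments` is a list of (label, pixel_set) pairs.
    """
    cells = [pixels for label, pixels in labelled_segments if label == "cell"]
    return [
        pixels
        for label, pixels in labelled_segments
        if label == "nucleus"
        and not any(is_proper_part(pixels, cell) for cell in cells)
    ]


cell = {(r, c) for r in range(5) for c in range(5)}
nucleus_inside = {(2, 2), (2, 3)}
nucleus_straddling = {(4, 4), (4, 5)}  # one pixel falls outside the cell

model = [("cell", cell), ("nucleus", nucleus_inside), ("nucleus", nucleus_straddling)]
print(orphan_nuclei(model))
```

A non-empty result flags a model that violates the expectation, which may indicate that resegmentation or relabelling is needed.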

Mitigation of interpretation errors depends on tests based on prior theoretical understanding of the target entities, e.g., typical shape and size range of nuclei. Some tests can be embedded in the segmentation process itself, resulting in level 2 entities already more nearly conformable with level 3 (Landini et al. 2016).
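A test of this kind might look like the following sketch. All thresholds are hypothetical placeholders rather than values from the literature, and bounding-box aspect ratio is used only as a crude stand-in for a proper shape descriptor; the function name `plausible_nucleus` is likewise an assumption.

```python
def plausible_nucleus(pixels, min_area=4, max_area=50, max_aspect=2.0):
    """Accept a segment as a candidate nucleus if its size and shape are plausible.

    `pixels` is a set of (row, col) coordinates. Area is the pixel count;
    shape is approximated by the aspect ratio of the bounding box.
    """
    if not pixels:
        return False
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    aspect = max(height, width) / min(height, width)
    return min_area <= area <= max_area and aspect <= max_aspect


blob = {(r, c) for r in range(3) for c in range(3)}   # compact, 9 pixels
streak = {(0, c) for c in range(12)}                  # elongated, e.g. a scratch
speck = {(5, 5)}                                      # too small, e.g. noise

print([plausible_nucleus(s) for s in (blob, streak, speck)])
```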

Beyond level 3. This is a miscellaneous collection of histological inference errors, leading to a faulty diagnosis. These can arise from faulty or incorrectly used software systems (e.g., for computer-aided diagnosis) used in digitised image analysis. Errors may occur at any stage of software development, from design, through implementation and testing, up to final deployment. Systematic consideration of such errors is relatively new to the histological imaging community, but in view of recent advances in the field it is important to recognise them as a significant class.

In pattern recognition algorithms, for example, histological images are represented by vector quantisation, where each object in the segmented image is characterised by a set of features. A variety of errors can arise from inappropriate choice of feature set.
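The effect of feature-set choice can be seen in a toy example. The sketch below uses a nearest-centroid classifier as a simple stand-in for whatever pattern recognition algorithm is actually deployed; the class names, feature names, and numbers are all invented for illustration.

```python
import math


def project(obj, feature_names):
    """Turn an object's feature dictionary into a vector over the chosen features."""
    return [obj[name] for name in feature_names]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def nearest_centroid(training, sample, feature_names):
    """Label `sample` with the class whose centroid is closest (Euclidean distance)."""
    centroids = {
        label: centroid([project(obj, feature_names) for obj in objects])
        for label, objects in training.items()
    }
    x = project(sample, feature_names)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Two hypothetical classes that differ in area but hardly at all in circularity.
training = {
    "class_a": [{"area": 20, "circularity": 0.90}, {"area": 24, "circularity": 0.80}],
    "class_b": [{"area": 80, "circularity": 0.88}, {"area": 76, "circularity": 0.78}],
}
sample = {"area": 78, "circularity": 0.86}  # by construction a class_b object

with_area = nearest_centroid(training, sample, ["area", "circularity"])
without_area = nearest_centroid(training, sample, ["circularity"])
print(with_area, without_area)
```

With the discriminating feature omitted, the same object receives a different label, even though the classifier itself is unchanged.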

In general these high-level errors arise if too much trust is placed in necessarily imperfect software; it should not be used “blindly” but under the scrutiny of a trained pathologist whose judgment can supplement or correct an otherwise highly automated process.

ACKNOWLEDGEMENTS

This work is supported by EPSRC grant EP/M023869/1 “Novel context-based segmentation algorithms for intelligent microscopy”.

REFERENCES

Galton, A., Landini, G., Randell, D. & Fouad, S. (2016), 'Ontological levels in histological imaging', in R. Ferrario & W. Kuhn, eds, 'Formal Ontology in Information Systems', IOS Press, pp. 271-284.

Landini, G., Randell, D., Fouad, S. & Galton, A. (2016), 'Automatic thresholding from the gradients of region boundaries', Journal of Microscopy.

Randell, D. A., Landini, G. & Galton, A. (2013), 'Discrete mereotopology for spatial reasoning in automated histological image analysis', IEEE Transactions on Pattern Analysis and Machine Intelligence 35(3), 568-581.