<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Testing Computer Vision Applications: An Experience Report on Introducing Code Coverage Analysis in the Field</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iulia Nica</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franz Wotawa</string-name>
          <email>wotawag@ist.tugraz.at</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kathrin Juhart</string-name>
          <email>Kathrin.Juhartg@joanneum.at</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper we present our work in progress in defining a suitable testing and validation methodology to be used within computer vision (CV) projects. Typical quality assurance (QA) measures, targeting the applicability in real-world scenarios, are meant here to complement the research on specific computer vision methods. While inspecting the existing literature in the domain of CV performance evaluation, we first identified the main challenges the CV researchers have to deal with. Second, as every vision algorithm eventually takes the form of a software program, we followed the classic software development process and performed an in depth code coverage analysis in order to assure the quality of our test suites and pinpoint code areas that need to be reviewed. This further leaves us with the questions of which test coverage tool to prefer in our situation and whether we can introduce some specific evaluation criteria for identifying the right tool to be used within a CV project. In this article we also contribute to answering these questions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Computer vision (CV) is used today in a wide range of real-world
applications, from industrial inspection and safety relevant vehicle
functions to 3D model generation by photogrammetric methods,
medical imaging and fingerprint recognition. Although a vast
variety of literature covering evaluation techniques in subfields of the
whole topic is available, no study yet reports on testing a complete
vision system, i.e., one comprising hardware, software, data
communication and control. Obviously, the quality of CV applications has a
great impact on their usability in real-world scenarios. Hence, besides
traditional CV evaluation techniques, such as using test data sets as
input and comparing the algorithms’ output against a manually
established ground truth, we have to control the quality of the involved
applications by applying a more generic evaluation strategy.
In this context, quality assurance (QA) activities like peer reviews,
coding guidelines, or the usage of software quality tools (static and
dynamic analyzers) offer many benefits, from being able to track the
CV project’s progress and estimate its relative complexity to helping
us realize when we have achieved the desired state of quality [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Still, what is different about testing CV applications and why is it
so difficult to test whether computer vision algorithms can live up to
their claims?</p>
      <p>Regarding algorithmic correctness on one side, it is often very hard
to get a consistent and exact definition of the desired output for a
specific input. Especially in classification tasks, it is tough to decide
when the obtained results are still correct and when we are dealing
with abnormal behavior.</p>
      <p>Regarding the evaluation of the complete, often very complex
vision system on the other side, the QA team has to manage and run a
large number of tests on all levels - from unit tests to integration,
function and system tests. Therefore, one needs to understand the
system as a whole, as well as all of its components and their
interdependencies. Furthermore, we also have to cover possible
hardware faults when identifying use cases, based on the defined system
requirements and specifications. Fortunately, there are today
well-established QA practices and many quality management tools
available on the market, meant to ease the generic evaluation of products
and processes, so that the only challenge is to find a proper manner to
integrate them into the vision project.</p>
      <p>The remainder of this paper is organized as follows. In Section 2
we review the existing literature in the domain of CV performance
evaluation and introduce some basic quality assurance terms.
Afterwards, in Section 3, we identify and discuss the requirements a code
coverage tool has to fulfill in order to be used in the CV domain.
Further on, we give a short overview of our four best ranked tools.
In Section 4 we first introduce the case study and compare the tools
based on their integration with the example application. Additionally,
we present the first success story in improving our code coverage.
With Section 5 we conclude this paper.
</p>
    </sec>
    <sec id="sec-2">
      <title>Related Research</title>
      <p>
        In our work, we have first inspected the existing literature in the
domain of performance evaluation in computer vision. General
overviews of empirical evaluations were found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and will be further presented here in chronological order. They
all review the commonly used techniques for performance
characterization of algorithms in different subfields of CV.
      </p>
      <p>
        In the early 90s, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] discussed the evident lack of
performance evaluation in the literature on vision algorithms. In the
author’s opinion, this situation has been tolerated because the ability to
perform a CV task was interesting enough, so that the performance
of the new algorithm became a secondary issue. In order to quickly
design a machine vision system, which works efficiently and meets
requirements, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] suggests an analogy with a systems engineering
methodology. Thus, a well-defined protocol containing a modeling
component, an experimental component and a data analysis
component was envisioned. The modeling component would describe the
ideal input image population (real or synthetic images), the random
perturbation model (by which non-ideal images arise), the random
perturbation process (that characterizes the output random
perturbation as a function of input random perturbation) and the criterion
function (by which one can quantify the difference between the ideal
output and the computed output). The experimental component
describes the performed experiments, whilst the data analysis
determines the performance characterization based on the experimentally
observed data.
      </p>
      <p>
        In the absence of acknowledged methods for the evaluation of
algorithmic performance, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed the definition of performance
as a function of mathematical sophistication. However, as the number
and specificity of assumptions made in the mathematics underlying a
vision algorithm increase (i.e., the sophistication of an algorithm
increases), the performance of the CV application does not necessarily increase.
This is the case when the assumptions made do not match the
application characteristics. Furthermore, the need for standard databases,
evaluation protocols and scoring methods/performance metrics
available to researchers was identified by the authors.
      </p>
      <p>
        Regarding the typology of test data, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] first differentiates between
data without noise and data with noise. Moreover, they mention three
types of empirical testing: testing using real data with full control,
empirical testing with partially controlled test data and testing in an
uncontrolled environment. Depending on the distribution of the
available data into training and testing sets, test protocols have been
proposed. Another discussed issue in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is again the necessity to define
a metric, which can be used to quantify performance. The authors
associate such performance metrics with the failure modes of an
algorithm. For each type of vision algorithm, specific evaluation
metrics were defined according to the function performed by the given
algorithm. Some examples are the ROC (Receiver Operating
Characteristic) curve in case of a feature detector, the confusion matrix in
case of object recognition, or the true and false matches when
dealing with matching algorithms, such as those used in stereo or motion
estimation.
      </p>
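      <p>As a minimal illustration of such generic metrics, the following
hedged C++ sketch computes a single operating point of an ROC curve
(true and false positive rate) from detector scores, a decision threshold
and binary ground-truth labels; the scores, labels and threshold are
invented purely for illustration and do not come from any of the cited
evaluations.</p>
      <p>#include &lt;cstddef&gt;
#include &lt;cstdio&gt;
#include &lt;vector&gt;

// Hypothetical example: one (FPR, TPR) point of an ROC curve computed from
// detector scores, a fixed decision threshold and binary ground-truth labels.
int main() {
    std::vector&lt;double&gt; scores = {0.9, 0.8, 0.35, 0.6, 0.1, 0.4}; // invented scores
    std::vector&lt;int&gt;    truth  = {1,   1,   0,    1,   0,   0};   // invented labels
    const double threshold = 0.5;

    int tp = 0, fp = 0, fn = 0, tn = 0;
    for (std::size_t i = 0; i &lt; scores.size(); ++i) {
        const bool detected = scores[i] &gt;= threshold;
        if (detected &amp;&amp; truth[i])   ++tp;   // true positive
        if (detected &amp;&amp; !truth[i])  ++fp;   // false positive
        if (!detected &amp;&amp; truth[i])  ++fn;   // false negative
        if (!detected &amp;&amp; !truth[i]) ++tn;   // true negative
    }
    // Sweeping the threshold and repeating this computation yields the ROC curve.
    std::printf("TPR = %.2f, FPR = %.2f\n",
                double(tp) / (tp + fn), double(fp) / (fp + tn));
    return 0;
}</p>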
      <p>
        Similarly to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] outline two different levels of
analysis for vision systems:
- technology evaluation, which concerns the characteristics of the
algorithms using generic metrics, such as ROC curves.
Standardized data sets are used and the results are therefore repeatable
and depend on the size and scope of the test data sets. Generally,
this evaluation stage requires simple metrics related to the fulfilled
function (detection, estimation, classification).
- scenario evaluation, which concerns the system’s behavior in
particular situations - for a specific functionality with its sets of
variables (e.g., number of users, type of lighting). The test data is
based on a controlled real world and is therefore only partly
reproducible. More complex metrics are to be used here, e.g.,
system reliability expressed as mean time between failures.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] takes the topic of technology evaluation a step further by
defining a set of eight key questions, thought to highlight the best
practices and the state of evaluation methodology in several
representative areas of computer vision discipline: sensor characterization,
feature detection, shape- and grey-level-based object localization,
shape-based object indexing: recognition, lossy image and video
compression, differential optical flow, stereo vision, face recognition,
measuring structural differences in medical images. From the
guiding questions formulated in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we selected those which, in our
opinion, are the first to be answered in algorithmic testing:
1. Is there a data set for which the correct answers are known?
2. Are there data sets in common use?
3. Are there any known algorithms that can be used as benchmarks
for comparison?
4. What should we be measuring to quantify performance? What
metrics are used?
      </p>
      <p>
        Though the analysis in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] also touches other aspects of
building a complete vision system, it excludes testing the hardware.
Furthermore, the mentioned software validation is limited to ensuring
that the software implementation of an algorithm correctly
instantiates its mathematical foundation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Hence, the collected answers
for each of the considered visual tasks indicate that
performance characterization techniques are mostly application/algorithm
specific and that currently they do not refer to the integrated system
as a whole, i.e., comprising hardware, software, data communication
and control.
      </p>
      <p>
More recently published research like [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] emphasizes the
role of test data generation and test data validation in vision testing.
For the purpose of evaluating CV algorithms, there are today some
publicly available data sets, such as the FERET database [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for face
recognition algorithms, Middlebury [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and KITTI [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] test data sets
for stereo vision, or VOT datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for visual tracking. The
usage of this large amount of test images nevertheless brings some problems.
One of them is that the test data sets are not specially designed for
a particular vision application, but for a class of algorithms. Hence,
100% coverage of the possible scenarios cannot be guaranteed.
As introduced in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and further elaborated in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a solution to
this problem would be the automatic generation of datasets, so that
they contain all the typical scenes and hazards without including too
much redundancy, keeping the testing effort manageable.
      </p>
      <p>Since every vision algorithm eventually takes the form
of a software program, we see no reason why we should not take
advantage of the great progress in the domain of quality assurance and
software testing in particular. The usage of standardized QA
methods, metrics and tools can ease the work of any CV developer and
quickly improve the overall process, especially in terms of system
resilience and end-user satisfaction.</p>
      <p>
        “Quality control activities determine whether a product conforms
to its requirements, specifications, or pertinent standards” [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In
addition to the traditional testing practices, QA activities encompass
peer reviews, coding guidelines, and also the usage of software
quality tools, like static analyzers that examine source code for possible
errors, or code coverage analysis tools that can measure the actual
coverage of the software with the available test data sets. For more
information on software testing and other QA techniques we refer
the interested reader to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Code Coverage Analysis</title>
      <p>Among the first quality assurance metrics invented for systematic
software testing, code coverage is used to describe the degree to
which the source code of a program is tested by a particular test
suite. Test coverage can be used in unit testing, regression testing,
for test case order optimization, test suite augmentation or test suite
minimization.</p>
      <p>The code coverage analysis process is generally divided into code
instrumentation, data gathering, and coverage analysis. Code
instrumentation consists of inserting additional statements that
monitor the execution of the source code. The instrumentation can be
done basically at code level in a separate pre-processing phase or at
runtime.</p>
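      <p>To make the instrumentation step more concrete, the following sketch
shows, in a strongly simplified form, what a source-level pre-processing pass
conceptually produces: an extra statement before each executable line records
its execution in a counter array, which is dumped when the program finishes.
This is only an illustration of the principle; the probe macro, counter array
and report format are invented and do not correspond to any particular tool.</p>
      <p>#include &lt;cstdio&gt;

// Invented per-line hit counters written by the inserted probes.
static unsigned long g_line_hits[128] = {0};
#define COVER(line) (++g_line_hits[(line)])   // hypothetical probe macro

int clamp_positive(int x) {
    COVER(1); if (x &lt; 0) {     // probe inserted before the original statement
        COVER(2); return 0;
    }
    COVER(3); return x;
}

int main() {
    clamp_positive(-5);
    clamp_positive(7);
    // "Data gathering" and "coverage analysis" reduced to a simple dump.
    for (int line = 1; line &lt;= 3; ++line)
        std::printf("line %d executed %lu times\n", line, g_line_hits[line]);
    return 0;
}</p>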
      <p>In order to be self-contained, we briefly introduce here the most
commonly used code coverage metrics, as they might be new to the
computer vision community. We further refer to the following small
code snippet to quickly highlight their major advantages/
disadvantages in practice:</p>
      <p>if (x&gt;1 &amp;&amp; y==0) {
  z=z+1;
}
if (x==2 || z&gt;1) {
  z=z+2;
}</p>
      <p>As already mentioned, several kinds of instrumentation are
possible. The most common are for:</p>
      <p>- line or statement coverage: the tool instruments the
execution of every executable source code line; this coverage criterion
is a rather poor one, as it is completely insensitive to some control
structures and logical operators.
For instance, one could execute every statement (reaching 100%
line coverage) of our example by writing a single test case:
T1(x=2, y=0, z=4). Now, let us assume that the second
decision should have stated z&gt;0. If so, this error would not be
detected. Or perhaps the first decision should contain an or rather than
an and. This error would also go undetected.</p>
      <p>- decision or branch coverage: reports whether each decision has
a true and a false outcome at least once; this criterion is
stronger than line coverage, but it is still rather weak.
For instance, with our previous test-case inputs T1(x=2, y=0,
z=4) and a new one T2(x=3, y=1, z=1), we can reach full
decision coverage. However, if in the second decision we should
have had z&lt;1 instead of z&gt;1, the mistake would not be detected
by the two test cases.</p>
      <p>- condition coverage: one has to write enough test cases
to ensure that each condition in a decision takes on both true and
false outcomes at least once; this metric is similar to decision
coverage, but has better sensitivity to the control flow. However,
full condition coverage does not guarantee full decision coverage.
For instance, the following test cases: T3(x=1, y=0, z=4)
and T4(x=2, y=1, z=1) cover all conditions’ outcomes, but
they cover only two of four decisions’ outcomes.</p>
      <p>- function coverage: reports whether each function is called (and
how many times); it is useful during preliminary testing to quickly
find coarse deficiencies in a test suite.</p>
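      <p>To make these metrics easier to reproduce, the following self-contained
sketch wraps the snippet from above into a function and runs the test inputs
T1 to T4 mentioned in the text; which lines, decisions and conditions each
input exercises can then be checked with any coverage tool. The wrapper
function and the expected values are ours, written only for illustration.</p>
      <p>#include &lt;cassert&gt;

// The example snippet from the text, wrapped into a function.
int example(int x, int y, int z) {
    if (x &gt; 1 &amp;&amp; y == 0) {
        z = z + 1;
    }
    if (x == 2 || z &gt; 1) {
        z = z + 2;
    }
    return z;
}

int main() {
    // T1 alone already reaches 100% line coverage.
    assert(example(2, 0, 4) == 7);
    // T1 plus T2 additionally reach full decision (branch) coverage.
    assert(example(3, 1, 1) == 1);
    // T3 and T4 cover all condition outcomes, but only two of the four
    // decision outcomes, as discussed above.
    assert(example(1, 0, 4) == 6);
    assert(example(2, 1, 1) == 3);
    return 0;
}</p>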
    </sec>
    <sec id="sec-4">
      <title>CV tailored Evaluation Criteria</title>
      <p>Following the classic software development process depicted in
Figure 1, we first learned that code coverage analysis is not performed in
most CV projects. As a result, we tried to identify the
must-have and nice-to-have features of a code coverage tool to be used in
the CV application domain. Like in any tool selection process, one
has to clarify first the user’s requirements. We will further present
only those particular requirements related to computer vision
software, and neglect general questions such as: what platforms can the
tool run on, what is the target application’s language or which are the
supported compilers. We will not mention here requirements
coming from the quality assurance team, which are to be discussed in the
next section.</p>
      <p>The following list ranks the priorities of these specific features, as
discussed with CV software developers:
1. working with templates: due to the great variety of data types
(pixel and parameter types), there is a tremendous number of
templates defined in CV applications, which have to be taken into
consideration when analyzing the code coverage. Tools that
cannot handle templates appropriately are dismissed.
2. unit testing support: in our case, support for the CPPUnit unit testing
framework is needed, as this is the most frequently used framework
in C/C++ CV applications (a minimal sketch of such a test is given
after this list).
3. excluding 3rd party libraries from the coverage analysis: as most
of the CV applications make use of third party libraries, whose
analysis is obviously not desired, the tool has to provide a simple
way to hook/instrument only certain files.
4. automated testing/non-interactive testing: taking into
consideration the high complexity of the currently developed CV software,
an easy automation of the test coverage analysis is essential.
5. performance under big test data amounts: there is no doubt that
the insertion of instrumentation will increase the code size and
affect the instrumented application’s performance, i.e., it will use
more memory and run slower. A low performance overhead is of
course desired; however, considering the complexity of the target
programs, our requirement is that the analysis tool does not crash.</p>
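      <p>As a minimal sketch of what the first two requirements mean in practice,
the following CPPUnit test exercises a small templated helper for two
different parameter types; the helper, fixture and test names are invented
for illustration, and only the standard CppUnit macros and text runner
are assumed.</p>
      <p>#include &lt;cppunit/extensions/HelperMacros.h&gt;
#include &lt;cppunit/extensions/TestFactoryRegistry.h&gt;
#include &lt;cppunit/ui/text/TestRunner.h&gt;

// Hypothetical templated CV helper: clamps a pixel value to [lo, hi].
template &lt;typename PixelT&gt;
PixelT clampPixel(PixelT v, PixelT lo, PixelT hi) {
    return v &lt; lo ? lo : (v &gt; hi ? hi : v);
}

class ClampPixelTest : public CppUnit::TestFixture {
    CPPUNIT_TEST_SUITE(ClampPixelTest);
    CPPUNIT_TEST(testInt);
    CPPUNIT_TEST(testFloat);
    CPPUNIT_TEST_SUITE_END();
public:
    // Each pixel type instantiates its own copy of the template, so the
    // coverage tool must report coverage per instantiation, not only per line.
    void testInt()   { CPPUNIT_ASSERT_EQUAL(255, clampPixel&lt;int&gt;(300, 0, 255)); }
    void testFloat() { CPPUNIT_ASSERT_DOUBLES_EQUAL(1.0f, clampPixel&lt;float&gt;(2.5f, 0.0f, 1.0f), 1e-6); }
};
CPPUNIT_TEST_SUITE_REGISTRATION(ClampPixelTest);

int main() {
    CppUnit::TextUi::TestRunner runner;
    runner.addTest(CppUnit::TestFactoryRegistry::getRegistry().makeTest());
    return runner.run() ? 0 : 1;   // non-interactive, suitable for automation
}</p>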
    </sec>
    <sec id="sec-5">
      <title>Four state-of-the-art Code Coverage Tools</title>
      <p>Identifying the right tool for code coverage analysis in vision
applications can lead to major productivity improvements and implicitly
to increases in the release quality of the overall computer vision
system. Hence, various free and commercial coverage analyzers have
been inspected and compared. As a large variety of coverage metrics
exist (see the preceding summary), the QA team imposed as a
requirement that the code coverage tool should be able to measure at least
condition coverage. This requirement together with the previously
presented CV tailored evaluation criteria have led to limiting our
comparative evaluation to the following four state-of-the-art
commercial coverage tools: the C++ Coverage Validator
(http://www.softwareverify.com/cpp-coverage.php), the Squish Coco code
coverage tool (http://www.froglogic.com/squish/coco/index.php), the
BullseyeCoverage tool (http://www.bullseye.com/measurementTechnique.html),
and the Testwell CTC++ analyser (http://www.verifysoft.com/de cmtx.html).</p>
    </sec>
    <sec id="sec-6">
      <title>Case Study: Dibgen and Dibgiom Libraries</title>
      <p>Dibgen is a collection of basic C++ libraries used particularly,
but not exclusively, in computer vision applications implemented
by JOANNEUM RESEARCH (JR). Included libraries cover
basic, mostly matrix-based mathematical operations, color handling
and evaluation, as well as generic parameter storage, progress
information handling, different types of basic file IO methods often
used in computer vision, and value-to-string conversion (and
back-conversion). All the libraries are implemented using template-heavy
C++ code allowing the usage of different data types (pixel types,
parameter types) for most of the operations. In terms of volume,
Dibgen consists of approximately 100,000 LOC.</p>
      <p>The other partially analyzed collection was Dibgiom. Seen as an
OpenCV counterpart and based on Dibgen, it contains 15 libraries,
which are all used for image processing tasks. The collection consists
of approximately 9 MB of source code and approximately 255,000
LOC. We further provide a brief description of those Dibgiom
libraries which have been analyzed so far:</p>
      <p>Band: Various representations of image data in the memory (tiled
with FileIO for huge satellite data, pure memory-based for rapid
CPU access, specially aligned memory layout for acceleration
using Intel Performance Primitives, special layout for CUDA
acceleration), transparently accessible via the same interface to both
user and algorithms.</p>
      <p>BandIterator: Generic access iterators for bands, regardless
of memory layout (see above).
Calibration: Simple radiometric calibration methods.
Convolve: Image filters based on convolution (Gauss, Laplace,
etc.).
Detect: Various detectors (Extrema, Bright Spot, Corner, etc.).
Filter: General image filters (arithmetic, logic, etc.) that
convert, in principle, a pixel in the source image(s) to a pixel in the
target image.
KernelFilter: Kernel-based image filters (mean, median, etc.)
which do not calculate any convolution.
KeyPoint: Description of key points for various detectors.
Operation: Operations on images whose result or whose
source is not an image (source not an image: filling images, etc.;
target not an image: the sum of all the pixels in the image).
Pyramid: Generation of pyramid representations (Gauss, etc.).
Segmentation: Image-based operations that compute
segmentations from arbitrary source images (Watershed, RegionGrowing,
etc.).
Sift: Special version of a SIFT detector.</p>
      <p>For the Dibgen experiments we used the same unit test suites
and the same configuration for all four coverage tools. Although
each tool features more than just decision and function coverage, we
will merely present the comparison of these two types of coverage
measurements, as only these are computed by all four tools.</p>
      <p>The tests carried out for the Dibgiom experiments are also unit
tests, in which the source data is either generated directly by
the unit-test programs (usually only for very simple algorithms), or
by reading the image data from files. In the latter case, the expected
outcome is generated with other reference implementations chosen
from the literature (like MATLAB, OpenCV, etc.) and is then
compared with the outcome produced by Dibgiom.</p>
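      <p>As a hedged sketch of this second kind of unit test, the code below
compares the output of the implementation under test against an OpenCV
reference within a per-pixel tolerance. The function standing in for the
Dibgiom filter is a placeholder we invented for the sketch (here it simply
calls OpenCV itself so the example is self-contained); the real Dibgiom API
differs, and the tolerance and file name are chosen only for illustration.</p>
      <p>#include &lt;cassert&gt;
#include &lt;opencv2/imgcodecs.hpp&gt;
#include &lt;opencv2/imgproc.hpp&gt;

// Placeholder for the library call under test; in the real test this would be
// the Dibgiom implementation, whose actual interface differs from this sketch.
static cv::Mat filterUnderTest(const cv::Mat&amp; src) {
    cv::Mat dst;
    cv::GaussianBlur(src, dst, cv::Size(5, 5), 1.0);   // stand-in body
    return dst;
}

int main() {
    // Test image read from file, as described for the Dibgiom unit tests.
    cv::Mat src = cv::imread("test_input.png", cv::IMREAD_GRAYSCALE);
    assert(!src.empty());

    // Expected outcome generated with a reference implementation (OpenCV).
    cv::Mat reference;
    cv::GaussianBlur(src, reference, cv::Size(5, 5), 1.0);

    // Outcome produced by the implementation under test.
    cv::Mat actual = filterUnderTest(src);

    // Accept small per-pixel deviations (border handling, rounding).
    cv::Mat diff;
    cv::absdiff(reference, actual, diff);
    double maxErr = 0.0;
    cv::minMaxLoc(diff, nullptr, &amp;maxErr);
    assert(maxErr &lt;= 1.0);   // tolerance chosen for illustration only
    return 0;
}</p>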
      <p>In Table 1 we list the global results for the whole Dibgen test
application, while in Table 2 and Table 3 we present the coverage
results per directory. It is worth noting that with Testwell CTC++, the
coverage results are extremely low, while the other three tools
compute comparable coverage results. Table 4 depicts the running times
for the normal, uninstrumented program and for the instrumented
programs. Note that the tests were run on a notebook with Intel(R)
Core(TM) i7-4500U CPU 1.80 GHz and 8 GB of RAM running
under Windows 10 Pro. Although the running time of the program
invoked by Coverage Validator is approximately six times higher, no
source code instrumentation is involved, i.e., there is no need to
recompile or relink the target program. The only requirement is the
existence of PDB files with debug information and/or MAP files with
line number information. Therefore we chose to further use the
Coverage Validator tool for the first Dibgiom experiments. The results
can be seen in Table 5.</p>
      <p>One of the most complex and frequently used basic libraries in
the JR’s CV applications is the library ParameterPool from the
Dibgen collection. With about 17,000 LOC, the library is used to
store any kind of parameters of arbitrary types in one container. Each
parameter can be combined with validity information, access level
permission for user interface based parameter modifications, as well
as several kinds of descriptive text (unit, help text). Additionally,
parameters can be grouped together and it is possible to define several
types of parameter dependencies. Since this library is used heavily in
nearly every JR CV application, the JR developers paid particular
attention to testing it thoroughly from the very start of development.</p>
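      <p>Since the actual ParameterPool interface is proprietary, the following
hypothetical C++17 sketch only illustrates the basic idea of keeping
differently typed parameters in one container; all names are invented, and
the real library additionally offers validity information, access levels,
descriptive texts, grouping and dependencies, which are not modeled here.</p>
      <p>#include &lt;any&gt;
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;string&gt;

// Hypothetical stand-in for a ParameterPool-like container: arbitrary
// parameter types stored in one container, keyed by parameter name.
class ParameterPoolSketch {
    std::map&lt;std::string, std::any&gt; params_;
public:
    template &lt;typename T&gt;
    void set(const std::string&amp; name, T value) { params_[name] = std::move(value); }

    template &lt;typename T&gt;
    T get(const std::string&amp; name) const {
        return std::any_cast&lt;T&gt;(params_.at(name)); // throws on missing name or wrong type
    }
};

int main() {
    ParameterPoolSketch pool;
    pool.set("threshold", 0.75);                 // double parameter
    pool.set("iterations", 10);                  // int parameter
    pool.set("method", std::string("SIFT"));     // string parameter

    assert(pool.get&lt;double&gt;("threshold") == 0.75);
    assert(pool.get&lt;int&gt;("iterations") == 10);
    assert(pool.get&lt;std::string&gt;("method") == "SIFT");
    return 0;
}</p>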
      <p>However, the first code coverage analysis showed disappointing results,
especially in branch, function and line coverage, while at least file
coverage could reach nearly acceptable results (see Figure 2). More
detailed analysis showed that only 12 out of 36 source code files
had a line coverage better than 90%, while 9 files were not tested
at all (see Figure 4). Although the remaining 15 files were at least
partially tested from the line coverage point of view, their
branch coverage in particular showed very poor results. After a detailed review of
the tested source code, the test code, and the test data,
the test code was adapted in some places and some test data sets
were slightly modified.</p>
      <p>Additionally, some new test functions were developed, especially
for previously untested files or functions. One meanwhile unused
source code file could be entirely removed. Two of the untested
source code files contained only source code that is used to
disable default class behavior (making the default constructor, copy
constructor and/or assignment operator private), which makes this code
untestable by design. Altogether, all of these modifications
did not touch more than 10% of the test code, but resulted in a
huge improvement in all code coverage measures (see Figure 3). As
one can see in Figure 5, of the remaining 35 source code files,
32 now reach a line coverage above 90% (23 of which even reach 100%
- compared to only 9 before the modifications were made). The 2
still remaining untested files contain the above-mentioned disabling
source code.</p>
      <p>[Figure: Coverage Validator’s summary tab before improvements.]</p>
      <p>[Figure: Coverage Validator’s summary tab after improvements.]</p>
      <p>[Figure: Coverage Validator’s Files and Lines tab before improvements.]</p>
      <p>By improving the test code and the test data for the library chosen
as an example, 3 implementation errors were found and corrected, 2
of which can be considered to potentially cause major problems in
applications. Spending some effort on QA and improving the
coverage of the tested source code will already pay off in the near future
in several stages of the testing process, especially in regression and
integration tests.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>In this paper we presented our first steps in the direction of
constructing a generic testing and evaluation protocol for CV applications. In
our view, the performance characterization methodology in the
domain can successfully be complemented with well-known techniques
borrowed from a typical quality assurance process.</p>
      <p>The conducted experiments on JR’s source code demonstrated that
with little effort, by means of using a code coverage analysis tool for
the available unit tests, the CV developers can considerably improve
their code, and implicitly the release quality of the overall CV
system.</p>
      <p>After finishing unit/module-testing the program, we have to
perform higher-order testing, such as integration and system tests
(see Figure 1), in order to complete the testing process. Therefore,
together with JR, we analyzed the requirements and possible use
cases/hazards of one CV application, which was chosen as a
representative candidate in the Vision+ project. We paid particular
attention to the process of test case definition, with focus on: the
requirement(s) (from the requirements specification) related to a particular
test case, its prerequisites (any conditions that must be fulfilled prior
to executing the test), its detailed setup and preferred execution
procedure (automated/manual). However, as a test management
tool is usually used to accomplish this task, we further encourage CV
developers to consider the integration of such a tool in their projects. Our
colleagues from JR have already started analyzing the test
management tools available for managing functional software and
hardware testing in agile development projects. Some of the benefits
one gains are the assurance of the complete test cycle, the
repeatability of tests, as well as the automatic generation of statistics and
reports.</p>
      <p>Finally, we would like to summarize the main ideas which will
further guide the work presented in this paper. On one hand, as
resources are always limited, we have to find the right mixture of QA
techniques and to focus on specific CV pain points. In order to
do this, it is important to determine the desired quality attributes for
CV applications. On the other hand, we have to find a way to derive
applicability rules for certain sets of CV algorithmic classes. Due to
the vast diversity of CV algorithms, these tasks are rather difficult;
however, the classical hierarchy of vision systems, which groups
them into low-, mid- and high-level processing levels, could serve as
a starting point. At low-level vision, code structure and data
representation are still closely correlated (in other words, every pixel has to
be treated by some kind of operation/code), thus code improvement
by QA directly affects the data quality. For example, filtering
operations by convolution (such as those contained in our Dibgiom library)
consist of many simple code snippets executed many times, sequentially or
in parallel; thus even small code discrepancies produce large effects,
which easily propagate further to higher processing levels. Mid- and
high-level vision algorithms, on the other hand, are more difficult to
tackle, because the representations fall into one of the exponentially
many branches of different meta-data types, where often the same
meta-data can be produced by fundamentally different code pieces.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was partly funded by BMVIT/BMWFW under the COMET
programme, project no. 836630, by “Land Steiermark” through SFG
under project no. 1000033937, and by the Vienna Business Agency.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Beizer</surname>
          </string-name>
          ,
          <article-title>Black-box testing: techniques for functional testing of software and systems</article-title>
          , NY, USA,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Bowyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Jonathon</surname>
          </string-name>
          <string-name>
            <surname>Phillips</surname>
          </string-name>
          ,
          <article-title>Empirical Evaluation Techniques in Computer Vision</article-title>
          , IEEE Computer Society Press, Los Alamitos, CA, USA, 1st edn.,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Courtney</surname>
          </string-name>
          and
          <article-title>Neil A. Thacker, 'Imaging and vision systems', chapter Performance Characterisation in Computer Vision</article-title>
          : Statistics in Testing and Design,
          <volume>109</volume>
          -
          <fpage>128</fpage>
          , Nova Science Publishers, Inc., Commack,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA, (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Robert B.</given-names>
            <surname>Grady</surname>
          </string-name>
          ,
          <article-title>Practical Software Metrics for Project Management and Process Improvement</article-title>
          , Prentice-Hall, Inc., Upper Saddle River, NJ, USA,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Robert M.</given-names>
            <surname>Haralick</surname>
          </string-name>
          , '
          <article-title>Performance characterization in computer vision', CVGIP: Image Underst</article-title>
          .,
          <volume>60</volume>
          (
          <issue>2</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>249</lpage>
          , (
          <year>September 1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Matej</given-names>
            <surname>Kristan</surname>
          </string-name>
          , Jiri Matas, Ales Leonardis, Tomas Vojir, Roman P. Pflugfelder, Gustavo Ferna´ndez, Georg Nebehay, Fatih Porikli, and Luka Cehovin, '
          <article-title>A novel performance evaluation methodology for single-target trackers'</article-title>
          , CoRR, abs/1503.01313, (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Moritz</given-names>
            <surname>Menze</surname>
          </string-name>
          and Andreas Geiger, '
          <article-title>Object scene flow for autonomous vehicles'</article-title>
          ,
          <source>in Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <source>The Art of Software Testing</source>
          , New Jersey, Second Edition edn.,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jonathon</surname>
          </string-name>
          <string-name>
            <given-names>Phillips</given-names>
            , Hyeonjoon Moon, Syed A.
            <surname>Rizvi</surname>
          </string-name>
          , and
          <string-name>
            <surname>Patrick J. Rauss</surname>
          </string-name>
          , '
          <article-title>The FERET Evaluation Methodology for Face-Recognition Algorithms'</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach</source>
          . Intell.,
          <volume>22</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1090</fpage>
          -
          <lpage>1104</lpage>
          , (
          <year>October 2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Scharstein</surname>
          </string-name>
          , Heiko Hirschmller, York Kitajima, Greg Krathwohl, Nera Nesic,
          <string-name>
            <given-names>Xi</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <article-title>Porter Westling, 'High-resolution stereo datasets with subpixel-accurate ground truth</article-title>
          .', in GCPR, eds.,
          <string-name>
            <surname>Xiaoyi</surname>
            <given-names>Jiang</given-names>
          </string-name>
          ,
          <source>Joachim Hornegger, and Reinhard Koch</source>
          , volume
          <volume>8753</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>31</fpage>
          -
          <lpage>42</lpage>
          . Springer, (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Thacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Beveridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Courtney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Crum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          .
          <article-title>Performance characterisation in computer vision: A guide to best practices</article-title>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wiegers</surname>
          </string-name>
          ,
          <article-title>Peer Reviews in Software: A Practical Guide</article-title>
          , AddisonWesley,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Oliver</given-names>
            <surname>Zendel</surname>
          </string-name>
          , Wolfgang Herzner, and Markus Murschitz, '
          <article-title>VITRO - vision-testing for robustness'</article-title>
          ,
          <source>ERCIM News</source>
          , (
          <volume>97</volume>
          ), (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Oliver</given-names>
            <surname>Zendel</surname>
          </string-name>
          , Markus Murschitz,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Humenberger</surname>
          </string-name>
          , and Wolfgang Herzner, '
          <article-title>CV-HAZOP: introducing test data validation for computer vision'</article-title>
          ,
          <source>in 2015 IEEE International Conference on Computer Vision</source>
          , ICCV 2015, Santiago, Chile, December 7-
          <issue>13</issue>
          ,
          <year>2015</year>
          , pp.
          <fpage>2066</fpage>
          -
          <lpage>2074</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>