<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing automatic accessibility testing tools</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aki Lempola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Timo Poranen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zheying Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tampere University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Accessibility aims to make services usable by people with disabilities. With 15% of the world's population living with some form of disability and an increasingly aging population, web accessibility is increasingly critical. Recent legislation reinforces this need, requiring accessible websites for all. Web accessibility evaluation ensures that a website conforms to legal requirements and meets the needs of disabled users. Automatic testing tools play an important role in this process. Previous studies have shown that tools detect different numbers of issues. In this paper, we compared three automatic accessibility testing tools: IBM Equal Access Accessibility Checker, LERA, and WAVE. We measured their coverage of WCAG success criteria, scanning speed, and the number of issues detected. Finnish e-commerce sites and a test site with known accessibility issues were used for the evaluation. This study highlights the strengths and weaknesses of the selected automatic accessibility testing tools. WAVE was the fastest tool to scan pages. IBM Accessibility Checker covered the most WCAG success criteria. The number of detected issues varied depending on the page and the type of accessibility issues present on it. On five of the six tested pages, IBM Equal Access Accessibility Checker identified the most issues; WAVE identified the most on the remaining page.</p>
      </abstract>
      <kwd-group>
<kwd>web accessibility</kwd>
        <kwd>WCAG</kwd>
        <kwd>tool comparison</kwd>
        <kwd>automatic accessibility testing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Web accessibility ensures that everyone, regardless of ability, can access and use web content. While non-disabled people may easily read, navigate, watch, and listen to media content, disabled people may not be able to access the content in the same way. Inaccessible websites exclude individuals from information and services that are increasingly delivered online.</p>
      <p>
The World Health Organization [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] estimates that about 15%
of the world’s population lives with some form of disability.
Accessibility benefits everyone, not just individuals with
disabilities [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. Aging people may experience deterioration of cognitive and/or physical skills and senses, making accessible design important [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Also, proper accessibility
design can improve user experience, especially in
challenging situations such as noisy environments, bright sunlight,
or small screens [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Web accessibility evaluation can ensure that the website
meets the needs of disabled users and complies with legal
requirements. Automatic testing tools play an important
role in identifying potential accessibility issues. However, studies have shown that tools vary in the issues they detect [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        This paper is based on a master’s thesis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] of the first
author. In this research, we compare three diferent
automatic accessibility testing tools. This study aims to answer
the following research questions:
• RQ 1 What success criteria do automatic accessibility
testing tools test?
• RQ 2 Is there a diference between selected tool
features?
• RQ 3 Do the tools detect diferent issues?
      </p>
      <p>
        The research questions are addressed by conducting a
document analysis to identify how the tools communicate
the WCAG success criteria they test and by comparing them
using Finnish e-commerce sites and a test site with known
accessibility issues. This comparison measures their
coverage of WCAG success criteria, scanning speed, and the
number of issues detected.
      </p>
      <p>
        The definition of web accessibility is widely discussed in the research literature, and there are many different definitions with different scopes. Our research adheres to the definition by WAI [
        <xref ref-type="bibr" rid="ref7">7</xref>
]: “Web accessibility means that websites, tools, and technologies are designed and developed so that people with disabilities can use them. More specifically, people can: perceive, understand, navigate, interact with the Web, and contribute to the Web”. Web accessibility is closely related to usability and inclusion in developing a Web that works for everyone.
      </p>
      <p>
        The World Wide Web Consortium (W3C), an
international community that develops web standards to ensure
the long-term growth of the Web, launched the Web
Accessibility Initiative (WAI). This initiative developed the widely adopted Web Content Accessibility Guidelines (WCAG) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
The previous version, 2.1, was released in 2018, and the latest version, 2.2, was released in October 2023. WCAG 2.2 extends the older 2.1 version, and content that conforms to 2.2 also conforms to 2.1. Thus, WCAG 2.x versions are backward compatible [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Web
accessibility guidelines, checklists, and standards such as
Web Content Accessibility Guidelines (WCAG) are used to
evaluate accessibility. They are also used in some
countries’ legislation. For example, the European Union uses
WCAG 2.1 conformance levels A and AA as standards for
web accessibility [
        <xref ref-type="bibr" rid="ref10">10</xref>
]. WCAG 2.0 is also an ISO (International Organization for Standardization) standard, ISO/IEC 40500.
      </p>
      <p>
        Figure 1 shows the structure of WCAG 2.1. At the top
level, WCAG 2.1 is divided into four principles that make
the Web accessible. Under each principle, there is a list
of guidelines that set basic goals that the authors should
follow to make the content accessible. WCAG 2.1 comprises
13 guidelines, each of which includes a set of testable success
criteria. WCAG 2.1 has 78 success criteria (30 level A, 20
level AA, and 28 level AAA). Each success criterion belongs
to one of three conformance levels A (lowest), AA, and AAA
(highest). To meet a certain conformance level of WCAG 2.1, a website needs to satisfy all success criteria of that level and of all levels below it. That means that to meet conformance level AA, the site must satisfy all success criteria of levels A and AA. For each guideline and success criterion, the WCAG document provides techniques that are either sufficient to meet the success criterion or advisory, going beyond what is needed to pass it. Advisory techniques may address accessibility issues that are not covered by any of the success criteria. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
      </p>
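      <p>As a minimal sketch of the cumulative conformance rule described above (not taken from the paper or any tool; the function name and the sample scan results are hypothetical):</p>

```python
# Illustrative sketch of WCAG's cumulative conformance rule: meeting a level
# requires satisfying every success criterion of that level and all lower levels.
LEVEL_ORDER = ["A", "AA", "AAA"]

def meets_level(target, results):
    """results maps a success criterion id to a (level, passed) pair."""
    required = LEVEL_ORDER[: LEVEL_ORDER.index(target) + 1]
    return all(passed for level, passed in results.values() if level in required)

# Hypothetical scan results for three criteria.
results = {
    "1.1.1": ("A", True),     # non-text content: passed
    "1.4.3": ("AA", True),    # contrast (minimum): passed
    "1.4.6": ("AAA", False),  # contrast (enhanced): failed
}
print(meets_level("AA", results))   # True: all A and AA criteria pass
print(meets_level("AAA", results))  # False: the AAA criterion 1.4.6 fails
```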
      <p>The four principles of WCAG 2.1 are: perceivable,
operable, understandable, and robust. Under the perceivable
principle, there is a total of four guidelines and 29 success
criteria. Perceivable means that content and user interface components must be presented in ways that users can perceive.</p>
      <p>The operable principle includes a total of five guidelines
and 29 success criteria. Operable means that all user
interface components and navigation must be reachable and
usable. The understandable principle contains a total of
three guidelines and 17 success criteria. Understandable means that the content on the site should be comprehensible for users with different backgrounds, levels of education, and language skills. The robust principle includes one guideline and three success criteria. Robust means that web pages should be robust enough to work with various user agents.</p>
      <p>
        In Finland, the act on the provision of digital services [
        <xref ref-type="bibr" rid="ref11">11</xref>
]
established accessibility requirements for public-service websites and mobile applications. The main targets of the act are public-sector websites and mobile applications, such as those of schools and authorities, but parts of the private sector, such as banks and insurance companies, are also subject to the law.
      </p>
      <sec id="sec-1-1">
        <title>2.2. Evaluating web accessibility</title>
        <p>
Web accessibility evaluation is the process of assessing how well users with disabilities can use the Web. This process
aims to find accessibility problems and possibly assess the
level of accessibility [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
Evaluating web accessibility involves assessing web content in two parts: technical content and natural information
content [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Technical content consists of the markup and
code that describes how the content is displayed and how
the user interface functions. Natural information content
includes the information contained on web pages, text,
multimedia, images, etc. Some success criteria for technical
content are easy to evaluate automatically with software.
For example, WCAG [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] success criterion 1.4.3 for minimum
contrast sets minimum requirements for contrast between
foreground text and the background. The evaluation shares
similarities with software quality assurance, where specific
test cases verify the behavior of software in a controlled
environment. However, the same rule is not trivial to evaluate
when evaluating text in images. It is hard to diferentiate
between foreground text and background in image data,
and often human input is needed. Evaluating natural
information content for accessibility is equally important as
technical content, even though it is often neglected [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
Web content changes frequently, while software is often released in discrete versions that change little over time [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
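        <p>The contrast check mentioned above can be made concrete. The sketch below implements the relative-luminance and contrast-ratio formulas defined by WCAG for success criterion 1.4.3; the function names are ours, and the example colors are hypothetical.</p>

```python
# Sketch of the WCAG contrast check (success criterion 1.4.3). Relative
# luminance is computed from linearized sRGB channels; the contrast ratio is
# (L1 + 0.05) / (L2 + 0.05), and level AA requires at least 4.5:1 for normal text.
def relative_luminance(rgb):
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0, the maximum
# Mid-grey (#777777) on white falls just short of the 4.5:1 AA threshold.
print(contrast_ratio((119, 119, 119), (255, 255, 255)) >= 4.5)  # False
```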
        <p>
          Accessibility testing tools can be evaluated in at least two
ways: using a test suite or selecting a representative sample
of websites [
          <xref ref-type="bibr" rid="ref14">14</xref>
]. Test suites comprise a set of tests, each designed to check whether a tool detects an intentionally introduced error. On the other hand, selecting a
an intentionally made error. On the other hand, selecting a
representative sample of websites allows assessment of the
tool’s performance across diverse real-world scenarios.
        </p>
        <p>
          W3C [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] lists 139 automatic tools for evaluating against
WCAG 2.0 guidelines and 85 tools for WCAG 2.1 guidelines.
The list allows filtering the tools by language, tool type,
supported formats, assistive technologies, scope of evaluation,
and licenses.
        </p>
        <p>
Errors reported by automatic testing tools can differ greatly when testing the same website [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Some tools may
report the same error multiple times, thus inflating the
number of errors [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Previous studies [
          <xref ref-type="bibr" rid="ref16 ref17 ref5">16, 5, 17</xref>
] recommended using multiple automatic tools to increase confidence in the results.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Comparison of accessibility testing tools</title>
<p>To answer RQ 1 and RQ 2, we investigated the tool documentation to determine whether the tools are transparent about which success criteria they test. Success criteria coverage was collected from the available tool documentation. We counted a success criterion as covered if the tool maps at least one issue to it, which does not necessarily guarantee detection of all potential issues within that criterion. In addition, one issue might map to multiple success criteria. To answer RQ 3, the tools were tested on three different websites, focusing only on automatically detected issues. This excludes warnings and potential accessibility issues that need human review, as well as recommendations and best practices not related to WCAG 2.1 success criteria. The study also assumes that all accessibility issues reported by the tools were real issues.</p>
      <sec id="sec-2-1">
        <title>3.1. Selecting tools</title>
        <p>
          Automatic accessibility evaluation tools were selected from
the WAI's [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] web accessibility evaluation tools list page.
The vendors and others provide information about the tools
on the page. W3C does not endorse specific tools listed
on the page. The page can assist in selecting evaluation tools by allowing users to filter according to desired tool features. [
          <xref ref-type="bibr" rid="ref15">15</xref>
] When selecting the tools, we first filtered the list to tools that check WCAG 2.1 guidelines. Then the tool type was set to browser plugins,
and supported formats were set to CSS, HTML, and images.
The list of tools was further filtered down by selecting tools
that generate evaluation results reports, and the license was
set to free software. These filters resulted in a list of five
tools. From the result list, three tools were selected for this
study. They are IBM Equal Access Accessibility Checker [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
LERA [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], and WAVE [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          All these accessibility testing tools are Chrome
extensions. We used Google Chrome version 111.0.5563.65 (Official Build) (64-bit) [
          <xref ref-type="bibr" rid="ref22">22</xref>
], with IBM Equal Access Accessibility Checker Chrome extension version 3.1.46.9999, LERA version 0.5.2, and WAVE Chrome extension version 3.2.3.
        </p>
<p>The IBM Accessibility Checker reports accessibility errors in three categories: violations, needs review, and recommendations. Figure 2 shows the result of an accessibility scan. Violations are errors detected automatically by the tool, needs review items are possible violations that need manual review, and recommendations are opportunities to apply best practices.</p>
        <p>Figure 3 shows the LERA dashboard. The dashboard
shows the number of issues found on the page and the
distribution of issues by severity. The automated issues tab
shows the issues in a list. After clicking an issue, LERA
shows details of the issue, including code snippet, impact,
issue tags that show references to guidelines, and
recommendations on how to fix the issue. Clicking the eye icon
highlights the issue location on the page.</p>
<p>WAVE presents the page with embedded icons and indicators, which convey information about the accessibility of the page. Figure 4 shows an
example scan result with a missing text alternative issue selected.
The WAVE side panel provides a summary of the scan
results. Accessibility issues are reported as errors and alerts.
WAVE also reports detected features, structural elements, and ARIA labels, so that the evaluator can manually review whether these features are implemented correctly.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Selecting websites</title>
<p>To compare the accessibility testing tools objectively, we decided to use two e-commerce websites, Vertaa.fi and Verkkokauppa.com, for their diverse content coverage,
including images, text, videos, input fields, and forms. In
addition, these e-commerce sites do not have specific legal
accessibility requirements yet. Additionally, we selected an
open-source test suite.</p>
        <p>
          Vertaa.fi is the most popular price comparison site in
Finland [
          <xref ref-type="bibr" rid="ref23">23</xref>
]. It offers a service to compare prices of products but does not sell any products itself. The service collects product price information from 243 stores and helps users search for products and find the store that offers the lowest price.
        </p>
        <p>
          Verkkokauppa.com is the most visited and well-known
retail e-commerce store in Finland [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. They sell computers,
electronics, toys, games, sports products, etc. Verkkokauppa was founded in 1992 and, in addition to its e-commerce store, has four retail stores.
        </p>
        <p>
          To compare the tools’ capabilities, we also selected an
open-source test suite developed by the Government digital
services [
          <xref ref-type="bibr" rid="ref25">25</xref>
]. The test site has 142 accessibility issues. However, we excluded two tests from our evaluation, because the embedded YouTube video for the flashing content test is no longer available, and the alternative text for an audio file is missing.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Selecting sampling pages</title>
        <p>To ensure diverse scenarios and capture a wider range of
potential accessibility issues, we strategically selected pages
beyond just the landing page of each website. A landing page is the first page most users visit before navigating to the other pages of a site, so we assume that developers paid the most attention to minimizing errors and accessibility issues there. On Verkkokauppa.com we included the landing page, the account creation page, and the customer service page in the test. On the Vertaa.fi site we tested the landing page and the cheap flights category (halvat lennot) page. On the test suite, most of the tests were on the main page, but some tests were linked to different pages; we tested both the main page and the test pages reached through those links. All the pages were tested on 6 April 2023.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.4. How the pages were tested</title>
<p>We tested each page with each tool in turn. First, we waited for the entire page to load and scrolled to the bottom of the page before initiating any tests. We then ran each tool against the same page in the same browser window, which minimizes the probability of dynamic content changing between the tests of different tools. Lastly, we repeated the steps one more time to check whether the tools produced consistent results and to collect data on their scan speed. All the tools reported identical results on both scans of every page we tested. The tools report detected accessibility issues in different ways.
Each tool checks accessibility rules that are connected with
one or more WCAG 2.1 success criteria. The study focused
on how well issues detected by the automatic testing tools
conform to WCAG. Therefore, we collected the total
number of accessibility issues and the number of issues in each
WCAG success criterion. In some cases, the number of accessibility issues may be lower than the sum of WCAG violations, because a single detected issue may be mapped to multiple success criteria.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.5. Comparison metrics</title>
<p>We compared the tools on efficiency, completeness, and the number of detected issues. Efficiency was measured by scan time, the time taken to analyze each page. A fast scan time is important for efficiently evaluating large websites with hundreds of pages. Completeness describes how fully a tool covers the WCAG success criteria; a tool is considered to cover a success criterion if it performs at least one test for that criterion. The number of detected issues is the number of automatically detected issues on a page. Issues that needed human review and recommendations were discarded, because we were interested in how well the tools detect issues automatically. While not exhaustive, we also compared a list
of features found particularly useful during testing. This
comparison helps users identify tools suited to their specific
needs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results</title>
      <sec id="sec-3-1">
        <title>4.1. Success criteria coverage</title>
        <p>
          To evaluate the transparency of each tool, we analyzed their
documentation to identify how clearly they communicate
the WCAG success criteria they test. This helps us
understand how comprehensively users can assess the tool’s
capabilities. Figure 5 shows the success criteria covered by
the tools according to the documentation [
          <xref ref-type="bibr" rid="ref18 ref20 ref26">26, 18, 20</xref>
]. In Figure 5, F stands for failure, A for alert, N for needs review, and X means that the tool did not specify whether its tests produce errors or alerts of possible errors.
        </p>
<p>Our analysis reveals that the selected tools collectively offer tests for issues and warnings across 37 of the 78 WCAG success criteria, or 47% of them. While these success criteria are covered, it is crucial
to understand that the coverage does not mean complete
testing. It simply means that the tool can detect at least one
kind of issue mapped to the success criterion. Tools also
map some detected issues to multiple success criteria.</p>
<p>The documentation of IBM Equal Access Accessibility Checker did not specify whether the automatic tests for specific WCAG success criteria produce issues or alerts of possible errors; it only reported that the success criteria were automatically tested.
When comparing the coverage of issues and alerts, IBM Accessibility Checker covers the most WCAG 2.1 success criteria, 31 out of 78, while WAVE and LERA each cover 22 success criteria when both automatically detected issues and alerts of possible issues are considered. According to the tools' documentation, LERA covers the most success criteria with fully automatic tests, 20 out of 78, while WAVE detects issues automatically for 13 out of 78 success criteria. The union of the success criteria covered by the tools shows that the selected tools cover different success criteria and thus complement each other: the union of success criteria covered by issues and alerts is 37, while the single tool with the widest coverage covered 31.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Detected accessibility issues</title>
        <p>In this section, we go over the accessibility issues detected
by the tools. We found that every tool we tested detected
accessibility issues on every tested page. The performance
of the tools seems to depend on the page and the type of
accessibility issues present on the page. One tool may find the greatest number of issues on one page but the fewest on another.</p>
        <p>Figure 6 shows the total number of issues for each success
criterion on all tested pages. IBM Accessibility Checker
reported the most issues for success criteria 4.1.2 (name, role, value), 2.4.1 (bypass blocks), 2.1.1 (keyboard), and 1.3.1
(info and relationships). WAVE reported the greatest number
of issues for success criteria 1.4.3 (contrast), 1.1.1 (non-text
content), 2.4.4 (link purpose in context), 2.4.6 (headings and
labels), and 3.3.2 (labels or instructions).</p>
        <sec id="sec-3-2-1">
          <title>4.2.1. Verkkokauppa.com</title>
          <p>Figures 7, 8, and 9 show the accessibility issues detected on
the tested pages on verkkokauppa.com. Every tool found
accessibility issues on all these pages. WAVE found a total
of 25 issues, IBM 69 issues, and LERA 17 issues.</p>
          <p>Figure 7 shows the accessibility issues detected on the
verkkokauppa.com landing page. IBM Accessibility Checker
found the most errors, with 41 accessibility issues detected, while WAVE detected 12 issues and LERA 3.</p>
<p>IBM Accessibility Checker also found issues in the greatest number of success criteria, five in total. While IBM found the most issues on the landing page, every tool found issues for success criterion 1.4.3 (low contrast): WAVE found the greatest number of issues for this criterion with 12, IBM found 11, and LERA 1.</p>
          <p>Accessibility issues identified on the account creation
page are presented in Figure 8. Again, IBM Accessibility
Checker identified the most issues. In detail, IBM
Accessibility Checker found 17 issues, and WAVE and LERA both
found 3 accessibility issues. On this page, WAVE found the most issues violating success criterion 1.4.3 (low contrast). WAVE also maps the empty form label rule to four different WCAG success criteria, 1.1.1, 1.3.1, 2.4.6, and 3.3.2, while IBM Accessibility Checker maps this issue to success criterion 4.1.2. LERA and IBM Accessibility Checker found the same
number of issues for success criteria 1.3.5 and 1.4.3. Again, WAVE found the greatest number of contrast issues with 2, while IBM and LERA each found 1.</p>
          <p>Figure 9 shows the accessibility issues found on the verkkokauppa.com customer service page. On this page, the tools produced the most similar results. All three tools found the same number of violations for success criterion 1.1.1 (non-text content): each tool found 10 issues. In addition, LERA and IBM Accessibility Checker produced identical error reports on the customer service page: both found 11 issues and mapped them to the same success criteria. Every tool scanned the customer service page in under a second, because it was the smallest of the three tested pages.</p>
        </sec>
        <sec id="sec-3-2-1a">
          <title>4.2.2. Vertaa.fi</title>
          <p>Figures 10 and 11 show the results for the pages tested on
vertaa.fi. Across the two tested pages, the totals were WAVE 205, IBM Accessibility Checker 234, and LERA 139 accessibility issues. Figure 10 shows the accessibility issues detected with each tool on the vertaa.fi landing page. WAVE detected the greatest number of issues on the landing page with a total of 139, IBM Accessibility Checker detected 112 issues, and LERA 118.</p>
<p>Figure 10 also shows that WAVE detected the greatest number of accessibility violations for success criterion 1.4.3 (contrast). Figure 7 shows some contrast issues detected on the vertaa.fi landing page; only the dark blue text has sufficient contrast. IBM Accessibility Checker was the only tool to detect any violations of success criterion 2.4.1. Every tool found the same 44 violations of success criterion 1.1.1, but WAVE also maps missing labels to this category, which is why WAVE reported more 1.1.1 violations than the other two tools.</p>
          <p>Figure 11 shows the accessibility issues detected on the
vertaa.fi flight search page. IBM Accessibility Checker found
the greatest number of errors on the flight search page,
with a total of 122 accessibility issues, WAVE found 66 issues, and LERA 21. Similarly to the vertaa.fi landing page, all the tools found the same number of success criterion 1.1.1 violations, but WAVE additionally mapped 3 missing label issues to this category, hence its larger number of
issues for success criteria 1.1.1. WAVE detected the greatest
number of success criteria 1.4.3 violations. IBM Accessibility
Checker was the only tool to detect issues for success criteria
2.1.1 and 2.4.1. IBM Accessibility Checker also detected the greatest number of violations for success criterion 4.1.2: IBM 97, LERA 5, WAVE 0.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.2.3. Test suite</title>
<p>Accessibility issues detected on the test suite are shown in Figure 12. IBM Accessibility Checker detected the greatest number of issues in the test suite. Most of the issues were for success criterion 1.3.1 and concerned data table cells missing a header or scope; IBM Accessibility Checker was the only tool that detected these issues.</p>
          <p>All the tools found the same number of issues for success
criteria 1.4.3 and 2.2.2. LERA and IBM Accessibility Checker
found the same number of issues for 7 out of 11 success
criteria. For the test using a lang attribute with an invalid value to mark a change of language, IBM Accessibility Checker and LERA mapped the issue to success criterion 3.1.2, while WAVE mapped it to success criterion 3.1.1.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Summary of test results</title>
        <p>WAVE reported the most issues for the success criterion
1.4.3 (low contrast) on every page we tested. On the test suite, however, all the tools detected all the contrast tests. This implies that test suites do not necessarily imitate the real issues found on real pages. Across the
6 pages we tested, IBM Accessibility Checker detected the
most accessibility issues overall. However, WAVE surpassed
it in finding issues on one specific page, demonstrating the
value of considering multiple tools for diverse scenarios.</p>
        <p>IBM Accessibility Checker was the only tool to report
issues for success criteria 2.1.1 and 2.4.1 on the real pages, while LERA and IBM Accessibility Checker detected issues for these categories in the test suite, and WAVE detected issues for 2.4.1 on the test site.</p>
        <p>While all tools detected the mentioned accessibility issues,
there are inconsistencies in how they mapped these issues to
specific WCAG success criteria. For example, WAVE maps
an issue of an empty or missing form label to four success
criteria 1.1.1 non-text content, 1.3.1 info and relationships,
2.4.6 headings and labels, and 3.3.2 labels and instructions.
IBM Accessibility Checker maps the same issue to a success
criterion 4.1.2 name, role, and value. Additionally, LERA
maps the missing form label to two success criteria 1.3.1
info and relationships, and 4.1.2 name, role, value. Another example is an image link with no alternative text. WAVE maps this to criteria 1.1.1 non-text content and 2.4.4 link purpose, but IBM Accessibility Checker maps it only to criterion 2.4.4 link purpose. In addition, LERA maps this issue to two criteria, 2.4.4 link purpose and 4.1.2 name, role, value. While consistent issue detection is crucial, these discrepancies in WCAG mappings can be confusing for users, especially those relying on the tools for compliance guidance. This highlights the importance of considering not only
the number of issues detected but also how tools interpret
and categorize them. Users should be aware of potential
mapping inconsistencies and may need to consult additional
resources for definitive WCAG compliance assessments.</p>
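<p>The two issue types above — unlabelled form controls and images without alternative text — can be sketched with Python's standard-library HTML parser. This is an illustrative toy only; WAVE, IBM Accessibility Checker, and LERA use far richer rules (ARIA attributes, wrapping labels, decorative images, and so on), and the skip-list of input types here is a simplification:</p>

```python
from html.parser import HTMLParser

class A11ySketch(HTMLParser):
    """Toy detector for form controls without an associated
    <label for="..."> and images missing an alt attribute."""

    def __init__(self):
        super().__init__()
        self.labelled_ids = set()  # ids referenced by <label for="...">
        self.control_ids = []      # id attribute (or None) of each control
        self.issues = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and "for" in a:
            self.labelled_ids.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            # Skip controls that typically don't need a visible label.
            if a.get("type") not in ("hidden", "submit", "button", "image"):
                self.control_ids.append(a.get("id"))
        elif tag == "img" and "alt" not in a:
            # A missing alt attribute is an issue; an explicit alt=""
            # marks a decorative image and is allowed.
            self.issues.append("image missing alt text")

    def report(self):
        for cid in self.control_ids:
            if cid is None or cid not in self.labelled_ids:
                self.issues.append("form control missing label")
        return self.issues
```

Feeding it `'<label for="q">Query</label><input id="q" type="text"><input type="text"><img src="a.png">'` flags one unlabelled control and one image without alt text.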
<p>The number of accessibility issues detected by each tool
depends significantly on the selected pages and the types
of issues present. One tool might detect more accessibility
issues than another on one page and fewer on another,
depending on the types of accessibility issues on each page.
Of the selected tools, WAVE appears to be best at detecting
issues for success criterion 1.4.3 low contrast, while IBM
Accessibility Checker appears to detect the most issues for
success criteria 4.1.2 name, role, value, 2.4.1 bypass blocks,
and 2.1.1 keyboard on the tested pages. If one tool is better
than the others at detecting one type of accessibility issue,
and that type of issue is prominent on a page, then that tool
will detect more issues on the page. As can be seen for the
vertaa.fi pages in Figures 10 and 11, WAVE detected more
issues on the landing page and IBM Accessibility Checker
more on the flight search page. For that reason, using
multiple automatic testing tools is recommended.</p>
<p>As for the scan time, WAVE was the fastest tool, LERA
the second fastest, and IBM Accessibility Checker the
slowest of the selected tools. Regarding the average scan
time per tested page, IBM Accessibility Checker was over
7 times slower than WAVE, and LERA was 4 times slower
than WAVE.</p>
        <p>Tool features were gathered while using the tools to scan
pages. The features listed in Figure 13 are not a
comprehensive list of all the features of the tools; rather, they are
the ones we found useful while using the tools. Showing the
navigation order can be a useful feature, especially as WAVE
implements it: WAVE shows what the screen reader says,
which can help in understanding how screen readers work
without the need to install and learn one. Another useful
WAVE feature is toggling the page styles, which can help
find accessibility issues hidden by styling. IBM Accessibility
Checker is the only tool that allows changing between
rulesets, and the only tool that allows one to select an element
on the page and show the issues for that element. This
feature can be useful when a page has a large number of
issues: it may be easier to select a part of the page and fix
its issues than to go over an overwhelming list.</p>
<p>All three selected tools map the detected issues to the
WCAG 2.1 guidelines, which allows the user to seek more
information about an issue. All the selected tools highlight
issues on the page and provide instructions on how to fix
them.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Discussion</title>
<p>In this research, we compared three automatic accessibility
evaluation plugins for Google Chrome in terms of efficiency,
WCAG success criteria covered, and issues detected. The
selected tools were WAVE, IBM Equal Access Accessibility
Checker, and LERA. By comparing the tools, we deepened
our understanding of automatic accessibility evaluation
tools: what these tools can test in terms of WCAG, what
issues they found on Finnish e-commerce sites, and whether
there are differences among the selected tools.</p>
<p>Regarding the WCAG success criteria covered, we found
that the combination of the tools covered 37 of the 78
success criteria. This is more than any single tool: alone,
IBM Accessibility Checker covered 31 success criteria, while
WAVE and LERA each covered 22. From this, we can see
that the tools cover not only a different number of success
criteria but also different success criteria. Thus, the tools
complement each other.</p>
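<p>The complementarity argument is a simple set union. With hypothetical success-criterion sets (the actual sets each tool covers are reported in the figures), the combined coverage exceeds that of any single tool:</p>

```python
# Hypothetical success-criterion sets for illustration only; the real
# coverage figures are 31 (IBM), 22 (WAVE), 22 (LERA), 37 combined.
ibm  = {"1.1.1", "1.3.1", "2.1.1", "2.4.1", "4.1.2"}   # 5 criteria
wave = {"1.1.1", "1.4.3", "2.4.4", "3.3.2"}            # 4 criteria
lera = {"1.3.1", "1.4.3", "3.1.2", "4.1.2"}            # 4 criteria

# The union covers criteria no single tool covers on its own, so the
# tools complement rather than duplicate each other.
combined = ibm | wave | lera
print(len(combined), max(len(ibm), len(wave), len(lera)))
```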
<p>In terms of the number of issues detected, the results
depend on the scanned page. More precisely, the number of
issues a tool detects depends on the types of accessibility
issues present on the page. Most of the issues fall under the
perceivable principle. Of the scanned pages, success criterion
1.4.3 low contrast issues had the greatest impact on the
results. IBM Accessibility Checker detected the greatest
number of accessibility issues on five of the six scanned
pages, while WAVE detected the most issues on the
remaining page. Although IBM Accessibility Checker detected
the greatest number of issues on most of the tested pages, it
also detected the fewest issues on one tested page.</p>
<p>As to the types of accessibility issues detected, WAVE
detected the greatest number of issues for four success
criteria, and IBM Accessibility Checker likewise for four
success criteria. Although the number of issues detected per
success criterion may not be a reliable metric for measuring
tool performance, the tools clearly map detected issues to
the WCAG differently. WAVE tends to map an issue to
between one and four success criteria, IBM Accessibility
Checker maps the same issues to a single success criterion,
and LERA maps an issue to one or two success criteria.</p>
      <p>
        The results align with previous studies of automatic
accessibility evaluation tools [
        <xref ref-type="bibr" rid="ref16 ref17 ref5">16, 5, 17</xref>
        ]. The tools selected for this research cover different
success criteria and complement each other. Using a
combination of different automatic accessibility testing tools
yields better results than using a single tool; thus, it is
recommended to use more than one automatic accessibility
evaluation tool. It is also important to keep in mind that
automatic accessibility testing tools cannot detect all
accessibility issues. Many accessibility requirements need human
interpretation, so it is not possible to determine conformance
to the guidelines with automatic tools alone.
      </p>
<p>The method of this research has its limitations. Firstly,
we used only automatic accessibility evaluation tools, which
can detect only a portion of the accessibility issues present
on a page. Moreover, we were interested only in the
automatically detected issues, further limiting the number
of issues these tools could report, as we discarded all
issues that needed manual review. Secondly, we assumed
that all the issues reported by the tools are true positives.
These limitations may reward a tool that reports more issues
at a cost in accuracy, and penalize tools that are more
conservative and attempt to report only real accessibility
issues.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
<p>In this study, we compared three automatic accessibility
evaluation tools, testing them on two different websites and
a test site. Future studies could expand the number of tools
and tested pages and include more different types of
websites, to gain more confidence in the results. Different types
of automatic accessibility tools could also be included.</p>
<p>Future studies could also analyse alerts of potential issues
to find out whether there are differences between tools: does
one tool report an accessibility issue as a detected issue
while other tools report the same issue as one that needs
manual review? Future studies could also manually analyse
the automatically detected issues to compare the accuracy
of the tools. A comparison against a manual conformance
review of a page could also be made, to analyse how well
the automatic tools detect issues compared to an expert
evaluator.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>World Health Organization</collab>
          ,
          <source>World report on disability summary</source>
          , https://www.who.int/publications/i/item/WHO-NMH-VIP-11.01, Accessed: 2024-5-31.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sonderegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sauer</surname>
          </string-name>
          ,
          <article-title>Implementing recommendations from web accessibility guidelines: would they also provide benefits to nondisabled users?</article-title>
          ,
          <source>Human Factors 58</source>
          (
          <year>2016</year>
          )
          <fpage>611</fpage>
          -
          <lpage>629</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Richards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. L.</given-names>
            <surname>Hanson</surname>
          </string-name>
          ,
          <article-title>Web accessibility: a broader view</article-title>
          ,
          <source>in: Proceedings of the 13th international conference on World Wide Web</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>WAI</collab>
          ,
          <source>Accessibility, Usability, and Inclusion</source>
          , https://www.w3.org/WAI/fundamentals/accessibility-usability-inclusion/, Accessed: 2023-5-8.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ismailova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Inal</surname>
          </string-name>
          ,
          <article-title>Comparison of online accessibility evaluation tools: an analysis of tool effectiveness</article-title>
          ,
          <source>IEEE Access 10</source>
          (
          <year>2022</year>
          )
          <fpage>58233</fpage>
          -
          <lpage>58239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lempola</surname>
          </string-name>
          ,
          <article-title>Comparing automatic accessibility testing tools</article-title>
          ,
          <source>Master's thesis</source>
          , Tampere University, 42 pages, available at: https://trepo.tuni.fi/handle/10024/148622 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>WAI</collab>
          ,
          <source>Accessibility intro</source>
          , https://www.w3.org/WAI/fundamentals/accessibility-intro/, Accessed: 2023-6-9.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <source>W3C accessibility standards overview</source>
          , https://www.w3.org/WAI/standards-guidelines/wcag/, Accessed: 2023-5-31.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <source>Web content accessibility guidelines (WCAG) 2.1</source>
          , https://www.w3.org/TR/WCAG21/, Accessed: 2023-7-15.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <collab>European Commission</collab>
          ,
          <source>Directive (EU) 2016/2102 of the European Parliament and of the Council of 26 October 2016 on the accessibility of the websites and mobile applications of public sector bodies</source>
          , https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016L2102, Accessed: 2024-5-31.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>Laki digitaalisten palveluiden tarjoamisesta [Act on the Provision of Digital Services]</article-title>
          , Valtiovarainministeriö (Ministry of Finance), https://www.finlex.fi/fi/laki/smur/2019/20190306, Accessed: 2024-1-31.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Brajnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yesilada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Harper</surname>
          </string-name>
          ,
          <article-title>The expertise effect on web accessibility evaluation methods</article-title>
          ,
          <source>Human-Computer Interaction 26</source>
          (
          <year>2011</year>
          )
          <fpage>246</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abou-Zahra</surname>
          </string-name>
          ,
          <article-title>Web accessibility and guidelines</article-title>
          , in: S. Harper, Y. Yesilada (Eds.),
          <source>Web Accessibility a Foundation for Research</source>
          , Springer Science &amp; Business media,
          <year>2008</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Conway</surname>
          </string-name>
          ,
          <article-title>Benchmarking web accessibility evaluation tools: measuring the harm of sole reliance on automated tests</article-title>
          ,
          <source>in: Proceedings of the 10th international cross-disciplinary conference on web accessibility</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <collab>W3C</collab>
          ,
          <article-title>Web accessibility evaluation tools list</article-title>
          , https://www.w3.org/WAI/ER/tools/, Accessed: 2023-7-15.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Padure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pribeanu</surname>
          </string-name>
          ,
          <article-title>Comparing six free accessibility evaluation tools</article-title>
          ,
          <source>Informatica Economica</source>
          <volume>24</volume>
          (
          <year>2020</year>
          )
          <fpage>15</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Frazão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Duarte</surname>
          </string-name>
          ,
          <article-title>Comparing accessibility evaluation plug-ins</article-title>
          ,
          <source>in: Proceedings of the 17th International Web for All Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <source>IBM Equal Access Accessibility Checker</source>
          , https://www.ibm.com/able/toolkit/tools/, Accessed: 2024-1-31.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <source>LERA - Website Accessibility Testing &amp; Reporting Tool</source>
          , https://advancedbytez.com/lera/, Accessed: 2024-1-31.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <article-title>Keyboard accessibility</article-title>
          , WebAIM, https://webaim.org/techniques/keyboard/, Accessed: 2023-7-15.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <article-title>WAVE web accessibility evaluation tools</article-title>
          , https://wave.webaim.org/, Accessed: 2024-1-31.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <collab>Google</collab>
          ,
          <source>Google Chrome</source>
          , https://www.google.com/chrome/, Accessed: 2023-3-21.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <collab>Vertaa</collab>
          , Vertaa.fi, https://www.vertaa.fi/info/info/, Accessed: 2023-4-3.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <collab>Verkkokauppa</collab>
          , Verkkokauppa.com, Yritystiedot (company information), https://www.verkkokauppa.com/fi/yritystiedot, Accessed: 2023-4-3.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <collab>Government Digital Service</collab>
          ,
          <article-title>Accessibility tool audit</article-title>
          , https://alphagov.github.io/accessibility-tool-audit/test-cases.html, Accessed: 2023-4-3.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <collab>Deque Labs</collab>
          ,
          <article-title>Rule descriptions</article-title>
          , https://github.com/dequelabs/axe-core/blob/4937bfa4f8d689f81fb89c71d6a292fcbdba767b/doc/rule-descriptions.md, Accessed: 2023-3-21.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>