=Paper= {{Paper |id=Vol-2958/paper4 |storemode=property |title=A Test Suite for JSON Schema Containment |pdfUrl=https://ceur-ws.org/Vol-2958/paper4.pdf |volume=Vol-2958 |authors=Lyes Attouche,Mohamed-Amine Baazizi,Dario Colazzo,Yunchen Ding,Michael Fruth,Giorgio Ghelli,Carlo Sartiani,Stefanie Scherzinger |dblpUrl=https://dblp.org/rec/conf/er/AttoucheBCDFGSS21 }} ==A Test Suite for JSON Schema Containment== https://ceur-ws.org/Vol-2958/paper4.pdf
        A Test Suite for JSON Schema Containment*

 Lyes Attouche¹, Mohamed-Amine Baazizi², Dario Colazzo¹, Yunchen Ding³,
 Michael Fruth³, Giorgio Ghelli⁴, Carlo Sartiani⁵, and Stefanie Scherzinger³

 ¹ Université Paris-Dauphine & Université PSL, CNRS, LAMSADE, Paris, France
   {lyes.attouche, dario.colazzo}@dauphine.fr
 ² Sorbonne Université, LIP6 UMR 7606, France
   baazizi@ia.lip6.fr
 ³ University of Passau, Passau, Germany
   {michael.fruth, stefanie.scherzinger}@uni-passau.de
 ⁴ Dipartimento di Informatica, Università di Pisa, Italy
   ghelli@di.unipi.it
 ⁵ DIMIE, Università della Basilicata, Italy
   carlo.sartiani@unibas.it



          Abstract. JSON is a very popular data exchange format, and JSON
          Schema an increasingly popular schema language for JSON. Evidently,
          schemas play an important role in implementing conceptual models. For
          JSON Schema, there is a first generation of tools for checking whether
          one schema is contained in another. This is an important task when
          comparing schemas, and ultimately, the conceptual models that they
          capture. Testing whether such tool implementations are correct is
          difficult, since writing test cases requires a deep understanding of the
          JSON Schema language. In this demo, we present the first systematically
          generated test suite for JSON Schema containment checking. This test suite
          consists of pairs of schemas where the containment relationship is known
          by construction. Our test suite aims at covering all language features of
          JSON Schema. Applying existing containment checkers (including our
          own implementation) to our test suite, we discovered implementation
          bugs not known to us. We offer our test suite to the research community
          as well as to tool developers, hoping to contribute to the development of
          JSON Schema containment checkers.


Keywords: JSON Schema Containment Checking · Test Suite · Comparing
Conceptual Models


1   Introduction

Nowadays, the most widely used data exchange format is JSON, thanks to its
flexibility and its ability to represent both objects and sequences. JSON Schema
is increasingly adopted for specifying and validating JSON instances. Software
systems exposed as APIs rely on JSON Schema to express the shape of the
expected request and the format of the returned data. Some machine learning
libraries [4] also resort to JSON Schema to verify the input-output compatibility
of pipeline operators. Hence, it becomes paramount to develop tools for
deciding whether the set of instances of one schema is included in the set of
instances of another schema, that is, to check schema containment.

* This contribution was partly supported by the Deutsche Forschungsgemeinschaft
  (DFG, German Research Foundation) – 385808805.
  Copyright © 2021 for this paper by its authors. Use permitted under Creative
  Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1. Screenshot showing test schemas side-by-side. We use the tool Josch [9] as an
editor and for comparing the verdicts of different JSON Schema containment checkers.

    In the case of JSON, checking for schema containment can be particularly
challenging, as JSON Schema [11], the de-facto standard schema language for
JSON data [3], is extremely powerful, but also complex (especially when
negation is involved [5]). As an example, consider the assertion
"items": {"type": "integer"} in schema 2 from Figure 1 (marker 2). This assertion
states that the elements in an array can be integers only, but the assertion is
also satisfied by any JSON value that is not an array.
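
The conditional reading of this assertion can be illustrated with a few lines of Python. This is a minimal sketch, not a JSON Schema validator: the function names are hypothetical, and only Python integers are treated as JSON integers, which is a simplification.

```python
def is_integer(value):
    # Simplification: only Python ints count as JSON integers here.
    # Booleans are excluded explicitly, since bool is a subclass of int in Python.
    return isinstance(value, int) and not isinstance(value, bool)


def satisfies_items_integer(instance):
    """Check the assertion "items": {"type": "integer"} against one instance."""
    if not isinstance(instance, list):
        # The assertion only constrains arrays; any other value passes trivially.
        return True
    return all(is_integer(element) for element in instance)


print(satisfies_items_integer([1, 2, 3]))       # array of integers
print(satisfies_items_integer([1, "x"]))        # array with a string element
print(satisfies_items_integer({"foo": "bar"}))  # not an array at all
```

The third call returns True even though the object contains no integers, which is exactly the behavior described above.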
    When comparing two schemas, or two versions of some schema, a crucial
question is how these schemas relate. For instance, in Figure 1, the left schema
(here, an array of three integer constants) is contained in the right: Any instance
valid w.r.t. the left schema is also valid w.r.t. the right schema, but not vice versa.
    Syntax-driven approaches for checking schema containment cannot be easily
adapted to JSON Schema, and neither can the tree-automata approaches used for
XML types [14]. Indeed, JSON Schema is inherently more expressive than XML
Schema [13], given the presence of the uniqueItems assertion; similarly,
approaches based on stacks and regular expression inclusion algorithms cannot
cope with the constraints of JSON Schema. More sophisticated approaches are
required. First proposals have been made [2, 10], yet given the lack of a trusted
testbed, checking the correctness of such tools remains a challenge [8].

Our Contributions. We propose here a test suite for checking JSON Schema
containment. Our test suite comprises pairs of schemas and the schema containment
result. We made the deliberate decision to not hand-craft our test suite, or use
real-world examples, as this would make it vulnerable to errors. Rather, we
derive all tests from an existing ground truth, the JSON Schema Test Suite [12] (a
collection of schema validation tests, which is a different and well-explored task).
By design, our test suite covers all keywords in the JSON Schema language.

    We have made our test suite available as open source. Our Zenodo archive
at https://zenodo.org/record/5336931 also links to our GitHub repository.


2   Preliminaries

JSON data model. The grammar below captures the syntax of JSON values,
which are basic values, objects, or arrays. Basic values B include the null value,
booleans, numbers n, and strings s. Objects O represent sets of members, each
member being a name-value pair, and arrays A represent sequences of values.


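\[
\begin{array}{lll}
J ::= B \mid O \mid A & & \text{JSON expressions} \\
B ::= \texttt{null} \mid \texttt{true} \mid \texttt{false} \mid n \mid s & n \in \mathit{Num},\ s \in \mathit{Str} & \text{Basic values} \\
O ::= \{ l_1 : J_1, \ldots, l_n : J_n \} & n \geq 0,\ i \neq j \Rightarrow l_i \neq l_j & \text{Objects} \\
A ::= [ J_1, \ldots, J_n ] & n \geq 0 & \text{Arrays}
\end{array}
\]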
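
As an illustration of this grammar, the following sketch checks whether a Python value falls into the data model, assuming the usual mapping of JSON values to Python values (as produced by `json.loads`). The function name is hypothetical. Python dictionaries cannot hold duplicate keys, so the side condition on member names holds automatically; special float values such as NaN are not excluded, which is a simplification.

```python
def is_json_value(value):
    """Return True if value is a basic value (B), an object (O), or an array (A)."""
    # B: null, booleans, numbers, and strings.
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    # O: members are name-value pairs with string names; dict keys are unique.
    if isinstance(value, dict):
        return all(isinstance(name, str) and is_json_value(member)
                   for name, member in value.items())
    # A: a sequence of JSON values.
    if isinstance(value, list):
        return all(is_json_value(element) for element in value)
    return False


print(is_json_value({"a": [1, None, "x"], "b": {"c": True}}))
print(is_json_value({1: "non-string member name"}))
```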



JSON Schema. JSON Schema is a language for defining the structure of JSON
documents. The syntax and semantics of JSON Schema have been formalized
in [13] (following Draft-04), and we merely present the main keywords informally:
Assertions include required, enum, const, pattern and type, and indicate a
     test that is performed on the corresponding instance.
Applicators include the boolean operators anyOf, allOf, oneOf, not, the ob-
     ject operators properties, patternProperties, additionalProperties,
     the array operator items, and the reference operator $ref. They indicate a
     request to apply a different operator to the same instance or to a component
     of the current instance.
Annotations include title, description, and $comment. They do not affect
     validation but indicate an annotation associated with the instance.

JSON Schema Validation. In validating a JSON instance against a schema,
we check whether the instance is valid w.r.t. the schema. Implementations are
available in most common programming languages [1]. The JSON Schema Test
Suite [12] is a collection of tests for validators, covering the entire JSON Schema
language. Figure 2 shows a sample test case on the left, which we discuss further
below. Programmers write a script in their preferred programming language, to
parse this JSON-encoded input, and to perform unit tests, e.g., as done in [7].
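
Such a script might look as follows. This is a minimal sketch under stated assumptions: the test case is the one from Figure 2 (left), while `toy_validate` and `run_test_case` are hypothetical names, and the toy validator covers only the two keywords needed here. A real test harness would plug in an actual validator implementation instead.

```python
import json

# The validation test case from Figure 2 (left), without line numbers.
TEST_CASE = json.loads("""
{
  "description": "schema for items",
  "schema": {"items": {"type": "integer"}},
  "tests": [
    {"description": "valid items", "data": [1, 2, 3], "valid": true},
    {"description": "wrong type items", "data": [1, "x"], "valid": false},
    {"description": "ignore non-arrays", "data": {"foo": "bar"}, "valid": true}
  ]
}
""")


def toy_validate(schema, instance):
    """Hypothetical validator covering only "type": "integer" and "items"."""
    if schema.get("type") == "integer":
        if isinstance(instance, bool) or not isinstance(instance, int):
            return False
    if "items" in schema and isinstance(instance, list):
        if not all(toy_validate(schema["items"], element) for element in instance):
            return False
    return True


def run_test_case(test_case, validate):
    """Return the descriptions of all tests where the validator's verdict is wrong."""
    failures = []
    for test in test_case["tests"]:
        if validate(test_case["schema"], test["data"]) != test["valid"]:
            failures.append(test["description"])
    return failures


print(run_test_case(TEST_CASE, toy_validate))
```

An empty list means that the validator under test agrees with all expected verdicts of this test case.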

Example 1. We present the general idea behind the test case in Figure 2 (left).
The schema to be tested starts in line 3. It declares that if the instance is an
array, then all items must be integer-typed. The data instances to be tested are
contained in lines 9, 14, and 19. Lines 10, 15, and 20 state whether the data is
a valid instance w.r.t. the schema. Note that due to the conditional semantics
of JSON Schema, the object in line 19 is indeed a valid instance, because the
integer-type assertion only applies if the instance is an array.

Validation Test Case (left):

     1 {
     2   "description": "schema for items",
     3   "schema": {
     4     "items": {"type": "integer"}
     5   },
     6   "tests": [
     7     {
     8       "description": "valid items",
     9       "data": [1, 2, 3],
    10       "valid": true
    11     },
    12     {
    13       "description": "wrong type items",
    14       "data": [1, "x"],
    15       "valid": false
    16     },
    17     {
    18       "description": "ignore non-arrays",
    19       "data": {"foo": "bar"},
    20       "valid": true
    21     }
    22   ]
    23 }

Containment Test Case 1 (right):

     1 {
     2   "schema1": {
     3     "const": [1, 2, 3]
     4   },
     5   "schema2": {
     6     "items": {"type": "integer"}
     7   },
     8   "tests": {
     9     "s1SubsetEqOfs2": true
    10   }
    11 }

Containment Test Case 2 (right):

     1 {
     2   "schema1": {
     3     "allOf": [
     4       {"items": {"type": "integer"}},
     5       {"not":
     6         {"items": {"type": "integer"}}
     7       }
     8     ]
     9   },
    10   "schema2": false,
    11   "tests": {
    12     "s1SubsetEqOfs2": true,
    13     "s2SubsetEqOfs1": true
    14   }
    15 }

Fig. 2. Left: Test case from the JSON Schema Test Suite [12] for testing JSON Schema
validators (for Draft 6, adapted from file items.json, commit hash #fe94275). Right:
Derived test cases for JSON Schema containment. Dashed arrows indicate data flow.


JSON Schema Containment. In the following, we write S ⊆ S′ to denote that
schema S is contained in S′, meaning that any JSON instance valid w.r.t. S is
also valid w.r.t. S′. We write S ≡ S′ to denote schema equivalence, which can
be checked by double inclusion. While JSON Schema validation can be checked
in polynomial time, containment checking is EXPTIME-hard [6].
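
The definition suggests a simple way to refute containment: any instance that is valid w.r.t. S but invalid w.r.t. S′ is a counterexample. The sketch below searches a finite pool of candidate instances for such a witness. All names are hypothetical, the toy validator supports only boolean schemas, const, type integer, and items, and equality of JSON values is approximated by comparing serializations. A finite search of this kind can only disprove containment, never prove it, so it is not a containment checker.

```python
import json


def same_json(a, b):
    # Simplification: compare canonical serializations. This keeps true and 1
    # apart, but (unlike JSON Schema) also treats 1 and 1.0 as different.
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)


def toy_validate(schema, instance):
    """Hypothetical validator for boolean schemas, const, type integer, items."""
    if isinstance(schema, bool):
        return schema
    if "const" in schema and not same_json(schema["const"], instance):
        return False
    if schema.get("type") == "integer":
        if isinstance(instance, bool) or not isinstance(instance, int):
            return False
    if "items" in schema and isinstance(instance, list):
        if not all(toy_validate(schema["items"], item) for item in instance):
            return False
    return True


def find_counterexample(s1, s2, candidates):
    """Return a candidate valid w.r.t. s1 but not s2, or None if there is none."""
    for value in candidates:
        if toy_validate(s1, value) and not toy_validate(s2, value):
            return value
    return None


S1 = {"const": [1, 2, 3]}
S2 = {"items": {"type": "integer"}}
CANDIDATES = [[1, 2, 3], [1, "x"], {"foo": "bar"}, "a", None]

print(find_counterexample(S1, S2, CANDIDATES))  # no witness against S1 ⊆ S2
print(find_counterexample(S2, S1, CANDIDATES))  # witness against S2 ⊆ S1
```

For the two schemas of Figure 1, the second call finds the object {"foo": "bar"}, which is accepted by the items schema but is not the constant array.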


3   The JSON Schema Containment Test Suite

We next describe how we construct our test suite for checking JSON Schema
containment, by programmatically deriving tests from the validation test suite.
Our main approach is to use the Boolean operators not (for negation), anyOf
(for disjunction), and allOf (for conjunction), to derive pairs of schemas such
that we have a clear understanding of their containment relationship.

Example 2. Let us consider Figure 2. To the left, we see the test case of the JSON
Schema Test Suite discussed earlier. To the right, we see two derived test cases:
Containment Test Case 1 is an alternative encoding of the first validation test
from the left: schema1 declaring an array constant with three integer members
(derived from line 9 on the left) is contained in schema2 (derived from line 3 on
the left). Line 9 states this relationship.
   Test Case 2 is based on a different idea. Line 3 requires that two subschemas
be matched at the same time, which is unsatisfiable: the one in line 4 (taken from
the validation test), and its negation. In line 10, we declare the unsatisfiable
schema false. Here, schema1 and schema2 are equivalent (see lines 12 and 13).
While this may seem obvious to human observers, it can be challenging for tools.

We next describe how we systematically derive our test cases.

Reflexivity. For each schema S, it holds that S ≡ S. While this might seem
trivial, it is nevertheless a challenge for existing implementations [8].

Validation. For each valid instance v of a schema S, we encode the validation:

\[
\{\ \texttt{"const"}: v\ \} \subseteq S \tag{1}
\]
\[
\{\ \texttt{"const"}: v\ \} \not\subseteq \{\ \texttt{"not"}: S\ \} \tag{2}
\]

   Given all valid instances v1, ..., vn from a validation test, we further derive

\[
\{\ \texttt{"anyOf"}: [\,\{\ \texttt{"const"}: v_1\ \}, \ldots, \{\ \texttt{"const"}: v_n\ \}\,]\ \} \subseteq S
\]

   Correspondingly, for each invalid instance i, we derive a test case

\[
\{\ \texttt{"const"}: i\ \} \not\subseteq S
\]
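
The derivation rules of this paragraph can be sketched in Python as follows. The output layout (schema1, schema2, and a tests object with the key s1SubsetEqOfs2) follows Figure 2; the function name is hypothetical, and the actual generator in the authors' repository may be organized differently.

```python
def derive_validation_tests(test_case):
    """Derive containment test cases from one validation test case."""
    schema = test_case["schema"]
    derived = []
    valid_values = []
    for test in test_case["tests"]:
        const_schema = {"const": test["data"]}
        # Valid instance: {"const": v} is contained in S, see (1).
        # Invalid instance: {"const": i} is not contained in S.
        derived.append({"schema1": const_schema, "schema2": schema,
                        "tests": {"s1SubsetEqOfs2": test["valid"]}})
        if test["valid"]:
            # Valid instance: {"const": v} is not contained in {"not": S}, see (2).
            derived.append({"schema1": const_schema, "schema2": {"not": schema},
                            "tests": {"s1SubsetEqOfs2": False}})
            valid_values.append(test["data"])
    if valid_values:
        # The disjunction of all valid constants is contained in S.
        any_of = {"anyOf": [{"const": value} for value in valid_values]}
        derived.append({"schema1": any_of, "schema2": schema,
                        "tests": {"s1SubsetEqOfs2": True}})
    return derived


# The validation test case from Figure 2 (left).
EXAMPLE = {
    "schema": {"items": {"type": "integer"}},
    "tests": [
        {"description": "valid items", "data": [1, 2, 3], "valid": True},
        {"description": "wrong type items", "data": [1, "x"], "valid": False},
        {"description": "ignore non-arrays", "data": {"foo": "bar"}, "valid": True},
    ],
}

for case in derive_validation_tests(EXAMPLE):
    print(case)
```

For the example from Figure 2, the first derived case coincides with Containment Test Case 1.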

Empty and universal language. Given a schema S, we derive an unsatisfiable
schema, i.e., a schema equivalent to the schema false.

\[
\{\ \texttt{"allOf"}: [\,S,\ \{\ \texttt{"not"}: S\ \}\,]\ \} \equiv \texttt{false}
\]

    Similarly, we declare a schema that is universally satisfied (true), by
replacing the conjunction allOf by the disjunction anyOf:

\[
\{\ \texttt{"anyOf"}: [\,S,\ \{\ \texttt{"not"}: S\ \}\,]\ \} \equiv \texttt{true}
\]
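
Both rules can be sketched with one small helper. The function name is hypothetical; the output layout, including the keys s1SubsetEqOfs2 and s2SubsetEqOfs1 for the two directions of the equivalence, follows Containment Test Case 2 in Figure 2.

```python
import json


def derive_empty_and_universal(schema):
    """Derive the two equivalence test cases (with false and with true) for a schema."""
    negation = {"not": schema}
    # Equivalence is encoded as containment in both directions.
    both_directions = {"s1SubsetEqOfs2": True, "s2SubsetEqOfs1": True}
    empty = {"schema1": {"allOf": [schema, negation]},
             "schema2": False,
             "tests": dict(both_directions)}
    universal = {"schema1": {"anyOf": [schema, negation]},
                 "schema2": True,
                 "tests": dict(both_directions)}
    return empty, universal


empty_case, universal_case = derive_empty_and_universal({"items": {"type": "integer"}})
print(json.dumps(empty_case, indent=2))
print(json.dumps(universal_case, indent=2))
```

Applied to the items schema from Figure 2, the first result reproduces Containment Test Case 2.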

Test case statistics. We provide test cases for all adopted drafts (six at the time
of writing) of JSON Schema. Specifically, for Draft 6, we derive 2120 schema
pairs. In 60% of these pairs, the first schema is contained in the second. Because
we do not exclusively generate test cases where the first schema is contained in
the second, a containment checker that always declares the first schema to be a
subset of the second will fail in 40% of the cases.
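
The effect of this mix on a trivial checker can be reproduced with a few lines. The verdict list below is purely illustrative (ten made-up expected verdicts with the same 60/40 ratio), not data from the test suite, and the function name is hypothetical.

```python
def failure_rate(expected_verdicts, constant_answer):
    """Fraction of tests that a checker giving one fixed answer gets wrong."""
    wrong = sum(1 for expected in expected_verdicts if expected != constant_answer)
    return wrong / len(expected_verdicts)


# Illustrative only: six pairs where containment holds, four where it does not.
illustrative_verdicts = [True] * 6 + [False] * 4

print(failure_rate(illustrative_verdicts, True))   # checker that always says "contained"
print(failure_rate(illustrative_verdicts, False))  # checker that always says "not contained"
```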

Putting the Test Suite to the Test. We have used our test suite to assess the
language coverage of our own containment checker [2], as well as of two preceding
implementations [8]. For all tools, we found gaps in coverage. For our own tool [2],
we identified a failure rate of 25%. Discussing the problematic schemas in detail
is beyond the scope of this demo proposal. Overall, as the test suite was able to
reveal actual problems, we find it vastly helpful for our purposes.

4   Demonstration Scenario

We next outline our planned demonstration scenario.
1. We introduce our audience to the JSON Schema language and its semantics.
   We illustrate how this language can capture complex conceptual models.
2. We review the JSON Schema Test Suite, originally designed for validation
   experiments. We then show how we derive containment tests.
3. We engage our attendees in a mini game, where we employ our interactive
   schema management tool Josch [9]¹. We pre-load the test schemas in Josch,
   and show them side by side, as seen in Figure 1 (markers 1 and 2).
     – We ask our attendees to vote (e.g., using an online polling tool) whether
       they regard one schema to be a subschema of the other.
     – We resort to existing tools for checking JSON Schema containment (marker 3
       in the screenshot), to compare the shown schemas. Our attendees will
       notice that there are tests where the tool implementations do not agree.
     – We reveal the answer to containment checking, based on our test suite.
   We plan our mini game as an entertaining and engaging way to quickly
   gain a certain level of understanding of the JSON Schema language, and
   to recognize the need for a well-principled schema containment test suite.
   We hope that by participating in our demo, the demo attendees will find
   inspiration for their own research in the conceptual modeling field.

Acknowledgments. We thank Luca Escher (University of Passau) for code quality as-
surance and Wolfgang Mauerer (Technical University of Applied Sciences Regensburg)
for sharing the LaTeX template used to create Figure 2.


References
 1. JSON Schema Validators. Available at:
    https://json-schema.org/implementations.html (2021)
 2. Attouche, L., Baazizi, M.A., Colazzo, D., Falleni, F., Ghelli, G., Landi, C., Sartiani,
    C., Scherzinger, S.: A Tool for JSON Schema Witness Generation. In: Proc. EDBT.
    pp. 694–697 (2021)
 3. Baazizi, M.A., Colazzo, D., Ghelli, G., Sartiani, C.: Schemas And Types For JSON
    Data. In: Proc. EDBT’19. pp. 437–439 (2019)
 4. Baudart, G., Hirzel, M., Kate, K., Ram, P., Shinnar, A.: Lale: Consistent auto-
    mated machine learning. In: KDD Workshop on Automation in Machine Learning
    (2020), https://arxiv.org/abs/2007.01977
 5. Baazizi, M.A., Colazzo, D., Ghelli, G., Sartiani, C., Scherzinger, S.: An Empirical
    Study on the “Usage of Not” in Real-World JSON Schema Documents. In: Proc.
    ER (2021)
 6. Bourhis, P., Reutter, J.L., Suárez, F., Vrgoc, D.: JSON: Data model, Query lan-
    guages and Schema specification. In: Proc. PODS. pp. 123–135 (2017)
 7. Ebdrup, A.: JSON Schema Benchmark. Available at:
    https://github.com/ebdrup/json-schema-benchmark, version of commit hash: #e9c884f. (2021)
¹ Note that our demo proposal does not feature Josch as its contribution, but our
  JSON Schema containment test suite. We merely use Josch as a convenient editor.

 8. Fruth, M., Baazizi, M.A., Colazzo, D., Ghelli, G., Sartiani, C., Scherzinger, S.:
    Challenges in Checking JSON Schema Containment over Evolving Real-World
    Schemas. In: Proc. EmpER. pp. 220–230 (2020)
 9. Fruth, M., Dauberschmidt, K., Scherzinger, S.: Josch: Managing Schemas for
    NoSQL Document Stores. In: Proc. ICDE. pp. 2693–2696 (2021)
10. Habib, A., Shinnar, A., Hirzel, M., Pradel, M.: Finding Data Compatibility Bugs
    with JSON Subschema Checking. In: ISSTA. pp. 620–632 (2021)
11. json-schema-org: JSON Schema (2021), available at: https://json-schema.org
12. json-schema.org: JSON Schema Test Suite. Available online at
    https://github.com/json-schema-org/JSON-Schema-Test-Suite, commit hash: #09fd353. (2021)
13. Pezoa, F., Reutter, J.L., Suárez, F., Ugarte, M., Vrgoc, D.: Foundations of JSON
    Schema. In: Proc. WWW. pp. 263–273 (2016)
14. Tozawa, A., Hagiya, M.: XML schema containment checking based on semi-implicit
    techniques. In: CIAA. pp. 213–225 (2003)