=Paper= {{Paper |id=Vol-3166/paper08 |storemode=property |title=AgriChain: Blockchain Syntactic and Semantic Validation for Reducing Information Asymmetry In Agri-Food |pdfUrl=https://ceur-ws.org/Vol-3166/paper08.pdf |volume=Vol-3166 |authors=Pierluigi Gallo,Federico Daidone,Filippo Sgroi,Mirko Avantaggiato |dblpUrl=https://dblp.org/rec/conf/itasec/GalloDSA22 }} ==AgriChain: Blockchain Syntactic and Semantic Validation for Reducing Information Asymmetry In Agri-Food== https://ceur-ws.org/Vol-3166/paper08.pdf
AgriChain: Blockchain Syntactic and Semantic Validation for Reducing Information Asymmetry In Agri-Food
Pierluigi Gallo¹,²,³,∗, Federico Daidone², Filippo Sgroi⁴ and Mirko Avantaggiato⁵
¹ Department of Engineering, University of Palermo, 90128 Palermo, Italy
² SEEDS s.r.l., academic spin-off of the Dept of Engineering at the University of Palermo, 90141 Palermo, Italy
³ CNIT, Consorzio Nazionale Interuniversitario per le Telecomunicazioni, Italy
⁴ Department of Agricultural and Forestry Sciences, University of Palermo, Palermo, 90128, Italy
⁵ former employee of SEEDS s.r.l., 90141 Palermo, Italy


Abstract
Information asymmetry affects the actors of all segments of the agri-food supply chain and can cause many problems in the market along the production chain. Transactions of agri-food products are asymmetric because suppliers and buyers have different levels of knowledge about the provenance, value, quality, and freshness of food. Collusive relations among agri-food chain actors, especially between controlling companies and controlled ones, can cause market failures, as they influence customers' purchase decisions, and severe health accidents when food safety is compromised. This paper proposes using blockchain technology to combat information asymmetry and collusive relations. In addition to transparency, cryptography, and trust, which are natively provided by the blockchain, our approach provides a twofold mechanism for validating crowdsensed data: first, a lightweight syntax validation is run before writing data in the blockchain (providing accountability, also thanks to immutability); then, a dedicated smart contract runs semantic validation in scenarios with multiple data sources. This semantic validation may reveal collusive behaviours, downgrade colluding nodes, and exclude or down-weight their data in future validations. The smart contract seals data that pass both validations, adding metadata on data quality. Results prove the feasibility of our solution on Hyperledger Fabric under the assumption that the majority of nodes are honest. Experimental results demonstrate that our implementation of the twofold validation using smart contracts scales well with the size of the blockchain state. Our mechanism may greatly impact Product Certification and Designation of Origin, as it may be applied to check specific requirements for raw materials, products, and production processes, and to protect against the collusion of controlling consortia and certification bodies.

Keywords
Agri-food, economy, blockchain, smart contract, information asymmetry, validation




1. Introduction
In the globalised society, and above all in developed economies, consumers have paid increasing attention to the quality and safety of agri-food products, as a result of the changes that have occurred in recent years in the production and marketing of fresh and processed products of vegetable and animal origin. This structural and functional evolution of the sector is mainly due to a few factors. First, the agri-food sector now offers new products with differentiated and differentiable characteristics, including agricultural commodities, vegetable and animal productions, and highly processed and high-service products.

DLT 2022: 4th Distributed Ledger Technology Workshop, June 20, 2022, Rome, Italy
∗ Corresponding author.
pierluigi.gallo@unipa.it (Pierluigi Gallo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
   Moreover, in the agri-food system, there is a strong integration between the productive sector and the final consumer market in terms of information flows, knowledge of markets, and consumer needs and expectations. The economic and technical literature reports the growing importance of food quality and safety. These concepts are related to brands, information transparency, traceability of production and commercial chains, and the fight against counterfeiting and food fraud [24, 9]. In addition, in countries with high per capita income, current health and nutritional needs, expressed by new lifestyles, call for a rethinking of production protocols, which are increasingly attentive to resource sustainability and to the protection of environmental ecosystems and biodiversity. Finally, the continuous evolution of consumers' tastes and preferences, expressed over time by variations in demand, should not be overlooked. To manage this new scenario, public operators and institutions have provided national and international regulations, production specifications and controls, certifications and quality protection, international agreements, and trading platforms. However, modern technologies call for systems that can run autonomously, minimising human intervention in data collection and certification processes. In this context, information availability becomes fundamental for consumers, who seek valuable information to perceive and evaluate the quality of products, recognise their added value, and increase their willingness to buy or to pay more. For all these reasons, this paper analyses the role of information and novel ICT technologies in creating higher quality standards and improving the functional efficiency of agri-food markets by reducing information asymmetries on the demand side.
   The contribution of this paper combines economics and computer science: first, we explain the economic and technical implications of information asymmetry in the agri-food market; then, we explore a possible solution to reduce such asymmetry using blockchain technology. Blockchain has intrinsic traits, such as transparency, trust, and traceability, that help to reduce information asymmetry but that, alone, are not enough to guarantee data accuracy and validity. Blockchain technology provides data immutability, accountability, and traceability, but it does not guarantee data quality. In the agri-food sector, data quality is the cornerstone; therefore, a blockchain-based platform for data quality, such as AgriChain, is necessary. Using blockchain and smart contracts and applying a novel data validation methodology, we combat information asymmetry and its negative influence on the net value of investments, the ranking of agri-food companies, and their capability to access credit for financing their activities. AgriChain collects data from multiple sources and analyses them with a set of smart contracts implementing a two-step validation logic (syntactic and semantic). The syntax validation works before data are written on the blockchain; it checks both the data and the user's identity and guarantees the accountability of the written information. Then, an AgriChain smart contract applies a semantic validation that works after data are written on the blockchain and ‘seals’ them. This validation smart contract fights information asymmetry, providing transparency and data accuracy.
   Section 2 introduces the background on ontologies and blockchain. The distortions that information asymmetry causes in the food market are described in Section 3, and the actors and roles in the agri-food supply chain are discussed in Section 4. Section 5 presents the validation architecture, and Section 6 describes the AgriChain methodology for validating and assessing data quality. The experimental setup and results are presented in Section 7. Finally, related work is discussed in Section 8, and conclusions are drawn in Section 9.


2. Background
This section briefly introduces the key elements of the proposed architecture, namely ontology
and blockchain.

2.1. Ontology
In computer science, an ontology is a way to represent semantics (meaning) through the definition of categories, properties, and relationships expressed in description logic [12]. An ontological approach enables or simplifies deductive reasoning, classification, problem-solving, and information exchange among systems. Deductive reasoning is entrusted to a semantic reasoner, a piece of software capable of reasoning on formalised knowledge bases: it processes a knowledge base according to a set of rules to validate and analyse it and, therefore, to infer logical consequences. In 1999, the W3C adopted the Resource Description Framework (RDF), which became a standard in 2004. RDF is a data model used to represent ontologies; its atomic data entity is the semantic triple, a set of three entities: subject, predicate, and object. Triples represent statements on semantic data (e.g., “Alice is 30”, “Alice knows Bob”). SPARQL (SPARQL Protocol and RDF Query Language) is an SQL-like query language for retrieving and manipulating RDF data. An implementation of SPARQL is included in Apache Jena, a Java framework for developing Semantic Web applications, which includes a SPARQL endpoint and supports the Turtle (Terse RDF Triple Language) serialisation format. RDF data validation is entrusted to the Shapes Constraint Language (SHACL), which provides constraints such as cardinality, value ranges, etc. [7].
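To make the triple model concrete, the following minimal Python sketch stores the two example statements as subject-predicate-object tuples, answers a single SPARQL-like triple pattern in which `None` acts as a variable, and checks a cardinality constraint in the spirit of SHACL's `sh:minCount`/`sh:maxCount`. It is a plain-Python illustration under our own assumptions, not the API of Apache Jena or of any RDF library; the `ex:` prefix and the helper names `match` and `check_cardinality` are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

# A tiny in-memory graph: each statement is a (subject, predicate, object) triple.
# The ex: prefix and property names are illustrative assumptions.
graph: set[Triple] = {
    ("ex:Alice", "ex:age", "30"),        # "Alice is 30"
    ("ex:Alice", "ex:knows", "ex:Bob"),  # "Alice knows Bob"
}


def match(g: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Match one triple pattern; None plays the role of a SPARQL variable."""
    return sorted(t for t in g
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def check_cardinality(g: set[Triple], subject: str, predicate: str,
                      min_count: int = 0, max_count: Optional[int] = None) -> bool:
    """Cardinality check in the spirit of SHACL's sh:minCount / sh:maxCount."""
    n = len(match(g, s=subject, p=predicate))
    return n >= min_count and (max_count is None or n <= max_count)


if __name__ == "__main__":
    # Roughly: SELECT ?o WHERE { ex:Alice ex:knows ?o }
    print([o for _, _, o in match(graph, s="ex:Alice", p="ex:knows")])  # ['ex:Bob']
    print(check_cardinality(graph, "ex:Alice", "ex:age", 1, 1))  # True
    print(check_cardinality(graph, "ex:Bob", "ex:age", 1, 1))    # False
```

A real deployment would express the query in SPARQL and the constraint as a SHACL shape; the sketch only shows what the two operations compute.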

2.2. Blockchain
Blockchain is a distributed technology that provides append-only data storage. Each member of the distributed network (node) holds its own replica of the data, on which it tracks every resource exchange (transaction) between participants. Transactions are grouped into blocks, and each block is linked to the previous one through a content hash, forming a chain. Members take part in validating transactions and adding them to blocks through a distributed consensus algorithm. There are several types of consensus protocols, the best known being Proof of Work (PoW), Proof of Stake (PoS), and Byzantine Fault Tolerance (BFT). Ethereum was the first major blockchain platform to introduce general-purpose smart contracts, small programs for validating transactions and performing computation in a distributed way. Ethereum is a permissionless blockchain, where anyone can join the network and take part in the consensus protocol.
   Conversely, in permissioned blockchains, such as Hyperledger Fabric (HLF), participants need special permissions to join. HLF is part of the broader Hyperledger project, which includes other distributed ledgers, libraries, and tools, and is supported by the Linux Foundation. In HLF, smart contracts are called chaincodes; they can read the ledger (query operations) and write to it (invoke operations). Each ledger belongs to a channel, and the nodes participating in a channel can read and write its ledger and invoke its smart contracts. An HLF instance can manage multiple channels and, therefore, multiple ledgers, defining different levels of scope for each node.
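The hash linking described at the start of this subsection can be illustrated with a minimal Python sketch: each block stores the SHA-256 digest of its predecessor, so altering any old block breaks the link held by the next one. Consensus, signatures, and peers are deliberately left out; the block layout, the JSON serialisation, and the names `Block`, `append`, and `is_valid` are our own assumptions, not HLF's or Ethereum's data structures.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    transactions: list
    prev_hash: str

    def digest(self) -> str:
        """Content hash over the block's fields, including the previous block's hash."""
        payload = json.dumps(
            {"index": self.index, "transactions": self.transactions, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list[Block], transactions: list) -> None:
    """Append-only: a new block always points to the hash of the current tip."""
    prev = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), transactions, prev))


def is_valid(chain: list[Block]) -> bool:
    """Each block must reference the content hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))


if __name__ == "__main__":
    chain: list[Block] = []
    append(chain, [{"from": "farmer", "to": "mill", "kg": 10}])
    append(chain, [{"from": "mill", "to": "warehouse", "litres": 2}])
    print(is_valid(chain))  # True
    chain[0].transactions[0]["kg"] = 12  # tamper with an old block
    print(is_valid(chain))  # False
```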
   Since version 2.0, HLF supports chaincodes as external services. In this case, chaincode management is independent of the peer node, and we can define the endpoint where the chaincode is executed.¹ At this endpoint, we can also run more complex services that the chaincode can invoke, as in [18], where external chaincodes are used to query external data sources. The call can be made within a single chaincode execution or, for longer processing times, the chaincode can exploit the oracle paradigm [5]. In this case, the chaincode emits an event that the service intercepts to start computing the request; when the service has finished processing, it returns the output to the chaincode.
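The oracle-style interaction above can be sketched as a request, event, and callback cycle. The Python below is a toy, in-process model under our own assumptions: `ToyChaincode`, `make_service`, and the request-id format are hypothetical, and nothing here is HLF's event API. For simplicity the service answers synchronously inside the event handler; in a real deployment the event would be delivered asynchronously and the callback would be a separate transaction.

```python
import itertools
from typing import Callable

Listener = Callable[[str, dict], None]


class ToyChaincode:
    """Hypothetical stand-in for a chaincode that delegates long computations to an oracle."""

    def __init__(self) -> None:
        self.listeners: list[Listener] = []
        self.pending: dict[str, dict] = {}
        self.results: dict[str, object] = {}
        self._ids = itertools.count(1)

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def request(self, payload: dict) -> str:
        """Record the request and emit an event instead of computing in-line."""
        req_id = f"req-{next(self._ids)}"
        self.pending[req_id] = payload
        for listener in self.listeners:
            listener(req_id, payload)
        return req_id

    def callback(self, req_id: str, output: object) -> None:
        """Invoked by the off-chain service once its processing has finished."""
        if req_id not in self.pending:
            raise KeyError(f"unknown request {req_id}")
        del self.pending[req_id]
        self.results[req_id] = output


def make_service(cc: ToyChaincode) -> Listener:
    """Hypothetical off-chain service: here it just sums a list of weights."""
    def on_event(req_id: str, payload: dict) -> None:
        cc.callback(req_id, sum(payload["weights"]))
    return on_event


if __name__ == "__main__":
    cc = ToyChaincode()
    cc.subscribe(make_service(cc))
    rid = cc.request({"weights": [10, 12, 9]})
    print(rid, cc.results[rid], cc.pending)  # req-1 31 {}
```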


3. Information asymmetry and market distortion
From an economic point of view, it is well known that the perceived quality of food products can be scored on a scale ranging from optimal to poor without questioning their potential edibility. However, the hygienic and sanitary safety of products reaching final consumer markets is challenging to evaluate. Consumers have shown great interest in the features defining food quality, thanks to a greater spending capacity and a more sensitive context than in the past. Food quality is a multidimensional and dynamic concept [14]. Quality is “a complex value whose definition involves objective and subjective components. For this reason, quality is not a characteristic that can be immediately described or identified. However, it is an idea that each of us has concerning what we need to satisfy a specific need. The more the characteristics of a product correspond to the complex expectations we have concerning it, the more we will be inclined to consider its quality” [25]. It becomes essential to deepen the analysis of how qualitative aspects are perceived, combining technical quality indicators with measures and models for interpreting customer satisfaction within the theoretical context of the economics of information. Indeed, placing certified quality products on the market increases production costs and, therefore, prices. Certification thus requires estimating the economic value that customers attribute to perceived quality and evaluating the price premium associated with their different and greater willingness to pay.
   Information affects the functioning of markets from a twofold perspective. On the one hand, the information asymmetry between supply and demand is “controlled” and “managed” through trademark, certification, and labelling policies for agri-food products. On the other hand, national and international public and private organisations and institutions preside over voluntary standardisation and establish rules and procedures for controlling market transaction costs. They check company and collective brands as precise signals of quality and value, and they help strengthen the operating conditions necessary for exchange, contributing to reducing the information asymmetry typical of imperfect markets [1, 20].

¹ Available at https://hyperledger-fabric.readthedocs.io/en/release-2.4/cc_service.html
   The quality of food production and the economic efficiency of markets are closely connected to each other and to the growing role of information. However, this situation does not always safeguard the security and correctness of information or the freedom of choice of informed consumers. From the point of view of the economic efficiency of markets, these elements create functional distortions in agri-food markets that can prevent them from working correctly according to economic theory. These specific conditions seem to penalise producers and consumers simultaneously, disturbing the natural relationship between supply and demand that drives short- and long-term market equilibrium.


4. AgriChain actors and roles
The agri-food supply chain is composed of segments that cooperate to carry the production process from field to fork. Information asymmetry typically manifests itself in the last segment of the supply chain, affecting final customers, but in many cases it also influences other actors. The various segments contribute to a holistic view of the good, including production and transformation processes. In case of partial or inaccurate information, two consecutive parts of the supply chain (e.g., production, transportation, transformation, storage) may experience information asymmetry too. For example, farmers know the history of the grain they grow: origin, timing, and treatments. This information may be hidden from the miller, whose knowledge is limited to storage in silos and the milling process. The same lack of knowledge occurs between millers and distributors and, more generally, in all the steps between different actors. The chain of value and responsibility that links these actors from farm to fork is affected by information asymmetry in all its links.
   Farmers and industries need prompt and trusted information to make better decisions when growing or processing agri-food products. The introduction of blockchain in the agri-food sector has represented a digital innovation aimed at increasing business income by reducing production inputs (and therefore costs, expressed at constant prices) and increasing outputs (the quantity produced and therefore revenues, expressed at constant prices). Digital innovation always aims to increase the company's competitiveness and its technical and economic efficiency by optimising production factors and reducing variable costs. For example, accurate information on the state of plants saves irrigation water by avoiding unnecessary waste. The same holds for fertilisers and pesticides when seasonal trends and infections are known. These decisions change the structure of production costs and positively affect the entrepreneur's net income.
   Information asymmetry negatively influences production and marketing choices, and the resulting problems along the supply chain may lead to market failure. An important issue concerns the certification of products with a designation of origin. Such certifications are characterised by strict requirements and are guaranteed by consortia and certification bodies. However, collusive relations may arise between controlling and controlled entities; such relations are difficult to discover and strongly affect the market. A recent example is the production of ham under the Protected Designations of Origin (PDOs) “San Daniele” and “Parma”, which require the use of a specific breed of pigs. A collusive system within the protection consortium eluded controls on the genetics of the pigs and, in breach of the production specification, put on the shelves products whose PDO was not valid. The effects of information asymmetry concern both product quality and health, as in the case of pistachios, whose origin has implications in terms of aflatoxin and ochratoxin contamination and may cause risks for consumers' health [21]. This example shows that information asymmetry may have different facets: the consumer needs to know a product's provenance, but this information is not sufficient if it is not linked to the risks of products from a specific area.

Figure 1: Blockchain node representation of our validation system. It includes a SPARQL endpoint, a syntactic validator, and a semantic validator. [Diagram: a blockchain peer runs the syntacticSC and semanticSC smart contracts, which interact with the Apache Jena services (SHACL syntactic validator, semantic validator, SPARQL endpoint) and with the data and metadata channels; numbers 1 to 5 mark the processing steps starting from the input data.]


5. Validation Architecture
To combat information asymmetry, we propose AgriChain, a blockchain-based platform for syntactic and semantic validation that executes external smart contracts on HLF (see Section 2.2). In this way, we can run complex services within a blockchain node, such as Apache Jena, a free and open-source Java framework for building semantic applications [3], which would otherwise be impossible to implement as conventional smart contracts. Jena provides a SPARQL endpoint (Fuseki) and SHACL-based syntactic and semantic validators. As shown in Figure 1, each node runs two smart contracts (in yellow in the figure) that interface with these services. We use HLF channels to separate the essential information from metadata and to facilitate their handling: traceability information is stored in the data channel, while the metadata channel stores the elements needed by the validation operations. Note that semantic validation works on datasets rather than on single transactions.
5.1. Syntactic validation
The syntactic validation takes place before the information is stored on the blockchain and filters each single input. The smart contract syntacticSC (step 1 in Figure 1) receives the input data and validates its signature. This step is shown in Algorithm 1, where syntacticSC takes dataIn as input parameter and passes it to the signatureValidation() function (Line 2). If that check succeeds, syntacticSC proceeds with syntactic validation by calling the syntacticValidation() function (Line 3), described in Section 6, which invokes the syntactic validator service (step 2 in Figure 1). This validation must be customised to the context of the application, for example, to verify that a “weight” field has a numeric value expressed in kg. When the validation succeeds, we map dataIn into a dataOut format (Line 4) suitable for the SPARQL endpoint and load it there (Line 5; step 3 in Figure 1). We assume that the reference ontology has been written on the blockchain and imported into the SPARQL endpoint before the data collection process starts. Writing the ontology on the blockchain guarantees interoperability and transparency in the definitions of products and of the links between them. Finally, we also store dataOut on the data channel (Line 6).

Algorithm 1 syntacticSC
Require: dataIn as input data
 1: procedure syntacticSC(dataIn)
 2:    if signatureValidation(dataIn) then
 3:        if syntacticValidation(dataIn) then
 4:            dataOut ← mapping(dataIn)
 5:            putSPARQL(dataOut)
 6:            putBC(dataOut)
 7:        end if
 8:    end if
 9: end procedure
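Algorithm 1 can be turned into a small runnable sketch. The Python below follows the same control flow, with comments pointing at the algorithm's line numbers, but every concrete choice is an assumption for illustration: an HMAC with a per-actor shared key stands in for the actors' real public-key signatures, the only syntactic rule is the “weight in kg” example from the text, the `ex:` predicates in `mapping` are hypothetical, and an in-memory set and list stand in for the SPARQL endpoint and the data channel.

```python
import hashlib
import hmac
import json

# Hypothetical actor registry: HMAC keys stand in for the actors' real
# public-key certificates, purely to keep the sketch self-contained.
ACTOR_KEYS = {"farmer01": b"demo-key-farmer01"}

sparql_store: set[tuple[str, str, str]] = set()  # stand-in for the SPARQL endpoint
data_channel: list[dict] = []                     # stand-in for the data channel


def sign(actor: str, body: dict) -> str:
    msg = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(ACTOR_KEYS[actor], msg, hashlib.sha256).hexdigest()


def signature_validation(data_in: dict) -> bool:
    actor = data_in.get("actor")
    if actor not in ACTOR_KEYS:
        return False
    return hmac.compare_digest(sign(actor, data_in["body"]), data_in.get("signature", ""))


def syntactic_validation(data_in: dict) -> bool:
    """Example rule from the text: 'weight' must be a numeric value in kg."""
    w = data_in["body"].get("weight")
    return (isinstance(w, dict) and w.get("unit") == "kg"
            and isinstance(w.get("value"), (int, float))
            and not isinstance(w.get("value"), bool))


def mapping(data_in: dict) -> list[tuple[str, str, str]]:
    """Map the input into triples ready for the SPARQL endpoint (hypothetical predicates)."""
    b = data_in["body"]
    subject = f"ex:{b['res_name']}"
    return [(subject, "ex:hasWeightKg", str(b["weight"]["value"])),
            (subject, "ex:providedBy", f"ex:{data_in['actor']}")]


def syntactic_sc(data_in: dict) -> bool:
    if signature_validation(data_in):                 # Line 2
        if syntactic_validation(data_in):             # Line 3
            data_out = mapping(data_in)               # Line 4
            sparql_store.update(data_out)             # Line 5: putSPARQL
            data_channel.append({"triples": data_out})  # Line 6: putBC
            return True
    return False


if __name__ == "__main__":
    body = {"res_name": "olives01@field01", "weight": {"value": 10, "unit": "kg"}}
    print(syntactic_sc({"actor": "farmer01", "body": body, "signature": sign("farmer01", body)}))  # True
```

Returning a boolean is a convenience of the sketch; Algorithm 1 simply writes nothing when a check fails.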




5.2. Semantic validation
The semantic data validation process uses SHACL shapes derived from the ontology.² We assume that the shapes are already present on the blockchain, where the smart contract semanticSC can use them. As shown in Algorithm 2, semanticSC receives two input parameters: query, the SPARQL query that determines the subject of the validation, and idShacl, the identifier of the shape stored on the blockchain to be used in the validation. When this smart contract is invoked, it retrieves the dataset from the SPARQL endpoint (step 4 in Figure 1) using the getSPARQL() function with the query parameter (Line 2). Similarly, it retrieves the shacl shape from the blockchain with the getBC() function (Line 3). Then, dataset and shacl are forwarded to the SHACL validator (step 5 in Figure 1): the semanticValidator() function calls the semantic validator service (Line 4), which performs the semantic validation and returns the result. Next, result and dataset are examined by the calculateScore() function (Line 5), which scores the validation performed. Each application defines how the score is calculated and which metric it uses; for example, the closer the harvesting coordinates reported for different olives are to each other, the more reliable the conclusion that the crop comes from one specific agricultural field. Finally, query, idShacl, result, and score are written on the metadata channel as proof via the putBC() function (Line 6).

² We generated the SHACL shapes using the Astrea tool, https://astrea.linkeddata.es/. We then tuned the shapes and added validation elements from the ontology that were missing, such as cardinality ranges.

Algorithm 2 semanticSC
Require: query as input query for SHACL validation
Require: idShacl as input id of SHACL shape
 1: procedure semanticSC(query, idShacl)
 2:    dataset ← getSPARQL(query)
 3:    shacl ← getBC(idShacl)
 4:    result ← semanticValidator(dataset, shacl)
 5:    score ← calculateScore(dataset, result)
 6:    putBC(query, idShacl, result, score)
 7: end procedure
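The steps of Algorithm 2 can be sketched in Python as follows. This is a toy model under explicit assumptions, not AgriChain's code: the “query” is simply a tuple of predicates to select (a stand-in for SPARQL), the “shape” is a dictionary of per-property cardinality and numeric-range rules loosely inspired by SHACL, the score is the share of focus nodes with no violations (the text leaves the metric to each application), and all data, identifiers, and function names are hypothetical.

```python
Triple = tuple[str, str, str]

# Hypothetical stand-ins for the SPARQL endpoint, the shapes on the blockchain
# and the metadata channel; the data are invented for illustration.
SPARQL_STORE: set[Triple] = {
    ("ex:olives01", "ex:lat", "38.1202"), ("ex:olives01", "ex:lon", "13.3574"),
    ("ex:olives02", "ex:lat", "38.1205"), ("ex:olives02", "ex:lon", "13.3570"),
    ("ex:olives03", "ex:lat", "95.0"),    # out-of-range latitude and no longitude
}
BC_STATE: dict[str, dict] = {
    # Toy "shape": per-property cardinality and numeric range, loosely SHACL-inspired.
    "shape-geo": {"ex:lat": {"min": 1, "max": 1, "range": (-90.0, 90.0)},
                  "ex:lon": {"min": 1, "max": 1, "range": (-180.0, 180.0)}},
}
METADATA_CHANNEL: list[dict] = []


def get_sparql(query: tuple[str, ...]) -> set[Triple]:
    """Toy query: select every triple whose predicate is listed in `query`."""
    return {t for t in SPARQL_STORE if t[1] in query}


def get_bc(id_shacl: str) -> dict:
    return BC_STATE[id_shacl]


def semantic_validator(dataset: set[Triple], shape: dict) -> dict[str, list[str]]:
    """Return, for each focus node, the list of violated constraints."""
    report: dict[str, list[str]] = {}
    for subject in sorted({s for s, _, _ in dataset}):
        violations = []
        for prop, rule in shape.items():
            values = [float(o) for s, p, o in dataset if s == subject and p == prop]
            if not rule["min"] <= len(values) <= rule["max"]:
                violations.append(f"{prop}: cardinality {len(values)}")
            lo, hi = rule["range"]
            violations += [f"{prop}: {v} out of range" for v in values if not lo <= v <= hi]
        report[subject] = violations
    return report


def calculate_score(dataset: set[Triple], result: dict[str, list[str]]) -> float:
    """Assumed metric: share of focus nodes with no violations."""
    return sum(1 for v in result.values() if not v) / len(result) if result else 0.0


def semantic_sc(query: tuple[str, ...], id_shacl: str) -> float:
    dataset = get_sparql(query)                   # Line 2
    shacl = get_bc(id_shacl)                      # Line 3
    result = semantic_validator(dataset, shacl)   # Line 4
    score = calculate_score(dataset, result)      # Line 5
    METADATA_CHANNEL.append({"query": list(query), "idShacl": id_shacl,
                             "result": result, "score": score})  # Line 6
    return score


if __name__ == "__main__":
    print(semantic_sc(("ex:lat", "ex:lon"), "shape-geo"))
    print(METADATA_CHANNEL[-1]["result"]["ex:olives03"])
```

Here two of the three olive batches conform, so the score is 2/3; a distance-based metric such as the one mentioned in the text could replace `calculate_score` without changing the rest of the flow.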


   The reasonerSC smart contract interfaces the blockchain with the reasoning service. When it is invoked, it queries the SPARQL endpoint (step 3 in Figure 1) to obtain the dataset to forward to the reasoner (step 4). When the reasoner finishes its processing (step 5), the smart contract stores the result on the blockchain. If the reasoning infers new triples from the initial dataset, they are added to the SPARQL endpoint by invoking syntacticSC. In this case, the initial data are stored on the data channel, and the inferred information goes on the metadata channel.
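To show what “new inferred triples” can look like, here is a minimal Python sketch of the reasonerSC flow. The reasoner is a toy forward-chaining engine for two RDFS-style rules (subclass transitivity and type propagation), not the reasoning service used in AgriChain; the class names under the `sb:` and `ex:` prefixes are hypothetical, and plain Python containers stand in for the SPARQL endpoint and the two channels. The sketch adds inferred triples to the store directly, whereas the text routes them through syntacticSC.

```python
Triple = tuple[str, str, str]
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def reasoner(dataset: set[Triple]) -> set[Triple]:
    """Toy reasoner: forward-chains two RDFS-style rules to a fixpoint and returns
    only the newly inferred triples.
      (C subClassOf D), (D subClassOf E)  =>  (C subClassOf E)
      (x type C), (C subClassOf D)        =>  (x type D)
    """
    closure = set(dataset)
    while True:
        sub = {(c, d) for c, p, d in closure if p == SUBCLASS}
        new = {(c, SUBCLASS, e) for c, d in sub for d2, e in sub if d == d2}
        new |= {(x, RDF_TYPE, d) for x, p, c in closure if p == RDF_TYPE
                for c2, d in sub if c == c2}
        if new <= closure:
            return closure - dataset
        closure |= new


def reasoner_sc(dataset: set[Triple], sparql_store: set[Triple],
                data_channel: list, metadata_channel: list) -> set[Triple]:
    """Sketch of the reasonerSC flow (all stores are in-memory stand-ins)."""
    inferred = reasoner(dataset)
    data_channel.append(sorted(dataset))           # initial data -> data channel
    if inferred:
        sparql_store.update(inferred)              # simplification: bypasses syntacticSC
        metadata_channel.append(sorted(inferred))  # inferred data -> metadata channel
    return inferred


if __name__ == "__main__":
    data = {("ex:batch1", RDF_TYPE, "sb:ExtraVirginOliveOil"),
            ("sb:ExtraVirginOliveOil", SUBCLASS, "sb:OliveOil"),
            ("sb:OliveOil", SUBCLASS, "sb:FoodProduct")}
    for t in sorted(reasoner(data)):
        print(t)
```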


6. AgriChain validation methodology
The agri-food sector includes multiple supply chains for the different agricultural products: tomatoes, wine, dairy, olive oil, etc. These supply chains involve many actors with different roles and, in most cases, contrasting interests. Agricultural entrepreneurs, processing industries, transport and logistics operators, and large- and small-scale retailers are typical actors that appear in many agri-food supply chains. However, each chain has its peculiar actors with specific needs and roles. For example, in the simplified model of the olive oil supply chain shown in Figure 2, the main actors are farmers, olive growers' cooperatives, warehouses, shops, and customers. These actors typically provide data through human operators, who are not trusted by default. To solve the problem of untrusted operators, IoT devices can be used. However, this strategy merely shifts the point of trust from humans to IoT devices: IoT sensors are owned and maintained by the same actors and can be maliciously manipulated according to their specific interests. To guarantee data quality, AgriChain leverages the twofold validation described above, invoking dedicated smart contracts.
   The input syntactic validation, performed by the smart contract syntacticSC (Algorithm 1), checks that the transaction contains specific fields, including the actor's signature, as exemplified in Listing 1. If multiple actors are involved in the transaction, the smart contract checks multiple signatures. This validation is performed on a transaction before it is written on the blockchain. This preliminary validation guarantees accountability, because each piece of data is linked to an accountable entity, but it still does not protect from the ‘garbage in, garbage out’ problem. In other words, this lightweight syntax validation checks the identity of the data provider, the timestamp, and other metadata without guaranteeing ‘semantic’ validity.

Figure 2: Data sources for AgriChain and smart contracts for data validation. Syntax-validated transactions are in gray, those semantically validated are in white. [Diagram: the olive oil chain (olive, milling, oil, selling, bottle) and its actors (farmers, olive growers' cooperative, warehouse, shops, customers) provide untrusted data through IoT devices and operators; the smart contract validation logic writes them to the blockchain (trust, transparency, traceability), while a semantic representation (classes, attributes, relations, events, ...) describes products and processes.]
{
    "actor": {
        "signature": "ebf3d6a0e54d249ff..." },
    "res_details": {
        "res_name": "olives01@field01",
        "hasGeoTag": true,
        "hasWeight": true },
    "data": {
        "lat": 38.120240,
        "lon": 13.357388,
        "kg": 10 },
    "ts": "2020-05-30T16:06:44+01:00"
}
Listing 1: Syntactic validation - Fields extracted from the transaction.
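A lightweight syntactic check of the Listing 1 fields might look like the following Python sketch. The rules are our assumptions, chosen to match the fields shown: all four top-level fields must be present, the signature must be a non-empty string (verifying it is a separate step), coordinates must fall in valid latitude/longitude ranges when `hasGeoTag` is set, the weight must be a positive number when `hasWeight` is set, and `ts` must be an ISO 8601 timestamp with a time zone. The function names are hypothetical, and the elided signature is kept elided.

```python
import json
from datetime import datetime

# Listing 1 as JSON; the signature is elided in the paper and kept elided here.
LISTING_1 = """{
  "actor": {"signature": "ebf3d6a0e54d249ff..."},
  "res_details": {"res_name": "olives01@field01", "hasGeoTag": true, "hasWeight": true},
  "data": {"lat": 38.120240, "lon": 13.357388, "kg": 10},
  "ts": "2020-05-30T16:06:44+01:00"
}"""


def _is_number(x) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def syntax_errors(tx: dict) -> list[str]:
    """Return the list of syntactic problems found in a transaction (empty if valid)."""
    errors = [f"missing field: {k}" for k in ("actor", "res_details", "data", "ts") if k not in tx]
    if errors:
        return errors
    actor, details, data = tx["actor"], tx["res_details"], tx["data"]
    if not all(isinstance(part, dict) for part in (actor, details, data)):
        return ["actor, res_details and data must be objects"]
    sig = actor.get("signature")
    if not isinstance(sig, str) or not sig:
        errors.append("missing actor signature")  # presence only; verification is separate
    if details.get("hasGeoTag"):
        lat, lon = data.get("lat"), data.get("lon")
        if not (_is_number(lat) and -90 <= lat <= 90 and _is_number(lon) and -180 <= lon <= 180):
            errors.append("invalid geotag")
    if details.get("hasWeight"):
        kg = data.get("kg")
        if not (_is_number(kg) and kg > 0):
            errors.append("invalid weight in kg")
    try:
        if datetime.fromisoformat(tx["ts"]).tzinfo is None:
            errors.append("timestamp without time zone")
    except (TypeError, ValueError):
        errors.append("malformed timestamp")
    return errors


if __name__ == "__main__":
    print(syntax_errors(json.loads(LISTING_1)))  # []
```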

  The second check involves both syntactic and semantic validation; in what follows, we focus
on the semantic aspects. Here, the smart contract semanticSC (see Pseudocode 2) validates a
larger set of data that, taken together, carry a specific meaning. The validation logic depends on
the specific supply chain and on the meaning of the data; in our experiments, we focused on the
geographical origin of the olive oil product. Unlike typical blockchain validation, our semantic
validation is performed after the data are written to the blockchain, and it is triggered by the
arrival of new data that are semantically linked to previous data. For example, Listing 1 shows
one set of geographic coordinates provided by the harvesting operators through their smartphones
and GPS-equipped IoT devices, giving the location of the product and of farm field01. As shown in
Figure 3, the semantic validation smart contract uses clustering to estimate the position (the
mean of the majority cluster), separating it from noisy measurements and from malicious,
colluding nodes (in red).

Figure 3: Example of semantic validation of the geographic coordinates of extra-virgin olive oil
origin provided by 120 actors. The majority cluster (67 dots) is in yellow, noisy measurements
(30 elements) in black, and colluding nodes (23) in red.
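
The clustering step can be sketched in Python as follows. This is a minimal, self-contained illustration of DBSCAN-based position estimation, not the semanticSC chaincode: the function names, the haversine distance, and the parameters eps_m (neighbourhood radius in metres) and min_pts are our assumptions. Each reported coordinate is labelled as belonging to a cluster or as noise; the largest cluster is taken as the majority, and its mean is returned as the estimated position, so a smaller cluster of colluding reports is discarded along with the noise.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon) in decimal degrees

UNVISITED = -2
NOISE = -1


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius_m = 6_371_000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def dbscan(points: List[Point], eps_m: float, min_pts: int) -> List[int]:
    """Naive O(n^2) DBSCAN: returns a cluster id per point, or NOISE."""
    labels = [UNVISITED] * len(points)

    def region(i: int) -> List[int]:
        # All points within eps_m of point i, including i itself.
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster += 1
    return labels


def estimate_position(points: List[Point], eps_m: float = 50.0,
                      min_pts: int = 5) -> Optional[Point]:
    """Mean of the largest (majority) cluster, or None if every point is noise."""
    labels = dbscan(points, eps_m, min_pts)
    sizes: Dict[int, int] = {}
    for label in labels:
        if label != NOISE:
            sizes[label] = sizes.get(label, 0) + 1
    if not sizes:
        return None
    majority = max(sizes, key=lambda c: sizes[c])
    members = [p for p, label in zip(points, labels) if label == majority]
    # A plain average of lat/lon is adequate over a field-sized area.
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))


# Hypothetical sample: 7 honest reports near field01, 5 colluding reports
# several kilometres away, and 2 isolated noisy reports.
honest = [(38.120240 + k * 1e-5, 13.357388 + k * 1e-5) for k in range(-3, 4)]
colluding = [(38.2 + k * 1e-5, 13.5) for k in range(5)]
noisy = [(38.0, 13.0), (38.5, 13.9)]
points = honest + colluding + noisy

labels = dbscan(points, eps_m=50.0, min_pts=5)
print(labels)  # → [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1, -1]
print(estimate_position(points))  # close to (38.120240, 13.357388)
```

Choosing the largest cluster as the "truth" assumes that honest reports outnumber any single colluding group, as in Figure 3; if colluders formed the largest cluster, this rule alone would not detect them.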

6.1. Costs and benefits of the proposed solution
When an agri-food business chooses to use blockchain technology to implement some or all
aspects of its food supply chain, it is choosing to undergo change. Change is not always good for
business, so why should a business switch to a blockchain-based solution? Because using
blockchain shows that the company cares about transparency, which may encourage existing
customers to buy more products and new customers to switch from another brand. Of course,
every kind of IT infrastructure comes with installation and maintenance costs. We propose that
these costs be assigned proportionally to the 𝑛 actors involved. This solution could be thought
of as a blockchain-based, pay-per-use, subscription-like system.
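
One way to read "proportionally" in a pay-per-use scheme is proportional to each actor's usage. The Python sketch below assumes, purely for illustration, that usage is the number of transactions each actor submitted in a billing period; the function name share_costs and the actor names are hypothetical. Exact rational arithmetic (fractions.Fraction) keeps the shares summing to the total without rounding drift.

```python
from fractions import Fraction
from typing import Dict


def share_costs(total_cost: Fraction, usage: Dict[str, int]) -> Dict[str, Fraction]:
    """Split total_cost among actors in proportion to their usage counts."""
    if not usage:
        return {}
    total_usage = sum(usage.values())
    if total_usage == 0:
        # No recorded usage: fall back to an equal split among the n actors.
        return {actor: total_cost / len(usage) for actor in usage}
    return {actor: total_cost * count / total_usage for actor, count in usage.items()}


# Hypothetical billing period: 100 transactions submitted by three actors.
shares = share_costs(Fraction(1200), {"farmer": 50, "mill": 30, "bottler": 20})
print({actor: float(amount) for actor, amount in shares.items()})
# → {'farmer': 600.0, 'mill': 360.0, 'bottler': 240.0}
```

Other weightings (e.g. by storage used or by number of products traced) fit the same function by changing what is counted in usage.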




Figure 4: Protégé class hierarchy overview for sb:OliveOil.


7. Experimental Setup and Results
Part of the platform presented in this paper was proposed within the DEMETER project³, which
leads the digital transformation of the European agri-food sector through the rapid adoption of
advanced IoT technologies, data science, and smart farming, ensuring its long-term viability and
sustainability. Our blockchain currently runs within the DEMETER ecosystem, and the project
partners can invoke its services. A fundamental part is the semantic model, used as a common
language between different project entities. It is based on the GS1 vocabulary, extended, revised,
and refined to describe an entire supply chain. As an example, we show the olive oil supply chain
(see Figure 4), where we extended the gs1:FoodBeverageTobaccoProduct⁴ class with sb:OliveOil⁵
to map the entire process. In addition to the interoperability offered by the semantic model and
its mappings to other ontologies, the platform offers APIs compliant with the OpenAPI standard.
Given the generality of the platform, we implemented, as a case study, the validation of olive
harvesting in the olive oil supply chain. Within the SHACL validator, we added a clustering
algorithm, DBSCAN [19], to calculate the proximity of the harvested olives to the soil. We chose
Hyperledger Fabric as the blockchain platform to illustrate our work, although the SeedsBit
platform uses multiple blockchain platforms, including MultiChain and Ethereum.

³ Available at: https://h2020-demeter.eu/
⁴ Available at: https://www.gs1.org/voc/FoodBeverageTobaccoProduct
⁵ Available at: https://seedsbit.com/ontology/#OliveOil

Figure 5: Time spent by the chaincode for clustering geographical points with DBSCAN for 100
consecutive invocations (a); average number of points to create at least one cluster (in red) (b);
number of entries written on the blockchain (c). One typical run is depicted in green; the average
over 15 runs appears in blue. When a new pair of coordinates is added, the smart contract is
triggered; the 𝑖-th call works on a larger state than the (𝑖 − 1)-th.
   As introduced in Section 6, we used Hyperledger Fabric to partially implement our model and
to obtain experimental performance results. Our test network was composed of two Fabric
organisations with two peers each. We used the RAFT algorithm [17], the default consensus
protocol for Hyperledger Fabric. RAFT is crash-fault-tolerant (CFT), but it can easily be replaced
by a Byzantine-fault-tolerant (BFT) protocol, since Fabric has a modular approach to consensus [4].
   Five nodes ran for consensus purposes. The blockchain was deployed in a single-host
configuration on a machine with an Intel® Xeon® CPU E5-1660 v3 @ 3.00 GHz and 32 GB of RAM.
Over 100 consecutive invocations of the smart contract semanticSC, Figure 5 shows the time spent
by the DBSCAN algorithm for clustering (Figure 5a), the number of clusters found (Figure 5b), and
the number of entries used by DBSCAN (Figure 5c). At each invocation, we assume that the number
of entries has increased by one, i.e., syntacticSC has inserted a new entry into the blockchain.
The analysis of 100 points, the most computationally expensive case, takes about 8 ms, which is
compatible with smart contract execution. Clustering the terrain, with about 75 points, required
6 ms.
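
The measurement protocol described above (100 consecutive invocations, one new entry per invocation, each call re-analysing the whole stored state) can be mimicked with the following Python harness. It is a hypothetical sketch, not the chaincode behind Figure 5: it times only the O(n²) neighbourhood counting at the heart of a naive DBSCAN, uses Euclidean distance in degrees for brevity, and generates synthetic points around the Listing 1 coordinates; the names and parameters are ours, and absolute timings will differ from those reported here.

```python
import math
import random
import time
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon) in decimal degrees


def neighbourhood_counts(points: List[Point], eps_deg: float) -> List[int]:
    """Number of points within eps_deg of each point (itself included).

    These region queries are the O(n^2) part of a naive DBSCAN run.
    """
    return [sum(1 for q in points if math.dist(p, q) <= eps_deg) for p in points]


def simulate_invocations(n_calls: int = 100, eps_deg: float = 5e-4,
                         seed: int = 42) -> Tuple[List[Point], List[float]]:
    """Append one entry per call, re-analyse the whole state, time each call."""
    rng = random.Random(seed)
    state: List[Point] = []
    timings_ms: List[float] = []
    for _ in range(n_calls):
        # A new syntactically valid entry arrives (synthetic, near field01) ...
        state.append((38.120240 + rng.gauss(0.0, 1e-4),
                      13.357388 + rng.gauss(0.0, 1e-4)))
        # ... and triggers a semantic analysis over all entries stored so far.
        start = time.perf_counter()
        neighbourhood_counts(state, eps_deg)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return state, timings_ms


state, timings_ms = simulate_invocations()
print(f"call 1: {timings_ms[0]:.3f} ms, call 100: {timings_ms[-1]:.3f} ms")
```

Because the whole state is re-analysed at every call, the cost of the 𝑖-th call grows roughly with 𝑖², which is why the last invocations are the most expensive ones.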



8. Related work
The problem of information asymmetry in food traceability has multiple facets, which have
traditionally been tackled individually, using paper documents and product specifications. Our
approach to information asymmetry is to improve transparency from multiple points of view:
economics, blockchain technology, and data quality.
    From an economic point of view, it is well known that the perceived quality of food products
can be scored on a scale ranging from optimal to poor, regardless of their edibility. However, the
hygienic and sanitary safety of products reaching final consumer markets is challenging to
evaluate. Consumers have shown great interest in the features that define food quality, thanks to
a greater spending capacity and a more sensitive context than in the past. Food quality is a
multidimensional and dynamic concept [14]. Quality is a complex feature made up of objective and
subjective components. For this reason, quality cannot be immediately described or identified;
rather, it is a subjective notion that involves personal needs. The more closely the
characteristics of a product match our expectations, the more inclined we are to regard it as
high quality [25]. It is therefore important to deepen the analysis of how qualitative aspects are
perceived, combining technical quality indicators with measures and models for interpreting
customer satisfaction within the theoretical context of the information economy. Indeed, placing
certified quality products on the market increases production costs and, therefore, prices.
Certification requires estimating the economic value that customers attribute to perceived
quality, which in turn requires evaluating the price premium associated with their greater
willingness to pay. Information affects how markets function from a twofold perspective. On the
one hand, the information asymmetry between supply and demand is "controlled" and "managed"
through trademark, certification, and labelling policies for agri-food products. On the other
hand, national and international public and private organisations and institutions oversee
voluntary standardisation and establish rules and procedures for controlling market transaction
costs. Company brands, collective brands, and signals of quality and value act as communication
media and help strengthen the operating conditions necessary for economic exchange, contributing
to reducing the information asymmetry typical of imperfect markets [1, 2, 20].
    From the point of view of the economic efficiency of product markets, these elements create a
sort of functional distortion in agri-food markets that, from the standpoint of economic theory,
can prevent them from functioning correctly. These specific conditions seem to disadvantage
producers and consumers simultaneously, affecting the natural relationship between supply and
demand that steers short- and long-term market equilibrium. Several works [22, 10, 23, 13]
illustrate different ways to leverage blockchain technology in this direction. In [22], the author
explains why a food traceability system based on RFID and blockchain would be ideal for China
after many food safety accidents, which were related to inadequate and primitive food supply chain
management. In [10], the typical steps and places of a blockchain-based food traceability system
are shown. The authors of [8] conclude their work by stating that 'there are still few uses to
support that some properties of blockchain implementation might be useful towards supply chain
management'. In [13], it is reported how Walmart, one of the largest American hypermarket
corporations, in collaboration with IBM, reduced the time needed to trace the origin of mangoes
"from seven days to 2.2 seconds". This result suggests that blockchain is a solution at least worth
considering for food safety and food supply chain management. The blockchain used in that pilot
study was Hyperledger Fabric, which we also adopted, given its extensive customisation options,
its growing community, and its increasing uptake in the scientific literature. In [11], the authors
show that performance is unlikely to be an issue, at least in terms of throughput: after heavy
re-engineering, they reached 20,000 transactions per second. On the other hand, [16] points out
possible problems in critical scenarios where the underlying physical network suffers from
latency. Beyond performance, blockchain has also been used to guarantee high-quality data
[26, 15].


9. Conclusion and Future Work
The quality of food production and the economic efficiency of markets are closely connected to
the growing role of information. This situation does not always safeguard the security and
correctness of information, nor the ability of informed consumers to choose. Given the central
role of the agri-food sector, data quality is essential, because erroneous, malicious, and missing
information affect the quality and safety of the food supply chain. This paper presented
AgriChain as a mechanism for validating data syntactically before it is included in the blockchain
and semantically before it is sealed. These two validations are executed through distributed
logic, implemented with one or more dedicated smart contracts. Blockchain is typically the
preferred technology when seeking trust, transparency, and traceability among actors who do not
trust each other or have conflicting interests. We showed how AgriChain goes beyond this view of
data management, challenging the simplistic notion that data written on the blockchain are
trustworthy because they have been validated in advance. Indeed, AgriChain performs only a
lightweight validation before including information in a block; this guarantees only
accountability and syntactic consistency. From the semantic point of view, the second validation
provides, first, data cleaning and, second, data quality assessment. For data cleaning, AgriChain
applies clustering algorithms, implemented as smart contracts, to data collected through
crowd-sensing. Standard data cleaning methods aim at detecting and removing repeated entries,
detecting outliers, and checking data volumes; in general, they do not deal with malicious data
sources. Then, the AgriChain smart contract checks accuracy, timeliness, completeness, uniqueness,
and consistency [6] and produces Key Quality Indicators (KQIs), which are added to the blockchain
as metadata, forming a data seal. In summary, this paper presented a new methodology that uses
smart contracts to enforce a twofold validation and guarantee the quality of data for food
traceability.
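
As an illustration of the KQIs mentioned above, the Python sketch below computes three of them (completeness, uniqueness, and timeliness) over a batch of entries. The definitions are simple ratios that we assume for illustration; they are not taken from [6] or from the AgriChain smart contract, and the function name compute_kqis is hypothetical. Accuracy and consistency are omitted because they depend on domain rules such as the clustering step.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Sequence


def compute_kqis(entries: List[Dict[str, Any]], required: Sequence[str],
                 now: datetime, max_age: timedelta) -> Dict[str, float]:
    """Fraction of complete, distinct, and timely entries in a batch."""
    n = len(entries)
    if n == 0:
        return {"completeness": 0.0, "uniqueness": 0.0, "timeliness": 0.0}
    # Completeness: every required field is present and not null.
    complete = sum(all(e.get(f) is not None for f in required) for e in entries)
    # Uniqueness: share of entries that are not verbatim duplicates.
    distinct = len({tuple(sorted((k, repr(v)) for k, v in e.items())) for e in entries})
    # Timeliness: timestamp not in the future and not older than max_age.
    timely = 0
    for e in entries:
        ts = e.get("ts")
        if ts is not None and timedelta(0) <= now - datetime.fromisoformat(ts) <= max_age:
            timely += 1
    return {"completeness": complete / n, "uniqueness": distinct / n,
            "timeliness": timely / n}


# Hypothetical batch: a duplicate, a stale entry, and an incomplete entry.
e1 = {"lat": 38.120240, "lon": 13.357388, "kg": 10, "ts": "2020-05-30T16:06:44+01:00"}
e2 = dict(e1)
e3 = {"lat": 38.120301, "lon": 13.357412, "kg": 12, "ts": "2020-05-28T09:00:00+01:00"}
e4 = {"lat": 38.120198, "lon": None, "kg": 8, "ts": "2020-05-30T17:00:00+01:00"}
kqis = compute_kqis([e1, e2, e3, e4], required=("lat", "lon", "kg", "ts"),
                    now=datetime.fromisoformat("2020-05-30T18:00:00+01:00"),
                    max_age=timedelta(days=1))
print(kqis)  # → {'completeness': 0.75, 'uniqueness': 0.75, 'timeliness': 0.75}
```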


Acknowledgments
The authors would like to thank the SNAPP laboratory of Security, Network Applications and
Positioning http://www.unipa.it/SNAPPLab/ at the Department of Engineering of the University
of Palermo, and SEEDS s.r.l. for experimentation on the SeedsBit platform https://seedsbit.com/.
This work has been partially supported by the H2020 EU DEMETER project https://h2020-demeter.eu/.


References
 [1] George A. Akerlof. The market for "lemons": Quality uncertainty and the market
     mechanism. In Decision Science, pages 261–273. Elsevier, 2017.
 [2] Gervasio Antonelli. Unione Europea, qualità agro-alimentare e commercio mondiale.
     Opportunità e minacce per i prodotti tipici delle Marche. QuattroVenti, 2001.
 [3] Apache. A free and open source Java framework for building semantic web and linked
     data applications, 2022.
 [4] Artem Barger, Yacov Manevich, Hagar Meir, and Yoav Tock. A Byzantine fault-tolerant
     consensus library for Hyperledger Fabric, 2021.
 [5] Abdeljalil Beniiche. A study of blockchain oracles, 2020.
 [6] Hongju Cheng, Danyang Feng, Xiaobin Shi, and Chongcheng Chen. Data quality analysis
     and cleaning strategy for wireless sensor networks. EURASIP Journal on Wireless
     Communications and Networking, 2018(1), 2018.
 [7] Ben De Meester, Pieter Heyvaert, Dörthe Arndt, Anastasia Dimou, and Ruben Verborgh.
     RDF graph validation using rule-based reasoning. Semantic Web, 12(1):117–142, 2021.
 [8] S. Matthew English and Ehsan Nezhadian. Application of Bitcoin Data-Structures & Design
     Principles to Supply Chain Management. arXiv preprint arXiv:1703.04206, 2017.
 [9] Huanhuan Feng, Xiang Wang, Yanqing Duan, Jian Zhang, and Xiaoshuan Zhang. Applying
     blockchain technology to improve agri-food traceability: A review of development methods,
     benefits and challenges. Journal of Cleaner Production, 260:121031, 2020.
[10] Juan F. Galvez, J. C. Mejuto, and J. Simal-Gandara. Future challenges on the use
     of blockchain for food traceability analysis. TrAC - Trends in Analytical Chemistry,
     107:222–232, 2018.
[11] Christian Gorenflo, Stephen Lee, Lukasz Golab, and Srinivasan Keshav. FastFabric: Scaling
     Hyperledger Fabric to 20,000 Transactions per Second. In ICBC 2019 - IEEE International
     Conference on Blockchain and Cryptocurrency, pages 455–463, 2019.
[12] Nicola Guarino, Daniel Oberle, and Steffen Staab. What is an ontology? In Handbook on
     Ontologies, pages 1–17. Springer, 2009.
[13] Reshma Kamath. Food Traceability on Blockchain: Walmart’s Pork and Mango Pilots with
     IBM. The Journal of the British Blockchain Association, 1(1):1–12, 2018.
[14] Kelvin J. Lancaster. A New Approach to Consumer Theory. Journal of Political Economy,
     74(2):132–157, 1966.
[15] Danwei Liang, Jian An, Jindong Cheng, He Yang, and Ruowei Gui. The quality control
     in crowdsensing based on twice consensuses of blockchain. In Proceedings of the 2018
     ACM International Joint Conference and 2018 International Symposium on Pervasive and
     Ubiquitous Computing and Wearable Computers, pages 630–635, 2018.
[16] Thanh Son Lam Nguyen, Guillaume Jourjon, Maria Potop-Butucaru, and Kim Loan Thai.
     Impact of network delays on Hyperledger Fabric. In INFOCOM 2019 - IEEE Conference on
     Computer Communications Workshops, INFOCOM WKSHPS 2019, pages 222–227, 2019.


[17] Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm.
     In 2014 USENIX Annual Technical Conference (USENIX ATC 14), pages 305–319, 2014.
[18] Srinath Perera, Amer A Hijazi, Geeganage Thilini Weerasuriya, Samudaya Nanayakkara,
     and Muhandiramge Nimashi Navodana Rodrigo. Blockchain-based trusted property
     transactions in the built environment: Development of an incubation-ready prototype.
     Buildings, 11(11):560, 2021.
[19] Erich Schubert, Jörg Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. DBSCAN
     revisited, revisited: why and how you should (still) use DBSCAN. ACM Transactions on
     Database Systems (TODS), 42(3):1–21, 2017.
[20] Joseph E. Stiglitz. The causes and consequences of the dependence of quality on price.
     Journal of Economic Literature, 25(1):1–48, 1987.
[21] Seyedeh Faezeh Taghizadeh, Ramin Rezaee, Gholamhossein Davarynejad, Javad Asili,
     Seyed Hossein Nemati, Marina Goumenou, Ioannis Tsakiris, Aristides M. Tsatsakis, Kobra
     Shirani, and Gholamreza Karimi. Risk assessment of exposure to aflatoxin B1 and ochratoxin
     A through consumption of different pistachio (Pistacia vera L.) cultivars collected from
     four geographical regions of Iran. Environmental Toxicology and Pharmacology, 61:61–66,
     2018.
[22] Feng Tian. An agri-food supply chain traceability system for China based on RFID &
     blockchain technology. In 2016 13th International Conference on Service Systems and Service
     Management, ICSSSM 2016, pages 1–6. IEEE, 2016.
[23] Feng Tian. A supply chain traceability system for food safety based on HACCP, blockchain
     & Internet of things. In 14th International Conference on Services Systems and Services
     Management, ICSSSM 2017 - Proceedings, pages 1–6. IEEE, 2017.
[24] S. Vieri. Quality Products and Genetically Modified Organisms in Italy: Hazards and
     Possibile Enhancements. Journal of Nutritional Ecology and Food Research, 1(1):68–77, 2013.
[25] S. Vieri. Conflitti di maniera e accordi di sostanza, 2015.
[26] Jingzhong Wang, Mengru Li, Yunhua He, Hong Li, Ke Xiao, and Chao Wang. A Blockchain
     Based Privacy-Preserving Incentive Mechanism in Crowdsensing Applications. IEEE Access,
     6:17545–17556, 2018.



