Centralized Versus Decentralized Digital Identity Architectures: Simulation Models of Data Exchange

Yoshiaki Fukami,1 Takumi Shimizu,2 Teruaki Hayashi,3 Hiroki Sakaji,4 Hiroyasu Matsushima5
Keio University,1,2 The University of Tokyo,3,4 Shiga University5
yofukami@sfc.keio.ac.jp,1 takumis@sfc.keio.ac.jp,2 hayashi@sys.t.u-tokyo.ac.jp,3 sakaji@sys.t.u-tokyo.ac.jp,4 hiroyasu-matsushima@biwako.shiga-u.ac.jp5

Abstract

To utilize big data generated by distributed cloud-based services, a digital ID is required to link data to its subjects. Decentralized Identifiers (DID) have been developed to manage data from various services while protecting privacy. We analyzed two ID architectures, DID and centralized ID (CID), with simulation models to evaluate their efficiency. In a monopoly market, where there is no competition between ID providers, there is no difference between DID and CID. However, if there are multiple ID providers without interoperability, service providers have access to more data in the DID architecture than in CID. This result, however, was affected by the design of the model, which does not include ID federation technologies. Currently, service providers can receive data from many third-party services through an ID federation standard. In addition, the simulation result that DID is very efficient for data distribution should be interpreted carefully, considering the upcoming costs of implementation.

Background

In recent years, consumers have come to hold a large number of user accounts linked to more and more cloud-based services. This has led to the accumulation of a wide variety of attribute data in the cloud, increasing the potential for the creation of new services, while at the same time requiring a means of sharing data that is fragmented between services in a way that is easy to use and protects the rights of consumers. Service providers can identify consumers with digital IDs provided by third-party companies and, with consumer authentication, obtain attribute data stored by other services.

Most of the data accumulated from multiple services is linked to IDs issued by a small number of specific companies, and these companies also provide authorization functions. This means that there is some risk that distributed data could be accumulated, analyzed, and used for unintended purposes with malicious intent. Aggregating various attribute data increases the risk of privacy infringement. While ID federation enhances consumer convenience, it also increases the risk of privacy breaches.

DID is an architecture in which the entity that provides attribute information issues digital IDs in a distributed manner enabled by blockchain technologies. In contrast to DID, an architecture that uses existing ID federation technology is called a Centralized Identifier (CID). With DID, aggregated data can be utilized only with the consumer's authentication, and without linking to specific ID providers such as Google and Facebook.

From the service provider's point of view, it is advantageous to be able to obtain and utilize diverse data at low cost, and this will encourage the emergence of innovations in the form of new services. Both architectures, CID and DID, have advantages and disadvantages, and it is difficult to say simply which is better. Therefore, we use a simulation approach to study many factors in an integrated manner.

In multi-agent simulation, people and objects can be represented as agents, and phenomena resulting from their interactions can be observed. For example, it has been applied to fields such as traffic (Bazzan & Klügl, 2009), pedestrian flow (Yamashita et al., 2014), and market transactions (Hirano et al., 2020; Yagi et al., 2020). Examining the simulation results makes it possible to support decision-making in related planning and policy making.

Models

This study employs simulation models to analyze the CID and DID structures and their impacts on data exchange. In the CID model, each user has some data, which is managed by ID providers. Service providers, which act as verifiers, have their own needs (i.e., the data a service provider needs to create products) and try to obtain that data by accessing the IDs that users hold. Whether a verifier obtains the data depends on the ID that bridges the transaction between the user and the verifier. For instance, if a verifier asks a user to share the data "a" and the user uses the ID "A" for this transaction, the verifier can get the data "a". If the user uses the ID "B" in this case, the verifier cannot get the data. In the DID model, there is no ID provider in the transaction. A verifier contacts a user directly and requests the data it needs. Each user decides whether to accept the request from a verifier. These models aim to uncover the efficient data exchange structure by considering various parameters, such as the number of users, the number of CID providers, and the cost of transactions. Figure 1 describes the model structures.
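The two request paths described in the Models section can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the `User` class, the `accepts_requests` flag, and the function names `cid_request` and `did_request` are hypothetical, introduced only to make the ID "A" / ID "B" example concrete. The sketch assumes that in the CID case a data item is reachable only through the ID it is linked to, and that in the DID case the outcome depends only on the user's own decision.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class User:
    # Maps an ID name (e.g. "A") to the data items linked to that ID.
    data_by_id: Dict[str, Set[str]] = field(default_factory=dict)
    # Hypothetical stand-in for the user's accept/reject decision in the DID model.
    accepts_requests: bool = True

    def all_data(self) -> Set[str]:
        # Every data item the user holds, regardless of which ID it is linked to.
        return set().union(*self.data_by_id.values())


def cid_request(user: User, wanted: str, id_used: str) -> bool:
    """CID: the verifier gets the item only if it is linked to the ID used."""
    return wanted in user.data_by_id.get(id_used, set())


def did_request(user: User, wanted: str) -> bool:
    """DID: no ID provider is involved; the user decides on the request."""
    return user.accepts_requests and wanted in user.all_data()


# The example from the text: data "a" is linked to ID "A", not to ID "B".
user = User(data_by_id={"A": {"a"}, "B": {"b"}})
print(cid_request(user, "a", "A"))  # True
print(cid_request(user, "a", "B"))  # False
print(did_request(user, "a"))       # True
```

Keeping the ID-to-data mapping explicit makes the CID failure case visible: the data exists, but the transaction is bridged by an ID that does not reach it.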
___________________________________
In T. Kido, K. Takadama (Eds.), Proceedings of the AAAI 2022 Spring Symposium
“How Fair is Fair? Achieving Wellbeing AI”, Stanford University, Palo Alto, California,
USA, March 21–23, 2022. Copyright © 2022 for this paper by its authors. Use permitted
under Creative Commons License Attribution 4.0 International (CC BY 4.0).


Figure 1: The overview of the models

Results

We evaluate the models by the amount of data that a service provider can access under each ID structure. In the CID models, the key parameter is the number of CID providers. If there is one CID provider, a service provider can access all the user data via this particular CID provider. Our simulation assumes 10,000 users, so a service provider can access the data of all 10,000 users in this case. As the number of CID providers increases, user data is dispersed across CID providers, and a service provider can obtain only a subset of the user data via any one CID provider. In the DID models, the key parameter is the attrition rate of service providers' data requests. Since DID requires users to manage each transaction per data record by themselves, unlike CID, which allows CID providers to manage it, a service provider sometimes cannot obtain the data because of this data-management burden on users. Figure 2 shows the results of our simulation models for various levels of the key parameters. As the graph indicates, the amount of data that a service provider can access decreases sharply as the number of CID providers increases. On the other hand, the amount of accessible data under DID stays relatively large even when the attrition rate is high.

Discussion

The results show that service providers have access to more data in the DID architecture than in CID. However, this result was affected by the design of the model, which only introduced the authentication/authorization function of independent third parties, without ID federation technologies. Currently, service providers are able to receive data from many third-party services through an ID federation standard such as OpenID Connect.

On the other hand, the simulation results show that DID is highly favorable for data distribution. However, DID has not yet been widely adopted, and implementing DID technology imposes costs on both data providers and acquirers. The benefits of the DID architecture may be offset or negated by the costs of dissemination, which are not reflected in this model.

Future research needs more fine-grained models that reflect real-world ID operations, the practices being developed at standards development organizations, and the issues mentioned above, such as ID federation and the cost structures of ID architectures. This study opens up new research avenues for digital identity structure and data exchange by providing a basic understanding of CID versus DID architectures and their implications.

Acknowledgments

This work was supported by JSPS KAKENHI 19K23235, 20H02384, and 20K13599.

References

Bazzan, A.; and Klügl, F. (Eds.). 2009. Multi-Agent Systems for Traffic and Transportation Engineering. IGI Global. doi.org/10.4018/978-1-60566-226-8
Hirano, M.; Izumi, K.; Matsushima, H.; and Sakaji, H. 2020. Comparing Actual and Simulated HFT Traders' Behavior for Agent Design. Journal of Artificial Societies and Social Simulation, 23(3). doi.org/10.18564/jasss.4304
Yagi, I.; Masuda, Y.; and Mizuta, T. 2020. Analysis of the Impact of High-Frequency Trading on Artificial Market Liquidity. IEEE Transactions on Computational Social Systems, 7(6): 1324–1334. doi.org/10.1109/TCSS.2020.3019352
Yamashita, T.; Matsushima, H.; and Noda, I. 2014. Exhaustive analysis with a pedestrian simulation environment for assistant of evacuation planning. Transportation Research Procedia, 2: 264–272. doi.org/10.1016/j.trpro.2014.09.047


      Figure 2: The number of accessible data in CID/DID
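
The qualitative pattern reported in the Results section can be approximated with a small Monte Carlo sketch in Python. The paper states only the number of users (10,000) and the two key parameters, so everything else here is an assumption made for illustration: each user is assigned to one CID provider uniformly at random, the service provider is connected to a single CID provider, and in the DID case each request is dropped independently with probability equal to the attrition rate. The function names are hypothetical, and the numbers this produces are not the values plotted in Figure 2.

```python
import random

N_USERS = 10_000  # number of users stated in the paper


def cid_accessible(n_providers: int, rng: random.Random) -> int:
    """Count users reachable through the one CID provider the service uses.

    Assumption: each user is assigned to a provider uniformly at random.
    """
    chosen = 0  # hypothetical: the service provider is connected to provider 0
    return sum(1 for _ in range(N_USERS) if rng.randrange(n_providers) == chosen)


def did_accessible(attrition_rate: float, rng: random.Random) -> int:
    """Count users who complete the request.

    Assumption: each request is dropped independently with the given probability.
    """
    return sum(1 for _ in range(N_USERS) if rng.random() >= attrition_rate)


rng = random.Random(0)  # fixed seed so repeated runs give the same counts
for k in (1, 2, 5, 10):
    print(f"CID providers={k:2d}  accessible={cid_accessible(k, rng)}")
for r in (0.0, 0.3, 0.6):
    print(f"DID attrition={r:.1f}  accessible={did_accessible(r, rng)}")
```

Under these assumptions the expected counts are roughly 10,000 / k for CID with k providers and 10,000 × (1 − r) for DID with attrition rate r, which reproduces the monopoly case (k = 1, r = 0) in which both architectures give access to all 10,000 users.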

