Don’t Stop Thinking about Tomorrow:
                  Use Cases Demonstrating
    the Asymmetric Impact of Contextual Temporal Links
         in Knowledge Graph Evolution & Retrieval1

                            Waterman, K. Krasnow1[0000-1111-2222-3333]
     1 Decentralized Information Group, Computer Science and Artificial Intelligence Lab,

                        Massachusetts Institute of Technology 02139, USA
                                        kkw@mit.edu


        Abstract. This short paper presents use cases to prompt consideration of the
        asynchronous and asymmetric nature of context updates when devising schemes
        and standards for managing and preserving decentralized knowledge graphs. As
        data are increasingly connected in knowledge graphs that evidence the
        relationships among them, an open challenge is how to manage and preserve
        decentralized data so that a graph updates, and a query returns, data that
        correctly evidence the contextual relationship. Much of the focus on managing
        and preserving the evolution of data has been on preserving the internal history
        (internal to a dataset or source), where preservation and retrieval are
        synchronous. But, as demonstrated here, in many real-world use cases the correct
        linkage, and therefore the correct preservation and retrieval, is neither a
        temporal match nor a related version match.

        Keywords: Knowledge graph, knowledge graph evolution, temporal nodes,
        temporal relationships, web standards, data management, data preservation,
        data context, context mapping, linked data, semantic web.


1       Introduction

Context tells us about the environment in which information exists. Context educates
us as to how information is relevant, by its relation to other things - from the most
common comparators of time and location to little known events that impact or are
impacted by our data. From the inception of this workshop on Managing the Evolution
and Preservation of the Data Web (“MEPDaW”), there has been a recognition of the
importance of temporal relationships in the update, recording, storing, and retrieval of
linked data [1][2][3]. More recent work has expanded the ability to work at scale and
with ever-increasing numbers of versions [4][5]. Overwhelmingly, work has operated on
the presumption that linked data should be preserved or analyzed at a moment in time,
past or present. But decentralized data providing context in an evolving graph is not
necessarily a temporal match nor an exact versioning match. This paper provides use
cases in which searching for historical versions of a knowledge graph will require the
ability to identify and retrieve data which does not share the same archival date and/or
requires the retrieval of more than one version of some, but not all, nodes, and possibly
edges, of the graph.

1 Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).


2      Temporal Context & Graph Evolution

In the initial design of a knowledge graph, the relationships are often defined based
upon knowledge or theory of a use case as a snapshot. As graphs are deployed in
production and the data flows, it becomes apparent that an additional sort of descriptor
is required. Once the impact of updates to temporal context on the evolution of the
graph is known, what should be defined, and where?

2.1    Simultaneous Context
Circumstances where a simultaneous set of facts provides context are perhaps the easiest
to call to mind. For example, periods of rain readily provide the context for many
traffic accidents [6]. In such a case, a decentralized graph might tie together the
exact time and geo-perimeter of meteorologic data about a phenomenon [7], highway data
about the number of vehicles in the vicinity at that time from E-ZPass or traffic cam
counts [8][9], and law enforcement data about accidents from published police reports
[10]. In this case, it is straightforward to retrieve the data by querying event_date.
Even if, as so often occurs, smaller accidents are reported and entered on later days,
retrieval of all the graph’s data based on event_date will still be effective.
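
The retrieval described above can be sketched as follows. This is a minimal illustration only: the `Record` type, the `entered_on` field, and the sample records are hypothetical and are not drawn from the cited datasets; only the `event_date` name comes from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    source: str       # hypothetical source label, e.g. "weather" or "police"
    event_date: date  # when the phenomenon or accident occurred
    entered_on: date  # hypothetical field: when the record reached its store
    payload: str


def by_event_date(records, day):
    """Return every record whose event_date matches, whatever its entry date."""
    return [r for r in records if r.event_date == day]


# Illustrative data only.
records = [
    Record("weather", date(2021, 6, 9), date(2021, 6, 9), "heavy rain"),
    Record("traffic", date(2021, 6, 9), date(2021, 6, 9), "vehicle count"),
    Record("police", date(2021, 6, 9), date(2021, 6, 12), "minor accident, entered late"),
    Record("police", date(2021, 6, 10), date(2021, 6, 10), "accident on another day"),
]

same_day = by_event_date(records, date(2021, 6, 9))
```

Because the filter ignores `entered_on`, the late-entered police report is still returned with the weather and traffic records for the same day.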


Fig. 1. Simultaneous Temporal Context: Retaining the temporal (& geospatial) edges when one
decentralized node is updated.

2.2    Lagging Context
There are many circumstances in which there is a time lag between one set of facts and
the set of facts for which it provides context. A common example is the relationship
between national testing scores of a local school and changes in house prices in the
district [11]. In such a case, the testing scores are typically released and ranked once
a year [12][13][14], while house prices, averages, etc. are updated at least monthly by
real estate companies and governmental agencies [15]. In these cases, the relevant
temporal nodes share neither the same name nor the same date. For example, the relevant
score date is not the event_date (regardless of whether that is defined as test_date or
scoring_date), but the pub_date – the first date that the scores could have been known
by others; the relevant price date is not the pub_date but arguably the offer_date – the
first date there may be evidence of the market response; and the timing of the
relationship begins at pub_date+N, where N is the number of days after publication
before the scores could have reached a real estate agent or a buyer. Statistics on
either side of the graph can be corrected or updated for a particular date. For these
cases, it is important to remember that, even though there is not an exact temporal
match, the modification of either set should not break the graphed relationship, and
retrieval should return only the final corrected versions.
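
One way to picture this retrieval is the sketch below, which keeps only the last correction for each date and then selects prices from pub_date+N onward. The `Version` type, the `revised_on` field, the choice of N = 14, and all values are assumptions made for illustration; the text does not prescribe them.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Version:
    key_date: date    # pub_date for scores, offer_date for prices
    revised_on: date  # hypothetical field: when this version was recorded
    value: float


def final_versions(versions):
    """Keep only the most recently revised version for each key_date."""
    latest = {}
    for v in versions:
        kept = latest.get(v.key_date)
        if kept is None or v.revised_on > kept.revised_on:
            latest[v.key_date] = v
    return sorted(latest.values(), key=lambda v: v.key_date)


def lagged_prices(score, prices, lag_days):
    """Final price versions whose offer_date is on or after pub_date + N."""
    start = score.key_date + timedelta(days=lag_days)
    return [p for p in final_versions(prices) if p.key_date >= start]


# Illustrative data only: one score release with a later correction.
score_versions = [
    Version(date(2021, 3, 1), date(2021, 3, 1), 71.0),
    Version(date(2021, 3, 1), date(2021, 3, 15), 74.5),
]
price_versions = [
    Version(date(2021, 3, 5), date(2021, 3, 5), 400000.0),
    Version(date(2021, 3, 20), date(2021, 3, 20), 410000.0),
    Version(date(2021, 3, 20), date(2021, 4, 2), 412000.0),  # correction
    Version(date(2021, 4, 10), date(2021, 4, 10), 430000.0),
]

score = final_versions(score_versions)[0]
linked = lagged_prices(score, price_versions, lag_days=14)
```

Correcting either side changes which version is returned, but the link itself depends only on the key dates, so it survives the correction.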


Fig. 2. Lagging Context: Temporal updates treated as new nodes, where one decentralized
source provides context for later data from another source.

2.3    Predictive Context
Conversely, there are instances in which retrieval should pull all the iterations, not
just the final one. Consider the circumstance in which data in one store has a
predictive link to data in another store. For example, over the summer of 2021, there
was a record number of dogs surrendered to a local animal control agency, and this
appeared to be a predictor of the number of households that would be in distressed
circumstances at the end of Covid-related eviction moratoria. Figure 3 shows one
possible graph in which each Surrendered_Dog_Count node is a separate and distinct daily
report, but these reports cause only intermittent updates to the versioning of the
singular node for Updated_Evictions_Forecast. It is possible that the iterations of the
edges (and resulting node versions) may not be consistently temporally spaced, for
reasons ranging from testing and refining the forecasting model to additional
forecasting when there is a significant influx of dogs. What then is the appropriate
query to restore the history?

Fig. 3. Predictive Context: One decentralized source creates separate nodes for each
update to support a single updated node where another source is calculating predictions.
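
As one possible answer, offered only as a sketch, a history query could pair each forecast version with the daily reports published since the previous version. The `DogCount` and `ForecastVersion` types, the `as_of` field, the pairing rule, and all values below are assumptions for illustration; the text leaves the appropriate query open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DogCount:
    report_date: date  # one node per daily report
    count: int


@dataclass(frozen=True)
class ForecastVersion:
    as_of: date        # hypothetical field: when this version was computed
    households: int


def restore_history(reports, forecasts):
    """Pair each forecast version with the reports issued since the prior version."""
    history = []
    previous = date.min
    for f in sorted(forecasts, key=lambda f: f.as_of):
        inputs = [r for r in reports if previous < r.report_date <= f.as_of]
        history.append((f, sorted(inputs, key=lambda r: r.report_date)))
        previous = f.as_of
    return history


# Illustrative data only: five daily reports, two irregularly spaced forecasts.
reports = [
    DogCount(date(2021, 7, 1), 4),
    DogCount(date(2021, 7, 2), 9),
    DogCount(date(2021, 7, 3), 5),
    DogCount(date(2021, 7, 4), 12),
    DogCount(date(2021, 7, 5), 15),
]
forecasts = [
    ForecastVersion(date(2021, 7, 2), 120),
    ForecastVersion(date(2021, 7, 5), 180),
]

history = restore_history(reports, forecasts)
```

The pairing does not assume a fixed interval between forecast versions, which matters because the iterations may be irregularly spaced.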

2.4    Bi-directional Context
Another sort of context, which would require reevaluating the relationship based upon
knowledge at a particular time, arises when the change can be prompted by any node. For
example, consider advances in knowledge about human reactions to substances and changes
in grocery contents. There may be new medical practice or research reporting – for
example, the impact that Sucralose has on blood sugar [16] – which changes the graphed
labels between diabetes and numerous foods. Sucralose was recently the leading
ingredient globally for new foods and beverages with sugar-related claims [17], an
example of the constant changes to the contents of groceries [18][19] – foods,
toiletries, cleaning supplies – which can also change the nature of the label between an
item and a medical condition (e.g., allergy, celiac, diabetes).


Fig. 4. Bi-directional context: Temporal updates to nodes in either decentralized source can
change the edge between the sources.

This particular example is complicated by the fact that, for most purposes, the primary
users of each dataset would prefer different outcomes from the updates. The medical
researcher would more likely wish to see the medical conclusion mapped to each version
of a product’s ingredient list, requiring the retention of each as a separate node and
a separate edge, while the consumer would likely prefer to see the medical conclusion
mapped only to the form of the product currently stocked on shelves, requiring an
overwrite that retains only one node and one edge.
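
The two preferences can be contrasted in a small sketch. The `UpdatePolicy` names, the edge labels "flagged" and "clear", and the ingredient lists are hypothetical and serve only to show how the same update yields different graphs under each policy.

```python
from enum import Enum


class UpdatePolicy(Enum):
    NEW_NODE = "new_node"    # retain every version as its own node and edge
    OVERWRITE = "overwrite"  # retain only the current version


def apply_update(versions, new_ingredients, policy):
    """Return the retained product versions after an ingredient update."""
    if policy is UpdatePolicy.NEW_NODE:
        return versions + [new_ingredients]
    return [new_ingredients]


def edges(versions, flagged):
    """One edge label per retained version, based on a flagged ingredient."""
    return [(v, "flagged" if flagged in v else "clear") for v in versions]


# Illustrative data only: a product reformulated to use sucralose.
v1 = ("water", "sugar")
v2 = ("water", "sucralose")

researcher_view = edges(apply_update([v1], v2, UpdatePolicy.NEW_NODE), "sucralose")
consumer_view = edges(apply_update([v1], v2, UpdatePolicy.OVERWRITE), "sucralose")
```

Under these assumptions the researcher's view holds two nodes and two edges, while the consumer's view holds one of each.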


3      Discussion

The provided use cases show instances in which searching for historical versions of a
knowledge graph will require the ability to identify and retrieve data which does not
share the same archival date and/or requires the retrieval of more than one version of
some, but not all, nodes of the graph. These challenges are complicated by the data
being owned by different parties, in different subjects, who may not even be aware of
the use to which their data is put. Generally, these are not challenges that can be
solved by simply using date+n_days, as the number of days and versions may be
inconsistent. For knowledge graphs to evolve appropriately, there must be a notation to
indicate, and a mechanism to produce, the desired impact of a contextual temporal
relationship. As shown with the temporal context examples of simultaneity, lag,
prediction, and bi-directionality, graph creators need to be able to express whether a
temporal update to the data in a node should create a new node or overwrite the existing
one, and whether it can result in a change to an edge or create a new one. To facilitate
historical retrieval, there should be a standard for data owners to describe not only
the data and time produced, but also the versioning methodology: for example, metadata
indicating whether data is overwritten or new date-named versions are produced; whether
there is a marker for the final version of iterated data; and whether there is a graphed
relationship that causes changes to this data.
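
To make the proposed metadata concrete, the sketch below shows one hypothetical shape such a description could take. It is not an existing standard or vocabulary: the `VersioningMetadata` type and the field names `update_mode`, `final_marker`, and `driven_by` are invented here for illustration.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List, Optional

ALLOWED_MODES = ("overwrite", "dated_versions")  # hypothetical vocabulary


@dataclass
class VersioningMetadata:
    produced_at: datetime               # the time the data was produced
    update_mode: str                    # overwritten, or new date-named versions
    final_marker: Optional[str] = None  # how a final version is marked, if at all
    driven_by: List[str] = field(default_factory=list)  # relationships causing changes

    def __post_init__(self):
        if self.update_mode not in ALLOWED_MODES:
            raise ValueError(f"unknown update_mode: {self.update_mode!r}")


# Illustrative description of the forecast node from the predictive use case.
forecast_meta = VersioningMetadata(
    produced_at=datetime(2021, 7, 5, 9, 0),
    update_mode="dated_versions",
    final_marker=None,
    driven_by=["Surrendered_Dog_Count"],
)

as_dict = asdict(forecast_meta)
```

A consumer of the graph could read such a record before querying, to decide whether to request the latest version, the marked final version, or every iteration.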
References2
 1. Taelman, R., et al., Continuously Updating Query Results over Real-Time Linked Data, In
    Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data
    Web (MEPDaW 2016) co-located with 13th European Semantic Web Conference (ESWC
    2016), CEUR-WS, vol. 1585, pp. 1-10, Heraklion, Crete, Greece (2016)
    (http://ceur-ws.org/Vol-1585/mepdaw2016_paper_01.pdf).
 2. Anderson, J. & Bendiken, A., Transaction-Time Queries in Dydra (Industry Paper), In
    Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data
    Web (MEPDaW 2016) co-located with 13th European Semantic Web Conference (ESWC 2016),
    CEUR-WS, vol. 1585, pp. 11-19, Heraklion, Crete, Greece (2016) (retaining past and
    current state as separately addressable stores)
    (http://ceur-ws.org/Vol-1585/mepdaw2016_paper_02.pdf).
 3. Fernandez, J.D., Polleres, A., & Umbrich, J., Towards Efficient Archiving of Dynamic
    Linked Open Data, In Proceedings of the First DIACHRON Workshop on Managing the
    Evolution and Preservation of the Data Web co-located with 12th European Semantic Web
    Conference (ESWC 2015), CEUR-WS, vol. 1377, pp. 34-49, Portorož, Slovenia (2015)
    (http://ceur-ws.org/Vol-1377/paper6.pdf).
 4. Cuevas, I. & Hogan, A., Versioned Queries over RDF Archives: All You Need is SPARQL?
    In Proceedings of the 6th Workshop on Managing the Evolution and Preservation of the
    Data Web (MEPDaW) co-located with the 19th International Semantic Web Conference
    (ISWC 2020), CEUR-WS, vol. 2821, pp. 43-52, Virtual (instead of Athens, Greece) (2020)
    (exploring querying in and across massive versioned archives)
    (http://ceur-ws.org/Vol-2821/paper6.pdf).
 5. Gleim, L. & Decker, S., Open Challenges for the Management and Preservation of Evolving
    Data on the Web, In Proceedings of the 6th Workshop on Managing the Evolution and
    Preservation of the Data Web (MEPDaW) co-located with the 19th International Semantic
    Web Conference (ISWC 2020), CEUR-WS, vol. 2821, pp. 11-16, Virtual (instead of Athens,
    Greece) (2020) (referring to the resolution of synchronization possibly through TimeMaps)
    (http://ceur-ws.org/Vol-2821/paper9.pdf).
 6. See, e.g., https://ops.fhwa.dot.gov/weather/q1_roadimpact.htm.
 7. See, e.g., https://www.weather.gov/media/aly/Past_Events/2015/PNS_Microburst_Jun_9_2015.pdf
    (example of open web data re: a microburst, showing time, longitude and latitude).
 8. See, e.g., NY Open Data
    https://data.ny.gov/Transportation/Annual-Average-Daily-Traffic-AADT-Beginning-1977/6amx-2pbv
    (providing average daily vehicle usage per stretch of roadway).
 9. https://catalog.data.gov/dataset/e-zpass-usage-statistics-beginning-2008 (providing EZ Pass
    usage by year by toll plaza).
10. See, e.g.,
    https://data.ny.gov/Transportation/Motor-Vehicle-Crashes-Case-Information-Three-Year-/e8ky-4vqe
    (providing timestamp and DOT mileage marker for location of accidents).
11. See, e.g., https://www.opendoor.com/w/blog/how-school-ratings-impact-home-prices and
    https://www.niche.com/k12/search/best-school-districts/ (offering houses for sale tied to
    each school district ranking).
12. See, e.g., https://nces.ed.gov/programs/digest/mrt_tables.asp (annual release of education
    statistics).


2 All web citations are as of September 5, 2021 and are not listed separately in each reference.
13. See, e.g., http://www.globalreportcard.org/about.html (downloadable global school district
    data).
14. See, e.g., https://infohub.nyced.org/reports/school-quality/information-and-data-overview
    (New York City open data on education, including testing scores).
15. See, e.g., https://www1.nyc.gov/site/finance/taxes/property-rolling-sales-data.page (NYC
    rolling sales data for residential real estate).
16. Pepino, Y.M., Tiemann, C.D., et al., Sucralose Affects Glycemic and Hormonal Responses
    to an Oral Glucose Load, Diabetes Care, Vol. 36(9), pp. 2530-2535 (American Diabetes
    Association, Sept. 2013) (https://care.diabetesjournals.org/content/36/9/2530).
17. “Sugar Reduction Innovation,” Aug. 2021 (https://www.foodingredientsfirst.com/analysis-
    popup/sugar_reduction_aug_2021.html).
18. See, e.g., https://www.foodingredientsfirst.com/ (a website for the food industry with focus
    on rising and declining ingredient trends).
19. See, e.g., USDA Branded Food Products Database (https://data.nal.usda.gov/dataset/usda-
    branded-food-products-database/resource/cfceb689-7dab-498f-8762-707cd299646b)
    (providing ingredients for branded foods).