<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HR-VEAW: A Human Rights Violation Exploration, Analytics, and Warning System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaozhong Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiawei Xu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Merve Keskin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael P. Colaresi</string-name>
          <email>mcolaresi@pitt.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir I. Zadorozhny</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Panos K. Chrysanthis</string-name>
          <email>panos@cs.pitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Pittsburgh</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Informatics &amp; Network Science, University of Pittsburgh</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept. of Political Science, University of Pittsburgh</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The availability of information from social media, such as tweets, and from human rights monitors, such as Amnesty International, Human Rights Watch, and the US State Department, has led to new opportunities to measure repression and human rights protections at higher resolution. In this paper, we present HR-VEAW, a Human Rights Violation Exploration, Analytics, and Warning system, to support understanding of social conflict dynamics and human rights violations/protections with quantitative data. After briefly discussing HR-VEAW's data acquisition and analysis components, we demonstrate how it visualizes rich spatio-temporal and conceptual information, enabling the examination of changes in patterns of violation and protection in aggregate over time, or across both space and time. In this way, HR-VEAW helps to explain social instability and conflicts and to guide decision-making, theorizing, and predictions.</p>
      </abstract>
      <kwd-group>
        <kwd>data warehousing</kwd>
        <kwd>data visualization</kwd>
        <kwd>data exploration</kwd>
        <kwd>data analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Through a qualitative understanding of conflict dynamics, human rights violations and protections, and ethnic politics and relations, policy analysts and conflict researchers build mental models of the underlying grievances and alliances that structure war and peace. Currently, no system directly allows policy makers and researchers both to inform their visions of these grievances and policies with systematic, quantitative data and to share the resulting maps of the spatial and conceptual patterns that guide decision-making, theorizing, and predictions. The most sophisticated interactive visualizations (e.g., [<xref ref-type="bibr" rid="ref2 ref3">1, 2</xref>]) provide event-views that count events, but outside of the context of specific grievances.</p>
      <p>In this paper, we present a first Human Rights Violation Exploration, Analytics, and Warning system (HR-VEAW) that allows users to visualize rich spatial and conceptual information that is relevant both to the escalation of instability and to how negotiators might wind down tensions, and with whom. We process textual data from human rights reports and social media that communicates both historical and contemporaneous information on who is alleged to have violated or protected a broad array of rights and behaviors for specific groups. This enables us not only to look at changes in patterns of violations and protections in aggregate over time or across both space and time, but also to explore which groups are being targeted or privileged by the government and other actors, and on what specific dimensions. For example, it helps to explain why, in Ethiopia, the Tigrayan Peoples Liberation Front (TPLF) lost influence due to its broad repression, and it provides clues as to why the Oromo Liberation Army (OLA) and the TPLF are now cooperating against the government and when that cooperation might end [<xref ref-type="bibr" rid="ref4">3</xref>].</p>
      <p>HR-VEAW implements a scalable information processing pipeline combining traditional database technologies with data streaming, NLP and sentiment-aspect representations, data visualization, and interpretable ML. In this demo paper, after briefly discussing HR-VEAW’s data acquisition and analysis components (§2), we demonstrate how it visualizes rich spatio-temporal and conceptual information, helping users to explain social instability and conflicts and to guide decision-making, theorizing, and predictions (§3).</p>
    </sec>
    <sec id="sec-2">
      <title>2. HR-VEAW System Overview</title>
      <p>Figure 1 illustrates an overview of the data flow of the HR-VEAW system, consisting of two phases. The first phase is Data Acquisition, in which data from different sources are transformed by the ETL process and stored in a data warehouse. The second phase is Data Analysis, in which data visualization and interpretation are used to explore the data and to discover patterns of social instability for early warnings.</p>
      <sec>
        <title>2.1. Data Acquisition</title>
        <p>This phase consists of a variety of ETL (Extract, Transform, Load) processes for different data sources. Human rights reports, news, tweets, etc. go through the human rights text parser PULSAR [4]. PULSAR uses rule-based and machine learning-based models to predict and extract structured information such as region, time, victim and human rights aspect from the documents. Social and economic statistics at the country and subcountry levels are extracted from the ViEWS project [5] by specialized filtering and aggregation queries. Geographic information such as region names, administrative hierarchies, and region boundaries is retrieved from the online geographic database OpenStreetMap (OSM) [6]. All extracted and transformed data are then stored in a data warehouse (discussed in §3.1).</p>
        <p>The whole data acquisition phase is structured along the producer/consumer paradigm and topic channels, powered by Apache Kafka [7]. There are two major advantages of using Kafka in HR-VEAW. First, Kafka can process high-throughput data streams from different sources and can be easily integrated with the ETL process as an event-driven message bus. Second, while the data warehouse only stores the structured data after the ETL process, Kafka can persist data/messages during the whole data acquisition phase, so that messages are never lost and can be retrieved at a later time as needed.</p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2. Data Analysis</title>
        <sec id="sec-2-1-1">
          <p>In this phase, different types of data from the data warehouse can be generated at different aggregation levels and in forms suitable for the data interpretation and data visualization modules.</p>
          <p>The data interpretation module aims to find the logical relationships between different features in the data. One way to accomplish this is to booleanise some of the aggregated information from the data warehouse to obtain the corresponding binary indicators. These binary indicators can serve as learning features in machine learning methods, such as the Tsetlin Machine [<xref ref-type="bibr" rid="ref6">8</xref>], to conduct logical, interpretable learning that discovers relationships, patterns, and rules between observables and target indicators in the data.</p>
          <p>The data visualization module is designed to help the data analyst explore and understand the data. It allows the user to visualize human rights conditions across different dimensions such as region, time, victim and human rights aspect. The interactive visualization can help the user discover interesting patterns in the data, such as inequalities across dimensions and specific grievances concerning specific subsections of certain dimensions. One such specific grievance, for instance, could be Integrity rights violations against Oromo people in Ethiopia.</p>
          <p>The final module of HR-VEAW provides early warning of future conflicts based on the data analysis results from the data interpretation and data visualization modules.</p>
        </sec>
      </sec>
      <sec>
        <title>3. Data Visualization in HR-VEAW</title>
        <p>This section provides implementation details of the data warehouse that drives the data visualization in HR-VEAW, describes the data visualization GUI and services, and presents an example data visualization use case.</p>
      </sec>
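      <sec>
        <p>The analysis modules request aggregates of the warehouse data at different levels of its dimensions, and the warehouse itself is organised as a star schema (Section 3.1). The following is a minimal sketch of that idea, not the authors' implementation: it uses Python's built-in sqlite3 module rather than the MySQL backend of the demo, the table names REGION, TIME, VICTIM, ASPECT and VALENCE and the view name valence_dimensions are taken from the paper, and every column name and every row value is a hypothetical stand-in chosen only for illustration.</p>
        <preformat><![CDATA[
```python
import sqlite3

# In-memory database; all column names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REGION (region_id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE "TIME" (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE VICTIM (victim_id INTEGER PRIMARY KEY, victim_group TEXT);
CREATE TABLE ASPECT (aspect_id INTEGER PRIMARY KEY, aspect TEXT, sub_aspect TEXT);

-- Fact table: one dimension ID per dimension plus the valence value.
CREATE TABLE VALENCE (
    region_id INTEGER REFERENCES REGION(region_id),
    time_id   INTEGER REFERENCES "TIME"(time_id),
    victim_id INTEGER REFERENCES VICTIM(victim_id),
    aspect_id INTEGER REFERENCES ASPECT(aspect_id),
    valence   REAL
);

-- View exposing every level of every dimension as a group-by attribute.
CREATE VIEW valence_dimensions AS
SELECT r.region_id, r.country_id, t.year, v.victim_group,
       a.aspect, a.sub_aspect, f.valence
FROM VALENCE f
JOIN REGION r ON r.region_id = f.region_id
JOIN "TIME" t ON t.time_id = f.time_id
JOIN VICTIM v ON v.victim_id = f.victim_id
JOIN ASPECT a ON a.aspect_id = f.aspect_id;
""")

# Toy rows: two sub-regions of one country, two years, one victim group.
conn.executemany("INSERT INTO REGION VALUES (?, ?)", [(101, 1), (102, 1)])
conn.executemany('INSERT INTO "TIME" VALUES (?, ?)', [(1, 2008), (2, 2009)])
conn.executemany("INSERT INTO VICTIM VALUES (?, ?)", [(1, "group A")])
conn.executemany("INSERT INTO ASPECT VALUES (?, ?, ?)", [(1, "Integrity", "Force")])
conn.executemany(
    "INSERT INTO VALENCE VALUES (?, ?, ?, ?, ?)",
    [(101, 1, 1, 1, -1.0), (101, 2, 1, 1, -4.0),
     (101, 2, 1, 1, -1.5), (102, 2, 1, 1, -2.0)],
)

# Same measure (valence sum) requested at two aggregation levels.
by_region = conn.execute("""
    SELECT region_id, year, SUM(valence) FROM valence_dimensions
    WHERE sub_aspect = 'Force'
    GROUP BY region_id, year ORDER BY region_id, year
""").fetchall()

by_country = conn.execute("""
    SELECT country_id, year, SUM(valence) FROM valence_dimensions
    WHERE sub_aspect = 'Force'
    GROUP BY country_id, year ORDER BY year
""").fetchall()

print(by_region)
print(by_country)
```
]]></preformat>
        <p>Because the view carries all levels of all dimensions, moving from a sub-region view to a country view only changes the GROUP BY column. This is the property the visualization filters rely on for cross-level comparison.</p>
      </sec>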
      <sec id="sec-2-2">
        <title>3.1. Data Warehousing</title>
        <sec id="sec-2-2-1">
          <p>The data warehouse serves as the connection point between data acquisition and data analysis. It is designed as a star schema with one fact table and four dimension tables. The four dimension tables are REGION, TIME, VICTIM and ASPECT. The hierarchy (i.e., ontology) information for regions, victims and human rights aspects comes from OSM, WordNet [<xref ref-type="bibr" rid="ref8">9</xref>] and the U.S. State Department, respectively. Each dimension table has a primary key and multiple attributes representing multiple levels in the dimension. For example, for a record in the REGION table, the primary key is the OSM ID for that region, and the other attributes are the OSM IDs of that region’s ancestor regions in the administrative hierarchy.</p>
          <p>The fact table is called VALENCE. Valence is an output value of PULSAR for each human rights record, representing the polarity of the record. A negative valence represents a human rights violation and a positive valence represents human rights protection. The VALENCE table has five attributes: one dimension ID for each dimension and a valence value. If a human rights record contains more than one value in one dimension, then each value will generate one record in VALENCE. For example, if a human rights record involves two regions, then each region will generate a separate valence record.</p>
          <p>In addition to the five tables above, the data warehouse has one more basic table and one view. The table is named BOUNDARY and contains geographic boundary coordinates for the regions in the REGION table. BOUNDARY has a foreign key referencing the primary key in REGION and thus has a many-to-one relation to REGION. The view is tentatively called valence_dimensions. It is generated by joining VALENCE with all dimension tables so that the group-by attributes that can be requested by the data analysis modules, namely all levels in all dimensions, are present in the view.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3.2. Data Visualization</title>
      <p>The main purpose of the data visualization module is to help data analysts discover interesting patterns in human rights situations. These patterns could include difference, similarity and correlation in valence values between regions, time ranges, social groups, and human rights aspects, which are very helpful in understanding and analyzing human rights situations.</p>
      <p>The data visualization front-end is shown in Figure 2. The GUI has three sections: a filter section, a visualization section, and a recommendation section.</p>
      <p>The filter section contains hierarchical filters that allow the user to select attribute values at different levels of different dimensions. The levels and values in the filters represent their counterparts in the corresponding dimension tables. By allowing the user to select values at different levels, this filter design enables cross-level display and comparison of the data.</p>
      <p>The visualization section displays the aggregate values (i.e., sum of valence) for the filtered data subset. Specifically, for each selected region, a 3D visualization is generated displaying the valence sums in each cell with valid data across time, victim and human rights aspect. The more opaque a cell is, the more negative the valence sum (i.e., the more severe the human rights problem). Using opacity as the human rights severity indicator can capture different problem patterns at the same time. For example, an area in the 3D visualization will be dark when it contains either a few highly severe cells or many moderately severe cells. Cells without data are left transparent.</p>
      <p>The user can pan, tilt, rotate and zoom the visualization with simple mouse movements. Further, there is a legend displaying valence ranges for different opacity levels. An info box is also present to display the cell information (i.e., the corresponding attribute values and the valence sum) when a cell is clicked.</p>
      <p>Clicking a cell also triggers the rendering of 1D and 2D plots, which are displayed beside the 3D plot. These plots correspond to 1D and 2D data cube slices that contain the cell. For example, the first 2D plot in Figure 2 (3rd plot from top) displays the temporal valence change across different aspects for the clicked region and victim. This feature helps the user to acquire clear and precise readings of data subspaces.</p>
      <p>The recommendation section is designed to help the user discover interesting patterns more easily. HR-VEAW dynamically generates recommendation listings for the filtered data subset based on different ranking criteria. Two example ranking criteria are valence sum and change point. The ranking scores are calculated for each valence value time series (i.e., the column of cells perpendicular to the map surface in the 3D visualization) that has a specific combination of region, victim and aspect values.</p>
      <p>The valence sum criterion calculates the sum of the valence values in each time series, and its ranking score is based on Equation 1:
        <disp-formula><label>(1)</label><tex-math>S^{(r,v,a)} = \sum_{t=1}^{T} x_t^{(r,v,a)}</tex-math></disp-formula>
        where <inline-formula><tex-math>(r, v, a)</tex-math></inline-formula> is a specific combination of region, victim and aspect values and <inline-formula><tex-math>x_t^{(r,v,a)}</tex-math></inline-formula>, <inline-formula><tex-math>t = 1, \dots, T</tex-math></inline-formula>, are the time series values for that combination.</p>
      <p>The change point criterion calculates the maximum of absolute changes between adjacent cells in each time series, and its ranking score is based on Equation 2:
        <disp-formula><label>(2)</label><tex-math>C^{(r,v,a)} = \max_{1 \le t &lt; T} \left( \left| x_{t+1}^{(r,v,a)} - x_t^{(r,v,a)} \right| \right)</tex-math></disp-formula>
        where <inline-formula><tex-math>\max(\cdot)</tex-math></inline-formula> is the maximum operator and <inline-formula><tex-math>|\cdot|</tex-math></inline-formula> is the absolute value operator.</p>
      <p>Each recommendation criterion provides a listing of the top 10 entries. When an entry is clicked, the corresponding time column is highlighted in the 3D visualization.</p>
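      <p>The two ranking criteria of the recommendation section (valence sum, Equation 1, and change point, Equation 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HR-VEAW code: the function names, the tuple key (region, victim, aspect) and all series values are hypothetical, and the sort direction for the valence sum (most negative first, i.e., most severe first) is an assumption, since the text does not state how the scores are ordered.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical key type: one (region, victim, aspect) combination.
Combo = Tuple[str, str, str]


def valence_sum(series: List[float]) -> float:
    """Equation 1: sum of the valence values of one time series."""
    return sum(series)


def change_point(series: List[float]) -> float:
    """Equation 2: largest absolute change between adjacent time steps."""
    if len(series) < 2:
        return 0.0  # no adjacent pair, so no change to report
    return max(abs(b - a) for a, b in zip(series, series[1:]))


def top_k(scores: Dict[Combo, float], k: int = 10,
          ascending: bool = False) -> List[Combo]:
    """Return the k best-ranked combinations for one criterion."""
    ordered = sorted(scores, key=lambda combo: scores[combo],
                     reverse=not ascending)
    return ordered[:k]


# Made-up valence sums per year for three time columns.
series_by_combo: Dict[Combo, List[float]] = {
    ("Somali", "all", "Force"): [-1.0, -1.0, -2.0, -9.0, -8.0],
    ("Oromia", "all", "Force"): [-4.0, -3.0, -3.0, -2.0, -1.0],
    ("Somali", "all", "Civil"): [0.0, -1.0, -6.0, -6.0, -5.0],
}

sum_scores = {c: valence_sum(s) for c, s in series_by_combo.items()}
cp_scores = {c: change_point(s) for c, s in series_by_combo.items()}

# Assumption: most negative sum ranks first; largest jump ranks first.
by_sum = top_k(sum_scores, ascending=True)
by_change = top_k(cp_scores)

print(by_sum)
print(by_change)
```
]]></preformat>
      <p>The two criteria answer different questions: the valence sum surfaces time columns that are severe overall, while the change point surfaces columns with an abrupt shift between two adjacent time steps, even when their total is unremarkable.</p>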
      <sec id="sec-3-1">
        <title>3.3. Example Use Case</title>
        <p>An example data visualization use case is illustrated in Figure 3, in which the analyst tries to identify specific grievances that may have caused instability events in the history of Ethiopia.</p>
        <p>Instability events are mainly reflected by human rights violations in the sub-aspect Force under the aspect of Integrity, which primarily covers armed conflicts between the government and ethnic groups. Therefore, the analyst first chooses to visualize that sub-aspect across all victims in Ethiopia, as shown in Figure 3a.</p>
        <p>They realize that there was a large increase in force events (i.e., human rights violation events under the Force sub-aspect) in year 2009 and want to find out whether this pattern is universal across the country or specific to certain sub-regions. So they select all sub-regions in the filter section to see the time columns across the sub-regions.</p>
        <p>Then they check the top-1 recommendation from the valence sum criterion and find that the Somali region contributed greatly to the aforementioned force event increase, as shown in Figure 3b.</p>
        <p>So the data analyst focuses on the Somali region and selects all aspects as well as all sub-aspects under the Integrity aspect. Then they check the top-1 recommendation from the change point criterion and find that civil events (i.e., human rights violation events under the Civil aspect) also had a large increase around year 2009 in the Somali region, with a slightly earlier start, as shown in Figure 3c. This suggests that the increase of the civil events could be a cause of the increase of the force events in the following years. If this is the case, then dealing with the grievances reflected by those civil events could potentially prevent the increase of the instability events in the following years.</p>
        <p>To conclude, through data visualization and exploration with the help of HR-VEAW, the analyst was able to form a hypothesis that the increase of civil events before year 2009 in the Somali region could be a cause of the increase of force events in 2009 and after in the same region, which could guide their further investigation.</p>
      </sec>
      <sec id="sec-3-1-1">
        <title>4. Demonstration Scenario</title>
        <p>The following demonstration introduces the key concepts and visualization abstraction of HR-VEAW to the attendees. They will be able to assess the performance of our propositions during their interactions with our user-friendly interface.</p>
        <sec>
          <title>4.1. Demo Artifact</title>
          <p>The demo artifact is a web application prototype representing the data visualization module of HR-VEAW. The software stack of the web app consists of Flask, MySQL, JavaScript and HTML. The web app will be served on a dedicated HR-VEAW server, which can be publicly accessed during the demo.</p>
        </sec>
        <sec>
          <title>4.2. Demo Plan</title>
          <p>Equipment: The conference attendees will have the opportunity to interact with the web app through any web browser on a standard laptop or a tablet.</p>
          <p>Datasets: We will pre-load data from U.S. State Department human rights reports, which will primarily cover countries in Africa, where human rights conditions are relatively worse.</p>
          <p>Scenario 1: The first scenario asks the user to discover difference, similarity or correlation between events in different human rights violation aspects for a specific region. One example result from such discovery could be the use case discussed in Section 3.3, in which a correlation between force events and civil events is discovered for the Somali region.</p>
          <p>Scenario 2: The second scenario asks the user to discover difference, similarity or correlation between events in different regions for a specific human rights violation aspect. For example, if we look back to Figure 3b, we may realize that while force events increased in the Somali region around year 2009, force events in the neighboring Oromia region were actually decreasing during the same period, which may indicate a negative correlation between the two.</p>
          <p>Scenario 3: The attendees will have the opportunity to interact freely with HR-VEAW to conduct any other data exploration tasks, such as comparing human rights conditions across time ranges, social groups, etc.</p>
        </sec>
      </sec>
      <sec>
        <title>Acknowledgment</title>
        <p>The authors would like to thank Daniel Gustafson for his work on PULSAR. This work was partially funded by NSF grant SES-2017614 and reflects only the authors’ opinions.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref2">
        <mixed-citation>[1] <collab>Department of Peace and Conflict Research at Uppsala University</collab>, <source>Uppsala conflict data program</source>, <uri>https://ucdp.uu.se</uri>, <year>2021</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[2] <collab>Center for Systemic Peace</collab>, <source>Major episodes of political violence 1946-2019</source>, <uri>http://www.systemicpeace.org/warlist/warlist.htm</uri>, <year>2020</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[3] <string-name><given-names>C.</given-names> <surname>Anna</surname></string-name>, <article-title>Ethiopia armed group says it has alliance with Tigray forces</article-title> (<year>2021</year>). URL: https://apnews.com/article/africa-only-on-ap-ethiopia-...b280e6622d66b7e7f9b12cd1d0041ae8.</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>[4] <string-name><given-names>B.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Greene</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Colaresi</surname></string-name>, <article-title>Human rights are (increasingly) plural: Learning the changing taxonomy of human rights from large-scale text reveals information effects</article-title>, <source>American Political Science Review</source> <volume>114</volume> (<year>2020</year>) <fpage>888</fpage>-<lpage>910</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>H.</given-names> <surname>Hegre</surname></string-name>, et al., <article-title>ViEWS: A political violence early-warning system</article-title>, <source>Journal of Peace Research</source> <volume>56</volume> (<year>2019</year>) <fpage>155</fpage>-<lpage>174</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[6] <collab>OpenStreetMap contributors</collab>, Planet dump retrieved from <uri>https://planet.osm.org</uri>, <uri>https://www.openstreetmap.org</uri>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[7] <string-name><given-names>J.</given-names> <surname>Kreps</surname></string-name>, et al., <article-title>Kafka: A distributed messaging system for log processing</article-title>, in: <source>Proceedings of the NetDB</source>, volume <volume>11</volume>, <year>2011</year>, pp. <fpage>1</fpage>-<lpage>7</lpage>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[8] <string-name><given-names>R.</given-names> <surname>Saha</surname></string-name>, <string-name><given-names>O.-C.</given-names> <surname>Granmo</surname></string-name>, <string-name><given-names>V. I.</given-names> <surname>Zadorozhny</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Goodwin</surname></string-name>, <article-title>A relational Tsetlin machine with applications to natural language understanding</article-title>, <source>Journal of Intelligent Information Systems</source> (<year>2022</year>) <fpage>1</fpage>-<lpage>28</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[9] <string-name><given-names>G. A.</given-names> <surname>Miller</surname></string-name>, <article-title>WordNet: A lexical database for English</article-title>, <source>Commun. ACM</source> <volume>38</volume> (<year>1995</year>) <fpage>39</fpage>-<lpage>41</lpage>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>