      Moving Disambiguation of Regulations from the Cathedral to the Bazaar

       Manasi Patwardhan, Richa Sharma, Abhishek Sainani, Shirish Karande, Smita Ghaisas
                              TCS Research, 54-B Hadapsar Industrial Estate, Pune 411013, India
                                               manasi.patwardhan@tcs.com




Abstract

Regulatory compliance is critical to the existence, continuity, and credibility of businesses. Regulations, however, are ridden with ambiguities that make their comprehension a challenge that seems surmountable only by experts. Experts' involvement in understanding regulatory requirements for every software development project is expensive and not scalable. Having software engineers perform disambiguation of such requirements would be a great value addition. We present our design of a 3-step crowdsourcing workflow that aims to convert the task of disambiguation into a series of micro-tasks to be performed by a crowd of software engineers. We demonstrate that the outcome of this workflow is at par with expert-enabled disambiguation at 4.5 times lower cost.

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.

Introduction

Since regulations aim to safeguard the wellbeing of citizens, they are written with great rigor and discipline to minimize incidents of violations. However, their diction is so highly specialized that it is almost incomprehensible to the business communities who need to ensure regulatory compliance. Mechanisms to assure and demonstrate regulatory compliance have been researched for a long time (Breaux, Vail, and Anton 2006). However, researchers have noted that the ambiguities in regulations pose a challenge to requirements engineers, and thus the process of deriving system requirements tends to be error prone.

Massey et al. have created a legal ambiguity taxonomy for identifying and classifying ambiguities in regulations that govern software systems (Massey et al. 2014). In their experiments involving software engineers (undergraduate and graduate students) in resolving ambiguities, they found that the engineers could identify ambiguous terms or phrases in a regulation statement, but were not able to agree on a consistent rationale. The authors therefore suggest that software engineers need expert inputs to validate their interpretations of ambiguities (Massey et al. 2015). Involving legal experts in every software project is expensive and therefore not scalable. In our work, we explore this line of research further by involving a crowd of professional software engineers not only to identify ambiguities, but also to disambiguate regulations, with the aim of finding a viable and scalable alternative to the current expensive mode of disambiguation.

We conduct a series of pilot crowdsourcing experiments that help us design a 3-step workflow composed entirely of micro-tasks. Micro-task crowdsourcing has a potential that is yet to be fully explored in the field of software engineering (Adriano and van der Hoek 2016; Weidema et al. 2016; Zhao and van der Hoek 2015; LaToza and van der Hoek 2016). We employ micro-tasking to break down the complex task of disambiguation into smaller chunks of tasks, sequentially executed as the steps of the workflow, causing less cognitive load and resulting in better quality and scalability. We use the already proven method of peer evaluation as a part of the crowdsourcing workflow to produce reliable data (Goto, Ishida, and Lin 2016; Ambati, Vogel, and Carbonell 2012; Hansen et al. 2013; Huang and Fu 2013). The outcome of the micro-task executed in the ith step of the workflow is peer-evaluated in the (i + 1)th step, ensuring successive and incremental enhancement in quality.

For other complex tasks, such as tasks in linguistics (Hong and Baker 2011) and the medical domain (Zhai et al. 2013), the use of a lay crowd to replace experts has proven to be a feasible option, leading to more scalable and less costly solutions for data collection. On similar lines, in this work, we show that the crowd annotations we receive for ambiguity detection and disambiguation, upon reaching consensus, match those made by the experts, providing a clear indication that the wisdom of software engineers can equal that of experts. We demonstrate that our approach moves this highly specialized task of disambiguation from the Cathedral to the Bazaar (Raymond 1999) and leads to a 4.5 times reduction in the cost of experts.

Disambiguation

There are six distinct types of regulation ambiguities defined by Massey et al. (Massey et al. 2014), viz. 1. Lexical, 2. Syntactic, 3. Semantic, 4. Incompleteness, 5. Vagueness, and 6. Referential. As a part of this study, we have focused on the first three types of ambiguities. A term/phrase in a regulation statement is lexically ambiguous if it has multiple dictionary meanings; disambiguation here would mean explicating the exact meaning. Syntactic ambiguity points at multiple word associations leading to multiple parse trees, and disambiguation here amounts to clarifying the scope of the word association. Semantic ambiguity occurs if a statement is not self-contained, and disambiguation would mean providing the additional contextual information needed for interpretation. Table 1 illustrates examples of regulation statements per ambiguity type, with an ambiguous term/phrase marked.

Table 1: Ambiguity Examples

Ambiguity: Lexical
Regulatory statement (marked term: "record"): Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.
Question: In the given sentence, what is the meaning of the word "record"?
Answers: a) to put in writing or digital form for future use; b) information stored on a computer; c) best performance; d) to make a permanent or official note of; e) a piece of evidence from the past.

Ambiguity: Syntactic
Regulatory statement (marked phrase: "final disposition of"): Implement policies and procedures to address the final disposition of electronic protected health information, and/or the hardware or electronic media on which it is stored.
Question: In the given sentence, the phrase "final disposition of" refers to?
Answers: a) electronic protected health information; b) policies; c) hardware; d) address; e) electronic media.

Ambiguity: Semantic
Regulatory statement (marked phrase: "examine activity"): Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.
Question: What does "examine activity" mean?
Answers: a) Keep a log of what was done; b) Notify admin that something was done; c) Stop/block what is being done; d) Identify what was done; e) Classify what was done.


A question posed on the term would highlight the source and type of ambiguity, and a list of valid explanatory answers to the question would result in disambiguation. For our study, we sought ground-truth inputs from three experts who have worked with Health Insurance Portability and Accountability Act (HIPAA) regulations (Dwyer III, Weaver, and Hughes 2004) for more than 3 years. We asked the experts to select 5 regulation statements from HIPAA, each having terms/phrases depicting the three types of ambiguities.

Pilot Tasks

We conducted pilot crowdsourcing experiments with the specific intent of evaluating design trade-offs w.r.t. cognitive load, scalability, and quality. Our experiments consisted of 3 crowdsourcing tasks to collect regulation disambiguation data for 5 regulation statements. We targeted a crowd of 30 professional software engineers with 3 to 4 years of experience (henceforth referred to as crowd workers). They were asked to perform this task during their working hours. In the first task, we tried to achieve disambiguation in a single step. We presented regulation statements and asked the crowd workers to either write their own policy statement(s) in response to the regulations or produce policies from credible sources which comply with the regulations. These policy statements would serve as explanatory texts for disambiguation. We achieved very low participation (3 out of 30 crowd workers), with a 27% error rate (incorrect/spam inputs) and an average completion time of 3 minutes per regulation statement, indicating a high cognitive load. To address this issue, as a part of the second pilot task, we designed the disambiguation as a two-step process: (i) pose questions about the ambiguities and (ii) provide answers. We still got low participation (4 out of 30), with a small reduction in the error rate (24%) and completion time (average 2.5 minutes per regulation statement). Thus, the reduction in cognitive load was not significant enough.

Both these pilot tasks sought textual inputs, leading to high cognitive load. Furthermore, algorithmically evaluating consensus on free text is a challenge. To address this challenge, as the third pilot task, we decided to seek discrete responses rather than textual ones. We presented regulation statements and supplementary text in the form of policy statements extracted from university websites which publish their HIPAA policies (NYU). The crowd provided binary annotations indicating whether a given policy in response to a regulation seemed to implement what was intended by the regulation statement. We received increased participation (24 out of 30) with reduced completion time (average 1 minute per task), alleviating cognitive load. However, 74% of the responses were incorrect. Moreover, the design of this pilot task is not scalable, as it requires the collection of policy statements for every regulation statement from web sources.

For all three pilot tasks, we noted that the tasks involved comprehension of the regulation and strategizing for compliance. The comprehension was subjective because our crowd consisted of software engineers working in different domains. Accordingly, their foci while formulating or selecting policy statements and/or posing questions as responses were different. This led to a lot of variation in the responses, making it impossible to draw a consensus on the source of ambiguity. To address this challenge, we needed to direct their attention to specific ambiguities in the regulation statements, indicated by specific terms or phrases.

Workflow Design

With the observations made from our pilot studies, we arrived at the following conclusions: (1) The complex task of disambiguation has to be divided into smaller chunks of micro-tasks, so that the reduction in cognitive load would achieve better participation and quality of inputs. (2) The micro-task design should (i) be amenable to achieving scalability, (ii) lead to a discrete set of responses, which eases the process of achieving consensus, and (iii) highlight the source of ambiguity in a regulation statement, alleviating the problem of varying focal points. (3) There is a need to design a workflow which consists of a sequence of micro-tasks, such that the solicited crowd responses in the ith step of the workflow are reviewed and validated by another set of crowd workers (peers) in the (i + 1)th step, followed by providing responses on the validated inputs. Such peer evaluation would ensure successive and incremental enhancements in disambiguation without expert involvement, and would also achieve crowd engagement, since the workers are required to rationalize their validations by providing responses. The resultant workflow is described below.

Workflow Step 1: Marking Ambiguous Terms and Posing Questions  In this micro-task a crowd worker is (i) presented with a regulation statement, (ii) asked to mark a (set of) term(s) and/or phrase(s) in the statement which are ambiguous, and (iii) asked to pose a (set of) question(s) for every term or phrase marked, the answer to which would cause disambiguation. We apply majority voting to find consensus on the terms/phrases. Thus, the outcome of this micro-task is a set of regulation statements with a valid set of ambiguous terms/phrases and a set of corresponding questions for each term.
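
To make the consensus mechanism concrete, the sketch below shows a plain majority vote over the crowd's binary judgments for a single marked term. The function name, the label strings, and the tie-handling rule are our own illustrative choices; only the use of majority voting itself comes from the workflow description.

    from collections import Counter

    def majority_vote(judgments):
        """Return the majority label among binary crowd judgments.

        `judgments` is a list of 'valid'/'invalid' strings collected for one
        candidate term/phrase (or question, or answer). A strict majority is
        required; ties fall back to 'invalid', which is an assumption of this
        sketch rather than a rule stated in the paper.
        """
        counts = Counter(judgments)
        return "valid" if counts["valid"] > len(judgments) / 2 else "invalid"

    # Example: 9 of 15 workers judge a marked term to be ambiguous.
    print(majority_vote(["valid"] * 9 + ["invalid"] * 6))  # -> valid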

Workflow Step 2: Validating Questions and Providing Answers  The outcome of the prior micro-task is used as an input here. A crowd worker is (i) presented with a regulation statement, along with a validated ambiguous term or phrase and the corresponding set of questions, (ii) asked to validate each question for its meaning, grammar, and applicability (whether the answer actually leads to disambiguation) by providing a binary input (Valid/Invalid), and (iii) for all the questions marked as valid, asked to provide a succinct answer to the question, which would cause disambiguation. We ensure that the set of crowd workers attempting this micro-task is different from those who have worked on Step 1; or, if they are the same set of workers, they do not get to work on their own set of responses (questions) from the earlier step. We use majority voting for consensus on the valid set of questions. Thus, the outcome of this step is a set of regulation statements containing a term and/or a phrase marked as ambiguous for which at least one question is validated. In addition, each of these questions is accompanied by a (set of) answer(s) provided by the crowd.

Workflow Step 3: Validating Answers  The outcome of the prior micro-task serves as an input to this micro-task. A crowd worker is asked to (i) read the regulatory statement along with the marked term or phrase, (ii) read the question posed on the marked term, and (iii) choose any subset of the answers as valid answers to the posed question, considering the context of the regulation statement. Her response to an answer would be 'yes' if she thinks that the answer is valid; otherwise it would be 'no'. We follow the same strategy as discussed in the prior step to select and allocate micro-tasks to the crowd workers. We use majority voting for consensus on the valid set of answers.
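
Putting the three steps together, the following sketch models the chaining described above: the artifacts produced in step i (marked terms with questions, then answers) carry their authors with them, so that the peers validating them in step (i + 1) are never reviewing their own work. The class and function names are illustrative; only the step structure and the allocation rule are taken from the text.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Artifact:
        """A crowd-produced item awaiting peer validation in the next step."""
        text: str                      # e.g. a marked term, a question, or an answer
        author: str                    # worker who produced it in step i
        votes: List[str] = field(default_factory=list)  # 'valid'/'invalid' peer judgments

    def can_review(worker: str, artifact: Artifact) -> bool:
        # A worker in step (i + 1) never validates her own step-i response.
        return worker != artifact.author

    # Step 1 output (terms + questions) is validated and answered in Step 2;
    # Step 2 output (answers) is validated in Step 3. At each step, majority
    # voting over `votes` decides which artifacts move forward.
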
Data Collection and Analysis

For Step 1 in the workflow, we selected five regulation statements from our expert-annotated data. We remind the reader that each statement contains terms or phrases that demonstrate all three types of ambiguities. 15 crowd workers marked 46 unique terms. Of these, 19 terms were majority-voted as ambiguous. For Step 2 in the workflow, for the same set of five regulation statements, we selected 3 terms/phrases (one per ambiguity type) for which crowd consensus was achieved in Step 1 and which were in agreement with the expert annotations. We also included 3 randomly chosen questions posed by the crowd workers for each of these terms in the prior micro-task. For each of these 45 micro-tasks (5 regulatory statements * 3 terms * 3 questions) we received inputs from a distinct set of 15 crowd workers. After majority voting, we had 35 valid questions and 10 invalid questions. For Step 3, we selected the same 5 regulation statements with the same set of 3 ambiguous terms. For each term, we randomly selected 1 majority-voted question that matched with that from the experts. For every question, we randomly selected 5 answers provided in the earlier step by crowd workers. Thus, we had a total of 75 micro-tasks. For each answer, we expected 5 binary responses from 15 crowd workers. After majority voting, we had 40 answers marked as valid and 35 invalid. The crowd consensus (majority voting) results were validated by the experts. The results of all three micro-tasks are illustrated in Table 2.

Table 2: Confusion Matrix for the Workflow (crowd consensus vs. expert annotations; P: Precision, R: Recall, F: F-score)

Step 1
  Crowd Valid:    Expert Valid 17, Expert Invalid 2,  Total 19
  Crowd Invalid:  Expert Valid 3,  Expert Invalid 24, Total 27
  Total:          Expert Valid 20, Expert Invalid 26, Total 46
  P = 89%, R = 85%, F = 87%

Step 2
  Crowd Valid:    Expert Valid 30, Expert Invalid 5,  Total 35
  Crowd Invalid:  Expert Valid 1,  Expert Invalid 9,  Total 10
  Total:          Expert Valid 31, Expert Invalid 14, Total 45
  P = 86%, R = 97%, F = 91%

Step 3
  Crowd Valid:    Expert Valid 31, Expert Invalid 7,  Total 38
  Crowd Invalid:  Expert Valid 3,  Expert Invalid 34, Total 37
  Total:          Expert Valid 34, Expert Invalid 41, Total 75
  P = 82%, R = 91%, F = 86%
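
As a sanity check, the precision, recall, and F-score columns of Table 2 follow from each confusion matrix in the usual way, treating the expert annotation as ground truth and 'valid' as the positive class. The snippet below reproduces the Step 1 figures (89%/85%/87%, up to rounding) from the counts in the table.

    # Step 1 of Table 2: crowd consensus vs. expert annotations ('valid' = positive class).
    tp, fp = 17, 2   # crowd said 'valid' where the expert said valid / invalid
    fn = 3           # crowd said 'invalid' where the expert said valid

    precision = tp / (tp + fp)                                # 17/19 ~= 0.89
    recall = tp / (tp + fn)                                   # 17/20 = 0.85
    f_score = 2 * precision * recall / (precision + recall)   # ~= 0.87

    print(f"P={precision:.0%} R={recall:.0%} F={f_score:.0%}")
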
To complete these micro-tasks, the crowd workers took 1 to 5 minutes, which is of the same order as the time taken to complete the pilot tasks. However, we received 100% participation, with higher-quality inputs. This shows that the micro-tasking and the workflow have reduced the cognitive load and achieved higher crowd engagement.

Projected Cost Analysis  HIPAA has about 5000 regulation statements. A crowd of software engineers working for an hour daily at the rate of 4 USD per hour, spending 3 minutes per task per worker, would cost USD 86.5 K for HIPAA annotations. On the other hand, legal experts working at 200 USD per hour (a rate validated by legal experts in a personal communication), at the rate of 1.5 minutes per task, would cost USD 395 K (4.5 times that of software engineers).
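
A rough sketch of the arithmetic behind these projections is given below. The per-task costs follow directly from the quoted hourly rates and task durations; the two totals are the figures stated above, and we make no assumption here about how many micro-task executions the full workflow requires per regulation statement.

    # Per-task costs implied by the quoted rates and durations.
    crowd_cost_per_task = 4 * (3 / 60)       # 4 USD/hour, 3 minutes/task   -> 0.20 USD
    expert_cost_per_task = 200 * (1.5 / 60)  # 200 USD/hour, 1.5 minutes/task -> 5.00 USD

    # Projected totals for ~5000 HIPAA regulation statements, as quoted in the text.
    crowd_total_usd = 86_500
    expert_total_usd = 395_000

    print(f"crowd:  {crowd_cost_per_task:.2f} USD per micro-task execution")
    print(f"expert: {expert_cost_per_task:.2f} USD per task")
    # ~4.57 with these rounded totals; reported as 4.5 times in the text.
    print(f"expert/crowd cost ratio: {expert_total_usd / crowd_total_usd:.2f}x")
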
Conclusion and Future Work

We have early indications of success for disambiguating regulations by utilizing a crowd consisting of software engineers. Our approach could lead to a 4.5-fold reduction in cost compared to employing legal experts. In future, we intend to extend this work to the other ambiguity types (referential, incompleteness, and vagueness) and to employ techniques (such as adaptive task allocation, online Expectation Maximization, and active learning) that could help acquire annotations on a large scale, so that machine/deep learning algorithms can be trained to provide automated disambiguation.
References

Adriano, C. M., and van der Hoek, A. 2016. Exploring
microtask crowdsourcing as a means of fault localization.
arXiv preprint arXiv:1612.03015.
Ambati, V.; Vogel, S.; and Carbonell, J. 2012. Collaborative
workflow for crowdsourcing translation. In Proceedings of
the ACM 2012 conference on Computer Supported Cooper-
ative Work, 1191–1194. ACM.
Breaux, T. D.; Vail, M. W.; and Anton, A. I. 2006. To-
wards regulatory compliance: Extracting rights and obliga-
tions to align requirements with regulations. In Require-
ments Engineering, 14th IEEE International Conference,
49–58. IEEE.
Dwyer III, S. J.; Weaver, A. C.; and Hughes, K. K. 2004.
Health insurance portability and accountability act. Security
Issues in the Digital Medical Enterprise 72(2):9–18.
Goto, S.; Ishida, T.; and Lin, D. 2016. Understanding
crowdsourcing workflow: modeling and optimizing itera-
tive and parallel processes. In Fourth AAAI Conference on
Human Computation and Crowdsourcing.
Hansen, D. L.; Schone, P. J.; Corey, D.; Reid, M.; and
Gehring, J. 2013. Quality control mechanisms for crowd-
sourcing: peer review, arbitration, & expertise at family-
search indexing. In Proceedings of the 2013 conference on
Computer supported cooperative work, 649–660. ACM.
Hong, J., and Baker, C. F. 2011. How good is the crowd
at real wsd? In Proceedings of the 5th linguistic annotation
workshop, 30–37. Association for Computational Linguis-
tics.
Huang, S.-W., and Fu, W.-T. 2013. Enhancing reliability
using peer consistency evaluation in human computation. In
Proceedings of the 2013 conference on Computer supported
cooperative work, 639–648. ACM.
LaToza, T. D., and van der Hoek, A. 2016. Crowdsourc-
ing in software engineering: Models, motivations, and chal-
lenges. IEEE software 33(1):74–80.
Massey, A. K.; Rutledge, R. L.; Antón, A. I.; and Swire,
P. P. 2014. Identifying and classifying ambiguity for regu-
latory requirements. In Requirements Engineering Confer-
ence (RE), 2014 IEEE 22nd International, 83–92. IEEE.
Massey, A. K.; Rutledge, R. L.; Antón, A. I.; Hemmings,
J. D.; and Swire, P. P. 2015. A strategy for addressing ambi-
guity in regulatory requirements. Technical report, Georgia
Institute of Technology.
New York University. HIPAA policies.
Raymond, E. 1999. The cathedral and the bazaar. Knowl-
edge, Technology & Policy 12(3):23–49.
Weidema, E. R.; López, C.; Nayebaziz, S.; Spanghero, F.;
and van der Hoek, A. 2016. Toward microtask crowdsourc-
ing software design work. In CrowdSourcing in Software
Engineering (CSI-SE), 2016 IEEE/ACM 3rd International
Workshop on, 41–44. IEEE.
Zhai, H.; Lingren, T.; Deleger, L.; Li, Q.; Kaiser, M.;
Stoutenborough, L.; and Solti, I. 2013. Web 2.0-based
crowdsourcing for high-quality gold standard development
in clinical natural language processing. Journal of Medical
Internet Research 15(4).
Zhao, M., and van der Hoek, A. 2015. A brief perspective
on microtask crowdsourcing workflows for interface design.
In Proceedings of the Second International Workshop on
CrowdSourcing in Software Engineering, 45–46. IEEE Press.