=Paper=
{{Paper
|id=None
|storemode=property
|title=Risk Analysis and Prevention in Procedures: Extraction and Preliminary Results
|pdfUrl=https://ceur-ws.org/Vol-674/Paper14.pdf
|volume=Vol-674
|dblpUrl=https://dblp.org/rec/conf/ekaw/Saint-Dizier10
}}
==Risk Analysis and Prevention in Procedures: Extraction and Preliminary Results==
[Extended Abstract]
Patrick Saint-Dizier
IRIT-CNRS, 118 route de Narbonne, 31062 Toulouse cedex, France
stdizier@irit.fr
1. PROBLEMATICS
Maintenance operations as well as production launches are essentially based on procedures which describe how to install and use a product and how to maintain it. Due to the complexity of today's equipment, and to the complexity of its interactions, it is difficult to keep documentation up to date. These procedural documents become more and more complex, even if simplified language constraints and revision scenarios are imposed. According to several analyses, out of 377 technicians working in different domains, 45% indicate that they have identified major errors in maintenance documents. About 75% indicate that there are major gaps (missing instructions) or obscure or incomplete instructions, and 78% admit that they often need help because they feel they are not operating the right way. We are all confronted with situations where we wish to follow instructions (DIY, software installation, etc.) with pictures, diagrams, etc., and these are not understandable, have obvious gaps or do not correspond to the situation at stake. In some industrial areas, such difficulties are common and lead to accidents (aeronautics, nuclear energy, health, etc.).

Risk analysis and prevention are therefore a major concern.

2. A DOMAIN-INDEPENDENT ANALYSIS OF RISKY SITUATIONS FROM TEXT ANALYSIS
Procedural texts consist of a sequence of instructions, designed with some accuracy in order to reach a goal (e.g. assemble a computer) [2, 3, 7, 8]. Procedural texts are complex structures: they often exhibit a quite complex rational structure (the goal-instructions) and an 'irrational' structure which is mainly composed of advice, conditions, preferences, evaluations, user stimulations, etc. These elements form what we call the explanation structure [6], which motivates and justifies [4] the goal-instructions structure, viewed as the backbone of procedural texts. A number of these elements are forms of argumentation [1, 9]; they appear to be very useful, sometimes as important as instructions: they provide a strong and essential internal cohesion and coherence to procedural texts. They also indicate, among other things, the difficulties, the risks to avoid, and the consequences on the target goal of an incorrect or incomplete execution of the associated instruction.

This is realized in the [3] project, where a number of structures are tagged. An example, in readable form, from didactics, is given hereafter.

[procedure [purpose Writing a paper: [elaboration Read light sources, then thorough ]]
[assumption/circumstance Assuming you’ve been given a topic,]
[circumstance When you conduct research], move from light to thorough resources [purpose to make sure you’re moving in the right direction].
Begin by doing searches on the Internet about your topic [purpose to familiarize yourself with the basic issues;]
[temporal−sequence then ] move to more thorough research on the Academic Databases;
[temporal−sequence finally ], probe the depths of the issue by burying yourself in the library.
[warning Make sure that despite beginning on the Internet, you don’t simply end there.
[elaboration A research paper using only Internet sources is a weak paper, [consequence which puts you at a disadvantage... ]]]
While the Internet should never be your only source of information, [contrast it would be ridiculous not to utilize its vast sources of information. [advice You should use the Internet to acquaint yourself with the topic more before you dig into more academic texts. ]]]
Figure 1: The explanation structure annotated in a procedure

2.1 Measuring the intrinsic difficulty rate d of an instruction
It is of much interest to be able to measure the inherent complexity or difficulty of an instruction. This notion obviously depends on the reader profile. Nevertheless, we think that some linguistic features introduce inherent difficulties in any situation.

The most frequently encountered parameters are, informally:
- presence of 'complex' manners (e.g. very slowly); by complex we mean either a manner which is inherently difficult to realize or a manner reinforced by an adverb of intensity,
- technical complexity of the verb or the verb compound used: while most instructions include a verb which is quite simple, some exhibit quite technical verbs, metaphorical uses, or verbs applied to unexpected situations, for which an elaboration is needed,
- duration of execution as specified in the instruction (the longer the more difficult),
- synchronization between actions, in particular in instructional compounds,
- uncommon tools, or uncommon uses of basic tools (open the box with a sharp knife); however, this is quite difficult to characterize, besides statistical analysis (e.g. via bootstrapping on the net),
- presence of evaluation statements or resulting states, for example to indicate the termination of the action (as soon as the sauce turns brown add flour).

For some of these criteria, some application-dependent knowledge and linguistic resources are needed: some lexical data, basic ontological data, and a few business rules. These observations allow us to introduce a very preliminary measure of complexity. To be able to have an indicative evaluation, each of the points above counts for 1, independently of its importance or strength in the text. Complexity c therefore ranges from 0 to 6. The complexity rate d_i of instruction i is c/6, which keeps it in [0,1].

2.2 Measuring the explicitness rate t of an instruction
Explicitness characterizes the degree of accuracy of an instruction. Several marks, independently of the domain, contribute to making an instruction more explicit:
- when appropriate: existence of means or instruments,
- pronominal references as minimal as possible, and predicate argument constructions as comprehensive as possible,
- length of action explicit when appropriate (stir for 10 minutes),
- list of items to consider as explicit and low level as possible (mix the flour with the sugar, eggs and oil),
- presence of an argument, advice or warning,
- presence of some help elements like images, diagrams, etc.,
- presence of elaborations, illustrations or goal specification,
- presence of a frame or a condition to limit the scope of the action.

Those criteria may be dependent on the domain: for example, the length of an action is very relevant in cooking, somewhat in do-it-yourself, and much less in the society domain. As for d, each item counts for 1 at the moment; explicitness e therefore ranges from 0 to 8. The explicitness rate is t_i = e/8, which keeps it in [0,1]. Note also that the higher t_i is, the more chances the instruction has to succeed, since it is very explicit and has a lot of details.

Now, if we consider the product d_i × (1 − t_i), the more it tends towards 1, the higher the risk is for the action to fail. Therefore, when d_i is high, it is also necessary that t_i is high to compensate for the difficulty. Given that d_i remains unchanged (if the instruction cannot be simplified), the strategy is then to increase t_i as much as possible.

3. A DOMAIN-DEPENDENT ANALYSIS OF RISKS
A number of risk factors are clearly domain-dependent. The difficulty is to identify and evaluate risks without any access to a deep semantic analysis of the different actions of the domain at stake, since this is seldom available.

In a first stage, as an exploration, our strategy is to extract from a large corpus of documents of the domain, for each action, the set of warnings associated with it. An action is characterized by a verb and its object argument(s), whatever their position in the instruction. Following argumentation theory, instructions with warnings have the following form: 'instruction because warning', as in 'Carefully plug-in the mother card vertically, otherwise you will damage the connectors', where the otherwise section is the support: it indicates the risks of not doing the action correctly. In this work, if the action is 'plug-in the mother card', the risks are the list of those warnings associated with it over the whole corpus.

4. PERSPECTIVES
In this short paper, we presented the main lines of a preliminary approach to risk identification in procedures. This is a huge problem in industry, where the aim is to prevent accidents (human and ecological). We proposed a simple solution to capture domain-dependent knowledge acquired from procedure warnings. Obviously, this is just one useful facet of the problem, since a lot of knowledge is implicit and almost never expressed. Our users' estimate is that we cover about 40% of the risks using this approach.

5. REFERENCES
[1] Amgoud, L., Parsons, S., Maudet, N., Arguments, Dialogue, and Negotiation, in: 14th European Conference on Artificial Intelligence, Berlin, 2001.
[2] Di Eugenio, B., Webber, B.L., Pragmatic Overloading in Natural Language Instructions, International Journal of Expert Systems, 1996.
[3] Fontan, L., Saint-Dizier, P., Analyzing the explanation structure of procedural texts: dealing with Advices and Warnings, STEP conference, Venice, August 2008.
[4] Moens, M-F., Boiy, E., Mochales Palau, R., Reed, C., Automatic Detection of Arguments in Legal Texts, in: Proceedings of the Eleventh International Conference on Artificial Intelligence and Law, ACM Press, NY, 2007.
[5] Pollock, J.L., Knowledge and Justification, Princeton University Press, 1974.
[6] Reed, C., Generating Arguments in Natural Language, PhD dissertation, University College, London, 1998.
[7] Takechi, M., Tokunaga, T., Matsumoto, Y., Tanaka, H., Feature Selection in Categorizing Procedural Expressions, IRAL2003, pp. 49-56, 2003.
[8] Walton, D., Reed, C., Macagno, F. (eds), Argumentation Schemes, Cambridge University Press, 2008.
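
As an illustration of the measures of Sections 2.1, 2.2 and 3, the following is a minimal Python sketch, not the implementation used in this work. It assumes that the criteria have already been detected for each instruction by some upstream linguistic analysis. The criterion identifiers and the function names (rate, failure_risk, warnings_by_action) are hypothetical labels introduced here for the example only; the paper lists the criteria informally and does not name them.

```python
from collections import defaultdict

# Hypothetical identifiers for the 6 difficulty criteria of Section 2.1.
DIFFICULTY_CRITERIA = frozenset({
    "complex_manner", "technical_verb", "long_duration",
    "synchronization", "uncommon_tool", "evaluation_statement",
})

# Hypothetical identifiers for the 8 explicitness criteria of Section 2.2.
EXPLICITNESS_CRITERIA = frozenset({
    "means_or_instrument", "minimal_pronouns", "explicit_duration",
    "explicit_item_list", "argument_advice_or_warning",
    "help_elements", "elaboration_or_goal", "frame_or_condition",
})


def rate(present, criteria):
    """Each criterion found counts for 1; divide by the total to stay in [0, 1]."""
    present = set(present)
    unknown = present - criteria
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(present) / len(criteria)


def failure_risk(difficulty_present, explicitness_present):
    """Product d_i * (1 - t_i): the closer to 1, the higher the risk of failure."""
    d = rate(difficulty_present, DIFFICULTY_CRITERIA)      # d_i = c / 6
    t = rate(explicitness_present, EXPLICITNESS_CRITERIA)  # t_i = e / 8
    return d * (1 - t)


def warnings_by_action(pairs):
    """Group (action, warning) pairs collected over a corpus (Section 3).

    An action is assumed to be already normalized to a verb and its object,
    e.g. "plug-in the mother card". Duplicate warnings are kept only once.
    """
    index = defaultdict(list)
    for action, warning in pairs:
        if warning not in index[action]:
            index[action].append(warning)
    return dict(index)
```

For instance, an instruction showing three of the six difficulty criteria and two of the eight explicitness criteria has d_i = 0.5 and t_i = 0.25, hence a risk of 0.5 × 0.75 = 0.375. Giving all criteria the same weight follows the "each item counts for 1" choice stated above; weighting them would be a natural refinement but is not described in the paper.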