<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TyRaL: End-to-End Document-level Relation Extraction via Type-Constrained Rule Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mierzhati Alimu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chaochao Du</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin, 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In recent years, Document-level Relation Extraction (DocRE) has encountered significant challenges in capturing complex entity relationships and reasoning over long-range dependencies. Existing methods primarily focus on learning implicit representations or applying chain-like logical rules, but they often overlook differences in entity types and the significance of type constraints, potentially leading to errors in relation reasoning. This poster introduces a type-constrained enhanced chain-like rule (TC rule) and proposes an end-to-end document-level relation extraction framework (TyRaL) to address this issue. By incorporating a novel rule reasoning module, TyRaL transforms the discrete rule learning problem into a parameter optimization task in continuous space, enabling both explicit and implicit learning of entity type constraint rules and thereby enhancing the model's logical consistency and interpretability. Experimental results on the standard DWIE dataset show that TyRaL significantly outperforms existing rule-enhanced methods in both F1 and Ign F1 metrics. It demonstrates superior logical modeling and semantic reasoning capabilities while offering new perspectives and solutions for research in the DocRE field.</p>
      </abstract>
      <kwd-group>
<kwd>Document-level Relation Extraction</kwd>
        <kwd>Logical Rules</kwd>
        <kwd>Type Constraints</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Figure 1: Overview of TyRaL</title>
        <p>[Figure 1 illustrates the TyRaL pipeline on an example input document: "[1] Prince Harry gets engaged to actress Meghan Markle. [2] Britain's Prince Harry is engaged to his US partner Meghan Markle, his father Prince Charles has announced. [3] ... and the couple are to live in Kensington Palace. [4] Ashwathy Kurup, better known by her stage name Parvathy, is an Indian film actress and classical dancer ..." Backbone logits feed N type-constrained rule reasoning modules, each performing predicate selection for chained atoms and predicate selection for type constraints (l = 1 through l = L), followed by entity embedding, LogSumExp, Softmax, a classifier, and a residual connection that combines the rule logits with the backbone logits. TyRaL outperforms existing rule-based models in logical consistency and relation extraction.]</p>
      </sec>
      <sec id="sec-1-5">
        <title>2. Approach</title>
        <p>[Figure 1 also depicts training: the overall loss combines two terms, L = λL1 + L2, and shows example logical rules such as hasChild(x,y) ← hasSpouse(x,z) ∧ hasChild(z,y); hasFather(x,y) ← hasParent(x,z) ∧ Male(y); hasFather(x,y) ← hasParent(x,z) ∧ (brotherOf(y,u) ∨ uncleOf(y,v)); and maleLeadOf(x,y) ← actCharacter(x,z) ∧ CharacterOf(z,y) ∧ mainCharacter(z) ∧ (brotherOf(x,u) ∨ uncleOf(x,v)), applied to the example entity pair (Meghan Markle, Harry).]</p>
        <sec id="sec-1-5-2">
          <title>2.1. Problem Definition</title>
          <p>Given a document D containing a set of named entities E = {e_i}_{i=1}^{n}, the goal of DocRE is to predict the semantic relation r ∈ R ∪ {NA} between all distinct entity pairs (e_h, e_t), where R denotes a set of predefined relation types and NA represents no relation. An entity e may have multiple mentions in the document, so the existence of a relationship between entities needs to be judged based on comprehensive contextual evidence across these mentions in the document.</p>
          <p>An original DocRE model usually calculates a score vector s(e_h, e_t, D) ∈ ℝ^{|R|+1} for each entity pair, where the k-th element represents the logit value of the k-th relation type, and the last element corresponds to "no relation" NA. During the training phase, Binary Cross-Entropy (BCE) or Adaptive Thresholding (AT) loss functions are usually used. In the inference phase, the model uses an activation function σ (such as Softmax) to map logits to probability values, and filters them according to a threshold to predict the set of relation triples, which is of the form: T = {(e_h, r_k, e_t) ∣ [σ(s(e_h, e_t, D))]_k &gt; δ}, where δ is the set confidence threshold.</p>
        </sec>
        <sec id="sec-1-5-5">
          <title>2.2. Chain-like and Type-Constrained Rules</title>
          <p>We introduce an interpretable logical rule structure to model implicit semantic paths between entities in a document. Define a binary variable r(x, y) to indicate whether the relation r ∈ R holds between entities x and y: when the relation is true, r(x, y) = 1; otherwise, r(x, y) = 0.</p>
          <p>A chain-like logical rule consists of a rule head and a rule body. The rule head represents the target relation r_head(x, y), and the rule body is a conjunction of binary atoms, where each body atom shares one variable with the adjacent previous atom and another variable with the adjacent next atom, forming a chain structure. The general form of a chain-like logical rule is as follows: r_head(x, y) ← r_1(x, z_1) ∧ r_2(z_1, z_2) ∧ ⋯ ∧ r_L(z_{L−1}, y).</p>
          <p>A type-constrained (TC) rule additionally attaches a type atom to each entity variable on the path: r_head(x, y) ← r_1(x, z_1) ∧ ⋯ ∧ r_L(z_{L−1}, y) ∧ t_0(x) ∧ t_1(z_1) ∧ ⋯ ∧ t_L(y), where t_i ∈ T are entity types and r_i are intermediate relation paths. This rule not only depends on the relational path structure but also requires each entity node on the path to satisfy specific type conditions, thereby improving the semantic rationality and interpretability of the rule.</p>
        </sec>
        <sec id="sec-1-5-6">
          <title>2.3. Type-Constrained Rule Reasoning Module</title>
          <p>This module estimates, for each entity pair, the degree to which every relation can be derived through TC rules, and its output is combined with the downstream relation prediction target.</p>
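As an illustration of how a type-constrained chain rule is evaluated on a toy knowledge graph (the facts, type map, and helper `tc_rule_holds` are our own, not the paper's code):

```python
# Toy sketch of evaluating the type-constrained chain rule
#   hasChild(x, y) ← hasSpouse(x, z) ∧ hasChild(z, y) ∧ Person(x) ∧ Person(z) ∧ Person(y)
# The relational path must hold AND every node on it must satisfy its type atom.
FACTS = {("Meghan", "hasSpouse", "Harry"),
         ("Harry", "hasChild", "Archie")}
TYPES = {"Meghan": "Person", "Harry": "Person", "Archie": "Person",
         "Kensington Palace": "Location"}

def tc_rule_holds(x, y, body, types):
    """body: chain of relation names from x to y;
    types: required entity type at each node x, z_1, ..., y."""
    frontier = {x} if TYPES.get(x) == types[0] else set()
    for rel, required in zip(body, types[1:]):
        # follow one chain atom, keeping only targets of the required type
        frontier = {t for z in frontier
                    for (h, r, t) in FACTS
                    if h == z and r == rel and TYPES.get(t) == required}
    return y in frontier

print(tc_rule_holds("Meghan", "Archie",
                    ["hasSpouse", "hasChild"],
                    ["Person", "Person", "Person"]))
```

The same rule with a wrong type atom (e.g. requiring the intermediate node to be a Location) fails, which is exactly the extra filtering power the type constraints provide.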
          <p>Let N be the maximum number of rules to be learned and L the maximum number of atoms in each rule, and define the extended relation set as R* = R ∪ R⁻ ∪ {r_I}, where R = {r_k}_{1≤k≤K} denotes the original relation set, R⁻ = {r_k}_{K+1≤k≤2K} the inverse relations, and r_I = r_{2K+1} the identity relation. We define the extended logit s⁺(x, y, D) ∈ ℝ^{2K+1}, where [s⁺(x, y, D)]_k = [σ(s(x, y, D))]_k and [s⁺(x, y, D)]_{K+k} = [σ(s(y, x, D))]_k for all 1 ≤ k ≤ K, and [s⁺(x, y, D)]_{2K+1} = 1 if x = y, or 0 otherwise, with σ denoting the sigmoid function.</p>
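A minimal sketch of how the extended logit vector s⁺ can be assembled from a backbone scorer (the helper names and toy scores are illustrative assumptions, not the paper's implementation):

```python
# Build s+(x, y, D) in R^{2K+1}: K sigmoid forward logits, K sigmoid logits of
# the reversed pair (inverse relations), and one identity slot (1 iff x == y).
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def extended_logits(x, y, score_fn):
    """score_fn(h, t) returns the K raw relation logits s(h, t, D)."""
    forward = [sigmoid(v) for v in score_fn(x, y)]
    inverse = [sigmoid(v) for v in score_fn(y, x)]
    identity = [1.0 if x == y else 0.0]
    return forward + inverse + identity  # length 2K + 1

toy_scores = {("a", "b"): [2.0, -1.0], ("b", "a"): [-3.0, 0.0]}  # K = 2
s_plus = extended_logits("a", "b", lambda h, t: toy_scores[(h, t)])
print(len(s_plus))  # 2*2 + 1 = 5
```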
          <p>The goal of our rule reasoning module is, given an entity pair (x, y) ∈ E × E and a document D, to estimate a truth degree v^{(r,L)}_{x,y,D} for each relation r ∈ R*, indicating whether the relation can be inferred through at most N type-constrained rules of length L. For each original relation r ∈ R, the n-th rule (1 ≤ n ≤ N), and the l-th rule atom (1 ≤ l ≤ L), the intermediate truth degree v^{(n,l)}_{x,y,D,r} is defined as follows:</p>
          <p>v^{(n,1)}_{x,y,D,r} = τ^{(n,1)}(x) · τ^{(n,2)}(y) · ∑_{k=1}^{2K+1} w^{(r,n,1)}_k [s⁺(x, y, D)]_k (4)</p>
          <p>v^{(n,l)}_{x,y,D,r} = τ^{(n,l+1)}(y) · ∑_{z ∈ E} ∑_{k=1}^{2K+1} w^{(r,n,l)}_k v^{(n,l−1)}_{x,z,D,r} [s⁺(z, y, D)]_k, 2 ≤ l ≤ L (5)</p>
          <p>where w^{(r,n,l)} ∈ [0, 1]^{2K+1} is the predicate selection weight of the l-th atom in the n-th rule, normalized by Softmax to approximate one-hot, simulating the predicate selection process.</p>
          <p>τ^{(n,l)}(e) is a type constraint function representing the score that entity e satisfies specific type conditions:</p>
          <p>τ^{(n,l)}(e) = clip01(α^{(n,l)} ∑_{t=1}^{T} h^{(n,l)}_t 𝟙(type(e) = t) + β^{(n,l)} ∑_{t=1}^{T+2} h^{(n,l)}_t g_{e,t}) (6)</p>
          <p>where clip01(a) = max(min(a, 1), 0), h^{(n,l)} ∈ [0, 1]^{T+2} are trainable type selection weights, and g_{e,k} = e⊤ W r_k denotes the interaction between entity e and relation r_k. The parameters α^{(n,l)} and β^{(n,l)} control whether explicit and implicit type constraints are applied.</p>
          <p>The ultimate truth degree is calculated by aggregating the intermediate degrees of the N rules: v^{(r,L)}_{x,y,D} = ∑_{n=1}^{N} c^{(n)} · v^{(n,L)}_{x,y,D,r}, where c^{(n)} ∈ [−1, 1] is the confidence of rule n, normalized by the Tanh activation function.</p>
          <p>Then, we define the final logit prediction by combining the output logits from the original DocRE model with the ultimate truth degrees from the type-constrained rule reasoning module: [s̃(x, y, D)]_r = [s(x, y, D)]_r + v^{(r,L)}_{x,y,D} (7)</p>
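The recursion and aggregation above can be sketched numerically for a single rule and target relation (the toy extended logits, the all-ones type scores, and every name below are our own illustrative assumptions; the real module learns w, τ, and c end-to-end with the backbone):

```python
# Rough pure-Python sketch of the truth-degree recursion (Eqs. 4-5) and the
# confidence-weighted aggregation, for one rule of length L = 2.
import math

ENTITIES = ["x", "z", "y"]
K2P1 = 3  # extended relation slots: 1 forward, 1 inverse, 1 identity

def s_plus_toy(a, b):  # toy extended logits per pair, already in [0, 1]
    table = {("x", "z"): [0.9, 0.1, 0.0], ("z", "y"): [0.8, 0.2, 0.0]}
    return table.get((a, b), [0.0, 0.0, 1.0 if a == b else 0.0])

def tau(l, e):  # toy type score: every entity satisfies every slot's type
    return 1.0

def truth_degree(x, y, w):
    """w[l] is the (near one-hot) predicate selection weight of atom l+1."""
    # Eq. (4): base case, l = 1, computed for every intermediate entity z
    v = {z: tau(1, x) * tau(2, z) *
            sum(w[0][k] * s_plus_toy(x, z)[k] for k in range(K2P1))
         for z in ENTITIES}
    # Eq. (5): chain one more atom, l = 2
    v = {t: tau(3, t) * sum(v[z] * w[1][k] * s_plus_toy(z, t)[k]
                            for z in ENTITIES for k in range(K2P1))
         for t in ENTITIES}
    return v[y]

w = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # both atoms select the forward relation
c = math.tanh(1.5)                       # rule confidence in [-1, 1]
print(c * truth_degree("x", "y", w))     # ≈ tanh(1.5) * 0.9 * 0.8
```

With one-hot predicate weights the recursion reduces to multiplying the path's edge scores (0.9 · 0.8 = 0.72), which the rule confidence then scales before being added to the backbone logit as in Eq. (7).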
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Experiments</title>
      <p>We uniformly denote the enhanced model as TyRaL-X, where X represents the name of the original
DocRE model. Table 1 shows the experimental results of TyRaL on the DWIE dataset. The results
indicate that TyRaL achieves stable and significant performance on all integrated DocRE backbone
models, and is comprehensively superior to the original models in F1 and Ign F1 metrics,
demonstrating good generality and robustness. Compared with the current state-of-the-art rule-enhanced
methods, CaDRL and JMRL, TyRaL introduces key innovations in logical modeling. CaDRL relies on
differentiable chain-like rule learning to improve logical consistency, while JMRL alleviates the error
propagation problem through a joint training mechanism. In contrast, TyRaL proposes more refined
type-constrained rules, significantly expanding the expressive power of the rules and enabling the
capture of more fine-grained semantic constraints and structural relationships between entity types—rules
of this kind have not been systematically modeled in existing methods. In our experiments, we adopt
the F1 metric. However, some relational facts appear in both the training and the dev/test sets. As a
result, a model may memorize these relations during training and achieve artificially high performance
on the dev/test set, introducing evaluation bias. Such overlap is inevitable, since many common
relational facts are likely to occur across different documents. Therefore, we also report the F1 scores after
excluding those relational facts shared by the training and dev/test sets, which we denote as Ign F1.</p>
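The Ign F1 computation described above can be sketched as follows (an illustrative toy scorer, not the official evaluation script; triple values are our own examples):

```python
# Illustrative sketch of F1 vs. Ign F1: Ign F1 discards relational facts that
# already appear in the training set before scoring, so memorized facts
# cannot inflate the result.
def f1(pred, gold):
    tp = len(pred & gold)
    if not pred or not gold or tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def ign_f1(pred, gold, train_facts):
    return f1(pred - train_facts, gold - train_facts)

train = {("Harry", "hasSpouse", "Meghan")}
gold = {("Harry", "hasSpouse", "Meghan"),
        ("Harry", "livesIn", "Kensington Palace")}
pred = {("Harry", "hasSpouse", "Meghan")}
print(f1(pred, gold))             # 2/3: the memorizable fact still counts
print(ign_f1(pred, gold, train))  # 0.0: only unseen facts are scored
```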
    </sec>
    <sec id="sec-3">
      <title>4. Limitation</title>
      <p>While our study has made some progress, several limitations remain. First, the experiments were
conducted exclusively on the DWIE dataset, which raises concerns about the generalizability of the
findings to other domains and datasets. In addition, the current evaluation relies primarily on quantitative
metrics and lacks case studies. We plan to address these limitations in future work.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this poster, we propose an end-to-end learning framework, TyRaL, featuring a type-constrained rule
reasoning module that simulates logical rules to enhance reasoning ability. Experiments on the DWIE
dataset demonstrate its effectiveness and superiority. Future work will explore integrating logical
constraints into large language models to discover more accurate and generalizable rules.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Project of Science and Technology Research and Development Plan
of China Railway Corporation (N2023J044).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, we used ChatGPT for grammar and spelling checking.
After using this tool, we reviewed and edited the content as needed and take full responsibility for the
publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <article-title>Boosting document-level relation extraction by mining and injecting logical rules</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10311</fpage>
          -
          <lpage>10323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Document-level relationship extraction by bidirectional constraints of beta rules</article-title>
          ,
          <source>in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2256</fpage>
          -
          <lpage>2266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>CaDRL: Document-level relation extraction via context-aware differentiable rule learning</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Computational Linguistics</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>8272</fpage>
          -
          <lpage>8284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>End-to-end learning of logical rules for enhancing document-level relation extraction</article-title>
          ,
          <source>in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>7247</fpage>
          -
          <lpage>7263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zaporojets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deleu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Develder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <article-title>DWIE: An entity-centric dataset for multi-task document-level information extraction</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102563</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Docred: A large-scale document-level relation extraction dataset</article-title>
          , arXiv preprint arXiv:1906.06127 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Double graph based reasoning for document-level relation extraction</article-title>
          , arXiv preprint arXiv:2009.13752 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Document-level relation extraction with adaptive thresholding and localized context pooling</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>14612</fpage>
          -
          <lpage>14620</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>