<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Novel Neuro-symbolic Approach to Irony Detection Based on Structural Components of Ironic Statements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroshi Shigenobu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Ptaszynski</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shunsuke Dan</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fumito Masui</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuzu Uchida</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafal Rzepka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Engineering, Hokkai-Gakuen University</institution>
          ,
          <addr-line>1-1, Nishi 11-chome, Minami 26-jo, Chuo-ku, Sapporo, Hokkaido 064-0926</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Science and Technology, Hokkaido University</institution>
          ,
          <addr-line>Kita-ku, Kita 14, Nishi 9, 060-0814, Sapporo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Text Information Processing Laboratory, Kitami Institute of Technology</institution>
          ,
          <addr-line>165 Koen-cho, Kitami, 090-8507, Hokkaido</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>172</fpage>
      <lpage>183</lpage>
      <abstract>
<p>This paper introduces a novel neuro-symbolic method for irony detection that offers both high accuracy and interpretability. Our two-stage approach first uses a Transformer model to translate sentences into symbolic sequences representing their core linguistic components, such as sentiment expressions and irony targets. A machine learning classifier then uses this symbolic representation for the final classification. By explicitly modeling the internal structure of ironic statements, our method outperforms strong end-to-end baselines while providing a transparent, human-readable decision process.</p>
      </abstract>
      <kwd-group>
        <kwd>Irony Detection</kwd>
        <kwd>Neuro-symbolic AI</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Computational Linguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Research in automatic irony detection has primarily followed two broad directions. Early studies mainly
employed sequence-based neural models, such as recurrent neural networks (RNNs), including LSTM
and GRU architectures, which demonstrated that modeling word order and contextual dependencies
was effective for irony detection. These approaches established an important foundation for neural
irony classification.</p>
      <p>
        More recent work has increasingly adopted end-to-end supervised classification frameworks based
on convolutional neural networks and Transformer architectures, in which models are trained to map
raw text directly to ironic or non-ironic labels. A representative study by Chia et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] established
strong baselines for English by comparing various standard classifiers on Twitter data. Similar neural
pipeline-based approaches have also been developed for other languages, including Japanese [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While
these end-to-end models often achieve high performance, their decision-making processes are difficult
to analyze, as predictions are produced without an explicit representation of the internal structure of
ironic expressions.
      </p>
      <p>
        A second line of research focuses on identifying linguistic features that signal irony, with particular
emphasis on modeling different forms of incongruity. Some studies detect incongruity by incorporating
external knowledge sources, such as Wikipedia, to identify factual contradictions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Others focus
on internal semantic conflicts by using lexical or sememe-based resources to capture clashes in word
meanings within a sentence [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Another feature-based approach links irony to affective properties
of language, using features such as hurtfulness as strong indicators of sarcasm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Although these
approaches allow researchers to relate predictions to explicit linguistic cues, they often depend on
manually designed lexicons or external resources. More recent work has also extended irony detection
beyond binary classification to finer-grained tasks, such as distinguishing different types of irony,
including sarcasm and satire [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Our work is positioned between neural end-to-end approaches and feature-based methods. Similar to
prior neural models, we exploit the representation learning capabilities of neural networks to capture
complex patterns in text. At the same time, rather than relying solely on raw text representations,
we explicitly model the internal components of ironic statements by translating them into symbolic
sequences. Although the use of neural models means that full interpretability cannot be achieved,
the proposed neuro-symbolic framework enables the final classification to be grounded in an explicit
and human-readable structural representation. This design allows for a clearer analysis of the factors
contributing to irony detection compared to conventional end-to-end neural classifiers, without relying
on external knowledge bases or predefined sentiment lexicons.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>Our neuro-symbolic approach requires a dataset annotated with both sentence-level irony labels and
token-level structural component tags. This section details the process of creating this resource, from
the initial data collection to the multi-stage annotation process.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Collection</title>
        <p>
          The foundation of our corpus is a Japanese dataset of tweets collected by Uozumi et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This dataset
consists of two parts collected during the same time period, and an overview of the sentence-level
re-annotation results is presented in Table 2.
• A set of 2,700 tweets collected by searching for the explicit self-declaratory tag hiniku (皮肉,
“sarcasm”). This served as the initial set of positive examples.
• A set of 2,700 tweets collected randomly from the same period that did not contain this tag. This
subset served as the initial set of negative candidates.
        </p>
        <p>For all experiments, occurrences of the sarcasm tag and its surface variants, including differences in
parenthesis width, were removed from the text to ensure that the models did not learn to depend on
this superficial feature.</p>
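This removal step can be sketched in Python. The exact set of surface variants handled in the study is not specified, so the pattern below, covering half-width and full-width parentheses around the tag, is an assumption for illustration:

```python
import re

# Assumed illustration: remove the self-declaratory "(皮肉)" tag and its
# surface variants (half-width "()" and full-width "（）" parentheses),
# so models cannot learn to rely on this superficial cue.
SARCASM_TAG = re.compile(r"[(（]\s*皮肉\s*[)）]")

def strip_sarcasm_tag(text: str) -> str:
    """Delete every occurrence of the sarcasm tag and tidy leftover whitespace."""
    return re.sub(r"\s{2,}", " ", SARCASM_TAG.sub("", text)).strip()
```

Any comparable normalization would serve; the point is that the tag must not survive in any surface form.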
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Annotation</title>
        <p>The initial keyword-based collection method was inherently noisy, as users do not always apply the
“(sarcasm)” tag consistently or accurately. To create a more reliable ground truth, we conducted a
two-phase manual annotation process: first at the sentence level to refine the irony labels, and second
at the token level to identify the structural components of irony.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Phase 1: Sentence-Level Irony Annotation</title>
          <p>To guide our annotation, we first established a working definition of irony based on a review of prior
linguistic research:
“Irony is a statement in which the speaker uses an evaluative expression that is opposite to
their true intention to either affirm or negate a target.”
Based on this definition, we manually re-evaluated all 5,400 tweets. Annotators were asked to classify
each tweet into one of five categories, taking into account the amount of contextual information
available within the tweet itself. The categories were defined as follows:
• Ironic: The tweet contains sufficient context to be unambiguously understood as ironic.
• Probably Ironic: The context is limited, but the phrasing strongly suggests an ironic
interpretation.
• Ambiguous: The tweet could plausibly be interpreted as either ironic or literal.
• Probably Not Ironic: The context is limited, but the phrasing suggests a literal interpretation.
• Not Ironic: The tweet contains sufficient context to be unambiguously understood as non-ironic.</p>
          <p>The results of this re-annotation are summarized in Table 2. A key finding was that the original
keyword-based collection method was unreliable. In the dataset collected using the “(sarcasm)” tag, less
than half of the tweets were confidently labeled as “Ironic” or “Probably Ironic” with the largest group
being “Ambiguous” (842 tweets). Furthermore, a significant portion (25%) was classified as non-ironic.
Conversely, the randomly collected dataset contained a small number of tweets (2%) that were identified
as ironic. These results highlight the difficulty of irony detection and confirm the need for careful
manual annotation rather than relying on noisy self-declaratory tags.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Phase 2: Irony Component Span Annotation</title>
          <p>Our working definition suggests that ironic statements are constructed from core components: a target
of the irony, the speaker’s true intention, and an opposing surface expression. A preliminary
annotation study confirmed that these elements are identifiable in the data.</p>
          <p>In addition to these core elements, we observed during an initial exploratory analysis of the dataset
that several pragmatic expressions repeatedly appeared in sentences judged to be ironic by human
annotators. These observations were based on the authors’ manual inspection of the data, rather than
on prior theoretical claims established in the literature. Specifically, modifiers, honorific expressions,
and colloquial or slang expressions were frequently present in ironic tweets and appeared to contribute
to the perceived ironic intent.</p>
          <p>Based on this combination of established theoretical insights and our own empirical observations,
we defined a final tagset consisting of six categories to capture the structural components of irony. The
complete tagset is summarized in Table 3. These tags are designed to represent not only sentiment
polarity and its target, but also pragmatic cues that were found by the authors to systematically co-occur
with ironic usage in the analyzed data.</p>
          <p>For this annotation task, we used the 2,200 tweets classified in Phase 1 as “Ironic,” “Probably Ironic,”
or “Ambiguous,” totaling 93,160 characters. To ensure consistency, we created a detailed annotation
guideline document¹.</p>
          <p>We recruited 20 annotators through the Japanese crowdsourcing platform CrowdWorks. The
annotators were provided with the guidelines and trained on the task. The annotation was performed
using LightTag, a web-based tool well-suited for team-based token classification. Each tweet was
independently annotated by two different annotators. Any disagreements in tag type or span were
resolved by a third, senior annotator to produce the final ground truth.</p>
          <p>¹The annotation guideline (in Japanese) is available at: https://t.ly/H54nM</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Final Dataset Statistics</title>
        <p>The resulting annotated corpus forms the basis for training and evaluating our neuro-symbolic model.
Table 4 presents the overall statistics for each of the six tags across the 2,200 annotated sentences.
‘Positive Expression’, ‘Negative Expression’, and ‘Colloquial/Slang’ are the most frequent tags, suggesting
they are common components in Japanese irony on Twitter. In contrast, ‘Honorific Expression’ is less
common, indicating it is a more specialized device. Notably, a ‘Target’ was identified in only 1,769 of
the 2,200 sentences, meaning the target of the irony was implicit in approximately 20% of cases.</p>
        <p>For our experiments, we divided the 2,200 annotated sentences into a training set of 1,980 sentences
and a test set of 220 sentences, maintaining a 90/10 split. The distribution of tags in both the training
and test sets is shown in Tables 5 and 6. The distributions are balanced, ensuring that the model is
trained and evaluated on a representative sample of the data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methods</title>
      <p>To systematically evaluate the benefit of modeling the internal structure of irony, we designed
experiments to compare two distinct approaches: a standard end-to-end classification model that serves as
our baseline, and our proposed two-stage neuro-symbolic method.</p>
      <sec id="sec-4-1">
        <title>4.1. Baseline: End-to-End Irony Classification</title>
        <p>Our baseline follows the standard and widely adopted paradigm for text classification. This approach
uses a pre-trained Transformer-based language model, such as BERT or RoBERTa, and fine-tunes it for
a sentence-level, binary classification task.</p>
        <p>In this setup, the model is given the entire raw text of a tweet as input. This text is tokenized and
passed through the Transformer’s encoder layers to generate a context-aware representation, typically
using the embedding of the special ‘[CLS]’ token. A classification head, usually a single linear layer
with a softmax or sigmoid activation function, is added on top of the encoder. The entire model is then
fine-tuned end-to-end on our labeled dataset to predict a single binary label: ‘Ironic’ or ‘Not Ironic’.
This method is powerful because it can learn complex, non-linear relationships directly from the text,
but its decision-making process is opaque, functioning as a “black box.” A wide range of pre-trained
Japanese models were evaluated using this approach to establish a strong performance baseline.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Proposed Neuro-Symbolic Method</title>
        <p>In contrast to the end-to-end baseline, our proposed method is a two-stage neuro-symbolic pipeline
designed to be both effective and interpretable. The core idea is to first convert the unstructured text of
a sentence into a structured, symbolic representation based on its ironic components, and then perform
the final classification based on this representation. The complete workflow of this method is illustrated
in Figure 1.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Stage 1: Automatic Annotation of Irony Components</title>
          <p>The first stage of our pipeline is responsible for translating the input text into a symbolic sequence. To
accomplish this, we re-purpose a pre-trained Transformer model to perform a token classification task.
Specifically, we fine-tune the model on our dataset described in Section 3.2.2, where each token in a
sentence is labeled with one of the six irony component tags (Target, Positive Exp., Negative
Exp., Modifier Exp., Honorific Exp., Colloquial/Slang) or a standard ‘O’ tag for tokens that
do not belong to any of these categories.</p>
          <p>The model architecture consists of the pre-trained Japanese language model followed by a linear
layer that outputs a probability distribution over the possible tags for each token. By training on
our manually annotated data, the model learns to identify the spans of text that correspond to each
structural component of irony. The output of this stage is the original sentence where key phrases have
been annotated with our symbolic tags.</p>
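As an illustration, the per-token predictions can be collapsed into the span-annotated output this stage produces. The paper does not specify the exact tagging scheme, so the sketch below assumes plain per-token labels and merges consecutive identical tags:

```python
from itertools import groupby

def tags_to_spans(tokens, tags):
    """Collapse runs of identical non-'O' tags into (label, phrase) spans.
    Simplified sketch: the study's actual scheme (e.g. BIO vs. plain labels)
    is not stated, so consecutive identical tags are merged here."""
    spans, idx = [], 0
    for label, group in groupby(tags):
        n = len(list(group))
        if label != "O":
            spans.append((label, "".join(tokens[idx:idx + n])))
        idx += n
    return spans

def symbolic_sequence(tokens, tags):
    """The tag sequence passed on to Stage 2: labels only, words discarded."""
    return [label for label, _ in tags_to_spans(tokens, tags)]
```

For example, character tokens tagged `TGT TGT O POS POS` yield the spans `(TGT, …)` and `(POS, …)` and the symbolic sequence `[TGT, POS]`.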
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Stage 2: Symbolic Classification of Tag Sequences</title>
          <p>The second stage performs the final irony classification using only the sequence of symbolic tags
generated by Stage 1, completely disregarding the original words. This abstraction forces the model to
base its decision on the detected linguistic structure rather than on specific lexical items.</p>
          <p>The process involves two steps:
1. Feature Extraction: The sequence of predicted tags (e.g., [NEG] [OBJ] [POS]) is first converted
into a numerical feature vector. We explore several standard text featurization techniques for
this task, including Bag-of-Words (based on tag counts), TF-IDF weighting, and n-grams (e.g.,
unigrams, bigrams, skip-grams) to capture patterns in how the tags are ordered. For example, a
bigram feature would capture the co-occurrence of a Negative Exp. followed by a Positive
Exp., a common structure in irony.
2. Classification: The resulting feature vector is then used as input to a traditional machine learning
classifier. To identify the most suitable algorithm for this task, we conduct a comprehensive
evaluation of multiple classifiers, including ensemble methods like AdaBoost and Random Forest,
Support Vector Machines (SVC), and Naive Bayes models.</p>
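The two steps above can be sketched with scikit-learn. The tag sequences, labels, and hyperparameters below are illustrative placeholders, not the study’s actual data or settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

# Toy tag sequences, space-joined so the vectorizer can consume them;
# the real input is the Stage 1 output, and these labels are made up.
X_train = ["NEG POS", "POS HNF COL", "TGT NEG", "POS MOD", "NEG", "TGT POS"]
y_train = [1, 1, 0, 0, 0, 1]  # 1 = ironic, 0 = not ironic

# Step 1: uni- and bigram TF-IDF over tag tokens, so ordered patterns
# such as NEG followed by POS become features.
# Step 2: an AdaBoost classifier over the resulting feature vectors.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
    AdaBoostClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(X_train, y_train)
pred = pipeline.predict(["NEG POS"])
```

Swapping the classifier in the second pipeline step is all that is needed to compare the alternatives (SVC, Naive Bayes, Random Forest) mentioned above.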
          <p>By decoupling the process into these two stages, this method allows for a highly interpretable final
decision. The success or failure of a classification can be traced back to the specific sequence of structural
components identified by the neural model in Stage 1.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>To evaluate our proposed neuro-symbolic method, we conducted a series of experiments designed to
answer three key questions: 1. What is the performance of standard, end-to-end Transformer models
on this irony detection task? This establishes a strong baseline for comparison. 2. How accurately can a
neural model perform Stage 1 of our pipeline, i.e., extracting the structural components of irony? 3.
Does our proposed two-stage neuro-symbolic method outperform the end-to-end baseline, and what do
its internal components reveal about the structure of irony?</p>
      <sec id="sec-5-1">
        <title>5.1. Experiment 1: Baseline End-to-End Classification</title>
        <sec id="sec-5-1-1">
          <title>5.1.1. Experimental Setup</title>
          <p>To create a clear binary classification task, we constructed a dataset from our sentence-level annotations
(Section 3.2.1). Sentences labeled “Ironic” or “Probably Ironic” were combined to form the positive class,
while those labeled “Not Ironic” or “Probably Not Ironic” formed the negative class. Sentences labeled
“Ambiguous” were excluded from this experiment. To ensure a balanced dataset, we randomly sampled
from these categories to create a training set of 2,222 texts (1,111 ironic, 1,111 not ironic) and a test set
of 248 texts (124 ironic, 124 not ironic).</p>
          <p>We evaluated 22 publicly available, pre-trained Japanese language models. Each model was fine-tuned
on our training set for the binary classification task. We tested two different learning rates, 1e-4 and
1e-5, to assess the impact of this hyperparameter on performance. We report accuracy (Acc), precision
(Prec), recall (Rec), and the F1-scores for the positive (ironic) and negative (not ironic) classes, as well as
the macro F1-score.</p>
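These metrics can be computed with scikit-learn as follows; the gold labels and predictions below are invented values for demonstration only:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative gold labels and predictions (1 = ironic, 0 = not ironic);
# these numbers are not from the experiments, just a worked example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)              # positive (ironic) class
rec = recall_score(y_true, y_pred)
f1_pos = f1_score(y_true, y_pred, pos_label=1)      # F1 for the ironic class
f1_neg = f1_score(y_true, y_pred, pos_label=0)      # F1 for the non-ironic class
f1_macro = f1_score(y_true, y_pred, average="macro")  # unweighted class mean
```

With a balanced test set, the macro F1-score is simply the mean of the two per-class F1-scores.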
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2. Results and Discussion</title>
          <p>The results are presented in Table 7 (learning rate 1e-4) and Table 8 (learning rate 1e-5).</p>
          <p>With a learning rate of 1e-4, performance varied significantly, and several models failed to converge,
resulting in F1-scores of 0. However, when the learning rate was lowered to 1e-5, the results improved
across the board. The models that previously failed to learn now achieved respectable scores, and nearly
all other models showed an increase in performance. This indicates that a lower learning rate of 1e-5 is
more suitable for this task.</p>
          <p>Under the optimal learning rate of 1e-5, the top-performing model was
ku-nlp/roberta-base-japanese-char-wwm, which achieved a macro F1-score of 0.739 and an accuracy of 0.742. This model demonstrated
a particularly strong ability to identify ironic statements (F1-positive score of 0.768), making it a robust
and challenging baseline for our proposed method. Based on these results, we selected this model for
all subsequent comparisons.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experiment 2: Performance of Irony Component Extraction</title>
        <sec id="sec-5-2-1">
          <title>5.2.1. Experimental Setup</title>
          <p>This experiment evaluates Stage 1 of our pipeline: the automatic annotation of irony components. We
fine-tuned the top six performing models from Experiment 1 on our token classification dataset (Section
3.3). The task is to predict the correct tag (Target, Positive Exp., etc.) for each token in a sentence.</p>
          <p>We used the seqeval framework for evaluation, which is standard for named entity recognition and
other token-level tasks. This metric computes precision, recall, and F1-score based on exact matches of
both the tag category and the span of tokens for each annotated entity.</p>
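The strict exact-match criterion that seqeval applies can be illustrated with a small standard-library sketch. A BIO tagging scheme is assumed here, which the paper does not state explicitly:

```python
def extract_spans(tags):
    """Collect (label, start, end) spans from a BIO tag sequence
    (end is exclusive). A stray I- tag opens a new span, as seqeval does."""
    spans, start, label = [], None, None
    for i, t in enumerate(list(tags) + ["O"]):   # sentinel flushes the last span
        inside = label is not None and t == "I-" + label
        if not inside:
            if label is not None:
                spans.append((label, start, i))
            if t == "O":
                start, label = None, None
            else:                                # "B-X" or stray "I-X"
                start, label = i, t[2:]
    return spans

def span_f1(gold, pred):
    """Precision/recall/F1 counting only exact (label, start, end) matches,
    i.e. both the tag category and the span must agree."""
    g, p = set(extract_spans(gold)), set(extract_spans(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Because partial overlaps score zero, this metric is harsh on long, variable-length spans, a point revisited in the limitations (Section 6.2).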
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2. Results and Discussion</title>
          <p>The overall token classification performance is shown in Table 9. The best performance was achieved
by ku-nlp/roberta-base-japanese-char-wwm with a learning rate of 1e-4, reaching a macro F1-score of
0.616. This model is a RoBERTa-based architecture trained on Japanese text using a character-level
tokenization scheme combined with whole-word masking, which allows it to robustly handle the lack
of explicit word boundaries in Japanese. This result confirms that modern Transformer architectures are
capable of learning to identify these abstract, functional components of irony with reasonable accuracy.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Experiment 3: Performance of the Neuro-Symbolic Method</title>
        <p>This final set of experiments evaluates the full two-stage pipeline and compares it to the baseline.</p>
        <sec id="sec-5-3-1">
          <title>5.3.1. Analysis of Symbolic Features</title>
          <p>First, to understand which structural patterns are most indicative of irony, we analyzed the predictive
power of different n-gram features extracted from the ground-truth tag sequences. We used a logistic
regression model to measure the contribution of each n-gram type. Table 11 shows the results.</p>
          <p>The 2-gram analysis reveals that the sequence ‘NEG_POS’ (a negative expression followed by a
positive one) is the single most predictive feature of irony. This provides strong empirical evidence for
the classic linguistic theory of irony as a clash of sentiments. Other important bigrams like ‘MOD_COL’
and ‘HNF_COL’ show that the interplay between modifiers, politeness, and informal language is also a
key structural signal. The 3-gram ‘POS_HNF_COL’ further reinforces this, indicating that a positive
statement made politely but ending with a colloquialism is a powerful ironic pattern.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>5.3.2. Selection of the Symbolic Classifier</title>
          <p>Next, we evaluated 27 different machine learning classifiers from the scikit-learn library for Stage 2 of
our pipeline. Each classifier was trained on TF-IDF features derived from the tag sequences predicted by
our best Stage 1 model. The goal was to find the most efective algorithm for classifying these symbolic
representations.</p>
          <p>As shown in Table 12, the AdaBoostClassifier achieved the highest weighted F1-score (0.77), closely
followed by Bernoulli Naive Bayes (0.75). The strong performance of AdaBoost, an ensemble method,
suggests it is well-suited to capturing the complex, non-linear interactions between the different irony
components. Based on this, we selected AdaBoost as the default classifier for Stage 2.</p>
        </sec>
        <sec id="sec-5-3-3">
          <title>5.3.3. Final Performance Comparison</title>
          <p>Finally, we compared the performance of three models on the test set: 1. RoBERTa (Baseline): The
end-to-end ku-nlp/roberta-base-japanese-char-wwm model from Experiment 1. 2. AdaBoost
(Symbolic-Only): The AdaBoost classifier trained on the tag sequences predicted by the Stage 1 RoBERTa model. 3.
Proposed Method (Hybrid): A hybrid model that uses the AdaBoost prediction for sentences where at
least one irony tag is detected. If the Stage 1 model predicts no tags, it falls back to using the prediction
from the end-to-end RoBERTa baseline. This ensures that the model can handle both structurally explicit
irony and more subtle, contextual cases.</p>
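The fallback rule of the hybrid model reduces to a simple conditional; the function below is a minimal sketch, assuming binary predictions from both components:

```python
def hybrid_predict(tag_sequence, adaboost_pred, roberta_pred):
    """Hybrid decision rule described above: trust the symbolic (AdaBoost)
    prediction whenever Stage 1 detected at least one irony component tag,
    otherwise fall back to the end-to-end RoBERTa prediction."""
    return adaboost_pred if tag_sequence else roberta_pred
```

The symbolic path thus handles structurally explicit irony, while sentences with no detected components are deferred to the contextual neural baseline.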
          <p>The results are summarized in Table 13. Our Proposed Hybrid Method achieved the best performance
across all metrics, with an accuracy of 0.7863 and a macro F1-score of 0.7829. This represents a
substantial improvement of over 4.4 percentage points in both accuracy and F1-score compared to the
strong RoBERTa baseline. The symbolic-only AdaBoost model also performed competitively, slightly
outperforming the baseline, which demonstrates the power of the structural features alone.</p>
          <p>To assess the statistical significance of these improvements, we conducted McNemar’s test. The
comparison between the Proposed Method and the RoBERTa baseline yielded a p-value of 0.267. While
this does not meet the conventional threshold for statistical significance (α = 0.05), likely due to the
limited size of our test set (n=248), the magnitude of the performance gain is practically meaningful.
The results suggest that explicitly modeling the structural components of irony provides a
meaningful advantage over standard end-to-end approaches.
Note: Bold indicates the best performance. ‘ns’ indicates the result is not statistically significant (α = 0.05).</p>
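For reference, the exact two-sided McNemar test depends only on the discordant pair counts, i.e. the test items that exactly one of the two models classified correctly. Those counts are not reported here, so the function below is a generic standard-library sketch:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant pairs:
    b = items only model A got right, c = items only model B got right.
    Under the null hypothesis the discordant counts follow Binomial(b + c, 0.5);
    the two-sided p-value is capped at 1.0 (the cap also absorbs the
    double-counted middle term when b == c)."""
    n, k = b + c, min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)
```

With a test set of n=248, even a gain of several percentage points can correspond to only a handful of discordant pairs, which is why a p-value of 0.267 is compatible with a practically meaningful improvement.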
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This study introduced and evaluated a neuro-symbolic method for irony detection, comparing it against
standard end-to-end models. This section interprets the experimental findings, acknowledges the
limitations of the work, and discusses the ethical implications of this research.</p>
      <sec id="sec-6-1">
        <title>6.1. Interpretation of Results</title>
        <p>Our experiments showed that explicitly modeling the internal structure of irony leads to a more accurate
and interpretable system. The proposed hybrid neuro-symbolic method substantially outperformed a
strong RoBERTa baseline, and the analysis of its symbolic features validated core linguistic theories.
For instance, the high predictive power of the NEG_POS bigram confirmed that a clash of opposing
sentiments is a key structural marker of irony. The success of the final hybrid model highlights the
complementary strengths of its components: the symbolic classifier excels at identifying irony with clear
structural cues, while the end-to-end neural model, used as a fallback, captures more subtle cases that
depend on deeper contextual understanding. This synergy creates a more robust and comprehensive
detection system.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Study Limitations</title>
        <p>Our findings are promising, but the study has limitations. The primary challenge is the difficulty of
the token-level component extraction task. Performance was constrained by the variable length of
expressions like Negative Expression and a strict, exact-match evaluation metric that may underestimate
the model’s practical ability. This difficulty was compounded by the nature of our dataset, which
consists of short, context-dependent texts from Japanese Twitter that are often inherently ambiguous.
Furthermore, our annotation schema has some boundary issues, such as the occasional overlap between
the Colloquial/Slang tag and sentiment-bearing expressions. These factors mean the generalizability of
our current model to other domains or languages requires further investigation.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Ethical Considerations</title>
        <p>The application of this technology, particularly in content moderation, requires careful ethical
consideration. Models trained on public social media data risk developing biases that unfairly penalize
the linguistic norms of specific communities. Since irony is often used for humor and social bonding,
misclassifications could lead to unwarranted censorship. Therefore, any deployment of this technology
should incorporate human oversight and provide clear channels for users to appeal automated decisions.
The interpretability of our method is intended to assist, not replace, human judgment, as over-reliance
on its structural rules could create new, rigid biases.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper we introduced a novel, two-stage neuro-symbolic approach for irony detection that
improves both performance and interpretability over standard end-to-end models. By first translating
sentences into a symbolic representation of their core linguistic components, our method empirically
validated that irony is constructed from predictable structural patterns, such as the clash of opposing
sentiments. Our hybrid model, which combines this structural analysis with a powerful neural baseline,
achieved a substantial performance gain, indicating that explicitly modeling linguistic structure is
a highly effective strategy for this task. This research provides a foundation for developing more
transparent and reliable NLP systems for nuanced language understanding.</p>
      <p>Future work will focus on three key areas. First, we will enhance the accuracy of the initial component
extraction stage by refining our annotation schema and expanding the training dataset. Second, we will
test the generalizability of our method on different domains and adapt it for other languages. Finally,
we plan to explore more sophisticated strategies for integrating the symbolic and neural components of
our hybrid model to further improve its performance.</p>
      <sec id="sec-7-1">
        <title>Declaration on Generative AI</title>
        <p>During the preparation of this work, the authors used Gemini 2.5 Pro in order to correct grammar and
spelling.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z. L.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Masui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Leliwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wroczynski</surname>
          </string-name>
          ,
          <article-title>Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102600</fpage>
          . doi:10.1016/j.ipm.2021.102600.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Uozumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Uchida</surname>
          </string-name>
          ,
          <article-title>Hiniku kenshutsu ni okeru kanjō seiki yōin no yūkōsei [effectiveness of emotional factors in sarcasm detection]</article-title>
          ,
          <source>The 17th Forum on Information Science and Technology</source>
          , Volume
          <volume>2</volume>
          (
          <year>2018</year>
          )
          <fpage>163</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>A knowledge-augmented neural network model for sarcasm detection</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>60</volume>
          (
          <year>2023</year>
          )
          <fpage>103521</fpage>
          . doi:10.1016/j.ipm.2023.103521.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Sememe knowledge and auxiliary information enhanced approach for sarcasm detection</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>59</volume>
          (
          <year>2022</year>
          )
          <fpage>102883</fpage>
          . doi:10.1016/j.ipm.2022.102883.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>The unbearable hurtfulness of sarcasm</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>193</volume>
          (
          <year>2022</year>
          )
          <fpage>116398</fpage>
          . doi:10.1016/j.eswa.2021.116398.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>FGVIrony: A Chinese dataset of fine-grained verbal irony</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>62</volume>
          (
          <year>2025</year>
          )
          <fpage>104169</fpage>
          . doi:10.1016/j.ipm.2025.104169.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>