<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Highlight Generation from Scientific Papers Using XGBoost Regressor Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Papun Roy</string-name>
          <email>roypapun.md@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kamal Sarkar</string-name>
          <email>jukamal2001@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>In this work, we present a simple and effective method to automatically generate highlights from scientific papers or their abstracts. The goal is to identify the sentences in an abstract that best represent its key ideas. We use two different models: the XGBoost Classifier and the XGBoost Regressor. In the XGBoost Classifier Model, each sentence is labeled by comparing it with the reference highlight; if the sentence overlaps significantly with the highlight, it is assigned 1, and 0 otherwise. The XGBoost Regressor Model captures a gradient of importance, allowing finer-grained ranking: each sentence is scored by how similar it is to the human-written highlights (e.g., 0.4, 0.9, 0.6). The models learn from these scores and select the top sentences to produce highlights. We tested our methods on training, validation, and test datasets. The results show that the regressor model performs better than the classifier model in most cases, especially under the ROUGE-1, ROUGE-2, and ROUGE-L evaluation metrics. In this shared task, out of many participating teams, 12 were selected, and our team secured the 8th rank; the difference between our XGBoost Regressor model's result and that of the top-ranked team is very small. Our approach is easy to apply and can be useful for automatic summarization tasks in scientific writing.</p>
      </abstract>
      <kwd-group>
        <kwd>Highlight Generation</kwd>
        <kwd>Scientific Papers</kwd>
        <kwd>XGBoost</kwd>
        <kwd>Regressor</kwd>
        <kwd>Classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Scientific research papers often include a short summary section called "highlights" that captures the
main points of the work. These highlights help readers quickly understand the key contributions of a
paper. However, writing highlights manually can be time-consuming for authors, especially when
dealing with many documents. This has created a need for automatic methods to generate highlights
from scientific texts.</p>
      <p>In this work, we focus on generating highlights from the abstract section of scientific papers.
Abstracts are already short summaries, but they often contain multiple sentences that vary in
importance. Our goal is to identify the most important sentences from the abstract and use them as
highlights.</p>
      <p>We explore machine learning techniques for this task, comparing classification and regression-based
models. Our experiments show that the regression-based XGBoost model outperforms the classifier,
offering finer-grained and more accurate sentence ranking. The model is trained using sentence-level
features and uses ROUGE-based similarity scores as supervision.</p>
      <p>This paper presents a simple, fast, and effective pipeline for sentence-level highlight prediction. We
evaluate our approach using standard metrics and test it on separate datasets to confirm its generalizability.</p>
      <p>Our main contributions are:
1. We build a sentence-level highlight prediction system using XGBoost Regressor as well as
Classifier.
2. We use simple string overlapping (for the Classifier) and ROUGE-L similarity (for the Regressor)
scores between sentences and reference highlights as training labels.
3. We compare regression and classification approaches and show that regression performs better.
4. We provide a complete pipeline that includes training, validation, and testing.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>
        Automatic summarization is a well-studied problem in natural language processing (NLP). Traditionally,
summarization methods are grouped into two types: extractive and abstractive. Extractive methods aim
to select important sentences from the original text, while abstractive methods generate new sentences
based on the meaning of the text [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        For extractive summarization, early approaches used statistical features such as term frequency
(TF), inverse document frequency (IDF), sentence position, and similarity measures. Classical machine
learning models like Naive Bayes, SVMs, and decision trees were later applied using these features [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In recent years, deep learning models such as BERT and its variants (e.g., BERTSum, T5) have shown
strong performance in summarization tasks. These models use pre-trained language representations
to understand sentence semantics and context [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, they often require large datasets, long
training times, and significant computing resources.
      </p>
      <p>
        To reduce complexity and resource usage, some researchers have explored tree-based models like
XGBoost for sentence scoring and selection. While classification models are often used to decide
whether a sentence is a highlight or not, regression models provide a more fine-grained ranking of
sentence importance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, there is limited research that directly compares these two strategies.
      </p>
      <p>The task of generating research highlights from scientific texts lies at the intersection of automatic
summarization and scientific document processing. Early work in this field focused mainly on extractive
methods, where sentences from the source document are selected to form a summary. Such approaches
were often based on statistical or heuristic features like sentence position, frequency, or TF-IDF
weighting. While simple, these approaches struggled with semantic understanding and often produced
highlights that were redundant or lacked coherence.</p>
      <p>
        With the advent of deep learning, research in highlight generation has moved towards more
sophisticated models. For instance, Rehman et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed the use of sequence-to-sequence models
with attention, pointer-generator networks, and coverage mechanisms to generate highlights directly from
abstracts, achieving promising ROUGE and METEOR scores on a large dataset of scientific papers[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Their work demonstrated that abstractive methods can better capture the essence of a paper compared
to purely extractive approaches.
      </p>
      <p>
        Later advancements introduced domain-specific embeddings to further enhance highlight generation.
Rehman et al.[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] combined pointer-generator networks with SciBERT embeddings, trained specifically on
scientific texts, to generate higher-quality highlights. Their system outperformed baseline models on
benchmark datasets such as CSPubSum and MixSub, achieving state-of-the-art results with significantly
improved ROUGE, METEOR, and BERTScore metrics [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        In subsequent work, Rehman and colleagues explored multiple strategies to improve scientific
summarization. They introduced contextual embedding-based models using ELMo [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and proposed a
named-entity-driven highlight generation approach using NER features [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Further, they analysed
abstractive text summarization techniques leveraging pre-trained transformer models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and provided
an extensive study on highlight generation from abstracts in EEKE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Together, these studies represent
the most comprehensive research effort in automatic highlight generation from scientific texts, showing
how deep contextual models and hybrid architectures can enhance both factual accuracy and content
relevance.
      </p>
      <p>
        Parallel to this, the broader NLP community has witnessed groundbreaking improvements with
transformer-based pre-trained models. Notably, BERT (Bidirectional Encoder Representations from
Transformers) introduced by Devlin et al.[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] enabled the learning of deep bidirectional contextual
embeddings, setting new benchmarks across multiple NLP tasks including summarization [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Subsequent models such as SciBERT and PEGASUS further specialized this pre-training for scientific or
summarization-specific tasks.
      </p>
      <p>Recent studies also highlight the limitations of abstractive methods, such as hallucinations and
factual inconsistencies. To address these, hybrid approaches combining extractive and abstractive
methods have been explored. For example, some works adopt extractive steps for sentence selection
followed by abstractive refinement, ensuring both factual correctness and conciseness.</p>
      <p>Despite these advances, most state-of-the-art systems rely on large-scale pre-trained models requiring
extensive computational resources, which poses challenges for practical deployment. In contrast, our
work focuses on a more lightweight yet effective approach: using XGBoost regression and classification
models with sentence-level features to predict highlights. By comparing extractive strategies under a
machine learning framework, we aim to provide a resource-efficient alternative while still maintaining
competitive performance.</p>
      <p>Our work fills this gap by providing a direct comparison between regression and classification models
for extractive summarization, using XGBoost. We show that the regression approach performs better,
especially when evaluated with ROUGE metrics.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our method is based on extractive summarization at the sentence level. The main idea is to rank
sentences from an abstract based on how well they match the human-written highlights.</p>
      <p>
        The goal of our research is to find an effective and lightweight method to automatically generate
sentence-level highlights from scientific abstracts. Specifically, we compare two different machine
learning strategies, classification and regression, using the XGBoost model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The central question motivating this research is: can we automatically extract the highlights of a
scientific paper? To answer it, we use two variants of the same framework, a classification model and a
regression model, and examine which approach, classification or regression with XGBoost, performs better
at extracting meaningful highlights from abstracts.
      </p>
      <p>To answer this question, we designed an experiment where both XGBoost Classifier and Regressor
are trained and tested on the same dataset, using the same feature representations and evaluation metrics.</p>
      <p>We now describe our methodology in detail; the corresponding workflow diagram for these steps is shown
in Figure 1. Each step is critical for achieving the overall goal of generating sentence-level highlights
from abstracts using XGBoost.</p>
      <p>The steps in our method are as follows:</p>
      <sec id="sec-3-1">
        <title>3.1. Input – Abstract or Scientific Paper:</title>
        <p>Generally in this process, we begin with the input, which is typically a scientific abstract or, in some
cases, the full text of a research paper. The abstract is chosen as the primary input because it provides
a condensed version of the entire work and often contains the essential contributions, background,
and findings of the study. By focusing on abstracts, we reduce the complexity of the summarization
task while still capturing meaningful highlights. The research question guiding this step is: Can we
extract highlights directly from abstracts in a way that mirrors human-written summaries? Preparing the
input involves ensuring that the text is clean, free of unnecessary formatting, and ready for processing.
This preparation phase ensures consistency across datasets and provides a uniform starting point for
subsequent steps.</p>
        <p>Here we used the well-organized dataset provided by the FIRE 2025 organizers, which has three
fields: Filename, Abstract, and Highlights.</p>
        <p>We used three data sets: training, validation, and testing. Each entry contains a filename, an abstract,
and (except in the test data set) the ground-truth highlights.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Sentence Tokenization:</title>
        <p>
          Once the abstract is obtained, the first technical step is sentence tokenization. This process splits
the abstract into individual sentences. We use the Natural Language Toolkit (NLTK) tokenizer[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
which is well-suited for handling scientific text. Tokenization is essential because the model operates
at the sentence level, evaluating each sentence separately to determine its likelihood of being part
of the highlight. The accuracy of tokenization is critical: poorly segmented sentences could lead to
broken meaning units, reducing the quality of extracted features and final predictions. Beyond simple
splitting, sentence tokenization prepares the foundation for aligning sentences with their corresponding
highlight scores, which is a key element of supervised training in both the classification and regression
settings.
        </p>
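<p>The tokenization step can be illustrated as follows. As a minimal sketch we use a simple regex-based splitter in place of NLTK's sent_tokenize (the actual system uses the NLTK tokenizer; the splitting rule below is an illustrative assumption):</p>

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., !, or ? when followed by whitespace
    # and an uppercase letter -- a rough stand-in for nltk.sent_tokenize.
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p.strip() for p in parts if p.strip()]

abstract = ("We propose a lightweight highlight generator. "
            "Sentences are scored with TF-IDF features. "
            "The top sentences form the highlight.")
sentences = split_sentences(abstract)
```

<p>Each resulting sentence is then treated as one candidate unit for feature extraction and scoring.</p>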
      </sec>
      <sec id="sec-3-3">
        <title>3.3. TF-IDF Feature Extraction:</title>
        <p>
          After tokenization, we transform each sentence into a numerical representation using Term
Frequency–Inverse Document Frequency (TF-IDF)[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This step captures the importance of words in
a sentence relative to the entire dataset. Sentences that contain domain-specific keywords or frequently
occurring technical terms often have higher TF-IDF weights, making them more likely candidates
for highlights. TF-IDF is advantageous because it balances local word importance (term frequency
within a sentence) with global rarity (inverse frequency across the corpus). This transformation
results in feature vectors that XGBoost models can interpret. While deep embedding models like BERT
capture context-rich features, TF-IDF is computationally efficient, interpretable, and sufficient for our
lightweight summarization framework. The resulting sentence features are normalized using Min-Max
scaling.
        </p>
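<p>The TF-IDF step can be sketched in a few lines. For illustration we compute a simplified TF-IDF by hand together with Min-Max scaling; the exact tf and idf formulas below are illustrative assumptions, and a standard vectorizer would be used in practice:</p>

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    # Simplified TF-IDF: tf = count / sentence length, idf = log(N / df) + 1.
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vecs = []
    for d in docs:
        counts = Counter(d)
        vecs.append([(counts[w] / len(d)) * (math.log(n / df[w]) + 1)
                     for w in vocab])
    return vocab, vecs

def min_max_scale(vecs):
    # Scale every feature column to the [0, 1] range.
    cols = list(zip(*vecs))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in vecs]

sents = ["xgboost ranks sentences well",
         "tfidf weights rare words",
         "xgboost selects the top sentences"]
vocab, X = tfidf_vectors(sents)
X = min_max_scale(X)
```

<p>Each row of the scaled matrix is the feature vector for one sentence, ready to be fed to the XGBoost models.</p>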
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Label Assignment:</title>
        <p>
          Label generation is the core step for supervised training. For the classifier, we assign binary labels:
a sentence is labeled 1 if it overlaps significantly with the reference highlight, and 0 otherwise. For
the regressor, we compute a continuous similarity score using ROUGE-L between each sentence and
the reference highlight. For example, sentences receive scores such as 0.4, 0.9, or 0.6 depending on
their similarity. This approach allows the regression model to capture nuanced differences in sentence
importance. Label assignment ensures that the model has reliable ground truth data to learn from. The
careful design of labels is critical, since poor labeling could mislead the model into learning irrelevant
sentence patterns. This step effectively aligns sentences with their “highlight value,” setting the stage
for effective model training. For the regression model, each sentence is assigned a score based on its
ROUGE-L similarity with the reference highlight [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
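<p>The ROUGE-L label is based on the longest common subsequence (LCS) between a sentence and the reference highlight. A self-contained sketch of the scoring follows; this simple F1 formulation is an illustrative assumption, and a standard ROUGE implementation would produce the actual labels:</p>

```python
def lcs_len(a, b):
    # Dynamic-programming length of the longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(sentence, reference):
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall.
    s, r = sentence.lower().split(), reference.lower().split()
    lcs = lcs_len(s, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(s), lcs / len(r)
    return 2 * p * rec / (p + rec)

ref = "xgboost regression improves highlight ranking"
score = rouge_l_f1("xgboost regression improves ranking quality", ref)  # LCS = 4 tokens
```

<p>Scoring every abstract sentence against the reference highlight in this way yields the continuous training labels (e.g., 0.4, 0.9, 0.6) used by the regressor.</p>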
      </sec>
      <sec id="sec-3-5">
        <title>3.5. XGBoost Training (Regressor/Classifier):</title>
        <p>
          In this step, we train two separate models: the XGBoost Classifier and the XGBoost Regressor. XGBoost
(Extreme Gradient Boosting) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is chosen for its efficiency, its ability to handle sparse TF-IDF vectors,
and its strong performance in many text-related tasks. XGBoost builds many small decision trees, each
fixing the errors of the previous one, guided by gradient information; with regularization, it balances
accuracy and generalization. We need a model that performs efficiently on sparse, high-dimensional
TF-IDF features; transformers such as BERT require substantial resources, whereas XGBoost is lightweight
and still effective. Instead of building one strong model, XGBoost builds an ensemble of weak learners
(decision trees), where each new tree tries to correct the errors made by the previous trees. Over many
iterations, the ensemble becomes very strong.
        </p>
        <sec id="sec-3-5-1">
          <title>1. XGBoost Classifier Model</title>
          <p>The XGBoost Classifier is a supervised machine learning algorithm based on the principle of
gradient boosted decision trees. It is designed for binary or multi-class classification tasks and is
known for its high efficiency, scalability, and predictive performance. In the context of highlight
generation, the XGBoost Classifier is used to determine whether a sentence from an abstract
should be included in the final highlight or not.</p>
          <p>At its core, XGBoost builds an ensemble of weak learners (decision trees) in a sequential manner,
where each new tree attempts to correct the errors made by the ensemble of previous trees.
The classifier optimizes a logistic loss function through gradient boosting, where the gradients
(first derivatives) and hessians (second derivatives) of the loss function guide how new trees are
constructed. This enables the model to efficiently capture non-linear relationships in the data.
In our implementation, sentences are first transformed into TF–IDF feature vectors, which
represent the importance of words within the abstract relative to the corpus. Each sentence is
then assigned a binary label:
• 1 if it matches or overlaps with the reference highlight (indicating it should be selected),
• 0 otherwise.</p>
          <p>The XGBoost Classifier then learns to distinguish between these two classes. During training,
the model minimizes the binary cross-entropy loss, which measures the divergence between the
predicted probability and the true label. The final model outputs a probability score between
0 and 1 for each sentence, reflecting the likelihood of that sentence being part of the highlight.
Sentences with probabilities above a certain threshold (commonly 0.5) are classified as highlights.
An important feature of the XGBoost Classifier is its regularization mechanism, which penalizes
overly complex trees and prevents overfitting. Additionally, XGBoost efficiently handles sparse
input data, making it well-suited for TF–IDF representations where many entries are zero.
Overall, the XGBoost Classifier provides a fast, interpretable, and resource-efficient method for
sentence-level highlight prediction. While it simplifies the task into a binary decision-making
process, its performance demonstrates the effectiveness of boosted decision trees in scientific
text summarization tasks.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>2. XGBoost Regressor Model</title>
          <p>The XGBoost Regressor is a variant of the gradient boosting framework designed for continuous
prediction tasks. Instead of predicting discrete class labels, the regressor learns to assign a
continuous score that reflects the degree of relevance or similarity. In the context of highlight
generation, this makes the XGBoost Regressor particularly suitable, since highlight-worthy
sentences are not always strictly binary (highlight or non-highlight), but may lie on a spectrum
of importance.</p>
          <p>Similar to the classifier, the regressor builds an ensemble of decision trees sequentially, where
each new tree corrects the residual errors of the previous trees. However, instead of optimizing
logistic loss, the XGBoost Regressor minimizes the mean squared error (MSE) between the
predicted score and the target value. This design allows the model to learn subtle differences in
sentence importance, producing finer-grained predictions compared to the binary framework of
the classifier.</p>
          <p>In our system, sentences from abstracts are first transformed into TF–IDF vectors, capturing the
importance of words in relation to the overall corpus. Each sentence is then assigned a continuous
label based on its similarity to the reference highlight. For example, if a sentence has strong lexical
overlap with the reference highlight, it receives a higher score (e.g., 0.9), while a less relevant
sentence may receive a lower score (e.g., 0.2). These continuous labels serve as the ground truth
for training the regressor.</p>
          <p>During training, the regressor learns to map TF–IDF features to similarity scores, guided by the
gradients and hessians of the MSE loss. The output of the model is a real-valued score for each
sentence, indicating how closely it resembles the human-written highlight. Sentences are then
ranked by these predicted scores, and the top-k (in our case, top 3) are selected as the generated
highlights.</p>
          <p>The XGBoost Regressor also benefits from regularization mechanisms (L1 and L2 penalties) and
efficient handling of sparse TF–IDF vectors. These features allow it to maintain a balance between
accuracy and generalization while remaining computationally efficient.</p>
          <p>Overall, the XGBoost Regressor offers a ranking-based approach to highlight generation. By
capturing graded relevance rather than enforcing strict binary decisions, it provides more flexible
and accurate sentence selection. This explains why, in our experiments, the regressor consistently
outperformed the classifier, achieving higher ROUGE scores and better alignment with the
reference highlights.</p>
          <p>The classifier learns to distinguish between highlight and non-highlight sentences, while the regressor
learns a continuous mapping from sentence features to highlight scores. One of the key contributions of
our work is the comparison of these two approaches head-to-head. We found that regression provides
finer granularity in ranking, whereas classification is limited to a binary decision boundary.</p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Sentence Scoring and Ranking:</title>
        <p>Once trained, the model is used to assign scores to each sentence in unseen abstracts. In the case of
the classifier, this is a probability score indicating the likelihood of being part of the highlight. For
the regressor, this is a continuous value reflecting how closely the sentence matches the ground truth
highlight. Sentence scoring is where the model applies its learned knowledge, and the quality of these
scores directly determines the effectiveness of the final highlight generation. By ranking sentences
according to these scores, the system ensures that the most relevant and informative sentences are
prioritized.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Sentence Selection:</title>
        <p>After scoring, the next step is to select the top three sentences with the highest scores. This number was
chosen based on the common practice of research journals requiring three highlights. Selecting three
sentences strikes a balance between conciseness and coverage, ensuring that the highlights provide a
quick yet informative overview of the abstract. The ranking process is deterministic: sentences are
sorted in descending order of their scores, and the top three are chosen. This step operationalizes the
sentence-level scoring into a usable summary format. For unseen abstracts (i.e., abstracts in the test
dataset), the trained model predicts a score for each sentence. We rank the sentences and select the
top 3 as the predicted highlight.</p>
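<p>The deterministic top-3 selection reduces to a sort over predicted scores. A minimal sketch (the example sentences and scores are hypothetical):</p>

```python
def select_highlights(sentences, scores, k=3):
    # Rank sentences by predicted score (descending) and keep the top k.
    ranked = sorted(zip(sentences, scores), key=lambda t: t[1], reverse=True)
    return [s for s, _ in ranked[:k]]

sents = ["Background sentence.", "Key contribution.", "Main result.", "Minor detail."]
preds = [0.20, 0.90, 0.60, 0.40]
top3 = select_highlights(sents, preds)
# top3 == ["Key contribution.", "Main result.", "Minor detail."]
```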
      </sec>
      <sec id="sec-3-8">
        <title>3.8. Final Highlight Generation (Output):</title>
        <p>
          The last step produces the final highlight set for each abstract. The selected top three sentences are
presented together as the predicted highlight, which can then be compared against the human-written
highlight for evaluation. Evaluation is conducted using ROUGE-1, ROUGE-2, ROUGE-L[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and
sentence-level accuracy. This step closes the loop of the workflow, translating the model’s predictions
into a concrete, human-readable output. The generated highlights can then be used in academic
repositories, search engines, or research support tools to help readers quickly grasp the essence of a
scientific paper.
        </p>
        <p>By elaborating on each step, we demonstrate how the workflow systematically transforms raw text
into structured highlights. Each component contributes uniquely to the final outcome, and together,
they provide a coherent pipeline for extractive highlight generation using XGBoost.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset Details</title>
      <p>
        The dataset used in this study was provided as part of the SciHigh shared task at FIRE 2025 (Forum for
Information Retrieval Evaluation). It contains three subsets: training, validation, and test data. Each
record in the dataset includes the paper filename, abstract, and human-written highlights (except in the
test set). The dataset is derived from the MixSub corpus, which includes research papers from multiple
scientific domains. Following standard practice, we use the training and validation data for model
development and report test performance based on the leaderboard results. Details of the dataset and
the shared task can be found in the official FIRE 2025 SciHigh track documentation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We conducted extensive experiments using both XGBoost Classifier and XGBoost Regressor models
for the task of highlight generation. The evaluation was performed using widely accepted metrics:
ROUGE-1, ROUGE-2, ROUGE-L, and sentence-level accuracy.</p>
      <p>From our experiments, the regressor consistently outperformed the classifier. This is because the
regression framework allows the model to capture partial relevance: a sentence may not be an exact
match to the reference highlight but can still be closer than others. By assigning continuous values,
the regressor provides a finer ranking of sentences, which is more suitable for extractive highlight
generation than the rigid binary decisions of the classifier. Initially, our validation experiments indicated
that the regression model (run1) provided better performance across all evaluation metrics compared to
the classification model. The validation results are summarized in Table 1.</p>
      <p>This result confirms that while deep learning models (such as BERT-based architectures) often
dominate leaderboards, carefully designed lightweight models like XGBoost can still achieve strong
performance in automatic highlight generation. Our method, relying solely on TF-IDF features and
gradient-boosted decision trees, provides a computationally eficient alternative to transformer-based
models, making it more suitable for resource-constrained environments.</p>
      <p>Moreover, achieving 8th place out of multiple participating systems validates the effectiveness of
using regression-based scoring rather than binary classification for extractive summarization. The
regressor’s ability to assign continuous importance values allows for better ranking of candidate
sentences, leading to improved overlap with human-written highlights.</p>
      <p>Overall, the results highlight the trade-off between efficiency and absolute performance. Although
our ROUGE-L score (0.2206) is lower than those of the top transformer-based systems, the margin is
very small (0.009 behind the fourth-ranked team). Our approach is interpretable, faster to train,
and significantly less resource-intensive. These qualities make the model practical for real-world
deployment in digital libraries, academic repositories, and summarization tools where scalability and
speed are as important as accuracy.</p>
      <p>Figure 2 presents a comparison chart of the model results that illustrates our findings.</p>
      <p>The regressor model (run1) provided better ranking of important sentences and achieved higher
overlap with human-written highlights. The final model was used to generate highlights for the
test dataset, and the results were saved as a CSV file. This demonstrates that effective extractive
summarization can still be achieved without relying on large-scale pre-trained transformers.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The experimental findings and the leaderboard results provide valuable insight into the strengths
and limitations of our approach. One of the most important observations is that the regression model
consistently outperforms the classifier, both in local validation and in the official evaluation. This
confirms our hypothesis that assigning continuous importance scores to sentences allows the model to
better capture nuances in sentence relevance than binary classification. By treating highlight prediction
as a ranking problem rather than a simple yes/no decision, we enable the system to prioritize sentences
with subtle but meaningful differences in importance.</p>
      <p>Another important aspect of the discussion is the trade-off between lightweight machine learning
models and resource-intensive transformer-based models. While deep learning approaches such as
BERT, SciBERT, and PEGASUS dominate in absolute performance on summarization tasks, they require
significant computational power, large annotated datasets, and considerable training time. In contrast,
our XGBoost-based system relies on TF-IDF features and gradient-boosted decision trees, making it
interpretable, easy to train, and computationally efficient. This makes our approach especially suitable
for institutions with limited resources or for deployment at scale in digital libraries and repositories
where speed and efficiency are crucial.</p>
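      <p>The TF-IDF weighting behind these features can be reproduced with nothing beyond the standard library (a simplified sketch using raw term counts and the classic idf formulation [<xref ref-type="bibr" rid="ref11">11</xref>]; the documents are toy examples, and the actual system uses a standard vectorizer over the corpus):</p>

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within the document
        weights.append({
            term: count * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "xgboost ranks sentences",
    "xgboost scores sentences by importance",
    "transformers encode semantics",
]
vectors = tfidf(docs)
```

<p>Terms confined to a single document ("transformers") receive the highest idf, while terms shared across documents ("xgboost") are down-weighted, which is what lets the downstream trees focus on discriminative vocabulary.</p>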
      <p>The official leaderboard result, where our team JU_CSE_PR_KS achieved 8th place with a
ROUGE-L score of 0.2206, further demonstrates the practical competitiveness of our model. The gap
to the top-ranked team, which achieved a ROUGE-L of 0.2345, is relatively small. This indicates that
carefully engineered classical machine learning models can still remain competitive against modern
deep learning systems in certain contexts. Furthermore, our validation results, which showed ROUGE-L
values above 0.39, suggest that the model generalizes reasonably well, even though performance drops
on unseen test data, a common issue in NLP tasks.</p>
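      <p>Since ROUGE-L drives the validation and leaderboard figures quoted above, a minimal version of the metric can be sketched as LCS-based F1 (omitting the stemming and multi-reference handling of the official package [<xref ref-type="bibr" rid="ref1">1</xref>]):</p>

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a candidate highlight and a reference highlight."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

<p>Unlike unigram overlap, the LCS rewards in-order matches, so a candidate that preserves the reference's phrasing scores higher than one containing the same words shuffled.</p>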
      <p>A limitation of our current system is its reliance solely on surface-level lexical features, which
means it may miss deeper semantic relationships between sentences and highlights. This explains
why transformer-based models, which encode context and semantics more effectively, achieve slightly
higher scores. Future work could focus on hybrid models that integrate semantic embeddings (such
as sentence transformers) into the XGBoost framework, thus combining efficiency with improved
representation.</p>
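      <p>One shape such a hybrid could take (a sketch with hand-made toy vectors; in practice the embeddings would come from a pre-trained sentence-transformer model and the lexical features from the existing TF-IDF pipeline) is to append a semantic-similarity feature to each sentence's lexical feature row before it reaches the regressor:</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy stand-ins: lexical features from the TF-IDF pipeline, plus one
# semantic feature derived from (hypothetical) sentence embeddings.
lexical_features = [0.42, 0.10, 0.77]   # e.g. TF-IDF-derived scores
sentence_emb = [0.2, 0.8, 0.1]          # would come from an SBERT-style model
highlight_emb = [0.25, 0.75, 0.05]      # embedding of the reference highlight

semantic_feature = cosine(sentence_emb, highlight_emb)
hybrid_features = lexical_features + [semantic_feature]  # one input row for XGBoost
```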
      <p>Overall, the discussion highlights that our regression-based XGBoost approach provides a good
balance between accuracy, interpretability, and efficiency. It achieves competitive results on a challenging
benchmark task, validates the usefulness of regression for sentence-level scoring, and opens pathways
for future research into hybrid or feature-augmented systems that bridge the gap with large-scale deep
learning models.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this work, we investigated the task of automatic highlight generation from scientific abstracts
using XGBoost-based models. Our research focused on comparing two approaches: classification and
regression. Through systematic experimentation, we demonstrated that the regression-based model
provides more effective sentence ranking and generates highlights that better align with human-written
ones. Validation experiments confirmed that the regressor consistently outperforms the classifier
in terms of ROUGE scores and accuracy, and the official leaderboard evaluation placed our team,
JU_CSE_PR_KS, in 8th position with a ROUGE-L score of 0.2206.</p>
      <p>The main contribution of our study lies in showing that even lightweight, interpretable models like
XGBoost can achieve competitive performance against more resource-demanding transformer-based
systems. This makes our approach attractive for environments with limited computational resources
and for large-scale applications where efficiency and interpretability are crucial. By leveraging TF-IDF
features, sentence tokenization, and regression-based scoring, we designed a workflow that is not only
effective but also practical to implement and deploy.</p>
      <p>
        However, the study also highlights certain limitations. The reliance on surface-level lexical features
restricts the model’s ability to capture deeper semantic relationships between sentences and highlights.
While this keeps the system lightweight, it also explains why transformer-based systems can achieve
slightly higher performance. A promising direction for future work is the integration of semantic
embeddings, such as BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or SBERT sentence-transformer vectors, with XGBoost to create a
hybrid system that maintains efficiency while improving semantic understanding.
      </p>
      <p>The authors declare that generative AI tools, such as ChatGPT, were used to assist in language editing,
grammar checking, and formatting during manuscript preparation. All intellectual contributions,
analyses, and experimental results are solely the authors’ own work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          , in: Text summarization branches out,
          <year>2004</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>31</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, volume 1 (long and short papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>XGBoost: A scalable tree boosting system</article-title>
          ,
          <source>in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Automatic generation of research highlights from scientific abstracts</article-title>
          ,
          <source>in: 2nd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE'21), collocated with JCDL'21</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <article-title>Research highlight generation with ELMo contextual embeddings</article-title>
          ,
          <source>Scalable Computing: Practice and Experience</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>181</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Generation of highlights from research papers using pointer-generator networks and scibert embeddings</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>91358</fpage>
          -
          <lpage>91374</lpage>
          . doi:10.1109/ACCESS.2023.3292300.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <article-title>Named entity recognition based automatic generation of research highlights</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Scholarly Document Processing (SDP 2022), collocated with COLING 2022, Association for Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>169</lpage>
          . URL: https://aclanthology.org/2022.sdp-1.18.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <article-title>An analysis of abstractive text summarization using pre-trained models</article-title>
          ,
          <source>in: Proceedings of International Conference on Computational Intelligence, Data Science and Cloud Computing: IEM-ICDC 2021</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          ,
          <article-title>Natural language processing with Python: analyzing text with the natural language toolkit</article-title>
          ,
          <source>"O'Reilly Media, Inc."</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sparck Jones</surname>
          </string-name>
          ,
          <article-title>A statistical interpretation of term specificity and its application in retrieval</article-title>
          ,
          <source>Journal of documentation 28</source>
          (
          <year>1972</year>
          )
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>AlShammari</surname>
          </string-name>
          ,
          <article-title>Implementation of keyword extraction using term frequency-inverse document frequency (tf-idf) in python</article-title>
          ,
          <source>Int. J. Comput. Appl</source>
          <volume>185</volume>
          (
          <year>2023</year>
          )
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>