<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (E. B. Uyar);</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">2468-0672</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Low-Level Hardware Requirement Classification Using Large Language Models: Challenges, Insights, and Future Directions for Embedded Control Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ekrem Bilgehan Uyar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Ergin Gürsoy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cemil Gökçe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tuğba Taşkaya Temizel</string-name>
          <email>ttemizel@metu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Requirements Classification, Large Language Models, Embedded Control Systems.1</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Informatics, Middle East Technical University</institution>
          ,
          <addr-line>Ankara 06800</addr-line>
          ,
          <country country="TR">Türkiye</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>In: A. Hess, A. Susi</institution>
          ,
          <addr-line>E. C. Groen, M. Ruiz, M. Abbas, F. B. Aydemir, M. Daneva, R. Guizzardi, J. Gulden, A. Herrmann, J. Horkoff, S. Kopczyńska, P. Mennig, M. Oriol Hilari, E. Paja, A. Perini, A. Rachmann, K. Schneider, L. Semini, P. Spoletini</addr-line>
          ,
          <institution>A. Vogelsang. Joint Proceedings of REFSQ-2025 Workshops, Doctoral Symposium, Posters \&amp; Tools Track, and Education and Training Track.</institution>
          <addr-line>Co-located with REFSQ 2025. Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Roketsan Inc.</institution>
          ,
          <addr-line>Ankara 06780</addr-line>
          ,
          <country country="TR">Türkiye</country>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Automated Requirements Engineering (RE) activities can streamline development processes, reduce errors, and facilitate informed decision-making, particularly for low-level hardware requirements where modifications are costly. Classification is a widely studied automated RE activity for software requirements. Yet its applicability to hardware requirements remains underexplored due to the lack of structured datasets. This study adapts and evaluates software requirement classification techniques for hardware by extracting low-level requirements from open-source hardware design artifacts of Embedded Control Systems. We evaluate two classification methods: fine-tuning a BERT-based model and zero-shot prompting with a quantized LLM (Qwen2.5). While fine-tuning achieved high accuracy, zero-shot classification with specific prompts outperformed it in overall performance, achieving an average F1-score of up to 90% on the hold-out test set. Our findings suggest that automating downstream RE activities for low-level hardware requirements may not require large, task-specific datasets; however, classification performance can be further improved and can serve as an enabler for advanced tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Embedded Control Systems (ECS) are widely employed in automotive, aerospace, and consumer
electronics. Their hybrid hardware-software nature and ubiquity make them an intriguing subject
for Requirements Engineering (RE) research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Managing their hardware requirements is critical,
as changes impact physical design and production, often incurring high costs and delays [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ].
Requirement classification can support this process, enabling downstream activities such as
class-based test planning or physical feasibility analysis.
      </p>
      <p>
        While classification is well-studied for software requirements, hardware requirements differ due
to factors such as interdependencies and quantitative constraints. They often involve strict
quantitative constraints (e.g., minimum and maximum values) rather than the abstract definitions
found in software. In addition, interdependence between requirements [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] adds complexity, where
altering one can necessitate additional changes or create new constraints. Despite these differences,
software requirement classification techniques can be adapted if suitable datasets are available.
However, open-access low-level hardware datasets are scarce, as most requirement documents are
proprietary or embedded within technical documentation.
      </p>
      <p>To address this, we collaborated with domain experts (DEs) to create and annotate a dataset
derived from open-source hardware design materials, refining inferred constraints to align with
established dataset-creation frameworks. Then we experimented with two baseline classification
methods: fine-tuning a BERT-based model and applying a quantized large language model
(Qwen2.5) with zero-shot prompting. We evaluated their performance using precision, recall, and
F1-score. Our findings suggest that in highly specialized domains, leveraging larger models with
well-crafted prompts is a more efficient alternative to constructing extensive datasets for fine-tuning.
Furthermore, our analysis of misclassified requirements underscores the critical role of context in
classification accuracy. These insights form a basis for downstream RE tasks and more advanced
applications, including the integration of Large Language Model guardrails for industrial
deployment.</p>
      <p>The remainder of this paper is organized as follows: Section 2 reviews automated RE tasks for
natural language requirements. Section 3 details the methodology and dataset creation. Section 4
presents the classification results, while Section 5 outlines assumptions, limitations, and exclusions.
Section 6 discusses the findings, and Section 7 summarizes key insights and future directions for
applying RE tasks in the ECS domain.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated RE tasks have been extensively explored using Natural Language Processing (NLP) and
machine learning in the embedded systems domain. Applications include, but are not limited to,
requirements modeling [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], ontology-based specification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and Named Entity Recognition (NER)
for requirement analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Although the application of automated requirements classification
tasks in the context of embedded systems is not particularly popular, it is widely studied in the field
of software engineering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Recently, transfer-learning approaches have proven effective for software requirements
classification, both through fine-tuning [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ] and through prompting [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ]. However, these methods rely on structured
requirement datasets, limiting their direct applicability to ECS and hardware requirements due to
the numerical and physical constraints that complicate direct adaptation.
      </p>
      <p>
        In adjacent domains like chip design [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ], Large Language Models (LLMs) have been employed
for design validation and documentation synthesis, demonstrating their potential in processing
technical constraints. Their applications extend across various industries, including aerospace [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
and automotive [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], underscoring the versatility of LLMs in solving complex engineering problems
in different domains. However, such applications typically require domain-specific fine-tuning, and
the lack of open datasets tailored to domain-specific tasks, hardware RE among them, is a
significant barrier to their broader application. The performance of LLMs
is highly dependent on the quality and relevance of the datasets used for training and evaluation
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Although zero-shot learning allows LLMs to operate in the absence of domain-specific datasets,
its limitations become evident as even advanced models can struggle in such settings [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Therefore,
the creation of high-quality datasets is indispensable. This study addresses these gaps by adapting
NLP-driven RE techniques to a dataset derived from hardware design sources, exploring LLM-based
classification within the constraints of ECS hardware requirements. To this end, annotation
development frameworks, such as the MATTER cycle [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], along with dataset documentation
frameworks [
        <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19-22</xref>
        ], are essential for preparing datasets that support the effective application of
LLMs in new domains.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Design</title>
      <sec id="sec-3-1">
        <title>3.1. Research Questions &amp; Scope</title>
        <p>This study investigates the feasibility of adapting requirement classification techniques widely used
in software engineering to ECS hardware requirements. To structure our investigation, we define
the following research questions:</p>
        <p>RQ1: What are the challenges in creating a representative annotated requirements corpus
from hardware design sources to enable automated classification?</p>
        <p>RQ2: How effective are fine-tuning and zero-shot approaches with large language models
(LLMs) at accurately classifying ECS hardware requirements?</p>
        <p>RQ1 addresses the methodological challenges by investigating how to extract, annotate, and validate
hardware requirements in a structured manner. RQ2 evaluates the ability of existing classification
techniques, adapted from software RE, to perform effectively in the hardware domain.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Creation Process &amp; Its Practical Implications</title>
        <p>In our case, hardware requirements are embedded within circuit diagrams, technical documentation,
and component datasheets, unlike common software RE datasets compiled from textual
specifications. Therefore, we developed a hardware requirements dataset by reverse-engineering
design artifacts from open-source ECS projects. This approach introduced several challenges
relevant to RQ1:</p>
        <p>Implicit vs. Explicit Requirements: Unlike explicitly defined requirements, low-level ECS
hardware requirements need to be inferred from design constraints and physical elements.</p>
        <p>Interdependencies: Hardware requirements can exhibit a higher degree of interconnectivity
and interdependency; a single modification in one requirement often cascades into multiple
downstream constraints.</p>
        <p>Quantitative Features: Unlike software requirements, which frequently describe system
functionalities, hardware requirements predominantly involve numerical constraints (e.g.,
voltage ratings, timing requirements, and temperature tolerances).</p>
        <p>
          Size: To ensure effective model training, the dataset should be sufficiently large, similar to
established datasets like PROMISE [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], which has been used in similar software requirement
classification tasks with comparable complexity.
        </p>
        <p>
          To overcome these challenges, we adopted an iterative annotation framework with DEs. We
systematically managed both the requirement writing and annotation processes using guidelines
developed according to the MATTER cycle [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. This annotation development cycle, recognized
within the RE community [
          <xref ref-type="bibr" rid="ref24">24,25</xref>
          ], employs an iterative and incremental approach to ensure accuracy
and reliability throughout the dataset creation process. The details of the process are presented in
the following subsections.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1. Domain Expert Selection</title>
        <p>The selection criteria for the contributing DEs were critical to the creation and annotation of the
ECS requirements corpus. As emphasized by [26], experts must demonstrate significant skills,
knowledge, and experience. To meet these criteria, three electrical and electronics engineers with 10 to
15 years of experience in the R&amp;D departments of ECS/embedded systems contributed voluntarily
to this study. Bayerl and Paul [27] recommend that annotators possess comparable domain
knowledge and receive appropriate training. Accordingly, comprehensive training sessions on RE
and NLP tasks were provided to the DEs.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. Classification Problem</title>
        <p>
          The DEs identified five basic functionality categories by analyzing the functional blocks of the ECS,
which form the basis of the annotation problem. Our labeling structure was designed to imitate the
general architecture of feedback control systems. While alternative taxonomy approaches, such as
categorization based on design expertise (e.g., analog, digital, or power circuit design), could have
been applied, we determined that a more pragmatic, exploratory classification was appropriate since
no clear guideline exists in the ECS domain to support it yet. Finally, we aimed to make the categories
as inclusive as possible while ensuring broad applicability across different ECS application domains.
The identified categories and their definitions are presented in Table 1.
        </p>
      </sec>
      <sec>
        <title>3.2.3. Open-Source Hardware ECS Project Selection</title>
        <p>
          Reference [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] emphasizes that to prevent bias in model training, the over-representation of features rarely
encountered in real-world scenarios must be avoided. To represent a product inventory emulating
an industrial setting that can be used for AI training, we selected source projects based on the
following criteria: (1) permissive licensing, (2) representation of real-world ECS applications, (3)
diversity in products and functional blocks to enhance dataset representativeness, and (4) availability
of structured design documentation.
        </p>
        <p>Based on these inclusion and exclusion criteria, eight open-source ECS projects were selected
from the project list maintained by the Open-Source Hardware Association or published in the
open-source hardware journal HardwareX. These projects were identified through targeted keyword
searches designed to cover a diverse range of physical quantities, including pressure, weight,
and velocity, and control instruments such as pumps and fans. Each selected project
underwent review and consensus by the DEs to ensure alignment with the four criteria above.
Table 2 presents the complete list of projects along with their IDs and functional domains. The
functional domain descriptions here aim to provide insight into system complexity, component
dependencies, and diversity across projects. For example, domains like robotics and power
electronics feature tightly integrated components, while food processing and agricultural automation
rely on more modular architectures. This also reflects the variety and number of components, as
different domains inherently require distinct sets of sensors, actuators, and control mechanisms,
ensuring a representative dataset of diverse functional blocks.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.2.4. Requirement Extraction and Annotation</title>
        <p>Our reverse-engineering approach for extracting and annotating requirements from hardware
projects has two key aspects: (1) Expert-Guided Requirement Identification, where DEs defined
extraction guidelines, manually extracted requirements from design documents and categorized
them; (2) Iterative Annotation and Refinement, where DEs refined guidelines, requirements, and
dataset through consensus-based adjudication. For example, DE review meetings identified
ambiguities, especially in multi-board projects and connector requirements. The guidelines were
then revised to focus on electronic circuit structures while excluding mechanical and environmental
constraints, as they had minimal impact on functional categorization.</p>
        <p>The process resulted in 366 requirements from eight projects over 191 DE hours spanning 184
days. After ensuring, through consensus, that the requirements extracted by the DEs for each project
comply with the guidelines, they were all randomly shuffled and reassigned to the DEs as a list for
annotation. The resulting annotation consistency for the complete set was assessed using
recommended [36,37] Inter-Annotator Agreement (IAA) metrics: Fleiss’ Kappa [37] (0.76) and
Cohen’s Kappa [38] (0.66–0.74), indicating strong agreement. Based on these scores, labeling
repetition was unnecessary. However, the adjudication process aimed to establish a gold standard
by achieving consensus among annotators, ensuring high data quality through error correction and
consistency checks.</p>
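        <p>The agreement statistics named above can be illustrated with a minimal sketch. It assumes three annotators labelling the same items with the five category codes; the annotator names and toy labels are hypothetical and are not taken from the study's dataset, and the functions are plain textbook implementations rather than the tooling used by the authors.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length.

    Undefined (division by zero) if both annotators use a single identical label.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the labels all annotators gave to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Share of agreeing annotator pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy annotations (F, D, I, P, C are the five category codes).
annotations = {
    "DE1": ["F", "D", "I", "P", "C", "I"],
    "DE2": ["F", "D", "I", "P", "I", "I"],
    "DE3": ["F", "D", "F", "P", "C", "I"],
}
pairwise = {
    (x, y): cohen_kappa(annotations[x], annotations[y])
    for x, y in combinations(annotations, 2)
}
per_item = [list(labels) for labels in zip(*annotations.values())]
overall = fleiss_kappa(per_item)
```
]]></preformat>
        <p>Pairwise Cohen's Kappa yields one value per annotator pair, which is consistent with a range being reported, while Fleiss' Kappa gives a single value for all annotators together.</p>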
        <p>During adjudication meetings, non-consensus requirements were reviewed through discussions
where DEs explained their classification rationale. If disagreements stemmed from requirement
quality issues, such as violations of agreed-upon guidelines or inconsistencies with the design
artifacts, necessary corrections were made to align them with the established criteria. Requirements
that failed to meet technical sufficiency standards were either revised or removed, ensuring
coherence and reliability in the final dataset. As a result, 28 out of 366 requirements were removed
due to insufficient technical details, while 24 were modified to resolve ambiguities. These
refinements addressed violations of predefined guidelines but did not introduce a significant shift in
distribution or category representation.</p>
        <p>The final gold standard revealed category imbalance, reflecting real-world distributions.
Project-level requirement distributions varied based on complexity, and differences among the three DEs’
annotations (DE Distribution) were influenced by their distinct writing styles. An example of a
feedback requirement can be seen in Table 3, while the finalized dataset's statistical distribution is
presented in Table 4.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Automated Classification</title>
      <sec id="sec-4-1">
        <title>4.1. Baseline Model: A Fine-tuned Approach with BERT</title>
        <p>
          We utilized NoRBERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] as a baseline model for classifying ECS requirements. NoRBERT has reported
F1-scores of up to 94% in multi-class software requirement classification. However, due
to differences in classification schemes and domain-specific terminology, the pre-trained NoRBERT
model could not be directly applied in this study. Instead, we adopted NoRBERT's methodology and
fine-tuned a BERT-based model for classifying ECS requirements.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Zero-shot ECS Requirement Classification with LLMs</title>
        <p>Given the limited availability of datasets, prompt-based classification with LLMs is a viable
alternative to dataset-heavy fine-tuning methods. Our zero-shot prompting approach consisted of
two main steps: Prompt Design and Model Selection/Setting.</p>
        <p>
          In Prompt Design, we adapted the strategy and methodology from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], who applied similar
techniques in the RE domain to develop prompts that leverage the pre-trained capabilities of LLMs
for multiclass requirement classification. Based on their recommendations, we considered four
prompt patterns, which were shown to be effective for the binary classification of software
requirements. Table 5 demonstrates how we adapted these patterns to the ECS low-level
requirements classification problem.
        </p>
        <p>
          We considered the model selection criteria based on potential needs that may arise in industrial
use, as ECS design teams typically prioritize solutions that align with their operational and technical
constraints. These include (1) non-commercial solutions to address data privacy and control
concerns, (2) scalable solutions that do not require significant local computation investments, and
(3) less restrictive licenses for broader adaptability. Based on these criteria, we selected a quantized
variant of Qwen2.5-72B-Instruct. Such quantized models can be hosted locally on a single machine using
popular open-source libraries, such as llama.cpp and GPT4All [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <table-wrap>
          <label>Table 5</label>
          <caption>
            <p>Designed zero-shot prompt patterns for ECS requirements classification.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>No.</th>
                <th>Prompt pattern</th>
                <th>Prompt</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>1</td>
                <td>Cognitive Verifier</td>
                <td>Classify the given hardware requirement into one of the five functional labels. These labels are feedback (labelled as F), driver (labelled as D), interface (labelled as I), power (labelled as P) and controller (labelled as C). The label definitions are as follows: (…) Ask me questions if needed to break the given task into smaller subtasks. All the outputs to the smaller subtasks must be combined before you generate the final output. The requirement: (…)</td>
              </tr>
              <tr>
                <td>2</td>
                <td>Context Manager</td>
                <td>Classify the given hardware requirement into one of the five functional labels. These labels are feedback (labelled as F), driver (labelled as D), interface (labelled as I), power (labelled as P) and controller (labelled as C). The label definitions are as follows: (…) When you provide an answer, please explain the reasoning and assumptions behind your response. If possible, address any potential ambiguities or limitations in your answer, to provide a more complete and accurate response. The requirement: (…)</td>
              </tr>
              <tr>
                <td>3</td>
                <td>Persona</td>
                <td>Act as a requirements engineering domain expert in embedded control systems and classify the given hardware requirement into one of the five functional labels. These labels are feedback (labelled as F), driver (labelled as D), interface (labelled as I), power (labelled as P) and controller (labelled as C). The label definitions are as follows: (…) The requirement: (…)</td>
              </tr>
              <tr>
                <td>4</td>
                <td>Question Refinement</td>
                <td>Classify the given hardware requirement into one of the five functional labels. These labels are feedback (labelled as F), driver (labelled as D), interface (labelled as I), power (labelled as P) and controller (labelled as C). The label definitions are as follows: (…) Ask me questions if needed to break the given task into smaller subtasks. All the outputs to the smaller subtasks must be combined before you generate the final output. If needed, suggest a better version of the question to use that incorporates information specific to this task and ask me if I would like to use your question instead. The requirement: (…)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
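        <p>As an illustration of how such a prompt might be assembled programmatically, the following sketch builds the Persona variant from its fixed wording, a definitions string, and a requirement. The function names are hypothetical, the label definitions are passed in as a parameter because they are elided in the prompt patterns, and the example requirement is invented for demonstration only.</p>
        <preformat><![CDATA[
```python
# Category codes and names as listed in the prompt patterns.
LABELS = {"F": "feedback", "D": "driver", "I": "interface", "P": "power", "C": "controller"}


def label_clause():
    """Render the label list in the same wording as the prompt patterns."""
    names = [f"{name} (labelled as {code})" for code, name in LABELS.items()]
    return ", ".join(names[:-1]) + " and " + names[-1]


def build_persona_prompt(requirement, definitions):
    """Assemble the Persona prompt; `definitions` must be supplied by the caller."""
    return (
        "Act as a requirements engineering domain expert in embedded control systems "
        "and classify the given hardware requirement into one of the five functional labels. "
        f"These labels are {label_clause()}. "
        f"The label definitions are as follows: {definitions} "
        f"The requirement: {requirement}"
    )


# Invented example requirement; the definitions placeholder is left elided on purpose.
example_prompt = build_persona_prompt(
    requirement="The supply rail shall provide 5 V with a tolerance of 5 percent.",
    definitions="(…)",
)
```
]]></preformat>
        <p>Keeping the definitions and the requirement as parameters means the same template can be reused unchanged for every requirement in a zero-shot run.</p>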
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experimental Setup</title>
        <p>Various methods exist for splitting datasets into training, validation, and test sets, such as random
or stratified sampling. However, rather than using cross-validation approaches, this study held out
the requirements from two entire projects as the test set to prevent data leakage and ensure
evaluation on a distinct dataset. The DEs selected these projects by consensus, choosing one that
closely resembled the others in terms of diversity and complexity and another that was least similar.
This strategy enabled a more comprehensive assessment of the model’s generalizability under strict
time and computational cost constraints. The test set was strictly excluded from all stages of the
study, including training, fine-tuning, and model selection experiments.</p>
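        <p>The project-level hold-out described above can be sketched as follows, assuming each requirement record carries the identifier of its source project. The field names, project identifiers, and requirement texts are hypothetical placeholders, not values from the study.</p>
        <preformat><![CDATA[
```python
def split_by_project(requirements, test_projects):
    """Hold out every requirement of the given projects as the test set."""
    test_projects = set(test_projects)
    train = [r for r in requirements if r["project"] not in test_projects]
    test = [r for r in requirements if r["project"] in test_projects]
    # Leakage guard: no project may contribute to both sides of the split.
    train_ids = {r["project"] for r in train}
    test_ids = {r["project"] for r in test}
    assert train_ids.isdisjoint(test_ids)
    return train, test


# Hypothetical records; "project", "text", and "label" are assumed field names.
records = [
    {"project": "PRJ-A", "text": "requirement 1", "label": "P"},
    {"project": "PRJ-A", "text": "requirement 2", "label": "D"},
    {"project": "PRJ-B", "text": "requirement 3", "label": "F"},
    {"project": "PRJ-C", "text": "requirement 4", "label": "C"},
    {"project": "PRJ-C", "text": "requirement 5", "label": "I"},
]
train_set, test_set = split_by_project(records, test_projects=["PRJ-C"])
```
]]></preformat>
        <p>Splitting on the project identifier rather than on individual requirements keeps requirements that share a design context on the same side of the split, which is the leakage concern the hold-out strategy is meant to address.</p>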
        <p>Our BERT-based baseline model was fine-tuned using the remaining six projects, with
hyperparameters optimized via Grid Search. For Qwen2.5-72B-Instruct, the llama.cpp library was
used on a local machine. Each requirement was processed individually to prevent cross-influence,
and results were logged separately. To ensure inference independence, the model was reloaded for
each requirement. For handling anomalies in prompting results, we treated the following cases as
misclassifications: when the model ignored requirement details, generated outputs outside the five
predefined categories, or returned null.</p>
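        <p>The anomaly handling described above can be sketched as a small output parser. The sketch assumes, purely for illustration, that the model reply states its decision in a phrase such as "Final label: P"; the actual reply format used in the study is not specified. Replies that are empty or name no valid category are mapped to None and therefore never match the gold label. Detecting replies that ignore requirement details would need manual review and is outside this sketch.</p>
        <preformat><![CDATA[
```python
import re

VALID_LABELS = {"F", "D", "I", "P", "C"}
# Assumed reply convention: the decision follows the phrase "final label:".
_LABEL_PATTERN = re.compile(r"final label\s*:\s*([A-Za-z]+)", re.IGNORECASE)


def parse_label(reply):
    """Return one of the five category codes, or None for an anomalous reply."""
    if not reply:
        return None  # null or empty output
    match = _LABEL_PATTERN.search(reply)
    if match is None:
        return None  # no recognisable decision in the reply
    label = match.group(1).upper()
    # Anything outside the five predefined categories counts as an anomaly.
    return label if label in VALID_LABELS else None


def is_correct(reply, gold):
    """Anomalous replies parse to None and so are scored as misclassifications."""
    return parse_label(reply) == gold
```
]]></preformat>
        <p>Mapping anomalies to None rather than dropping them keeps the denominator of every metric equal to the number of requirements evaluated.</p>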
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Results</title>
        <p>This section presents the results of the classification of ECS low-level hardware requirements,
addressing RQ2 through the evaluation of two models: fine-tuned BERT and zero-shot Qwen2.5,
applied with four distinct prompt patterns. Table 6 summarizes the performance of the four prompt
patterns, (1) Cognitive Verifier, (2) Context Manager, (3) Persona, and (4) Question
Refinement, evaluated using Qwen2.5 (N) and fine-tuned BERT (T-0) on a hold-out test set. The
results are presented for each category with precision (P), recall (R), and F1-score (F1) metrics, along
with average accuracy (A) and weighted F1-score (w.F) for each model.</p>
        <p>Table 6. The experiment results for the hold-out test set.</p>
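        <p>For readers who want to reproduce the reported metric types, the following is a minimal sketch of per-category precision, recall, and F1-score together with accuracy and support-weighted F1-score. It uses standard definitions and hypothetical toy labels; it is not the authors' evaluation code.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def per_class_scores(gold, pred, labels):
    """Return {label: (precision, recall, f1)} from parallel gold/predicted lists."""
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def weighted_f1(gold, pred, labels):
    """Average the per-class F1-scores, weighted by each class's gold support."""
    support = Counter(gold)
    scores = per_class_scores(gold, pred, labels)
    return sum(support[label] * scores[label][2] for label in labels) / len(gold)


def accuracy(gold, pred):
    """Share of requirements whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy labels; a None prediction stands for an anomalous model reply.
gold_labels = ["F", "F", "D", "P", "C", "I"]
pred_labels = ["F", "I", "D", "P", None, "I"]
categories = ["F", "D", "I", "P", "C"]
toy_scores = per_class_scores(gold_labels, pred_labels, categories)
toy_weighted = weighted_f1(gold_labels, pred_labels, categories)
```
]]></preformat>
        <p>Weighting by support means that frequent categories dominate the aggregate score, which matters for an imbalanced dataset such as the one described here.</p>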
        <p>The results indicate that the Qwen2.5 model with the Persona prompt pattern (N-3) generally
achieves the most balanced performance among the evaluated configurations. Analysis of the
confusion matrices reveals that N-3 reduces certain types of misclassifications compared to the
fine-tuned BERT model (T-0). For instance, N-3 does not misclassify Power as Feedback, whereas T-0
records two such errors. Additionally, N-3 demonstrates improved accuracy in distinguishing
between Driver and Power. For the test set, Power and Driver categories exhibit perfect precision,
while Controller has the highest variance across most configurations. Since zero-shot settings
involve no training and therefore need no train/test split, we also evaluated the Qwen2.5 model using the full dataset,
comprising all 338 requirements, under the same configuration. The results, as summarized in Table
7, demonstrate similar performance compared to the hold-out set, with category-level consistency
largely maintained. However, confusion between Interface/Feedback and Controller/Interface persists
across both models for individual requirements, indicating a common challenge in these categories:
errors tend to occur when requirements involve communication or control specifications between
sub-circuits or feedback sensors.</p>
        <p>Table 7. The experiment results for the complete dataset.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Threats to Validity</title>
      <p>Our dataset is based on open-source ECS projects, which may not fully represent industrial
requirements. Since contributions were voluntary, only three non-native (but proficient) DEs
participated, potentially limiting annotation consistency and introducing subjective variations.
Similarly, to balance computational efficiency and generalization assessment, we manually selected
the test set instead of applying cross-validation, which would have required retraining the model
multiple times at a high computational cost. To minimize potential bias, the selection was made
by DE consensus. While the DEs were highly skilled in technical domains, they lacked prior
expertise in computational linguistics or annotation processes. Additionally, differences between
LLM and library versions may significantly impact the obtained results. Contextual ambiguity also
remains a challenge, as hardware requirements were classified without explicit system-wide
references. To mitigate these issues, training sessions were held for the DEs, annotation guidelines
were refined to minimize ambiguity, the adjudication process was applied at each iteration, and
misclassifications were thoroughly analyzed to assess model errors.</p>
      <p>While the dataset and code remain unpublished due to ongoing PhD research, we provide a
detailed methodology and analytical framework to ensure transparency and support reproducibility
within the constraints of our current research phase.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This study explored how software requirement classification techniques can be adapted for low-level
ECS hardware requirements, addressing dataset construction challenges (RQ1) and classification
performance (RQ2).</p>
      <p>
        Low-level hardware requirements were inherently implicit in our case, derived from design
artifacts rather than explicitly stated. We experienced firsthand the labor-intensive process, a
difficulty widely acknowledged in the literature [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, the annotation framework outlined in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], together with dataset documentation frameworks [
        <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19-22</xref>
        ], provided a structured approach to managing this process. Despite adhering to established
guidelines, the domain’s complexity led DEs to perceive strong interdependencies and a critical need
for contextual understanding when evaluating individual requirements. Furthermore, constructing
a dataset comparable in size to established software datasets like PROMISE [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] demanded
considerable effort, and in our case, we could only achieve approximately half its size.
      </p>
      <p>
        Our findings on the classification performance indicate that the fine-tuning approach performed
well on structured hardware requirements, but zero-shot classification with prompt engineering
outperformed the fine-tuned approach in certain cases, reducing reliance on dataset curation.
In line with the most closely related work in the software domain [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8-11</xref>
        ], fine-tuning remains effective
but demands high-quality datasets, technical expertise, and computational resources. In contrast
to [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], we observed that the zero-shot approach yielded better results for our case,
considering the domain and dataset size. That said, interpreting the impact of specific performance
metrics, e.g., precision and recall, depends on the downstream task to which classification is applied.
      </p>
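      <p>To make the dependence on the downstream task concrete, the following minimal Python sketch
computes per-class precision and recall from gold and predicted labels. The function name
per_class_precision_recall and the toy labels are illustrative assumptions and are not taken from
the dataset or code of this study; only the category names appear in this paper.</p>
      <preformatted>
```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical toy labels, not results from the study.
y_true = ["Driver", "Driver", "Feedback", "Controller", "Interface", "Power"]
y_pred = ["Driver", "Controller", "Interface", "Controller", "Interface", "Power"]
scores = per_class_precision_recall(y_true, y_pred)
```
      </preformatted>
      <p>In this toy example, Driver reaches a precision of 1.0 but a recall of only 0.5. Such a profile
may suit a downstream task that must avoid false alarms, but not one that must retrieve every
Driver requirement.</p>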
      <p>Hardware requirements introduce strong interdependencies that make classification sensitive to
contextual ambiguity. Some feedback-related requirements were classified as Interface, likely because
they contain internal communication specifications, while some Driver specifications (e.g., DACs) were
misclassified as Controller due to references to signal processing. Driver and Power requirements tend
to be dominated by quantitative constraints and depend less on context, which leads to higher precision.
In contrast, Feedback and especially Controller requirements involve significant interdependencies:
internal data flow and control-related inter-circuit interfaces require either project context or
additional requirements for accurate classification, making isolated evaluation more challenging.</p>
      <p>Misclassified cases, particularly within the Interface category, often stemmed from pre-consensus
disagreements among the DEs. A review of adjudication meeting notes revealed that ambiguous
cases frequently involved multi-board projects, where internal and external interfaces were not
always distinguishable without context. This suggests that for low-level hardware requirements,
context is crucial for classification accuracy. Providing the full set of requirements or key system
attributes as context could significantly improve performance. Hybrid methods and prompt
engineering could also mitigate these challenges.</p>
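      <p>As an illustration of supplying context, the sketch below assembles a zero-shot prompt that
carries key system attributes and the other requirements of the same project alongside the target
requirement. It is a minimal sketch under stated assumptions: the function build_context_prompt, the
prompt wording, the attribute keys, and the example requirements are all hypothetical, the category
list contains only the names mentioned in this section and may not be the complete label set, and
this is not the prompt used in this study.</p>
      <preformatted>
```python
# Category names mentioned in this section; the full label set may differ.
CATEGORIES = ["Interface", "Driver", "Controller", "Feedback", "Power"]


def build_context_prompt(requirement, project_attributes, sibling_requirements):
    """Assemble a zero-shot classification prompt that carries project context."""
    lines = [
        "You are an embedded control system hardware engineer.",
        "Classify the target requirement into exactly one of: "
        + ", ".join(CATEGORIES) + ".",
        "",
        "Project attributes:",
    ]
    lines += [f"- {key}: {value}" for key, value in project_attributes.items()]
    lines += ["", "Other requirements from the same project:"]
    # Leave the target out of the context list so it is stated only once.
    lines += [f"- {text}" for text in sibling_requirements if text != requirement]
    lines += ["", "Target requirement:", requirement, ""]
    lines += ["Answer with the category name only."]
    return "\n".join(lines)


# Hypothetical project data for illustration only.
prompt = build_context_prompt(
    requirement="The board shall exchange status frames over the backplane bus.",
    project_attributes={"boards": "multi-board", "external bus": "CAN"},
    sibling_requirements=[
        "The DAC output shall settle within the specified time.",
        "The board shall exchange status frames over the backplane bus.",
    ],
)
```
      </preformatted>
      <p>Stating whether a project is multi-board, as in the hypothetical attributes above, targets the
internal-versus-external interface ambiguity noted in the adjudication meeting notes.</p>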
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>This study assessed the feasibility of applying software requirement classification techniques to ECS
hardware requirements. Dataset construction (RQ1) highlighted challenges in extracting
requirements from design artifacts and developing a structured classification protocol, demanding
domain expertise and iterative refinement. Model evaluation (RQ2) found that well-crafted zero-shot
LLM prompts, particularly the Persona-based prompt, could outperform fine-tuned BERT models,
eliminating the need for extensive dataset curation and fine-tuning. The findings highlight inherent
ambiguity due to the interdependencies in low-level hardware requirements, particularly when
analyzed individually versus holistically.</p>
      <p>While classification currently serves as an auxiliary activity in our case, it can enable systematic
efforts in ECS design for downstream tasks such as analysis, verification, validation, and testing. Future
work will explore dataset expansion and its open-access release, as well as confidence-aware systems
that integrate human-in-the-loop approaches to enhance classification reliability. Additionally,
understanding prompt efficiency can help reduce classification errors and improve requirement
quality. Beyond ECS, these advancements could enable cross-domain adaptability, allowing
requirement classification techniques to be transferred to other complex, hardware-intensive domains
where structured but implicit requirements play a crucial role.</p>
      <p>Finally, an important direction for future work is exploring embedded system co-design, where
subsystem-level requirements are systematically decomposed into hardware and software
components. This process could further enable automated test and validation procedures, ensuring
seamless integration between software and hardware functionalities.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar and
spelling checks, paraphrasing, and rewording. After using these services, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
      <p>[25] Noh, Y., Kim, K., Lee, M., Heo, C., Jeong, Y., Jeong, Y., ... &amp; Choi, K. S. (2020, October). Enhancing
quality of corpus annotation: Construction of the multi-layer corpus annotation and simplified
validation of the corpus annotation. In Proceedings of the 34th Pacific Asia Conference on
Language, Information and Computation (pp. 216-224).</p>
      <p>[26] Hopkins, P., &amp; Unger, M. (2017). What is a ‘subject-matter expert’? Journal of Pipeline
Engineering, 16(4).</p>
      <p>[27] Bayerl, P. S., &amp; Paul, K. I. (2011). What determines inter-coder agreement in manual annotations?
A meta-analytic investigation. Computational Linguistics, 37(4), 699-725.</p>
      <p>[28] Anthilla/AnthC. (2024). [HTML]. Anthilla. https://github.com/Anthilla/AnthC</p>
      <p>[29] Fang, L., Zhang, J., Zong, H., Wang, X., Zhang, K., Shen, J., &amp; Lu, Z. (2023). Open-source lower
controller for twelve degrees of freedom hydraulic quadruped robot with distributed control
scheme. HardwareX, 13, e00393.</p>
      <p>[30] Klar, V., Pearce, J. M., Kärki, P., &amp; Kuosmanen, P. (2019). Ystruder: Open source multifunction
extruder with sensing and monitoring capabilities. HardwareX, 6, e00080.</p>
      <p>[31] Lau, S. K., Ribeiro, F. A., Subbiah, J., &amp; Calkins, C. R. (2019). Agenator: An open source
computer-controlled dry aging system for beef. HardwareX, 6, e00086.</p>
      <p>[32] Bujnowicz, Ł., &amp; Sarewicz, M. (2022). Multichannel pulse high-current driver of magnetic
actuator. HardwareX, 11, e00286.</p>
      <p>[33] Abbas, N. S., Salim, M. S., &amp; Sabri, N. (2024). ASCD: Automatic sensing and control device for
crop irrigation scheduling. HardwareX, 18, e00523.</p>
      <p>[34] Poulsen, E., Eggertsen, M., Jepsen, E. H., Melvad, C., &amp; Rysgaard, S. (2022). Lightweight
drone-deployed autonomous ocean profiler for repeated measurements in hazardous areas – Example
from glacier fronts in NE Greenland. HardwareX, 11, e00313.</p>
      <p>[35] Lezcano, H., Rodas, J., Pacher, J., Ayala, M., &amp; Romero, C. (2023). Design and validation of a
modular control platform for a voltage source inverter. HardwareX, 13, e00390.</p>
      <p>[36] Klie, J. C., Castilho, R. E. D., &amp; Gurevych, I. (2024). Analyzing dataset annotation quality
management in the wild. Computational Linguistics, 50(3), 817-866.</p>
      <p>[37] Kim, M., Qiu, X., &amp; Wang, Y. (Arthur). (2024). Interrater agreement in genre analysis: A
methodological review and a comparison of three measures. Research Methods in Applied
Linguistics, 3(1), 100097. https://doi.org/10.1016/j.rmal.2024.100097</p>
      <p>[38] Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological
Bulletin, 76(5), 378.</p>
      <p>[39] Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20(1), 37-46.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Fariha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alwidian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Azim</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Towards Requirements Specification Collaboration Forum for Embedded Software Systems</article-title>
          .
          <source>2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C)</source>
          ,
          <fpage>312</fpage>
          -
          <lpage>317</lpage>
          . https://doi.org/10.1109/MODELS-C59198.2023.00061
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Aceituna</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Survey of Concerns in Embedded Systems Requirements Engineering</article-title>
          .
          <source>SAE International Journal of Passenger Cars - Electronic and Electrical Systems</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . https://doi.org/10.4271/2013-01-2403
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Sousa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agra</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Alencar</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Use of Ontologies in Embedded Systems: A Systematic Mapping</article-title>
          .
          <source>2016 10th International Conference on the Quality of Information and Communications Technology (QUATIC)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Aalund</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Philip Paglioni</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Enhancing Reliability in Embedded Systems Hardware: A Literature Survey</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>13</volume>
          ,
          <fpage>17285</fpage>
          -
          <lpage>17302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ruan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Requirements Modeling Aided by ChatGPT: An Experience in Embedded Systems</article-title>
          .
          <source>2023 IEEE 31st International Requirements Engineering Conference Workshops (REW)</source>
          ,
          <fpage>170</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Chow</surname>
            ,
            <given-names>M. Y.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Analysis of Embedded System's Functional Requirement using BERT-based Name Entity Recognition for Extracting IO Entities</article-title>
          .
          <source>Journal of Information Processing</source>
          ,
          <volume>31</volume>
          (
          <issue>0</issue>
          ),
          <fpage>143</fpage>
          -
          <lpage>153</lpage>
          . https://doi.org/10.2197/ipsjjip.31.143
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Lopez-Hernandez</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ocharan-Hernandez</surname>
            ,
            <given-names>J. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezura-Montes</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sanchez-Garcia</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Automatic Classification of Software Requirements using Artificial Neural Networks: A Systematic Literature Review</article-title>
          .
          <source>2021 9th International Conference in Software Engineering Research and Innovation (CONISOFT)</source>
          ,
          <fpage>152</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hey</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koziolek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tichy</surname>
            ,
            <given-names>W. F.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>NoRBERT: Transfer Learning for Requirements Classification</article-title>
          .
          <source>2020 IEEE 28th International Requirements Engineering Conference (RE)</source>
          ,
          <fpage>169</fpage>
          -
          <lpage>179</lpage>
          . https://doi.org/10.1109/RE48521.2020.00028
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kici</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cevik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Başar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>A BERT-based transfer learning approach to text classification on software requirements specifications</article-title>
          .
          <source>Proceedings of the Canadian Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ronanki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabrero-Daniel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horkoff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Berger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Requirements engineering using generative AI: Prompts and prompting patterns</article-title>
          .
          <source>In Generative AI for Effective Software Development</source>
          (pp.
          <fpage>109</fpage>
          -
          <lpage>127</lpage>
          ). Cham: Springer Nature Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Arvidsson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Axell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Prompt engineering guidelines for LLMs in Requirements Engineering</article-title>
          . https://gupea.ub.gu.se/handle/2077/77967
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arunachalam</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>P. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Psiakis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makris</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Unlocking Hardware Security Assurance: The Potential of LLMs</article-title>
          (No. arXiv:2308.11042). arXiv. https://doi.org/10.48550/arXiv.2308.11042
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ene</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirby</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinckney</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alben</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anand</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bayraktaroglu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhaskaran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catanzaro</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dally</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deshpande</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhodhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halepete</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>ChipNeMo: Domain-Adapted LLMs for Chip Design</article-title>
          (No. arXiv:2311.00176). arXiv. https://doi.org/10.48550/arXiv.2311.00176
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Tikayat Ray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cole</surname>
            ,
            <given-names>B. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinon Fischer</surname>
            ,
            <given-names>O. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>White</surname>
            ,
            <given-names>R. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mavris</surname>
            ,
            <given-names>D. N.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>aeroBERTClassifier: Classification of Aerospace Requirements Using BERT</article-title>
          .
          <source>Aerospace</source>
          ,
          <volume>10</volume>
          (
          <issue>3</issue>
          ),
          <fpage>279</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Uygun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Momodu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Local large language models to simplify requirement engineering documents in the automotive industry</article-title>
          .
          <source>Production &amp; Manufacturing Research</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2375296</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Sambasivan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapania</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Highfill</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akrong</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          (
          <year>2021</year>
          , May).
          <article-title>“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI</article-title>
          .
          <source>In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Murthy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkateswaran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Contractor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks</article-title>
          .
          <source>arXiv preprint arXiv:2410.12972</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stubbs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <source>Natural Language Annotation for Machine Learning: A guide to corpus-building for applications</source>
          .
          <publisher-name>O'Reilly Media, Inc.</publisher-name>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Gebru</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgenstern</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vecchione</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaughan</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daumé III</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Datasheets for datasets</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>64</volume>
          (
          <issue>12</issue>
          ),
          <fpage>86</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Holland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hosny</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chmielinski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>The dataset nutrition label</article-title>
          .
          <source>Data Protection and Privacy</source>
          ,
          <volume>12</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Bender</surname>
            ,
            <given-names>E. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Data statements for natural language processing: Toward mitigating system bias and enabling better science</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>6</volume>
          ,
          <fpage>587</fpage>
          -
          <lpage>604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Hutchinson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smart</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanna</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denton</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kjartansson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2021</year>
          , March).
          <article-title>Towards accountability for machine learning datasets: Practices from software engineering and infrastructure</article-title>
          .
          In
          <source>Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          (pp.
          <fpage>560</fpage>
          -
          <lpage>575</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Cleland-Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazrouee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liguo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Port</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>nfr [Data set]</article-title>
          .
          <source>Zenodo</source>
          . Available: http://doi.org/10.5281/zenodo.268542
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Fischbach</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frattini</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spaans</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kummeth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogelsang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Unterkalmsteiner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Automatic detection of causality in requirement artifacts: the CiRA approach</article-title>
          . In
          <source>Requirements Engineering: Foundation for Software Quality: 27th International Working Conference, REFSQ 2021, Essen, Germany, April 12-15, 2021, Proceedings 27</source>
          (pp.
          <fpage>19</fpage>
          -
          <lpage>36</lpage>
          ). Springer International Publishing.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>