<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>What to expect from a Replication Package Repository?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Blum</string-name>
          <email>blumma@uni-trier.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Schenkel</string-name>
          <email>schenkel@uni-trier.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Reproducibility, Replication Package, PapersWithCode, GitHub, Text Processing</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Trier</institution>
          ,
          <addr-line>Universitätsring 15, 54296 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The release of replication packages with source code has become a valuable step in ensuring the reproducibility of the results reported in scholarly articles. Many recent works in the field of reproducibility focus on the deviation of reproduced outcomes from their corresponding published results and often simply ignore replication packages that could not be executed and evaluated at all. This study evaluates replication packages hosted on GitHub for more than 200 000 scientific publications and attempts to decide algorithmically whether the files provided in the repositories potentially allow reconstituting the original execution environments and are therefore more likely to have their published findings reproduced. We find that less than half of the analyzed repositories contain the dependency information files needed to successfully recreate runtime environments. Additionally, we discover that these files are frequently of poor quality and only install the newest versions of dependencies instead of those used for the original research; this happens especially often with packages for numerical computation when compared to regular utility packages.</p>
      </abstract>
      <kwd-group>
        <kwd>Reproducibility</kwd>
        <kwd>Replication Package</kwd>
        <kwd>PapersWithCode</kwd>
        <kwd>GitHub</kwd>
        <kwd>Text Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The feasibility of reusing existing scientific findings to facilitate more advanced research depends on
several factors. An important aspect is the need to trust the results of the original work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This trust is
strengthened if an independent researcher is able to apply the same methods to the same data and thereby
“reproduce” the original authors’ results and conclusions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In contrast, if the attempt to reproduce
leads to deviating data and divergent inferences, the examined scientific research would be considered
less trustworthy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Attaching replication packages with source code to scholarly works simplifies the process of verifying
the computational reproducibility of the respective research. A different researcher using the same code
and data as in the original work should obtain the same results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Validating reproducibility could be assumed to be a simple task: if both data and source code exist in
digital form, it should be easy to just run the code and obtain the same results as before. However, 50%
of researchers have reported failing to reproduce their own results, and even 70% were unable to
perform this task on someone else’s work, a situation labeled the “reproducibility crisis” in science [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In
the context of Machine Learning and Natural Language Processing, the numbers are only
slightly better, with 56% of researchers not being able to execute someone else’s code at all or obtaining
significantly different results compared to the original work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Although the ability to obtain the exact same results can depend on various factors, such as access
to code and data, algorithm documentation, dataset versions, parameter settings, initialization values,
random seeds, hardware details, and runtime environment [11], we focus only on the last of these in this work:
Do replication packages supplied with scientific publications contain enough information to
reproduce their respective original runtime environments?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Modern programming languages (e. g., Python) already include standard runtime libraries that offer
built-in support for many common tasks. For additional requirements, algorithms have to be either
implemented from scratch by programmers or imported from third-party libraries which provide
ready-to-use versions of the required functionality.</p>
      <p>Since these libraries are maintained independently, code updates can completely break existing source
code relying on them (e. g., interface changes) or result in divergent runtime behavior (e. g.,
arithmetic rounding deviations if internal calculations are performed in a different order).</p>
      <p>For example, the “numpy” library – which we later observe as the most referenced third-party
Python library in our study – maintains a list of supported Python releases in combination with
“numpy” versions [12]. However, even this precaution was not enough when the transition to pairwise
summation changed the internal computation of the dot product (https://github.com/numpy/numpy/pull/3685,
accessed 2025-07-18): for vectors u = (u₁, u₂) and v = (v₁, v₂), the identity u ⋅ v = u₁v₁ + u₂v₂ no longer
held exactly (https://stackoverflow.com/questions/32952941/numpy-floating-point-rounding-errors,
accessed 2025-07-18).</p>
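      <p>To see why the order of summation matters, consider the following minimal Python sketch (our own
illustration, independent of numpy): floating-point addition is not associative, so a left-to-right and a
pairwise grouping of the same addends can produce different results.</p>
      <preformat>
# Floating-point addition is not associative; the grouping of the
# partial sums decides which low-order bits survive.
terms = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((terms[0] + terms[1]) + terms[2]) + terms[3]
pairwise = (terms[0] + terms[1]) + (terms[2] + terms[3])

print(left_to_right)  # 1.0: the first 1.0 is absorbed by 1e16, the second survives
print(pairwise)       # 0.0: both 1.0 terms are absorbed by their large partners
</preformat>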
      <p>Therefore, reproducibility often depends on the availability of exact versions of third-party libraries.
A replication package for source code using “numpy” might:
• just use “import numpy” in its source code,
• ask to install “numpy”,
• ask to install “numpy” version “x.y”, or
• ask to install “numpy” version “x.y” in combination with Python release “a.b.c”.
Only the last option would be a good candidate to reproduce outputs comparable to the original
results (see the sketch after this paragraph), while the other three possibilities might compute a
deviating output or not run at all. As a consequence, this could cast doubt on the validity of the
research output.</p>
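      <p>As a sketch of what the last option could look like, an Anaconda environment file can pin both the
interpreter release and the library version (the file name and version numbers below are illustrative
placeholders, not recommendations):</p>
      <preformat>
# environment.yaml - a hypothetical, fully pinned replication environment
name: replication-env
dependencies:
  - python=3.10.12   # pins the Python release itself
  - numpy=1.24.3     # pins the library version used for the original results
</preformat>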
      <p>For this work, we focus only on source code and replication packages using Python as programming
language.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section describes the pipeline we use to select and retrieve the data on which our study is based
and the methods we apply to it.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Selection</title>
        <p>We use the data provided by Papers With Code (https://paperswithcode.com/, accessed 2025-07-18;
offline at the time of the camera-ready version, it redirects to https://huggingface.co/) as the initial seed for our research. The website maintains
a large corpus of scientific publications concerning Machine Learning in various fields of science. It
also provides information about methods introduced, datasets created, evaluation results achieved, and
source code released for many of the individual articles. This data is available in daily updated JSON
dumps [13]. We use the version downloaded on November 12th, 2024 for our research.</p>
        <sec id="sec-3-1-1">
          <title>1https://github.com/numpy/numpy/pull/3685, accessed 2025-07-18</title>
          <p>2https://stackoverflow.com/questions/32952941/numpy-floating-point-rounding-errors, accessed 2025-07-18
3https://paperswithcode.com/, accessed 2025-07-18; ofline at the time of the camera-ready version (redirects to https://
huggingface.co/)
3.1.1. Papers With Code
The file ” links-between-papers-and-code.json.gz” contains mappings of scientific publications to one or
more source code repositories. In our downloaded version, approximately 265 000 mappings pointing
to 210 000 unique repositories are included. These references exist if
• the publication explicitly links to the source code in its full text,
• the repository states in its README file that it contains the replication package for the publication,
or
• the publication’s authors create an account on the website and manually add the mapping.</p>
          <p>The publication’s metadata only includes its title and two URLs pointing to its landing page and PDF
download location, but no additional information such as year of publication, field of science, or a
digital object identifier (DOI).</p>
          <p>Since this would prevent researching the changes in replication packages over time and also complicate
deciding whether one specific publication known only by its DOI, title, and authors matches one of those
analyzed in this work, we attempt to determine each publication’s DOI and to match it to an OpenAlex
(https://openalex.org/) metadata profile.</p>
          <p>First, we look at the URLs of a publication’s landing page and PDF location and generate a list of
possible alternative URLs which still lead to the same resource, e. g., if the URL format changed over
time. We apply the following heuristics (a simplified sketch follows below):
• scanning for embedded DOIs or alternative resolvers (e. g., “https://dl.acm.org/doi/”) and then
using “https://doi.org/” and the DOI to create the URL,
• removing file extensions (“.pdf”) or path suffixes (e. g., “/download”, “/full”, or “/xml”),
• replacing known domain names by alternative ones (e. g., “https://aclanthology.org/” instead of
“https://www.aclweb.org/anthology/”),
• removing embedded version numbers (e. g., “/pdf/1601.02063v2.pdf” → “/pdf/1601.02063.pdf”), and
• removing or adding leading zeros for files hosted on arXiv (https://arxiv.org/) (e. g., “/1601.02063” and “/1601.2063”).
Then, we collect all these permutations in a list of candidate URLs and try to match them with a
known DOI. For this, we build a reverse lookup table with the public data available from the Crossref
[14] and Datacite [15] DOI registration agencies. If a DOI resolving to one of the URLs is found, we add
it to the candidate list.</p>
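          <p>The following Python sketch outlines some of these permutation heuristics (the function name and
patterns are ours, and the actual pipeline covers more cases):</p>
          <preformat>
import re

def candidate_urls(url):
    """Generate alternative URLs that may still resolve to the same publication."""
    candidates = {url}
    # Strip file extensions and known path suffixes.
    candidates.add(re.sub(r"\.pdf$", "", url))
    for suffix in ("/download", "/full", "/xml"):
        candidates.add(re.sub(re.escape(suffix) + r"$", "", url))
    # Replace known domain names by their alternatives.
    candidates.add(url.replace("https://www.aclweb.org/anthology/",
                               "https://aclanthology.org/"))
    # Remove embedded version numbers, e.g. /pdf/1601.02063v2.pdf
    candidates.add(re.sub(r"(\d{4}\.\d{4,5})v\d+", r"\1", url))
    # Rebuild a canonical resolver URL from an embedded DOI.
    match = re.search(r"/doi/(10\.\d{4,9}/[^\s?#]+)", url)
    if match:
        candidates.add("https://doi.org/" + match.group(1))
    return candidates
</preformat>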
          <p>Finally, we search the OpenAlex [16] metadata dump for any URLs and DOIs contained in our
candidate list.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Crossref and Datacite</title>
          <p>Crossref and Datacite are large DOI registration agencies. They assign DOIs for publishers, who can
update them to resolve to the most recent URL for a given publication. A DOI may link to various
alternative URLs that provide other data formats based on an additionally requested application type (e. g.,
“text/xml” for text-mining).</p>
          <p>Although the Web-APIs of Crossref and Datacite only allow resolving DOIs to URLs and enforce
query limits, they also provide downloadable dumps of their data [14, 15], which we use to create an
inverse index to match our candidate URLs to known DOIs. Collectively, both dumps provide metadata
for approximately 240 million DOIs.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4https://openalex.org/ 5https://arxiv.org/</title>
          <p>3.1.3. OpenAlex
OpenAlex is an open and free bibliographic catalog of scientific publications and their related metadata.
Besides its own ”Alex-IDs”, it also includes several other unique identifiers, such as DOIs. In addition, it
contains extensive metadata, such as the date of publication, fields of science with rated scores, open
access status, all known landing pages, and download locations for each indexed entry.</p>
          <p>We use the OpenAlex data dump (https://docs.openalex.org/download-all-data/download-to-your-machine)
published on May 29th, 2025 for our research, which contains metadata for 267 million publications.</p>
          <p>We then try to match all available URLs to the publications from our Papers With Code candidate list
and filter out all entries for which OpenAlex lists the concept “Computer Science” with a score ≤ 0.5
or not at all. This way, we attempt to focus only on the computer science domain and ignore source
code repositories for Machine Learning publications in fields such as “Psychology”, “Chemistry”, or
“Economics”, which might have divergent expectations on reproducibility and replication packages.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>3.1.4. unarXive</title>
          <p>The unarXive [17] dataset contains metadata and full text for all articles uploaded to the arXiv website
up to December 2022. Since many permutations of arXiv URLs are already included in the
candidate list, matching is uncomplicated and gives us additional information about the field of science
by returning the categories the authors have selected for their uploaded publications.</p>
        </sec>
        <sec id="sec-3-1-5">
          <title>3.1.5. Metadata Multiplicities</title>
          <p>Multiple entries in the OpenAlex and unarXive datasets might be matched to the same publication listed
in Papers With Code. This happens if several versions of a scientific paper exist (e. g., a few pre-prints
and a journal publication). Since we mainly use the metadata to decide whether to include a publication in
our research, this problem can be neglected.</p>
          <p>When processing the year of publication in our work, we use the most recent of the multiple available
dates, since we assume the contents of the source code repository to be up-to-date with the latest version
of a publication.</p>
        </sec>
        <sec id="sec-3-1-6">
          <title>3.1.6. Final Research Data Set Selection</title>
          <p>Based on our previous steps, we now have a curated list of computer science publications from the
Papers With Code file “links-between-papers-and-code.json.gz”, including the matching metadata available
from the OpenAlex corpus.</p>
          <p>Since almost all (99%) linked source code repositories are located on GitHub (https://github.com/), we only focus on
retrieving the repository metadata from there and ignore other hosting solutions.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Retrieval</title>
        <p>Since the valid format of a GitHub repository URL is known to be
“https://github.com/&lt;username&gt;/&lt;repository&gt;”, we lowercase the URLs and remove any additional data (e. g., pagination
arguments, or direct links to a branch or file).</p>
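        <p>A minimal normalization sketch (a hypothetical helper of ours, assuming the canonical two-segment
repository path):</p>
        <preformat>
from urllib.parse import urlsplit

def normalize_github_url(url):
    """Reduce a GitHub link to its canonical repository form."""
    parts = urlsplit(url.lower())
    segments = [s for s in parts.path.split("/") if s]
    if parts.netloc != "github.com" or len(segments) &lt; 2:
        return None  # not a repository link
    # Keep only the first two path segments; this drops branch or file
    # references such as /tree/main or /blob/main/README.md.
    user, repo = segments[0], segments[1].removesuffix(".git")
    return "https://github.com/" + user + "/" + repo
</preformat>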
        <p>For all of the approximately 210 000 distinct GitHub repositories linked in our Papers With Code
data, we downloaded the repository metadata between December 24th and 29th, 2024 using the GitHub
Web-API (https://docs.github.com/en/rest?apiVersion=2022-11-28).</p>
        <p>We also retrieved the content metadata (e. g., directory structure, filenames, and sizes) within two
months after that. Since these downloads were based on the tree-hash stored in the repository metadata,
all structural data downloaded represents the state it had between December 24th and 29th, regardless
of whether the repository was updated afterward. The same applies to individual files downloaded later
during our research.</p>
        <p>After removing inaccessible repositories and applying the data selection filters described above,
approximately 151 000 GitHub repositories are left for our research.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Reproducibility Heuristics</title>
        <p>The assessment of how likely it is to reproduce a different researcher’s outcomes depends not only on
the question whether source code is provided but also on whether the environment in which the code was
originally executed can be replicated as closely as possible. For Python source code, we assume that
this depends mainly on the Python version used and the dependencies installed in its environment.</p>
        <p>Other external factors, such as the availability of a specific C++ compiler in the operating system or
the presence of a given GPU architecture, are ignored in our work.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Environment File Selection</title>
          <p>We download individual files from the chosen repositories that are typically connected to the creation
of a runtime environment with specific dependency versions. This includes file types such as:
• “requirements.txt”: This file is conventionally associated with the Python Package Index (https://pypi.org/) and the
“PIP” module used to install dependencies in an existing Python environment. The text file lists
the package names and may include additional restrictions on the versions to be installed (e. g.,
=, &lt;, &gt;, ≥, ≤, ≠). If no version is specified, the most recent compatible package would be installed.
It is not possible to change the Python version itself using PIP.
• “conda.txt”, “environment.yaml”: These files are used by Anaconda (https://anaconda.org/anaconda/python), an alternative Python
distribution that uses its own package index and is additionally able to install different Python
releases. Package version restrictions similar to PIP can be applied.
• “setup.py”, “pyproject.toml”, “meta.yaml”: These files are used when building a Python package
and specify which version ranges they expect for other packages. While not directly related to
setting up a Python environment, they influence which other package versions can be installed
at the same time.
• “Dockerfile”, “docker-compose.yml”: These files are used by Docker (https://www.docker.com/) containers and specify
how an operating system environment should be set up. Their existence could mean that the
publication’s authors are aware of possible problems with setting up the execution environment
for their source code and therefore provide a preconfigured container, presumably the same one
used for the original research.</p>
          <p>Although researchers often follow these file naming conventions, they might also violate them
(e. g., “conda_env.yml”, “additional_requirements.txt”, “python-package-conda.yml”, “test.dockerfile.18.04”).
When scanning the GitHub content metadata for candidate files, we therefore also include similar file names
to be downloaded and determine the specific file type during the parsing process.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. File Encoding</title>
          <p>When retrieving files via the GitHub API, it returns the file contents in Base64 format. We detect the
proper character encoding (e. g., UTF-8 or UTF-16) and remove any existing Byte Order Marks (BOMs). This
step is necessary because the parsing libraries used in our research do not support such markers inside
character sequences.</p>
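          <p>A sketch of this decoding step (our own illustration, using only the Python standard library):</p>
          <preformat>
import base64
import codecs

# Known Byte Order Marks and the encodings they signal.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_github_content(b64_content):
    """Decode a Base64 file body from the GitHub API and strip any BOM."""
    raw = base64.b64decode(b64_content)
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode("utf-8", errors="replace")  # assume UTF-8 otherwise
</preformat>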
        <sec id="sec-3-3-1">
          <title>9https://pypi.org/ 10https://anaconda.org/anaconda/python 11https://www.docker.com/</title>
          <p>3.3.3. Parsing
After downloading all candidate files, we parse them based on their contents, since their filenames
might be unusual or misleading. For file types that support specifying Python dependency versions, we
analyze the extent to which they make use of this possibility.</p>
          <p>PIP File Format PIP supports loading a list of dependencies from a plain text file, one package per line.
Valid names may contain only ASCII characters, digits, underscores, and hyphens. They are followed
by zero or more pairs of a comparator and a version number, e. g., “transformers&gt;=4.31.0,&lt;4.35.0”. White
space, blank lines, and everything after a “#” character are ignored.</p>
          <p>Entries may also load packages from local files, URLs, or GitHub repositories. PIP additionally supports
commands to add alternative package indices or cryptographic hashes.</p>
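          <p>A simplified classification sketch for such files (our own regular expressions, covering only the basic
“name, comparator, version” grammar rather than the full PIP syntax):</p>
          <preformat>
import re

PACKAGE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*")
VERSION_RESTRICTION = re.compile(r"(==|&gt;=?|&lt;=?|!=|~=)")

def classify_requirements(text):
    """Count entries with and without version restrictions."""
    versioned = unversioned = 0
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and PIP commands
            continue
        if not PACKAGE_NAME.match(line):
            continue                          # not a plain package entry
        if VERSION_RESTRICTION.search(line):
            versioned += 1
        else:
            unversioned += 1
    return versioned, unversioned
</preformat>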
          <p>Anaconda File Format Anaconda supports two types of file formats. One works similarly to PIP,
but uses a different line format, e. g., “boltons=23.0.0=py310h06a4308_0”. The other one uses the YAML
syntax to specify dependencies. Depending on the position and node type in the YAML structure,
packages may be imported using the Anaconda or PIP syntax.</p>
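          <p>A sketch of how the nested PIP section of such a YAML file can be separated from the Anaconda
entries (assuming the common “environment.yaml” layout and the PyYAML library):</p>
          <preformat>
import yaml  # PyYAML

def split_conda_dependencies(text):
    """Split an environment.yaml into Anaconda and nested PIP entries."""
    document = yaml.safe_load(text) or {}
    conda_entries, pip_entries = [], []
    for item in document.get("dependencies", []):
        if isinstance(item, str):            # e.g. "boltons=23.0.0=py310h06a4308_0"
            conda_entries.append(item)
        elif isinstance(item, dict) and "pip" in item:
            pip_entries.extend(item["pip"])  # these entries use the PIP syntax
    return conda_entries, pip_entries
</preformat>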
          <p>Packaging Tools and Docker Files We detect the TOML- and YAML-based file types by comparing
the nodes of their extracted tree structure with a list of nodes we expect for that file type; e. g., we
separate Anaconda environment files (node: “dependencies”) from Anaconda packaging files (node:
“requirements” or “build”) by testing for the (non)existence of these nodes. “setup.py” files are detected
by checking whether they import the “distutils” or “setuptools” packages.</p>
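          <p>A sketch of this node-based detection for the YAML case (our own simplification of the rules
described above):</p>
          <preformat>
import yaml  # PyYAML

def detect_yaml_type(text):
    """Classify a YAML file by the top-level nodes it contains."""
    document = yaml.safe_load(text) or {}
    if not isinstance(document, dict):
        return "unknown"
    if "dependencies" in document:
        return "anaconda-environment"
    if "requirements" in document or "build" in document:
        return "anaconda-packaging"  # e.g. a meta.yaml used for building packages
    return "unknown"
</preformat>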
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section shows the results of our work and the statistics generated during it.</p>
      <sec id="sec-4-1">
        <title>4.1. Matching and Filtering</title>
        <p>We present the results of the intermediate steps used in our data selection process. Figure 1 shows the
number of repositories in the individual stages.</p>
        <p>[Figure 1: Number of repositories remaining at the individual selection stages: Papers With Code,
OpenAlex matching, Field of Science, GitHub total, GitHub unique, GitHub existing (0 to 250 000).]</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. OpenAlex Matching</title>
          <p>Our Papers With Code input file “links-between-papers-and-code.json.gz” contains 265 735 mappings
between articles and source code repositories. For 256 560 (96.5%) of them, we are able to find a matching
OpenAlex metadata record by using URL permutations and the inverse index generated from the
Crossref and Datacite dumps.</p>
        <p>129 025 (50.3%) of the articles are matched with more than one and 102 (0.04%) with more than four
publication-IDs from the OpenAlex dataset. An inspection of all ”two-match” results (103 582) shows
that 103 418 (99.8%) of them have one of the publication-IDs referring to an arXiv article and the other
to a non-arXiv URL. This validates our assumption that many publications have a pre-print version in
addition to a conference or journal publication.</p>
          <p>We also observe 21 outliers with more than 10 000 publication-IDs each. A manual inspection shows
that the publishers used a specific URL format that was reduced to the same prefix for all of their
publications. This was caused by our code trying to extract DOIs from regular URLs. Since only 21
publications are affected in total, we keep the data in our research set.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Field of Science Filtering</title>
          <p>After filtering the 256 600 remaining publications using the OpenAlex “concept” and unarXive “category”
information, we reduce our list to 199 547 (77.7%) Computer Science publications.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. GitHub Repository Filtering and Retrieval</title>
          <p>When only GitHub repository links are considered for further research, 197 878 publications remain.
They link to 156 743 unique GitHub repositories. We were unable to download metadata for 4984 (3.2%)
of them since the GitHub API responded with “HTTP 404 - Not Found” errors. The remaining 151 759
(96.8%) links could be retrieved without problems.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. General File Type Statistics</title>
        <p>The largest repository contains 3 787 028 files, while the average count (647) and the median (39) are
much lower. For the total size, the maximum is 85 GB, with an average of 63 MB and a median of 3 MB.</p>
        <p>Table 1 shows the Top-10 distribution of file extensions in the content metadata downloaded for the
repository list. As expected, Python source code (“.py” and “.ipynb”) accounts for the largest number of
files in repositories related to Machine Learning. C/C++ would take second place (“.cpp”/“.h”/“.c”/“.hpp”
ranked 16th/17th/34th/40th), followed by Matlab (“.m”/“.mat” ranked 26th/38th).</p>
        <p>Python source code is found in 132 130 repositories (87.0%), with 126 704 (83.4%) containing at least
one “.py” file and 36 903 (24.3%) at least one “.ipynb” file. Jupyter Notebooks may be used for
programming languages other than Python; since they account for a total size of 158 GB, we refrained from
downloading and analyzing them individually to determine the language used. C/C++ source code is
found in only 20 507 (15.5%) repositories.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Parsing</title>
        <p>In the 151 759 repositories we consider for our research, we observe 70 741 (46.6%) that include at least
one of the file types we assume to contain instructions for reproducing the original runtime environment.
The total number of such candidate files is 189 110, including 2189 with zero-byte length.</p>
        <sec id="sec-4-3-1">
          <title>4.3.1. PIP Package Installer</title>
          <p>Our retriever downloaded 82 225 files with names similar to “requirements.txt”, located in 51 212
repositories. Of these, 903 were files meant to be used with the Anaconda Package Installer instead of PIP,
and 683 contained at least one invalid instruction, so they would not be usable by PIP without manual
user intervention.</p>
          <p>• 35 275 files (42.9%) import all dependencies with specified versions.
• 15 128 files (18.4%) do not specify any versions.
• 26 128 files (31.8%) have versions set for some of their imports.
• 3029 files (3.7%) do not import any dependencies directly.</p>
          <p>Files in the last category may import their dependencies from sources other than the Python
Package Index. This is suboptimal, as such a resource is not necessarily available to a researcher trying to
reproduce the results of the publication. These “bad patterns” are found in all four categories listed
above and account for:
• 3807 files (4.6%) using instructions to add alternative URLs for package lookup,
• 6122 files (7.4%) installing packages directly from a GitHub repository or URL, and
• 1696 files (2.0%) installing dependencies from a file path on the local system.</p>
          <p>Figure 2 shows the number of replication packages uploaded to GitHub based on the date of the
publication to which they belong. It also gives the absolute number of PIP environment files, split by
the way dependencies are imported. The data from 2024 and 2025 appears to be still incomplete in the
Papers With Code dataset that we used for our research. Figure 3 shows the relative distribution of the
same data.</p>
          <p>[Figures 2 and 3: Total number of GitHub uploads of replication packages per publication year, together
with PIP environment files split into “all imports by version”, “no imports by version”, and “some imports
by version”.]</p>
          <p>When we ignore the possible data error in 2014, we can observe that the ratio of PIP environment
files in replication packages has constantly increased from 20% in 2015 to above 50% in recent
years. This could indicate a greater awareness of the reproducibility crisis in the academic world. The
ratio of PIP environment files specifying versions for some of their dependencies has almost doubled
over time, which might also be an indicator that version constraints are decided on a per-package basis
more frequently.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Anaconda Package Installer</title>
          <p>Our retriever downloaded 15 320 files with file extensions “.yaml” or “.yml”. Of these, 14 910 (97.3%)
have valid YAML syntax, and 284 (1.9%) are YAML templates, which means that they would need to be
preprocessed first by an additional template engine.</p>
          <p>The number of Anaconda dependency files is 11 523, distributed across 9452 repositories. The package
versioning usage is split up as follows:
• 6876 files (59.7%) import all dependencies with specified versions.
• 324 files (2.8%) do not specify any versions.
• 4027 files (34.9%) have versions set for some of their imports.
• The remaining files are invalid, empty, or do not list any dependencies.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Packaging Tools and Docker Files</title>
          <p>We observe the following file types used in the build process of Python packages or Docker containers:
• 27 965 (18.4%) repositories contain at least one file related to packaging (total: 45 863 files).
• 10 898 (7.1%) repositories contain at least one file related to Docker containers (total: 43 001 files).</p>
        </sec>
        <sec id="sec-4-3-4">
          <title>4.3.4. Package Distribution</title>
          <p>[Figure: Most frequently imported packages, split by versioned and unversioned use, for (a) PIP and
(b) Anaconda (counts from 0 to 25 000).]</p>
          <p>Machine Learning dependencies appear at the top of that list. The proportion of entries with and without
version information is not distributed evenly and is partially unexpected:
• We assumed that code directly related to numeric computation (e. g., “numpy”, “scipy”, “pandas”)
would rarely appear without version information, since the computational results may change
even with small code updates. However, these packages appear to be imported rather frequently (30.1%)
without version information.
• “certifi” contains a list of currently trusted TLS root certificates. Since they become invalid once
they reach their expiration date, we would assume that nobody intentionally enforces using old
certificates because of the security implications. However, almost all imports (98.5%) are locked
to specific versions.
• “urllib3” has far fewer unversioned appearances than “requests”, even though both packages are
HTTP clients and fulfill very similar tasks.</p>
          <p>“python” specifies which release version should be used to set up the Python runtime environment. “pip”
is a meta-package and allows importing packages from the Python Package Index which are otherwise
unavailable in Anaconda itself.</p>
          <p>Similarly to the PIP overview, the typical computational Machine Learning packages rank next, with
moderate amounts of non-versioned imports. We would assume that packages such as “zlib”, “readline”,
“xz”, or “ca-certificates” almost never introduce breaking changes with updated library versions; however,
these types of packages have a very low number of unversioned uses.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this study, we have analyzed the availability of files in replication packages that facilitate
reconstituting the original runtime environments and, therefore, foster reproducibility for scientific publications.
We focused our research on replication packages from the field of Machine Learning which are hosted
on GitHub. Our results show that most repositories contain source code for the Python programming
language and are generally small in both file count and size.</p>
      <p>We observed that less than half of the repositories include file types which potentially facilitate
reconstituting the Python runtime environment used for the respective original scientific work with the
appropriate software and dependency versions. Additionally, these files are often of poor quality and
might link to arbitrary URLs or local files instead of using official package indices. Although we observed
significant growth in the number of replication packages that include dependency information files over
the last decade, there is still much work to be done.</p>
      <p>A future goal might be the automated setup and execution of a replication package. For this to work,
we would need to know which of the possibly multiple environment files to apply and what commands
to execute after that. Such information might be buried somewhere deep in the documentation – if it
exists and we are able to locate it.</p>
      <p>The use of Docker containers – which we only looked at from the sidelines in this study – or similar
solutions could greatly improve reproducibility. Conferences or journals could require the authors to
use provided containers for their submissions, similar to the prerequisite of using a given LaTeX template.
This could include using fixed directory structures and commands to execute and would allow the
automated setup and execution of replication packages.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>checklist?, in: Findings of the Association for Computational Linguistics: ACL 2023, Association for
Computational Linguistics, 2023, pp. 12789–12811. doi:10.18653/v1/2023.findings-acl.809.
[10] M. Arvan, L. Pina, N. Parde, Reproducibility in computational linguistics: Is source code enough?,
in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language
Processing, Association for Computational Linguistics, 2022, pp. 2350–2361. doi:10.18653/v1/2022.
emnlp-main.150.
[11] A. Belz, S. Agarwal, A. Shimorina, E. Reiter, A systematic review of reproducibility research in
natural language processing, in: Proceedings of the 16th Conference of the European Chapter
of the Association for Computational Linguistics: Main Volume, Association for Computational
Linguistics, 2021. doi:10.18653/v1/2021.eacl-main.29.
[12] Nep 29 — recommend python and numpy version support as a community policy standard, 2019.</p>
        <p>URL: https://numpy.org/neps/nep-0029-deprecation_policy.html, accessed 2025-07-16.
[13] PapersWithCode, Papers with code datasets, 2024. URL: https://github.com/paperswithcode/
paperswithcode-data, accessed 2024-11-12.
[14] March 2025 public data file from crossref, 2025. doi: 10.13003/87bfgcee6g.
[15] DataCite, Datacite public data file 2024, 2025. doi: 10.14454/TJPC-9M93.
[16] J. Priem, H. Piwowar, R. Orr, Openalex: A fully-open index of scholarly works, authors, venues,
institutions, and concepts, 2022. doi:10.48550/ARXIV.2205.01833.
[17] T. Saier, J. Krause, M. Färber, unarxive: All arxiv publications pre-processed for nlp, including
structured full-text and citation network (full), 2023. doi:10.5281/ZENODO.7752754.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] S. Siebert, L. M. Machesky, R. H. Insall, Overflow in science and its implications for trust, eLife 4 (2015). doi:10.7554/elife.10825.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] T. Hsieh, M. H. Vaickus, D. G. Remick, Enhancing scientific foundations to ensure reproducibility, The American Journal of Pathology 188 (2018) 6-10. doi:10.1016/j.ajpath.2017.08.028.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] X.-L. Meng, Reproducibility, replicability, and reliability, Harvard Data Science Review 2 (2020). doi:10.1162/99608f92.dbfce7f9.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] C. Willis, V. Stodden, Trust but verify: How to leverage policies, workflows, and infrastructure to ensure computational reproducibility in publication, Harvard Data Science Review 2 (2020). doi:10.1162/99608f92.25982dcf.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. Baker, 1,500 scientists lift the lid on reproducibility, Nature 533 (2016) 452-454. doi:10.1038/533452a.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Mieskes, K. Fort, A. Névéol, C. Grouin, K. Cohen, NLP community perspectives on replicability, in: Proceedings - Natural Language Processing in a Deep Learning World, RANLP 2019, Incoma Ltd., Shoumen, Bulgaria, 2019, pp. 768-775. doi:10.26615/978-954-452-056-4_089.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. Pineau, P. Vincent-Lamarre, K. Sinha, V. Lariviere, A. Beygelzimer, F. d'Alché-Buc, E. Fox, H. Larochelle, Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program), Journal of Machine Learning Research 22 (2021) 1-20. URL: http://jmlr.org/papers/v22/20-303.html.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] R. Stojnic, ML code completeness checklist, 2022. URL: https://github.com/paperswithcode/releasing-research-code, accessed 2025-07-16.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] I. Magnusson, N. A. Smith, J. Dodge, Reproducibility in NLP: What have we learned from the checklist?, in: Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, 2023, pp. 12789-12811. doi:10.18653/v1/2023.findings-acl.809.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] M. Arvan, L. Pina, N. Parde, Reproducibility in computational linguistics: Is source code enough?, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2022, pp. 2350-2361. doi:10.18653/v1/2022.emnlp-main.150.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] A. Belz, S. Agarwal, A. Shimorina, E. Reiter, A systematic review of reproducibility research in natural language processing, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, 2021. doi:10.18653/v1/2021.eacl-main.29.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] NEP 29 — Recommend Python and NumPy version support as a community policy standard, 2019. URL: https://numpy.org/neps/nep-0029-deprecation_policy.html, accessed 2025-07-16.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] PapersWithCode, Papers with code datasets, 2024. URL: https://github.com/paperswithcode/paperswithcode-data, accessed 2024-11-12.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Crossref, March 2025 public data file from Crossref, 2025. doi:10.13003/87bfgcee6g.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] DataCite, DataCite public data file 2024, 2025. doi:10.14454/TJPC-9M93.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] J. Priem, H. Piwowar, R. Orr, OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts, 2022. doi:10.48550/ARXIV.2205.01833.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] T. Saier, J. Krause, M. Färber, unarXive: All arXiv publications pre-processed for NLP, including structured full-text and citation network (full), 2023. doi:10.5281/ZENODO.7752754.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>