<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Block and Roll: A Metric-based Evaluation of Reputation Block Lists</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Siôn</forename><surname>Lloyd</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">ICANN</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Hernandez-Gañán</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">ICANN</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Samaneh</forename><surname>Tajalizadehkhoob</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">ICANN</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Block and Roll: A Metric-based Evaluation of Reputation Block Lists</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">DA934472DEBFBE3F10E19D70179CF64F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>DNS abuse</term>
					<term>blocklists</term>
					<term>phishing</term>
					<term>malware</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Reputation Block Lists (RBLs) serve as a common defense mechanism against harmful and unwanted internet content. These lists contain the IP addresses, domain names, or full URLs of known spam sources, phishing, malicious sites or other unwanted content. By using RBLs, internet service providers, email providers, and other organizations can effectively safeguard their users from online threats. They are also used for more academic research and as training sets for machine learning models. To help evaluate and understand the effectiveness of RBLs, this paper covers a set of metrics that can be used to evaluate and characterize them. These metrics include RBL focus, mechanics, metadata, volume, overlap, timeliness, and churn. We categorise the metrics into four groups: a general description; metrics that can be directly measured; metrics that can be indirectly measured and metrics that can only be discovered second-hand. When it comes to RBLs there is no "one size fits all". We argue that understanding the strengths and weaknesses of any one RBL, or set of multiple RBLs, is key to getting a good fit for a particular use-case. To maximize the benefit of RBLs, we suggest combining two or more to get a fuller picture than can be provided by any single RBL.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Domain name and IP address reputation lists have been used for many years as a way to identify and block potentially harmful or unwanted traffic on the internet. The earliest known reputation list was created by Paul Vixie in the 1990s, and was called the "Real-time Blackhole List" or RBL <ref type="bibr" target="#b0">[1]</ref>. This list contained the IP addresses of known spam sources and was used by mail servers to block incoming email from those sources. Over time, similar lists were created for other types of online activities, such as domain or URL reputation lists for identifying malicious or phishing websites, and IP address reputation lists for identifying sources of malware or other online threats. Today, these lists are widely used by internet service providers, email providers, and other organisations to help protect their users from online threats. They continue to evolve and improve as new threats emerge and new technologies are developed to combat them.</p><p>We refer to these sources as "Reputation Block Lists" or RBLs; others may call them by slightly different names such as "threat intelligence", "security feeds" or "abuse feeds". They can contain different identifier types: domain names, IP addresses or full URLs, and in many cases a mixture of two or more identifier types. They can also specialise in particular threat types, like spam, phishing or malware, or they may contain a mixture of multiple threat types. They can differ in collection methodology, licensing, distribution method, intended use and almost every other conceivable way.</p><p>RBLs are used in many different scenarios, some more obvious than others. For example, services like Google Safe Browsing can be thought of as an RBL protecting a browser user from known phishing sites. 
The academic community also utilises RBLs to understand the current and historical reputation of domain names in various types of analysis, to measure security threat concentrations across internet intermediaries such as TLDs, registries, registrars or hosting providers, and to assess the mitigation strategies of those intermediaries <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>.</p><p>In many cases the use of this data is not necessarily aligned with how the producers intend it to be used, and so its suitability may not be clear. In other cases, conclusions drawn from analysis based on this data do not necessarily reflect the specifications and limitations of the data. Moreover, for all use-cases it is hard to know if the RBL being used is the best fit, if there is a better option, or if a combination of two or more RBLs would add enough benefit to justify any extra cost. Note that the cost can be in terms of time and complexity as well as money, so even free open-source feeds have some cost associated with them.</p><p>Misalignment with the intended use can have a significant impact on a project. For example, an RBL which contains low-confidence or unvetted entries could result in an appreciable number of incorrect entries, known as false positives. Such a data feed might be perfectly acceptable if used to protect a small network where the mitigation of incorrect entries has a low associated cost. However, the same RBL may not be suitable for an application where a false positive results in a time- and resource-consuming investigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Objectives</head><p>Given the problem introduced above, in this document we propose a method to evaluate and characterise an RBL, not just in isolation but also in how multiple RBLs complement one another. We look at the general description of the RBL, things we can measure directly, things that we can only approximate, and things that we can only discover second-hand. We also discuss the implications and limitations of these measurements.</p><p>This work has been informed by earlier examples <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref>; we have kept or modified parts of their suggested methods to suit our requirements. As such, our approach is grounded in the projects that we have been involved with; other parties with other experiences may well regard other metrics as important.</p><p>To move towards evaluating an RBL, or group of RBLs, we propose metrics that help measure multiple aspects of a list. We then demonstrate the methods by which the metrics can be measured. Recall, though, that this work is based on the sources that we are already familiar with; it is likely that other RBLs have features which will require modifications to this method.</p><p>We will not discuss here the steps required to read RBL data, as these vary between RBLs. We do show, in Appendix A, the database schema that we use to harmonise data into a single, consistent format. All of the RBL data we read is written into this structure, although it has had to evolve as new RBLs with new fields have been added.</p><p>Finally, there are some things that we are explicitly not trying to measure. 
We are not looking to put a score on an RBL or to say that one is demonstrably better than another; we want to increase our understanding of the RBL data used regularly by us and our community, so that we can either use it with confidence or understand why it is not suitable for a particular project. We also do not consider cost or licensing terms here, although these could be significant factors in any decision on whether to use an RBL. Lastly, we are not aiming to evaluate the absolute effectiveness of our RBLs, as some of the existing work has already looked at that aspect <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14]</ref>.</p><p>The metrics we use are listed below and described in more detail in Section 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">General RBL Description</head><p>These are characteristics that we will know before we start to ingest data into our system. It may be a feature that initially brought the RBL to our interest, maybe to fill an identified gap in our other RBLs. We also include details that we need to know during the integration of an RBL into our system, like how it is distributed and what data it contains.</p><p>• RBL focus - what entry types does it contain (spam, phish, etc.)</p><p>• RBL mechanics - how is the RBL disseminated, what format is the data in, etc.</p><p>• Metadata - does the RBL provide more context on list entries, like malware family, phished brand, etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">What We Can Measure Directly</head><p>These are the metrics we can measure directly based on the information provided in the RBLs.</p><p>• Volume - how many entries are present</p><p>• Overlap - how many entries in one RBL are in common with other RBLs</p><p>• Timeliness - how quickly do entries appear (compared to other RBLs)</p><p>• Churn - how dynamic are the entries</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">What We Can Measure Indirectly</head><p>These are metrics that can only be measured indirectly from the data: cases where we have to sample the data in order to get an approximation of the answer, or where we use surrogate measurements in lieu of the thing that we actually want to investigate.</p><p>• Liveliness - how many entries are "active"</p><p>• Purity - how many are potential false positives</p><p>• Accuracy - what proportion of stated threat types match reality</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">What We Cannot Measure</head><p>These are characteristics which cannot be derived from the data itself, but are discovered based on second-hand information. For example, we look at the documentation for the RBL, consult FAQs or talk to the RBL providers to get this information.</p><p>• Catchment - are there geographic blindspots, collection method gaps (e.g. no mobile data), etc.</p><p>• Entry retesting - how frequently are entries retested to check if they should still be present on the RBL</p><p>• Reliability - is the data always available or are there issues transferring</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head><p>In the remainder of the paper we look in more detail at the metrics outlined above, demonstrating what we measure and, where appropriate, how we might use visualisations. We stick to the same four categories.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">General RBL Description</head><p>RBL Focus The first thing to consider is the threat types that the RBL contains. Does it focus on a single threat type or contain multiple types? How does this relate to the other RBLs in our set, and does it fill a known gap?</p><p>RBL Mechanics A prosaic but significant issue is how we can read the RBL and merge it into our larger dataset. We need to understand what delivery mechanism is used: is there an API, and is the data formatted as CSV, JSON, etc.? We also need to know whether each read provides the whole current list or a stream of new entries (a list of "point in time" observations). In the latter case, how long an entry remains active is decided by us rather than by the provider.</p><p>Metadata It can be useful to have context around a particular entry, and some RBLs provide more information, like the timestamp at which the entry was added, the malware family seen, the brand being phished and so on. Another useful data point is whether the entry is believed to be a malicious registration or a compromised but otherwise legitimate registration. All of this forms the metadata of an observation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">What We Can Measure Directly</head><p>Volume Possibly the easiest measurement to make is how many entries are present, although some care needs to be taken that the same thing is being measured in each case. For example, some RBLs contain just domains while others may contain URLs, and multiple URLs may well map to a single domain. We look at unique entries over a period of time, preferably a month or more, to give as good a representation as possible. This is particularly significant for those RBLs which provide a stream of new entries and so don't have the concept of a "current list". If we look at unique domains we see something like Figure <ref type="figure" target="#fig_1">1</ref>. We can also produce similar figures showing unique hosts, URLs, domains broken down by different threat types, etc. Higher volumes are, in general, desirable; this is not, however, the whole story. For example, the DGArchive <ref type="bibr" target="#b17">[18]</ref> data is based on enumeration of domain generation algorithms, and so the majority of those entries may never be registered. It is therefore arguable that we are not comparing like with like when we set it against other RBLs; we look to address issues like these later on. It is also true that some threats are more serious or active than others, and so some entries offer more "value".</p><p>Figure 2: Overlap of unique domain entries seen between RBLs over a fixed period of time.</p><p>Overlap If we are looking to add a new RBL into our existing data, it is interesting to know how many entries are in common with our current data. Again we need to aggregate over a period of time and be careful to compare apples with apples. It may also be instructive to see different threat types separately. 
One simple measure is the overlap of unique domains shown in Figure <ref type="figure">2</ref>.</p><p>This shows how much of one RBL is contained within another (and vice versa). For example, if we look at SURBL and OpenPhish we can see that SURBL contains 0.85 (85%) of OpenPhish. However, OpenPhish contains just 0.015 (1.5%) of SURBL. The absolute number of domains in common is the same in both directions; the two fractions differ because the RBLs differ so much in size.</p><p>The view shows us some other interesting features: while the majority of overlaps are small, less than 5% or so, there are some which are much higher. This is where open sources are being read and incorporated into other RBLs, presumably after being validated to the required standard for that RBL. This could be significant if entries on multiple RBLs are being taken as multiple independent observations, when they may in fact stem from a single original source.</p></div>
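<div xmlns="http://www.tei-c.org/ns/1.0"><p>The directional nature of the overlap measure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used to produce Figure 2; the function name overlap_fractions, the feed names and the toy domains are all hypothetical. It assumes each RBL has already been reduced to its set of unique domains over the same period.</p>
<p>
```python
from itertools import permutations


def overlap_fractions(feeds):
    """Return {(a, b): fraction of feed b's unique domains also present in feed a}."""
    fractions = {}
    for a, b in permutations(feeds, 2):
        if feeds[b]:  # skip empty feeds to avoid dividing by zero
            shared = feeds[a].intersection(feeds[b])
            fractions[(a, b)] = len(shared) / len(feeds[b])
    return fractions


# Toy data (hypothetical): unique domains per feed over the same period.
feeds = {
    "big_feed": {"a.example", "b.example", "c.example", "d.example"},
    "small_feed": {"a.example", "z.example"},
}
result = overlap_fractions(feeds)
print(result[("big_feed", "small_feed")])  # share of small_feed contained in big_feed
print(result[("small_feed", "big_feed")])  # share of big_feed contained in small_feed
```
</p>
<p>The shared set is identical in both directions; only the denominator changes, which is why a large RBL can contain most of a small one while the small one contains very little of the large one.</p></div>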
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Timeliness</head><p>The overlap view is interesting, and shows some cross-pollination between RBLs. The next question is: where two or more RBLs have the same entry, which gets it earlier, and by how much? To this end we look at the time delta between an entry appearing on the "base feed" that we are considering and on any other RBL. This gives us visualisations like that shown in Figure <ref type="figure">3</ref>.</p><p>Here entries with a negative time show the base feed leading other RBL entries, whereas a positive time indicates it lagging behind. Ideally we want to see more weight to the left of the graph, indicating that the RBL being considered is consistently getting entries earlier than others.</p></div>
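<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the time-delta calculation follows, assuming the data is available as (feed, domain, report date) rows, loosely modelled on the feed, domain and report_date columns of the schema in Appendix A. The function names and sample rows are hypothetical. A negative delta means the base feed listed the domain first.</p>
<p>
```python
from datetime import date


def first_seen(rows):
    """Map (feed, domain) to the earliest report date among (feed, domain, date) rows."""
    earliest = {}
    for feed, domain, day in rows:
        key = (feed, domain)
        if key not in earliest or earliest[key] > day:
            earliest[key] = day
    return earliest


def lead_lag_days(rows, base_feed):
    """Days between first sighting on base_feed and on each other feed (negative = base leads)."""
    earliest = first_seen(rows)
    base = {d: day for (f, d), day in earliest.items() if f == base_feed}
    deltas = []
    for (feed, domain), day in earliest.items():
        if feed != base_feed and domain in base:
            deltas.append((base[domain] - day).days)
    return sorted(deltas)


# Hypothetical observations; repeated reports of the same domain are expected.
rows = [
    ("base", "a.example", date(2023, 1, 1)),
    ("other", "a.example", date(2023, 1, 3)),
    ("other", "a.example", date(2023, 1, 4)),
    ("base", "b.example", date(2023, 1, 10)),
    ("other", "b.example", date(2023, 1, 5)),
    ("base", "c.example", date(2023, 1, 2)),  # no overlap, so no delta
]
deltas = lead_lag_days(rows, "base")
print(deltas)
```
</p>
<p>The resulting list of deltas is what a histogram such as Figure 3 would be drawn from. Only the first sighting per feed is used, so duplicate daily reports do not skew the result.</p></div>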
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 3:</head><p>Where overlap is seen we can show if our considered RBL saw the domain earlier or later than the others.</p><p>Churn For the RBLs that provide their whole current set of entries on each read, it is also useful to know how dynamic the list is. If an RBL's volume stays the same as the previous iteration, is it because the list is static, or is it because as many entries are being added as are being removed? Plotting the volume over time together with the number of additions and deletions, as in Figure <ref type="figure" target="#fig_2">4</ref>, helps to tell these cases apart. Note that removing stale entries which are no longer active threats can be as important as adding new entries, but is often not considered. To this end we can also look at the histogram of the ages of entries; see Figure <ref type="figure" target="#fig_3">5</ref> and note the log y-scale. Figure <ref type="figure" target="#fig_3">5</ref> shows a healthy mix where the majority of entries have a short lifespan of days or weeks, with a small number being on the RBL for a year or more.</p><p>This analysis gives us more insight into how active the RBL is, how many new threats are being added and how many old threats are being removed. A higher churn reflects a more active RBL and so is seen as a positive feature. For those feeds which just provide "point in time" observations this analysis is not so relevant, although we can still look at the volumes of new threats being added.</p></div>
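<div xmlns="http://www.tei-c.org/ns/1.0"><p>For RBLs that publish their whole current list on each read, additions and deletions can be derived by comparing consecutive snapshots. The sketch below assumes each snapshot is held as a set of entries; the function name churn and the toy snapshots are hypothetical.</p>
<p>
```python
def churn(snapshots):
    """Compare consecutive snapshots.

    snapshots: list of (label, set_of_entries) in chronological order.
    Returns a list of (label, volume, added, removed) for every snapshot after the first.
    """
    report = []
    for (_, previous), (label, current) in zip(snapshots, snapshots[1:]):
        added = current - previous
        removed = previous - current
        report.append((label, len(current), len(added), len(removed)))
    return report


# Hypothetical daily snapshots: the volume is 3 on every day.
snapshots = [
    ("day1", {"a.example", "b.example", "c.example"}),
    ("day2", {"b.example", "c.example", "d.example"}),  # one in, one out
    ("day3", {"b.example", "c.example", "d.example"}),  # genuinely static
]
report = churn(snapshots)
for row in report:
    print(row)
```
</p>
<p>The volume is the same for day2 and day3, but the added and removed counts separate a list that is turning over from one that has not changed.</p></div>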
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">What We Can Measure Indirectly</head><p>Liveliness Above we measured the volume of entries on an RBL. However, it is also interesting to know how many of those entries are "active". There may be entries which no longer resolve, or have been mitigated in other ways (for example, some registrars take control of the domain and "park" it).</p><p>We would struggle to capture this information for every entry on a sizeable RBL, and once we had finished we would need to start again to catch any new entries or changes in existing ones. One way to tackle this would be to pick a random sample of sufficient size to give us a measurement hopefully representative of the whole population. If we see a large proportion of the entries not resolving then we need to think why this might be. While one reason may be that the RBL has stale information there may be other explanations. For example, maybe the RBL includes the output from one or more domain generation algorithms (DGAs), many of which are never actually registered.</p></div>
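<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sampling approach can be sketched as follows. This is an illustration under stated assumptions rather than the procedure behind Figure 6: it treats "active" as "currently resolves in DNS", which is only one possible proxy (a parked domain still resolves), and the function names, the seed parameter and the stub data are hypothetical. The default resolver uses the standard library call socket.getaddrinfo; the demonstration swaps in a stub so that it runs offline.</p>
<p>
```python
import random
import socket


def resolves(domain):
    """Return True if the name currently resolves to at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def sample_liveliness(domains, sample_size, resolver=resolves, seed=0):
    """Estimate the share of resolving entries from a random sample."""
    population = sorted(domains)  # sort so a given seed picks the same sample
    size = min(sample_size, len(population))
    if size == 0:
        return 0.0
    sample = random.Random(seed).sample(population, size)
    live = sum(1 for name in sample if resolver(name))
    return live / size


# Offline demonstration with a stub resolver (hypothetical data): 25 of 100 names are "live".
names = {"d%d.example" % i for i in range(100)}
live_names = {"d%d.example" % i for i in range(25)}
stub = lambda name: name in live_names
exact = sample_liveliness(names, 100, resolver=stub)  # whole population sampled
estimate = sample_liveliness(names, 40, resolver=stub)  # a sample-based estimate
print(exact, estimate)
```
</p>
<p>In practice the sample size would be chosen to give an acceptable margin of error, and the check would be repeated over time since entries change status.</p></div>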
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Purity</head><p>One of the more serious potential issues for RBLs is when they contain false positive reports, that is, entries which are not, and never have been, malicious. These entries are nearly impossible to discover en masse; they only really become apparent during use. However, can we try to discover potential issues ahead of time? One thing we look at is the overlap between the RBL and a source of "known good" entities. We are not aware of such a list, so we use a surrogate source: a list of top domains, like the TRANCO top 1M. While these domains may still be malicious, they are less likely to be. Also, for uses like blocking network traffic, any entry in, say, the top 10,000 would potentially be very disruptive.</p><p>We obviously want this score to be as low as possible, and where we suspect false positives we'd like to understand if there are explanations or mitigations we can use. To take DGAs as an example again, short DGA domains may coincidentally overlap with real words and legitimate registrations. To make this less of an issue, it may be that only DGA domains with seven or more characters are retained.</p></div>
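<div xmlns="http://www.tei-c.org/ns/1.0"><p>The surrogate check against a top-domains list might look like the sketch below. It assumes the list is supplied as CSV text with one "rank,domain" pair per row and no header; that layout, the function names, the rank cut-off and the sample data are assumptions made for illustration.</p>
<p>
```python
import csv
import io


def load_top_list(csv_text, max_rank):
    """Parse 'rank,domain' rows, keeping only domains ranked within max_rank."""
    top = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank = int(row[0])
        if max_rank >= rank:
            top[row[1].strip().lower()] = rank
    return top


def potential_false_positives(rbl_domains, top):
    """Return (share of RBL domains found in the top list, {domain: rank})."""
    hits = {d: top[d] for d in rbl_domains if d in top}
    share = len(hits) / len(rbl_domains) if rbl_domains else 0.0
    return share, hits


# Hypothetical top list and RBL content.
top_csv = "1,popular.example\n2,news.example\n3,shop.example\n"
top = load_top_list(top_csv, max_rank=2)
rbl = {"news.example", "evil.example", "bad.example", "shop.example"}
share, hits = potential_false_positives(rbl, top)
print(share, hits)
```
</p>
<p>A hit is only a potential false positive: a popular domain can still host malicious content, so the matches are candidates for manual review rather than confirmed errors.</p></div>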
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Accuracy</head><p>Where an RBL provides extra metadata, like threat types, do we believe that they are correct? Where we see entries in common between different RBLs, do they agree? This can be difficult to pin down as we do see the same entity reported for different threat types within the same RBL, so again we need to sample and check in order to get an idea of the scale of any issues.</p><p>We would like to be able to trust all the data that an RBL provides, not just the presence of entries, and the mis-classification of entries can have serious consequences in some cases. If an RBL has a low accuracy in terms of the metadata we may not be able to use it to generate statistics for example.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">What We Cannot Measure</head><p>Catchment RBLs have different collection mechanisms, even though some are aggregates of multiple primary sources. This gives each RBL its own strengths and blindspots, which could be geographic (e.g. no visibility of threats targeted at specific countries) or delivery related (e.g. no mobile data).</p><p>An understanding of these can sometimes be gained from FAQs, whitepapers, conversations with the providers or other second-hand methods. In many cases, however, the amount of information is, for operational reasons, limited.</p><p>We may need this information to identify RBLs that fill gaps in our current set; for other uses it may be that data for a particular locale is essential.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Entry Retesting</head><p>We have seen that entries are removed from RBLs; but we cannot, from our measurements, definitively say why. Are statuses of entries being periodically reconfirmed, or are they just timed out? Some RBLs give this information but most do not, and deciding how long we trust entries for can be influenced by how this is being handled by the RBL.</p><p>Ideally all entries are frequently retested, but we appreciate that operationally this may not be possible.</p><p>Reliability A metric that can only be determined with continued monitoring and use, is whether the RBL data is always available, or are there sometimes issues transferring. This can influence our confidence in using an RBL in a production environment as if we have our own SLAs then the RBL should have something at least similar but preferably better.</p><p>For open-source RBLs with no contract (and therefore no SLA) only our experience with the RBL can give us this confidence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>In order to understand which RBL(s) are suitable for which projects, we need to understand the project requirements, the RBL characteristics and how multiple RBLs interact with each other.</p><p>We cannot claim that certain RBLs are better than others; but it can be that some RBLs are more suited to some projects.</p><p>However, from what we have seen of the RBLs we have access to, adding multiple sources increases the number of unique entities included and hence the comprehensiveness of the data used.</p><p>While in this work we outlined our evaluation processes, we emphasize the fact that these are not meant to be complete or prescriptive as they are predicated on our current use cases. It is quite likely that future projects, or new RBLs, will suggest new measures and modifications to existing ones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Appendix 1: Database Schema</head><p>We write all of our RBL data to a single database table per month; most sources are read daily although some more frequently. Our current schema is shown in Table <ref type="table" target="#tab_0">1</ref> although this has evolved with new RBLs and requirements. Some processing is required for most entries to be written to these tables, for example domains are extracted from URLs, as are the TLD and suffix. This means we can get a more consistent view across all of our RBLs, coping with those which provide different fields in different formats, or use slightly different terminology.</p><p>Note that each time we read from a feed we add new entries rather than updating existing rows. This means that there will be duplicate entries when an entity is reported by an RBL for multiple days. This is also true for RBLs which report on URLs, and so may have the same domain multiple times. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>APWG.EU Tech 2023, June 21-22, 2023, Dublin, IE sion.lloyd@icann.org (S. Lloyd); carlos.ganan@icann.org (C. Hernandez-Gañán); samaneh.tajali@icann.org (S. Tajalizadehkhoob) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Number of unique domains seen over a fixed period of time</figDesc><graphic coords="3,89.29,84.19,416.69,126.06" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Volume over time for a single RBL along with the number of additions and deletions</figDesc><graphic coords="5,89.29,84.19,203.36,118.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Histogram of entry ages for a single RBL, note the log scale</figDesc><graphic coords="5,89.29,242.02,203.36,128.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Statuses of a sample of domains for two RBLs</figDesc><graphic coords="5,302.62,253.32,203.36,146.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,89.29,84.19,416.70,184.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>RBL Data Schema</figDesc><table><row><cell>Column Name</cell><cell>Type</cell><cell>Notes</cell></row><row><cell>report_date</cell><cell>date</cell><cell>Some RBLs tell us, for others it's when we read that RBL.</cell></row><row><cell>domain</cell><cell>text</cell><cell>Stripped domain name</cell></row><row><cell>feed</cell><cell>text</cell><cell>Which source it came from</cell></row><row><cell>reason</cell><cell>text</cell><cell>Threat type - spam, phishing, etc.</cell></row><row><cell>full_identifier</cell><cell>text</cell><cell>Some RBLs give URLs or include subdomains.</cell></row><row><cell>score</cell><cell>int</cell><cell>Some RBLs give a confidence score</cell></row><row><cell>suffix</cell><cell>text</cell><cell>Suffix according to the public suffix list</cell></row><row><cell>tld</cell><cell>text</cell><cell>Top-level domain</cell></row><row><cell>tld_type</cell><cell>text</cell><cell>country code (CC) or generic (gTLD) top-level domain</cell></row><row><cell>registrar</cell><cell>text</cell><cell>If known</cell></row><row><cell>reg_id</cell><cell>int</cell><cell>Registrar ID, if known</cell></row><row><cell>seen_since</cell><cell>timestamp</cell><cell>Initial report_date</cell></row><row><cell>url_shortener</cell><cell>boolean</cell><cell>Is it a known URL shortener (e.g. bit.ly); won't be reliable</cell></row><row><cell>sub_feed</cell><cell>text</cell><cell>Some RBLs aggregate other sources, if this is the case the original source will be here</cell></row><row><cell>notes</cell><cell>text</cell><cell>Any other info the RBL gave that might be useful. Will depend on the RBL</cell></row><row><cell>dga</cell><cell>boolean</cell><cell>Is the entry known to be from a domain generation algorithm</cell></row><row><cell>ip</cell><cell>boolean</cell><cell>Is the entry an IP address</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>McMillan</surname></persName>
		</author>
		<ptr target="http://sunsite.uakom.sk/sunworldonline/swol-12-1997/swol-12-vixie.html" />
		<title level="m">What will stop spam?</title>
				<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><surname>ICANN</surname></persName>
		</author>
		<ptr target="https://www.icann.org/octo-ssr/daar" />
		<title level="m">Domain abuse activity reporting</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><surname>Spamhaus</surname></persName>
		</author>
		<ptr target="https://www.spamhaus.org/statistics/tlds/" />
		<title level="m">The World&apos;s Most Abused TLDs</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><surname>Interisle Consulting Group</surname></persName>
		</author>
		<ptr target="https://interisle.net/PhishingLandscape2022.pdf" />
		<title level="m">Phishing landscape 2022</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">David</forename><surname>Barnett</surname></persName>
		</author>
		<ptr target="https://circleid.com/posts/20230112-the-highest-threat-tlds-part-1" />
		<title level="m">The highest threat tlds -part 1</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Bayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nosyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hureau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fernandez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Paulovics</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Duda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Korczyński</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2212.08879</idno>
		<title level="m">Study on domain name system (DNS) abuse: Technical report</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">COMAR: classification of compromised versus maliciously registered domains</title>
		<author>
			<persName><forename type="first">S</forename><surname>Maroofi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Korczyński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesselman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ampeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Duda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE European Symposium on Security and Privacy (EuroS&amp;P)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="607" to="623" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Cybercrime after the sunrise: A statistical analysis of DNS abuse in new gTLDs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Korczyński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wullink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tajalizadehkhoob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">C</forename><surname>Moura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Noroozian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bagley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesselman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 on Asia Conference on Computer and Communications Security</title>
				<meeting>the 2018 on Asia Conference on Computer and Communications Security</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="609" to="623" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Herding vulnerable cats: a statistical approach to disentangle joint responsibility for web security in shared hosting</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tajalizadehkhoob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Van Goethem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Korczyński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Noroozian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Böhme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Moore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Joosen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Eeten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security</title>
				<meeting>the 2017 ACM SIGSAC Conference on Computer and Communications Security</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="553" to="567" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Shades of grey: On the effectiveness of reputation-based &quot;blacklists&quot;</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Jahanian</surname></persName>
		</author>
		<idno type="DOI">10.1109/MALWARE.2008.4690858</idno>
	</analytic>
	<monogr>
		<title level="m">2008 3rd International Conference on Malicious and Unwanted Software (MALWARE)</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="57" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Characterization of blacklists and tainted network traffic</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chivukula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Passive and Active Measurement</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Roughan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Chang</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="218" to="228" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Assessing the effectiveness of domain blacklisting against malicious DNS registrations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Vissers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Janssen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Joosen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Desmet</surname></persName>
		</author>
		<idno type="DOI">10.1109/SPW.2019.00045</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Security and Privacy Workshops (SPW)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="199" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">BLAG: Improving the accuracy of blacklists</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ramanathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mirkovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.14722/ndss.2020.24232</idno>
		<ptr target="https://par.nsf.gov/biblio/10205652" />
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>NDSS</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Paint it black: Evaluating the effectiveness of malware blacklists</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kührer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rossow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Holz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research in Attacks, Intrusions and Defenses: 17th International Symposium, RAID 2014</title>
				<meeting><address><addrLine>Gothenburg, Sweden</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">September 17-19, 2014</date>
			<biblScope unit="page" from="1" to="21" />
		</imprint>
	</monogr>
	<note>Proceedings 17</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Reading the tea leaves: A comparative analysis of threat intelligence</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dunn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pearce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>McCoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Voelker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Savage</surname></persName>
		</author>
		<ptr target="https://www.usenix.org/conference/usenixsecurity19/presentation/li" />
	</analytic>
	<monogr>
		<title level="m">28th USENIX Security Symposium (USENIX Security 19)</title>
				<meeting><address><addrLine>Santa Clara, CA</addrLine></address></meeting>
		<imprint>
			<publisher>USENIX Association</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="851" to="867" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Taster&apos;s choice: A comparative analysis of spam feeds</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pitsillidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kanich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Voelker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Levchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Savage</surname></persName>
		</author>
		<idno type="DOI">10.1145/2398776.2398821</idno>
		<ptr target="https://doi.org/10.1145/2398776.2398821" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 Internet Measurement Conference, IMC &apos;12</title>
				<meeting>the 2012 Internet Measurement Conference, IMC &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="427" to="440" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Mis-shapes, mistakes, misfits: An analysis of domain classification services</title>
		<author>
			<persName><forename type="first">P</forename><surname>Vallina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Le Pochat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Feal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paraschiv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gamba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Burke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hohlfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tapiador</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vallina-Rodriguez</surname></persName>
		</author>
		<idno type="DOI">10.1145/3419394.3423660</idno>
		<ptr target="https://doi.org/10.1145/3419394.3423660" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM Internet Measurement Conference, IMC &apos;20</title>
				<meeting>the ACM Internet Measurement Conference, IMC &apos;20<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="598" to="618" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<ptr target="https://dgarchive.caad.fkie.fraunhofer.de/welcome/" />
		<title level="m">DGArchive website</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>Fraunhofer</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
