<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Heuristic Algorithms for Influence Maximization in Partially Observable Social Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sebastian</forename><surname>Stein</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Soheil</forename><surname>Eshghi</surname></persName>
							<email>soheil.eshghi@yale.edu</email>
							<affiliation key="aff1">
								<orgName type="institution">Yale University</orgName>
								<address>
									<settlement>New Haven</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Setareh</forename><surname>Maghsudi</surname></persName>
							<email>setareh.maghsudi@yale.edu</email>
							<affiliation key="aff1">
								<orgName type="institution">Yale University</orgName>
								<address>
									<settlement>New Haven</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leandros</forename><surname>Tassiulas</surname></persName>
							<email>leandros.tassiulas@yale.edu</email>
							<affiliation key="aff1">
								<orgName type="institution">Yale University</orgName>
								<address>
									<settlement>New Haven</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rachel</forename><forename type="middle">K E</forename><surname>Bellamy</surname></persName>
							<email>rachel@us.ibm.com</email>
							<affiliation key="aff2">
								<orgName type="institution">IBM T.J. Watson Research</orgName>
								<address>
									<settlement>Yorktown Heights</settlement>
									<region>NY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nicholas</forename><forename type="middle">R</forename><surname>Jennings</surname></persName>
							<email>n.jennings@imperial.ac.uk</email>
							<affiliation key="aff3">
								<orgName type="institution">Imperial College London</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Heuristic Algorithms for Influence Maximization in Partially Observable Social Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">843D565917D9E6B0E40B755CD3A93239</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We consider the problem of selecting the most influential members within a social network, in order to disseminate a message as widely as possible. This problem, also referred to as seed selection for influence maximization, has been under intensive investigation since the emergence of social networks. Nonetheless, a large body of existing research is based on the assumption that the network is completely known, whereas little work considers partially observable networks. Yet, due to many issues including the extremely large size of current networks and privacy considerations, assuming full knowledge of the network is rather unrealistic. Despite this, an influencer often wishes to distribute its message far beyond the boundaries of the known network. In this preliminary study, we propose a set of novel heuristic algorithms that specifically target nodes at this boundary, in order to maximize influence across the whole network. We show that these algorithms outperform the state of the art by up to 38% in networks with partial observability.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Today's vast online social networks connect people across geographical, cultural and organizational boundaries. This offers a tremendous opportunity for quickly disseminating a message to a global audience with only a limited budget. This could be exploited for product marketing, but also for quickly mobilizing large crowds to help during humanitarian crises <ref type="bibr" target="#b14">[15]</ref>, for collecting data through participatory sensing <ref type="bibr" target="#b22">[23]</ref> or for supporting machine intelligence through human computation and crowdsourcing <ref type="bibr" target="#b19">[20]</ref>. However, tapping into this vast distributed processing and information resource requires organizations to effectively disseminate a message (e.g., a call for action or participation) far beyond their usual domain of direct influence (e.g., their existing customers or social media followers). This can be achieved by asking key individuals to spread the message to contacts within their respective social networks, ideally creating far-reaching cascades of repeated messages.</p><p>How to choose such key individuals within a large network has been widely studied in the computer science literature and is typically referred to as influence maximization (IM) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b2">3]</ref>. This body of work provides models that allow the prediction of network spread, given a fully known network, enabling those who wish to have their messages reach the most people, e.g., advertisers, marketers or public health professionals, to choose the best few people to target with their initial messaging.</p><p>Although IM has attracted a great deal of attention in the past few years, the majority of existing work focuses on known networks, rather than the real-world situation where the full network is not known. In essence, little work considers partially observable networks, where only part of a network is visible to the decision maker. This is especially relevant, for instance, when an organization may be aware of its own members' network links, but has little information about the wider network.</p><p>In this paper, we take the first steps towards addressing this problem. Specifically, we survey the state of the art and propose a new model for IM in partially observable networks. Furthermore, we propose a number of heuristic algorithms that target nodes at the boundary of the known network and, through an empirical evaluation on a realworld network, we show that these can outperform the current state of the art by up to 38% in settings with very limited observability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>In the following we first introduce the general influence maximization problem in fully observable settings and then we detail work that has looked at partially observable settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">IM in Fully Observable Networks</head><p>In the influence maximization problem, a (fully observable) network is specified along with an activation function that describes the spread of influence. Initially, no individuals in this network are activated, i.e., influenced by the message. Given this, the decision maker picks a fixed number of individuals as seeds. These become activated and spread the message, i.e., activate their neighbors, according to the specified activation function. The aim of the decision maker is to maximize the total number of activated individuals <ref type="bibr" target="#b9">[10]</ref>.</p><p>While solving this problem is NP-hard in general, there are several computationally efficient approximate solutions with performance guarantees <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b2">3]</ref>. Some of these scale to realistic-sized networks with millions of nodes or more.</p><p>This work focuses on partially observable networks, and hence we avoid a detailed discussion of the state-of-the-art IM schemes for fully observable networks. Nonetheless, we describe the state-of-the-art IM algorithms we use in our numerical analysis.</p><p>Currently, the most successful IM algorithms with theoretical performance guarantees <ref type="bibr" target="#b0">[1]</ref> are TIM and TIM+ <ref type="bibr" target="#b18">[19]</ref>, and IMM <ref type="bibr" target="#b17">[18]</ref>. These algorithms are based on the concept of reverse reachable sets <ref type="bibr" target="#b2">[3]</ref>, defined below.</p><p>Definition 1 (Reverse Reachable Set). Let v be a node in some graph G, and g be a graph obtained by removing each edge e in G with probability 1 − p(e). The reverse reachable (RR) set for v in g is the set of nodes in g that can reach v. That is, for each node u in the RR set, there is a directed path from u to v in g.</p><p>Theoretically, it can be shown that if an RR set generated for v overlaps with a node set S with probability ρ, then when we use S as the seed set to run an influence propagation process on G, v will be activated with probability ρ.</p><p>These algorithms first determine how many RR sets need to be generated to provide statistical guarantees for the IM problem. Choosing too many RR sets increases the computational complexity, while choosing too few RR sets affects the algorithm's performance guarantees. TIM/TIM+ predetermine the required number of RR sets and then generate them, while IMM produces RR sets one after another until a certain stopping condition is satisfied. Once the RR sets are generated, a greedy heuristic for the max-cover problem derived from these RR sets is applied to select the seeds, which is guaranteed to at least be within a factor (1 − 1 e ) of the optimal case.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">IM in Partially Observable Networks</head><p>Influence maximization under uncertainty corresponds to a general class of problems, where given limited network information and a fixed budget, the desire is to allocate the budget to a set of seeds, so that the spread of influence is maximized. In particular, the probability of influence among each pair of nodes and/or the network structure might be (partially) unknown. This problem has been investigated in the literature in two general settings:</p><p>One shot: Here, seed selection is performed only once given the available information. For example, robust influence maximization <ref type="bibr" target="#b4">[5]</ref> addresses uncertainty in the edge influence probability, which is represented as an interval in which the true probability may lie. Its goal is to maximize the worst-case ratio between the influence spread of the chosen seeds and the optimal set of seeds.</p><p>Repeated: Here, seeds are selected incrementally, so that sampling/learning approaches can be used to acquire information about the a priori unknown propagation characteristics. These work can be classified into two sub-groups:</p><p>-Bandit-based approaches: In <ref type="bibr" target="#b15">[16]</ref>, a combinatorial bandit approach is used to model IM, where the agent selects a set of nodes as seeds at every round, and incurs a regret due to not selecting the optimal set. Existing regret-minimizing algorithms are used to solve the problem. In another bandit-based IM method <ref type="bibr" target="#b10">[11]</ref>, the seed selection scheme has two phases: (i) Exploration using a conventional influence maximization algorithm and (ii) Exploitation using a bandit algorithm. In influence maximization with semi-bandit feedback <ref type="bibr" target="#b21">[22]</ref>, upon activating the seeds, the learner only observes the influenced part of the network and uses an algorithm based on the well-known UCB strategy to select seeds. This latter algorithm can also accommodate a combinatorial bandit setting, since at every round a set of nodes can be selected.</p><p>-Other approaches: <ref type="bibr" target="#b8">[9]</ref> learn the diffusion probability in an information-cascade model by likelihood maximization, and develop an expectation maximization seed selection algorithm. <ref type="bibr" target="#b16">[17]</ref> investigates a scenario in which seed selection is performed under partial feedback, where at every round the result of past action(s) is not completely revealed. This creates a trade-off between delay (waiting time to see a desired partial outcome) and efficiency of seed selection in terms of spread of influence. This is somewhat similar to the trade-off in influence maximization in changing networks <ref type="bibr" target="#b23">[24]</ref>, which is between recency and accuracy of network information and efficiency of spread. Finally, <ref type="bibr" target="#b12">[13]</ref> proposes a heuristic algorithm which at every round probes a set of nodes which have a high probability of large degrees. All probed nodes are ranked based on their degree, and nodes with the highest ranks are selected as seeds.</p><p>In this work, we explore influence maximization under a type of uncertainty which has not been investigated so far. In our setting, a part (or some parts) of a network is known (e.g., individuals that belong to the decision maker's organization), while the rest is completely unobservable. We investigate the applicability of the state-of-the-art approaches, and propose new heuristic algorithms for seed selection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Model and Problem</head><p>We consider a dense network modeled by a directed graph G(V, E, w) and called an influence network:</p><p>-V = {1, 2, . . . , N } indicates the set of nodes. Each node represents a network entity, for instance an individual. G represents a social network; -E ⊆ V × V is the set of directed edges; w : E → [0, 1] is a weight function, where w(i, j) is the likelihood (possibility) of j being influenced by i. Alternatively and with a slight change of notation, w can be defined as the weight matrix.</p><p>In an influence network, initially a seed set S ⊂ V becomes activated. Every activated node causes its neighbors to become (and stay) activated according to the independent cascade model <ref type="bibr" target="#b7">[8]</ref>. In this model, an activated node i ∈ V might activate each neighbor j with probability w(i, j). In a weighted cascade model, we have w(i, j)</p><formula xml:id="formula_0">= 1 |{(k,j)∈E}| , where |{(k, j) ∈ E}| is the in-degree of j.</formula><p>It should be mentioned that in addition to the independent-and weighted cascade models, the spread of influence through a network can be described by some other models, one of them being the well-known linear threshold model. In this work, however, we confine our attention to the cascade influence propagation model.</p><p>Departing from existing models in the literature, we assume that the network is only partially observable. Specifically, we assume that the decision maker, who selects the seeds, observes only one part of the network, denoted by G = (V , E , w ). Naturally we have V ⊆ V, E ⊆ E(⊆ V × V ), and w : E → [0, 1]. Note that the range of function w includes the weights of the edges belonging to E .</p><p>One specific case of partially observable networks is the organization-partitioned model of network. In such networks, a subset of nodes, O ⊆ V, reveal their neighbors (and possibly the relevant edge-weights) to the decision makers, e.g., because they are members of the decision maker's organization. We denote this subset of nodes as fully observable nodes, while the neighbors of nodes in O (that are not also in O) are denoted as boundary nodes. We assume only nodes in O can be selected as initial seeds. Figure <ref type="figure">1</ref> provides an illustrative example of an organization-partitioned network.</p><p>Concluding this section, we define the general problem of influence maximization in partially observable networks as follows: Given a specific activation function, a seed set size M and the observable part G of a network G, it is desired to find a seed set S ⊆ V (with |S| ≤ M ), so that in the underlying network G, the total number of activated nodes through seeding S is maximized. Fig. <ref type="figure">1</ref>. Illustrative example of an organization-partitioned network, which is a spacial case of partially observable networks. The light nodes are observable. Note that there is only a weak tie between the observable and unobservable parts of the networks, which makes the influence maximization, and thereby the seed selection, particularly challenging.</p><p>In this paper, we confine our attention to heuristic-based IM (described in the following section), and leave the investigation of inference-based IM in partially observable network (where the unseen part of the network is reconstructed using generative models) for our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Heuristic-based Influence Maximization</head><p>Here the structure of the problem is such that reliable inference is not possible. This might arise, for instance, due to restricted availability of samples or limited computational capacity. In such circumstances, instead of inference, we explore simple rules for selecting seed nodes:</p><p>-Random Selection: In this case, we select the seeds simply at random. -Random with Neighbor Activation: Here, we first select the seeds at random, and then for each seed, we activate one of its neighbors. This approach is based on the friendship paradox <ref type="bibr" target="#b24">[25]</ref>, first discovered by S. L. Feld. It states that on an average basis, most people have fewer friends than their friends have. Thus, if we select some seed at random, it is beneficial to activate one of its neighbors, instead of the original node itself, due to possible larger number of connections. In this heuristic, we first rank the nodes in the known part of the network based on their degree, so that nodes with larger number of neighbors have higher ranks. Then we select node with highest rank as seeds.</p><p>Here we use the intuition that nodes with larger number of connections are very likely to be highly influential. In the settings we considered in this paper, neighbours typically had non-negligible influence on each other. However, in some realworld social networks, users may have large numbers of followers but actually exert little influence on them <ref type="bibr" target="#b3">[4]</ref>. Here, links could be pruned based on historical influence interactions, but we leave this to future work.</p><p>In the weighted version of this approach, we still rank the nodes based on their degree; however, in order to improve the influence probability in the unknown part of the network, we attach a higher weight for neighbours that are in the boundary set. For example, when w = 3, then node 6 in Figure <ref type="figure">1</ref> would have a weighted degree of 4, while node 7 would only have a weighted degree of 3 (since all its neighbours are fully observable). -State of the Art with Limited Visibility: Here we simply use the IM algorithms which are originally developed for known networks, for instance TIM and IMM described in Section 2.1. In doing so, we intend to investigate the extent to which the state of the art is applicable to partially known networks. Similar to the activation based on degree, TIM and IMM can be used in the weighted version. In essence, we run the TIM and IMM in the known part of the network as conventional; however, for selecting the RR sets, we weight the edges with connections to outside the boundary, although such connections are not completely visible to us.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental Evaluation</head><p>In order to evaluate influence maximization algorithms in partially observable networks, we implemented a comprehensive benchmarking framework, as shown in Figure <ref type="figure" target="#fig_1">2</ref> (here, dashed lines indicate the use of a probabilistic model, and solid lines indicate data flow). Briefly, the figure shows how a full network is generated, then filtered to a partially observable network. The IM algorithm then selects seed nodes (potentially using inference) and the outcome is simulated. We implemented a range of algorithms for generating synthetic networks and filtering these. However, due to the limited space, we only report on a subset of our results. In the following, we first describe our experimental setup and then outline the results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>We use the NetHept<ref type="foot" target="#foot_1">5</ref> dataset, a network of 15k nodes and 31k edges (representing citations within the high energy physics theory community). We choose this because it constitutes a real-world dataset of reasonable size and because it has been widely used to benchmark influence maximization algorithms <ref type="bibr" target="#b17">[18]</ref>.</p><p>Throughout our experiments, we will generate partially observable networks with varying numbers of fully observable nodes (i.e., nodes that are observable and whose neighbours are also observable -see Section 3). We do this by initialising the set of fully observable nodes (O) to contain a number of seed nodes, chosen uniformly at random. In our experiment, we initially choose 4 seed nodes, but our results are similar for other settings. Then, we iteratively add one additional node to O at a time, either by picking a node from O and then adding one of its neighbours that is currently not in O to O (both uniformly at random), or, when no such neighbours exist, by adding a random node to O. This continues until O contains a target number of nodes.</p><p>Again, due to space reasons, we focus on the weighted cascade model (see Section 3), although we have observed broadly similar trends more generally in the independent cascade model (for various edge weight distributions).</p><p>We evaluate most of the heuristics described in Section 3.1. Here, we use Random and Random Neighbour to denote Random Selection (without or with Neighbour Activation). Weighted(1) represents selection based on degree, while Weighted(w) is the weighted version that attaches a relative weight w to boundary nodes. Finally, IMM(1) is the state-of-the-art IMM algorithm described in Section 2, and IMM(w) similarly represents the weighted version. We choose IMM here, due to its consistently high performance compared to other state-of-the-art algorithms <ref type="bibr" target="#b0">[1]</ref>. We leave the evaluation of centrality-based heuristics and other state-of-the-art approaches in partially observable settings to future work.</p><p>To evaluate each influence maximization algorithm, we generate a partially observable network, run the algorithm on it and then simulate the influence spread on the chosen set of seeds 1,000 times to obtain an average spread. We repeat this process 1,000 times (to evaluate a wide range of partially observable networks) and present the overall average spread. The 95% confidence intervals are typically smaller than an absolute deviation of 2 from the average value, so are not plotted on the graphs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results</head><p>Figures <ref type="figure" target="#fig_3">3 and 4</ref> show the average spread of all algorithms in a setting with five seed nodes, as we vary the network visibility, i.e., the proportion of fully observable nodes (|O| / |V|). Figure <ref type="figure" target="#fig_2">3</ref> focuses on networks with particularly low observability (up to 10%), while Figure <ref type="figure" target="#fig_3">4</ref> shows the full range of proportions. Here, several interesting trends emerge. As expected, the state-of-the-art IMM algorithm performs will throughout all settings. However, looking more closely at cases with low observability (Figure <ref type="figure" target="#fig_2">3</ref>), some of the heuristic approaches outperform it. Specifically, the degree-based heuristics perform consistently well, sometimes achieving an up to 19% higher average spread than IMM. For example, when 1% of nodes are fully observable, IMM achieves an average spread of 112.12 ± 1.78 (with 95% confidence interval), while Weighted(2) achieves 133.19 ± 1.95. <ref type="foot" target="#foot_2">6</ref> This is likely because IMM focuses on maximising its influence only within the known part of the network. In settings with low visibility, this is a highly inaccurate view and the degree-based heuristics offer a better estimate of what nodes are influential across the full network. As visibility rises (also continuing in Figure <ref type="figure" target="#fig_3">4</ref>), this difference becomes less pronounced, and by 10% visibility, they achieve a similar performance. As visibility rises further, IMM(1) eventually achieves the highest performance (from 50% visibility onwards). This is not surprising, since it is known to perform well in fully observable networks.</p><p>Looking specifically at the effect of adding higher weights for boundary nodes to the heuristics, this can lead to a significant increase in the average spread. However, the performance is sensitive to the exact parameter value, and for networks with higher visibility, a high weight can indeed lead to a decrease in performance, and is most pronounced for Weighted <ref type="bibr" target="#b4">(5)</ref> in settings with 20-50% visibility. This is because the boundary nodes actually decrease in importance in the network as more of it is known to the algorithm.</p><p>Note that the relatively good performance of degree heuristics is surprising, as it goes against the distinction between influential and high-degree nodes that has been observed in some settings <ref type="bibr" target="#b3">[4]</ref>. This may be due to the fact that NetHept, as a citation network, has high-degree assortativity [2, Chapter 7], meaning that higher degree nodes are likely to be connected to other high-degree nodes and vice versa. This would mean that targeting a high-degree node can lead to influence spread via its high-degree neighbors (including within the unseen part of the network).</p><p>Promisingly, adding the weight to IMM significantly increases the average spread of IMM -up to 6% in some cases, and IMM(2) consistently performs better or as well as IMM across all settings. However, as for the degree-based heuristic, choosing a very large weight can lead to a decrease in performance for the same reasons as mentioned above.</p><p>Both random heuristics achieve a low level of performance, but using neighbour activation achieves a significantly higher spread than not using it (up to 100% more in some cases). This is potentially promising in settings where very little is known about the network, but where neighbours of known nodes can be activated. Note the apparent decrease in performance for larger networks is due to the way we filter nodes based on their neighbours -more connected nodes are likely to be included in our fully observable set earlier. Thus, when sampling a random node in a network with smaller visibility, it is more likely to be well connected, than when sampling from a network with full visibility. Finally, Figure <ref type="figure" target="#fig_4">5</ref> shows the average spread per initial seed chosen as the number of initial seeds is increased (in a setting with 1% visibility). This highlights that our heuristic techniques achieve the highest performance gains over the state of the art in settings with fewer initial seeds (a gain of up to 38% when there is just a single initial seed).</p><p>Overall, these are promising results, showing that in settings where large parts of the network are not observable and where only few seeds can be chosen, the state-ofthe-art algorithm does not necessarily perform best. Instead simple heuristics perform well, and both those heuristics and the current state of the art benefit from explicitly favouring nodes at the boundary of the known network. It should also be noted that the heuristics are several order of magnitude faster than IMM -a typical run of IMM took about 0.1-0.2 seconds, while the degree-based heuristics typically completed within 0.2-0.3ms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Future Work</head><p>In this paper, we presented initial results on influence maximization in partially observable networks. We showed that the current state of the art does not necessarily produce the best results, especially when only small parts of the network are visible. Indeed, we examined some settings where a degree-based heuristic leads to a 38% improvement over the state of the art.</p><p>In future work, we will examine these settings in more detail, to derive more principled inference-based algorithms that work well in a range of settings and that offer some performance guarantees.</p><p>We will look at influence maximization over different types of real-world networks to understand whether the relative performance of our simple heuristic depends on any parameters of the underlying network (e.g., degree assortativity).</p><p>We will then consider the case where the decision maker has access to a generative network model that can be used for inference about the structure of the unseen part of the network (conditional on the observed part) before application of IM algorithms.</p><p>Two specific examples of such generative models include:</p><p>-Sampling: An inference model is used to sample multiple snapshots of the unseen network. Across all snapshots, the average marginal influence of all nodes in V is calculated and a greedy IM approach is applied. -Maximum Likelihood: The model is used to calculate a single maximum likelihood network and an existing IM approach is applied to this.</p><p>Finally, we will investigate repeated settings, as described in Section 2.2, where seeds are selected incrementally and where information about the unknown part of the network is gradually revealed.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Benchmarking framework</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Average spread in partially observable networks for 5 seeds (up to 10% visibility).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Average spread in partially observable networks for 5 seeds (up to 100% visibility).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Average spread for varying numbers of seeds for 1% visibility.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017) August 19th, 2017 -Melbourne, Australia</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">This can be downloaded at https://www.microsoft.com/en-us/research/ wp-content/uploads/2016/02/weic-graphdata.zip. Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017) August 19th, 2017 -Melbourne, Australia</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">A t-test confirms there is a statistically significant difference at the p &lt; 0.0001 level.Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017) August 19th, 2017 -Melbourne, Australia</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copy-right notation hereon. The work of S. M. was supported by a post-doctoral fellowship from the German Research Foundation (DFG) under Grant Number MA 7111/1-1.</p><p>Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017) August 19th, 2017 -Melbourne, Australia</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Debunking the myths of influence maximization: An indepth benchmarking study</title>
		<author>
			<persName><forename type="first">A</forename><surname>Arora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Galhotra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ranu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 ACM International Conference on Management of Data</title>
				<meeting>the 2017 ACM International Conference on Management of Data</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="651" to="666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Barabási</surname></persName>
		</author>
		<title level="m">Network science</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Maximizing social influence in nearly optimal time</title>
		<author>
			<persName><forename type="first">C</forename><surname>Borgs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brautbar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chayes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lucier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms</title>
				<meeting>the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms</meeting>
		<imprint>
			<publisher>SIAM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="946" to="957" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Measuring user influence in twitter: The million follower fallacy</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Haddadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Benevenuto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Gummadi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Robust influence maximization</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="795" to="804" />
		</imprint>
	</monogr>
	<note>KDD &apos;16</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Scalable influence maximization in social networks under the linear threshold model</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 10th International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010. 2010</date>
			<biblScope unit="page" from="88" to="97" />
		</imprint>
	</monogr>
	<note>Data Mining (ICDM)</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Mining the network value of customers</title>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Richardson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the seventh ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="57" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Talk of the network: A complex systems look at the underlying process of word-of-mouth</title>
		<author>
			<persName><forename type="first">J</forename><surname>Goldenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Libai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Muller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Marketing letters</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="211" to="223" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Prediction of information diffusion probabilities for independent cascade model</title>
		<author>
			<persName><forename type="first">K</forename><surname>Satio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">N</forename><surname>Kimora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Knowledge-Based Intelligent Information and Engineering Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="67" to="75" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Maximizing the spread of influence through a social network</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kempe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Kleinberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tardos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">KDD</title>
		<imprint>
			<biblScope unit="page" from="137" to="146" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Online influence maximization</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Maniu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Senellart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="645" to="654" />
		</imprint>
	</monogr>
	<note>KDD &apos;15</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Mining knowledge-sharing sites for viral marketing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Richardson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Intl. Conf. on Knowledge Discovery and Data Mining</title>
				<meeting>the Eighth Intl. Conf. on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="61" to="70" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Influence maximization problem for unknown social networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mihara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tsugawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ohsaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</title>
				<meeting>the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1539" to="1546" />
		</imprint>
	</monogr>
	<note>ASONAM &apos;15</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">On the submodularity of influence in social networks</title>
		<author>
			<persName><forename type="first">E</forename><surname>Mossel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the thirty-ninth annual ACM symposium on Theory of computing</title>
				<meeting>the thirty-ninth annual ACM symposium on Theory of computing</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="128" to="134" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Ushahidi, or &apos;testimony&apos;: Web 2.0 tools for crowdsourcing crisis information</title>
		<author>
			<persName><forename type="first">O</forename><surname>Okolloh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Participatory learning and action</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="65" to="70" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Influence maximization with bandits</title>
		<author>
			<persName><forename type="first">S</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 International Conference on Neural Information Processing Systems</title>
				<meeting>the 2015 International Conference on Neural Information Processing Systems</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">No time to observe: Adaptive influence maximization with partial feedback</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tang</surname></persName>
		</author>
		<idno>CoRR abs/1609.00427</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Influence maximization in near-linear time: A martingale approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the 2015 ACM SIGMOD International Conference on Management of Data</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1539" to="1554" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Influence maximization: Near-optimal time complexity meets practical efficiency</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 ACM SIGMOD international conference on Management of data</title>
				<meeting>the 2014 ACM SIGMOD international conference on Management of data</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="75" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Human computation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Von Ahn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE 2008. IEEE 24th International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008. 2008</date>
			<biblScope unit="page" from="1" to="2" />
		</imprint>
	</monogr>
	<note>Data Engineering</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Scalable influence maximization for independent cascade model in large-scale social networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">545</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Influence maximization with semi-bandit feedback</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kveton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Valko</surname></persName>
		</author>
		<idno>CoRR abs/1605.06593</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Coordinating measurements for air pollution monitoring in participatory sensing settings</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zenonos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">R</forename><surname>Jennings</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems</title>
				<meeting>the 2015 International Conference on Autonomous Agents and Multiagent Systems</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="493" to="501" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Influence maximization in dynamic social networks</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Mining (ICDM), 2013 IEEE 13th International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1313" to="1318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">What makes you think you&apos;re so popular? Self-evaluation maintenance and the subjective side of the &quot;friendship paradox</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">W</forename><surname>Zuckerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Jost</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Social Psychology Quarterly</title>
		<imprint>
			<biblScope unit="page" from="207" to="223" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
