=Paper=
{{Paper
|id=Vol-2975/paper10
|storemode=property
|title=nanoHUB User Behavior: Moving from Retrospective Statistics to Actionable Behavior Analysis
|pdfUrl=https://ceur-ws.org/Vol-2975/paper10.pdf
|volume=Vol-2975
|authors=Gerhard Klimeck,Gustavo Valencia-Zapata,Nathan Denny,Lynn Zentner,Michael Zentner
|dblpUrl=https://dblp.org/rec/conf/iwsg/KlimeckVDZZ19
}}
==nanoHUB User Behavior: Moving from Retrospective Statistics to Actionable Behavior Analysis==
11th International Workshop on Science Gateways (IWSG 2019), 12-14 June 2019

Gerhard Klimeck¹, Gustavo A. Valencia-Zapata¹, Nathan Denny², Lynn K. Zentner¹, Michael G. Zentner¹,²

¹Network for Computational Nanotechnology
²Rosen Center for Advanced Computing
Purdue University, West Lafayette, IN 47907, USA
ABSTRACT

nanoHUB annually serves 17,000+ registered users with over 1 million simulations. In the past, we have used data analytics to demonstrate that nanoHUB can be a powerful scientific knowledge sharing platform. We used retrospective data analytics to show how simulation tools were used in structured education and how simulation tools were used in novel research. With the use of such retrospective analytics, we have made strategic decisions in terms of tool and content developments and justified continued nanoHUB investments by the US National Science Foundation (NSF). As we migrate towards a sustainable nanoHUB we must embrace processes similar to those pursued by platforms such as Uber or AirBnB: we need to create actionable data analytics that can rapidly support user experience and help grow the supply in the two-sided market platform; we need to improve the experience of providers as well as end-users. This paper describes some aspects of how we pursue user behavior analysis inside the virtual worlds of nanotechnology simulation tools. From such user behavior we plan to derive actionable analytics that influence user behaviors as they interact with nanoHUB.

Keywords— nanoHUB; HUBzero; science gateways; user behavior; analytics; cluster; meander; education

I INTRODUCTION AND BACKGROUND

nanoHUB is a scientific knowledge platform that has enabled over 3,500 researchers and educators to share 500+ research simulation tools and models as well as 6,000+ lectures and tutorials globally through a novel cyberinfrastructure. nanoHUB annually serves 17,000+ registered users with over 1 million simulations in an end-to-end user-oriented scientific computing cloud. Over 1.5 million visitors access the openly available web content items annually. These might be considered impressive summative numbers, but they do not address whether the site has any impact or what these users are doing.

Understanding these numbers requires some background on the original intentions and cyberinfrastructure developments around nanoHUB. Fundamental issues raised by peer reviewers were the perceived ability of a university project to provide a stable, national-level infrastructure, provide support for the offered services, and provide compute cycles for an ever-growing user base.

From the very beginning in 1996 [1], the predecessor to nanoHUB, called the Purdue Network Computing Hub (PUNCH), was created to enable researchers to share their code without re-writes through novel web interfaces with end-users in education and research. PUNCH was so novel that even the web server had to be created within the team. By 2004 the standard web-form interfaces were antiquated and did not inspire the interactive exploration of simulation results with rapid "What if?" questions that users might have. Users had to download their simulation data to manipulate them in a form where they could be truly used. nanoHUB was not an end-to-end usage platform. It became clear that the system had to be revamped to enable the hosting of user-friendly, engineering-inspired interactive applications. Such interactive sessions had to be hosted in a reliable, scalable middleware that was running in production mode, not as a research paper demonstration. 3D dataset exploration had to be supported on remote, dedicated GPUs that deliver the results to end users.

RAPPTURE, the Rapid APPlication infrastrucTURE toolkit [2], enabled researchers, who typically did not have any graphical user interfaces to their codes, to describe the inputs and outputs of their codes in XML and to generate a GUI. New middleware [3] enabled 1,000+ users to be hosted simultaneously on a moderate cluster of about 20 compute nodes. A novel remote GPU-based visualization system [4] supported hundreds of simultaneous sessions. nanoHUB established the first community accounts on TeraGrid and OSG, which would execute heavy-lifting nanoHUB simulation jobs completely transparently on behalf of users who had no accounts on these grid platforms [5]. We developed processes [6] to continually test the reliability of these remote grid services to ensure smooth user services. For application support we developed policies and operational infrastructure that enabled tool contributors to support and improve their tools through question & answer forums and through wishlists. As this novel infrastructure emerged in 2005 we observed rapid growth in the simulation user base from the historical numbers of 500 annual users to over 10,000 in a few years. As
Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
questions of technical feasibility were addressed, new questions as to actual and potential impact emerged.

Early on, our peer reviewers raised fundamental questions of whether such research-based simulation tools could be used by other researchers at all, and whether these tools could be used in education without specific customizations. The nanoHUB team developed analytics that documented nanoHUB use in research through reference and citation searches in the scientific literature. Today we can document over 2,200 papers that cite nanoHUB, and we keep track of the used resources and tools to provide attribution to the published tools. When we showed the first 200 formal citations our peers remained unconvinced that this could be good research. We then began to track secondary citations, which today sum to over 30,000, resulting in an h-index of 82.

Our peers had a similarly strong opinion that research tools could not be used in education. We therefore developed novel clustering algorithms [7] that documented systematic use of simulation tools in formal education settings. Today we can show that over 35,000 students in over 1,800 classes at over 180 institutions have used nanoHUB in formalized education settings. We could also measure the time-to-adoption between tool publication and first-time systematic use in a classroom. The median time was determined to be less than 6 months.

From the analysis of research use and education use we can begin to qualify the attributes of the underlying simulation tools. We found significant use in education and in research for many of the nanoHUB tools. These research and education impact studies are documented in detail in Nature Nanotechnology [8].

We used retrospective data analytics to show how simulation tools were used in structured education and how simulation tools were used in novel research. We showed that the transition from research tool publication to adoption in the classroom happens rapidly, typically in less than six months, and demonstrated through longitudinal data how research tools migrate into education. With the use of these retrospective analytics, we have made strategic decisions in terms of tool and content developments and justified continued investments by NSF into nanoHUB.

As we migrate towards a sustainable nanoHUB we must embrace processes similar to those pursued by platforms such as Uber or AirBnB: we need to create actionable data analytics that can rapidly support user experience and help grow the supply in the two-sided market platform; we need to improve the experience of providers as well as end-users.

II RESEARCH QUESTIONS

Beyond raw numbers of users and simulations, we have over the years continued to ask ourselves: How do users behave in their virtual world of a simulation tool? More specifically:
- How do they "travel" through the design/exploration world?
- How many individual simulations do they run within one session?
- How many parameters do users change?
- How differently do researchers, classroom users, and self-study users behave?
- How differently do different classes behave?
- Does different class instruction material / scaffolding make a difference?
- Can we provide feedback to instructors on their classrooms?
- Given certain usage patterns inside the tool, can we improve the tools and provide feedback to the developers?

There are a variety of requirements that need to be met to address some of these questions in a scalable infrastructure, such as:
- Storage/availability of individual simulation runs within user sessions
- A data description language that is shared across different tools
- A large set of simulation runs and participants
- Other user data such as classroom participation, researcher identification, geolocation, etc.

In the next sections we describe some of our first results that begin to address some of these questions.

For our initial study presented here we focus on the user behavior for PN Junction Lab [9], which is consistently one of the top 10 nanoHUB tools [10] within any year. Despite our codename pntoy, the tool is powered by an industrial-strength semiconductor device modeling tool called PADRE [11]. Instead of learning the complex PADRE input language that involves gridding, geometry, material, and environmental specifications, users can easily ask "What if?" questions in a toy-like fashion.

III SEARCHERS AND WILDCATTERS

RAPPTURE provides a rather generic description of simulation tool inputs and outputs. Over 90% of the 500+ nanoHUB simulation tools utilize RAPPTURE as their data description language. With existing simulation logs we can now begin to study the user behavior inside simulation tools. Each simulation tool typically consists of 10 to 50 parameters that are exposed to the users. Most of these parameters are freeform numbers such as length, doping, effective mass, dielectric constant, temperature, etc., with their specific units, while there is also a significant set of discrete options such as model or geometry choices. Assuming that each parameter might have just 10 reasonable choices, each tool spans a configurational design space of at least 10^10 to 10^50 combinations. The dimensionality of these tools is clearly too large to be intuitively understood.

We developed a visualization methodology [12] to flatten an N-dimensional space into 2 dimensions. Figure 1 shows the conceptual mapping and shows two significantly different user behaviors: a searcher, who moves through the design space in subsequent steps that appear to indicate a method or a goal, and a wildcatter, who modifies, apparently wildly, the same set of parameters and appears to jump throughout the design space.
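The design-space estimate above is simple combinatorics; the following sketch spells out the arithmetic under the same assumption made in the text (roughly 10 reasonable choices per exposed parameter):

```python
# Back-of-the-envelope size of a simulation tool's configurational design
# space, assuming ~10 reasonable choices per exposed parameter (as assumed
# in the text). The exact choice counts per tool are unknown here.

def design_space_size(num_parameters, choices_per_parameter=10):
    """Number of distinct input combinations a tool exposes."""
    return choices_per_parameter ** num_parameters

# A small tool with 10 parameters versus a large one with 50:
small_tool = design_space_size(10)   # 10^10 combinations
large_tool = design_space_size(50)   # 10^50 combinations
print(small_tool, large_tool)
```

Even the small-tool case already yields ten billion combinations, which is why the text concludes that these spaces cannot be explored or understood exhaustively.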
Within the same publication we also documented the development of a "Searchiness" index that assigns a single value to the degree a user behaves like a prototypical wildcatter (Searchiness=0) or a prototypical searcher (Searchiness=1).

Figure 1: (a) Visual representation of a multidimensional space in two dimensions. (b) A prototypical searcher. (c) A prototypical wildcatter.

In this paper we show the analysis of a whole user population using a specific tool and fuse that data set with specific classroom users.

IV CLASSROOM CLUSTERS

To demonstrate our ability to fuse different data sets from our datastore we pick two different class clusters with significantly different characteristics, as depicted in Figure 2. Class C12 is a class that reoccurred 15 times between 2008 and 2018. We have pntoy simulation data from 7 classes within 2014 to 2018 for 109 users who ran 180 sessions. Historically, we do not have the simulation data from all users in that time frame. Going forward, we have developed a simulation caching system where all Rappture simulations are stored and users will receive stored solutions if they exist. The cluster view in Figure 2 shows a subset, the individual class held in the fall of 2015 with 40 students who ran 80 sessions.

C12 only uses pntoy. Class C16 uses 6 different tools throughout a semester. pntoy is one of these 6 tools, used by 20 users in 29 sessions. The visual cluster representation in Figure 2 clearly shows the temporal behavior of 7 users who have used all 6 tools in the class. In the next section we will compare the behavior in these classes against all available data and against all self-study users within the same region (Texas).

Figure 2: Visual representation of two temporal usage patterns in two different classes. The horizontal axis represents time in units of days. The vertical axis stacks different users within a cluster. 40 users in C12 of fall 2015 use pntoy. C12 only uses pntoy. Class C16 uses 6 different tools throughout a semester. pntoy is one of these tools, used by 20 users. 7 users utilize all 6 tools as depicted.

V USER BEHAVIOR DISTRIBUTIONS

Figure 3 shows the Searchiness distribution of all 2,747 geo-located simulation sessions of pntoy by 1,865 users in the time frame of 2014-2018. The complete distribution of all runs shows clear peaks around 0 (wildcatters) and 1 (searchers). Class cluster C12 is a subset of all the available data, consisting of 40 users with 80 sessions. This class uses only a single tool. Wildcatter behavior appears to dominate class C12. In contrast, the smaller class that uses in total 6 tools, including pntoy with 20 users and 28 sessions, shows a distribution that seems to indicate more searchers.

Finally, we look at a third population within the users. These are all the geo-located users in Texas (the location of C12 and C16) who have not been identified as participants in
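The actual Searchiness definition is given in [12] and is not reproduced in this paper; the sketch below is a hypothetical stand-in that captures the intuition only. The session format (a list of parameter dictionaries, one per run) and the scoring rule (fraction of parameters left unchanged between consecutive runs, averaged over the session) are assumptions, not the published index:

```python
# Hypothetical Searchiness-style score (NOT the published definition from
# [12]). Intuition: a "searcher" changes few parameters between consecutive
# runs, while a "wildcatter" re-randomizes many parameters at once. We score
# each step by the fraction of parameters left unchanged and average over
# the session, yielding a value in [0, 1].

def searchiness(runs):
    """Score a session, given as a list of parameter dicts (one per run).

    Returns None for sessions with fewer than 4 runs, mirroring the paper's
    minimum of 4 queries needed to define Searchiness.
    """
    if len(runs) < 4:
        return None
    step_scores = []
    for prev, curr in zip(runs, runs[1:]):
        params = set(prev) | set(curr)
        unchanged = sum(1 for p in params if prev.get(p) == curr.get(p))
        step_scores.append(unchanged / len(params))
    return sum(step_scores) / len(step_scores)

# A methodical sweep of one parameter scores high (searcher-like): here 2/3,
# since 2 of 3 parameters are unchanged at every step.
sweep = [{"doping": d, "length": 1.0, "temp": 300} for d in (1e15, 1e16, 1e17, 1e18)]
# Changing every parameter at every step scores 0 (wildcatter-like).
jumps = [{"doping": 10 ** i, "length": i, "temp": 100 * i} for i in (1, 5, 2, 8)]
print(searchiness(sweep), searchiness(jumps))
```

Any production metric would additionally need to normalize continuous parameters by their ranges so that small tweaks and large jumps in a single value are distinguished, which this sketch deliberately ignores.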
any classes. We title this group of 20, who ran 36 simulation sessions in pntoy, as "self-study" users. These users show yet a different distribution of Searchiness compared to the other populations.

Next to Searchiness, which is a computed model metric, we can also look at a simple raw number: the number of queries each individual has performed within a single tool session. Within each tool session a user can execute the tool multiple times and compare results, as visualized in Figure 1. Figure 4 shows the normalized distribution of queries executed by the 4 different populations we examined in Figure 3. The number of queries does not reveal much information except that the overall population runs more queries than the 2 Texas classes and the Texas self-study users. The classes and self-study users show a rather strong drop-off beyond 4 queries, which is the minimal number of queries needed to define Searchiness. Initial analysis does not seem to indicate a strong correlation between Searchiness and number of queries.

Figure 3: Normalized Searchiness density for four nanoHUB populations. All pntoy runs contain 1,865 users with 2,745 sessions. C12 Texas contains simulation data from 109 users running pntoy in 180 sessions in 7 classes from 2014-2018 (we do not have the simulation data of all users in those classes). C16 is a class that occurred once in the spring of 2015 and uses 6 tools; 20 users utilized 28 sessions. The self-study user population is all 20 geo-located users in Texas who ran 36 sessions and were not associated with a formal class in the time frame of 2014-2018.

VI CONCLUSION

We report the development of a nanoHUB infrastructure that begins to enable the study of user behavior in virtual worlds of simulation tools. We use the previously published model index Searchiness and compute it for a complete data set of simulation sessions within a specific tool. We fuse data sets of class cluster identification with the model index Searchiness and number of queries. No surprising results are seen or critical insight gained at this stage. We observe in the data that different user populations appear to behave differently in terms of Searchiness, and classes seem to appear similar in terms of number of queries. At this stage the data opens new vectors for questions such as:
- Do all single-tool classes have similar behavior?
- Do classes with more diverse tool use or better scaffolding foster more search-like behavior?
- Can similar behavior differences be seen with the other tools that are used in classes?
- What does a peak in Searchiness value of 0.5 mean? Do we need to refine the Searchiness index?
- Do we need to identify other behavioral metrics in addition to Searchiness?
- Do the users who use other nanoHUB material outside the tools behave differently than the ones that use tools only?

We conclude that this work is a first demonstrator that indicates that we can assess the simulation behavior of different user populations inside nanoHUB. We plan to refine these metrics and classifiers to gain more insights on the user behavior, and ultimately influence their behavior during use.
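The raw query-count metric compared across populations in Figure 4 is straightforward to tabulate from session logs. A minimal sketch, assuming sessions have already been summarized as one query count per session (the input format is hypothetical, not nanoHUB's actual log schema):

```python
# Normalized query-count distribution for a user population, as a sketch of
# the tabulation behind Figure 4. Input format (a flat list of per-session
# query counts) is an assumption for illustration.
from collections import Counter

def normalized_query_distribution(session_query_counts):
    """Map number-of-queries-per-session -> fraction of sessions."""
    counts = Counter(session_query_counts)
    total = len(session_query_counts)
    return {n: c / total for n, c in sorted(counts.items())}

# Illustrative (made-up) data: a class population dropping off quickly above
# the 4-query minimum, versus self-study users with a heavier tail.
class_sessions = [4, 4, 5, 4, 6, 4, 5, 4]
self_study_sessions = [4, 7, 12, 5, 4, 9, 4, 20]
print(normalized_query_distribution(class_sessions))
print(normalized_query_distribution(self_study_sessions))
```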
ACKNOWLEDGEMENTS
Funding by the US National Science Foundation under Grant
Nos. EEC-0228390, EEC-0634750, OCI-0438246, OCI-
0721680, and EEC-1227110 as well as Purdue University is
gratefully acknowledged.
Figure 4: Normalized number of queries for the four nanoHUB user populations described in Figure 3.

REFERENCES
[1] N.H. Kapadia, J.A.B. Fortes, M.S. Lundstrom, "The Semiconductor Simulation Hub: A network-based microelectronics simulation laboratory", Proceedings of the Twelfth Biennial University/Government/Industry Microelectronics Symposium, 1997, IEEE Xplore; DOI: 10.1109/UGIM.1997.616686
[2] Michael McLennan (2005), "Add Rappture to Your Software Development - Learning Module," https://nanohub.org/resources/240.
[3] Michael McLennan, Rick Kennell, "HUBzero: A Platform for Dissemination and Collaboration in Computational Science and Engineering", IEEE Computing in Science and Engineering 12(2):48-53, 2010; DOI: 10.1109/MCSE.2010.41
[4] Wei Qiao, Michael McLennan, Rick Kennell, David Ebert, Gerhard Klimeck, "Hub-based Simulation and Graphics Hardware Accelerated Visualization for Nanotechnology Applications", IEEE Transactions on Visualization and Computer Graphics, Vol. 12, Issue 5, pp. 1061-1068, Sept.-Oct. 2006; DOI: 10.1109/TVCG.2006.150
[5] Gerhard Klimeck, Michael McLennan, Sean Brophy, George Adams III, Mark Lundstrom, "nanoHUB.org: Advancing Education and Research in Nanotechnology", IEEE Computing in Science and Engineering (CiSE), Vol. 10, Issue 5, pp. 17-23, Sept.-Oct. 2008; DOI: 10.1109/MCSE.2008.120
[6] Lynn Zentner, Steven Clark, Krishna Madhavan, Swaroop Shivarajapura, Victoria Farnsworth, Gerhard Klimeck, "Automated Grid-Probe System to Improve End-To-End Grid Reliability for a Science Gateway", Proceedings of the TeraGrid 2011 Conference, July 18-21, 2011, Salt Lake City, ACM, ISBN: 978-1-4503-0888-5; DOI: 10.1145/2016741.2016789
[7] Michael Zentner, Nathan Denny, Krishna Madhavan, Swaroop Samek, George Adams III, Gerhard Klimeck, "Using Automatic Detection and Characterization to Measure Educational Impact of nanoHUB", Proceedings of the 13th Gateway Computing Environments Conference, September 25-27, 2018, Austin, TX
[8] Krishna Madhavan, Michael Zentner, Gerhard Klimeck, "Learning and research in the cloud", Nature Nanotechnology 8, 786-789 (2013); DOI: 10.1038/nnano.2013.231
[9] Dragica Vasileska, Matteo Mannino, Michael McLennan, Xufeng Wang, Gerhard Klimeck, Saumitra Raj Mehrotra, Benjamin P. Haley (2014), "PN Junction Lab," https://nanohub.org/resources/pntoy; DOI: 10.21981/D3GH9B95N.
[10] https://nanohub.org/usage/tools provides nanoHUB tool listings ranked by various criteria, such as number of users, number of simulations, wall clock time, etc.
[11] Mark R. Pinto, Kent Smith, Muhammad Alam, Steven Clark, Xufeng Wang, Gerhard Klimeck, Dragica Vasileska (2014), "Padre," https://nanohub.org/resources/padre; DOI: 10.21981/D30C4SK7Z.
[12] Nathan Denny, Gerhard Klimeck, Michael Zentner, "Visualizing User Interactions with Simulation Tool", Proceedings of the 13th Gateway Computing Environments Conference, September 25-27, 2018, Austin, TX