=Paper=
{{Paper
|id=None
|storemode=property
|title=FreePub: Collecting and Organizing Scientific Material Using Mindmaps
|pdfUrl=https://ceur-ws.org/Vol-698/paper11.pdf
|volume=Vol-698
|dblpUrl=https://dblp.org/rec/conf/swat4ls/DalamagasFMH10
}}
==FreePub: Collecting and Organizing Scientific Material Using Mindmaps==
(or ...where mindmaps meet web search)
Theodore Dalamagas (1), Tryfon Farmakakis (2), Manolis Maragkakis (3), and
Artemis G. Hatzigeorgiou (3)
(1) IMIS Institute/“Athena” R.C., Athens, Greece
dalamag@imis.athena-innovation.gr
(2) University of Edinburgh, School of Informatics, Edinburgh, Scotland
T.Farmakakis@sms.ed.ac.uk
(3) BSRC Fleming, Vari, GR
(maragkakis,artemis)@fleming.gr
Abstract. This paper presents a creativity support tool, called FreePub,
to collect and organize scientific material using mindmaps. Mindmaps
are visual, graph-based representations of concepts, ideas, notes, tasks,
etc. They generally take a hierarchical or tree branch format, with ideas
branching into their subsections. FreePub supports creativity cycles. A
user starts such a cycle by setting up her domain of interest using
mindmaps. Then, she can browse mindmaps and launch search tasks
to gather relevant publications from several data sources. Besides
publications, FreePub identifies helpful supporting material (e.g., blog posts,
presentations). All information retrieved by FreePub can be imported
and organized in mindmaps. FreePub has been fully implemented on top
of FreeMind, a popular open-source mindmapping tool.
1 Introduction
Web search engines are widely used for searching information on the Web. Their
increased popularity is due to the following reasons: the search model employed
(i.e., keyword-based) is simple and easy to use, and the search techniques are
nowadays mature enough to support fast text retrieval with accurate results.
However, there are use cases where the information need is complex. Consider,
for instance, a researcher who needs to set up her research agenda and
generate innovative ideas. She often has the “big picture” of the domain, i.e.,
an abstraction based on topics, thoughts, and everything else that helps her set
up a search plan to explore the domain. Based on this initial abstraction, she
(a) gathers information from several data sources, (b) organizes the information,
(c) generates hypotheses and scientific results, (d) disseminates those results, and
then (e) starts over by refining her abstraction and search plan. Such a creativity
cycle enables discovery and innovation.
To illustrate an example of a creativity cycle, consider a researcher interested
in sequence matching techniques for genomics, and the following use case:
1. The researcher starts by looking for journal papers that make a thorough
review of this particular research area (i.e., the so-called survey papers), and
blog articles that provide a review of the current state-of-the-art
technologies.
2. After organizing and studying the retrieved material, she pays more attention
to the local alignment problem, that is “given a query sequence and a data
sequence, find pairs of similar subsequences chosen from these sequences”.
She finds out that the dynamic programming solutions suggested to deal with
that problem have high computational cost, and that this is the reason for
researchers to work on approximation solutions (i.e., methods to return some
but not all of the alignment results, according to some statistical significance
model). Thus, she now starts looking for papers related to approximate local
alignment.
3. After organizing and studying the retrieved material, she concludes that
those methods, although efficient, are not appropriate for several cases where
the full result set of alignments is needed. Thus, she now starts looking for
papers related to indexing schemes for efficient local alignment. These
approaches exploit data structures that speed up the matching process
between a large data sequence and a query sequence, at the expense of
having to maintain these structures when the data changes.
4. At any step of the above creativity cycle, she disseminates her findings to
other researchers to get feedback.
New search models and techniques are necessary to support creativity and
innovation [21]. A critical objective is to support creativity cycles, and also
to provide effective presentation and visualization capabilities for the lists of
retrieved resources that will guide users during their search and exploration.
Mindmapping [5, 10] makes use of visual diagrams to capture and organize
information. Mindmaps generally take a hierarchical or tree branch format, with ideas
branching into their subsections. Mindmapping elements include concepts, ideas,
notes, tasks, etc. One can use mindmaps to summarize information, consolidate
information from different research sources, think through complex
problems, and present information in a way that shows the overall structure of a topic.
Mindmaps are an excellent model for visualizing, structuring, and classifying ideas,
and for supporting creative thinking.
This paper presents a creativity support tool, called FreePub, to collect and
organize scientific material using mindmaps. FreePub supports creativity cycles,
assisting users to:
1. set up their domain of interest using mindmaps,
2. browse mindmaps and launch search tasks to gather relevant documents
from several data sources,
3. identify supporting material for those documents (e.g., blog posts, presenta-
tions), and
4. import and organize all retrieved information in mindmaps.
FreePub (http://web.imis.athena-innovation.gr/projects/mm/) has been built
on top of FreeMind [12], a popular open-source mindmapping tool.
Outline. In the next section we give an overview of the FreePub architecture and
discuss related work. Section 3 describes mindmaps. Section 4 presents
the search facilities of FreePub, and Section 5 describes the semantic query
expansion mechanism. Section 6 discusses a test case for FreePub, and, finally,
Section 7 concludes the work.
2 Overview and Related Work
In this section we give a brief overview of tool features and technologies used,
and we discuss the related work.
Figure 1 shows the architecture of FreePub. FreePub has been implemented
on top of FreeMind [12]. FreeMind provides a user-friendly editor to
build mindmaps. Users exploit mindmaps to set up their knowledge domain, and
to collect and organize scientific material retrieved from several data sources. The
search orchestrator module is responsible for launching vertical and horizontal
search tasks, and for coordinating their operation in order to retrieve publications and
supporting material. The semantic query expansion module provides intelligent
retrieval facilities by enriching user queries with terms extracted from mindmap
elements to improve search effectiveness. The data cleaning module processes the
result lists to remove name ambiguities and inconsistencies, and also to remove
duplicate results. FreePub maintains a database of conference/journal info to
assist cleaning tasks. The facet-based browsing module provides visualization
options using several information facets to present the results. Finally, the MM
element construction module is responsible for transferring the result lists into
the mindmaps, according to user needs.
The use of mindmaps in information retrieval tasks has been acknowledged
by several researchers. In [2], the authors present how information retrieval on
mind maps could be used to enhance expert search, document summarization,
keyword based search engines, document recommender systems and determining
word relatedness.
Also, [3] describes how one can use mindmaps to successfully model, design,
modify, import and export XML DTDs, XML schemas and XML documents, obtaining
manageable, easily comprehensible, folding diagrams. The authors effectively
converted a general-purpose mindmapping tool into a powerful tool for XML
vocabulary design and simplification. Finally, SciPlore MindMapping [1] is the
first mindmapping tool focusing on researchers' needs by integrating mindmapping
with reference and PDF management. SciPlore MindMapping offers all the
features one would expect from standard mindmapping software, plus the
following special features for researchers: adding reference keys, PDF bookmark
import, and monitoring folders for new PDFs.
Compared to the above works, FreePub provides a full-fledged retrieval service
to collect scientific material using mindmaps. It retrieves not only relevant
publications, but also supporting material, such as blog posts and presentation slides,
from several wrapped data sources. Also, it exploits a semantic query expansion
mechanism to enrich user queries with mindmap element terms for improved
search effectiveness.
[Figure 1: architecture diagram. Recoverable labels: data sources (the CS bib, citeseerX, PubMed), WebHarvest scraper, search orchestrator (vertical/horizontal search), semantic query expansion, data cleaning, conf/journal DB, facet-based browsing, MM elements construction, FreeMind mindmaps.]
Fig. 1. FreePub’s architecture.
There are also several open source (e.g., Vue, XMind, Compendium; see footnote 4) and
commercial tools (e.g., MindManager, ConceptDraw, iMindMap; see footnote 5) for
mindmapping. However, they are essentially mindmapping editors, providing advanced
visualization capabilities, document handling, and integration facilities with other
popular software suites. None of them exploits mindmaps as a means for exploratory
Web search or offers intelligent query expansion mechanisms, as
FreePub does.
3 Mindmapping
Mindmapping [5, 10] refers to graphical representations of elements such as con-
cepts, ideas, notes, tasks, or other items related to a topic of study. Mindmapping
elements are organized in hierarchical branches or groups according to the se-
mantic interpretation given by the user. However, everything is built around a
central topic or idea. The key feature of mindmapping is that the elements are
arranged in a non-linear fashion. Thus, users are free to enumerate and connect
concepts without a tendency to begin within a particular conceptual framework.
This encourages a brainstorming approach to planning and organizational tasks,
and idea generation.
Footnote 4: http://vue.tufts.edu/, http://www.xmind.net/, http://compendium.open.ac.uk/
Footnote 5: http://www.mindmanager.com, http://www.conceptdraw.com, http://www.thinkbuzan.com
Mindmaps are an excellent model for setting up workspaces for internet search,
project and task management (including links to necessary files, executables,
sources of information), knowledge base organization (notes, references), and
essay writing and brainstorming. They allow for greater creativity when recording
ideas and information, and help note-takers associate topics and ideas
with visual representations.
A key difference between mindmaps and other graph-based formal modelling
representations, e.g., UML, semantic networks, and TopicMaps, is that the latter
have explicit structured elements to model relationships. In contrast, mindmaps
represent the visual mnemonics of users, exploiting colors, icons and informal
visual representations. Visual methods like mindmaps have been used for centuries
by educators for recording knowledge, visual thinking, and problem solving.
Also, mindmaps are based on radial hierarchies showing connections
with a central governing concept.
FreeMind [12] provides a user-friendly editor to build mindmaps. Table 1
presents the most important mindmap elements used by FreeMind. Figure 2
shows a mindmap example, organizing information about microRNA entities (see
also Section 6). In this mindmap, for example, microRNA is the central idea around
which all other elements are structured. microRNA targets and microRNA
transcripts are topic elements, while microRNA target prediction is a subtopic
element. The text “miRNA incorporate into the RNA-Induced...” is a detail
element.
Topic, Larger topic: main elements, arranged in a topic/subtopic fashion, to represent ideas
Waiting topic: a topic that needs to be reconsidered
Needs action: an element for which action is needed
Hot: a critical element
Detail: text content element (e.g., notes, abstracts, etc.)
Link: direct link element to user folders, URLs, or local files
Object (keywords): set of words used as keywords for an element
Object (code): piece of code for an element
Question: issues that need to be considered for an element
Cloud: set of related elements
Table 1. Mindmap elements in FreeMind.
4 Search Facilities
Fig. 2. A mindmap example.
As the user explores a mindmap, she can initiate a search task to retrieve,
from several wrapped data sources, documents relevant to mindmap topics. Various
search parameters can be set, such as the number of results and the data
sources used. For each search task, FreePub starts the retrieval service by
first formulating the necessary queries. Keywords are extracted from the content
of the mindmap elements selected by the user in order to form keyword queries to
send to the data sources. A key feature of FreePub is a semantic query expansion
mechanism used to extract keywords not only from the selected mindmap
elements, but also from their semantic neighbourhood. We discuss this feature in
detail in Section 5.
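As a rough illustration of the keyword extraction step, the sketch below reads a small FreeMind-style map (nested node elements with a TEXT attribute) and builds a keyword query from the elements a user has selected. The sample map, the stopword list, the tokenization rule, and the function name keyword_query are all assumptions made for this example; they are not taken from FreePub, and semantic query expansion is deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical map, using the topic names from the microRNA example.
MM = """<map version="0.9.0">
  <node TEXT="microRNA">
    <node TEXT="microRNA targets">
      <node TEXT="microRNA target prediction"/>
    </node>
    <node TEXT="microRNA transcripts"/>
  </node>
</map>"""

# Illustrative stopword list (an assumption, not FreePub's).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def keyword_query(mm_xml, selected_texts):
    """Build a keyword query from the text of the selected mindmap elements."""
    root = ET.fromstring(mm_xml)
    keywords = []
    for node in root.iter("node"):  # document order, all nesting levels
        text = node.get("TEXT", "")
        if text not in selected_texts:
            continue
        for token in re.findall(r"[A-Za-z0-9]+", text):
            # Skip stopwords and tokens already collected.
            if token.lower() not in STOPWORDS and token not in keywords:
                keywords.append(token)
    return " ".join(keywords)


print(keyword_query(MM, {"microRNA targets", "microRNA target prediction"}))
# prints: microRNA targets target prediction
```

The resulting string is what would be sent to each wrapped data source as a keyword query.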
Vertical search. Keyword queries are sent to all wrapped data sources to retrieve
relevant documents. Such data sources usually provide vertical search facilities,
i.e., search tailored to certain types of information resources; in our case,
scientific publications (e.g., DBLP, PubMed [8, 19]). FreePub wraps data
sources using WebHarvest [22]. We discuss wrapping facilities later on.
The resulting snippets are extracted from the data sources, cleaned, and
presented to the user. Cleaning includes several facilities used to process the
results in order to remove ambiguities, inconsistencies, etc. Specifically, the system
utilizes a catalog of journal and conference names extracted from DBLP
and PubMed [8, 19] to deal with name inconsistencies. Each journal/conference
name in the snippets is matched against this catalog to determine a common name
for all snippets. The catalog maintains two string values for each journal/conference
entry: a short string for the acronym and a long one for the title of
the entry.
Matching is based on the Levenshtein distance [14] L between two strings.
The Levenshtein distance is defined as the minimum number of edit operations
needed to transform one string into the other, with the allowable edit operations
being insertion, deletion, or substitution of a single character. For example,
L(“VLDD”, “VLDB Conf”) = 6: replace ‘D’ with ‘B’, and insert ‘ ’, ‘C’, ‘o’, ‘n’,
‘f’, for a total of 6 operations.
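The definition above can be checked with a short dynamic-programming sketch. This is the textbook row-by-row formulation, not code taken from FreePub; the function name levenshtein is chosen for this example.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn s into t."""
    # prev[j] = distance between the current prefix of s and the first j chars of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning i characters of s into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


print(levenshtein("VLDD", "VLDB Conf"))  # prints: 6
```

The printed value agrees with the worked example: one substitution plus five insertions.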
Assume a string $s$ and a catalog of $n$ entries $\{(a_1, t_1), (a_2, t_2), \ldots, (a_n, t_n)\}$
with pairs of acronyms $a_i$ and titles $t_i$. Then $s$ is matched to the entry $(a_i, t_i)$
that minimizes $L(s, a_i) + L(s, t_i)$, for $1 \le i \le n$. For example, “Very Large
Database Conf” and “VLDB Conf” are both matched to the catalog entry
(“VLDB”, “Very Large Database Conference”).
Duplicate elimination. Since results are retrieved from several data sources,
duplicate results may appear. Duplicates are removed using entity resolution
blocking techniques [23]. The problem of entity resolution involves finding records
in a dataset that represent the same real-world entity. Blocking techniques divide
data into groups and only compare records within the same group, to avoid
redundant comparisons. This is based on the assumption that records in different
blocks are unlikely to match.
FreePub implements the following efficient strategy for entity identification
and duplicate elimination:
1. The result list of each data source is partitioned into groups, using the publication
date as the key for each group. For each group we maintain a (key → value)
hash structure $H$, where the key is the date and the value is the list of publication
objects $o_i$. For example: $H_1 = (2004 \to \{o_1, o_3, o_5, o_6\})$, $H_2 = (2005 \to
\{o_2, o_4\})$ for data source 1, $H_3 = (2004 \to \{o_1, o_5, o_8\})$ for data source 2, etc.
2. Then, to identify duplicates we check pairs of publication objects $(o_i, o_j)$
only for objects that share the same key (date). Checking is done using exact
string matching on publication title and publication forum. For instance, in
the previous example, only pairs of publication objects from the $H_1$ value list
and the $H_3$ value list will be checked.
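The two steps above can be sketched as follows. This is a simplified reading, not FreePub's code: it keeps a single hash structure keyed by year instead of one per data source, so it also drops duplicates within one source. The Publication class and the deduplicate function are hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    title: str
    forum: str   # journal or conference name, after cleaning
    year: int    # blocking key
    source: str  # data source the record came from


def deduplicate(result_lists):
    """Merge result lists, comparing records only within the same year block."""
    blocks = defaultdict(list)  # year -> publications kept so far
    for results in result_lists:
        for pub in results:
            block = blocks[pub.year]
            # Exact string matching on title and forum, as described above.
            if not any(pub.title == kept.title and pub.forum == kept.forum
                       for kept in block):
                block.append(pub)
    return [pub for year in sorted(blocks) for pub in blocks[year]]


source1 = [Publication("Paper A", "VLDB", 2004, "s1"),
           Publication("Paper B", "VLDB", 2005, "s1")]
source2 = [Publication("Paper A", "VLDB", 2004, "s2"),   # duplicate of the s1 record
           Publication("Paper A", "VLDB", 2005, "s2")]   # different year, so kept
for pub in deduplicate([source1, source2]):
    print(pub.year, pub.title, pub.source)
```

Because records with different years are never compared, the 2005 record titled "Paper A" survives even though its title and forum match a 2004 record; this is the trade-off that blocking accepts.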
Horizontal search. After retrieving documents relevant to mindmap elements,
the user may launch another search task to get supporting material for these
documents. Such material includes blog posts discussing the topic of a document,
related presentations, other reports, etc. To detect this material, FreePub
uses horizontal search facilities, i.e., general search engines that cover the whole
Web, together with appropriate options to restrict searches to certain types of
documents. Specifically, FreePub searches for the following support material for each
retrieved publication:
1. pub document: a query string is constructed from the publication’s title, and
the filetype:pdf or doc option is used in order to retrieve results. Further
heuristic rules are used to verify that the retrieved result is indeed
the document of the publication. E.g., we parse the retrieved documents and
check whether the title of the publication appears in them.
2. pub abstract: the abstract is extracted either by parsing the document identified
in 1. or by looking for the appropriate metadata fields in the data
source used, since several data sources provide such information.
3. slide presentation: a query string is constructed from the publication’s title, and
the filetype:ppt or pdf option is used in order to retrieve results. Further
heuristic rules are used to verify that the retrieved results are indeed
presentations. E.g., we parse the retrieved documents and check whether
certain terms appear inside, such as the term “outline” or terms from the sections
of the document identified in 1.
4. blog entries: a query string is constructed from the publication’s title along with
the author’s name and issued to the Google Blogs Search Engine to retrieve
results.
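The query construction for items 1, 3, and 4 might look like the sketch below. The exact operator syntax (quoted title, filetype: terms joined with OR) is an assumption about a typical general-purpose search engine, and support_queries and title_appears are hypothetical names; FreePub's actual query strings and heuristics are not given in the text.

```python
import re


def support_queries(title: str, author: str) -> dict:
    """Build one query string per kind of supporting material."""
    quoted = f'"{title}"'  # phrase search on the publication title
    return {
        "document": f"{quoted} filetype:pdf OR filetype:doc",
        "slides": f"{quoted} filetype:ppt OR filetype:pdf",
        "blogs": f"{quoted} {author}",
    }


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def title_appears(title: str, document_text: str) -> bool:
    """Heuristic from item 1: does the publication title occur in the parsed text?"""
    return _normalize(title) in _normalize(document_text)


# Hypothetical publication, used only to show the generated strings.
for kind, query in support_queries("Approximate Local Alignment", "Smith").items():
    print(kind, "->", query)
```

A check such as title_appears would run on the text extracted from each retrieved file before that file is accepted as the publication's document.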
Wrappers. FreePub retrieves scientific documents from several data sources,
e.g., the Collection of Computer Science Bibliographies [7], CiteSeerX [6], and
PubMed [19]. New data sources can be easily integrated. FreePub wraps data
sources using WebHarvest [22], a Web scraping tool that (a) captures data
source search capabilities, and (b) simplifies Web information extraction from
data sources. WebHarvest provides several types of processors (e.g., html-to-xml,
xpath, etc.) to define a sequence of extraction operations on Web pages and
identify the required HTML parts easily.
To demonstrate how WebHarvest works, we show part of the HTML source
of the first three results returned by Google Blog Search for the term “ubuntu”.
...1st result
How To Upgrade Ubuntu 10.04 (Lucid Lynx) To 10.10 (Maverick Meerkat)
(Desktop; Server)
...2nd result
HowtoForge - Linux Howtos and Tutorials - -
http://www.howtoforge.com/
...3rd result
Latest Ubuntu 10.10 Emphasizes the Cloud - ReadWriteCloud