=Paper= {{Paper |id=None |storemode=property |title=Validation of a Data Mining Method for Optimal University Curricula |pdfUrl=https://ceur-ws.org/Vol-646/DERIS2010paper7.pdf |volume=Vol-646 }} ==Validation of a Data Mining Method for Optimal University Curricula== https://ceur-ws.org/Vol-646/DERIS2010paper7.pdf
                              VALIDATION OF A DATA MINING METHOD
                               FOR OPTIMAL UNIVERSITY CURRICULA

                       R. Knauf ∗                                            Y. Sakurai, K. Takada, S. Tsuruta

      Ilmenau University of Technology                                          Tokyo Denki University
         Faculty of Computer Science                                       School of Information Environment
               and Automation                                                  2-1200 MuZai Gakuendai
       PO Box 100565, 98684 Ilmenau                                             Inzai, Chiba, 270-1383
                   Germany                                                                Japan


ABSTRACT

The paper deals with modeling, processing, evaluating and refining processes that involve humans, such as learning. A previously developed concept called storyboarding has been applied at Tokyo Denki University to model the various ways to study at this university. Along with this storyboard, we developed a data mining technology to estimate the success chances of curricula. Here, we introduce a validation method for this technology and present its results. Further, we discuss opportunities to improve these results by implementing a previously introduced learner profiling concept that represents the students' individual properties, talents and preferences for personalized data mining.

Index Terms: modeling learning processes, storyboarding, data mining, validation

1. INTRODUCTION

Learning systems suffer from a lack of explicit and adaptive didactic design. University education is especially affected by this lack, because university professors are not necessarily educational experts. One way of providing didactic support is a modeling concept for didactic design that allows the learning processes to be anticipated.

An explicit formal didactic design provides a firm basis to verify and validate the didactics behind a learning process by knowledge engineering techniques such as machine learning and data mining. A modeling concept called storyboarding [1] was developed earlier as a means of modeling learning processes. Besides providing didactic support, this semi-formal model sets the stage for applying knowledge engineering technologies to verify and validate the didactics behind a learning process. The verification may include both logical consistency issues and didactic issues that can be checked formally. According to different learning and teaching preferences, a storyboard includes alternative paths and possible detours if certain concepts to be learned need reinforcement. Using modern media technology, a storyboard also plays the role of a server that provides the appropriate content material.

By storyboarding, didactics can be refined according to revealed weaknesses and proven excellence. Successful didactic patterns can be explored by applying data mining techniques to the various ways students went through a storyboard and their related success. Future instructors and students may then utilize these results by preferring those ways through a storyboard that turned out to be the most promising ones. In [2], a data mining technology is introduced that allows students to utilize the mined “experience” of former students to compose curricula with an optimal success chance.

However, so far there was no practical evidence that this method is appropriate. The basic problem was the collection of data, which has to be accumulated over a complete undergraduate study, i.e. a period of four years. Meanwhile, we have gained a significant amount of data to validate the technology.

The paper is organized as follows. Section 2 introduces the storyboard concept including the present state of its development. Section 3 provides an overview of our data mining technique to compose optimal curricula for university studies. In Section 4, we describe the available data. Section 5 introduces the validation technology and provides its results. In Section 6, we outline a refinement of the technology, and Section 7 summarizes the paper.
∗ This author performed the work while at Tokyo Denki University and was sponsored by the Japan Society for the Promotion of Science (JSPS) with an Award-Fellowship for Rainer Knauf (Fellow's ID S-08742) and the Research Institute for Science and Technology of Tokyo Denki University.

2. STORYBOARDING

Our storyboard concept was introduced in [1] and later refined (see [2] for the latest version). A storyboard is a nested hierarchy of directed graphs with annotated nodes and annotated edges. Nodes are scenes or episodes. Scenes are not further structured, whereas episodes have a sub-graph as their implementation. Also, there is exactly one start node and one end node in each graph. Edges specify transitions between nodes and may be single-color or bi-color. Nodes and edges can carry attributes.

A storyboard may be seen as a model of an anticipated reception process that is interpreted as follows.

Scenes denote a non-decomposable learning activity that can be implemented in any way, e.g. by the presentation of a (media) document, by opening a tool that supports learning (a URL or an e-learning system) or by an informal activity description. Episodes are defined by their sub-graph. Graphs are interpreted by the paths on which they can be traversed.

A start node of a graph defines the starting point of a legal graph traversal. An end node of a graph defines the final target point of a legal graph traversal.

Edges denote transitions between nodes. There are rules for leaving a node by an outgoing edge, namely (1) the outgoing edge must have the same color as the incoming edge by which the node was reached and (2) if there is a condition specified as the edge's key attribute, this condition has to be met for leaving the node by this edge. So the colors express how the ways of leaving a node depend on the way of arriving there.

Key attributes of nodes specify application driven information, which is necessary for all nodes of the same type, e.g. actors and locations. Key attributes of edges specify conditions, which have to be true for traversing this edge. Free attributes specify whatever the storyboard author wants the user to know: didactic intentions, useful methods, necessary equipment, etc. For further information, the reader may see [3] or [4].

3. CURRICULUM VALIDATION BY DATA MINING

A basic objective of storyboarding is to use knowledge engineering technologies on the (semi-) formal process models [3] [4].

In particular, we aim at inductively “learning” successful storyboard patterns and recommendable paths. This is some sort of meta-learning, i.e. the learning of learning knowledge. It is performed by an analysis of the paths that former students took through the storyboard [2].

To show the feasibility and benefit of high level storyboarding for its qualified assistance of students suffering from the “jungle of opportunities and constraints” in university education, we developed a simple prototype storyboard for curricula of a university study.

This prototype is used to validate curricula, which are created or modified by the students in advance of their study [4][2], based on the success of former students who took a similar path through their study.

For this purpose, we introduced a concept to estimate success chances of curricula, which are composed by students at the School of Information Environment of the Tokyo Denki University in their curriculum planning class in the first semester. Along with the estimation, the students also receive (1) a significance of the provided estimation statement (according to the sufficiency of the available data) and (2) a recommendation for modifications of their plan with respect to an optimal success chance.

For such curricula we developed a data mining technique, which is applied to storyboard paths that (former) students took. Based on these examples, the success chance of intended paths can be estimated [2].

The data mining technique is applied to the paths of students through a storyboard, which anticipates possible ways through a complete study.

In a pre-processing step to determine the paths, the individually visited items (episodes and scenes) in the storyboard graph-hierarchy are “flattened” to one big graph that contains scenes only. This is performed by systematically replacing episodes by the individually visited items of the episode's related sub-graph.

In the granularity of this storyboard application, a scene is a course that runs over one semester. As a result, we have a linear list of course sets, in which each list item is the set of courses that the student took in the respective semester.

The technique consists of two steps, namely (1) constructing a decision tree from the examples of former students and (2) applying this decision tree to the planned curricula.

The decision tree is based on the concept of bundling common starting sequences of the various paths to a node of the tree. Paths that differ in the node following a common starting sequence result in different sub-trees right below the current root, i.e. the last node of the common starting sequence.

This continues for each lower level sub-tree accordingly. If paths share a common starting sequence from the root to the current root but differ in the next node, related sub-trees are established.

The decision tree is applied as follows.

If a submitted path is already represented in the decision tree, the estimation is simply the average Grade Point Average (the average of a numeric performance metric of a student over all subjects, weighted by the number of units of each subject) gained by the students who took exactly this path.

In the other case, the longest leading (starting and its succeeding) part in common with the path representing the submitted curriculum plan will be identified and
the average GPA of all students' paths in the sub-trees that start from that point will be presented as a success estimation. Additionally, the degree of similarity and a recommended change of the submitted path will be presented. The data mining technology is described in more detail in [2].

4. DATA PREPROCESSING

We collected 188 individual storyboard paths of students who studied Information Environment at the School of Information Environment of Tokyo Denki University from 2005 till 2009.

From these samples, we removed two samples of students who joined the university after taking several semesters elsewhere, because their marks were derived by recognition of marks received in similar subjects at another university. This led to 186 samples.

After collecting and studying all the samples and the organizational material (rules to compose a curriculum), which was available in Japanese only, we chose a compact data representation by coding the particular subjects and the particular students. Table 1 shows an extract from the subject coding list.

   code   subject
      1   Advanced Project A
      2   Advanced Project B
      3   Agent Technology
    ...   ...
    155   Workshop

   Table 1. Subject list

By using subject codes 1-155 and student IDs 1-186, we composed a complete decision tree from the 186 samples.

To make sure that identical starting sequences of semester curricula really end up in the same path, the decision tree is well sorted: (1) the subject sequence within a semester is sorted by ascending subject codes and (2) the students' samples are sorted by the code lists, which are, compared element by element, ascending, too. We adopted this technology from a similar technology, which is usually performed in data mining for item lists to efficiently generate association rules.

Figure 1 shows an extract of the decision tree composed by all the samples. For each student (coded by his/her ID),

   • each semester (columns s, with yellow-brown background),

   • the subjects (courses, columns c with light green background),

   • their number of units (columns u with light yellow background) and

   • the achieved results (with light blue background), i.e. the mark (columns m: S, A, B, C, D, or E) and the number of grade points (columns GP: 4, 3, 2, or 0)

are listed.

The last row contains a weighted (by the number of units) grade point average GPA, which quantifies the degree of success in the study. Again, both the subject lists of the students within a semester and the complete students' samples (which are lists of lists) are sorted by subject code. The bars between the paths show up to which semester the curricula of adjacent students are identical (circles) and from which semester they differ from each other (bullets). Thus, the grey bars separate the sub-trees from each other.

The entire table has 42 columns and 1616 rows. Figuratively speaking, the table illustrates the decision tree in a horizontal direction with the root on the very left hand side and the leaves on the very right hand side.

Before applying the validation technology, we found some “exotic samples” of students who are not representative. This applies to students who never finished their study (as was the case with students 8, 11, 59, 97, 113, 118, 121 and 153). We removed them because of incomplete data, which left 177 samples. As a lesson learned, in future validations we will keep at least those “dead end” paths in the set that are caused by a lack of performance.

Our validation technology uses an example set to construct a decision tree and a test set to check its performance. Both the example set and the test set are recruited from the given samples.

Those storyboard paths that are unique and do not have anything in common with any other path are not appropriate for such a technology, because the test set originates from the same source of data. If the test set contained samples that do not have anything in common with any path of the decision tree, data mining cannot really work because of missing data.

For such paths, our data mining technology degenerates in practice: it merges all paths of the decision tree and provides the average degree of success of all former students.

Since this is not really a result of data mining, we excluded such paths, which led us to 104 remaining paths, which are used to validate the technology.

For practical use in the success estimation of new paths submitted by students, however, we kept these 73 “lonely” paths, of course, because new paths may be similar to them as well. In fact, any new path is “lonely” when somebody takes it for the first time, before it may gain popularity and evolve into a sub-tree.
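The bundling of common starting sequences (Section 3), the sorting of subjects within a semester, and the check of a tree against a held-out sample can be illustrated by a small sketch. It is not the authors' implementation: the names `Node`, `canonical`, `build_tree`, `estimate` and `leave_one_out_errors` as well as the toy samples are hypothetical, the absolute difference is assumed as the error measure, and the degree of similarity and the recommended change mentioned in Section 3 are not modeled. A dictionary lookup takes the place of the sorted table layout.

```python
from statistics import mean


class Node:
    """One node of the prefix ("decision") tree: a semester's course set."""

    def __init__(self):
        self.children = {}  # sorted course tuple of the next semester -> Node
        self.gpas = []      # GPAs of all samples whose path runs through here


def canonical(path):
    """Sort the courses inside each semester so equal curricula compare equal."""
    return tuple(tuple(sorted(semester)) for semester in path)


def build_tree(samples):
    """Bundle common starting sequences of (path, gpa) samples into one tree."""
    root = Node()
    for path, gpa in samples:
        node = root
        node.gpas.append(gpa)
        for semester in canonical(path):
            node = node.children.setdefault(semester, Node())
            node.gpas.append(gpa)
    return root


def estimate(root, path):
    """Return (estimated GPA, number of leading semesters found in the tree)."""
    node, matched = root, 0
    for semester in canonical(path):
        child = node.children.get(semester)
        if child is None:
            break
        node, matched = child, matched + 1
    # Average over all samples in the sub-tree below the longest common prefix;
    # with no common prefix this is the average over all samples.
    return mean(node.gpas), matched


def leave_one_out_errors(samples):
    """Hold out each sample once and compare its GPA with the estimate."""
    errors = []
    for i, (path, gpa) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        estimated, _ = estimate(build_tree(rest), path)
        errors.append(abs(estimated - gpa))
    return errors


# Hypothetical toy samples: (semesters as lists of subject codes, GPA)
SAMPLES = [
    ([[11, 17], [29, 49]], 3.5),
    ([[17, 11], [29, 92]], 3.0),
    ([[11, 17], [29, 49]], 4.0),
    ([[11, 26], [30]], 2.5),
]

tree = build_tree(SAMPLES)
print(estimate(tree, [[17, 11], [49, 29]]))  # path already in the tree
print(estimate(tree, [[11, 17], [30, 50]]))  # shares only the first semester
print(mean(leave_one_out_errors(SAMPLES)))
```

In this sketch a path without any common starting sequence falls back to the root, which reproduces the degenerate "average of all former students" case described above for the "lonely" paths.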
 ID   s  c  u  m  GP   (the column group s, c, u, m, GP repeats for each of the 8 semesters)   GPA
 5   1 11 3    A    4 2 29 4 A 4 3 21 2             B   3 4     9 2   C   2 5 10 4         A   4 6 13 4 A       4 7 1 4 A 4 8         2   4 A 4 3,48
       17 4    B    3    49 4 S 4   30 4            A   4      14 2   A   4    12 4        A   4    20 2 A      4   84 2 S 4
       26 2    B    3    92 4 C 2   32 3            C   2      35 3   B   3    14 2        A   4    70 2 S      4
       36 1    A    4    96 3 C 2   50 4            A   4      41 3   S   4    19 2        A   4   105 2 A      4
       58 2    A    4   116 3 A 4   57 3            S   4      64 3   C   2    87 3        B   3   140 3 A      4
       94 2    B    3   130 2 A 4   73 3            B   3      75 3   B   3    99 2        A   4   153 2 B      3
       129 2   C    2              148 2            B   3      82 3   B   3   120 3        S   4
       155 1   S    4                                         141 2   B   3   124 2        A   4

157 1 11 3     S        2 29 4    S   4 3    21 2   A   4 4   9 2     B   3 5 10 4         A   4 6 13 4 A       4 7   1 4 A 4 8       2   4 A 4 3,72
      17 4     A    4      49 4   A   4      30 4   C   2    35 3     B   3    12 4        A   4    70 2 A      4
      26 2     A    4      92 4   S   4      32 3   C   2    41 3     S   4    19 2        A   4    79 3 A      4
      36 1     A    4      96 3   A   4      50 4   A   4    64 3     A   4    24 2        B   3   140 3 S      4
      58 2     C    2     116 3   A   4      57 3   S   4    75 3     B   3    63 3        A   4   152 2 A      4
      94 2     B    3     130 2   A   4      73 3   A   4    82 3     B   3    87 3        A   4   153 2 B      3
      129 2    A    4                       148 2   A   4   141 2     A   4   120 3        S   4
      155 1    A    4                                       143 2     A   4

 47 1 11 3     A    4 2 29 4 B 3 3 30 4             C   2 4   9 2     C   2 5 10 4         B   3 6 13 4 S       4 7 33 4 S 4 8 34 4 S 4 3,31
      17 4     A    4    49 4 S 4   32 3            B   3    35 3     C   2    12 4        A   4    70 2 B      3   84 2 S 4
      26 2     A    4    92 4 C 2   50 4            A   4    41 3     A   4    19 2        A   4    79 3 A      4
      36 1     A    4    96 3 S 4   57 3            S   4    64 3     C   2    63 3        B   3   140 3 A      4
      58 2     A    4   116 3 B 3   73 3            A   4    75 3     C   2    87 3        A   4   152 2 B      3
      94 2     C    2   130 2 A 4  111 2            B   3    82 3     C   2   120 3        A   4   153 2 B      3
      129 2    A    4              148 2            B   3   141 2     D   0   124 2        B   3
      155 1    A    4                                       143 2     B   3

 56 … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … 3,90




                                                Fig. 1. Extract from the decision tree data
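How a unit-weighted grade point average can be computed from the columns u and m of Fig. 1 is sketched below. The mark-to-grade-point mapping is read off the extract (S and A appear with 4, B with 3, C with 2, D with 0); the mark E does not occur in the extract, so mapping it to 0 is an assumption, and `weighted_gpa` is a hypothetical helper name.

```python
# Grade points per mark as they appear in Fig. 1; "E" does not occur in the
# extract, so mapping it to 0 is an assumption.
GRADE_POINTS = {"S": 4, "A": 4, "B": 3, "C": 2, "D": 0, "E": 0}


def weighted_gpa(courses):
    """Unit-weighted grade point average over (units, mark) pairs."""
    courses = list(courses)
    total_units = sum(units for units, _ in courses)
    if total_units == 0:
        raise ValueError("no units taken")
    points = sum(units * GRADE_POINTS[mark] for units, mark in courses)
    return points / total_units


# First semester of student 5 as read from Fig. 1: (units, mark) per course
semester_1 = [(3, "A"), (4, "B"), (2, "B"), (1, "A"),
              (2, "A"), (2, "B"), (2, "C"), (1, "S")]
print(round(weighted_gpa(semester_1), 2))  # 56 grade points / 17 units -> 3.29
```

The example covers a single semester only; the GPA column of Fig. 1 refers to the complete study.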


5. VALIDATION TECHNOLOGY AND RESULTS

There are several approaches to validate data mining technologies.

The holdout method splits the data into a training set and a test set, typically in a ratio of 2/3 to 1/3. The data mining technology is applied to the training set and validated with the test set. This method suffers from the fact that it does not use the available data exhaustively. A sample that is in the test set is not available for building the model (the decision tree, in our case) and thus decreases the performance of the model. Consequently, some performance features of the data mining technology may not be revealed by such a testing method. The splitting ratio is a trade-off between the quality of the model and a trustworthy statement about the performance of the data mining technology.

Random sub-sampling is a refinement of this method. It is a repeated holdout with various splits of the available data and thus uses the data a little more exhaustively. However, there is no control over how often a data object is used for building the model and how often it is used for testing.

A more exhaustive utilization of the available data is achieved by cross validation. Here, each data object is used for training with the same frequency and for testing exactly once. The data set is split into k equally sized subsets. In k cycles, each subset is used for testing, whereas the other k − 1 subsets are used for training.

The leave one out approach is a special case of cross validation with k being the number of data objects. It makes the most exhaustive use of the data.

Finally, we used this approach to validate our data mining technology. In 104 cycles, we removed one path from the complete decision tree and used this sample to check the remaining decision tree.

As a result, we received a list of all the 104 samples along with their original GPA and the GPA as estimated by the data mining technology, as shown in Table 2. The mean of the difference between both was 0.43 with a standard deviation of 0.30.

   stud. ID   GPA    GPA estimation   difference
       89     3.40       3.23            0.17
      148     3.04       3.26            0.22
      179     3.30       3.24            0.06
       92     3.55       3.63            0.08
      178     3.91       3.40            0.51
      164     3.29       3.71            0.42
      177     3.52       3.60            0.08
      ...     ...        ...             ...

   Table 2. Validation results

Having in mind that this result is just based on a statistical analysis of former students' curricula and their related success, an average error of 0.43 grade points is
not too bad and promises remarkable results once the learners' individual characteristics are also included in the data mining technology.

6. PERSONALIZED DATA MINING AND ITS REALIZATION

Individual learning plans should not only be based on the success of former students who went similar ways. Additionally, individual properties, talents and preferences should be considered.

For example, some students are more talented for analytical challenges, some are more successful in creative or composing tasks, and others may have an extraordinary talent to memorize a lot of factual knowledge. Consequently, we need to include individual learner profiles to avoid overwhelming the students with suggestions that don't match their individual preferences and talents.

   attribute   attribute description                value range
   d1          Linguistic intelligence              0 ≤ v1 ≤ 1
   d2          Logical-mathematical intelligence    0 ≤ v2 ≤ 1
   d3          Musical intelligence                 0 ≤ v3 ≤ 1
   d4          Bodily-kinesthetic intelligence      0 ≤ v4 ≤ 1
   d5          Spatial intelligence                 0 ≤ v5 ≤ 1
   d6          Interpersonal intelligence           0 ≤ v6 ≤ 1
   d7          Intrapersonal intelligence           0 ≤ v7 ≤ 1
   d8          Active vs. Reflective style          0 ≤ v8 ≤ 1
   d9          Sensing vs. Intuitive style          0 ≤ v9 ≤ 1
   d10         Visual vs. Verbal style              0 ≤ v10 ≤ 1
   d11         Sequential vs. Global style          0 ≤ v11 ≤ 1

   Table 3. Derived Learner Profile
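The value ranges of Table 3 result from the normalization that this section gives for the raw questionnaire scores: v = result/40 for the seven intelligence dimensions (raw range 10 to 40) and v = (result + 11)/22 for the four learning style dimensions (raw range -11 to +11). A minimal sketch of this step follows; the function names and the sample scores are hypothetical, and which pole of a learning style pair carries the positive sign is not fixed here.

```python
INTELLIGENCES = 7   # attributes d1..d7 (Gardner)
STYLES = 4          # attributes d8..d11 (Felder and Silverman)


def normalize_intelligence(result):
    """Map a raw intelligence score (10..40) to v = result / 40."""
    if not 10 <= result <= 40:
        raise ValueError(f"intelligence score out of range: {result}")
    return result / 40


def normalize_style(result):
    """Map a raw learning style score (-11..+11) to v = (result + 11) / 22."""
    if not -11 <= result <= 11:
        raise ValueError(f"learning style score out of range: {result}")
    return (result + 11) / 22


def learner_profile(intelligence_scores, style_scores):
    """Build the 11-element profile [v1, ..., v11] from raw questionnaire scores."""
    if len(intelligence_scores) != INTELLIGENCES or len(style_scores) != STYLES:
        raise ValueError("expected 7 intelligence and 4 learning style scores")
    return ([normalize_intelligence(r) for r in intelligence_scores]
            + [normalize_style(r) for r in style_scores])


# Hypothetical raw questionnaire results for one learner
profile = learner_profile([40, 30, 20, 10, 25, 35, 15], [11, -11, 0, 5])
print(profile)
```

With a raw range of 10 to 40, the normalized intelligence values lie between 0.25 and 1, which is within the range stated in Table 3.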
In [5], we introduced an approach of personalized data mining. This approach adopts Gardner's theory of multiple intelligences [6] and the learning style model of Felder and Silverman [7]. The assumption behind this approach is that there is a link between

    • typical “competence traits” (according to Gardner) and subjects that typically challenge one or another “kind of intelligence” more than others and

    • typical teaching methods (according to Felder and Silverman) and subjects that are typically taught with these methods.

According to [5], the next steps of collecting and processing data to integrate this technology are (1) the appraisal of the learner profile introduced in [5] for the very best students in each subject, (2) the derivation of a typical “success profile” for each subject, (3) the estimation of learner profiles for all students as a (by success degree) weighted average success profile of the subjects they took, and (4) the application of the same technology to the data of “personalized” decision trees for each learner, which are composed of samples of learners who have a similar learner profile.

The appraisal of the Gardner-like items in the learner profile can be performed by a questionnaire, which derives an estimation of a human's intelligence distribution from his/her answers to 70 questions. This questionnaire is available to the public on the Internet as a downloadable Microsoft Excel file.¹

The Felder-Silverman-like items of the learner profile can be estimated by a questionnaire as well. This questionnaire is also available to the public on the Internet.²

   ¹ see http://www.businessballs.com/howardgardnermultiple. . . . . . intelligences.htm
   ² see http://www.engr.ncsu.edu/learningstyles/ilsweb.html

We consider both in our model, which is defined as an array of 11 attribute-value pairs that contains 7 intelligence attributes and 4 learning style attributes. Both can be appraised by questionnaires that are available to the public on the web.

To make the dimensions of both sources comparable to each other and to see the quantitative relations, we normalized them in a way that they all have the same range of values. The intelligence dimensions range from 10 to 40. The learning style dimensions range from -11 to +11 (opposite algebraic sign for opposite styles). The normalization can be done by

    • v = result/40 for the intelligence dimensions according to Gardner and

    • v = (result + 11)/22 for the learning style dimensions according to Felder and Silverman.

Finally, our learner model looks as shown in Table 3.

However, it turned out to be very hard to find former students who are still accessible and, moreover, willing to fill in such questionnaires to obtain their learner profiles. Our students are very sensitive in respecting privacy and, vice versa, in expecting the same respect from others. Since answers to the questions in the questionnaire may reveal some private issues, it is hard to ask them to answer these questions.

However, there are some students whom we dare to ask to fill in the questionnaires because they had a quite trusting relationship with one or another professor, but these students are not necessarily the best ones.

Therefore, steps one and two of this plan need to be changed. To infer a typical “success profile” of a subject, we can collect the questionnaire answers of some students, who are not necessarily the best ones.

Thus, we modified the approach of computing an “average profile” of the best students towards a
“weighted average profile” of all available students who took part in a particular subject.
    Let $L(s)$ be the set of learners who took part in the subject $s$ and for whom a learner profile can be composed from the questionnaires’ answers. So for each learner $l_i \in L(s)$, $i = 1, \ldots, |L(s)|$, a learner profile $p(l_i) = [d_i^1, d_i^2, \cdots, d_i^{11}]$ is available. Let

\[
\mathit{succ}_i^s =
\begin{cases}
1.00, & \text{if } l_i \text{ received in subject } s \text{ mark S} \\
0.80, & \text{if } l_i \text{ received in subject } s \text{ mark A} \\
0.60, & \text{if } l_i \text{ received in subject } s \text{ mark B} \\
0.40, & \text{if } l_i \text{ received in subject } s \text{ mark C} \\
0.20, & \text{if } l_i \text{ received in subject } s \text{ mark D} \\
0.00, & \text{if } l_i \text{ received in subject } s \text{ mark E}
\end{cases}
\]

be the success degree of the learner $l_i$ in subject $s$.
    By using this success degree as a weight factor, the “typical success profile” of a subject $s$ can be computed as

\[
p(s) = \frac{1}{\sum_{i=1}^{|L(s)|} \mathit{succ}_i^s}
\begin{bmatrix}
\sum_{i=1}^{|L(s)|} (\mathit{succ}_i^s \cdot d_i^1) \\
\sum_{i=1}^{|L(s)|} (\mathit{succ}_i^s \cdot d_i^2) \\
\vdots \\
\sum_{i=1}^{|L(s)|} (\mathit{succ}_i^s \cdot d_i^{11})
\end{bmatrix}
\]

This calculation has to be done for each subject separately, and the set of “most successful students” of course differs from subject to subject. The idea behind it is to mine a “typical success profile” for each subject separately.
    After performing these computations, steps three and four can be conducted as originally planned and described in [5]. As a result of processing this additional data in the way sketched above, we expect a remarkable improvement of the performance compared to the results presented in Section 5.


          7. SUMMARY AND OUTLOOK

The research reported here is focused on modeling, processing, evaluating and refining processes that involve humans, such as learning. A formerly developed concept called storyboarding is briefly introduced.
    Along with a storyboard application, we developed a data mining technology to estimate success chances of curricula, which are composed by students. So far, the performance of this technology had not been validated in practice.
    The basic problem so far was the collection of data, which has to be accumulated during a complete undergraduate study, which takes four years. Meanwhile, we have gathered a significant amount of data to validate the technology.
    By cross validation with the available data, we could empirically show the performance of our data mining technology.
    However, the currently implemented way of statistically analyzing all former students’ curricula ignores the fact that the success chance heavily depends on individual properties.
    A formerly developed approach to validate curricula in a personalized way, by building the decision tree only from former students with a similar learner profile, was refined here. This was necessary because the required personal data is not available.
    As a result of practically implementing this refined approach, we expect a remarkable improvement of these results.


          8. REFERENCES

[1] K.P. Jantke and R. Knauf, “Didactic design through storyboarding: Standard concepts for standard tools,” in Proc. of 4th Int. Symposium on Information and Communication Technologies, Workshop on Dissemination of e-Learning Technologies and Applications, Cape Town, South Africa. 2005, ISBN 0-9544145-6-X, pp. 20–25, New York: ACM Press.

[2] R. Knauf, R. Böck, Y. Sakurai, S. Dohi, and S. Tsuruta, “Knowledge mining for supporting learning processes,” in Proc. of the 2008 IEEE Int. Conference on Systems, Man, and Cybernetics (SMC 2008), Singapore. IEEE, Piscataway, NJ, USA, 2008, IEEE Catalog number CFP08SMC-USB, ISBN 978-1-4244-2384-2, Library of Congress: 2008903109, pp. 2615–2621.

[3] R. Knauf, Y. Sakurai, S. Tsuruta, and K.P. Jantke, “Modeling didactic knowledge by storyboarding,” Journal of Educational Computing and Research, vol. 42, no. 4, in press, 2010.

[4] Y. Sakurai, S. Dohi, S. Tsuruta, and R. Knauf, “Modeling academic education processes by dynamic storyboarding,” Journal of Educational Technology & Society, vol. 12, no. 2, ISSN 1436-4522 (online) and 1176-3647 (print), pp. 307–333, April 2009.

[5] R. Knauf, Y. Sakurai, S. Tsuruta, K. Takada, and S. Dohi, “Personalized curriculum composition by learner profile driven data mining,” in Proc. of the 2009 IEEE Int. Conference on Systems, Man, and Cybernetics (SMC 2009), San Antonio, TX, USA, 2009, ISBN 978-1-4244-2794-9, pp. 2137–2142.

[6] H. Gardner, Frames of Mind: The Theory of Multiple Intelligences, New York: Basic Books, 1993.

[7] R.M. Felder and L.K. Silverman, “Learning and teaching styles in engineering education,” Engineering Education, vol. 78, no. 7, pp. 647–681, 1988.
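
The success degree and the “typical success profile” p(s) defined above can be illustrated by a short Python sketch. It is a minimal illustration and not the implementation used in this work: the function names (`weighted_average`, `success_profile`, `learner_profile`) and the two-dimensional toy profiles are assumptions made for the example, whereas the real learner profiles have 11 dimensions. The function `learner_profile` reflects one possible reading of step (3), in which a learner's profile is estimated from the success profiles of the subjects taken, weighted by the learner's success degree in each subject; the exact procedure is described in [5].

```python
from typing import Dict, List, Sequence

# Mark-to-success-degree mapping as given in the text (S is best, E is worst).
SUCCESS_DEGREE: Dict[str, float] = {
    "S": 1.00, "A": 0.80, "B": 0.60, "C": 0.40, "D": 0.20, "E": 0.00,
}


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Component-wise average of `vectors`, weighted by `weights`."""
    total = sum(weights)
    if total == 0:
        # Happens if there are no learners or all of them received mark E.
        raise ValueError("all weights are zero; the average is undefined")
    dims = len(vectors[0])
    return [
        sum(w * v[k] for v, w in zip(vectors, weights)) / total
        for k in range(dims)
    ]


def success_profile(profiles: Sequence[Sequence[float]],
                    marks: Sequence[str]) -> List[float]:
    """p(s): profiles of the learners in L(s), weighted by their success in s."""
    return weighted_average(profiles, [SUCCESS_DEGREE[m] for m in marks])


def learner_profile(subject_profiles: Sequence[Sequence[float]],
                    marks: Sequence[str]) -> List[float]:
    """Assumed reading of step (3): success profiles of the subjects a learner
    took, weighted by the learner's success degree in each subject."""
    return weighted_average(subject_profiles, [SUCCESS_DEGREE[m] for m in marks])


if __name__ == "__main__":
    # Hypothetical toy data: two learners with 2-dimensional profiles.
    toy_profiles = [[1.0, 0.0], [0.0, 1.0]]
    toy_marks = ["S", "C"]  # weights 1.00 and 0.40
    print(success_profile(toy_profiles, toy_marks))  # roughly [0.714, 0.286]
```

Because the weights are normalized by their sum, a learner with mark E does not contribute to p(s) at all, and a subject in which every available learner received mark E has no defined success profile. The sketch reports that case as an error instead of returning a zero vector.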