<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Case-Base Maintenance Beyond Case Deletion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Brian Schack</string-name>
          <email>schackb@indiana.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indiana University</institution>
          ,
          <addr-line>Bloomington IN 47408</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Case-base maintenance strategies judiciously choose the most valuable cases to retain and the least valuable cases to delete in order to maintain a compact, competent case base. This research summary presents three case-base maintenance strategies which involve more than merely deleting cases: (1) Flexible feature deletion deletes components of cases instead of whole cases. (2) Adaptation-guided feature deletion prioritizes components for deletion according to their recoverability via adaptation knowledge. (3) Expansion-contraction compression, in addition to deleting cases, also adds cases in unexplored regions of the problem space. Evaluation of the strategies compared to standard case-base maintenance shows improved retention of competence and solution quality for suitable data sets compressed to the same sizes.</p>
      </abstract>
      <kwd-group>
        <kwd>artificial intelligence</kwd>
        <kwd>case-based reasoning</kwd>
        <kwd>swamping utility problem</kwd>
        <kwd>case-base maintenance</kwd>
        <kwd>flexible feature deletion</kwd>
        <kwd>adaptation-guided feature deletion</kwd>
        <kwd>expansion-contraction compression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Case-based reasoning gradually builds up a case base from training data,
from knowledge engineered by human experts, and from cases stored during the
retention phase. Each case
retained in the case base can potentially, through adaptation, solve future
problems. On the other hand, each retained case makes the case base larger. A larger
case base requires more storage, more time to search through, more time to
transmit over a network, and more time to manually review.</p>
      <p>
        The swamping utility problem describes this trade-off between the
competence, quality, and speed contribution of a case versus its storage, retrieval, and
bandwidth cost [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Much research over the years has attempted to mitigate
the utility problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Case-base maintenance strategies judiciously choose the
most valuable cases to retain and the least valuable cases to delete in order
to maintain a compact, competent case base. This research summary compares
standard case-base maintenance strategies to three strategies which go beyond
case deletion.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Flexible Feature Deletion</title>
      <p>
        Case-base maintenance strategies, whether based on coverage and
reachability or not, normally make two assumptions: (1) that all cases have a uniform
storage cost and (2) that they must retain or delete whole cases. This research
summary describes flexible feature deletion, a knowledge-light case-base
maintenance strategy which, in contrast, subdivides variable-size cases for deletion of
their components [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Cases can have varying storage cost when they contain varying amounts of
information at varying levels of detail. The storage cost of both the problem
and the solution can vary independently because a simple problem may have a
complex solution and vice versa. This suggests balancing the competence
contribution of a case against its storage cost.</p>
      <p>A case-base maintenance strategy could delete an entire case, but it could also
delete a single feature across all cases, or a single feature from a single case. Each
of these alternatives presumably degrades problem-solving competence but not
necessarily to the same degree. Compared to per-case strategies, flexible feature
deletion could reduce the size of a case base with less reduction in the number of
cases. It could also vary in the metric that it uses to order features for deletion.
Each of the variations uses a knowledge-light metric like the size of a case, the
rarity of a feature, or a hybrid of multiple metrics.</p>
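      <p>As a minimal sketch (not from the cited papers), the strategy can be expressed in a few lines of Python. The sketch assumes cases represented as feature dictionaries, a hypothetical byte budget, and feature frequency as the knowledge-light metric; deleting widely shared features first is one illustrative ordering among the variations described above.</p>
      <preformat>
```python
from collections import Counter

def flexible_feature_deletion(case_base, byte_budget):
    # Knowledge-light metric: how many cases contain each feature.
    frequency = Counter(f for case in case_base for f in case)
    # Candidate deletions are (case index, feature) pairs, most widely
    # shared features first, on the illustrative assumption that shared
    # features are the most redundant.
    candidates = sorted(
        ((i, f) for i, case in enumerate(case_base) for f in case),
        key=lambda pair: frequency[pair[1]], reverse=True)
    size = sum(len(str(v)) for case in case_base for v in case.values())
    for i, feature in candidates:
        if size > byte_budget:
            size -= len(str(case_base[i].pop(feature)))
    return case_base

cases = [{"a": "xxxx", "b": "yy"}, {"a": "zzzz", "c": "w"}]
compressed = flexible_feature_deletion(cases, 5)
# Both copies of the shared feature "a" are deleted first; in contrast
# to per-case deletion, both cases survive in reduced form.
```
      </preformat>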
      <p>Domains with large cases and multiple representations call for the application
of flexible feature deletion. For example, cases based on medical imagery may
have various resolutions and a large number of features of which only some are
relevant to the diagnosis.</p>
    </sec>
    <sec id="sec-3">
      <title>Adaptation-Guided Feature Deletion</title>
      <p>
        The adaptation-guided feature deletion case-base maintenance strategy builds
on flexible feature deletion. Whereas flexible feature deletion orders the features
according to a knowledge-light metric, adaptation-guided feature deletion
integrates additional knowledge from the solution transformation container about
the recoverability of features [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Similar to how reachability measures the
ability of adaptation knowledge applied to other cases to restore the solution to
a case considered for deletion, recoverability measures the ability of adaptation
knowledge applied to other features to restore a feature considered for deletion.
      </p>
      <p>A solution with recovered features may either match exactly the original
uncompressed solution, or it may solve the same problem in a different way.
Compression to smaller sizes can increase the time required for recovery and
decrease the quality of the recovered solution until adaptation knowledge can
no longer recover any solution at all. Therefore, in order to preserve
problem-solving competence, adaptation-guided feature deletion deletes features in order
from most recoverable to least.</p>
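      <p>A minimal Python sketch of this ordering follows, assuming a hypothetical adaptation_rules mapping from a feature name to a function that attempts to reconstruct its value from the remaining features; the recoverability scores (1.0 for exact recovery, 0.5 for an alternative solution) are illustrative, not taken from the cited evaluation.</p>
      <preformat>
```python
def adaptation_guided_deletion(case, adaptation_rules, n_deletions):
    # Recoverability: can adaptation knowledge applied to the other
    # features restore this feature after deletion?
    def recoverability(feature):
        rule = adaptation_rules.get(feature)
        if rule is None:
            return 0.0            # no applicable rule: not recoverable
        remaining = {f: v for f, v in case.items() if f != feature}
        recovered = rule(remaining)
        if recovered == case[feature]:
            return 1.0            # exact match with the original value
        if recovered is not None:
            return 0.5            # recovers a different but valid value
        return 0.0
    # Delete in order from most recoverable to least.
    for feature in sorted(case, key=recoverability, reverse=True)[:n_deletions]:
        del case[feature]
    return case

# Hypothetical rule: the "area" feature is recoverable from the others.
rules = {"area": lambda rest: rest["width"] * rest["height"]}
case = {"width": 3, "height": 4, "area": 12}
compressed = adaptation_guided_deletion(case, rules, 1)
# "area" is fully recoverable from "width" and "height", so it goes first.
```
      </preformat>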
      <p>In addition to deleting features, adaptation-guided feature deletion can also
replace them with a smaller substitution or abstraction. Occasionally, this
reorganization can make case contents more accessible to an adaptation rule of
limited power. Even though case-base compression normally reduces competence,
compression under these circumstances, termed creative destruction, can improve
competence instead.</p>
    </sec>
    <sec id="sec-4">
      <title>Expansion-Contraction Compression</title>
      <p>By the representativeness assumption, maintenance strategies predict that
future problems will follow the same distribution as the current case base, and
this works reasonably well for mature case bases in stable domains. But the
representativeness assumption may apply less accurately during early case base
growth, to dynamically changing domains, or in cross-domain transfer learning.</p>
      <p>
        In these situations, case-base maintenance strategies optimizing for assumed
representativeness may instead cause overfitting. Overfitting means that a
statistical model or a machine learning algorithm makes predictions based on
peculiarities in the training data not reflected in the testing data, thereby improving
performance on the training data and sacrificing performance on the testing data
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The overfitting problem has received significant attention in the context of
artificial neural networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Among several potential mitigations, neural
networks may employ data augmentation which perturbs training data in order to
supplement it with additional instances [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For example, cropping images without
obscuring their subjects or other minor deformations which maintain overall
cohesion.
      </p>
      <p>[Figure: retained solution quality for CNN versus ECC at compression levels from 100% down to 10%, under no-gap, medium-gap, and large-gap conditions.]</p>
      <p>
        Case-based reasoning does not normally apply data augmentation, but the
solution transformation container provides a natural source for such
perturbations. Expansion-contraction compression explores unseen regions of the problem
space using adaptation knowledge to generate ghost cases and then exploits the
ghost cases to broaden the range of cases available for competence-based deletion
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
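      <p>The two phases can be sketched as follows, under simplifying assumptions not drawn from the cited paper: cases as points in a numeric problem space, a hypothetical adapt function that perturbs a case into a ghost case, and greedy coverage maximization standing in for competence-based deletion.</p>
      <preformat>
```python
import random

def expansion_contraction(case_base, adapt, target_size, n_ghosts,
                          radius=1.0, seed=0):
    rng = random.Random(seed)
    # Expansion: use adaptation knowledge to generate ghost cases in
    # unexplored regions of the problem space.
    ghosts = [adapt(rng.choice(case_base), rng) for _ in range(n_ghosts)]
    pool = case_base + ghosts

    def covers(a, b):
        # A case covers a problem if it lies within the adaptation radius.
        return radius ** 2 >= sum((x - y) ** 2 for x, y in zip(a, b))

    # Contraction: greedy competence-based deletion over the widened pool,
    # retaining the cases that cover the most not-yet-covered pool members.
    retained, uncovered = [], list(pool)
    while uncovered and len(retained) != target_size:
        best = max(pool, key=lambda c: sum(covers(c, u) for u in uncovered))
        retained.append(best)
        uncovered = [u for u in uncovered if not covers(best, u)]
    return retained

# Two clusters of problems; ghost cases broaden the pool before deletion.
base = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
perturb = lambda c, rng: (c[0] + rng.uniform(-0.5, 0.5),
                          c[1] + rng.uniform(-0.5, 0.5))
kept = expansion_contraction(base, perturb, target_size=2, n_ghosts=3)
```
      </preformat>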
    </sec>
    <sec id="sec-5">
      <title>Future Work</title>
      <p>
        Expansion-contraction compression explores the problem space arbitrarily,
not necessarily in the direction of unrepresentative regions. Therefore, similar to
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], work in progress involves designing a case-base maintenance strategy which
models the competence holes in a case base and targets expansion-contraction
compression to fill the competence holes located between nearby competence
groups by using adaptation knowledge to discover new cases. Evaluation will
compare this strategy to untargeted expansion-contraction compression on
multiple standard machine learning data sets and measure the retention of
competence and solution quality for case bases compressed to the same number of
cases.
      </p>
      <p>This research summary anticipates a dissertation consisting of six chapters.
The first chapter will describe the case-based reasoning cycle and motivate
case-based maintenance in terms of the swamping utility problem. The second
chapter will explain the uniform storage and indivisibility assumptions and evaluate
flexible feature deletion. The third chapter will define recoverability and
evaluate adaptation-guided feature deletion. The fourth chapter will explain the
representativeness assumption and overfitting problem and evaluate
expansion-contraction compression. The fifth chapter will explain competence groups and
competence holes and evaluate targeted expansion-contraction compression. The
sixth chapter will envision future work and conclude by restating the key
contributions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dietterich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Overfitting and Undercomputing in Machine Learning</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>27</volume>
          (
          <issue>3</issue>
          ),
          <fpage>326</fpage>
          -
          <lpage>327</lpage>
          . https://doi.org/10.1145/212094.212114
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Juarez</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craw</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Delgado</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Campos</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Maintenance of Case Bases: Current Algorithms After Fifty Years</article-title>
          .
          <source>International Joint Conference on Artificial Intelligence</source>
          ,
          <volume>27</volume>
          ,
          <fpage>5457</fpage>
          -
          <lpage>5463</lpage>
          . https://doi.org/10.24963/ijcai.2018/770
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tsoi</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Lessons in Neural Network Training: Overfitting May Be Harder Than Expected</article-title>
          .
          <source>Proceedings of the 14th National Conference on Artificial Intelligence</source>
          ,
          <fpage>540</fpage>
          -
          <lpage>545</lpage>
          . Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.6468&amp;rep=rep1&amp;type=pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Leake</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schack</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Flexible Feature Deletion: Compacting Case Bases by Selectively Compressing Case Contents</article-title>
          .
          <source>Case-Based Reasoning Research and Development</source>
          ,
          <fpage>212</fpage>
          -
          <lpage>227</lpage>
          . https://doi.org/10.1007/978-3-319-24586-7_15
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leake</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schack</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Adaptation-Guided Feature Deletion: Testing Recoverability to Guide Case Compression</article-title>
          .
          <source>Case-Based Reasoning Research and Development</source>
          ,
          <volume>9969</volume>
          ,
          <fpage>234</fpage>
          -
          <lpage>248</lpage>
          . https://doi.org/10.1007/978-3-319-47096-2_16
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Leake</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schack</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Exploration vs. Exploitation in Case-Base Maintenance: Leveraging Competence-Based Deletion with Ghost Cases</article-title>
          .
          <source>Case-Based Reasoning Research and Development</source>
          ,
          <volume>11156</volume>
          ,
          <fpage>202</fpage>
          -
          <lpage>218</lpage>
          . https://doi.org/10.1007/978-3-030-01081-2_14
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>De Mantaras</surname>
            ,
            <given-names>R. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McSherry</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bridge</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leake</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craw</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , ...
          <string-name>
            <surname>Watson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Retrieval, Reuse, Revision, and Retention in Case-Based Reasoning</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>215</fpage>
          -
          <lpage>240</lpage>
          . https://doi.org/10.1017/S0269888906000646
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McKenna</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <source>Competence-Guided Case Discovery. Research and Development in Intelligent Systems XVIII</source>
          ,
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          . https://doi.org/10.1007/978-1-4471-0119-2_8
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Schack</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Feature-Centric Approaches to Case-Base Maintenance</article-title>
          .
          <source>Proceedings of the ICCBR 2016 Workshops</source>
          ,
          <fpage>287</fpage>
          -
          <lpage>291</lpage>
          . Retrieved from https://pdfs.semanticscholar.org/9946/c00d9f03e5b7829b227d8501b4f7439d132d.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Schack</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Summers</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Flexible Feature Deletion</article-title>
          .
          <source>International Conference on Case-Based Reasoning Video Competition</source>
          (p.
          <fpage>2</fpage>
          )
          . United States. Retrieved from http://sce.carleton.ca/~mfloyd/ICCBRVC2017/#nominees
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>The Utility Problem Analysed: A Case-Based Reasoning Perspective</article-title>
          .
          <source>Advances in Case-Based Reasoning</source>
          , (November),
          <fpage>392</fpage>
          -
          <lpage>399</lpage>
          . https://doi.org/10.1007/BFb0020625
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatescu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>McDonnell</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Understanding Data Augmentation for Classification: When to Warp?</article-title>
          .
          <source>Digital Image Computing: Techniques and Applications</source>
          . Retrieved from https://arxiv.org/pdf/1609.08764.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>