<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>First Workshop on Computational Design and Computer-aided Creativity</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SkipInject: Expanding Control of Diffusion Models Leveraging the Models Themselves</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ludovica Schaerf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Visual Studies, UZH-MPG</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>23</volume>
      <abstract>
        <p>The landscape of computational design has evolved rapidly with the advent of generative models, such as diffusion models, which have revolutionized image creation and editing across diverse creative domains. These models, particularly Stable Diffusion [1] and its variants, provide designers with powerful tools that blur the line between creation and conception, opening new opportunities for creative experimentation. This paper explores an innovative approach to image editing that leverages the internal architecture of Stable Diffusion, specifically the skip connections of its U-Net [2] backbone, to enable training-free, flexible content and style transfer. This research contributes to the growing field of computational creativity, offering fresh perspectives on how computational methods can be employed to manipulate design artifacts in novel ways. Computational design traditionally involves the automation of design processes, the creation of custom tools, and the extension of stylistic forms. However, the integration of generative models into the design workflow moves beyond mere automation, positioning the computer as a collaborator in the conceptualization of creative works. While recent advances in diffusion models have largely focused on fine-tuning or retraining models for specific tasks, our approach exploits the inherent flexibility of the U-Net architecture, particularly its skip connections, to offer a more controlled and efficient means of editing images. Unlike prior methods that rely on training models from scratch or fine-tuning existing models, our approach, SkipInject [3], enables precise content and style transfer by injecting the spatial information of one image into another, preserving the core structural elements while modifying stylistic features.</p>
        <p>We systematically investigate the role of skip connections within the Stable Diffusion model, analyzing how these connections contribute to the separation of content and style in the image generation process. Through a series of experiments, we demonstrate that the third encoder-decoder block in the U-Net architecture plays a crucial role in disentangling content from style, making it possible to transfer these elements independently. Our method offers significant improvements over state-of-the-art techniques, achieving superior content alignment and structural preservation in image editing tasks. The creative potential of this method extends beyond basic style transfer. By modulating the intensity of the content and style blending process, designers can fine-tune the extent of transformation, allowing for a more iterative, exploratory approach to design. Furthermore, we introduce three modulation techniques - classifier-free guidance, depth-wise alternation, and timestep manipulation - that provide additional layers of control, enhancing the flexibility of the editing process. These techniques make the method adaptable to a wide range of creative tasks, from subtle, detail-oriented modifications to more radical, transformative changes. Our experiments cover a broad spectrum of image editing tasks, including text-guided image manipulation, style transfer, and fine-grained feature editing, demonstrating that our method achieves state-of-the-art performance on various benchmarks. Qualitative and quantitative evaluations show that SkipInject outperforms other leading methods in both text fidelity and structural preservation, offering a more robust and controllable approach to image editing. The findings from this paper suggest that diffusion models, and particularly the U-Net architecture, can serve as a foundation for more intuitive, flexible, and creative design practices, with wide-ranging applications in fields such as visual arts, graphic design, and multimedia.</p>
        <p>In conclusion, this paper presents a novel, efficient, and controlled method for image editing and style transfer that leverages the skip connections of Stable Diffusion. Through systematic exploration and experimentation, we provide new insights into the inner workings of U-Net-based diffusion models, offering tools for creative professionals to explore and manipulate design artifacts in ways that were previously unattainable. As generative models continue to advance, the possibilities for computational creativity are vast, and this research serves as a step toward more dynamic and participatory design processes.</p>
      </abstract>
      <kwd-group>
        <kwd>Diffusion Models</kwd>
        <kwd>Skip Connections</kwd>
        <kwd>Content-Style Disentanglement</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 in order to: paraphrase and reword. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10684</fpage>
          -
          <lpage>10695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          ,
          <source>in: Medical image computing and computer-assisted intervention - MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Schaerf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alfarano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Impett</surname>
          </string-name>
          ,
          <article-title>Training-free style and content transfer by leveraging u-net skip connections in stable diffusion 2</article-title>
          , arXiv preprint arXiv:2501.14524 (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>