<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">StrokeCoder: Path-Based Image Generation from Single Examples using Transformers</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Sabine</forename><surname>Wieluch</surname></persName>
<email>sabine.wieluch@uni-ulm.de</email>
							<affiliation key="aff0">
								<orgName type="department">Institute for Neural Information Processing</orgName>
								<orgName type="institution">Ulm University</orgName>
								<address>
									<postCode>89081</postCode>
									<settlement>Ulm</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Friedhelm</forename><surname>Schwenker</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute for Neural Information Processing</orgName>
								<orgName type="institution">Ulm University</orgName>
								<address>
									<postCode>89081</postCode>
									<settlement>Ulm</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">StrokeCoder: Path-Based Image Generation from Single Examples using Transformers</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">4FEBF5BF4333E7442A584E935065B8BF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Transformer</term>
					<term>generative</term>
					<term>sketch</term>
					<term>vector graphic</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper demonstrates how a Transformer neural network can be used to learn a generative model from a single path-based example image. We further show how a data set can be generated from the example image and how the model can be used to generate a large set of deviated images, which still represent the original image's style and concept.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hand-drawn sketches are often used to quickly illustrate a scene or capture a thought. A quick drawing thus often becomes the first step of a larger design process and is an essential part of generating new design ideas <ref type="bibr" target="#b0">[1]</ref>. To support users in their design processes <ref type="bibr" target="#b1">[2]</ref> or to give users the possibility for creative experimentation with sketches, for example with Casual Creators <ref type="bibr" target="#b2">[3]</ref>, it is very interesting to build such supportive tools with the help of generative machine learning. However, for most design processes it is not desirable, or even possible, to collect a large training data set. For example, it is not feasible for a computer game level designer to create thousands of levels to train a generative model, as creating the data set would exceed the model's value. For this reason, this work focuses on three main points. First: one-shot learning, where a model is trained on only one or few examples. Second: generative models that produce path-based images, not pixel-based images. For supportive drawing applications, and also for other design tasks such as digital fabrication <ref type="bibr" target="#b3">[4]</ref>, it is important to work with path data (for example, a laser cutter needs path data to move along a line). Third: generation processes that can derive new work from the training data while preserving the original style.</p><p>Interesting work on sketch image generation has been performed by <ref type="bibr" target="#b4">[5]</ref>, who also focus on the generation of style-preserving derivatives for Co-Creative settings. 
Other research groups <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> have also worked on sketch data, mainly utilizing Generative Adversarial Networks <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>; consequently, the resulting images are pixel-based, not path-based.</p><p>Machine learning on path-based data has been performed in related research work. One very well known example is Sketch-RNN <ref type="bibr" target="#b11">[12]</ref>, where a recurrent neural net was trained with a large data set of small sketches depicting different objects. Another very interesting approach by <ref type="bibr" target="#b12">[13]</ref> proposes a GAN-like architecture utilizing Long Short-Term Memory (LSTM) neural nets to generate path-based images from large sketch datasets. Recent studies by Xu et al. showed that Transformer neural networks are well suited to be trained on sketch data. Transformers <ref type="bibr" target="#b13">[14]</ref> are state-of-the-art architectures for handling sequential data and outperform RNN or LSTM architectures in Natural Language Generation tasks. Xu et al. implemented a graph-based representation for strokes to perform free-hand sketch recognition <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>. Training and generation of Scalable Vector Graphics has been performed by <ref type="bibr" target="#b16">[17]</ref>, using several fonts as data sets.</p><p>In our previous research <ref type="bibr" target="#b17">[18]</ref>, we examined how Dropout can be used with Generative Adversarial Networks to create different but coherent images in image-to-image translation tasks. In this research paper, we aim to learn a generative model from one single hand-drawn image. We focus on generating diverse images that match the input image's style and concept. 
The generated images will be especially useful for creative support tools, art or digital fabrication.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data Structure and Transformer Architecture</head><p>In this research, we aim to learn a neural representation for path-based sketch drawings. A sketch drawing consists of a sequence of pen strokes, and each stroke can be approximated by small straight line segments. This sequence of sequences can easily be flattened into one large sequence. Therefore, learning a neural representation for sketch drawings can be formulated as a sequence generation task. Transformers <ref type="bibr" target="#b13">[14]</ref> are a new class of neural nets which have been especially useful to the Natural Language Processing community. They are used for sequence-to-sequence translation tasks <ref type="bibr" target="#b18">[19]</ref> as well as for text generation <ref type="bibr" target="#b19">[20]</ref>. Other domains have also used Transformers for various tasks such as music generation <ref type="bibr" target="#b20">[21]</ref>. A Transformer <ref type="bibr" target="#b13">[14]</ref> consists of an Encoder and a Decoder, as it is usually used for sequence-to-sequence translation tasks. However, in our setting we aim to learn a generative model, and therefore we will only use a Transformer Encoder. This seems counterintuitive at first, because for a generation task usually the Decoder would be used. However, the Transformer Encoder and Decoder are constructed very similarly and mainly differ in an additional input from the Encoder to the Decoder in a translation setting. As we only want to generate a sequence, we can discard this input and end up with the Transformer Encoder.</p><p>The Transformer Encoder consists of multiple layers, which end in a linear and a softmax layer. Encoder layers can be stacked on top of each other any number of times. One Encoder layer consists of three sub-layers. 
The first and second sub-layers are Multi-Head-Attention layers, which consist of a number of parallel attention layers whose outputs are concatenated and finalized with a linear layer. A self-attention layer gives the neural net the ability to focus more on certain moves or to ignore other moves in the sequence. A mask is applied to the first attention layer to prevent the neural net from seeing future sequence elements. Each of these two sub-layers ends with a layer normalization. The third sub-layer is a feed-forward network, also ending with a layer normalization. Figure <ref type="figure" target="#fig_0">1</ref> gives a visual overview of the architecture used. The Transformer architecture does not receive information about the positional order of the sequence elements; therefore, an additional positional encoding is required. In our research, we use the standard positional encoding defined in <ref type="bibr" target="#b13">[14]</ref>. Any number of Encoder layers can be stacked. The Encoder layer output is processed through a linear and a softmax layer to receive the final one-hot encoded vector, which can then be used to read the final move from the embedding.</p><p>Before we can use the Encoder, however, the input needs to be prepared accordingly. The sequence of straight lines needs to be converted into a sequence of vectors that can be processed by a neural network. In Natural Language Processing, using word sequences as input is a very similar problem. Here, word embeddings <ref type="bibr" target="#b21">[22]</ref> are used to encode words into vectors that can be used as input data. Following this method, we also embed our lines, but first we need to define what such a line actually consists of:</p><p>• Pen State (indicator of whether the line is drawn or not)</p><p>• Position (position to move to; relative to the last position) This representation is very similar to Turtle Graphics, where a virtual pen can be moved with relative position commands. 
In the following, we refer to this line definition as a "move". Additionally, we introduce a special move to indicate the end of the image, which is also added to the embedding. The stroke ending does not need its own indicator move, as it is encoded in the Pen State change. The embedding works like a trainable lookup table: each pen move is assigned a real-number vector of a certain length. This vector can be retrieved by the move's index in the lookup table. The embedding vectors are initialized randomly and are trained with gradient descent. By training the embedding vectors, similar moves (in the context of the drawing) receive similar vectors. This similarity is easier to observe in language: for example, the words "rose" and "flower" are often used in similar contexts and should therefore receive more similar vectors than the words "house" and "spaghetti".</p><p>For greedy sampling, the Transformer output is a one-hot encoded vector with the length of the embedding size. It contains a single 1 at the index of the predicted next move.</p></div>
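The move representation and embedding lookup described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: modeling a move as a tuple of pen state and relative offsets, the segment-splitting step and all names are our assumptions; only the 15-unit maximum line length and the embedding size of 52 are taken from the paper.

```python
import math
import random

# Hedged sketch: a "move" is modeled as (pen_state, dx, dy) with positions
# relative to the previous point, similar to Turtle Graphics. The 15-unit
# maximum line length and embedding size 52 come from the paper's settings;
# everything else here is an assumption for illustration.

def to_moves(points, pen=1, step=15):
    """Convert an absolute point sequence into relative moves, splitting
    segments so that no move exceeds the maximum line length."""
    moves = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for _ in range(n):
            moves.append((pen, round((x1 - x0) / n), round((y1 - y0) / n)))
    return moves

class MoveEmbedding:
    """Trainable lookup table: each distinct move maps to a real-number
    vector, initialized randomly (the training itself is omitted here)."""
    def __init__(self, dim=52, seed=0):
        self.dim = dim
        self.table = {}
        self.rng = random.Random(seed)
    def __call__(self, move):
        if move not in self.table:  # lazily assign a vector to unseen moves
            self.table[move] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[move]
```

The special image-end move would simply be one more entry in the same table; for example, a 30-unit horizontal segment becomes two moves of length 15.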
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data Set Generation</head><p>In our experiments, the initial stroke drawings are hand-drawn with a digitizer pen, from which only points are recorded (see figure <ref type="figure" target="#fig_1">2</ref>). The point sequence is then simplified to remove unnecessary points and instead describe the hand-drawn stroke by a few curves instead of many points. A sequence of as few curves as possible is fitted through the recorded pen position points with an allowed maximum error <ref type="bibr" target="#b22">[23]</ref>. The resulting sequence of curves will in the following be referred to as a path. The recorded drawing is stored in this path state, because it is a better approximation than the later constructed straight moves, especially if the path is altered to create a large data set as described below. When learning generative models from few natural images <ref type="bibr" target="#b23">[24]</ref>, the images are altered into a variety of so-called patches. These patches are cut-out parts of the original images, which are often also slightly deformed, scaled or changed in other manners to produce a larger amount of training data than the initial images alone would have provided.</p><p>To learn a generative model on one single sketch image, we propose a similar method: the initial strokes are altered in different ways to produce a large and diverse training data set. As stroke-based images differ a lot from natural, pixel-based images, the altering methods need to be adapted accordingly. 
All proposed altering methods are visualized in figure <ref type="figure" target="#fig_2">3</ref> and are described below: • Translation: The whole path-image is moved to a new position in a way that it is still contained in the initial image boundaries.</p><p>• Rotation: The whole path-image is rotated by a random angle.</p><p>• Mirror: The path-image is mirrored along an axis.</p><p>• Path Reversal: As each path consists of a list of curves with start and end points, a path has an implicit direction. In our setting, the path direction is not important, so a path can be reversed to generate new patches. The path direction might be important in other settings like sketch or handwriting classification <ref type="bibr" target="#b14">[15]</ref>, where stroke direction and path order tend to be consistent within one letter. As Path Reversal is a binary state (either the path is reversed or not), this manipulation is applied with a probability of 0.5.</p><p>• Scaling: The whole path-based image is scaled to a smaller size.</p><p>After all paths have been manipulated, they are rearranged in a new order. We sort them greedily by distance, so that the pen travel is as short as possible, starting at a randomly chosen path in the image. With this new path order, we add more variety to the data set while ensuring that the model will learn to draw close to the last stroke. If the paths were not sorted but shuffled randomly, the resulting images would look more scattered, as new paths would appear at greater distances.</p></div>
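The greedy path reordering and the Path Reversal step described above can be sketched as follows. This is a hedged sketch under our own assumptions about the data layout: a path is represented as a list of points, whereas the real pipeline operates on the fitted curve paths; all function names are ours.

```python
import math
import random

# Hedged sketch of the greedy reordering: starting from a (randomly chosen)
# path, always continue with the remaining path whose start point is closest
# to the end point of the previously drawn path, keeping pen travel short.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_order(paths, start=None, rng=random):
    """Reorder a list of paths (each a list of (x, y) points) by greedy
    nearest-neighbor distance between path endpoints."""
    remaining = list(paths)
    idx = rng.randrange(len(remaining)) if start is None else start
    ordered = [remaining.pop(idx)]
    while remaining:
        nxt = min(remaining, key=lambda p: dist(ordered[-1][-1], p[0]))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

def reverse_path(path, rng=random):
    """Path Reversal is a binary state, so it is applied with probability 0.5."""
    return rng.choice([path, path[::-1]])
```

For example, three paths starting at x-positions 0, 10 and 2, drawn from the first one, come out in the order 0, 2, 10.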
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Training</head><p>We recorded 4 different drawings ("curles", "boxes", "spikes", "circles") with an image boundary of 180x180 units to serve as initial stroke-based images, which can be seen in figure <ref type="figure" target="#fig_3">4</ref>.</p><p>Each model was trained for 200 epochs, where for each epoch a new data set of 500 patches is generated from the original image. These patches were then converted to moves with a maximum line length of 15 units. It is important to generate a new patch data set each epoch to prevent overfitting to a small patch set. Instead, with the changing patch training data, the model sees a larger portion of the possible patches.</p><p>To verify that the model learns to represent the whole patch distribution, we calculated the Cross Entropy Loss on an unseen data set of 500 patches after each training epoch. The results can be seen in figure <ref type="figure" target="#fig_4">5</ref>. After an initial increase, the loss decreases with each new training epoch. Because the model sees a large portion of the data set distribution, it learns to represent the whole distribution better.</p><p>If the model is trained on only one patch data set, it quickly begins to overfit, as can be seen in figure <ref type="figure" target="#fig_5">6</ref>. Here, the "boxes" model was trained with a single patch data set of size 100, 500 and 1000. The plot depicts the Cross Entropy Loss of these models on an unseen patch set of size 500. The larger the training patch set is, the lower the loss. However, the loss increases with each epoch, as one small patch set poorly represents the whole data set distribution and the model overfits to the small sample. When generating moves, the approximation error for the curve flattening algorithm should not be set too high, as the resulting image quality suffers. 
The difference between an error of 1 and an error of 3 can be seen in figure <ref type="figure" target="#fig_6">7</ref>. With a large allowed error, details in narrow curves especially are lost. For our research we chose a maximum error value of 1.</p><p>In the training phase, we do not feed the patches one by one to the Transformer. Instead, we feed in a continuous stream of patches (though the input vectors are shuffled). So one input vector can contain, for example, the end half of one patch and the beginning half of another patch. This way the Transformer learns to generate a stream of images, which will be helpful in the generation phase. For our training, we use the Adam optimizer <ref type="bibr" target="#b24">[25]</ref> with the changes and parameters described in <ref type="bibr" target="#b13">[14]</ref>, where the learning rate is first linearly increased for the first warm-up steps and thereafter decreased again. As a loss function, we use Cross Entropy Loss. For all of our experiments, we used the following Transformer settings:</p><p>• Batch Size: 200</p><p>• Sequence Length: max. move length of recorded image • Hidden Embedding Size: 52</p><p>• Encoder Layers: 6</p><p>• Attention Heads: 4</p><p>• Feed Forward Size: 2048</p></div>
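The learning-rate schedule referenced from [14] can be written down directly: the rate rises linearly over the warm-up steps and then decays with the inverse square root of the step number. Matching d_model to the hidden embedding size of 52 is our reading of the settings list; the warm-up step count is an assumption (the original Transformer paper uses 4000).

```python
import math

# Sketch of the schedule from "Attention is All You Need" [14]:
# lr(step) = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
# d_model = 52 mirrors the hidden embedding size above; warmup_steps = 4000
# is an assumed default, not a value stated in this paper.

def noam_lr(step, d_model=52, warmup_steps=4000):
    """Learning rate for a given training step (step counting starts at 1)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two branches of the min meet exactly at the warm-up step, so the rate peaks there and falls off smoothly afterwards.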
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Inference and Sampling</head><p>The Transformer can only predict one new move depending on the previous moves, so it needs at least one move to start generating. We experimented with different lengths of random input vectors as initialization. These random vectors always ended with the "image-end" move to trigger the generation of a new image. The random initialization vector is of course discarded and not shown in the drawings. Figure <ref type="figure" target="#fig_7">8</ref> shows the results of three different initialization vector lengths. The black group was generated only by passing the "image-end" move to the Transformer. The images are often rather short (in terms of stroke count) and rarely show strokes that do not seem to fit the original image. Some empty images also occurred. The red block was generated with a random initialization vector of half the sequence length. The images show a high variety and seem to fit the original image well. The blue group at the bottom of figure <ref type="figure" target="#fig_7">8</ref> was generated with an initialization vector of the full sequence length. Here, the model seems "confused" by the random input and produces a large variety of long or unfitting strokes. We suggest using an initialization vector in the range of half the sequence length to create enough randomness for interesting results while not confusing the model too much. However, other initialization vectors might be better suited to certain situations. For example, the blue result group in figure <ref type="figure" target="#fig_7">8</ref> might be interesting for creativity support tasks where an AI helps the drawing user to get new inspiration. Here it might be beneficial that the model's results differ more from the original concept.</p><p>As a sampling strategy we use top-k sampling. Here, only the best 𝑘 predictions are considered and are then chosen according to their probability. 
If 𝑘 = 1, top-k sampling is equivalent to a greedy approach where each time only the best prediction is chosen. Figure <ref type="figure" target="#fig_8">9</ref> shows results from all trained models with two different 𝑘 values: the left side is a greedy sampling with 𝑘 = 1, while the right side shows a sampling with 𝑘 = 10. The major difference between both sets is the image variety. With the greedy approach, the model often falls back to a small set of images. These images fit the original image style well; however, when generating, we experienced that the Transformer got stuck in generative loops, where it would draw one shape indefinitely. The images sampled with top-10 show a larger variety and also fit the original image well, except for some individual strokes. Overall, these generated images look a bit more chaotic and also show new shapes that are not part of the original image but fit the style (e.g. double-curled lines in the "curles" image). In Natural Language Processing, the sampling technique is often adjusted to the generation task. So in the case of generating stroke images, it might also be best to adjust the sampling algorithm to the intended purpose. To evaluate whether the generated images preserve the style of the original sketch, we asked 18 participants to choose from a set of 12 adjectives (parallel, ordered, stacked, chaotic, curly, straight, symmetric, repetitive/rhythmic, tangled, circular, jagged, nested) which ones best describe the original image and a set of generated images. The adjectives were chosen in a way that, in combination, they are able to describe a large variety of patterns. The question order was counterbalanced to avoid bias. We calculated the difference in chosen adjectives for each person. Figure <ref type="figure" target="#fig_9">10</ref> shows the results for each model: the mean error over all models is 2.94 out of a maximum possible error of 12. From the results we can see that some models preserve the style better than others. 
The "boxes" and "spikes" models most likely perform worse in this test because the trained models seem to be good at replicating drawn shapes, but not at positioning them, and these two models' main features lie in the stroke positions (for example, the "boxes" input image is symmetric). Depending on the model, an average of two to three attributes changed between the assessment of the original image and the generated images. So the majority of attributes is shared between both image sets, which allows the conclusion that the trained model does not perfectly preserve the style but learned to replicate the large majority of attributes.</p></div>
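The top-k sampling strategy described above can be sketched in a few lines. The function name and the plain-list probability input are our assumptions for illustration; with k = 1 it reduces to the greedy choice discussed in the text.

```python
import random

# Hedged sketch of top-k sampling: keep only the k most probable move
# indices, then draw among them in proportion to their probabilities.

def top_k_sample(probs, k, rng=random):
    """probs: output probabilities per move index; returns a sampled index."""
    best = sorted(range(len(probs)), key=lambda i: probs[i])[-k:]
    weights = [probs[i] for i in best]
    return rng.choices(best, weights=weights)[0]
```

With k = 1 this always returns the highest-probability move, matching the greedy behaviour; larger k trades fidelity to the original style for variety, as observed in figure 9.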
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In our work, we showed that a Transformer neural net can be used to learn and generate stroke-based images from one single input image. We generated large training data sets from one image, using different path altering methods. Four hand-drawn image samples were recorded and used to generate our training data sets. We showed that it is essential to train the Transformer with a changing subset of training data in each epoch. This training data rotation prevents the model from overfitting to a too small sample.</p><p>We compared different initialization vector lengths and found that a length of half the Transformer sequence length gives the best results concerning variety and similarity to the original image. We also compared different sampling parameters in top-k sampling. We could observe that a greedy sampling method with 𝑘 = 1 results in a very low variety of images which fit very well to the original image.</p><p>Increasing 𝑘 also increases variety but introduces strokes that do not fit well to the original image's style. Therefore, it is important to choose 𝑘 depending on the application needs.</p><p>Finally, we evaluated the style preservation capabilities of our models in an assessment with 18 participants. The results showed that most of the style attributes are learned by the model, though style-fitting stroke placement was not always achieved.</p><p>In our future research we want to expand these generative methods to create larger path-based pattern images from one input image. For this, it might be interesting to use hierarchical approaches, as they have been successfully used in other domains like dialogue generation <ref type="bibr" target="#b25">[26]</ref>. A hierarchical approach might be helpful to give the neural net an overview of already generated paths and their placement beyond the sequence-length memory. 
We see future applications for these large generated patterns, or also expanded patterns, in games, art, digital fabrication and many more creative domains. Another interesting field is Co-Creative Design <ref type="bibr" target="#b26">[27]</ref>, where a user cooperates with a supportive artificial agent in a design task. Here, a stroke-based image representation can be very useful, especially in the domain of Casual Creators, because users can interact very intuitively by drawing with a pen. It will be interesting to explore these applications and domains in future research.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Transformer Encoder net as used in this work: after the input is embedded and receives positional encoding information, it uses multiple self-attention layers and a feed-forward net to process the sequence. Any number of Encoder layers can be stacked. The Encoder layer output is processed through a linear and a softmax layer to receive the final one-hot encoded vector, which can then be used to read the final move from the embedding.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Simplification of a path: the first image depicts a hand-drawn stroke, where on each mouse event a new point is recorded. The next image shows a simplified version (a path), where multiple points have been substituted with a curve. In the next step, the path is converted to moves, where moves that are too long are divided into multiple shorter moves.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Five types of path manipulations used to create a large data set from one example. Translation, Rotation, Scaling and Mirroring are used on the whole stroke-based image, whereas Path Reversal is used on individual paths in the image.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Four recorded stroke-based images to be used in the experiments. Each stroke is colored randomly for better distinction.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Loss on one unseen patch set of size 500. Loss is calculated with every model after each epoch. Models were trained with newly generated patch sets for every epoch. The falling loss curve indicates that the model learns to represent the whole patch distribution. (Absolute loss differs because of different sequence lengths between models.)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Loss on one unseen patch set of size 500. Loss is calculated with three versions of the "boxes" model, trained with a single patch set of size 100, 500 and 1000, respectively. The rising loss curve indicates that the model overfits to the small patch sample.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Curve flattening results with a maximum allowed error of 1 (left) to 3 (right). With a higher allowed error, smaller details like narrow curves suffer and are approximated to sharp edges.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Groups of sampled sketches, differing in the initialization vector length: 1 (black), half of the sequence length (red) and full sequence length (blue).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Groups of top-k sampled sketches from all four models. Left: k=1, right: k=10.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: 18 participants chose, out of a set of 12 pattern-describing adjectives, those which best fit the original image and a set of generated images. The plot shows the error distribution grouped by model. The average error is 2.94.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The roles of sketches in early conceptual design processes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Suwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Gero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Purcell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Twentieth Annual Meeting of the Cognitive Science Society</title>
				<meeting>Twentieth Annual Meeting of the Cognitive Science Society<address><addrLine>New Jersey</addrLine></address></meeting>
		<imprint>
			<publisher>Lawrence Erlbaum Hillsdale</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="1043" to="1048" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Design sketches and sketch design tools</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">Y.-L</forename><surname>Do</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="383" to="405" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Casual creators: Defining a genre of autotelic creativity support systems</title>
		<author>
			<persName><forename type="first">K</forename><surname>Compton</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<pubPlace>Santa Cruz</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of California</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Retrieving 3d cad model by freehand sketches for design reuse</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advanced Engineering Informatics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="385" to="392" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Unified classification and generation networks for co-creative systems</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Y</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">M</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-P</forename><surname>Hsiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Macias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Magerko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICCC</title>
		<imprint>
			<biblScope unit="page" from="237" to="244" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Toward realistic face photo-sketch synthesis via composition-aided GANs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Cybernetics</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">How do humans sketch objects?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Eitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hays</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alexa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Graphics (TOG)</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="1" to="10" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Sketch me that shoe</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-Z</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Xiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Hospedales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Loy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="799" to="807" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">What you sketch is what you get: Quick and easy augmented reality prototyping with PintAR</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gasques</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sharkey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Weibel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Unsupervised representation learning with deep convolutional generative adversarial networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Metz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.06434</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1701.00160</idno>
		<title level="m">NIPS 2016 tutorial: Generative adversarial networks</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Ha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eck</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1704.03477</idno>
		<title level="m">A neural representation of sketch drawings</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Balasubramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Balasubramanian</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.03620</idno>
		<title level="m">Teaching GANs to sketch in vector format</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Bresson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.11258</idno>
		<title level="m">Multi-graph transformer for free-hand sketch recognition</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-Z</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2002.00867</idno>
		<title level="m">Deep self-supervised representation learning for free-hand sketch</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A learned representation for scalable vector graphics</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="7930" to="7939" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Dropout induced noise for co-creative GAN systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wieluch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Schwenker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision Workshops</title>
				<meeting>the IEEE International Conference on Computer Vision Workshops</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">OpenAI Blog</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">C.-Z</forename><forename type="middle">A</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Simon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hawthorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dinculescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eck</surname></persName>
		</author>
		<title level="m">Music transformer: Generating music with long-term structure</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Neural word embedding as implicit matrix factorization</title>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2177" to="2185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">An algorithm for automatically fitting digitized curves</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Schneider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Graphics gems</title>
				<imprint>
			<publisher>Academic Press Professional, Inc</publisher>
			<date type="published" when="1990">1990</date>
			<biblScope unit="page" from="612" to="626" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">SinGAN: Learning a generative model from a single natural image</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">R</forename><surname>Shaham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Dekel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Michaeli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4570" to="4580" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">A hierarchical latent variable encoder-decoder model for generating dialogues</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">V</forename><surname>Serban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sordoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Charlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pineau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirty-First AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Guzdial</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedl</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.09420</idno>
		<title level="m">Co-creative level design via machine learning</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
