<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Cococo: AI-Steering Tools for Music Novices Co-Creating with Generative Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ryan</forename><surname>Louie</surname></persName>
							<email>ryanlouie@u.northwestern.edu</email>
						</author>
						<author>
							<persName><forename type="first">Andy</forename><surname>Coenen</surname></persName>
							<email>andycoenen@google.com</email>
						</author>
						<author>
							<persName><forename type="first">Cheng-Zhi</forename><forename type="middle">Anna</forename><surname>Huang</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Terry</surname></persName>
							<email>michaelterry@google.com</email>
						</author>
						<author>
							<persName><forename type="first">Carrie</forename><forename type="middle">J</forename><surname>Cai</surname></persName>
							<email>cjcai@google.com</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Northwestern University Evanston</orgName>
								<address>
									<region>IL</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Google Research Mountain View</orgName>
								<address>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<address>
									<settlement>Mountain View</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Google Research Cambridge</orgName>
								<address>
									<region>MA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution">Google Research Mountain View</orgName>
								<address>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Cococo: AI-Steering Tools for Music Novices Co-Creating with Generative Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5163D5B2DAD625E4C67B2EE1C36B04C4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work, we investigate how novices co-create music with a deep generative model, and what types of interactive controls are important for an effective co-creation experience. Through a needfinding study, we found that generative AI can overwhelm novices when the AI generates too much content, and can make it hard to express creative goals when outputs appear to be random. To better match co-creation needs, we built Cococo, a music editor web interface that adds interactive capabilities via a set of AI-steering tools. These tools restrict content generation to particular voices and time measures, and help to constrain non-deterministic output to specific high-level directions. We found that the tools helped users increase their control, self-efficacy, and creative ownership, and we describe how the tools affected novices' strategies for composing and managing their interaction with AI.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Recent generative music models have made it conceivable for novices to create an entire musical composition from scratch, in partnership with a generative model. For example, the widely available Bach Doodle <ref type="bibr" target="#b8">[9]</ref> sought to enable anyone on the web to create a four-part chorale in the style of J.S. Bach by writing only a few notes, allowing an AI to fill in the rest. While this app makes it conceivable for even novices with no composition training to create music, it is not clear how people perceive and engage in co-creation activities like these, or what types of capabilities they might find useful.</p><p>In a need-finding study we conducted to understand the novice-AI co-creation process, we found that generative music models can sometimes be quite challenging to co-create with. Novices experienced information overload, in which they struggled to evaluate and edit the generated music because the system created too much content at once. They also struggled with the system's nondeterministic output. While the output would typically be coherent, it would not always align with users' musical goals at the moment. Having surfaced these challenges, this paper seeks to understand what interfaces and interactive controls for generative models are important in order to promote an effective co-creation experience.</p><p>As a step towards explicitly designing for music novices cocreating with generative models, we present Cococo (collaborative co-creation), a music editor web-interface for novice-AI co-creation that augments standard generative music interfaces with a set of AI-steering tools: 1) Voice Lanes that allow users to define for which time-steps (e.g. measure 1) and for which voices (e.g. 
soprano, alto, tenor, bass) the AI generates music, before any music is created, 2) an Example-based Slider for expressing that the AI-generated music should be more or less like an existing example of music, 3) Semantic Sliders that users can adjust to direct the music toward high-level directions (e.g. happier / sadder, or more conventional / more surprising), and 4) Multiple Alternatives for the user to select among a variety of AI-generated options. To implement the sliders, we developed a soft priors approach that encodes desired qualities specified by a slider into a prior distribution; this soft prior is then used to alter a model's original sampling distribution, in turn influencing the AI's generated output.</p><p>In a summative evaluation with 21 music novices, we found that AI-steering tools not only increased users' trust, control, comprehension, and sense of collaboration with the AI, but also contributed to a greater sense of self-efficacy and ownership of the composition relative to the AI. We also reveal how AI-steering tools affected novices' co-creation process, such as by working with smaller, semantically-meaningful components and reducing the non-determinism in AI-generated output. Together, these findings inform the design of future human-AI interfaces for co-creation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">NOVICES' NEEDS FOR CO-CREATION</head><p>To understand challenges when composing music with generative models, we conducted a 25-minute needfinding study with 11 music composition novices. We observed novices use a tool that mirrored conventional interfaces for composing music with deep generative models <ref type="bibr" target="#b8">[9]</ref>.</p><p>Participants experienced information overload: they struggled to evaluate the generated music due to the amount of AI-generated content. Participants struggled to identify which note was causing a discordant sound after multiple generated voices were added to their original. Participants were naturally inclined to work on the composition "bar-by-bar or part-by-part"; however, in contrast to these expectations, the generated output felt like it "skipped a couple steps" and made it difficult to follow all at once. Participants struggled to express desired musical objectives due to the AI's non-deterministic output. Even when the generated output sounded harmonious, participants felt incapable of giving feedback about their goals in order to constrain the kinds of notes the model generated. Participants likened this frustrating experience to "rolling dice" to generate a desired sound, and instead wished to control generation based on relevant musical objectives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">COCOCO</head><p>Based on identified user needs, we developed Cococo (collaborative co-creation), a music editor web interface for novice-AI co-creation (available at https://github.com/pair-code/cococo) that augments standard generative music interfaces with a set of AI-steering tools (Figure <ref type="figure" target="#fig_0">1</ref>). Cococo builds on top of Coconet <ref type="bibr" target="#b6">[7]</ref>, a state-of-the-art deep generative model trained on four-part harmony that accepts incomplete music as input and outputs complete music. Coconet works with music that can have four parts or voices playing at the same time (Soprano, Alto, Tenor, Bass), is two measures long (32 timesteps of sixteenth-note beats), and in which each voice can take on any one of 46 pitches. Coconet is able to infill any section of music, including gaps in the middle or at the start of the piece. To mirror the most recent interfaces backed by these infill capabilities <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b4">5]</ref>, Cococo contains an infill mask feature, with which users can crop a passage of notes to be erased using a rectangular mask, and automatically infill that section using AI. Users can also manually draw and edit notes.</p><p>Beyond the infill mask, Cococo distinguishes itself with its AI-steering tools. In the following subsections, we describe each of the four tools in detail. Additionally, we illustrate the co-creation workflow enabled by these tools in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>3.0.1 Voice Lanes. Voice Lanes allow specifying for which voice(s) and for which time steps to generate music. With this capability, users can control the amount of generated content they would like to work with. This was designed to address the information overload caused by Coconet's default capability to infill all remaining voices and sections at once. 
For example, a user can request the AI to add a single accompanying bass line to their melody by highlighting the bass (bottom) voice lane for the duration of the melody, prior to clicking the generate button (Figure <ref type="figure" target="#fig_0">1B</ref>). To support this type of request, we pass a custom generation mask to the Coconet model including only the user-selected voices and time-slices to be generated.</p></div>
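To make the masking step concrete, here is a minimal sketch (in Python with NumPy, not the authors' implementation) of how a Voice Lanes selection could be translated into the generation mask handed to an infilling model; the `generation_mask` helper and voice indices are illustrative assumptions, with shapes taken from the paper (4 voices by 32 sixteenth-note timesteps).

```python
import numpy as np

NUM_VOICES, NUM_STEPS = 4, 32          # SATB voices, two measures of sixteenth notes
SOPRANO, ALTO, TENOR, BASS = range(4)  # hypothetical voice indices

def generation_mask(selected_voices, start_step, end_step):
    """Return a (voices x timesteps) boolean mask: True marks cells the model
    should infill; False marks user-provided context to leave untouched."""
    mask = np.zeros((NUM_VOICES, NUM_STEPS), dtype=bool)
    for v in selected_voices:
        mask[v, start_step:end_step] = True
    return mask

# Example: request a bass line under the first measure (steps 0-15),
# mirroring the "add a single accompanying bass line" scenario above.
mask = generation_mask([BASS], 0, 16)
```

The model would then sample pitches only at the `True` cells, conditioning on everything marked `False`.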
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.0.2">Multiple Alternatives.</head><p>Cococo provides affordances for auditioning multiple alternatives generated by the AI. This capability was designed based on formative feedback, in which users wanted a way to cycle through several generated suggestions to decide which was the most desirable. Users first choose the number of alternatives to be generated (Figure <ref type="figure" target="#fig_0">1C</ref>), audition each alternative by clicking on the different preview thumbnails (Figure <ref type="figure" target="#fig_0">1F</ref>), and listen to an alternative which is substituted within the larger musical context (Figure <ref type="figure" target="#fig_0">1G</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.0.3">Example-based Slider.</head><p>While prototyping the Multiple Alternatives feature, we found that the non-determinism inherent in Coconet could cause generated samples to be either (1) random and unfocused, or (2) too similar to each other, lacking diversity. As a solution, we developed the example-based slider for expressing that the AI-generated music should be more or less like an existing example of music. Before this slider is enabled, the user must select a reference example chunk of notes. Example-based sliders use soft priors to guide music generation.</p><p>3.0.4 Semantic Sliders. We implemented two semantic sliders in Cococo (Figure <ref type="figure" target="#fig_0">1D</ref>) to constrain generated output along meaningful dimensions: a conventional vs. surprising slider, and a major (happy) vs. minor (sad) slider. Users can adjust how predictable vs. unusual notes should be using the "conventional" and "surprising" dimensions of the slider. The conventional/surprising slider adjusts the temperature (𝑇 ) of the sampling distribution <ref type="bibr" target="#b3">[4]</ref>. A lower temperature makes the distribution more "peaky," so notes with higher probabilities in the original distribution become even more likely to be sampled (conventional), while a higher temperature makes the distribution less "peaky" and sampling more random (surprising). The major vs. minor slider constrains generated notes to a happier (major) quality or a sadder (minor) quality. This slider defines a soft prior that adjusts the sampling distribution to have higher probabilities for the most-likely major triad (for happy) or non-major triad (for sad) at each time-step. We provide visual intuition for how these distributions interact in Figure <ref type="figure" target="#fig_1">2</ref>. 
More formally, we use the equation below to alter the distribution used to generate outputs:</p><formula xml:id="formula_0">𝑝 adjusted (𝑥 𝑣,𝑡 |𝑥 𝐶 ) ∝ 𝑝 coconet (𝑥 𝑣,𝑡 |𝑥 𝐶 ) 𝑝 softprior (𝑥 𝑣,𝑡 )</formula><p>where 𝑝 coconet (𝑥 𝑣,𝑡 |𝑥 𝐶 ) gives the sampling distribution over pitches for voice 𝑣 at time 𝑡 from Coconet given musical context 𝑥 𝐶 (𝐶 gives the set of 𝑣, 𝑡 positions constituting the context), 𝑝 softprior (𝑥 𝑣,𝑡 ) encodes the distribution over pitches specified by the user or AI-steering tool designer (serving as soft priors), and 𝑝 adjusted (𝑥 𝑣,𝑡 |𝑥 𝐶 ) gives the resulting adjusted posterior sampling distribution over pitches. The soft priors 𝑝 softprior (𝑥 𝑣,𝑡 ) are defined so that notes that should be encouraged are given a higher probability, and those discouraged are given a lower, but non-zero, probability. Since none of the note probabilities are forced to zero, very probable notes in the model's original sampling distribution can still be likely after incorporating the priors, thus making it possible for the model's output to adhere to both the original context and the additional user-desired qualities.</p><p>The example-based and semantic sliders define a soft prior to modulate the model's generated output. When the user sets the example-based slider to more "similar," Cococo defines a soft prior with higher probabilities for notes in the example. Conversely, for a slider setting of more "different," Cococo defines a soft prior with lower probabilities for notes in the example.</p><p>The minor/major slider uses a slightly more complicated approach to define the soft prior distribution. When the user sets the slider to happy (major), for example, Cococo defines the soft prior by asking which major triad is most likely at each time slice within the model's sampling distribution. 
The log likelihood of a triad is computed by summing the log probability of all the notes that could be part of the triad (e.g., for C major triad, this includes all the Cs, Es, and Gs in all octaves). We repeat this procedure for all possible major triads to determine which is the most likely for a time slice. We then repeat this procedure for all time slices to be generated, in order to create our soft prior for most likely major triads.</p></div>
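The prior adjustment, temperature reshaping, and triad scoring described above can be sketched numerically. The following Python/NumPy toy uses one octave of 12 pitches rather than Coconet's 46, and all function names are our own illustrative choices, not the authors' code.

```python
import numpy as np

def apply_soft_prior(p_model, p_prior):
    """p_adjusted(x) ∝ p_model(x) * p_prior(x), renormalized to sum to 1."""
    p = p_model * p_prior
    return p / p.sum()

def apply_temperature(p_model, T):
    """Lower T -> peakier ('conventional'); higher T -> flatter ('surprising')."""
    logp = np.log(p_model) / T
    p = np.exp(logp - logp.max())  # subtract max for numerical stability
    return p / p.sum()

def most_likely_major_triad(p_model, pitch_classes):
    """Score each of the 12 major triads by summing the log probability of
    every pitch whose pitch class lies in the triad (the paper sums over
    all octaves; this toy covers a single octave). Returns the best root."""
    scores = []
    for root in range(12):
        triad = {root, (root + 4) % 12, (root + 7) % 12}
        member = np.array([c in triad for c in pitch_classes])
        scores.append(np.log(p_model[member]).sum())
    return int(np.argmax(scores))

# Toy sampling distribution heavily favoring C, E, G (pitch classes 0, 4, 7).
pitch_classes = np.arange(12)
p = np.full(12, 0.01)
p[[0, 4, 7]] = [0.4, 0.3, 0.27]
p = p / p.sum()

root = most_likely_major_triad(p, pitch_classes)  # C major triad wins here
prior = np.where([c in {0, 4, 7} for c in pitch_classes], 1.0, 0.1)  # soft, non-zero
adjusted = apply_soft_prior(p, prior)  # triad pitches boosted, others damped
```

Because the prior keeps every pitch at a non-zero probability, strongly context-supported notes outside the triad can still be sampled, which is the key property of the soft-priors approach.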
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">USER STUDY</head><p>We conducted a within-subjects study to compare the user experience of Cococo to that of the conventional interface. The conventional interface is aesthetically similar to Cococo, but does not contain the AI-steering tools. To mirror the most recent deep generative music interfaces, the conventional interface does include the infill-mask feature, which enables users to crop any region of the music and request that it be filled in by the AI <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b4">5]</ref>. Through a quantitative survey study, we seek to answer RQ1: How do the AI-steering tools in Cococo affect user perceptions of the creative process and the creative artifacts made with the AI? Through qualitative interviews and observations, we seek to understand RQ2: How do music novices apply the AI-steering tools within Cococo in their creative process, and what patterns of use and strategies arise? 4.0.1 Method. 21 music composition novices participated in the study. Each participant first completed an online tutorial of the two interfaces on their own (30 minutes). Then, they composed two pieces, one with Cococo and one with the conventional interface, with the order counterbalanced (15 minutes each). As a prompt, users were provided a set of images from the card game Dixit <ref type="bibr" target="#b13">[14]</ref> and were asked to compose music that reflected the character and mood of one image of their choosing. This task is similar to image-based tasks used in prior music studies <ref type="bibr" target="#b7">[8]</ref>. Finally, they answered a post-study questionnaire and completed a semi-structured interview (20 minutes). So that we could understand their thought process, users were encouraged to think aloud while composing. 4.0.2 Quantitative Measures. For our quantitative questionnaire, we evaluated the following outcome metrics. 
All items below were rated on a 7-point Likert scale (1=Strongly disagree, 7=Strongly agree) except where noted below.</p><p>The following set of metrics sought to measure users' compositional experience. Creative expression: Users rated "I was able to express my creative goals in the composition made using [System X]." Self-efficacy: Users answered two items from the Generalized Self-Efficacy scale <ref type="bibr" target="#b12">[13]</ref> that were rephrased for music composition. Effort: Users answered the effort question of the NASA-TLX <ref type="bibr" target="#b5">[6]</ref>, where 1=very low and 7=very high. Engaging: Users rated "Using [System X] felt engaging." Learning: Users rated "After using [System X], I learned more about music composition than I knew previously." Completeness of the composition: Users rated "The composition I created using [System X] feels complete (e.g., there's nothing to be further worked on)." Uniqueness of the composition: Users rated "The composition I created using [System X] feels unique."</p><p>In addition, we evaluated users' attitudes towards the AI. AI interaction issues: Users rated the extent to which the system felt comprehensible and controllable, two key challenges of human-AI interaction raised in prior work on DNNs <ref type="bibr" target="#b11">[12]</ref>. Trust: Participants rated the system along Mayer's dimensions of trust <ref type="bibr" target="#b10">[11]</ref>: capability, benevolence, and integrity. Ownership: Users rated two questions, one on ownership ("I felt the composition created was mine.") and one on attribution ("The music created using [System X] was 1=totally due to the system's contributions, 7=totally due to my contributions."). Collaboration: Users rated "I felt like I was collaborating with the system."</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">QUANTITATIVE FINDINGS</head><p>Results from the post-study questionnaire are shown in Figure <ref type="figure" target="#fig_3">3</ref>. We conducted paired t-tests using Benjamini-Hochberg correction to account for the 15 planned comparisons, using a false discovery rate 𝑄 = 0.05.</p><p>With regard to users' perceptions of the creative process, we found Cococo significantly improved participants' ability to express their creative goals, self-efficacy, perception of learning more about music, and engagement compared to the conventional interface. No significant difference was found in effort; participants described the two systems as requiring different kinds of effort: while Cococo required users to think and interact with the controls, the conventional interface's lack of controls made it effortful to express creative goals. Users' perceptions of the completeness of their compositions made with Cococo were significantly higher than with the conventional interface; however, no significant difference was found for uniqueness.</p><p>The comparisons for users' attitudes towards the AI were all found to be statistically significant: Cococo was more controllable, comprehensible, and collaborative than the conventional interface; participants using Cococo expressed higher trust in the AI, felt more ownership over the composition, and attributed the music to more of their own contributions relative to the AI.</p></div>
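For readers unfamiliar with the correction procedure used here, the following is a small self-contained sketch of the Benjamini-Hochberg step-up procedure at 𝑄 = 0.05. The p-values in the example are made up for illustration and are not the study's results.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false discovery rate q.

    Step-up rule: find the largest k such that p_(k) <= (k/m) * q,
    then reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= (rank / m) * q:
            k = rank
    return set(order[:k])

# Illustrative p-values from six hypothetical paired comparisons.
p_vals = [0.001, 0.004, 0.019, 0.03, 0.031, 0.2]
rejected = benjamini_hochberg(p_vals, q=0.05)
```

Note that 0.031 is still rejected even though it exceeds 5/6 of nothing obvious at first glance: the step-up rule rejects every hypothesis ranked at or below the largest qualifying rank, not only those individually under their threshold.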
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">QUALITATIVE FINDINGS</head><p>In this section, we first report how AI-steering tools supported novices' composing strategies and experience, including 1) working with smaller, semantically meaningful components and 2) reducing non-determinism through testing a variety of constrained settings for generation. We then describe 3) how novices' prior mental models shaped their interaction with AI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Effects of Partitioning AI Capabilities into Semantically-Meaningful Components</head><p>AI-steering tools allowed participants to build up the composition from smaller components, bit-by-bit. For example, one participant who used the Voice Lanes said, "I'm trying to get the bass right, then the tenor right, then soprano and alto right, and build bit-by-bit" (P2). Participants who worked bit-by-bit thought about their compositions in semantically-meaningful chunks, such as melody vs. background or separate musical personas. For example, one participant gave the tenor voice an "alternating [pitch] pattern" to express indecision in the main melody, then gave other voices "mysterious... dinging sounds" as a harmonic backdrop (P4).</p><p>Working bit-by-bit helped participants feel less overwhelmed and better understand their compositions. For example, those working voice-by-voice could better handle the combination of multiple voices: "As someone who cannot be thinking about all 4 voices at the same time, it's so helpful to generate one at a time" (P2). Participants then became familiar with their own composition during the creation process, which enabled them to more quickly identify the "cause" of problematic areas later on. For example, one participant indicated that "[because] I had built [each voice] independently and listened to them individually," this helped them "understand what is coming from where" (P7).</p><p>Through this bit-by-bit process, participants learned how subcomponents can combine to achieve desired musical outcomes. For instance, one participant learned that "a piece can become more vivid by adding both a minor and major chord" after they applied the major/minor slider to generate two contrasting, side-by-side chunks (P12).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Effects of Constraining Non-Determinism in Generated Output</head><p>AI-steering tools helped constrain the non-deterministic output inherent in the generative model. As a result, the tools allowed users to steer generation in desired directions when composing with AI. Multiple Alternatives reduced the uncertainty that AI-generated output would be misaligned with a user's musical goals. Participants could simply generate a range of possibilities, audition them, and choose the one closest to their goal before continuing.</p><p>During different phases of the composing process, participants used the sliders to constrain the large space of possibilities that could be generated. The Semantic Sliders were sometimes used to set an initial trajectory for generated music: "Because I was able to give more inputs to <ref type="bibr">[Cococo]</ref> about what my goals were, it was able to create some things that gave me a starting point" (P8). Sliders were also used to refine what the AI had already generated: "It was... not dramatic enough. Moving the slider to more surprising, and more minor added more drama at the end" (P5).</p><p>Participants constrained generation by setting the sliders to their outer limits. This enabled them to test the boundaries of AI output. For example, one participant moved a slider to the "similar" extreme, then incrementally backed it off to understand what to expect at various levels of the slider: "On the far end of similar, I got four identical generations, and now I'm almost at the middle now, and it's making such subtle adjustments" (P18). In contrast, when using the conventional interface, participants could not as easily discern whether undesirable model outputs were due to AI limits or simple luck of the draw.</p><p>Participants also used the tools to consider how a specific input configuration affects the limits of AI output. 
For example, one participant used the Voice Lanes to generate multiple alternatives for a single-voice harmony. This enabled them to consider the limits imposed by specific voice components: "Maybe the dissonance [in the single-voice] is happening because of how I had the soprano and bass... which are limiting it... so it's hard to find something that works" (P15). The Multiple Alternatives capability further enabled this participant to systematically infer, from observing multiple poor results generated for the single voice, that the specific configuration of existing voice components was unlikely to produce better results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Effects of Users' Prior Mental Models</head><p>Participants brought with them prior mental models that impacted how they interacted with the generative model. First, many participants already had a set of primitives for expressing high-level musical goals. For example, higher pitches were used to communicate a light mood, long notes to convey calmness or drawn-out emotions, and a shape of ascending pitches to communicate triumph and escalation. When participants could not find an explicit tool that mapped to their envisioned primitive, they repurposed the tools as "proxy controls" to enact their strategy. For example, a common pattern was to set the slider to "conventional" to generate music that was "not super fast... not a strong musical intensity" (P9), and to "surprising" for generating "shorter notes... to add more interest" (P15).</p><p>In some cases, even use of the AI-steering tools did not succeed in generating the desired quality. For example, the music produced using the "similar" setting was not always similar along the user-envisioned dimension. To overcome these challenges, participants developed a strategy of "leading by example" by populating the input context with the type of content they desired from the AI. For instance, one participant manually drew an ascending pattern in the first half of the alto voice, in the hopes that the AI would continue the ascending pattern in the second half.</p><p>Second, several participants believed that the AI model was superior to their skills as novice composers. As such, when specific errors arose during the composing process, they often blamed their own efforts for these mistakes and hesitated to play an active role in the process. While we found evidence that the tools helped improve feelings of self-efficacy (see Quantitative Findings), there were also times when participants doubted their own musical abilities. 
Novices experienced self-doubt when poor-sounding music was generated based on their user-composed notes as the input context. For example, one user said, "All the things it's generating sound sad, so it's probably me because of what I generated" (P11). In cases such as this, participants seemed unable to disambiguate between AI failures and their own composing flaws, and placed the blame on themselves.</p><p>In other scenarios, novices were hesitant to interfere with the AI music generation process. For instance, some assumed that the AI's global optimization would create better output than had they worked bit-by-bit: "Instead of doing [the voice lanes] one by one, I thought that the AI would know how to combine all these three [voices] in a way that would sound good" (P1). While editing content, others were worried that making local changes could interfere with the AI's global optimization and possibly "mess the whole thing up" (P3). In these cases, an incomplete mental model of how the system functions seemed to discourage experimentation and diminish their sense of self-efficacy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">DISCUSSION</head><p>7.0.1 Partition AI Capabilities into Semantically-Meaningful Tools. Our results suggest that AI-steering tools played a key role in breaking the co-creation task down into understandable chunks and generating, auditioning, and editing these smaller pieces until users arrived at a satisfactory result. Unexpectedly, novices quickly became familiar with their own creations through composing bit-by-bit, which later helped them debug problematic areas. Interacting through semantically meaningful tools also helped them learn more about music composition and effective strategies for achieving particular outcomes (e.g., the effect of a minor key in the composition). Ultimately, AI-steering tools affected participants' sense of artistic ownership and competence as amateur composers, through an improved ability to express creative intent. In sum, beyond reducing information overload, tools that partition AI capabilities into semantically-meaningful components may be fundamental to one's notion of being a creator, while opening the door for users to learn effective strategies for creating in that domain.</p><p>7.0.2 Onboard Users and Divulge AI Limitations. While participants were able to develop productive strategies using AI-steering tools, they were sometimes hesitant to make local edits for fear of adversely affecting the AI's global optimization. These reactions suggest that participants could benefit from a more accurate mental model of the AI. Previous research suggests benefits of educating users about the AI and its capabilities <ref type="bibr" target="#b0">[1]</ref>, or providing onboarding materials and exercises <ref type="bibr" target="#b1">[2]</ref>. For example, an onboarding tutorial could demonstrate contexts in which the AI can easily generate content, and situations where it is unable to function well. 
For instance, the system could automatically detect if the AI is overly constrained and unable to produce a wide variety of content, and display a warning sign on the tool icon. Or, semantic sliders could divulge certain variables they are correlated with but not systematically mapped to, to set proper expectations when users leverage them as proxies. This could help users better debug the AI when it produces undesirable results. It could also prevent them from incorrectly attributing errors to themselves and their lack of composing experience, rather than to the AI being overly constrained. 7.0.3 Bridge Novice Primitives with Desired Creative Goals. Though we created an initial set of dimensions for AI-steering, we were surprised that participants already had a set of go-to primitives to express high-level creative goals, such as long notes to convey calmness or ascending notes to express triumph and escalation. When the interactive dimensions did not explicitly map to these primitives, they re-purposed the existing tools as proxy controls to achieve the desired effect. Given this, one could imagine directly supporting these common go-to strategies. Given a wide range of possible semantic levers, and the technical challenges of exposing these dimensions in DNNs, model creators should at minimum prioritize exposing dimensions that are the most commonly relied upon. For music novices, we found that these included pitch, note density, shape, and voice and temporal separation. 
Future systems could further boost the effectiveness of novice strategies by bridging their primitives to high-level creative goals, such as automatically "upgrading" a series of plodding bass-line notes into a foreboding melody.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Users of Cococo can manually write some notes (A), specify which voices and in which time range to request AI-generated music using Voice Lanes (B), click Generate (C) to infill the music given the existing notes, constrain generation along specific dimensions of interest using the Semantic Sliders (D) and Example-Based Slider (E), or audition Multiple Alternatives (F) of generated output by selecting a sample thumbnail to temporarily substitute it into the music score (shown as glowing notes in this figure (G)). Users can also use the Infill Mask (H) to crop a section of notes to be infilled again using AI.</figDesc><graphic coords="2,53.80,83.69,504.41,231.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Visualization of using soft priors to adjust a model's sampling distribution. The shape of the distributions is simplified to 1 voice, 7 pitches, and 4 timesteps. In Cococo, the actual shape is 4 voices, 46 pitches, and 32 timesteps.</figDesc><graphic coords="3,60.54,509.42,226.76,102.58" type="bitmap" /></figure>
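The soft-prior technique in Figure 2 amounts to reweighting the model's per-timestep categorical distribution over pitches by a non-negative prior and renormalizing. The following NumPy sketch illustrates that mechanism only; the function name, the uniform toy prior, and the random logits are illustrative assumptions, not taken from the paper's implementation:

```python
import numpy as np

def apply_soft_prior(logits, prior, temperature=1.0):
    """Bias a model's sampling distribution with a soft prior.

    logits: unnormalized model scores, shape (voices, pitches, timesteps).
    prior:  non-negative weights of the same shape; values > 1 softly
            boost a pitch, values < 1 softly suppress it.
    Returns a renormalized probability distribution over pitches.
    """
    probs = np.exp(logits / temperature)
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over the pitch axis
    steered = probs * prior                        # elementwise soft prior
    steered /= steered.sum(axis=1, keepdims=True)  # renormalize to probabilities
    return steered

# Toy example matching the caption's simplified shape: 1 voice, 7 pitches, 4 timesteps.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 7, 4))
prior = np.ones((1, 7, 4))
prior[:, :3, :] = 2.0  # softly favor the three lowest pitches
dist = apply_soft_prior(logits, prior)
```

Because the prior is soft rather than a hard mask, disfavored pitches retain nonzero probability, so generation stays coherent while being steered toward the requested dimension.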
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>4.0.1 Method. 21 music composition novices participated in the study. Each participant first completed an online tutorial of the two</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Results from post-study survey comparing the conventional interface and Cococo, with standard error bars.</figDesc><graphic coords="4,53.80,83.69,504.37,123.03" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">This workshop paper is a shortened summary of the full CHI'20 paper<ref type="bibr" target="#b9">[10]</ref> </note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">This work was completed during the first author's summer internship at Google.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Guidelines for Human-AI Interaction</title>
		<author>
			<persName><forename type="first">Saleema</forename><surname>Amershi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Weld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mihaela</forename><surname>Vorvoreanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Fourney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Besmira</forename><surname>Nushi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Penny</forename><surname>Collisson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jina</forename><surname>Suh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shamsi</forename><surname>Iqbal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><forename type="middle">N</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kori</forename><surname>Inkpen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jaime</forename><surname>Teevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ruth</forename><surname>Kikin-Gil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Horvitz</surname></persName>
		</author>
		<idno type="DOI">10.1145/3290605.3300233</idno>
		<ptr target="https://doi.org/10.1145/3290605.3300233" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</title>
				<meeting>the 2019 CHI Conference on Human Factors in Computing Systems<address><addrLine>Glasgow, Scotland Uk; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page">13</biblScope>
		</imprint>
	</monogr>
	<note>CHI &apos;19</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">&quot;Hello AI&quot;: Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making</title>
		<author>
			<persName><forename type="first">Carrie</forename><forename type="middle">J</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samantha</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lauren</forename><surname>Wilcox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Terry</surname></persName>
		</author>
		<idno type="DOI">10.1145/3359206</idno>
		<ptr target="https://doi.org/10.1145/3359206" />
	</analytic>
	<monogr>
		<title level="m">Proc. ACM Hum.-Comput. Interact. 3</title>
				<meeting>ACM Hum.-Comput. Interact. 3</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019-11">Nov. 2019</date>
			<biblScope unit="page">24</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">Monica</forename><surname>Dinculescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cheng-Zhi Anna</forename><surname>Huang</surname></persName>
		</author>
		<ptr target="https://coconet.glitch.me/" />
		<title level="m">Coucou: An expanded interface for interactive composition with Coconet, through flexible inpainting</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Deep learning</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>MIT press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">DeepBach: a Steerable Model for Bach Chorales Generation</title>
		<author>
			<persName><forename type="first">Gaëtan</forename><surname>Hadjeres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">François</forename><surname>Pachet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Nielsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1362" to="1371" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research</title>
		<author>
			<persName><forename type="first">Sandra</forename><forename type="middle">G</forename><surname>Hart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lowell</forename><forename type="middle">E</forename><surname>Staveland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Advances in psychology.</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="139" to="183" />
			<date type="published" when="1988">1988</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Counterpoint by Convolution</title>
		<author>
			<persName><forename type="first">Cheng-Zhi Anna</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Cooijmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Douglas</forename><surname>Eck</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ISMIR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Chordripple: Recommending chords to help novice composers go beyond the ordinary</title>
		<author>
			<persName><forename type="first">Cheng-Zhi Anna</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Duvenaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Krzysztof</forename><forename type="middle">Z</forename><surname>Gajos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st International Conference on Intelligent User Interfaces</title>
				<meeting>the 21st International Conference on Intelligent User Interfaces</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="241" to="250" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">The Bach Doodle: Approachable music composition with machine learning at scale</title>
		<author>
			<persName><forename type="first">Cheng-Zhi Anna</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Curtis</forename><surname>Hawthorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Monica</forename><surname>Dinculescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Wexler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leon</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Howcroft</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>ISMIR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Novice-AI Music Co-Creation via AI-Steering Tools for Deep Generative Models</title>
		<author>
			<persName><forename type="first">Ryan</forename><surname>Louie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andy</forename><surname>Coenen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cheng-Zhi Anna</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Terry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carrie</forename><forename type="middle">J</forename><surname>Cai</surname></persName>
		</author>
		<idno type="DOI">10.1145/3313831.3376739</idno>
		<ptr target="https://doi.org/10.1145/3313831.3376739" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems</title>
				<meeting>the 2020 CHI Conference on Human Factors in Computing Systems<address><addrLine>Honolulu, HI USA; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">13</biblScope>
		</imprint>
	</monogr>
	<note>CHI &apos;20</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An integrative model of organizational trust</title>
		<author>
			<persName><forename type="first">Roger</forename><forename type="middle">C</forename><surname>Mayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><forename type="middle">H</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">David</forename><surname>Schoorman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Academy of management review</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="709" to="734" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">I Lead, You Help but Only with Enough Details: Understanding User Experience of Co-Creation with Artificial Intelligence</title>
		<author>
			<persName><forename type="first">Changhoon</forename><surname>Oh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jungwoo</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jinhan</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Seonghyeon</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sungwoo</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bongwon</forename><surname>Suh</surname></persName>
		</author>
		<idno type="DOI">10.1145/3173574.3174223</idno>
		<ptr target="https://doi.org/10.1145/3173574.3174223" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems</title>
				<meeting>the 2018 CHI Conference on Human Factors in Computing Systems<address><addrLine>Montreal QC, Canada; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">649</biblScope>
			<biblScope unit="page">13</biblScope>
		</imprint>
	</monogr>
	<note>CHI &apos;18</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Generalized self-efficacy scale</title>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Schwarzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthias</forename><surname>Jerusalem</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Measures in health psychology: A user&apos;s portfolio. Causal and control beliefs</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="35" to="37" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<ptr target="https://en.wikipedia.org/w/index.php?title=Dixit_(card_game)&amp;oldid=908027531" />
	<title level="m">Dixit (card game) - Wikipedia, The Free Encyclopedia</title>
				<imprint>
			<date type="published" when="2019-09-19">19 September 2019</date>
		</imprint>
	</monogr>
	<note>Wikipedia contributors</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
