<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Thought-provoking Question Matrix to Guide the Development of Foundation-Model-based Applications</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sietske</forename><surname>Tacoma</surname></persName>
							<email>sietske.tacoma@hu.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">Utrecht University of Applied Sciences</orgName>
								<address>
									<addrLine>Heidelberglaan 15</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jimmy</forename><surname>Mulder</surname></persName>
							<email>jimmy.mulder@hu.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">Utrecht University of Applied Sciences</orgName>
								<address>
									<addrLine>Heidelberglaan 15</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matthieu</forename><surname>Laneuville</surname></persName>
							<email>matthieu.laneuville@surf.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">SURF</orgName>
								<address>
									<addrLine>Moreelsepark 48</addrLine>
									<postCode>3511 EP</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stefan</forename><surname>Leijnen</surname></persName>
							<email>stefan.leijnen@hu.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">Utrecht University of Applied Sciences</orgName>
								<address>
									<addrLine>Heidelberglaan 15</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Thought-provoking Question Matrix to Guide the Development of Foundation-Model-based Applications</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9FCCC4162027D494F84EB24241F7DFE1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>foundation models</term>
					<term>use cases</term>
					<term>model cards</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Organizations feel an urgency to develop and implement applications based on foundation models: AI-models that have been trained on large-scale general data and can be finetuned for domain-specific tasks. In this process organizations face many questions, regarding model training and deployment, but also concerning added business value, implementation risks and governance. They express a need for guidance to answer these questions in a suitable and responsible way. We intend to offer such guidance through the question matrix presented in this paper. The question matrix is adapted from the model card to match the development of AI-applications rather than AI-models. First pilots with the question matrix revealed that it elicited discussions among developers and helped them explicate their choices and intentions during development.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the recent advent of foundation models, defined as general-purpose AI-models that have been trained on large-scale data, organizations are more eager than ever to develop AI-powered applications. Foundation models have quickly built a reputation as powerful building blocks for domain-specific applications, by diminishing the need to explicate the logic needed for such applications <ref type="bibr" target="#b0">[1]</ref>. They perform well on numerous general tasks such as text and image generation, speech recognition and graph creation <ref type="bibr" target="#b1">[2]</ref>. Furthermore, with only limited further training, they can quickly outperform more traditional AI-models on a wide variety of domain-specific tasks. It is no wonder that organizations see the potential of foundation models and feel an urgency to explore use cases in which foundation models can add value for their organization.</p><p>Developing applications based on foundation models also raises many questions and challenges for organizations. These include short-term questions, such as whether to use available foundation models or to train their own model from scratch, whether to use an existing model as is or to finetune it with their own data, and in the latter case, which data to use. Evaluating performance is also a challenge, as the capabilities of the foundation model that have been demonstrated on benchmarks may be quite distant from the capabilities required in the organization's use case. Long-term strategic topics, such as added business value, risks and governance, are also a concern <ref type="bibr" target="#b2">[3]</ref>. Added value can be conceptualized financially, in terms of return on investment, but also more generally, in terms of, for example, efficiency, effectiveness and job satisfaction of the people using the applications. 
Regarding risks and governance, organizations have concerns about their dependency on models provided by Big Tech companies such as Microsoft (OpenAI), Google (DeepMind), and Amazon (Anthropic), about the transparency of and possible bias in these models, and about the transfer of intellectual property, especially when prompting or finetuning these models with their own data.</p><p>Organizations are looking for guidance in addressing these questions and concerns. More specifically, once a use case has been identified and the decision has been made to start developing an application based on a foundation model, organizations are looking for ways to make responsible choices in this process. Many of these choices involve considering several options and weighing several perspectives (e.g., performance, financial and ethical aspects). In this paper we present a question matrix to guide reflection on these choices from different perspectives. By using this question matrix repeatedly during application development, developers are encouraged to explicate the options and considerations they have and to track the development of their thinking over time. This has the potential to foster 1) more deliberate choices in the designed application, both in terms of the perspectives considered and in terms of short-term and long-term benefits; 2) transparency about the design of the application; and 3) traceability, which enables reuse of datasets, models, and other components in designing other, similar applications within the organization.</p><p>In this paper, we describe the design of and first experiences with this question matrix. We have used model cards <ref type="bibr" target="#b3">[4]</ref> as a basis for the question matrix, as further elaborated in section 2. How we have transformed the model card structure into the question matrix is described in section 3. Section 4 gives an overview of our first experiences with the question matrix. 
In section 5 we present our conclusions and directions for further research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature review: documentation approaches as the basis</head><p>When releasing AI-models, it is common practice to provide documentation alongside them, describing the model's architecture, the (type of) data it was trained on and evaluated with, and its intended use. Such documentation fosters the transparency of AI-models and serves as a basis for assessing compliance with legal requirements <ref type="bibr" target="#b4">[5]</ref>. Documenting the characteristics of the released model requires explicating and motivating the choices that have been made during development. Hence, such documentation approaches foster reflection on these choices, and could therefore serve as a solid starting point for designing an instrument that facilitates making these choices in a responsible way.</p><p>Most documentation approaches that have been proposed focus on data and AI-models, rather than AI-systems or AI-based applications. Therefore, we chose to base our instrument on a seminal approach for documenting AI-models, the model card <ref type="bibr" target="#b3">[4]</ref>. The model card approach was proposed as a framework to report on model performance characteristics and to clarify which use cases the released machine learning model is and is not intended for. An appealing characteristic of the model card is that it asks for a description of contextual factors: the variety in groups, instrumentation, and environmental factors that the model has been evaluated on. Addressing and explicating this variety can spur reflection on inclusion and diversity during development.</p><p>The model card is an example of an information sheet: a structured collection and presentation of information on different technical and non-technical aspects. Micheli and colleagues have identified three other main categories of documentation approaches: questionnaires, composable widgets and checklists. 
For the purpose of guiding development and prompting discussion and reflection, questionnaires and checklists are generally more appropriate than information sheets <ref type="bibr" target="#b5">[6]</ref>. Questionnaires in particular provide more in-depth coverage and hence encourage thorough reflection on the use and potential misuse of the AI-model or system under consideration <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Development of the question matrix</head><p>As argued above, the model card structure provided a solid basis for an instrument to guide the development of AI-applications based on foundation models. This basis had to be expanded for two reasons. First, to suit AI-powered applications rather than AI models only, additional categories were needed to address the deployment and implementation of such applications. Second, to adjust the instrument for the purpose of providing guidance during development, rather than post-development documentation only, we reshaped it into a question matrix instead of an information sheet. In the next two subsections, we elaborate on these two adjustments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Additional categories for AI-powered applications</head><p>The model card structure consists of nine categories: Model details, Intended use, Factors, Metrics, Evaluation data, Training data, Quantitative analyses, Ethical considerations, and Caveats and Recommendations. Except for model details such as model date and version, all these categories are relevant for the purpose of providing guidance during application development. Inspiration for additional categories to address the deployment and implementation of AI-applications was drawn from two dominant frameworks for AI deployment and integration: CRISP-DM <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> and ML-Ops <ref type="bibr" target="#b8">[9]</ref>.</p><p>The CRISP-DM cycle consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. While Data Preparation and Modeling were judged to be fully addressed by the model card structure, additional items were needed for all other phases. For Business Understanding, additional items concerned the use case, and more specifically the aim of developing the application, the specific tasks of the application, and the context in which the application was to be used. Furthermore, an item was added on the intended role of the application in the users' daily working processes. For Data Understanding, we decided to add an item on data quality. For the Evaluation phase, we added a more general evaluation item besides the technical metrics for model performance, to evaluate whether the application is indeed appropriate for the task it was intended for. 
Finally, Deployment was not yet addressed in the model card, so items regarding maintenance and the embedding in the organization's software systems were added.</p><p>From the ML-Ops perspective, two additional themes were identified: future monitoring of model performance and the addition of new data. Therefore, items addressing future monitoring and the training of new model versions were added.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Shaping the instrument into a question matrix</head><p>The model card consists of a list of items, divided into several categories. To prompt discussion and reflection, we reshaped the items into questions. Furthermore, we included multiple columns, thus shaping the instrument as a question matrix rather than as a questionnaire. The matrix consists of five categories: 1) Intended use, 2) Model properties, 3) Training, model performance and application performance, 4) Scope of the application (contextual factors), and 5) Implementation, maintenance and development. The first column resembles the model card: by answering the questions, developers give an overview of the current status of the AI-application under development. The second column asks developers to motivate the choices that have been made and to specify the considerations that led to these choices. The third column asks developers for the alternatives that they are considering or have considered during development.</p><p>The resulting question matrix was presented to two experts in the field of AI. They suggested that addressing the internal organization, especially the stakeholders who are to make decisions regarding implementation, would be useful, as these factors could also influence the choices that developers make. Adding these questions resulted in the final question matrix, of which all questions are presented in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">First experiences with the question matrix</head><p>The question matrix was first piloted in three Dutch media organizations, each of which was developing a foundation-model-based application. During development, the first author conducted a one-hour interview in each organization with an involved AI-developer, using the question matrix as an interview guide. In one project, a foundation model was finetuned for a specific purpose. The other two projects concentrated on using foundation models as provided and evaluating their performance on the organization's data for specific purposes.</p><p>In all interviews, approximately half an hour was needed to specify the intended use and the datasets needed for using or finetuning the foundation models. For these topics, the interviewees generally knew which alternatives had been considered and how choices had been made. They were also clear about their choices of the foundation models that had been selected for experimentation and development.</p><p>Concerning evaluation metrics, added value, scope, implementation, maintenance and development, their answers were less clear and complete. By analyzing the interview transcripts, we identified three types of less concrete answers. First, interviewees seemed to explicate ideas for the first time during the interview. For example, interviewees used phrases such as "Now that I think of it" and "We didn't mention it explicitly, but I think so." In multiple cases, this happened for the questions concerning what was in scope and out of scope for the application. Interviewees did not seem to have addressed this in their discussions with colleagues, but did appear to have implicit ideas about what was beyond the scope of their application, which they explicated during the interviews. Second, interviewees identified topics that had not yet been addressed in development and needed attention. 
This was especially the case for more technical topics, such as the use of specific evaluation metrics and the way in which cross-validation could or should be used in the finetuning procedure. One interviewee pondered that "maybe these are questions that we should take into the organization", expressing a realization that these topics needed more attention and that fellow developers and other stakeholders within the organization should be involved. Third, interviewees started developing new ideas during the interview. This especially happened in an interview with two interviewees, where answers by one interviewee seemed to ignite new ideas in the other. This shows that using the question matrix in development teams may help teams explicate ideas, develop a shared understanding of these ideas and build on each other's ideas.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and future research directions</head><p>In this paper we presented a question matrix aimed at helping developers explicate their options and the consequences of their choices repeatedly during the development of foundation-model-based applications. The question matrix is based on seminal approaches for documenting AI-models, and adjusted to apply to AI-applications by drawing from the literature on CRISP-DM and ML-Ops. First experiences with the question matrix show that it indeed seems to encourage discussion and reflection during development. To exploit this potential, we envision that developers fill in the question matrix repeatedly during the development and deployment of a foundation-model-based application, for example at the beginning and halfway through the development project, towards the deployment phase and repeatedly during deployment.</p><p>We conjecture that filling in the question matrix also serves well as a documentation approach, especially within organizations. It fosters transparency of these applications and could enable easier reuse of data, (foundation) models and architectures for other purposes within the organization. Further research is needed to address this potential.</p><p>Another direction for future research is the completeness of this question matrix. Organizations express a desire that an instrument like this may help them avert or mitigate future risks, such as dependence on Big Tech companies and bias caused by foundation models. Using, for instance, separate ethics checklists may feel like an extra burden. Therefore, in the question matrix we have aimed to address AI-application development from multiple perspectives and throughout its lifecycle, to obtain a sense of completeness. Future research is needed to further develop and assess this completeness, for example by aligning the instrument with the practice of regulatory oversight, as will be required by the AI Act. As regulatory oversight may differ between sectors, this may lead to tailored question matrices for different sectors. Hence, evaluation of the question matrix and its completeness in various sectors is also a promising avenue towards more responsible implementation of foundation-model-based applications.</p></div>		</body>
		<back>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. The questions in the question matrix</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Intended use</head><p>Questions in this category concern the intended use of the AI-application under development.</p><p>Purpose: With what purpose is the application being developed? What is the task the application is supposed to carry out? In which context or situation is the application supposed to be used?</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">On the Opportunities and Risks of Foundation Models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bommasani</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2108.07258</idno>
		<imprint>
			<date type="published" when="2021-08">Aug. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zhou</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2302.09419" />
		<imprint>
			<date type="published" when="2023-02">Feb. 2023. Mar. 21, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An agile framework for trustworthy AI</title>
		<author>
			<persName><forename type="first">S</forename><surname>Leijnen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Aldewereld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Belkom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bijvank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ossewaarde</surname></persName>
		</author>
		<ptr target="https://www.academia.edu/download/76467298/leijnen.pdf" />
	</analytic>
	<monogr>
		<title level="m">NeHuAI@ ECAI</title>
				<imprint>
			<date type="published" when="2020-04-15">2020. Apr. 15, 2024</date>
			<biblScope unit="page" from="75" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Model cards for model reporting</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mitchell</surname></persName>
		</author>
		<idno type="DOI">10.1145/3287560.3287596</idno>
	</analytic>
	<monogr>
	<title level="m">FAT* 2019 - Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency</title>
				<imprint>
			<date type="published" when="2019-01">Jan. 2019</date>
			<biblScope unit="page" from="220" to="229" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The landscape of data and AI documentation approaches in the European policy context</title>
		<author>
			<persName><forename type="first">M</forename><surname>Micheli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Hupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Delipetrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Soler-Garrido</surname></persName>
		</author>
		<idno type="DOI">10.1007/S10676-023-09725-7</idno>
	</analytic>
	<monogr>
		<title level="j">Ethics Inf Technol</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="2023-12">Dec. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Madaio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Stark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Wortman</forename><surname>Vaughan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</author>
		<idno type="DOI">10.1145/3313831.3376445</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems</title>
				<meeting>the 2020 CHI Conference on Human Factors in Computing Systems<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2020-04">Apr. 2020</date>
			<biblScope unit="page" from="1" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A systematic literature review on applying CRISP-DM process model</title>
		<author>
			<persName><forename type="first">C</forename><surname>Schröer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kruse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Gómez</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2021.01.199</idno>
	</analytic>
	<monogr>
		<title level="j">Procedia Comput Sci</title>
		<imprint>
			<biblScope unit="volume">181</biblScope>
			<biblScope unit="page" from="526" to="534" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">CRISP-DM 1.0 Step-by-step data mining guide</title>
		<author>
			<persName><forename type="first">P</forename><surname>Chapman</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:59777418" />
		<imprint>
			<date type="published" when="2000-03-22">2000. Mar. 22, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Machine Learning Operations (MLOps): Overview, Definition, and Architecture</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kreuzberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kühl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hirschl</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2023.3262138</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="31866" to="31879" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
