The influence of audiovisual elements on the realism of generative AI videos: the case of Sora

Alberto Sanchez-Acedo1, Alejandro Carbonell-Alcocer1,∗, Pasquale Cascarano2, Shirin Hajahmadi3, Giacomo Vallasciani2, Manuel Gertrudix1 and Gustavo Marfia2

1 Department of Audiovisual Communication and Advertising, Rey Juan Carlos University, Camino del Molino, 5, 28942 Fuenlabrada, Madrid, Spain
2 Department of the Arts, University of Bologna, Via Barberia 4, 40123 Bologna, Italy
3 Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, 40126 Bologna, Italy

Abstract
Generative Artificial Intelligence (Gen-AI) tools are in the spotlight in every professional field. In the last decade, artificial intelligence technologies capable of creating content in various formats, such as text, images, audio, or video, have emerged. Among the best-known tools are those developed by OpenAI, such as ChatGPT, DALL⋅E and Sora, which generate text, images, and videos respectively from instructions given in the form of prompts, in an accessible and efficient way. This study aims to evaluate the attraction, composition and realism of Gen-AI videos in comparison to real videos. To this end, a quasi-experimental design is conducted using a validated survey with two groups: the experimental group receives two videos produced by Sora as stimuli, while the control group receives two real videos. The results highlight key factors influencing perceived realism, such as natural lighting, saturation, colour and perspective. However, the videos that Sora can generate reach such a degree of realism in terms of audiovisual composition that it will be necessary to educate people about content generation with artificial intelligence in order to prevent disinformation.

Keywords
Artificial Intelligence, Sora, Videos generated with AI, Text-to-video, Audiovisual analysis, Experiment

1. Introduction

Generative Artificial Intelligence (Gen-AI) is a specialised field of Artificial Intelligence (AI) that deals with the generation of human-like texts, the creation of images from written descriptions and the production of videos based on predefined instructions [1]. Today, the potential of these Gen-AI tools is the subject of considerable debate on key issues ranging from the quality and authenticity of the content created to the ethical implications of its use [2, 3, 4, 5]. At the same time, the ability of Gen-AI to produce original materials in various forms has had a significant impact in sectors such as the creative industries, manufacturing, design, entertainment, and education [2, 5, 6]. Many researchers in industry and academia have focused their efforts on developing efficient and accessible Gen-AI tools for content creation. Most notably, OpenAI's GPT series, which began with its first release in 2018 and was followed by GPT-2, GPT-3 and GPT-4, as well as its conversational variant, ChatGPT, has significantly shaped the landscape of text generation [1, 7]. GPT is built on the principles of Large Language Models (LLMs) [8], which are designed to process and generate natural language text. The outstanding performance of LLMs in synthesizing complex information, whether in the form of text or images, stems from the use of advanced techniques such as positional encoding and attention mechanisms [8]. Moreover, at the core of LLMs are complex neural network architectures such as Transformers, which represent the state of the art for numerous natural language tasks [9, 10]. Later, in 2021, OpenAI continued to push the boundaries of generative AI by releasing DALL⋅E, a tool capable of generating images from textual descriptions [11].
International Workshop on Artificial Intelligence and Creativity (CREAI), co-located with ECAI 2024. ∗ Corresponding author. © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

While GPT focuses on generating coherent and contextually relevant texts from input prompts, DALL⋅E integrates linguistic and visual information and extends this capability to visual content generation [11]. The tool employs the same Transformer architecture as GPT-3 [11]. Unlike traditional models that handle either text or images, DALL⋅E is a multimodal model [12, 11], meaning that it can understand both types of data and integrate them in creative ways. The achievements of GPT and DALL⋅E have significantly influenced the text-to-video domain, culminating in the remarkable capabilities showcased by OpenAI's Sora [13, 14, 15], released in 2024. Sora is an AI model capable of creating realistic and imaginative scenes from text instructions [16, 14]. Like GPT and DALL⋅E, Sora can analyze text and understand intricate user directives. Its video generation process is based on a diffusion Transformer architecture [9, 10, 17]: it begins with a video resembling static noise and progressively refines it by removing the noise over many steps, introducing details based on the provided text prompt [17]. Sora is notable for its capability to create videos up to one minute long that adhere strictly to the user's text instructions while delivering high visual quality and strong visual coherence, thus allowing users to produce visual content from even complex text narratives [13, 16]. The productions made with Sora are highly realistic and applicable to a multitude of professional fields [15, 14, 18], using prompts that can be as specific and detailed as the user wishes.
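The iterative denoising described above can be sketched in a few lines. This is a conceptual toy, not Sora's implementation: the `target` list stands in for the prompt-conditioned content that a real diffusion Transformer would predict at each step.

```python
import random

def denoise_step(frame, step, total_steps, target):
    # A real diffusion model predicts the noise to subtract at each step,
    # conditioned on the text prompt; here we simply blend toward a fixed
    # target signal that stands in for that guidance.
    alpha = 1.0 / (total_steps - step)  # remove a growing share of the noise
    return [f + alpha * (t - f) for f, t in zip(frame, target)]

random.seed(0)
frame = [random.random() for _ in range(8)]   # start from pure "static noise"
target = [0.5] * 8                            # placeholder for guided content
total_steps = 50
for step in range(total_steps):
    frame = denoise_step(frame, step, total_steps, target)
# after the final step the frame has converged to the guided target
```

Running more steps with a smaller per-step correction is what lets real diffusion models trade compute for detail; the sketch keeps only that loop structure.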
These outstanding results may also prove controversial. On the one hand, Gen-AI tools offer tremendous benefits, such as enhancing creativity [19]. On the other hand, by making what is real indistinguishable from what has been produced with Gen-AI tools, they can induce a false perception of the world [20, 21] and raise concerns about misinformation and the spread of fake news. Social awareness is therefore necessary for the verification of the stimuli people are exposed to [22], as is personal judgement in recognising AI-generated materials [23]. From an ethical point of view, the use of these models can lead to privacy violations, as they might inadvertently reveal sensitive information embedded in their training data. Furthermore, they can perpetuate and even amplify biases present in their training datasets, leading to unfair or discriminatory outcomes [24, 25, 26]. The deployment of Gen-AI thus demands careful consideration of these factors, emphasizing transparency, accountability, and the implementation of robust ethical guidelines to mitigate potential harms [27]. This study aims to evaluate the attraction, composition, and realism of Gen-AI videos compared to real videos. According to the manual by Achi [28], multimedia data must adhere to certain standards for audiovisual recordings, such as lighting, colour, or scale, which are the most relevant attributes for visual realism [29]. As a case study, we focus on two landscape videos available on Sora's website: a recreation of Santorini (Greece) and a view of the Amalfi Coast (Italy). Additionally, we consider two real videos showing the same content. Through a detailed survey, the study measures attraction via parameters such as illumination, saturation, colourfulness, brightness, and sharpness. Composition is assessed by evaluating the video quality, the presence of shadows, focus, perspective, and shot range.
Furthermore, the level of realism of the Sora videos is assessed by determining whether the videos appear natural, contain fine details, and resemble drone footage, as well as whether respondents recognize the location and believe the video to be real. Finally, we identify which aspects of attraction and which compositional elements most significantly affect the perceived realism of the Gen-AI videos. For these reasons, we seek answers to the following research questions:

• R.Q.1. How do respondents perceive the attraction and composition of AI-generated videos compared to real videos depicting the same landscapes and environments?
• R.Q.2. What are the key attraction and composition elements that influence the perceived realism of Gen-AI videos of landscapes?

The paper is organized as follows. Section 2 outlines the methodological design of the survey conducted. In Section 3, we analyze the obtained results. Finally, Section 4 discusses these results, offering insights into the research questions and concluding with a discussion of the study's limitations.

2. Research methodology

The research objective is to evaluate the attraction, composition and realism of Gen-AI videos compared to real videos; to this end, a quasi-experiment is conducted, for which an ad hoc survey including either real or Gen-AI videos of landscapes has been developed. Since the possibilities of producing videos with Gen-AI tools were still limited and Sora had not yet been publicly released, two landscape videos were selected from those available on the Sora website: the first is a recreation of Santorini (Greece) and the second shows the Amalfi Coast (Italy). Methodologically, this research is a first approach to the field of Generative AI video generation and to how university students perceive its output.
For the selection of the real videos, a search was carried out on YouTube, selecting videos shot in the same locations and with similar audiovisual characteristics [30, 31]. All videos were customised to have the same duration and resolution. Figure 1 shows a frame from each video. In the validation process, the experts considered the videos to be comparable in both content and format.

Figure 1: Selected frames of real and Sora's videos. (a) Frame of a real video of the Amalfi Coast. (b) Frame of a Sora video of the Amalfi Coast. (c) Frame of a real video of Santorini. (d) Frame of a Sora video of Santorini.

The design of a quasi-experiment is based on the construction of two groups, a control group and an experimental group, each exposed to a stimulus [32]. The quasi-experiment design is based on the collection of information by means of a self-administered online survey [33]. To ensure that the quasi-experiment design fits the research objective, it underwent validation by expert judges (n=12). Experts in the fields of computer science, communication and artificial intelligence were selected for this purpose. They were provided with a guide explaining in detail the procedure and method of the experiment, as well as the questions in the questionnaire for collecting information. The purpose of this process is to ensure a degree of agreement in terms of univocity and relevance [34]. The validation covers both the procedure of the quasi-experiment and the questionnaire administered. The questionnaire (see Table 1) is designed as an information collection system based on the validated framework proposed in [29] for the human characterisation of visual realism in images.
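The stimulus preparation described above (trimming every clip to a common duration and resolution) can be done with a tool such as ffmpeg. The sketch below only builds the command line rather than running it; the filenames and target values are illustrative assumptions, not the authors' actual settings.

```python
def normalize_cmd(src, dst, seconds=10, width=1920, height=1080):
    # Trim the clip to a common duration (-t) and rescale it to a common
    # resolution (scale filter); -an drops the audio track so only the
    # visual stimulus is compared. All values here are hypothetical.
    return [
        "ffmpeg", "-i", src,
        "-t", str(seconds),
        "-vf", f"scale={width}:{height}",
        "-an",
        dst,
    ]

cmd = normalize_cmd("santorini_sora.mp4", "santorini_stimulus.mp4")
print(" ".join(cmd))
```

Normalising duration and resolution in this way removes obvious technical cues, so group differences can be attributed to the content rather than to encoding artifacts.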
Since the questionnaire focuses on variables related to videos, the structure of the questions is modified accordingly, while the variables related to realism, attraction, and composition are maintained. The questionnaire is structured in two sections. The first collects socio-demographic variables (Q1-Q5). The second contains two videos to be evaluated independently in terms of realism, attraction and composition: questions Q6-Q12 evaluate the level of attraction, Q13-Q15 address the key elements of composition and, finally, Q16-Q20 focus on realism. The concepts covered by the survey were explained before the questionnaire was administered in order to reduce bias in the interpretation of the questions.

Table 1: Survey questions and variables (question, response items, variable)

Section 1: Sociodemographic
Q1. How old are you? (Age)
Q2. Gender. How do you identify? (Gender)
Q3. Are you currently working? (Professional activity)
Q4. If yes, what is your current position? (Professional activity)
Q5. Which country are you from? (Geographical)

Section 2: Survey
Q6. How does the illumination appear to you? (Attraction)
    (1) Natural (2) Slightly natural (3) Not clearly natural or unnatural (4) Slightly unnatural (5) Unnatural
Q7. How does the saturation appear to you? (Attraction)
    (1) Very saturated (2) Fairly saturated (3) Neutral (4) Slightly saturated (5) Without saturation
Q8. How does the colour appear to you? (Attraction)
    (1) Very colourful (2) Slightly colourful (3) Neutral (4) Slightly uncolourful (5) Uncolourful
Q9. How does the brightness appear to you? (Attraction)
    (1) Very bright (2) Fairly bright (3) Neutral (4) Slightly bright (5) Not bright
Q10. How does the sharpness appear to you? (Attraction)
    (1) Very sharp (2) Moderately sharp (3) Neither sharp nor blurry (4) Moderately blurry (5) Very blurry
Q11. What is the quality of the video? (Attraction)
    (1) High quality (2) Moderately high quality (3) Medium quality (4) Moderately low quality (5) Very low quality
Q12. Do you see shadows in the image? (Attraction)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q13. Does the video appear to have objects well focused? (Composition)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q14. Does the perspective of the video appear natural? (Composition)
    (1) Definitely natural (2) Moderately natural (3) Not clearly natural or unnatural (4) Moderately unnatural (5) Definitely unnatural
Q15. Does the video appear to be a close-range shot or a distant view shot? (Composition)
    (1) Very close range (2) Moderately close range (3) Between close and distant (4) Moderately distant view (5) Very distant view
Q16. Do you recognize the location of the video? (Realism)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q17. Does the colour in the video appear natural? (Realism)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q18. Does the image contain fine details? (Realism)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q19. Does this video look like it is a video taken by a drone? (Realism)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not
Q20. Do you think the video is real? (Realism)
    (1) Definitely yes (2) Probably yes (3) Not clearly yes or no (4) Probably not (5) Definitely not

As the chosen method is a quasi-experimental design, the survey is administered to two groups. The survey for the control group contains as stimuli two videos of real landscapes recorded by a professional, while the survey for the experimental group contains two landscape videos produced by Gen-AI.
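The two-group design described above relies on randomised allocation of participants. A minimal sketch of such a split is shown below; it is illustrative only, not the authors' procedure (the study's actual groups were 28 and 34 participants, not an exact half split).

```python
import random

def allocate(participants, seed=42):
    # Shuffle and split the participant list into two roughly equal groups;
    # a minimal sketch of randomised allocation, not the study's procedure.
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

control, experimental = allocate(range(62))
print(len(control), len(experimental))
```

Seeding the shuffle makes the allocation reproducible, which is useful when the assignment must be documented for validation.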
A non-probabilistic sample was selected, as the study aims to collect data that provides insight into the phenomenon of video production with Gen-AI tools. University students with a background in visual arts, theatre and music therefore took part in the study. Allocation to the groups was randomised and proportionate. Data was collected in April 2024 by means of an online survey from n=62 participants: 28 in the control group and 34 in the experimental group. All participants are young university students in the fields of computer science, communication and new technologies.

3. Results

In this section, we report statistics derived from the survey for both the control and experimental groups in order to seek answers to the research questions R.Q.1 and R.Q.2 outlined in Section 1. The answers to the socio-demographic questions (Q1-Q5 in Table 1) show that the average age of the participants is 21 years; 68% are female and 27% are male. 81% of respondents are not employed and 71% are Italian. Concerning the variables attraction, composition, and realism, the results are presented below in percentages, distinguishing between real videos and videos generated with Sora. We first focus on the attraction variable by analyzing the answers to Q6-Q12. We found significant differences in the distribution of answers between real and AI videos when evaluating "illumination" (Q6); the bar plots are shown in Figure 2. Participants consider the lighting in Sora's Santorini video to be predominantly unnatural or slightly unnatural (91%), while for the real Santorini video the lighting is considered more natural, with only 32% of respondents rating it slightly unnatural.
For the remaining items, such as saturation (Q7), colour (Q8), brightness (Q9) and sharpness (Q10), the differences between the real videos and Sora's are smaller and less significant.

Figure 2: The bar plots depict the responses gathered for question Q6, which assesses the factor of "illumination" concerning the variable of attraction.

In the case of saturation, for example, participants mostly considered the four videos to be quite saturated (50% for the real video of Santorini and 44% for Sora's). The same applies to colour, where participants mostly rated the videos as slightly colourful regardless of whether the video was real or AI-generated (57% for the real Amalfi video; 50% for the real Santorini video; 71% for the AI Amalfi video; and 47% for the AI Santorini video). In terms of sharpness, the majority of participants felt that all videos were neither sharp nor blurred. Regarding the quality of the videos (Q11), most are categorised as medium quality, with no significant differences between the distributions for Sora's videos and the real ones. In summary, comparing the Sora and real videos, in the case of Santorini there are major differences, especially in the lighting and, to a lesser extent, in saturation and colour; Sora's video of Santorini stands out as very colourful and saturated compared to the other stimuli. In the case of Amalfi, the differences between the Sora and real videos in terms of saturation, colour and lighting are small. We now focus on the composition variable by analyzing the answers to Q13-Q15, which assess the degree of focus, the perspective and the camera distance. Concerning the degree of focus (Q13), the majority of participants consider the real videos better focused than Sora's.
More precisely, the percentages of participants perceiving the videos as well focused are: 57% for the real video of Amalfi; 64% for the real video of Santorini; 41% for Sora's video of Amalfi; and 47% for Sora's video of Santorini. Another important compositional element analyzed is perspective (Q14); the bar plots are shown in Figure 3. For the real videos, most respondents believe that the perspective appears natural. For the Sora videos, however, the range of answers is wider, indicating that the perspective can be perceived as neither natural nor unnatural. Finally, concerning the camera distance (Q15), all the videos are perceived as having a moderately or very distant view.

Figure 3: The bar plots depict the responses gathered for question Q14, which assesses the factor of "perspective" concerning the variable of composition.

Analyzing the answers to the questions about the realism variable (Q16-Q20), it turns out that the majority of participants recognise Sora's video of Santorini as false (35%) or probably false (32%). In the case of the AI-generated video of Amalfi, a plurality of participants (32%) recognise it as probably true. As for the real videos shown to the control group, the majority of participants recognise the Santorini video as probably true (43%); for the Amalfi video, 36% of participants think it is probably false and 32% think it is probably true (Q20). The results are shown in Figure 4. For both the real video of Amalfi and its AI-generated counterpart, participants mostly do not recognize the location. In contrast, participants largely do recognise the location of Santorini in both groups (Q16). The results are shown in Figure 5.
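The percentages reported in this section are simple distributions of Likert responses per group. A minimal sketch of how such tallies can be computed (the response values below are made up for illustration, not the study's raw data):

```python
from collections import Counter

def distribution(responses):
    # Percentage of answers for each Likert option 1-5.
    counts = Counter(responses)
    n = len(responses)
    return {opt: round(100 * counts.get(opt, 0) / n) for opt in range(1, 6)}

# Hypothetical Q6 ("illumination") answers from one group.
q6_experimental = [4, 5, 4, 4, 5, 3, 4, 5, 4, 5]
print(distribution(q6_experimental))  # {1: 0, 2: 0, 3: 10, 4: 50, 5: 40}
```

Comparing such dictionaries between the control and experimental groups, question by question, is all the quantitative machinery the reported results require.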
4. Discussion and Conclusions

The results of the experiment rest on the premise that Gen-AI tools for video generation, in particular Sora, are capable of generating realistic content that is practically impossible to differentiate from a real video [35]. To answer R.Q.1, "How do respondents perceive the attraction and composition of AI-generated videos compared to real videos depicting the same landscapes and environments?", Section 3 reported the results obtained by analyzing items of the attraction and composition variables. The results reveal distinct perceptions of attraction and composition between AI-generated and real videos, highlighting both technological limitations and areas of potential improvement in AI video generation. Concerning the attraction variable, we observed that the lighting in AI-generated videos, particularly Sora's Santorini video, was largely deemed unnatural, in contrast with the more natural lighting perceived in the real videos. Other attributes, such as saturation, colour, brightness, and sharpness, showed less pronounced differences, indicating that AI-generated videos can achieve a comparable aesthetic quality in these areas.

Figure 4: The bar plots depict the responses gathered for question Q20, which assesses realism.

These results suggest that Sora can replicate certain visual aspects but can struggle to replicate natural lighting, which is a critical component of realism [29]. Concerning composition, the main difference between AI-generated and real videos was observed for the perspective item: the perspective in AI videos was perceived less consistently, often seen as neither entirely natural nor unnatural. We now seek answers to R.Q.2, "What are the key attraction and composition elements that influence the perceived realism of Gen-AI videos of landscapes?".
It is evident that the attraction variable influences the perceived realism of the Sora videos. The results indicate that participants could easily identify the AI-generated Santorini video due to its unnatural illumination, despite only slight differences in saturation and colour. This highlights that elements such as illumination, saturation, and colour are key factors in recognizing an AI-generated video. Conversely, when these elements are closely matched between real and AI-generated videos, as with the Amalfi videos, it becomes more challenging for participants to identify the AI-generated content: in that case, the majority of participants did not recognize the Sora-generated video as artificial, suggesting that a high degree of similarity in these compositional and attraction elements can enhance the perceived realism of AI-generated videos. The key compositional element influencing the perceived realism of Gen-AI videos of landscapes is perspective. Specifically, the natural appearance of perspective in the real videos contrasts with the broader range of perceptions for the AI-generated videos, where perspective was often seen as neither natural nor unnatural. Overall, while AI-generated videos are making strides in matching the visual appeal of real videos, significant challenges remain in achieving complete realism, particularly in aspects like lighting and focus that contribute heavily to the perceived naturalness of a scene. These insights underscore the importance of further advancements in AI video synthesis to enhance the authenticity and visual coherence of generated content. In terms of location recognition, Sora is able to generate highly realistic videos that resemble real locations, as demonstrated in the cases of Santorini and Amalfi [21, 36].

Figure 5: The bar plots depict the responses gathered for question Q16, which assesses realism.
Participants were more likely to recognize Santorini because iconic elements, such as the blue domes, were accurately recreated. In contrast, Amalfi lacks such iconic features, making it less recognizable. The era of Sora has just begun, and this initial research on the realism of its videos clearly indicates that they are already difficult to distinguish from reality. This research also suggests that for video generation to achieve results that closely mimic reality, factors such as attraction and composition must be considered in order to increase the level of realism. Additionally, there is a need to educate viewers about the capabilities of AI-generated videos. Consequently, regulations have been developed to control such content, and various studies on Sora emphasize the importance of addressing ethical risks related to misinformation [18, 37]. This study faces several limitations. The first is the accessibility of the Sora tool: at the time of the experiment, Sora was not available to the general public, and the study had to rely on the default videos provided on its website, without the ability to explore its full capabilities through detailed instructions. Furthermore, the study exclusively utilized the Sora tool, as other text-to-video AI tools were deemed less effective in producing realistic results; Sora is currently regarded as the most efficient tool for generating highly realistic videos using artificial intelligence. Since there is little research of this kind, this experiment is a first approximation to the study of videos created with AI tools, and the results are not generalisable to the rest of the population. To strengthen the findings of this research, it will be necessary to replicate the experiment with a larger and more diverse sample of participants across various educational and professional backgrounds, as well as to compare and study other types of videos generated with the Sora tool.
Funding

This work was supported by the Autonomous Community of Madrid (Spain) with a grant for industrial doctorates (IND2022/SOC-23503) under the collaboration agreement with Prodigioso Volcán S.L., and by Universidad Rey Juan Carlos (ID 501100007511) with a grant call for Personnel in Training 2020 (PREDOC 20-008). This study was carried out within the MICS (Made in Italy – Circular and Sustainable) Extended Partnership and received funding from the European Union Next-GenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) – Missione 4 Componente 2, Investimento 1.3 – D.D. 1551.11-10-2022, PE00000004).

References

[1] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, L. Sun, A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt, arXiv preprint arXiv:2303.04226 (2023).
[2] F. Fui-Hoon Nah, R. Zheng, J. Cai, K. Siau, L. Chen, Generative ai and chatgpt: Applications, challenges, and ai-human collaboration, 2023.
[3] F. E. Babl, M. P. Babl, Generative artificial intelligence: Can chatgpt write a quality abstract?, Emergency Medicine Australasia 35 (2023) 809–811.
[4] J. Joosten, V. Bilgram, A. Hahn, D. Totzek, Comparing the ideation quality of humans with generative artificial intelligence, IEEE Engineering Management Review (2024).
[5] Z. Epstein, A. Hertzmann, Investigators of Human Creativity, M. Akten, H. Farid, J. Fjeld, M. R. Frank, M. Groh, L. Herman, N. Leach, et al., Art and the science of generative ai, Science 380 (2023) 1110–1111.
[6] E. A. Alasadi, C. R. Baiz, Generative ai in education and research: Opportunities, concerns, and solutions, Journal of Chemical Education 100 (2023) 2965–2971.
[7] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., Gpt-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[8] K. S. Kalyan, A survey of gpt-3 family large language models including chatgpt and gpt-4, Natural Language Processing Journal (2023) 100048.
[9] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al., A survey of large language models, arXiv preprint arXiv:2303.18223 (2023).
[10] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45.
[11] J. Betker, G. Goh, L. Jing, T. Brooks, J. Wang, L. Li, L. Ouyang, J. Zhuang, J. Lee, Y. Guo, et al., Improving image generation with better captions, Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf 2 (2023) 8.
[12] M. Suzuki, Y. Matsuo, A survey of multimodal deep generative models, Advanced Robotics 36 (2022) 261–278.
[13] OpenAI, Creating video from text. Sora is an ai model that can create realistic and imaginative scenes from text instructions (2024). URL: https://openai.com/index/sora/.
[14] Y. Liu, K. Zhang, Y. Li, Z. Yan, C. Gao, R. Chen, Z. Yuan, Y. Huang, H. Sun, J. Gao, et al., Sora: A review on background, technology, limitations, and opportunities of large vision models, arXiv preprint arXiv:2402.17177 (2024). URL: https://doi.org/10.48550/arXiv.2402.17177.
[15] R. Sun, Y. Zhang, T. Shah, J. Sun, S. Zhang, W. Li, H. Duan, B. Wei, R. Ranjan, From sora what we can see: A survey of text-to-video generation, arXiv preprint arXiv:2405.10674 (2024).
[16] T. Brooks, B. Peebles, C. Holmes, W. DePue, Y. Guo, L. Jing, D. Schnurr, J. Taylor, T. Luhman, E. Luhman, C. Ng, R. Wang, A. Ramesh, Video generation models as world simulators, OpenAI (2024). URL: https://openai.com/research/video-generation-models-as-world-simulators.
[17] W. Peebles, S. Xie, Scalable diffusion models with transformers, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4195–4205.
[18] A. J. Adetayo, A. I. Enamudu, F. M. Lawal, A. O. Odunewu, From text to video with ai: the rise and potential of sora in education and libraries, Library Hi Tech News (2024). URL: https://doi.org/10.1108/LHTN-02-2024-0028.
[19] A. R. Doshi, O. Hauser, Generative artificial intelligence enhances creativity, Available at SSRN (2023).
[20] J. Fernández Mateo, et al., Realidad artificial. Un análisis de las potenciales amenazas de la inteligencia artificial (2023).
[21] R. H. Mogavi, D. Wang, J. Tu, H. Hadan, S. A. Sgandurra, P. Hui, L. E. Nacke, Sora openai's prelude: Social media perspectives on sora openai and the future of ai video generation, arXiv preprint arXiv:2403.14665 (2024). URL: https://doi.org/10.48550/arXiv.2403.14665.
[22] J. E. Suárez-Roca, G. L. Vélez-Bermello, Verificación de los hechos: Aplicación metodológica en el medio de comunicación El Bacán, Revista Científica Arbitrada de Investigación en Comunicación, Marketing y Empresa REICOMUNICAR, ISSN 2737-6354, 5 (2022) 163–184. URL: https://doi.org/10.46296/rc.v5i9.0042.
[23] C. Belloch, Las tecnologías de la información y comunicación en el aprendizaje, Departamento de Métodos de Investigación y Diagnóstico en Educación, Universidad de Valencia 4 (2012) 1–11. URL: https://bit.ly/468T21C.
[24] A. Ara, A. Ara, Exploring the Ethical Implications of Generative AI, IGI Global, 2024.
[25] B. Obrenovic, X. Gu, G. Wang, D. Godinic, I. Jakhongirov, Generative ai and human–robot interaction: implications and future agenda for business, society and ethics, AI & SOCIETY (2024) 1–14.
[26] K. Wach, C. D. Duong, J. Ejdys, R. Kazlauskaitė, P. Korzynski, G. Mazurek, J. Paliszkiewicz, E. Ziemba, The dark side of generative artificial intelligence: A critical analysis of controversies and risks of chatgpt, Entrepreneurial Business and Economics Review 11 (2023) 7–30.
[27] N. Díaz-Rodríguez, J. Del Ser, M. Coeckelbergh, M. L. de Prado, E. Herrera-Viedma, F. Herrera, Connecting the dots in trustworthy artificial intelligence: From ai principles, ethics, and key requirements to responsible ai systems and regulation, Information Fusion 99 (2023) 101896.
[28] M. C. R. Achi, Manual de Formación Audiovisual, Cholsamaj Fundacion, 2004.
[29] S. Fan, T.-T. Ng, B. L. Koenig, J. S. Herberg, M. Jiang, Z. Shen, Q. Zhao, Image visual realism: From human perception to machine computation, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (2017) 2180–2193. URL: https://doi.org/10.1109/TPAMI.2017.2747150.
[30] R. Shirley, Top 10 places on the amalfi coast - 4k travel guide (2021). URL: https://www.youtube.com/watch?v=Mupom-sgjAU.
[31] G. P. Pro, Santorini, greece - 4k uhd drone video (2021). URL: https://www.youtube.com/watch?v=rXlqSYZOGnQ&t=456s.
[32] C. A. R. Galarza, Diseños de investigación experimental, CienciAmérica: Revista de divulgación científica de la Universidad Tecnológica Indoamérica 10 (2021) 1–7. URL: http://dx.doi.org/10.33210/ca.v10i1.356.
[33] C. Ramos-Galarza, Editorial: Diseños de investigación experimental, CienciAmérica 10 (1) (2021) 1–7.
[34] J. Escobar-Pérez, Á. Cuervo-Martínez, Validez de contenido y juicio de expertos: una aproximación a su utilización, Avances en Medición 6 (2008) 27–36. URL: https://bit.ly/3IlxiDV.
[35] T. Brooks, B. Peebles, C. Holmes, W. DePue, Y. Guo, L. Jing, D. Schnurr, J. Taylor, T. Luhman, E. Luhman, et al., Video generation models as world simulators, 2024.
[36] M. Kustudic, G. F. N. Mvondo, A hero or a killer? Overview of opportunities, challenges, and implications of text-to-video model sora, Authorea Preprints 10 (2024). URL: https://doi.org/10.36227/techrxiv.171207528.88283144/v1.
[37] J. Cho, F. D. Puspitasari, S. Zheng, J. Zheng, L.-H. Lee, T.-H. Kim, C. S. Hong, C. Zhang, Sora as an agi world model? A complete survey on text-to-video generation, arXiv preprint arXiv:2403.05131 (2024). URL: https://doi.org/10.48550/arXiv.2403.05131.