=Paper=
{{Paper
|id=Vol-3037/paper12
|storemode=property
|title=Cloud Application for the Generation of Static Websites Through the Recognition of Wireframes using Artificial Intelligence
|pdfUrl=https://ceur-ws.org/Vol-3037/paper12.pdf
|volume=Vol-3037
|authors=Cesar Gutierrez,Rodrigo Lara,Daniel Subauste
}}
==Cloud Application for the Generation of Static Websites Through the Recognition of Wireframes using Artificial Intelligence==
Cesar Gutierrez, Rodrigo Lara and Daniel Subauste

Universidad Peruana de Ciencias Aplicadas, Prolongación Primavera 2390, Lima, 15023, Perú

Abstract

Nowadays, companies need a presence on the Internet to offer their products or services. Building that presence involves high costs and long lead times, and it requires personnel specialized in web development. We therefore propose a solution that generates static web pages from hand-drawn sketches. This solution allows users to automate the creation of HTML and CSS code, reducing time and cost. In this research, a model was trained on a standard nomenclature of the basic wireframe components of a web page, and the detected components were then ordered using a tree-based algorithm. The results show a reduction in the time and cost that developers invest in transforming a wireframe into source code. The tool was also well accepted by users with no knowledge of HTML and CSS, who found it a simple way to generate web pages.

Keywords: Computer Vision, Wireframe, Web Page, N-ary tree

1. Introduction

In recent years, companies have had a growing need for a website, driven by the COVID-19 pandemic, during which they have had to develop or update their own web pages to increase their presence on the Internet [1]. However, web development requires an investment of time and money, including in pre-development designs. These designs are represented by a wireframe or a mockup. The former is a low-fidelity version of the product, drawn by hand or with software, while the latter is a high-fidelity design that includes colors, images, and text and consumes more resources to create [2].

A tool that automatically generates static web pages from a hand-made design would let more users obtain a website, and with it a presence on the Internet, in less time and at a lower cost. Proposals to solve this problem already exist, but they are limited: their only functionality is to take a wireframe image as input and generate HTML code. In our research we propose additional functionalities that allow the user to develop a customized web page. The main contributions of this document are summarized below:

* A web application was built to allow the generation of web pages from pattern recognition in wireframes.
* The proposed solution improves the transformation process from wireframes to user interface, since it reduces time and costs.
* A model was trained using Azure Custom Vision, which allows the recognition of previously standardized components of a web page.
* A tree-based algorithm was built for the ordering of components by rows and columns using the Bootstrap grid.
* A series of experiments was conducted on a group of users to evaluate the performance of our proposal. The transformation results are more accurate compared to other solutions.

CISETC 2021: International Congress on Educational and Technology in Sciences, November 16-18, 2021, Chiclayo, Peru
EMAIL: u201611510@upc.edu.pe (C. Gutierrez); u201614682@upc.edu.pe (R. Lara); daniel.subauste@upc.pe (D. Subauste)
ORCID: 0000-0002-5126-882X (C. Gutierrez); 0000-0001-8315-5275 (R. Lara); 0000-0003-1131-1384 (D. Subauste)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

This paper is organized in 7 sections. In section 2, the context is developed. Section 3 describes the work related to our proposal.
Section 4 presents the web solution (Wire2web), detailing the model training guidelines, the implementation process and the algorithms used. Section 5 explains the validation of the proposed web solution. Finally, section 6 presents the conclusions.

2. Context

2.1. Artificial Intelligence (AI)

The term artificial intelligence (AI) refers to any human-like intelligence exhibited by a computer, robot, or other machine [3]. The main research fields of AI include expert systems, machine learning, pattern recognition, and natural language understanding. There are also application fields of AI, such as virtual reality, machine translation, and computer vision. The last of these, computer vision, is the one we use in our solution [4].

2.2. Computer Vision

Computer vision is a branch of computer science that has grown remarkably in recent years, in both face and object detection. It processes an image through a sequence of stages at different levels and then takes actions or makes recommendations based on that information [5]. A computer vision model needs to be trained with a large amount of data until it can identify distinctions and, finally, recognize images.

2.3. Web Page

Web pages are documents written in HTML that can be stored on a computer or on a remote web server [6]. They are divided into two types. Static web pages are mainly informative and are stored as simple files that are then served by a web server [7]. Dynamic web pages can communicate with a server and change their content without visiting a new page or reloading the current one, and they offer greater interactivity to visiting users [8].

2.4. Wireframe

A wireframe is a static, low-fidelity representation of a final product. It is made up of several visual components, represented in a simplified way, that aim to show where each of them is located in relation to the others [9].

2.5. N-ary Tree

An n-ary tree of height h is a tree in which every node at a distance of at most h - 1 from the root has n child nodes. The nodes at the last level are known as leaves, since there are no nodes below them [10].

3. Related Works

In this section we examine the main research related to our project. With respect to automating the transformation of hand-drawn sketches into source code, we found articles proposing software solutions. One is Sketch2code [11], which develops a system capable of generating web pages from hand-drawn sketches. That research proposes the following process: dataset development, model training and application implementation. The proposal has a significant impact on our research, since we follow a similar process.

Another is pix2code [12], a system based on convolutional and recurrent neural networks that generates code from a GUI screenshot given as input. That research proposes a model that we took as a reference to define the wireframe standards for our project. In addition, the application implemented by [12] is named Uizard, and it presents several functionalities that we take as a reference, such as uploading a wireframe, editing the generated view, and relating views.

4. Proposal

This section presents the solution development process. We propose five processes divided into two stages. In the first stage, we explain the construction of our dataset and the model training using computer vision techniques. In the second stage, we explain the implementation of the web application, the algorithm used, and the functionalities that the application offers. The five processes are explained below.

4.1. Data Set

First, we evaluated the composition of the wireframes and found that these drawings are composed of components.
Second, we standardized the components by investigating which components are most used in web pages. These standards were taken as a reference from Justinmind, Uizard and Sketch2code, which are platforms where wireframes are designed or used. In the end, 12 components were obtained, each of which was represented in a hand-made drawing.

Table 1. Standard wireframe components

Components (each paired with a hand drawing): Circle image, Square image, Text, Text Area, Input, Number Input, Button, Combo box, Radio Button off, Radio Button on, Checkbox on, Checkbox off.

4.2. Model Training

To develop the model, training was carried out to recognize the components of the wireframes. For this process we used a cloud service that applies computer vision to detect objects. Processing capacity and the price-capacity ratio were the variables used to choose the service. We concluded that Azure Custom Vision would be used, since it meets the requirements of our project.

The model was trained in two iterations. In the first, thirty wireframe images were added to the dataset. In the second, seventy additional wireframe drawings were added, with different colors and ways of capturing the images, which allowed our model to detect components more accurately. At the end of each iteration we obtained three indicators: precision, recall and mAP. Precision indicates the fraction of identified images that were correct, recall indicates the fraction of real images that were correctly recognized, and mean Average Precision (mAP) indicates the overall accuracy of the object detector in finding a component. The results show that all three indicators improved in the second iteration, making the model more stable.

Figure 1. Results of the two iterations. Source: Azure Custom Vision.

4.3. Model consultation

To query the trained model, a photo of a hand-drawn wireframe is required. The photo is uploaded to the application, converted to base64, and sent to the Azure Custom Vision service for analysis. The request returns a JSON document with a defined structure for each component, containing the probability, position, tag name, width, and height. Finally, the components are passed to a tree-based algorithm that sorts them into rows and columns to obtain a better distribution.

JSON result returned by Azure Custom Vision for the Text component:

 {
   "probability": 0.875464261,
   "tagId": "1932c95f-ed4a-4675-bde4-c2457e1389e6",
   "tagName": "Text",
   "boundingBox": {
     "left": 0.453497916,
     "top": 0,
     "width": 0.2523211,
     "height": 0.8738168
   }
 }

4.4. Algorithm

This tree-based algorithm was developed to arrange the detected wireframe components in rows and columns so that the user can visualize them better. The development was divided into two processes.

4.4.1. Components sorting by rows

First, the algorithm detects the components from top to bottom. This comparison is made with respect to the "top" property provided by the Azure Custom Vision service. It then checks whether any component lies inside the section of the current component (red lines in Figure 2). It also adds a margin (yellow lines in Figure 2) to detect components that fall within the margins and to determine whether they belong to the same section. In addition, if a component found is taller than the components already in the section, it becomes the element of comparison. When there are no more elements to compare within the section, a row is assigned, and the elements of the next lower sections are analyzed.

Figure 2. Analysis and representation of the wireframe image by rows.

Second, once all the components have been detected and assigned to a specific row, they are added to the tree: each row is a node, and its child nodes (leaves) are the detected components.
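
The row-sorting step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `group_into_rows` and `build_row_tree`, the `margin` value of 0.02, and the rule that a component joins the current row when its "top" lies above the bottom of the section plus the margin are all hypothetical. Only the field names `tagName`, `boundingBox`, `top` and `height` come from the JSON result shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the layout tree: the page, a row, or a component (leaf)."""
    label: str
    children: list = field(default_factory=list)


def group_into_rows(predictions, margin=0.02):
    """Group detected components into rows, scanning from top to bottom.

    Assumption: a component belongs to the current row when its top edge
    starts above the bottom of the row's section plus `margin`.
    """
    ordered = sorted(predictions, key=lambda p: p["boundingBox"]["top"])
    rows = []
    section_bottom = None
    for pred in ordered:
        box = pred["boundingBox"]
        top = box["top"]
        bottom = box["top"] + box["height"]
        if section_bottom is not None and top <= section_bottom + margin:
            rows[-1].append(pred)
            # A component reaching further down becomes the new reference.
            section_bottom = max(section_bottom, bottom)
        else:
            rows.append([pred])
            section_bottom = bottom
    return rows


def build_row_tree(predictions, margin=0.02):
    """Build a three-level tree: page -> rows -> detected components."""
    root = Node("page")
    for index, row in enumerate(group_into_rows(predictions, margin), start=1):
        row_node = Node(f"row-{index}")
        row_node.children = [Node(p["tagName"]) for p in row]
        root.children.append(row_node)
    return root


# Hypothetical detections, given in arbitrary order.
sample = [
    {"tagName": "Button", "boundingBox": {"left": 0.4, "top": 0.30, "width": 0.2, "height": 0.06}},
    {"tagName": "Input", "boundingBox": {"left": 0.5, "top": 0.06, "width": 0.3, "height": 0.08}},
    {"tagName": "Text", "boundingBox": {"left": 0.1, "top": 0.05, "width": 0.3, "height": 0.05}},
]
rows = group_into_rows(sample)
print([[p["tagName"] for p in row] for row in rows])  # [['Text', 'Input'], ['Button']]
```

In this sketch the taller Input component extends the section of the first row, so any later component starting within that extended section would still be assigned to the same row.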
Finally, the tree generated in the first process has a hierarchical structure with three levels.

Figure 3. Tree generated from a wireframe by rows.

4.4.2. Components sorting by columns

Once the tree has reached a height of three levels, the child nodes that share the same parent are compared at the third level. This comparison is made from left to right using the "left" field provided by the Azure Custom Vision service. For example, for the first row: if node 2 has node 3 within its range, then they are joined in the same column.

Figure 4. Analysis and representation of the original image by rows and columns.

For the example shown, once all the columns within each row have been detected, the tree has four levels, as follows:

Figure 5. Tree generated from a wireframe - Second stage.

Finally, this tree is saved in the database in JSON format, to be used later by other functionalities of the developed application.

4.5. Results and Functionalities

The result is the source code generated in HTML and CSS, which uses the Bootstrap grid to display the rows and columns in an orderly fashion. The application also allows these views to be grouped within a project and changed, by editing each attribute or by adding new elements to the generated view. In addition, it allows choosing a theme for the entire project and downloading the project as a .zip file.

Figure 6. HTML generated from wireframe.

5. Validation

In this section, we detail the results obtained by testing the application and the feedback obtained through the questions asked to the users. A total of 20 users were interviewed. First, a detailed explanation of the project was given to each user. Then, a URL of the deployed web application was sent. Next, each user logged in through a browser on their PC and went through the entire flow, from creating a profile to downloading one or more projects.
Finally, users had to answer a questionnaire based on their experience with the application.

A validation was also performed to measure the time and cost-effectiveness of using the application versus traditional development by a programmer. To do this, three developers implemented a static two-view web page. Then the same developers built the same web page using the proposed application, starting from the same wireframes.

Figure 7. Wireframes used for validation.

Tables 2 and 3 below show the results obtained:

{| class="wikitable"
|+ Table 2. Times obtained from validation.
! Users !! Traditional method !! Using the proposal !! Time saved
|-
| Developer 1 || 50.25 min || 11.17 min || 39.08 min (77.77%)
|-
| Developer 2 || 42.57 min || 9.72 min || 32.85 min (77.18%)
|-
| Developer 3 || 46.05 min || 13.34 min || 32.71 min (71.03%)
|}

{| class="wikitable"
|+ Table 3. Costs obtained from validation.
! Users !! Traditional method !! Using the proposal !! Cost saved
|-
| Developer 1 || 3.81 USD || 0.85 USD || 2.96 USD (77.77%)
|-
| Developer 2 || 3.23 USD || 0.74 USD || 2.49 USD (77.18%)
|-
| Developer 3 || 3.49 USD || 1.01 USD || 2.48 USD (71.03%)
|}

The results show that, for the development of a static web page, the proposal reduces the implementation time and cost for a developer by 70 to 80 percent. Figure 8 shows the web page made manually using HTML and Bootstrap.

Figure 8. Static web page made by developer 2 with the traditional method.

Figure 9 shows the same web page made using the proposed application.

Figure 9. Static web page made by developer 2 using the proposed application.

6. Conclusions

After training the model, we conclude that adequate training requires at least fifty images per label. In the tests performed, the first iteration had a total of 30 images per label, and as a result the model still did not detect some objects.
A second iteration was then performed in which 70 more images were added, giving a minimum of 70 and a maximum of 100 images per tag. The result was favorable, since the accuracy of web component recognition improved.

Secondly, after performing the corresponding validations and tests, we concluded that the detection of web page components, the transformation of a wireframe into HTML and CSS code, and the sorting by rows and columns using the proposed tree-based algorithm complied with the established requirements.

With respect to the validations with the group of users through the software tests and the survey, it can be concluded that, for 89% of the surveyed developers, the solution reduces development time; the average response was 4.45 on a scale of 1 to 5. It can also be concluded that, for 83% of the interviewed developers, our solution reduces implementation costs; the average response was 4.15 on a scale of 1 to 5.

Finally, future work could extend the solution to more complex components such as cards, navbars, sliders and iconography, as well as to the recognition of mobile device components and the generation of code for them. In addition, the project could be extended to frontend development frameworks such as Vuejs, React or Angular.

7. References

[1] EL PAIS, "Casi la mitad de las empresas no tenía web antes de la pandemia, según un estudio | Pyme | Cinco Días," Cinco Días, 2021.
[2] Justinmind, "Wireframes Vs Mockups: what's the best? - Justinmind," 3 2019.
[3] IBM, Acelere su camino hacia la IA - Argentina | IBM, 2020.
[4] X. Fu, The Application of Artificial Intelligence Technology in College Physical Education, Institute of Electrical and Electronics Engineers Inc., 2020, pp. 263-266. doi: 10.1109/ICBAIE49996.2020.00062.
[5] J. Sigut, M. Castro, R. Arnay and M.
Sigut, OpenCV Basics: A Mobile Application to Support the Teaching of Computer Vision Concepts, vol. 63, Institute of Electrical and Electronics Engineers Inc., 2020, pp. 328-335. doi: 10.1109/TE.2020.2993013.
[6] MDN, HTML: básico Aprende sobre desarrollo web, 2020.
[7] A. Anagnostopoulos, A. Z. Broder, E. Gabrilovich, V. Josifovski and L. Riedel, Web page summarization for just-in-time contextual advertising, vol. 3, 2011, pp. 1-32. doi: 10.1145/2036264.2036278.
[8] A. Brown, C. Jay and S. Harper, "Tailored presentation of dynamic web content for audio browsers," International Journal of Human Computer Studies, vol. 70, no. 3, pp. 179-196. doi: 10.1016/j.ijhcs.2011.11.001, 3 2012.
[9] J. Chen, C. Chen, Z. Xing, X. Xia, L. Zhu, J. Grundy and J. Wang, "Wireframe-based UI Design Search through Image Autoencoder," ACM Transactions on Software Engineering and Methodology, vol. 29, no. 3, pp. 1-33. doi: 10.1145/3391613, 7 2020.
[10] F. Duque, A. Roldán-Correa and L. A. Valencia, "Accessibility Percolation with Crossing Valleys on n-ary Trees," Journal of Statistical Physics, vol. 174, no. 5, pp. 1027-1037. doi: 10.1007/s10955-019-02223-5, 3 2019.
[11] A. Robinson, "Sketch2code: Generating a website from a paper mockup," 5 2019.
[12] T. Beltramelli, "pix2code: Generating Code from a Graphical User Interface Screenshot," EICS '18: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, pp. 1-6. doi: 10.1145/3220134.3220135, 5 2017.