Online News Analysis on Cloud Computing Platform for Market Prediction⋆

Claudia Juarez1 and Haithem Afli2
ADAPT Centre, Cork Institute of Technology, Cork, Ireland
1 juarez.moreno@gmail.com, 2 haithem.afli@cit.ie

⋆ Supported by Cork Institute of Technology. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Stock market price fluctuations and predictions have been widely examined; there are two approaches to analysis: fundamental analysis (based on data from financial records and balance sheets) and technical analysis (focused on past market action). However, past trends alone cannot predict stock market movement; external factors significantly influence it, a notable example being how Twitter comments by Elon Musk affected the stock price of Tesla and prompted the follow-up by the SEC with the corresponding sanctions. In this highly interconnected society with 24-hour news cycles, we require an extra tool to study the stock market. This research evaluates news articles from financial publications and determines the word patterns that help make a buy or sell decision, by identifying combinations of words or phrases that indicate whether a stock price might go up or down. This research also focuses on creating a blueprint for an implementation that uses cloud technologies to house the financial information, perform the analysis, and present the result as a web app for more comfortable use and interpretation. A novice trader could benefit from a simple indicator of whether a stock might go up or down, as it would facilitate the decision-making process by identifying the exact day for buying or selling, or, if there is no relevant news, for holding the position.

Keywords: Natural Language · Cloud Deployment · Custom Corpora · Deep Learning.

1 Introduction

1.1 Motivation

Stock market analysis is a field that has been studied from several angles, such as economics, finance, and statistics, to name a few. For an experienced broker, the breakdown of a stock's behavior is typically formed by two strategies: technical analysis, which uses mathematical models to predict the stock variations in the stock charts, and the fundamental strategy.

Fundamental analysis comprises the breakdown of a company's financial statements, historical data, and the interpretation of the summaries provided by specialized websites [14]. It also involves plenty of review of news and financial articles and awareness of the company's media presence, as new developments or products might bring the stock price up, whereas a news scandal might affect the price and make it go down (albeit temporarily).

The main focus of this research paper and its contribution to the current work is twofold. First, we developed the code for stock behavior analysis based on news articles, which the end user has the ability to select and customize according to their needs. The second contribution concentrates on cloud architecture development: this paper provides a blueprint for implementing the behavior analysis code in an app that can be accessed over the internet using Python, machine learning, and a commercial cloud platform.

This paper differs from past research in these new approaches:
1. It has more extensive data sets, as we created our corpus using financial news articles and stock market opening and closing prices.
2. Our model uses natural language processing and deep learning, as they work best with unstructured data.
3. We provide a practical way for the trader to interact with the predictions through a web app, where they have the ability to choose the source of data collection.
4. Simultaneously, the algorithm retrains itself each time the user changes the source data, with an automated data collection process.

1.2 Contribution

This paper takes away part of the complexity of stock analysis for the average investor by creating an easy-to-use web app that takes advantage of machine learning and cloud computing to deliver pertinent information, such as a prediction of stock behavior. This tool will help any investor who might not have a financial background but has an eagerness to manage their own portfolio. It can also help the part-time stock investor avoid contacting a career advisor and spending a percentage of the profit on these services.

2 Background

2.1 Natural Language Processing Research

Due to this subject's popularity, there has been a myriad of research papers that tested the link between media and stock price movement. Yaojun and Wan tried to tie stock price fluctuations to social media (although restricted to Chinese social media) through 'sentiment analysis', that is, labeling words as positive or negative; even though the result was successful, they only considered Chinese social media accounts that provided financial information, thus having a minimal information source [15]. Naren found in his study that sentiment directly influences a stock price; however, it did not quantify or offer a prediction [13]. Joshi's study also found a high correlation between price fluctuation and sentiment but lacked a broad set of testing data; they based their research only on historical data from yahoo.com for just one company, with news taken from news aggregators [8].

Deep learning is widely used with Python as a tool that helps a program learn patterns based on data [4]. This project works based on a correlation between 'words as vectors' natural language processing and sentiment analysis, mapping keywords and phrases and comparing them to previous stock market historical prices; it also implements the word-to-vector (word2vec) technique to help us train a program in human language nuances such as ambiguities and grammar, while sentiment analysis provides a positive or negative connotation for text and phrases [9].
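As an illustration of the word2vec technique mentioned above, the sketch below trains a small embedding model on tokenized headlines using the gensim library; the library choice, the toy corpus, and all parameters are our assumptions for illustration, not part of the cited studies.

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized headlines (placeholder data)
headlines = [
    ["tesla", "stock", "jumps", "after", "record", "deliveries"],
    ["microsoft", "shares", "fall", "on", "cloud", "outage"],
    ["apple", "stock", "rises", "on", "strong", "iphone", "sales"],
]

# word2vec maps each word to a real-valued vector that preserves
# semantic relationships between words (gensim 4.x API)
model = Word2Vec(sentences=headlines, vector_size=50, window=3,
                 min_count=1, epochs=20)

vector = model.wv["stock"]                        # 50-dimensional vector
similar = model.wv.most_similar("stock", topn=3)  # nearest words by cosine
```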
2.2 Existing Methods

We have taken three existing research papers as a starting point and built upon them. Joshi's and Naren's studies found a correlation between stock movement and sentiment analysis. Yaojun and Wan created a base lexicon that labeled words as positive or negative to relate stock fluctuations to social media posts. We took both ideas and methodologies and created a custom corpus that measures positive and negative in terms of stock market fluctuation, and uses news article headlines instead of social media.

2.3 Cloud Computing

There have been some examples of machine learning models successfully implemented in cloud technologies, although not directly related to stock predictions. There is evidence that utilizing cloud computing technologies provides a significant reduction in execution time for requests from stakeholders by maximizing the utilization of cloud resources [3]. One such case of analysis is in the healthcare industry, although it can be applied to any industry; the primary tools Abdelaziz uses are machine learning algorithms deployed in virtual machines. Abdelaziz performed predictions using linear regression and neural networks, finding a considerable improvement over the response times of other models when using them in conjunction with cloud capabilities [3].

Currently, there are several tools for analyzing vast quantities of text; one of them is the Python NLTK (Natural Language Toolkit). It can help us extract text excerpts and detect patterns, word structure, and frequency to determine meaning and intention [6]. As Feyzkhanov states in his book, serverless deep learning deployment is a novel approach that has the advantages of being scalable, simple, and cheap to start [7]. Most of the academic work does not focus on providing a design for cloud deployment.

2.4 Contribution

We determined that it is possible to accurately predict the movement of the stock market based solely on news articles while taking advantage of a cloud computing architecture.

3 Natural Language Processing Implementation

3.1 Methodology

We used model manipulation and local improvement as our heuristic methodology: we changed the nature of the deep learning model to apply it to our particular test case scenario through trial and experiment with different testing data sets, and we created localized improvements by starting from a feasible solution on a working deep learning model and constructing and improving iteratively upon it [11].

For this research paper we did not use a standardized sentiment lexicon, as we are not looking for a sentiment such as 'positive' in the general sense of the word, but 'positive' in terms of its impact on the stock market. Instead, we created a custom corpus for this particular research's necessities, using different sources and combining them to obtain the data that is relevant to our research [11].

We analyzed the headlines of various news articles from online news outlets based on a stock ticker and supplied a prediction for that stock's price movement. The Natural Language Processing (NLP) model is housed in a cloud-based architecture and made available to the public as a web app for ease of use.

We created a custom corpus that works with the NLP prediction model, forgoing the use of a pre-trained model. We chose this approach because, upon closer inspection of the individual texts, they contained a lot of 'noise', such as advertisements, excerpts of other pieces, links, or author bios, which were not associated with the article at hand. Headlines, by contrast, are a synthesis of the entire sentiment of an article.

3.2 Data Creation

For the creation of the data set, we used an API to retrieve the publication name, headline, source article, and date of publication from Google News and then paired each record with the stock movement for that particular day; if the news fell on a weekend or a holiday, we moved to the next business day. We calculated the percentage of price change for that day, either up or down, and since these numbers varied widely, we normalized them on a scale from 0 to 1. We performed this operation for every pair of stock ticker and news article.
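The sketch below illustrates this pairing and normalization step. The two input DataFrames stand in for the news API and market data retrievals, whose exact calls we omit; the column names and the min-max scaling to [0, 1] follow the description above but are otherwise our assumptions.

```python
import pandas as pd

def next_business_day(date):
    # Roll weekend dates forward to the next trading day
    # (holidays are handled the same way in principle)
    day = pd.Timestamp(date)
    while day.weekday() >= 5:
        day += pd.Timedelta(days=1)
    return day

def build_corpus(headlines, prices):
    # headlines: DataFrame with columns [publication, headline, date]
    # prices: DataFrame indexed by date with columns [open, close]
    rows = []
    for _, h in headlines.iterrows():
        day = next_business_day(h["date"])
        if day not in prices.index:
            continue
        open_, close = prices.loc[day, ["open", "close"]]
        # Percentage of change for that day, up or down
        rows.append({"headline": h["headline"],
                     "pct_change": (close - open_) / open_ * 100})
    corpus = pd.DataFrame(rows)
    # The raw percentages vary widely, so normalize to a 0-1 scale
    lo, hi = corpus["pct_change"].min(), corpus["pct_change"].max()
    corpus["score"] = (corpus["pct_change"] - lo) / (hi - lo)
    return corpus
```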
The two main discoveries we made after creating this custom corpus were that the data set was broad enough to provide information on global markets even from the search of a single news resource, and that the paid API provides more online publications even after indicating only 5 to search through.

3.3 Prediction Models

We selected the Keras/TensorFlow model because Keras can be used for fast prototyping and, combined with TensorFlow, is optimal for a production-ready system; the downside is that we do not have much control over the code, as it is best suited for fast experimentation. We are working with neural networks that cannot work on raw text, so the next step is to convert it to tokens, which are just integers [12].

Tokenize: The tokenizing method works by going through the complete dataset, counting the number of times each word is used, and then building a vocabulary where each word gets an index; this way our data samples are converted to numbers called tokens [12].

Pad/Truncate: The sequences are padded or truncated to make sure they are all of a standardized size, obtained by taking the average number of words across all the sequences and adding two standard deviations, to ensure we are covering about 95% of the data; padding works by adding zeros to the sequence [12].

Embedding: Even after converting the words to tokens, a neural network still cannot work on this data due to vocabulary limitations and semantics, so we use embedding, which works by converting integer tokens into real-valued vectors. This technique is known as representation learning, a way to obtain a real-valued representation of a text while preserving its semantics [12].

Recurrent Neural Network (RNN): The result of the embedding process is a two-dimensional matrix called a tensor, which is now in the correct format to be used in an RNN; its main characteristic is that it can process sequences of arbitrary length. Each layer depends on the result of the layer before it, which makes it well suited for natural language processing [12].

Sequential Model: In this model, we run 3 activation layers plus one dense layer that provides the numerical value [12].

Activation Function: We use sigmoid as the activation function; it is an excellent generic distribution that handles randomly occurring events well and works best with a small number of layers; the sigmoid activation also provides us a range of values between 0 and 1 [12].

The result provided is a number between 0 and 1, where a value closer to zero means the price of the stock is more likely to go down and a value closer to one means it is more likely to go up.

The deep learning model workflow and script we use are based on Menshawy's approach to analyzing movie reviews [12] and work the following way (a minimal code sketch follows the list):

Fig. 1. Deep Learning Models Workflow [12]

1. Load the train/test set and preprocess the data to:
   (a) Normalize result behavior between 0 and 1
   (b) Fill empty lines with zeros
   (c) Select the size of the train and test sets
   (d) Convert to array
2. Tokenize the train/test set data
3. Ensure all sequences have the same length
4. Pad or truncate
5. Embedding or vector creation
6. RNN model
7. Train the model against the test set
8. Evaluate model accuracy
9. Try out with an example.
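The following sketch covers steps 2 to 8 of the workflow using the Keras API described above; the vocabulary size, embedding dimension, and the use of three GRU layers are our assumptions for illustration, since the exact hyperparameters are not listed here.

```python
import numpy as np
import tensorflow as tf

def build_and_train(headlines, labels):
    # Step 2. Tokenize: each word in the vocabulary gets an integer index
    tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=10000)
    tokenizer.fit_on_texts(headlines)
    sequences = tokenizer.texts_to_sequences(headlines)

    # Steps 3-4. Standardized length: mean plus two standard deviations
    # covers roughly 95% of the sequences; shorter ones are zero-padded
    lengths = np.array([len(s) for s in sequences])
    max_tokens = int(lengths.mean() + 2 * lengths.std())
    padded = tf.keras.preprocessing.sequence.pad_sequences(
        sequences, maxlen=max_tokens)

    # Steps 5-6. Embedding plus three recurrent layers and one dense
    # sigmoid layer that yields a value between 0 (down) and 1 (up)
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10000, output_dim=8),
        tf.keras.layers.GRU(16, return_sequences=True),
        tf.keras.layers.GRU(8, return_sequences=True),
        tf.keras.layers.GRU(4),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Steps 7-8. Train and report accuracy on a held-out split
    model.fit(padded, np.array(labels), epochs=3, validation_split=0.05)
    return model, tokenizer, max_tokens
```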
The deep learning model gives an immediate prediction for the latest news article; the result is printed on screen and saved to a CSV file.

Input 1: User input of stock name and online publication
Input 2: Custom corpora data set
Output 1: Printed value of the model result, between 0 and 1
Output 2: The results of the model saved in a CSV file

The CSV file saves all queries for later processing and ingestion into the train/test data set. These archived results keep growing as the app gains popularity and more users query different stock/publication combinations, creating a bigger archive file.

Model Learning Cycle. The archived results are verified daily through a cron job: archived data gets processed and verified, and if enough lines for the same stock (200) are recorded in the file, those lines are processed, reusing the script we used for gathering the initial data set. These lines get their financial information attached, the percentage of change calculated, and the value normalized. The data that has a normalized value of 1 or 0 then becomes part of the primary test/train data set and is flagged with 1, meaning it was already processed; the workflow is represented in figure 2, and a sketch of the daily verification job is given at the end of this subsection, after Table 2.

Fig. 2. Prediction System Architecture

4 Experiments

All scripts were created using Python, Python libraries, and custom APIs; the code for the final scripts is available at the private GitHub repository: https://github.com/claudia0juarez/Thesis

4.1 Parameters

The parameters used in this research were model accuracy and prediction results: model accuracy measures the performance of the model, while prediction results are calculated by comparing the model prediction with the actual stock market movement.

4.2 Web Scraper Development

The first step in the process was to develop the scripts that fetch the data from online resources (news articles/web publications). We used Python and the BeautifulSoup library for the web scraper due to the ease of parsing the HTML files. For scanning the web for relevant news, we use SerpWow, a paid API that executes search requests in several search engines [2].

4.3 Deep Learning Model Fine-Tuning

The deep learning model fine-tuning was run using proof-of-concept tests and a small data sample of 9 news articles with an assessment based on stock price movement; we ran our model and then determined the level of success by comparing the results with the real data. The deep learning model provides a result in the range of 0 to 1, where:
0 to 0.5 is a negative result, meaning the stock is most likely to go down
0.5 to 1 is a positive result, meaning the stock is most likely to go up

POC Test1. The first proof-of-concept test ran with a custom corpus of 107,000 headlines. The model results remained in much the same range; as shown in Table 1, we are still not getting a reliable prediction.

Table 1. POC Test1 Custom Data Set

No   Model Accuracy   Prediction Results
1    69.21            0.49
2    69.31            0.47
3    69.27            0.46
4    69.35            0.47
5    69.22            0.48
6    69.13            0.46
7    69.21            0.47
8    69.33            0.47
9    69.15            0.47
10   69.26            0.48

Table 2. POC Test2 Custom Data Set

No   Model Accuracy   Prediction Results
1    71.8             0.38
2    71.92            0.37
3    71.49            0.35
4    72.08            0.37
5    71.92            0.38
6    71.68            0.38
7    72.31            0.38
8    72.29            0.35
9    71.33            0.38
10   71.66            0.39
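As referenced above, the following is a minimal sketch of the daily verification job from the Model Learning Cycle; the file names, column names, and the attach_financials_and_normalize helper (which would reuse the initial data-gathering script) are assumptions for illustration.

```python
import pandas as pd

ARCHIVE = "archived_results.csv"   # assumed archive of user queries
DATASET = "train_test_set.csv"     # assumed primary train/test data set
THRESHOLD = 200                    # lines per stock before reprocessing

def daily_verification():
    archive = pd.read_csv(ARCHIVE)
    pending = archive[archive["processed"] != 1]
    for stock, group in pending.groupby("stock"):
        if len(group) < THRESHOLD:
            continue
        # Hypothetical helper: attach financial information, compute the
        # percentage of change, and normalize the value to [0, 1]
        scored = attach_financials_and_normalize(group)
        # Only rows with a naturally binary score join the primary set
        binary = scored[scored["score"].isin([0.0, 1.0])]
        binary.to_csv(DATASET, mode="a", header=False, index=False)
        # Flag the processed lines with 1
        archive.loc[group.index, "processed"] = 1
    archive.to_csv(ARCHIVE, index=False)
```

The job itself would be scheduled with an ordinary crontab entry, for example `0 6 * * * python daily_verification.py` (our example schedule).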
POC Test2. With an extended custom data set of 173,000 data points, the accuracy of the model improved (from roughly 69% to 72%), as shown in Table 2; however, the results were again confined to the same narrow range and did not provide any valuable feedback.

POC Test3. The custom data set we used has test values ranging from 0 to 1, so for the next test we only used the data that was naturally 0 (negative) or 1 (positive) to train the model, as shown in Table 3. After running the model with this new data set, we had a better accuracy result, and the results started to vary from one headline to the next; although the model was only correct 3 out of 9 times, the main issue was that we had a minimal test/train data set of only 2,500 records, as shown in Table 4.

Table 3. New Testing Data

No  Stock      Headline                                                                       Date       Open     Close    Assessment Based on Stock Movement
1   Microsoft  Workplace vs. coronavirus: No one has a playbook for this                     05-Mar-20  166.045  166.27   Positive
2   Microsoft  Pentagon asks to reconsider part of JEDI cloud decision after Amazon protest  12-Mar-20  145.3    139.06   Negative
3   Microsoft  Windows 10 Warning: Anger At Microsoft Rises With Serious New Failure         09-Feb-20  183.58   188.7    Positive
4   Microsoft  How corporate IT is entering the multi-cloud                                  14-Mar-20  140      135.42   Negative
5   Apple      Microsoft's Massive Stock Gains May Be Far From Over                          09-Feb-20  314.18   321.55   Positive
6   Tesla      Tesla's Sales Fell 68% In The Netherlands And 92% In Norway In February       02-Mar-20  711.26   743.62   Positive
7   Starbucks  Starbucks Is Bringing Beyond Meat To Its Canada Locations                     26-Feb-20  82.6     80.67    Negative
8   Samsung    Samsung Unveils Samsung Galaxy S20 Series With AI-Powered Camera              14-Mar-20  2209.7   2209.7   Neutral
9   Amazon     Amazon's Stock May Jump Following Quarterly Results Despite Rising Costs      26-Jan-20  1820     1828.34  Positive

Table 4. POC Test3 with Custom Data Set

No  Model Accuracy  Prediction Results  Model     Actual Stock
1   75.17           0.61                positive  POSITIVE
2   76.75           0.31                negative  NEGATIVE
3   73.51           0.26                negative  POSITIVE
4   73.57           0.69                positive  NEGATIVE
5   72.29           0.39                negative  POSITIVE
6   72.94           0.44                negative  POSITIVE
7   72.61           0.38                negative  NEGATIVE
8   76.68           0.47                negative  NEUTRAL
9   75.19           0.48                negative  POSITIVE

POC Test4. We did a round of testing converting the entire data set (173,000 points) by normalizing the score values to either 0 or 1, depending on whether each value was higher than or equal to 0.5. With this test, we discovered that this model behaves similarly to a Naive Bayes model, as it only works well with naturally 0 and 1 values.

Table 5. POC Test4 with Custom Data Set

No  Model Accuracy  Prediction Results  Model     Actual Stock
1   75.2            0.66                POSITIVE  POSITIVE
2   72.35           0.69                POSITIVE  NEGATIVE
3   76.18           0.62                POSITIVE  POSITIVE
4   77.77           0.6                 POSITIVE  NEGATIVE
5   75.97           0.62                POSITIVE  POSITIVE
6   77.4            0.6                 POSITIVE  POSITIVE
7   72.15           0.62                POSITIVE  NEGATIVE
8   77.8            0.65                POSITIVE  NEUTRAL
9   76.78           0.61                POSITIVE  POSITIVE

The accuracy remained in the same range; however, the results returned to the earlier behavior, only giving values within a narrow range, as shown in Table 5.
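The difference between the Test3 and Test4 data preparations can be expressed in a few lines of pandas; the file and column names are illustrative assumptions.

```python
import pandas as pd

corpus = pd.read_csv("custom_corpus.csv")  # assumed file with a 'score' column

# POC Test3: keep only the rows whose normalized score is naturally 0 or 1
test3_set = corpus[corpus["score"].isin([0.0, 1.0])]

# POC Test4: force every score to 0 or 1 by thresholding at 0.5
test4_set = corpus.assign(score=(corpus["score"] >= 0.5).astype(int))
```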
The best way to obtain better results with the deep learning model is to use a data set that is good in both quality and quantity. Quality means that only naturally 0 or 1 results will produce a good outcome; there is no use in artificially moving values to 0 or 1, as it will throw off the overall model. Quantity matters as well: with a more significant data set, the results get consistently better.

4.4 Results

After running these POCs and refining the model with the custom data set, we ran the custom script to get more data points (from 2,500 to 3,500) and re-ran the test with a more prominent test set (around 272 headlines). We ran 3 tests with an incrementally larger training set:

Test1 (2,505 training set). The 272 news headlines had a 46% accuracy when tested against the custom training set of 2,505 data points, which means that 124 out of 272 headlines were correctly predicted and the stock price moved according to the predicted value.

Correct Prediction 124 (46%)
Incorrect Prediction 147 (54%)

Test2 (3,521 training set). For the second test, the 272 news headlines had a 43% accuracy when tested against the custom training set of 3,521 data points, which means that 116 out of 272 headlines were correctly predicted.

Correct Prediction 116 (43%)
Incorrect Prediction 155 (57%)

Test3 (3,789 training set). To get the final set, we needed to create a 264,000-record overall custom data set and then use only the 0 and 1 values; this test achieved 63% accuracy, representing 170 correct predictions out of 272.

Correct Prediction 170 (63%)
Incorrect Prediction 101 (37%)

Our work shows that to get a better prediction result there is a need to keep growing the custom test/train data set. However, this is very resource-intensive, as it takes about a day to get around 100,000 results (provided the paid APIs don't throw an error), and from this set we still have to filter the 0/1 results, which amount to about 1,000.

5 Cloud Architecture

Service-oriented architecture (SOA) is known for increasing the capability of an enterprise to address new business requirements with minimal cost, resources, and time overheads. We based the development of the cloud-based architecture of our prediction system on this framework [5].

The blueprint was based on SOA to maximize the cloud benefits: it can leverage cloud computing resources as services contained within itself, and it helps lay out the design of services that will increase usability and durability, as well as the blueprint for design, development, and deployment [10].

Data-to-Cloud Roadmap:
1. Define the data. The data was taken from news publications (financial and well-reputed sources) [5].
2. Define the services. We used AWS, although this layout should be applicable to any other cloud service provider, along with Python libraries and paid APIs [5].
3. Define the processes. We used ETL (Extract/Transform/Load) [5], a quantitative methodology based on observations and experimentation.
   Extract: Automate the data collection with web scraping tools for these parameters: renowned online financial publications and historical market prices for the past years (opening price, closing price).
   Transform: Use data analytics tools to correlate the price changes of a company's stock against news article mentions on the same day.
With this information, we created a custom corpus and used NLP to train a model to determine the sentiment/vector and the effect on the price (high or low).
   Load: Create a cloud-based web app to review different stocks, followed by the creation and documentation of a blueprint design.
4. Define governance. Governance is the ability to control changes to services and the usage of those services; we must control how our data is accessed, deleted, added, and altered by means of processes, procedures, and technology. We used the AWS in-place security protocols to make sure that our data is persistent and safe [5].
5. Define which candidate data, services, and processes should live in the cloud and which should live on premises (if any). For this project, all of our data resides in cloud services [5].

The high-level architecture of the prediction model, shown in figure 3, is an integration of three AWS subsystems, each of them housing a subsection of the overall process.

Fig. 3. Cloud High-Level Architecture

EC2 hosts the instance, the application files, the network security setup, the load balancer, the security groups, the VPC, and the certificate manager. The EFS (Elastic File System) contains the CSV files created by the application and the data set, and S3 contains the HTML web application.

The more detailed process, described in figure 4, starts with the public internet and the main website www.moneyplease.trade. The request goes through CloudFront, which guarantees that the AWS edge services will be close to our primary users; at the moment we are only using the EU, USA, and Canada, as this is the starter level, but we can upgrade in the future to include more edge zones. The CloudFront distribution is connected to an S3 bucket that contains our web app HTML file. We chose S3 because, unlike our CSV files, the content of this bucket is static; this content is the one distributed to the edge locations.

CloudFront relies on the Certificate Manager to generate trust certificates and maintain a secure connection; the Certificate Manager creates and renews the certificates from then on, so we do not need to worry about it.

The communication between the S3 bucket, the load balancer, and the EC2 instances is hosted on our virtual private cloud and not over the public internet. S3 functions as our storage system; it works best for hosting our HTML file because that file is not primarily for reading/writing. Whenever we need to change this file, we download a copy, modify it, and upload it again as changes to our webpage occur. S3 then connects to the load balancer, which at this point manages only a single instance; however, it can be set up for scaling (managing additional instances in case demand increases).

EC2 is where the Python scripts reside. They are all managed by a Flask interface that waits for user input to start the process; once the primary process starts, it reads and writes the CSV files in the EFS (Elastic File System).
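A minimal sketch of such a Flask interface follows; the route name, the form parameters, and the predict_stock helper (which would wrap the trained model and append the query to the CSV archive) are illustrative assumptions, not the deployed code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Input 1: user-supplied stock name and online publication
    stock = request.form["stock"]
    publication = request.form["publication"]
    # Hypothetical helper wrapping the deep learning model: returns a
    # value between 0 (likely down) and 1 (likely up) and writes the
    # query to the CSV archive on the EFS
    score = predict_stock(stock, publication)
    return jsonify({"stock": stock, "publication": publication,
                    "score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```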
The EFS contains all the working files in CSV format: the data set used for training and testing the deep learning model, the results from the searches, the historical headlines, the financial information, and the complete result data set. These files will continue to grow each time a user inputs a new search; the EFS will grow with them and place them in the correct availability tier, as files that do not get much use will have a lower priority than, for example, the binary data set that the deep learning model works with during each run. Another advantage of EFS is that it can be deployed and mounted on all instances and work seamlessly with them if we need to scale.

Fig. 4. Detailed Cloud Process [1]

5.1 Integration in a Web-Based Application (Interface)

Flask is the micro web server used to run the application; we selected it due to its integration with Python and its straightforward interface with our already working code. The stack for the backend is shown in figure 5.

To have a correct division of responsibilities in our app server, we cannot use Flask as the web server; we needed to use HTML technology so that front-end development can happen independently from the back end. Flask has standard methods that allow communication with the front end; this way we ensure that our deployment is technology agnostic, meaning that we can add future mobile apps or different web applications without changing our main code.

Fig. 5. Backend Stack

For the content distribution, AWS CloudFront is connected to the origin of the data, and then the DNS registries are updated for the domain to work along with CloudFront, as shown in figure 6.

Fig. 6. Backend Distribution

6 Conclusion and Discussion

This work was successful in implementing a prediction model with 63% accuracy in predicting stock market price movement; we were also successful in creating a feasible blueprint for a deployment entirely on cloud services.

The most important part of the prediction model is the customized corpus made from a suitable binary dataset (only 0 and 1 values after normalizing). If we do not have a dataset containing the parameters we are evaluating, the results are not going to be representative or useful for the type of prediction we are after; a small, accurate test set brings better predictions than a big, messy one. This corpus will be extended with the new data searches provided by users in order to retrain our model.

The pay-as-you-go services of the cloud provider and the APIs are required: as the model grows, we will need to pay for the services to keep our implementation going. The model was successfully deployed on free-tier applications, and it is possible to scale all of the applications once the work extends in size and scope.

For future work and further research, we might want to explore the growth of the cloud deployment. At the moment we are using CSV files in an elastic file system; however, if the data keeps growing as intended, this might not be the best use of cloud computing resources, so for the next phase of the project we might want to add a relational database and convert the CSV files to a SQL format. Also, we are not saving any user's personal information or manipulating any other sensitive information; however, if the application grows, we might want to add authentication based on user emails or social media.
Another improvement opportunity worth mentioning is the web scraper development: we might want to add priorities for the websites crawled for the custom corpora. As it stands, the web scraper takes headline information from varied sources (blogs, smaller news outlets, opinion pieces), as it focuses on volume for the creation of the testing data set. If this first step is refined so that the corpora grow not only in size but also in headline quality, it should increase the system's reliability. Also, the deep learning model used was a popular teaching model; we might consider improving the model type to increase reliability. Although at this point the system uses only one computer, it is set up for future growth within the cloud space.

These considerations for future releases would be excellent additions and nice-to-haves; the objective of the research was met with a minimum viable product (MVP) that gives a prediction based on a selected publication plus a stock name and is deployed entirely in a cloud environment.

References

1. MicroServices Architectures on AWS. https://www.slideshare.net/AmazonWebServices/microservices-architectures-on-amazon-web-services
2. SerpWow - Google Search Results API. https://app.serpwow.com/playground
3. Abdelaziz, A., Elhoseny, M., Salama, A.S., Riad, A.M.: A machine learning model for improving healthcare services on cloud computing environment. Measurement: Journal of the International Measurement Confederation (2018). https://doi.org/10.1016/j.measurement.2018.01.022
4. Akshay, K., Shivananda, A.: Natural Language Processing Recipes: Unlocking Text Data with Machine Learning and Deep Learning Using Python. Apress (2019)
5. Banke, K., Slama, D., Krafzig, D.: Enterprise SOA: Service-Oriented Architecture Best Practices. Prentice Hall (2004)
6. Bird, S.: Natural Language Processing with Python. O'Reilly Media (2016)
7. Feyzkhanov, R.: Hands-On Serverless Deep Learning with TensorFlow and AWS Lambda. Packt Publishing (2019)
8. Joshi, K.: Stock trend prediction using news sentiment analysis. Tech. rep. (2013)
9. Lamons, M., Kumar, R., Nagaraja, A.: Python Deep Learning Projects. Packt Publishing (2018)
10. Linthicum, D.S.: Cloud Computing and SOA Convergence in Your Enterprise: A Step-by-Step Guide. Addison-Wesley Professional (2009)
11. Krishnaswamy, K.N., Sivakumar, A.I., Mathirajan, M.: Management Research Methodology: Integration of Principles, Methods and Techniques. Pearson India (2006)
12. Menshawy, A.: Deep Learning by Example. Packt Publishing (2018)
13. Naren, J.: News analytics and dual sentiment analysis for stock market prediction. IEEE International Conference on Big Data Analysis (ICBDA) (2017)
14. Piard, F.: The Lazy Fundamental Analyst: Applying Quantitative Techniques to Fundamental Stock Analysis. Harriman House Ltd (2014)
15. Yaojun, W., Wan, Y.: Using social media mining technology to assist in price prediction of stock market. IEEE International Conference on Big Data Analysis (ICBDA) (2016). https://doi.org/10.1109/icbda.2016.7509794