<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multimodal Attention is all you need</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Saioni</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Giannone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almawave S.p.A., Via di Casal Boccone</institution>
          ,
          <addr-line>188-190 00137, Rome, IT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University G. Marconi</institution>
          ,
          <addr-line>Rome, IT</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we present a multimodal model for classifying fake news. The main peculiarity of the proposed model is the cross-attention mechanism. Cross-attention is an evolution of the attention mechanism that allows the model to examine intermodal relationships to better understand information from different modalities, enabling it to simultaneously focus on the relevant parts of the data extracted from each. We tested the model using the MULTI-Fake-DetectiVE data from EVALITA 2023. The presented model is particularly effective both in classifying fake news and in evaluating the intermodal relationship.</p>
      </abstract>
      <kwd-group>
        <kwd>Transformer</kwd>
        <kwd>fake news classification</kwd>
        <kwd>multimodal classification</kwd>
        <kwd>cross attention</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Internet has facilitated communication by enabling rapid, immersive information exchanges. However, it is also increasingly used to convey falsehoods, so today, more than ever, the rapid spread of fake news can have severe consequences, from inciting hatred to influencing financial markets or the progress of political elections to endangering world security. For this reason, mitigating the growing spread of fake news on the web has become a significant challenge.</p>
      <p>Fake news manifests itself on the internet through text, images, video, audio or, in general, a combination of these modalities, i.e. in a multimodal way. In this article, we consider the two components of news, text and image, as they are presented, for instance, on a social network. In this work we propose an approach to automatically and promptly identify fake news. We use the dataset of the MULTI-Fake-DetectiVE competition<sup>1</sup>, proposed at EVALITA 2023<sup>2</sup> [<xref ref-type="bibr" rid="ref1">1</xref>]. The competition aims to evaluate the truthfulness of news that combines text and images, an aim expressed through two tasks: the first carries out the identification of fake news (Multimodal Fake News Detection); the second seeks relationships between the two modalities, text and image, by observing the presence or absence of correlation or mutual implication (Cross-modal relations in Fake and Real News).</p>
      <p>Our approach proposes a Transformer-based model that focuses on relating the textual and visual embeddings of the input samples (i.e., the vector representations of the text and images it receives as input). The aim was to find a way to reconcile the two different representation embeddings, because they are learned separately from two different corpora, such as text and images, trying to capture their mutual relationships through some interaction between the respective semantic spaces.</p>
      <p>The remainder of the paper is structured as follows: section 2 presents a brief overview of related work, and section 3 describes the architecture of the proposed model. Section 4 gives an overview of our experiments. Sections 5 and 6 present the final results and our conclusions, respectively.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy. * Corresponding author: marco.saioni@gmail.com (M. Saioni); c.giannone@unimarconi.it (C. Giannone). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). <sup>1</sup>https://sites.google.com/unipi.it/multi-fake-detective <sup>2</sup>https://www.evalita.it</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Related Works</title>
      <p>The Italian MULTI-Fake-DetectiVE competition [<xref ref-type="bibr" rid="ref2">2</xref>] adds to the various datasets and challenges on multimodal fake news developed recently, for instance Factify [3] and Fakeddit [4]. The creation of these competitions shows the interest in this task. The first task of the Italian challenge saw three completely different systems placed on the podium. The first system, POLITO [5], was based on the FND-CLIP multimodal architecture [6], proposing some ad hoc extensions of CLIP [7], including sentiment-based text encoding, image transformation in the frequency domain, and data augmentation via back-translation. The Extremita system [8], second classified, exploited LLM capabilities, focusing only on the textual component of each news item: they fine-tuned the open-source LLM Camoscio [9] on the textual part of the dataset. The impressive results show how the textual component plays a primary role in identifying fake news. Despite the significant contribution of the textual component to the task, more and more multimodal approaches are taking hold. [10] proposed a CNN architecture combining texts and images to classify fake news. In that direction, approaches such as CB-FAKE [11] incorporate the encoder representations from the BERT model to extract the textual features and combine them with a model that extracts the image features. These features are combined to obtain a richer data representation that helps to determine whether the news is fake or real. Vision-language models in general have also gained a lot of interest in the last years, in the "large models era", with surprising results in many visual-language interaction tasks [12], [13].</p>
    </sec>
    <sec id="sec-2">
      <title>3. The proposed Model</title>
      <p>The objective was to "engage" specialist models for natural language processing and artificial vision, making them discover and learn bimodal features from text and images collaboratively and harmoniously, by applying the teachings of Vaswani et al. [14]: we decided to follow the path indicated by the famous paper "Attention is all you need", following up on the intuition that the attention mechanism could provide important added value to a multimodal model for the identification of fake news, becoming a Multimodal Attention (hence the title of this article), i.e. an attention mechanism applied between the textual and visual modes of news. In fact, while attention or self-attention (as described in the Vaswani et al. paper) takes as input the embeddings of a single modality and transforms them into more informative embeddings (contextualized embeddings), Multimodal Attention takes as input the embeddings of the two different modalities, combining them and then transforming them into a single embedding capable of capturing any existing relationships between the two input modes.</p>
      <sec id="sec-2-1">
        <title>3.1. Architecture</title>
        <p>Multimodal Attention is the heart that supports the proposed model, making it capable of exploring the hidden aspects of multimodal communication. As shown at a high level in Figure 1, the architecture of the proposed model consists of a hierarchical structure with three layers preceded by a pre-processing step. In order, there are: a pre-processing step, an input layer, a cross-modal layer and a fusion layer. It was decided to propose a network that models the consistent information between the two modalities, textual and visual, starting from state-of-the-art pre-trained neural networks. In particular, we use a pre-trained BERT [15] model to learn word embeddings from the textual component of news and a pre-trained ResNet [16] model to learn visual embeddings from the visual component. The two embeddings, belonging to two spaces with different dimensions, are first projected into a uniform, reduced-dimensional space, then related to each other with the strategy of mutual cross-attention to obtain two embeddings that are subsequently concatenated to provide the input of the last dense classification layer.</p>
      </sec>
      <sec id="sec-2-1b">
        <title>3.1.1. Pre-processing step</title>
        <p>As a first step it is necessary to process the data made available by the organizers of the MULTI-Fake-DetectiVE competition to produce inputs that are compatible and compliant with those expected by the pre-trained models. The choices made for this preparation, i.e. for the pre-processing of the dataset and the data "personalization" strategy, are described in the following three points:</p>
        <list list-type="bullet">
          <list-item><p>resolution/explosion of the 1 : n relationships between text and images into n separate 1 : 1 relationships;</p></list-item>
          <list-item><p>data augmentation with the creation of an additional image to support the original one already present in each example;</p></list-item>
          <list-item><p>management of the textual component, truncated by BERT, or rather by the relevant tokenizer, to a fixed maximum token length.</p></list-item>
        </list>
        <p>Therefore, following this processing, for each single sample we move from the original pairs ⟨t, v+⟩, where v+ indicates the 1 : n ratio between text in natural language and images in JPEG format, to triples appropriately translated into numbers, ⟨t, v, vaug⟩, where t indicates, for each sample, a first-order tensor with 128 values (tokens), while v and vaug denote third-order tensors with (224 × 224 × 3) values (pixels). In fact, the first-order tensor is the representation of the text in numerical form according to the default strategy of the BERT tokenizer, while the third-order tensors are the representation of the images in numerical form according to the RGB coding for ResNet.</p>
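        <p>As an illustration only, the three pre-processing choices can be sketched as follows; the sample dictionaries, field names and the tokenize/augment stand-ins are hypothetical and do not reflect the competition's actual data schema.</p>
        <preformat>
```python
# Sketch of the pre-processing step: explode each 1:n text-image pair into n
# separate 1:1 pairs, attach an augmented copy of every image, and truncate
# the text to a fixed token budget. The sample dictionaries, field names and
# the tokenize/augment stand-ins are hypothetical, for illustration only.
MAX_TOKENS = 128  # fixed maximum token length enforced by the tokenizer

def tokenize(text):
    # Stand-in for the BERT tokenizer: naive whitespace split.
    return text.split()

def augment(image):
    # Stand-in for a random transformation (flip, crop, ...) of the image.
    return ("augmented", image)

def preprocess(samples):
    exploded = []
    for sample in samples:
        tokens = tokenize(sample["text"])[:MAX_TOKENS]   # truncation
        for image in sample["images"]:                   # 1:n -> n pairs 1:1
            exploded.append({
                "tokens": tokens,
                "image": image,
                "image_aug": augment(image),             # data augmentation
                "label": sample["label"],
            })
    return exploded

news = [{"text": "breaking news example", "images": ["img0.jpg", "img1.jpg"],
         "label": "fake"}]
pairs = preprocess(news)  # two 1:1 samples, each with an augmented image
```
        </preformat>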
      </sec>
      <sec id="sec-2-2">
        <title>3.1.2. Input layer</title>
        <p>This layer receives as input the previously processed dataset, i.e. the text and the images represented in numerical form, passing it to the pre-trained BERT and ResNet models to obtain the respective embeddings, subsequently projected into a space with small and common dimensions to make them comparable and to allow them to collaborate with each other in the subsequent cross-modal layers.</p>
        <p>BERT Encoder. Each sample, pre-processed and represented in numerical form by the tokenizer, is passed as input to the pre-trained BERT model, which returns different output tensors for each of them. For the purposes of the classification task object of this study, we consider the pooled_output, a compact representation of the whole token sequence given as input to the BERT model, obtained via the special token [CLS]. It is therefore a summary of the information extracted from the entire input, whose dimensions depend on the number of hidden units of BERT. Since each text supplied as input to BERT will correspond to a tensor with 768 real values, using vector notation we have that et = BERT(ttrunc)[pooled_output], where et ∈ R^(ht) is the word embedding vector, ttrunc ∈ R^(L) is the token input vector and ht = 768 is the BERT hidden size. The equation refers to a single sample but can be extended to the entire batch of B examples processed by BERT. Indicating this batch with Ttrunc ∈ R^(B×L), we will have Et = BERT(Ttrunc)[pooled_output], where Et ∈ R^(B×ht) is the text embedding matrix learned by the BERT model.</p>
        <p>ResNet Encoder. The two images of each sample, previously represented in numerical form, are passed as input to the pre-trained ResNet model, which returns a visual embedding of size hv for each example, representing in a compact and semantic form the features extracted through convolutions and pooling within the ResNet network. In fact, to obtain visual embeddings from a pre-trained neural network like ResNet, we usually take the output of the penultimate layer, i.e. global pooling. In the proposed model, ResNet50V2 was chosen, which in global pooling reduces the spatial dimensions of the output tensor to 2048 values; therefore each input image will correspond in output to a vector with hv = 2048 values, which represents the visual embeddings extracted from the network for that specific image. Formally, ev = ResNet(v)[global_pooling], where ev ∈ R^(hv) is the visual embedding vector and v ∈ R^(224×224×3) the input third-order tensor. The equation refers to a single sample but can be extended to the entire batch of B examples; indicating the batch with V ∈ R^(B×224×224×3), we will have Ev = ResNet(V)[global_pooling], where Ev ∈ R^(B×hv) is the visual embedding matrix learned by the ResNet model. A similar relation holds at batch level for the second, augmented image: Evaug = ResNet(Vaug)[global_pooling], where Evaug ∈ R^(B×hv). After obtaining the embeddings for each of the two images, they are concatenated together to obtain a single output tensor of size 2 × hv = 4096: Ev ⊕ Evaug = Econcat(v,vaug) ∈ R^(B×2hv). From this moment, and for simplicity of notation, Ev will refer to Econcat(v,vaug), knowing that this embedding is actually the concatenation of the embeddings of an image and of the one obtained through random transformations.</p>
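        <p>A shape-level sketch of the input layer may help fix the dimensions involved; here random projections stand in for the actual pre-trained BERT (pooled_output) and ResNet50V2 (global pooling) calls, so only the tensor shapes, not the semantics, match the description.</p>
        <preformat>
```python
import numpy as np

# Shape-level sketch of the input layer. Random projections stand in for the
# real pre-trained calls (BERT pooled_output, ResNet50V2 global pooling), so
# only the tensor dimensions match the text: h_t = 768, h_v = 2048, and the
# two image embeddings are concatenated into 2 * h_v = 4096 values.
rng = np.random.default_rng(0)
B, L = 8, 128                           # batch size, truncated token length
H, W, C = 224, 224, 3                   # image height, width, RGB channels
h_t, h_v = 768, 2048

def bert_pooled(T):
    # Stand-in for BERT(T)[pooled_output]: maps (B, L) to (B, h_t).
    return T @ rng.standard_normal((L, h_t))

def resnet_pooled(V):
    # Stand-in for ResNet(V)[global_pooling]: global average pooling over the
    # spatial dimensions, then a projection to h_v values per image.
    pooled = V.mean(axis=(1, 2))        # (B, H, W, C) -> (B, C)
    return pooled @ rng.standard_normal((C, h_v))

T = rng.standard_normal((B, L))         # tokenized, truncated text batch
V = rng.standard_normal((B, H, W, C))   # original images
V_aug = rng.standard_normal((B, H, W, C))  # augmented images

E_t = bert_pooled(T)                                                    # (B, h_t)
E_v = np.concatenate([resnet_pooled(V), resnet_pooled(V_aug)], axis=1)  # (B, 2*h_v)
```
        </preformat>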
      </sec>
      <sec id="sec-2-4">
        <title>Projection</title>
        <p>The pre-trained models provide embeddings with different sizes. It is, therefore, necessary to transform them into a space with the same dimensionality to obtain comparable representations. The projection function carries out this task, introduced both to reduce the dimensions of the two embeddings and to reduce the computational load, improving the performance of the multimodal model and allowing it to learn more complex patterns. The projection of embeddings is particularly useful when you want to compare the semantic representations of two objects, ensuring that both are aligned in the same reduced semantic space, making them comparable in terms of similarity or distance and facilitating the comparison and analysis of relationships.</p>
        <p>For this model, we selected p = 128 as the projection size, reducing the sizes of both embeddings of the input components.</p>
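        <p>The projection step can be sketched as a pair of linear maps into the common 128-dimensional space; the weight matrices below are random stand-ins for the learned projection parameters.</p>
        <preformat>
```python
import numpy as np

# Sketch of the projection step: linear maps (random stand-ins for the
# learned projection weights) bring the 768-dimensional text embeddings and
# the 4096-dimensional concatenated visual embeddings into the same
# p = 128-dimensional space, where they can be compared and cross-attended.
rng = np.random.default_rng(1)
B, p = 8, 128

E_t = rng.standard_normal((B, 768))     # BERT pooled_output embeddings
E_v = rng.standard_normal((B, 4096))    # concatenated ResNet embeddings

W_t_proj = rng.standard_normal((768, p))
W_v_proj = rng.standard_normal((4096, p))

E_t_projected = E_t @ W_t_proj          # (B, p)
E_v_projected = E_v @ W_v_proj          # (B, p)
```
        </preformat>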
      </sec>
      <sec id="sec-2-5">
        <title>3.1.3. Cross-modal layer</title>
        <p>This layer is the heart of the model, and it is developed taking inspiration from the behavior of human beings when faced with news made up of text and images. Intuitively, we try to read in the image what is written in the text and to represent in the text what is shown by the image. It can be said that cross-modal attention relations exist between image and text. This is why, to simulate the described human process in a neural model, we relied on cross attention between the two modalities, a variant of the standard multi-head attention component capable of capturing global dependencies between text and images.</p>
        <p>In the proposed model, two blocks of crossed attention are activated, in the text-image and image-text perspectives. In the first case, we consider the textual embeddings as queries for the multi-head attention block, and the visual ones as keys and values. This should allow the characteristics of the text to guide the model to focus on regions of the image semantically coherent with the text: in fact, if the textual embeddings are considered as queries and the visual ones as keys and values, then the attention will be applied to the images based on their compatibility with the text, which is therefore considered the context on which to evaluate the relevance of an image. In this way, attention is focused on the images with respect to how relevant they are to the text, i.e. we try to give importance to the visual features in relation to the context provided by the text. Conversely, in the second case the visual embeddings are the queries, while the keys and values are the textual embeddings, and this should allow the visual features to make the model pay attention to those parts of the text consistent with the images. That is, the same as in the previous case applies, but with the roles of text and image reversed.</p>
        <p>Wanting to formalize the bidirectional cross-attention between the embeddings of the text Et-projected and those of the images Ev-projected, we can write Ecross-tv = Attention(Et-projected, Ev-projected) and Ecross-vt = Attention(Ev-projected, Et-projected), where Ecross-tv represents the attention embeddings of image information with respect to the text and Ecross-vt represents the attention embeddings of text information with respect to the images. In this layer the dimensions of the embeddings are not modified in any way, therefore we remain in R^(B×128).</p>
      </sec>
      <sec id="sec-2-6">
        <title>3.1.4. Fusion layer</title>
        <p>Once the embeddings (textual and visual) learned unimodally in the network and the cross-attention embeddings learned intermodally are available, it is necessary to implement a fusion strategy that can best balance their respective contributions in the multimodal classification task. Although the architecture of the model would seem to suggest a late fusion strategy, it should be observed that the cross-attention of the cross-modal layer is already a fusion strategy adopted in the network during learning, before the one explicitly implemented in the subsequent fusion layer: this allowed the model to learn shared features during training while maintaining suitable flexibility between the multimodal components, i.e. without excessively influencing the learning process of each modality separately. The concatenation preserves each modality's distinctive features, allowing the model to exploit them during learning, unlike the sum, which could lead to a loss of information due to values that can cancel each other out, taking away the model's descriptive capacity. For these reasons, the fusion takes into consideration all four embeddings learned by the model, Et-projected, Ev-projected, Ecross-tv and Ecross-vt, where the first two provide distinctive unimodal features, while the other two provide correlated and mutually "attentioned" cross-modal features. The hybrid fusion strategy then completes the recipe, providing that pinch of flexibility necessary to give balance to the multimodal classifier. Formally we have the following equation, which aims to make the most both of the information provided by the individual modalities as such and of that provided jointly: Eglobal = (Et-projected ⊕ Ev-projected) ⊕ Ecross-tv ⊕ Ecross-vt, where Eglobal ∈ R^(B×4p), B is the size of the batch of examples given as input to the network and p = 128.</p>
        <p>The final output of the multimodal model is obtained by applying a densely connected layer with u = 4 units and a softmax activation function that returns the probabilities of the four classes. Formally, Y = EglobalW + b and O = softmax(Y), with W ∈ R^(4p×u), b ∈ R^(1×u) and therefore O ∈ R^(B×u), a matrix in which each row is a vector with u = 4 values representing the conditional (estimated) probability of each class for the relevant sample.</p>
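        <p>The cross-modal layer and the fusion layer can be sketched together as follows; single-head scaled dot-product attention with random stand-in weights replaces the learned multi-head blocks, and, since each projected embedding is treated as a length-one sequence, the attention weights are trivially one and the interaction happens through the projections (with token or region sequences the same code yields non-uniform weights).</p>
        <preformat>
```python
import numpy as np

# Sketch of the cross-modal layer and the fusion layer. Single-head scaled
# dot-product attention with random stand-in weights replaces the learned
# multi-head blocks; the real model also uses separate blocks per direction.
rng = np.random.default_rng(2)
B, p, u = 8, 128, 4                          # batch size, projection size, classes

E_t_projected = rng.standard_normal((B, p))  # projected text embeddings
E_v_projected = rng.standard_normal((B, p))  # projected visual embeddings

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V, Wq, Wk, Wv):
    # Scaled dot-product attention over (B, seq, p) tensors. Here each
    # embedding is a length-1 sequence, so the single attention weight is 1
    # and the interaction happens through the stand-in projections.
    q, k, v = Q @ Wq, K @ Wk, V @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(p)
    return softmax(scores, axis=-1) @ v

Wq, Wk, Wv = [rng.standard_normal((p, p)) for _ in range(3)]
Et, Ev = E_t_projected[:, None, :], E_v_projected[:, None, :]
E_cross_tv = cross_attention(Et, Ev, Ev, Wq, Wk, Wv)[:, 0, :]  # text queries images
E_cross_vt = cross_attention(Ev, Et, Et, Wq, Wk, Wv)[:, 0, :]  # images query text

# Fusion: concatenate the two unimodal and the two cross-modal embeddings,
# then apply a dense softmax head over the u = 4 classes.
E_global = np.concatenate(
    [E_t_projected, E_v_projected, E_cross_tv, E_cross_vt], axis=1)  # (B, 4p)
W = rng.standard_normal((4 * p, u))
b = np.zeros((1, u))
O = softmax(E_global @ W + b)                # (B, u) class probabilities
```
        </preformat>
        <p>The concatenation at the end keeps the two unimodal and the two cross-modal embeddings side by side, as in the hybrid fusion strategy discussed above.</p>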
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental Setup</title>
      <sec id="sec-3-1">
        <title>4.1. Split dataset into training and validation</title>
        <p>To guarantee that the proportions relating to the classes and to the sources are maintained uniformly in the two sets, the 1034 samples of the dataset are randomly divided following an 80%-20% proportion between training and validation, in a stratified way both with respect to the labels, as also happens in the baseline model of the MULTI-Fake-DetectiVE competition, and with respect to the type of source of the news.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Training and validation</title>
        <p>For our experiment, the model was trained for up to 80 epochs with early stopping, using the focal loss [17] function. It is a dynamically scaled cross-entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically reduce the contribution of easy examples during training and quickly focus the model on difficult examples. For the optimizer we chose AdamW, given that the models used to analyze text and images were originally pre-trained with this algorithm, which applies weight regularization directly to the model parameters during weight updating, helping to improve the stability and generalization of the model.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <sec id="sec-4-1">
        <title>5.1. Official baseline models</title>
        <p>The notebook provided by the MULTI-Fake-DetectiVE organizers contains an evaluation strategy on the official dataset, developed by comparing the performance of the unimodal pre-trained models with a multimodal model. The F1-weighted score values of the three baseline models are shown in Table 1. The textual model is the most effective among the three baseline models in classifying fake news, and the visual one has lower performance than the textual model. The multimodal model obtained an F1-weighted score lower than that obtained by the unimodal textual model, but higher than the score of the unimodal visual model, indicating that the integration of visual and textual information led to an improvement in performance compared to the visual model, but not enough to outperform the text model. This suggests that there may be potential for additional optimizations or modality integration strategies to achieve better performance from the multimodal model.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Proposed model</title>
        <p>To evaluate the proposed model on the Multimodal Fake News Detection task, we chose to follow the approach used by the organizers in the notebook of the baseline models, i.e. we performed an ablation study on the proposed model: first a unimodal textual model was trained, then a unimodal visual one, then a multimodal one without cross-bi-attention and, finally, a multimodal one with cross-bi-attention. Table 2 reports the respective accuracy and F1-weighted values for the four configurations (Proposed Text-only, Proposed Image-only, Proposed Multi-modal ⊕, Proposed Multi-modal ⊗).</p>
        <p>The mechanism of crossed attention, seen from the two text-image and image-text perspectives and enriched by the skip connection provided by the simple concatenation of the two different embeddings, gives the model that extra edge that allows it to dig into the background of the relationships between textual and visual features. By combining bilateral cross-attention and residual connection, tasks of the cross-modal layer and of the fusion layer respectively, significant semantic and semiotic interrelations are obtained in favor of the performance of the classifier, which becomes more precise and sensitive.</p>
        <p>In fact, if on the one hand the cross-modal layer allows the model to learn multimodal semantics between text and images, on the other the fusion layer enhances it by improving its stability, capacity and performance thanks to the skip connection, which provides the gradient with a useful direct path to flow through during backpropagation without tending to zero, bringing significant additional information into each layer of the network.</p>
        <sec id="sec-4-2-1">
          <title>Cross-modal relations task</title>
          <p>All the results described up to this point are obtained by measuring the model on the Multimodal Fake News Detection task of the competition covered by this work. As mentioned, the organizers also proposed a second task, Cross-modal relations in Fake and Real News, aimed at verifying the robustness of the model to a change of task without any human intervention. Table 4 shows the accuracy and F1-weighted values for the proposed model called to express itself on the Cross-modal relations task, together with the baseline and winner models of the MULTI-Fake-DetectiVE competition (Proposed Multi-modal, PoliTo - FND-CLIP-ITA, Baseline Multi-modal). The results show a clear improvement in performance in solving the task, even compared to the winning model of the competition. This is a very important result, because it demonstrates the network's ability to adapt to changes in tasks and changes in training data, which is not at all a given.</p>
          <p>While the data preparation strategy in the pre-processing step provides the model with more information to learn from, the real strength can be identified in the cross-modal layer.</p>
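        <p>The focal loss used for training can be sketched in a few lines; the focusing parameter (usually called gamma) and the class probabilities below are illustrative, and setting gamma to zero recovers ordinary cross-entropy, which makes the scaling effect easy to see.</p>
        <preformat>
```python
import numpy as np

# Sketch of the focal loss: cross-entropy scaled by (1 - p_correct)**gamma,
# so confidently classified (easy) examples contribute less and training
# focuses on the hard ones. The gamma value below is illustrative.
def focal_loss(probs, labels, gamma=2.0):
    # probs: (B, C) predicted class probabilities; labels: (B,) class ids.
    p_correct = probs[np.arange(len(labels)), labels]
    return np.mean(-((1.0 - p_correct) ** gamma) * np.log(p_correct))

probs = np.array([[0.9, 0.05, 0.03, 0.02],   # easy example, p_correct = 0.9
                  [0.3, 0.4, 0.2, 0.1]])     # hard example, p_correct = 0.4
labels = np.array([0, 1])

loss_focal = focal_loss(probs, labels)
loss_ce = focal_loss(probs, labels, gamma=0.0)  # reduces to cross-entropy
```
        </preformat>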
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>The Internet has facilitated the multimodality of communication by enabling rapid information exchanges that are increasingly immersive, but also increasingly used to convey falsehoods. In this study, a multimodal model for identifying fake news was proposed, based on the mechanism of cross attention between the representations of the features learned by the network on the textual component of the news and those learned on the visual component associated with it.</p>
      <p>Many multimodal models are based on the concatenation of features learned from distinct modalities which, despite achieving good performance, limits the potential of the interaction between the features themselves.</p>
      <p>From the experiments carried out, the use of cross-attention demonstrated significant improvements in the performance of the model proposed in this work compared to the first two models classified in the MULTI-Fake-DetectiVE competition, for both tasks requested by the organizers, even though the dataset available for training is very small and unbalanced, both with respect to the categories to be predicted and with respect to the source of the news. Despite the intrinsic complexity of the two tasks, the cross-modal layer of the proposed model manages to express the representations learned from the text and images of a news story in a harmonious, collaborative and synergistic way, balancing their contributions and preventing one from taking over the other.</p>
      <p>Future developments concern the components of the model, which could use a Visual Transformer [18] instead of the ResNet, in order to relate textual and visual embeddings that are both generated by training a Transformer network.</p>
      <p>[3] S. Suryavardan, S. Mishra, P. Patwa, M. Chakraborty, A. Rani, A. N. Reganti, A. Chadha, A. Das, A. P. Sheth, M. Chinnakotla, A. Ekbal, S. Kumar, Factify 2: A multimodal fake news and satire news dataset, in: A. Das, A. P. Sheth, A. Ekbal (Eds.), DE-FACTIFY@AAAI, volume 3555 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: http://dblp.uni-trier.de/db/conf/defactify/defactify2023.html#SuryavardanMPCR23.</p>
      <p>[4] K. Nakamura, S. Levy, W. Y. Wang, Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 6149-6157. URL: https://aclanthology.org/2020.lrec-1.755.</p>
      <p>[5] L. D'Amico, D. Napolitano, L. Vaiani, L. Cagliero, Polito at multi-fake-detective: Improving FND-CLIP for multimodal italian fake news detection, in: M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi (Eds.), Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma, Italy, September 7th-8th, 2023, volume 3473 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3473/paper35.pdf.</p>
      <p>[6] Y. Zhou, Q. Ying, Z. Qian, S. Li, X. Zhang, Multimodal fake news detection via clip-guided learning, 2022. URL: https://arxiv.org/abs/2205.14304. arXiv:2205.14304.</p>
      <p>[7] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, 2021. arXiv:2103.00020.</p>
      <p>[8] C. D. Hromei, D. Croce, V. Basile, R. Basili, Extremita at EVALITA 2023: Multi-task sustainable scaling to large language models at its extreme, in: M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi (Eds.), Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma, Italy, September 7th-8th, 2023, volume 3473 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3473/paper13.pdf.</p>
      <p>[9] A. Santilli, E. Rodolà, Camoscio: an italian instruction-tuned llama, 2023. URL: https://arxiv.org/abs/2307.16456. arXiv:2307.16456.</p>
      <p>[10] I. Segura-Bedmar, S. Alonso-Bartolome, Multimodal fake news detection, Information 13 (2022). URL: https://www.mdpi.com/2078-2489/13/6/284.</p>
      <p>[11] B. Palani, S. Elango, V. K, Cb-fake: A multimodal deep learning framework for automatic fake news detection using capsule neural network and bert, Multimedia Tools and Applications 81 (2022). doi:10.1007/s11042-021-11782-3.</p>
      <p>[12] W. Wang, Q. Lv, W. Yu, W. Hong, J. Qi, Y. Wang, J. Ji, Z. Yang, L. Zhao, X. Song, J. Xu, B. Xu, J. Li, Y. Dong, M. Ding, J. Tang, Cogvlm: Visual expert for pretrained language models, 2024. URL: https://arxiv.org/abs/2311.03079. arXiv:2311.03079.</p>
      <p>[13] H. Liu, C. Li, Y. Li, Y. J. Lee, Improved baselines with visual instruction tuning, 2024. URL: https://arxiv.org/abs/2310.03744. arXiv:2310.03744.</p>
      <p>[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.</p>
      <p>[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.</p>
      <p>[16] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. doi:10.1109/CVPR.2016.90.</p>
      <p>[17] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, 2018. arXiv:1708.02002.</p>
      <p>[18] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, 2021. arXiv:2010.11929.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , G. Venturi (Eds.),
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , Parma, Italy, September 7th-8th,
          <year>2023</year>
          , volume
          <volume>3473</volume>
          of
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3473.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dell'Oglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabbatini</surname>
          </string-name>
          ,
          <article-title>Multi-fake-detective at evalita 2023: Overview of the multimodal fake news detection and verification task</article-title>
          , CEUR Workshop Proceedings 3473 (2023). URL: https://ceur-ws.org/Vol-3473/paper32.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>