Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

fact-checking
Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
Publication year: 2019

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles. In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. This is motivated by the observation that hyper-partisanship is often linked to low trustworthiness, e.g., appealing to emotions rather than sticking to the facts, while center media tend to be generally more impartial and trustworthy. We further use several auxiliary tasks, modelling centrality, hyperpartisanship, as well as left-vs.-right bias on a coarse-grained scale. The evaluation results show sizable performance gains by the joint models over models that target the problems in isolation.
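The auxiliary tasks mentioned above can be derived directly from the 7-point ideology scale. A minimal sketch of that label mapping follows; the scale labels, their orientation, and the exact mapping are illustrative assumptions, not the paper's encoding:

```python
# Hypothetical 7-point scale, ordered left to right; "center" sits at index 3.
SEVEN_POINT = ["extreme-left", "left", "center-left", "center",
               "center-right", "right", "extreme-right"]

def auxiliary_labels(bias7: str) -> dict:
    """Derive coarse-grained auxiliary labels from a fine-grained bias label."""
    idx = SEVEN_POINT.index(bias7)  # 0..6
    return {
        # 3-way left/center/right bias
        "coarse3": "left" if idx < 3 else ("center" if idx == 3 else "right"),
        # centrality: is the outlet at the center of the scale?
        "is_center": idx == 3,
        # hyper-partisanship: is the outlet at either extreme?
        "is_hyperpartisan": bias7.startswith("extreme"),
    }
```

Training heads for these coarser labels alongside the fine-grained one is one way to realize the multi-task setup the abstract describes.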

Predicting factuality of reporting and bias of news media sources

fact-checking
Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov
Proceedings of the Conference on Empirical Methods in Natural Language Processing
Publication year: 2018

We present a study on predicting the factuality of reporting and bias of news media. While previous work has focused on studying the veracity of claims or documents, here we are interested in characterizing entire news media. These are under-studied but arguably important research problems, both in their own right and as a prior for fact-checking systems. We experiment with a large list of news websites and with a rich set of features derived from (i) a sample of articles from the target news medium, (ii) its Wikipedia page, (iii) its Twitter account, (iv) the structure of its URL, and (v) information about the Web traffic it attracts. The experimental results show sizable performance gains over the baselines, and confirm the importance of each feature type.
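One natural way to realize the five feature groups listed above is to compute each group independently per medium and concatenate them into a single feature dictionary. The sketch below assumes hypothetical extractor signatures and toy URL indicators; only the URL-structure group is filled in:

```python
def url_features(url: str) -> dict:
    """Toy indicators of URL structure; the specific features are assumptions."""
    return {
        "url_has_https": url.startswith("https://"),
        "url_has_blog": "blog" in url,
        "url_tld_org": url.rstrip("/").endswith(".org"),
    }

def medium_features(url: str, article_feats: dict = None, wiki_feats: dict = None,
                    twitter_feats: dict = None, traffic_feats: dict = None) -> dict:
    """Merge per-source feature groups into one feature vector for a news medium."""
    feats = {}
    for group in (article_feats, wiki_feats, twitter_feats,
                  traffic_feats, url_features(url)):
        feats.update(group or {})
    return feats
```

The other four extractors (articles, Wikipedia, Twitter, traffic) would plug in the same way, each returning its own dictionary of features.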

Fact checking in community forums

fact-checking
Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Mitra Mohtarami, Georgi Karadzhov, James Glass
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence
Publication year: 2018

Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a new problem, we create a specialized dataset for it. We further propose a novel multi-faceted model, which captures information from the answer content (what is said and how), from the author profile (who says it), from the rest of the community forum (where it is said), and from external authoritative sources of information (external support). Evaluation results show a MAP value of 86.54, which is 21 points absolute above the baseline.

We Built a Fake News & Click-bait Filter: What Happened Next Will Blow Your Mind!

fact-checking
Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, Ivan Koychev
Proceedings of the International Conference on Recent Advances in Natural Language Processing
Publication year: 2017

It is completely amazing! Fake news and click-baits have totally invaded the cyberspace. Let us face it: everybody hates them for three simple reasons. Reason #2 will absolutely amaze you. What these can achieve at the time of election will completely blow your mind! Now, we all agree, this cannot go on, you know, somebody has to stop it. So, we did this research on fake news/click-bait detection and trust us, it is totally great research, it really is! Make no mistake. This is the best research ever! Seriously, come have a look, we have it all: neural networks, attention mechanism, sentiment lexicons, author profiling, you name it. Lexical features, semantic features, we absolutely have it all. And we have totally tested it, trust us! We have results, and numbers, really big numbers. The best numbers ever! Oh, and analysis, absolutely top-notch analysis. Interested? Come read the shocking truth about fake news and click-bait in the Bulgarian cyberspace. You won’t believe what we have found!

The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

stylometry
Georgi Karadzhov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, Preslav Nakov
Best of the Labs Track at CLEF-2017
Publication year: 2017

Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text’s writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be highly effective and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
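The metric-then-adjust idea behind the first two steps can be sketched as follows. The two metrics, the corpus averages, and the transformation names below are toy assumptions chosen for illustration, not the paper's actual feature set:

```python
import re

# Assumed corpus-average values for two toy stylometric metrics.
CORPUS_AVG = {"avg_sent_len": 18.0, "type_token_ratio": 0.55}

def metrics(text: str) -> dict:
    """Compute average sentence length (in words) and type-token ratio."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sent_len": len(words) / max(len(sents), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

def planned_transformations(text: str) -> list:
    """Decide which transformations would push each metric toward the average."""
    m = metrics(text)
    plan = []
    if m["avg_sent_len"] > CORPUS_AVG["avg_sent_len"]:
        plan.append("split long sentences")
    elif m["avg_sent_len"] < CORPUS_AVG["avg_sent_len"]:
        plan.append("merge short sentences")
    if m["type_token_ratio"] > CORPUS_AVG["type_token_ratio"]:
        plan.append("repeat common words")  # lowers lexical richness
    return plan
```

The third step, random noise, would then be applied on top of whatever transformations the plan selects.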

Fully automated fact checking using external sources

fact-checking
Georgi Karadzhov, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Ivan Koychev
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP)
Publication year: 2017

Given the constantly growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumours from factually true claims. Here, we propose a general-purpose framework for fully-automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumour detection and (ii) fact-checking of the answers to a question in community question answering forums.
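The core scoring idea, matching a claim against Web evidence while weighting by source reliability, can be illustrated with a much simpler stand-in than the paper's LSTM-plus-kernels model. The bag-of-words cosine similarity and the reliability weighting below are assumptions made for the sketch:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def support_score(claim: str, snippets: list) -> float:
    """Score how well retrieved Web snippets support a claim.

    snippets: list of (text, source_reliability) pairs, reliability in [0, 1].
    Returns the best reliability-weighted similarity; higher means more support.
    """
    c = Counter(claim.lower().split())
    scores = [rel * cosine(c, Counter(text.lower().split()))
              for text, rel in snippets]
    return max(scores, default=0.0)
```

In the full system, the similarity component would be replaced by learned LSTM encodings and semantic kernels, but the reliability-weighted aggregation over retrieved evidence follows the same shape.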

SU@PAN’2016: Author Obfuscation - Notebook for PAN at CLEF 2016

stylometry
Tsvetomila Mihaylova, Georgi Karadzhov, Preslav Nakov, Yasen Kiprov, Georgi Georgiev, Ivan Koychev
Conference and Labs of the Evaluation Forum (CLEF)
Publication year: 2016

The anonymity of a text’s writer is an important topic for some domains, such as witness protection and anonymity programs. Stylometry can be used to reveal the true author of a text even if s/he wishes to hide his/her identity. In this paper, we present our approach for hiding an author’s identity by masking their style, which we developed for the Author Obfuscation task, part of the PAN-2016 competition. The approach consists of three main steps: the first one is an evaluation of different metrics in the text that can indicate authorship; the second one is the application of various transformations, so that those metrics of the target text are adjusted towards the average level, while still keeping the meaning and the soundness of the text; as a final step, we add random noise to the text. Our system achieved the best performance on the author style masking task.