What makes you change your mind? An empirical investigation in online group decision-making conversations

dialogue systems
Karadzhov, Georgi and Stafford, Tom and Vlachos, Andreas
arXiv preprint arXiv:2207.12035, 2022
Publication year: 2022

Leaf: Multiple-choice question generation

Inproceedings
Vachev, Kristiyan and Hardalov, Momchil and Karadzhov, Georgi and Georgiev, Georgi and Koychev, Ivan and Nakov, Preslav
In European Conference on Information Retrieval, 2022
Publication year: 2022

The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news

Inproceedings
Nakov, Preslav and Da San Martino, Giovanni and Elsayed, Tamer and Barrón-Cedeño, Alberto and Míguez, Rubén and Shaar, Shaden and Alam, Firoj and Haouari, Fatima and Hasanain, Maram and Babulkov, Nikolay and others
In European Conference on Information Retrieval, 2021
Publication year: 2021

Generating answer candidates for quizzes and answer-aware question generators

Article
Vachev, Kristiyan and Hardalov, Momchil and Karadzhov, Georgi and Georgiev, Georgi and Koychev, Ivan and Nakov, Preslav
arXiv preprint arXiv:2108.12898, 2021
Publication year: 2021

DeliData: A dataset for deliberation in multi-party problem solving

dialogue systems
Karadzhov, Georgi and Stafford, Tom and Vlachos, Andreas
arXiv preprint arXiv:2108.05271, 2021
Publication year: 2021

What was written vs. who read it: news media profiling using text analysis and social media context

Article
Baly, Ramy and Karadzhov, Georgi and An, Jisun and Kwak, Haewoon and Dinkov, Yoan and Ali, Ahmed and Glass, James and Nakov, Preslav
arXiv preprint arXiv:2005.04518, 2020
Publication year: 2020

SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)

Article
Zampieri, Marcos and Nakov, Preslav and Rosenthal, Sara and Atanasova, Pepa and Karadzhov, Georgi and Mubarak, Hamdy and Derczynski, Leon and Pitenis, Zeses and Çöltekin, Çağrı
arXiv preprint arXiv:2006.07235, 2020
Publication year: 2020

A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

Article
Rosenthal, Sara and Atanasova, Pepa and Karadzhov, Georgi and Zampieri, Marcos and Nakov, Preslav
Publication year: 2020

Tanbih: Get to know what you are reading

Article
Zhang, Yifan and Da San Martino, Giovanni and Barrón-Cedeño, Alberto and Romeo, Salvatore and An, Jisun and Kwak, Haewoon and Staykovski, Todor and Jaradat, Israa and Karadzhov, Georgi and Baly, Ramy and others
Publication year: 2019

Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

fact-checking
Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov
2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Publication year: 2019

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles. In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. This is motivated by the observation that hyper-partisanship is often linked to low trustworthiness, e.g., appealing to emotions rather than sticking to the facts, while center media tend to be generally more impartial and trustworthy. We further use several auxiliary tasks, modelling centrality, hyper-partisanship, as well as left-vs.-right bias on a coarse-grained scale. The evaluation results show sizable performance gains by the joint models over models that target the problems in isolation.

Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search

dialogue systems
Pepa Atanasova, Georgi Karadzhov, Yasen Kiprov, Preslav Nakov, Fabrizio Sebastiani
Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval
Publication year: 2019

In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where the user needs could be better served with the system asking follow-up questions in order to better understand the user's intent. While typically a user would expect a single response at any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as this practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of having a correct answer included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.

BibTex:

@inproceedings{Atanasova:2019:EVM:3331184.3331308,
 author = {Atanasova, Pepa and Karadzhov, Georgi and Kiprov, Yasen and Nakov, Preslav and Sebastiani, Fabrizio},
 title = {Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {997--1000},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331308},
 doi = {10.1145/3331184.3331308},
 acmid = {3331308},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {chatbots, evaluation measures, mobile search},
} 

Predicting factuality of reporting and bias of news media sources

fact-checking
Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov
Proceedings of the Conference on Empirical Methods in Natural Language Processing
Publication year: 2018

We present a study on predicting the factuality of reporting and bias of news media. While previous work has focused on studying the veracity of claims or documents, here we are interested in characterizing entire news media. These are under-studied but arguably important research problems, both in their own right and as a prior for fact-checking systems. We experiment with a large list of news websites and with a rich set of features derived from (i) a sample of articles from the target news medium, (ii) its Wikipedia page, (iii) its Twitter account, (iv) the structure of its URL, and (v) information about the Web traffic it attracts. The experimental results show sizable performance gains over the baselines, and confirm the importance of each feature type.

Fact checking in community forums

fact-checking
Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Mitra Mohtarami, Georgi Karadzhov, James Glass
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence
Publication year: 2018

Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a new problem, we create a specialized dataset for it. We further propose a novel multi-faceted model, which captures information from the answer content (what is said and how), from the author profile (who says it), from the rest of the community forum (where it is said), and from external authoritative sources of information (external support). Evaluation results show a MAP value of 86.54, which is 21 points absolute above the baseline.

We Built a Fake News & Click-bait Filter: What Happened Next Will Blow Your Mind!

fact-checking
Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, Ivan Koychev
Proceedings of the International Conference on Recent Advances in Natural Language Processing
Publication year: 2017

It is completely amazing! Fake news and click-baits have totally invaded the cyberspace. Let us face it: everybody hates them for three simple reasons. Reason #2 will absolutely amaze you. What these can achieve at the time of election will completely blow your mind! Now, we all agree, this cannot go on, you know, somebody has to stop it. So, we did this research on fake news/click-bait detection and trust us, it is totally great research, it really is! Make no mistake. This is the best research ever! Seriously, come have a look, we have it all: neural networks, attention mechanism, sentiment lexicons, author profiling, you name it. Lexical features, semantic features, we absolutely have it all. And we have totally tested it, trust us! We have results, and numbers, really big numbers. The best numbers ever! Oh, and analysis, absolutely top-notch analysis. Interested? Come read the shocking truth about fake news and click-bait in the Bulgarian cyberspace. You won’t believe what we have found!

The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

stylometry
Georgi Karadzhov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, Preslav Nakov
Best of the Labs Track at CLEF-2017
Publication year: 2017

Best-of-labs track paper

Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text's writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very effective and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
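The three-step pipeline described in the abstract (estimate stylometric metrics, transform the text toward corpus averages, add random noise) can be illustrated with a toy Python sketch. The metric set, the comma-splitting transformation, and the sentence-swap noise below are simplified placeholders for illustration only, not the metrics or transformations used in the paper.

```python
# Toy sketch of the three-step style-masking pipeline:
# (1) metric estimation -> (2) averaging transformations -> (3) random noise.
import random
import statistics


def stylometric_metrics(text: str) -> dict:
    """Compute a few simple authorship-indicative metrics."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": statistics.mean(len(w) for w in words) if words else 0.0,
    }


def mask_style(text: str, corpus_avg: dict, seed: int = 0) -> str:
    """Push the text's metrics toward corpus averages, then add noise."""
    rng = random.Random(seed)
    metrics = stylometric_metrics(text)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Step 2 (toy transformation): if sentences run longer than the corpus
    # average, split them at commas to shorten them.
    if metrics["avg_sentence_len"] > corpus_avg["avg_sentence_len"]:
        sentences = [part.strip() for s in sentences for part in s.split(",")]
    # Step 3: random noise -- swap one adjacent pair of sentences.
    if len(sentences) > 1:
        i = rng.randrange(len(sentences) - 1)
        sentences[i], sentences[i + 1] = sentences[i + 1], sentences[i]
    return ". ".join(sentences) + "."
```

In a real system the transformations would be meaning-preserving rewrites (synonym substitution, sentence restructuring) rather than the crude splits shown here; the sketch only conveys the control flow of the three steps.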

Fully automated fact checking using external sources

fact-checking
Georgi Karadzhov, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Ivan Koychev
Recent Advances in Natural Language Processing - RANLP
Publication year: 2017

Given the constantly growing proliferation of false claims online in recent years, there has been also a growing research interest in automatically distinguishing false rumours from factually true claims. Here, we propose a general-purpose framework for fully-automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumour detection and (ii) fact-checking of the answers to a question in community question answering forums.
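As a rough illustration of the evidence-aggregation idea in the abstract above (a claim compared against text fragments retrieved from the Web, with source reliability taken into account), here is a toy Python sketch. The bag-of-words cosine similarity and the hand-assigned reliability weights are simplified placeholders for the paper's LSTM-based model, not its actual method.

```python
# Toy sketch: score a claim against Web snippets, weighting each snippet
# by a (hypothetical) reliability score for its source.
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def support_score(claim: str, snippets: list[tuple[str, float]]) -> float:
    """Reliability-weighted average similarity of the claim to the snippets.

    `snippets` is a list of (text, source_reliability) pairs, with
    reliability in [0, 1].
    """
    if not snippets:
        return 0.0
    weighted = sum(r * cosine(claim, text) for text, r in snippets)
    total = sum(r for _, r in snippets)
    return weighted / total if total else 0.0
```

A claim echoed verbatim by a reliable source scores near 1.0, while unrelated snippets pull the score toward 0; the real framework learns this comparison with LSTM text encodings rather than word overlap.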

SU@PAN’2016: Author Obfuscation—Notebook for PAN at CLEF 2016

stylometry
Tsvetomila Mihaylova, Georgi Karadzhov, Preslav Nakov, Yasen Kiprov, Georgi Georgiev, Ivan Koychev
In Conference and Labs of the Evaluation Forum, CLEF, 2016
Publication year: 2016

The anonymity of a text’s writer is an important topic for some domains, such as witness protection and anonymity programs. Stylometry can be used to reveal the true author of a text even if s/he wishes to hide his/her identity. In this paper, we present our approach for hiding an author’s identity by masking their style, which we developed for the Author Obfuscation task, part of the PAN-2016 competition. The approach consists of three main steps: the first is an evaluation of different metrics in the text that can indicate authorship; the second is the application of various transformations, so that those metrics of the target text are adjusted towards the average level, while still keeping the meaning and the soundness of the text; as a final step, we add random noise to the text. Our system showed the best performance for masking the author’s style.