Identifying Arabic-Language Political Misinformation on Twitter


Social media platforms like Twitter struggle with political misinformation. One of the main challenges in fighting misinformation is the lack of labelled datasets, especially in low-resource languages.

This workshop will present a unique hybrid approach to NLP topic modeling. We’ll apply it to a dataset of 36+ million Arabic tweets, posted by accounts that Twitter has flagged as part of state-linked information operations.

The hybrid topic clustering model successfully separates the political content from ‘spam’. By applying further clustering algorithms, the model then aims to identify the political content that is specifically misinformation. This poses technical challenges that require non-trivial, innovative solutions; there’ll be plenty of room for brainstorming and discussion here. So bring your creative hats. We’ll need them!