Flagging hate speech and propaganda on Twitter using AI
Bleepr uses natural language processing (NLP) to automatically flag hate speech published on social media. In our first release, we scan Tweets from users with more than 10k followers, automatically label them along 7 dimensions of unsafe language, and update the results daily.
We launched Bleepr to hold social media platforms accountable for removing harmful, hateful and propaganda content. With the help of AI, we uncover the bleeps posted on social media, many of which are about the current U.S. presidential election. The public is told that the platforms are doing a lot of work to take down abuse, but has few ways of knowing whether it is working apart from what the platforms self-report. Online discussion also focuses mainly on the terms of service violations that make the headlines. The purpose of Bleepr is to focus on the less prominent cases of abuse, which still exist but are not being highlighted by anyone.
How it works
We scan tweets in bulk every day. We look at around 500 tweets in more detail, and only those posted by Twitter profiles with more than 10K followers. We run our ML algorithms on those tweets. Every tweet that receives at least one of the automated labels (sexism, racism, hate speech, obscenity, political bias, toxicity, insults and threat level) with a score above a certain threshold is then listed on bleepr.ai. We highlight these labels in red.
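The flagging step described above can be sketched roughly as follows. This is a minimal illustration, not Bleepr's actual code: the `Tweet` class, the `flag_tweets` function, the `toy_score` stand-in scorer and the threshold value of 0.8 are all assumptions made for the example. Only the label names and the 10K follower cut-off come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Label names as listed in the description above.
LABELS = [
    "sexism", "racism", "hate speech", "obscenity",
    "political bias", "toxicity", "insults", "threat level",
]

MIN_FOLLOWERS = 10_000  # from the description: profiles with more than 10K followers
THRESHOLD = 0.8         # assumed value; the real threshold is not stated


@dataclass
class Tweet:
    """Hypothetical container for the two fields this sketch needs."""
    text: str
    author_followers: int


def toy_score(text: str) -> Dict[str, float]:
    """Placeholder scorer, NOT a real model: treats the word 'idiot' as an insult."""
    scores = {label: 0.0 for label in LABELS}
    if "idiot" in text.lower():
        scores["insults"] = 0.9
    return scores


def flag_tweets(
    tweets: Iterable[Tweet],
    score_fn: Callable[[str], Dict[str, float]],
    threshold: float = THRESHOLD,
) -> List[Tuple[Tweet, Dict[str, float]]]:
    """Return tweets from large accounts with at least one label above the threshold."""
    flagged = []
    for tweet in tweets:
        # Skip accounts that do not have more than 10K followers.
        if tweet.author_followers <= MIN_FOLLOWERS:
            continue
        scores = score_fn(tweet.text)
        # Keep only the labels whose score exceeds the threshold.
        high = {label: s for label, s in scores.items() if s > threshold}
        if high:
            flagged.append((tweet, high))
    return flagged
```

A keyword-based scorer like `toy_score` would reproduce exactly the kind of false positives listed under the known issues, which is why a real system would use a trained classifier in its place.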
Our AI is trained with the help of expert annotators from communities like the International League against Racism and Anti-Semitism (LICRA) and Jugendschutz in Germany, along with other specialist anti-abuse advocacy groups.
Known issues
- Sexism detection sometimes scores higher than it should for words like ‘bi**h’ and ‘motherf****r’, even in contexts that are not hateful.
- A few bleeps contain swear words but are random and obscure, and are not truly hate speech.
- Sometimes we flag news stories about controversial topics like homophobia, paedophilia or sexual abuse, rather than original hate speech.
- We sometimes flag Tweets that no longer exist, which means Twitter has fortunately already taken them down or the authors have removed them.
- Sometimes we flag Tweets that discuss political entities in a fairly coherent way but use the word ‘idiot’ a lot.
Future releases will include human curation of Bleeps by a small QA community, more social media platforms, improved models, and live refresh of likes/shares/retweets.
Help us make social media a safer space. You can do this by clicking on the AGREE/DISAGREE WITH AI buttons which are shown under every bleep on bleepr.ai. We will record your anonymous feedback and use it to make our algorithms better.
If you would like to contribute to our algorithms or join our community to take Bleepr further, please contact us at info@factmata.com. We are just getting started.
@factmata @dghulatifactmata @pondechan This is free for now, but we do charge for the API underlying it for enterprise customers at around $0.05 per post, going down to $0.01 for high volumes.
@factmata @dghulatifactmata @jay_bienvenu For example, what about a caption "Check out this little motherf***er! So cute" for an image of a pet dog stealing a treat? Or a rap lyric? Or a film transcript?