Hate speech detection using static BERT embeddings

AI-generated keywords: Hate Speech, Social Media Platforms, AI Systems, Deep Learning, BERT

AI-generated Key Points

  • Social media platforms have led to the emergence of hate speech as a major concern
  • Hate speech refers to abusive language that targets specific group characteristics with the intention of inciting violence
  • Some people are deliberately using social media platforms to spread hate by posting, sharing and commenting on hateful content
  • AI systems have been developed to flag such text, but reducing false positives is a key challenge
  • Rajput et al. use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier when its word embeddings are replaced by, or integrated with, static BERT embeddings
  • Neural networks perform better with static BERT embeddings than with the other word embeddings tested (fastText, GloVe, or both combined)
  • The paper provides a literature review on recent developments in deep learning technology for various applications across domains such as healthcare, image processing and natural language processing
  • The research community has shown keen interest in developing AI-assisted applications for detecting hate speech on social media platforms
  • Various deep learning approaches have been proposed for detecting hate speech, including convolutional neural networks with FastText, GloVe with CNN and LSTM, and CNN and LSTM with AraVec
  • Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based ML technique pre-trained on unlabeled data taken from Wikipedia and BookCorpus

Authors: Gaurav Rajput, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal

License: CC BY 4.0

Abstract: With the increasing popularity of social media platforms, hate speech is emerging as a major concern: abusive speech that targets specific group characteristics, such as gender, religion or ethnicity, to spread violence. Earlier, people used to deliver hate speech verbally, but now, with the expansion of technology, some people are deliberately using social media platforms to spread hate by posting, sharing, commenting, etc. Whether it is the Christchurch mosque shootings or hate crimes against Asians in the West, it has been observed that the convicts were heavily influenced by hate text present online. Although AI systems are in place to flag such text, one of the key challenges is to reduce the false positive rate (marking non-hate as hate), so that these systems can detect hate speech without undermining freedom of expression. In this paper, we use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating the word embeddings (fastText (FT), GloVe (GV) or FT + GV) with static BERT embeddings (BE). Extensive experimental trials show that the neural network performed better with static BE than with FT, GV or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.
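
The abstract's two evaluation notions, the false positive rate (non-hate marked as hate) and specificity, are complementary: specificity = TN / (TN + FP), and the false positive rate is 1 minus that. The following is a minimal sketch of both, assuming binary labels where 1 means hate and 0 means non-hate; the function names and the toy labels are illustrative and are not taken from the paper.

```python
def specificity(y_true, y_pred, positive=1):
    """Share of truly non-hate items that were also predicted non-hate."""
    pairs = list(zip(y_true, y_pred))
    # True negatives: non-hate items correctly left unflagged.
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # False positives: non-hate items wrongly flagged as hate.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative examples")
    return tn / (tn + fp)


def false_positive_rate(y_true, y_pred, positive=1):
    """Share of truly non-hate items that were flagged as hate."""
    return 1.0 - specificity(y_true, y_pred, positive)


# Toy labels (hypothetical): four non-hate items, two hate items.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

print(specificity(y_true, y_pred))          # 3 of 4 non-hate items kept
print(false_positive_rate(y_true, y_pred))  # 1 of 4 non-hate items flagged
```

Raising specificity therefore lowers the false positive rate directly, which is why the abstract treats it as the metric tied to freedom of expression.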

Submitted to arXiv on 29 Jun. 2021


Results of the summarizing process for the arXiv paper: 2106.15537v1

The increasing popularity of social media platforms has led to the emergence of hate speech as a major concern. Hate speech refers to abusive language that targets specific group characteristics, such as gender, religion or ethnicity, with the intention of inciting violence. With the expansion of technology, some people are deliberately using social media platforms to spread hate by posting, sharing and commenting on hateful content. This has been observed in incidents such as the Christchurch mosque shootings and hate crimes against Asians in the West, where the convicts were influenced by online hate text. AI systems have been developed to flag such text, but one of the key challenges is reducing false positives (marking non-hate as hate) so that these systems can detect hate speech without undermining freedom of expression.

In this paper, Rajput et al. use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating word embeddings (fastText (FT), GloVe (GV) or FT + GV) with static BERT embeddings (BE). Through extensive experimental trials, they observe that neural networks perform better with static BE than with FT, GV or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.

The paper also provides a literature review on recent developments in deep learning technology for various applications across domains such as healthcare, image processing and natural language processing. The research community has shown keen interest in developing AI-assisted applications for detecting hate speech on social media platforms. For instance, Badjatiya et al. proposed a deep learning approach using convolutional neural networks and FastText, validated on a dataset of 16,000 tweets marked as racist or sexist. Rizos et al., on the other hand, experimented with short-text data augmentation techniques in deep learning for hate speech classification: substitution-based augmentation (ThreshAug), word position augmentation (PosAug) and neural generative augmentation (GenAug). They achieved their best results with GloVe + CNN + LSTM + BestAug, where BestAug is a combination of PosAug and ThreshAug. Faris et al. proposed a deep learning approach to detect hate speech in Arabic using CNN and LSTM with AraVec.

The paper also introduces Bidirectional Encoder Representations from Transformers (BERT), a transformer-based ML technique pre-trained on unlabeled data taken from Wikipedia and BookCorpus. BERT uses two major strategies for training: masked language modeling (MLM) and next sentence prediction (NSP). MLM involves randomly masking 15% of the input tokens and training the model to predict them, while NSP involves predicting whether the second sentence in a pair is the sentence that follows the first in the original document.
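
The masking step of MLM described above can be sketched in a few lines. This is a simplified illustration, assuming whitespace-split words stand in for tokens; the `mask_tokens` helper, its parameters and the example sentence are hypothetical, and the actual BERT recipe works on WordPiece tokens and does not always substitute the `[MASK]` symbol for a selected token.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide roughly mask_prob of the tokens and record what was hidden.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original token the model would be asked to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so short inputs still yield a training target.
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for i in positions:
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels


# Hypothetical 20-word input, so about 3 words are masked.
sentence = ("social media platforms make it easy to post share and comment "
            "on content that targets a group of people unfairly").split()
masked, labels = mask_tokens(sentence, seed=0)
print(" ".join(masked))
print(labels)
```

The `labels` dictionary plays the role of the prediction targets: the model sees `masked` and is scored only on the positions listed there.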
Created on 07 May. 2023

