Hate speech detection using static BERT embeddings

AI-generated keywords: Hate Speech, Social Media Platforms, AI Systems, Deep Learning, BERT

AI-generated Key Points

- Social media platforms have led to the emergence of hate speech as a major concern
- Hate speech refers to abusive language that targets specific group characteristics with the intention of inciting violence
- Some people are deliberately using social media platforms to spread hate by posting, sharing and commenting on hateful content
- AI systems have been developed to flag such text, but reducing false positives is a key challenge
- Rajput et al. use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating word embeddings with static BERT embeddings
- Neural networks perform better with static BERT embeddings than with fastText, GloVe or their combination
- The paper provides a literature review on recent developments in deep learning technology for various applications across domains such as healthcare, image processing and natural language processing
- The research community has shown keen interest in developing AI-assisted applications for detecting hate speech on social media platforms
- Prior deep learning approaches to hate speech detection include convolutional neural networks with FastText, GloVe with CNN and LSTM, and CNN and LSTM with AraVec
- Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based ML technique pre-trained on unlabeled data taken from Wikipedia and BookCorpus

Authors: Gaurav Rajput, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal

arXiv: 2106.15537v1 (cs.CL)

License: CC BY 4.0

Abstract: With the increasing popularity of social media platforms, hate speech is emerging as a major concern, where it expresses abusive speech that targets specific group characteristics, such as gender, religion or ethnicity, to spread violence. Earlier, people used to deliver hate speech verbally, but now, with the expansion of technology, some people are deliberately using social media platforms to spread hate by posting, sharing, commenting, etc. Whether it is the Christchurch mosque shootings or hate crimes against Asians in the West, it has been observed that the convicts are very much influenced by hate text present online. Even though AI systems are in place to flag such text, one of the key challenges is to reduce the false positive rate (marking non-hate as hate), so that these systems can detect hate speech without undermining the freedom of expression. In this paper, we use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating the word embeddings (fastText (FT), GloVe (GV) or FT + GV) with static BERT embeddings (BE). With extensive experimental trials, it is observed that the neural network performed better with static BE compared to using FT, GV or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.

Submitted to arXiv on 29 Jun. 2021

Results of the summarizing process for the arXiv paper: 2106.15537v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

The increasing popularity of social media platforms has made hate speech a major concern. Hate speech refers to abusive language that targets specific group characteristics, such as gender, religion or ethnicity, with the intention of inciting violence. With the expansion of technology, some people deliberately use social media platforms to spread hate by posting, sharing and commenting on hateful content. The paper points to incidents such as the Christchurch mosque shootings and hate crimes against Asians in the West, where those convicted were influenced by online hate text.

AI systems have been developed to flag such text, but one of the key challenges is reducing false positives (marking non-hate as hate) so that these systems can detect hate speech without undermining freedom of expression. In this paper, Rajput et al. use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating word embeddings (fastText (FT), GloVe (GV) or FT + GV) with static BERT embeddings (BE). Through extensive experimental trials, they observe that neural networks perform better with static BE than with FT, GV or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.

The paper also provides a literature review on recent developments in deep learning technology for applications across domains such as healthcare, image processing and natural language processing. The research community has shown keen interest in developing AI-assisted applications for detecting hate speech on social media platforms. For instance, Badjatiya et al. proposed a deep learning approach using convolutional neural networks and FastText, validated on a dataset of 16,000 tweets labeled as racist or sexist. Rizos et al. experimented with short-text data augmentation techniques in deep learning for hate speech classification, using substitution-based augmentation (ThreshAug), word position augmentation (PosAug) and neural generative augmentation (GenAug). They achieved their best results with GloVe + CNN + LSTM + BestAug, where BestAug is a combination of PosAug and ThreshAug. Faris et al. proposed a deep learning approach to detect hate speech in Arabic using CNN and LSTM with AraVec.

The paper also introduces Bidirectional Encoder Representations from Transformers (BERT), a transformer-based ML technique pre-trained on unlabeled data taken from Wikipedia and BookCorpus. BERT uses two major strategies for training: masked language modeling (MLM) and next sentence prediction (NSP). MLM randomly masks 15% of the tokens in the input and trains the model to predict them, while NSP trains the model to predict whether the second sentence in a pair follows the first in the original document.
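To make the masking step more concrete, here is a minimal Python sketch of selecting roughly 15% of tokens for MLM. It is an illustration under simplifying assumptions and is not the paper's or BERT's actual code: the function name `mask_tokens` is hypothetical, tokens are plain whitespace-split words, and every selected token is replaced with `[MASK]`, whereas the original BERT recipe replaces only 80% of the selected tokens with `[MASK]`, swaps 10% for a random token and leaves 10% unchanged.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Mask about `mask_rate` of the tokens (hypothetical helper).

    Returns (masked_tokens, labels). labels[i] holds the original token
    at masked positions and None elsewhere, so a model would only be
    scored on the positions it has to reconstruct.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Always mask at least one token so short inputs still give a training signal.
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    masked = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels


sentence = "social media platforms are used to share and comment on content".split()
masked, labels = mask_tokens(sentence, seed=0)
print(masked)
print(labels)
```

Which positions get masked depends on the seed, so the printed output is not fixed here.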

1. Social media platforms can cause people to say mean things about others.
2. When someone says mean things about a group of people with the intention of causing harm, it's called hate speech.
3. Some people use social media to spread hate by posting or commenting on mean content.
4. Computers can help find and flag hateful words, but sometimes they make mistakes.
5. Scientists are trying to use a special computer program called BERT to better detect hate speech.

Definitions:
- Hate speech: saying hurtful things about a group of people with the intention of causing harm
- Social media: websites or apps where people can share information and communicate with each other online
- AI systems: computers that can learn and make decisions like humans do
- False positives: when a computer program thinks something is bad (like hate speech) but it's actually okay
- Word embeddings: a way for computers to understand how different words relate to each other in language
- BERT: a special type of computer program that helps computers understand language better

Hate Speech Detection on Social Media Platforms Using AI

AI Systems for Detecting Hate Speech

AI systems have been developed to flag hate speech, but one of the key challenges is reducing false positives (marking non-hate as hate) so that these systems can detect hate speech without undermining freedom of expression. In this paper, Rajput et al. use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating word embeddings (fastText (FT), GloVe (GV) or FT + GV) with static BERT embeddings (BE). Through extensive experimental trials, they observe that neural networks perform better with static BE than with FT, GV or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.
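As a rough illustration of what "replacing or integrating" word embeddings could look like in practice, the Python sketch below builds an embedding matrix from one or more word-to-vector lookup tables. It assumes that integration means concatenating the vectors from each table, which is one plausible reading that this summary does not confirm. The function name `build_embedding_matrix` and the tiny toy vectors are hypothetical; real fastText, GloVe or static BERT vectors have hundreds of dimensions.

```python
def build_embedding_matrix(vocab, tables, dims):
    """Build one row per vocabulary word (hypothetical helper).

    Each row concatenates the word's vector from every table in `tables`.
    A word missing from a table contributes a zero vector of that table's
    dimension, so all rows have the same length.
    """
    matrix = []
    for word in vocab:
        row = []
        for table, dim in zip(tables, dims):
            row.extend(table.get(word, [0.0] * dim))
        matrix.append(row)
    return matrix


# Toy lookup tables standing in for pre-trained vectors (made-up values).
glove = {"hate": [0.1, 0.2], "speech": [0.3, 0.4]}
static_bert = {"hate": [0.5, 0.6, 0.7], "speech": [0.8, 0.9, 1.0]}
vocab = ["hate", "speech", "unknown"]

# "Replacing": the classifier's embedding layer uses static BERT vectors only.
replaced = build_embedding_matrix(vocab, [static_bert], [3])

# "Integrating": GloVe and static BERT vectors are concatenated per word.
integrated = build_embedding_matrix(vocab, [glove, static_bert], [2, 3])

print(replaced[0])    # [0.5, 0.6, 0.7]
print(integrated[0])  # [0.1, 0.2, 0.5, 0.6, 0.7]
```

Either matrix could then initialize the embedding layer of a neural classifier. Because the lookup is fixed per word, the downstream network stays the same and only the source of the vectors changes.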

Recent Developments in Deep Learning Technology

The research community has shown keen interest in developing AI-assisted applications for detecting hate speech on social media platforms. For instance, Badjatiya et al. proposed a deep learning approach using convolutional neural networks and FastText, validated on a dataset of 16,000 tweets labeled as racist or sexist. Rizos et al. experimented with short-text data augmentation techniques in deep learning for hate speech classification, using substitution-based augmentation (ThreshAug), word position augmentation (PosAug) and neural generative augmentation (GenAug). They achieved their best results with GloVe + CNN + LSTM + BestAug, where BestAug is a combination of PosAug and ThreshAug. Faris et al. proposed a deep learning approach to detect hate speech in Arabic using CNN and LSTM with AraVec. The paper also introduces Bidirectional Encoder Representations from Transformers (BERT), a transformer-based ML technique pre-trained on unlabeled data taken from Wikipedia and BookCorpus, which uses two major strategies for training: masked language modeling (MLM) and next sentence prediction (NSP). MLM randomly masks 15% of the tokens in the input, while NSP predicts whether the second sentence in a pair follows the first in the original document.
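The next sentence prediction objective can be pictured as building labeled sentence pairs. The Python sketch below is a simplified illustration, not BERT's actual data pipeline: the function name `make_nsp_pairs` is hypothetical, it assumes the sentences are distinct, and negatives are drawn from the same toy document, whereas BERT samples the random second sentence from elsewhere in the corpus. The 50/50 split between true and random pairs follows the original BERT setup.

```python
import random


def make_nsp_pairs(sentences, seed=None):
    """Create (first, second, label) triples for NSP (hypothetical helper).

    label 1: `second` really follows `first` in the document.
    label 0: `second` is a randomly chosen other sentence.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        first, true_next = sentences[i], sentences[i + 1]
        # Sentences that are neither `first` nor its true successor.
        candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        if rng.random() < 0.5 or not candidates:
            pairs.append((first, true_next, 1))
        else:
            pairs.append((first, rng.choice(candidates), 0))
    return pairs


document = [
    "Hate speech is a growing concern.",
    "Platforms use classifiers to flag it.",
    "False positives remain a challenge.",
    "Specificity measures how well non-hate text is recognized.",
]
for first, second, label in make_nsp_pairs(document, seed=42):
    print(label, "|", first, "->", second)
```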

Conclusion

In conclusion, recent years have seen various developments toward AI models that can detect hateful content on social media platforms without compromising freedom of expression. Rajput et al.'s research shows promising results when static BERT embeddings are used instead of other word embedding methods such as fastText and GloVe. However, more work is needed to improve metrics such as precision, recall and F1 score while minimizing the false positive rate, and to explore approaches such as data augmentation and transfer learning that could improve performance further.
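For readers unfamiliar with these metrics, the following Python sketch computes them from binary predictions using their standard definitions. The function name `binary_metrics`, the label convention (1 = hate, 0 = non-hate) and the example labels are assumptions made for illustration; they are not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics (hypothetical helper).

    Assumes label 1 means hate speech and label 0 means non-hate.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when the denominator is zero to avoid division errors.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Specificity: share of non-hate texts correctly left unflagged.
        "specificity": ratio(tn, tn + fp),
        # False positive rate: share of non-hate texts wrongly flagged.
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Made-up labels: 3 hateful and 5 non-hateful texts.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics["specificity"])          # 0.8
print(metrics["false_positive_rate"])  # 0.2
```

Specificity and the false positive rate always sum to 1, which is why an improvement in specificity corresponds directly to fewer non-hate texts being marked as hate.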

Created on 07 May. 2023

