A domain adaptation-based method for improving the generalization of hate speech detection models
Today, with the growth of activity on social media, hate speech online is increasing, which makes recognizing hate in cyberspace an important problem. Domain adaptation is one of the major challenges for this task, as it is for natural language processing in general: when the domain changes, performance often drops, and hate speech detection is no exception. In this research, we aim to increase the generalizability of hate detection models using domain adaptation methods. To this end, we employ Transformer-based methods, including domain adversarial training and mixture of experts, together with multi-source training. Experiments are conducted on four datasets in the hate speech domain. We first evaluate the models in an in-domain, single-source setting. Next, when additional domains are added to the training set, we observe a drop in results, i.e., negative transfer. We then perform out-of-domain tests, first in a single-source setting with the DistilBERT model, where changing the domain significantly reduces the results. To strengthen the model's domain adaptation ability in the out-of-domain setting, we train on multiple sources, which improves the results in about half of the cases, though not significantly. Finally, applying the Transformer-based methods, domain adversarial training and mixture of experts, increases performance in 87% of the multi-source out-of-domain tests; these methods also improve performance in the in-domain tests. An important factor that sometimes causes a significant drop in results is the datasets themselves: similarity of the data and of the distributions of some domains increases the model's domain adaptation ability, while dissimilarity decreases it.
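The domain adversarial training mentioned above hinges on a gradient reversal layer (GRL) placed between the shared feature extractor and the domain classifier. The following is a minimal, dependency-free sketch of that one component, not the implementation used in this work: the forward pass is the identity, while the backward pass multiplies the incoming gradient by a negative weight (here called `lam`, a hypothetical name), so the feature extractor is trained to confuse the domain classifier while still serving the main hate-speech classifier.

```python
class GradientReversal:
    """Toy gradient reversal layer with manual backprop (illustrative only)."""

    def __init__(self, lam: float = 1.0):
        # lam is the adversarial trade-off weight; in DANN-style training
        # it is typically annealed from 0 toward 1 over the course of training.
        self.lam = lam

    def forward(self, x: float) -> float:
        # Identity in the forward direction: features pass through unchanged.
        return x

    def backward(self, grad_output: float) -> float:
        # Reverse (and scale) the gradient flowing back to the feature
        # extractor, pushing it toward domain-invariant representations.
        return -self.lam * grad_output


grl = GradientReversal(lam=0.5)
y = grl.forward(3.0)    # forward pass leaves the feature value unchanged
g = grl.backward(2.0)   # gradient is flipped and scaled: -0.5 * 2.0
```

In a real model this layer sits inside an autograd framework (e.g., a custom PyTorch `Function`), but the sign flip on the backward pass shown here is the entire trick.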