Identifying Machine-Paraphrased Plagiarism
  • Jan Philip Wahle,
  • Terry Ruas,
  • Tomas Foltynek,
  • Norman Meuschke,
  • Bela Gipp
Jan Philip Wahle
Mendel University in Brno, University of Wuppertal

Corresponding Author: [email protected]


Abstract

Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best-performing technique, Longformer, achieved an average F1 score of 80.99% (F1=99.68% for SpinBot and F1=71.64% for SpinnerChief cases), while human evaluators achieved F1=78.4% for SpinBot and F1=65.6% for SpinnerChief cases. We show that automated classification alleviates shortcomings of widely used text-matching systems, such as Turnitin and PlagScan. To facilitate future research, all data, code, and two web applications showcasing our contributions are openly available.