
Towards a systematic approach to manual annotation of code smells
  • Nikola Luburić ,
  • Simona Prokić ,
  • Katarina-Glorija Grujić ,
  • Jelena Slivka ,
  • Aleksandar Kovačević ,
  • Goran Sladić ,
  • Dragan Vidaković
Faculty of Technical Sciences

Corresponding Author: [email protected]

Abstract

This is a preprint of an article published in Science of Computer Programming. The final peer-reviewed publication is available online at:
https://doi.org/10.1016/j.scico.2023.102999
Code smells are structures in code that indicate the presence of maintainability issues. A significant problem with code smells is their ambiguity: they are challenging to define, and software engineers differ in their understanding of what constitutes a code smell and which code exhibits one.
A solution to this problem could be an AI digital assistant that understands code smells and can detect (and perhaps resolve) them. However, developing such an assistant is challenging because few usable datasets of code smells exist on which to train and evaluate it. Furthermore, the existing datasets suffer from issues that mostly arise from the unsystematic approaches used to construct them.
Through this work, we address this issue by developing a procedure for the systematic manual annotation of code smells. We use this procedure to build a dataset of code smells, refining the procedure along the way and identifying recommendations and pitfalls for its use. The primary contributions are the proposed annotation model and procedure, together with the annotators' experience report. The dataset and supporting tool are secondary contributions of our study. Notably, our dataset comprises open-source projects written in the C# programming language, whereas almost all existing manually annotated datasets contain projects written in Java.
Published in Science of Computer Programming, volume 230, page 102999, August 2023. DOI: 10.1016/j.scico.2023.102999