Previous work \cite{Apel_2011} shows that semistructured merge reports fewer conflicts than unstructured merge, but this does not necessarily mean that the technique increases developer productivity. Reproducing more than 30,000 merge scenarios from 50 projects shows that semistructured merge can notably reduce false positives, so a larger share of the reported conflicts are genuine. However, the technique introduces false negatives, which are much harder to detect and resolve. To reduce both false positives and false negatives, this paper adds further merge handlers to the basic technique: (1) a renaming handler, which reduces false positives, and, to reduce false negatives, (2) a type ambiguity error handler, (3) a handler for new elements that reference edited ones, and (4) an initialization block handler.

GITCoP: A Machine Learning Based Approach to Predicting Merge Conflicts from Repository Metadata \cite{ziegler2017}

This MSc thesis aims to predict merge conflicts using machine learning techniques. It uses three datasets: the jdime-dataset and two self-mined datasets obtained by crawling GitHub (C and Java projects). Features of each branch and conflict features are first evaluated separately, and their combination turns out to be more effective. Decision Trees, Support Vector Machines, Naive Bayes, Logistic Regression, and Random Forest are used as classifiers, and AdaBoost is applied to improve classification performance. The validation process is sound, since accuracy, precision, recall, and F1-score are all reported. Reporting all of these measures is especially important for this problem because the data are imbalanced: conflicting merges are the minority class, so a classifier that always predicts ``no conflict'' can reach high accuracy while detecting no conflicts at all. However, feature selection and extraction could be improved. First, only a small number of hand-picked features are used. Second, code features are ignored. Finally, the features are used without any preprocessing or feature extraction. As a suggestion, Principal Component Analysis could be applied to reduce noise and increase the discriminative power of the features. From a classification point of view, the employed classifiers are basic, and state-of-the-art models may improve performance.
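To illustrate why accuracy alone is misleading on imbalanced data, the following minimal Python sketch computes accuracy, precision, recall, and F1-score from hand-made label vectors. The data (95 conflict-free and 5 conflicting merge scenarios), the two predictors, and the function name \texttt{classification\_metrics} are hypothetical and are not taken from the thesis.

```python
"""Sketch: why accuracy alone misleads on imbalanced merge-conflict data.

All label vectors below are hypothetical (1 = conflicting merge, 0 = clean merge).
"""


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the given positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # conflict correctly predicted
            else:
                fp += 1  # false alarm on a clean merge
        else:
            if truth == positive:
                fn += 1  # missed conflict
            else:
                tn += 1  # clean merge correctly predicted
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical imbalanced data: 95 clean merges followed by 5 conflicting ones.
    y_true = [0] * 95 + [1] * 5
    # Trivial baseline: always predicts "no conflict".
    always_clean = [0] * 100
    # Hypothetical classifier: flags the last 8 merges (all 5 conflicts plus 3 clean merges).
    flags_eight = [0] * 92 + [1] * 8
    for name, y_pred in [("always_clean", always_clean), ("flags_eight", flags_eight)]:
        metrics = classification_metrics(y_true, y_pred)
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this toy setting, always predicting ``no conflict'' yields an accuracy of 0.95 but zero precision, recall, and F1-score, whereas flagging eight merges raises accuracy only to 0.97 while recall reaches 1.0 and the F1-score rises to about 0.77. Accuracy barely separates the two predictors; precision, recall, and F1-score do.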