Semistructured Merge: Rethinking Merge in Revision Control Systems \cite{Apel_2011}
Unstructured merge combines two versions of a program by comparing lines of text, whereas structured merge exploits additional language-specific information, represented as a tree, to make merging more expressive. This paper looks for a middle ground: it brings more expressiveness to the merge while keeping it general. Most unresolved conflicts are renaming conflicts, which can be detected easily because the order of program elements can be permuted safely in most programming languages. In this approach, language-specific merge handlers are attached to the grammar to define the merge rules. For instance, the order of methods does not matter in Java; therefore, if two different methods are added in different branches, the approach simply keeps both of them. The authors investigate merge conflicts comprehensively and find that a large portion of them are renaming conflicts. Using language-specific knowledge lets their tool handle such conflicts, while its performance remains acceptable. The proposed method is implemented for C\#, Java, and Python, and adding new languages is easy.
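The Java example above (two methods added in different branches) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function \texttt{merge\_members}, the dictionary representation of a class, and the method bodies are assumptions made for illustration. It treats the members of a class as an unordered set keyed by name and applies a three-way merge per member. Members changed differently on both sides are only reported as conflicts here.

```python
def merge_members(base, left, right):
    """Three-way merge of unordered class members.

    Each argument maps a member name to its source text (hypothetical
    representation). Returns (merged, conflicts), where conflicts lists
    the names changed differently in both branches.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(name), left.get(name), right.get(name)
        if l == r:        # both branches agree (including both deleted)
            result = l
        elif l == b:      # only the right branch changed this member
            result = r
        elif r == b:      # only the left branch changed this member
            result = l
        else:             # both branches changed it differently
            conflicts.append(name)
            continue
        if result is not None:   # None means the member was deleted
            merged[name] = result
    return merged, conflicts


# Hypothetical example: each branch adds a different method.
base = {"size": "int size() { return n; }"}
left = dict(base, isEmpty="boolean isEmpty() { return n == 0; }")
right = dict(base, clear="void clear() { n = 0; }")

merged, conflicts = merge_members(base, left, right)
print(sorted(merged), conflicts)
```

Because members are matched by name rather than by position, the order in which the two branches inserted their methods does not matter, which is the property a line-based merge lacks.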
GITCoP: A Machine Learning Based Approach to Predicting Merge Conflicts from Repository Metadata \cite{ziegler2017}
This MSc thesis aims to predict merge conflicts using machine learning techniques. The thesis uses three datasets: the jdime-dataset and two self-mined datasets obtained by crawling GitHub (in C and Java). The features of each branch and the conflict features are evaluated separately, and their combination turns out to be more effective. Decision Trees, Support Vector Machines, Naive Bayes, Logistic Regression, and Random Forest are employed as classifiers, and AdaBoost is used to increase classification performance. The validation is sound, since Accuracy, Precision, Recall, and F1-score are reported together. Reporting all of these measures is especially important for this problem because the classes are imbalanced. However, the feature selection and extraction could be improved. First, only a small number of hand-picked features are employed. Second, code features are ignored. Finally, the features are used without any preprocessing or extraction step. As a suggestion, Principal Component Analysis could be applied to reduce noise and make the features more discriminative. From the classification point of view, the employed classifiers are basic, and state-of-the-art models may improve performance.
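The point about class imbalance can be made concrete with a small Python sketch. The numbers are hypothetical and are not taken from the thesis: 100 merges are assumed, of which 5 conflict (label 1) and 95 are clean (label 0). A predictor that always answers ``clean'' reaches high accuracy yet never finds a conflict, which only Precision, Recall, and F1-score reveal.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (conflict) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: 95 clean merges (0) followed by 5 conflicts (1).
y_true = [0] * 95 + [1] * 5

# Predictor A: always predicts "clean".
always_clean = [0] * 100

# Predictor B: 6 false alarms, then finds 4 of the 5 conflicts.
detector = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

for name, y_pred in [("always clean", always_clean), ("detector", detector)]:
    print(name, accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```

Under these assumed numbers the trivial predictor has the higher accuracy (0.95 versus 0.93) but an F1-score of zero, while the detector recovers most conflicts at the cost of some false alarms.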