Preprocessing
  1. The categorical variables cabin_flown and type_traveller were converted using the get_dummies() function of the pandas library, which one-hot encodes each category.
  2. The NaN column created by one-hot encoding cabin_flown was renamed to avoid a conflict with the columns produced for type_traveller.
  3. Missing values were set to -99999 so that the algorithm can identify them as missing.
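The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the original script: the toy rows, the rating column, the encode_with_nan helper and the new column names are assumptions; only cabin_flown, type_traveller, get_dummies() and the -99999 sentinel come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the two categorical column names come from the report.
df = pd.DataFrame({
    "cabin_flown": ["Economy", "Business", np.nan, "Economy"],
    "type_traveller": ["Solo", np.nan, "Couple", "Family"],
    "rating": [7.0, np.nan, 3.0, 9.0],  # assumed numeric feature
})

def encode_with_nan(series, nan_name):
    """One-hot encode a column and give its NaN indicator an explicit name."""
    dummies = pd.get_dummies(series, dummy_na=True)
    # The indicator column's label is NaN itself, so relabel it explicitly.
    dummies.columns = [nan_name if pd.isna(c) else c for c in dummies.columns]
    return dummies

# Step 1 and 2: one-hot encode, renaming each NaN column so the names do not clash.
cabin = encode_with_nan(df["cabin_flown"], "cabin_flown_nan")
traveller = encode_with_nan(df["type_traveller"], "type_traveller_nan")

encoded = pd.concat(
    [df.drop(columns=["cabin_flown", "type_traveller"]), cabin, traveller],
    axis=1,
)

# Step 3: replace the remaining missing values with the sentinel.
encoded = encoded.fillna(-99999)
```

Without the rename, both encoded frames would carry a column labelled NaN, and the concatenated result would have two columns with the same label.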
Strategy for selection:
Cross-validation: To select the best model, the train_test_split method from sklearn.cross_validation was used to split the training data into training and test sets, with 70% of the data used for training and 30% used for testing.
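A 70/30 split of this kind might look like the sketch below. The synthetic X and y and the random_state value are assumptions. Note that recent scikit-learn releases no longer ship sklearn.cross_validation; the same function is imported from sklearn.model_selection, which is what the sketch uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded training data (assumed shapes).
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = rng.randint(0, 2, size=100)

# test_size=0.3 holds out 30% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
```

Fixing random_state makes the split reproducible; changing it gives a different split, which is one way to repeat the evaluation several times.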
Metrics:
Metrics such as accuracy score, Cohen's kappa and roc_auc_score were calculated for all the models, including naive Bayes and decision tree classification, over multiple cross-validation splits. KNeighborsClassifier was found to have the highest accuracy and was therefore selected.
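The comparison could be sketched as below. Several things here are assumptions rather than details from the report: the synthetic binary data, GaussianNB as the naive Bayes variant, default hyperparameters, and a single split instead of several.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "naive_bayes": GaussianNB(),  # assumed variant
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # roc_auc_score needs scores for the positive class, not hard labels.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "kappa": cohen_kappa_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

# Pick the model with the highest test accuracy, as the report describes.
best_model = max(results, key=lambda n: results[n]["accuracy"])
```

On synthetic data the winner will not necessarily be KNeighborsClassifier; the sketch only shows how the three metrics are computed side by side.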
To select the ideal value of k, cross-validation was used again: accuracy was plotted against the value of k, and k = 14 was found to give the maximum accuracy across multiple cross-validation runs.
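A search over k of this kind can be sketched as follows. The synthetic data, the candidate range 1 to 30 and the 5-fold setting are assumptions; the report does not state which range or number of folds was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

k_values = list(range(1, 31))  # assumed candidate range
mean_scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this k.
    scores = cross_val_score(knn, X, y, cv=5, scoring="accuracy")
    mean_scores.append(scores.mean())

# The k with the highest mean cross-validated accuracy.
best_k = k_values[int(np.argmax(mean_scores))]
```

Plotting k_values against mean_scores gives the accuracy-versus-k curve described above. On the synthetic data best_k will generally differ from 14.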
Boosting and scaling of the data were tried but did not yield any improvement in accuracy.
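The scaling experiment can be checked along the lines below. The report does not say which scaler was used, so StandardScaler is an assumption, as are the synthetic data and the deliberately inflated feature; boosting is not sketched here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with one feature on a much larger scale than the others.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X[:, 0] *= 1000.0

raw_knn = KNeighborsClassifier(n_neighbors=14)
# Putting the scaler in a pipeline fits it on the training folds only.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=14))

raw_acc = cross_val_score(raw_knn, X, y, cv=5, scoring="accuracy").mean()
scaled_acc = cross_val_score(scaled_knn, X, y, cv=5, scoring="accuracy").mean()
```

Comparing raw_acc with scaled_acc on the real data is how one would confirm whether scaling helps; according to the report it did not.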
Techniques for Increasing accuracy: