Introduction
Correct diagnosis of disease is an essential part of medical science, and misdiagnosis endangers lives every day. Computer-based diagnostic systems can therefore play an important role in making diagnoses more accurate. Moreover, large amounts of patient data are now available, from initial reports through treatments and prescriptions to follow-ups.
However, much of this data is poorly organized, which degrades the quality of decision making. Its growing volume calls for methods that can organize, extract, and process it efficiently.
The health-care industry today supports many data-driven applications, such as evaluating treatment effectiveness, health-care management, customer relationship management, and pharmaceutical management. One such application is the discovery of patterns and relationships in clinical and diagnostic data using machine learning techniques.
In the past decade, machine learning has enabled advances such as self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning methods are data-driven approaches designed mainly to discover statistical patterns in high-dimensional, multivariate data sets, such as those frequently found in electronic health records. This ability to identify patterns makes machine learning a powerful tool for prediction and for decision making in diagnosis and treatment planning.
In most diagnostic applications, a classification learning algorithm has two main components: training and classification. The training phase builds a model from previously labelled examples drawn from the domain. The classification phase then uses this model to predict the class to which a new instance belongs. The main requirement of such a system is prediction accuracy, but it should also have short training and prediction times. It should be robust to noisy training instances, and it should handle missing values, which may occur in both training and test instances. In addition, the features used to encode the instances may differ in their relevance to the domain. A practical system must deal with these and many other factors.
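The two phases described above, together with the handling of missing values, can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only for brevity and not taken from this work, and it assumes that missing values are encoded as `None` and filled with the training-set mean of each feature (mean imputation). The function names `fit`, `predict`, and `_impute`, the class labels, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumption of this sketch: None marks a missing feature value.
Instance = Sequence[Optional[float]]


def _impute(row: Instance, means: List[float]) -> List[float]:
    """Replace each missing value with the training mean of that feature."""
    return [means[j] if v is None else float(v) for j, v in enumerate(row)]


def fit(X: List[Instance], y: List[str]) -> Dict:
    """Training phase: build a nearest-centroid model from labelled examples."""
    n_features = len(X[0])
    # Per-feature mean over the observed (non-missing) training values.
    means = []
    for j in range(n_features):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)

    # One centroid per class, computed on the imputed training rows.
    centroids = {}
    for label in sorted(set(y)):
        rows = [_impute(r, means) for r, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return {"means": means, "centroids": centroids}


def predict(model: Dict, row: Instance) -> str:
    """Classification phase: impute missing values, then pick the closest centroid."""
    x = _impute(row, model["means"])
    centroids = model["centroids"]
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical toy data: two numeric features per patient, None = missing.
X_train = [[1.0, 2.0], [1.2, None], [0.8, 1.8],
           [5.0, 6.0], [None, 6.2], [5.2, 5.8]]
y_train = ["healthy", "healthy", "healthy",
           "disease", "disease", "disease"]

model = fit(X_train, y_train)
print(predict(model, [0.9, None]))  # → healthy
```

Mean imputation is only one simple way to deal with missing values; a real diagnostic system would likely need a more careful strategy, and possibly feature weighting or selection to account for features of differing relevance.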
Moreover, classification algorithms offer a twofold benefit in disease diagnosis. First, physicians can inspect and verify the learned classification knowledge before it is deployed in practice. Second, this knowledge may shed light on previously unknown patterns or facts, leading to new discoveries in the field.