THE PREDICTION OF OUTCOMES RELATED TO THE USE OF NEW DRUGS IN THE REAL WORLD THROUGH ARTIFICIAL ADAPTIVE SYSTEMS.
Enzo Grossi & Massimo Buscema
Semeion Research Centre
Research Centre of Sciences of Communication
Via Sersale 117, Rome, 00128, Italy
In this brief essay we focus on three main problems related to the use of new medications in the real world: 1) the prediction of drug response in individual patients; 2) the prediction of rare unwanted events after the introduction of a new drug on the market; 3) the passage from the preclinical phase to Phase I in human beings. The first problem is felt most keenly by medical doctors, who are asked to treat their patients as individuals rather than as statistics; with the advent of extremely costly new drugs, however, health authorities and private insurance organizations are also looking for powerful tools to personalize treatment plans. The second problem mainly concerns drug agencies, which are sometimes forced to withdraw a marketing authorization when a cluster of drug-related deaths provokes alarm and criticism in the media. The third problem is felt specifically by pharmaceutical companies and by the Institutional Review Boards that grant clearance for first-in-man trials.
1. Prediction of drug response in individual patients
Making predictions for specific outcomes (diagnosis, risk assessment, prognosis) represents a fascinating aspect of medical science. Different statistical approaches have been proposed to define models to identify factors that are predictive for the outcome of interest. Studies have been performed to define the clinical and biological characteristics that could be helpful in predicting who will benefit from an antiobesity drug for example, but results have been limited (1).
Traditional statistical approaches encounter problems when the data show high variability and, because of inherent nonlinearity, are not easily normalized. More advanced analysis techniques, such as dynamic mathematical models, can be useful because they are particularly suitable for solving the nonlinear problems frequently associated with complex biological systems.
Use of artificial neural networks (ANNs) in biological systems has been proposed for different purposes, including studies on deoxyribonucleic acid sequencing (2) and protein structure (3).
ANNs have been used in different clinical settings to predict the effectiveness of instrumental evaluation (echocardiography, brain single photon emission computed tomography, lung scintigram, prostate biopsy) in increasing diagnostic sensitivity and specificity, and in laboratory medicine in general (4). They have also proven effective in identifying gastro-oesophageal reflux patients on the sole basis of clinical data (5). But the most promising application of ANNs relates to the prediction of possible clinical outcomes with a specific therapy. ANNs have proven effective in detecting responsiveness to methadone treatment in drug addicts (6), to pharmacological treatment in Alzheimer disease (7), to clozapine in schizophrenic patients (8) and in various fields of psychiatric research (9).
The use of ANNs for predictive modelling in obesity dates back a decade, when it was proposed to model the waist-hip ratio from 13 other health parameters (10). Later, ANNs were proposed as a tool for body composition research (11).
One of the main factors preventing a more efficient use of new pharmacological treatments for chronic diseases such as hypertension, cancer, Alzheimer disease or obesity is the difficulty of predicting “a priori” the chance that a single patient will respond to a specific drug. A major methodological setback in drawing inferences and making predictions from data collected in the real-world setting, such as observational studies, is that variability in the underlying biological substrates of the studied population and in the quality and content of medical intervention influences outcomes. Because there is no reason to believe that these, like other health factors, work together in a linear manner, the traditional statistical methods, based on the generalized linear model, have limited value in predicting outcomes such as responsiveness to a particular drug.
Most studies have shown that up to 50% of patients treated with new molecules, given in monotherapy or as an adjunct to standard treatments, may show an unsatisfactory response. As a matter of fact, when the time comes for the physician to decide on the type of treatment, there is very little evidence to help her or him choose a drug. Take obesity as an example. Here only scanty data are available on factors predictive of response to a specific treatment, and attempts at developing models for predicting response to the drug by using traditional multiple regression techniques have shown an unsatisfactory predictive capacity (i.e. less than 80% of total variance) (12, 13). A possible explanation is that obesity is a so-called complex disease, characterized by multiple interactions among variables, positive and negative feedback loops, and non-linear system dynamics. Another good example is Alzheimer disease.
Clinical trials have established the efficacy of cholinesterase inhibitor (ChEI) drugs, such as tacrine (14), donepezil (15) and rivastigmine (16), based on improvement in cognitive aspects and in overall functioning, measured with the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) and the Clinician's Interview-Based Impression of Change (CIBIC), respectively. Although the mean score of treated patients on both scales was significantly higher than that of the placebo group, many subjects under active treatment showed little or no improvement (nonresponders).
However, it is not possible to estimate in advance which patients are likely to respond to pharmacological therapy with ChEI. Such a prediction would be an important decision-making factor in improving the use of healthcare resources.
A possible alternative approach to the problem is the use of neural networks. Artificial Neural Networks (ANNs) are computerized algorithms that resemble the interactive processes of the human brain. They make it possible to study very complex non-linear phenomena such as biological systems. Like the brain, ANNs recognize patterns, manage data and, most significantly, learn. These statistical-mathematical tools can determine the existence of a correlation between series of data and a particular outcome and, once “trained”, can predict output data from given input. They work well in pattern recognition and discrimination tasks.
Although ANNs have been applied to various areas of medical research, they have not been employed in obesity clinical pharmacology.
ANNs proved to be a useful method for discriminating between responders and non-responders, performing better than traditional statistical methods, in our three experimental studies carried out with donepezil in Alzheimer disease, with sibutramine in obesity and with infliximab in Crohn disease.
In a paper published in 2002 (7) we evaluated the accuracy of artificial neural networks, compared with discriminant analysis, in classifying positive and negative responses to the cholinesterase inhibitor donepezil in an opportunistic group of 61 elderly patients of both genders affected by Alzheimer’s disease (AD), followed in a real-world setting over three months.
Accuracy in detecting subjects sensitive (responders) or not sensitive (nonresponders) to therapy was based on the standard FDA criteria for evaluation of efficacy: the scores on the Alzheimer’s Disease Assessment Scale-Cognitive portion and on the Clinician’s Interview Based Impression of Change-plus scale. In this study ANNs were more effective in discriminating between responders and nonresponders than other advanced statistical methods, particularly linear discriminant analysis. The total accuracy in predicting the outcome was 92.59%.
In a second study we evaluated the use of artificial neural networks in predicting response to infliximab treatment in patients with Crohn's disease (CD) (18).
In this pilot study, different ANN models were applied to a data sheet with demographic and clinical data from 76 patients with steroid-resistant/dependent or fistulizing CD treated with infliximab, in order to compare their accuracy in classifying responder and non-responder subjects with that of linear discriminant analysis.
Eighty-one outpatients with CD (31 men, 50 women; mean age ± standard deviation 39.9 ± 15 years, range 12-81) participating in an Italian multicentre study (17) were enrolled. All patients were treated, between April 1999 and December 2003, with infliximab at a dose of 5 mg/kg of body weight for luminal refractory CD (CDAI > 220–400) (43 patients), fistulizing CD (19 patients) or both (14 patients).
The final data sheet consisted of 45 independent variables related to demographic and anamnestic data (sex, age at diagnosis, age at infusion, smoking habit, previous Crohn’s-related abdominal surgery [ileal or ileo-cecal resections] and concomitant treatments, including immunomodulators and corticosteroids) and to clinical aspects (location of disease, perianal disease, type of fistulas, extraintestinal manifestations, clinical activity at the first infusion [CDAI], indication for treatment). Smokers were defined as those smoking a minimum of 5 cigarettes per day for at least 6 months before their first dose of infliximab. Non-smokers were defined as those who had never smoked, those who had quit smoking at least 6 months before their first dose of infliximab, or those who smoked fewer than 5 cigarettes per day. Concomitant immunosuppressive use was defined as initiation of methotrexate before the first infliximab infusion, or initiation of 6-mercaptopurine (6-MP) or azathioprine more than 3 months before the first infliximab infusion.
Response was assessed by clinical evaluation 12 weeks after the first infusion for all patients. Determination of response in patients with inflammatory CD was based on the Crohn’s Disease Activity Index (CDAI). To obtain a clear-cut estimate, clinical response was classified as either complete response or partial/no response.
Complete response was defined as (a) clinical remission (CDAI < 150) in luminal refractory disease and (b) temporary closure of all draining fistulas at consecutive visits in the case of enterocutaneous and perianal fistulas; entero-enteric fistulas were evaluated by small bowel barium enema, and vaginal and vesical fistulas by lack of drainage at consecutive visits. For patients with both indications the outcome was evaluated independently for each indication.
Two different experiments were planned following an identical research protocol. The first included all 45 independent variables, covering the frequency and intensity of Crohn's disease symptoms plus numerous other social and demographic characteristics, clinical features and history. In the second experiment the IS system, coupled to the T&T system, automatically selected the most relevant variables, so that 22 variables were included in the model.
Discriminant analysis was also performed on the same data sets to evaluate the predictive performance of this advanced statistical method by a statistician blinded to ANN results. Different models were assessed to optimise the predictive ability. In each experiment the sample was randomly divided into two sub-samples, one for the training phase and the other for the testing phase, with the same record distributions used for ANN validation.
ANNs reached an overall accuracy rate of 88%, while linear discriminant analysis (LDA) reached only 72%.
Finally, in a third study we evaluated the performance of ANNs in predicting response to warfarin (17).
A total of 377 patients were included in the analysis. The most frequent clinical indication for anticoagulation was atrial fibrillation (69%); other indications included heart valve prosthesis (10%) and pulmonary embolism (8%). The large majority of patients (325, 86%) were on concurrent drug treatment: on average, they were taking 3 (IQR 1-4) medications potentially interacting with warfarin. The median weekly maintenance dose (WMD) of warfarin was 22.5 mg (IQR 16.3-28.8 mg). Thirteen patients whose INR values were not within the therapeutic range were erroneously included in the analysis: their median weekly maintenance dose was 21.4 mg (IQR 12.2-30.0 mg); the INR was higher than 3.0 in 2 patients (INR 3.7 and 4.3) and lower than 2.0 in 11 (median INR 1.5, IQR 1.5-1.7).
Demographic, clinical and genetic data (CYP2C9 and VKORC1 polymorphisms) were used. The final prediction model was based on 23 variables selected by the TWIST® system, using a protocol in which the data set is divided into two parts (training and testing).
The TWIST system is based on a population of n ANNs managed by an evolutionary system that is able to extract from the global dataset the best training and testing sets and to evaluate the relevance of the different variables, selecting those most relevant to the problem under study.
The ANN algorithm reached high accuracy, with an average absolute error of 5.7 mg in the weekly warfarin maintenance dose. In the subsets of patients requiring ≤21 mg and 21-49 mg (45% and 51% of the cohort, respectively) the absolute error was 3.86 mg and 5.45 mg, respectively, with a high percentage of subjects being correctly identified (71% and 73%, respectively). This performance is better than that obtained in different studies carried out with traditional statistical techniques. In conclusion, ANNs appear to be a promising tool for predicting the maintenance dose of vitamin K antagonists.
1.1 Selection of informative variables: how evolutionary algorithms work
To include only the most informative of the available variables we used a genetic algorithm, called the Genetic Doping Algorithm, which uses the principles of evolution to optimize the training and testing sets and to select the minimum number of variables capturing the maximum amount of available information in the data. Unlike statistical linear models that use indicator variables, TWIST does not require the omission of a reference category. This is because the artificial neural network focuses on prediction rather than estimation. If some of the indicator variables can completely account for the predictive ability of the others, the latter will be excluded by the algorithm during the selection process. The method is called the TWIST protocol and has previously been applied successfully to similar problems [20,21]. The advantages of the approach are the sub-setting of the data into two representative sets for training and testing, which is problematic in small datasets, and the use of a combination of criteria to determine the fit of the model. TWIST comprises two systems, the T&T system for resampling of the data and the IS system for feature selection, both using artificial neural networks (ANNs). The T&T system splits the data into training and testing sets in such a way that each subset is statistically representative of the full sample. This non-random selection of subsets is crucial when small samples are considered and the selection of non-characteristic and extreme subsets is likely. The IS system uses the training and testing subsets produced to identify a vector of 0s and 1s, describing the absence or presence of each indicator variable, that optimizes the categorization of the individuals into cases and controls compared with their observed status.
For this, a population of vectors, each vector being a combination of the indicator variables, is allowed to “evolve” through a number of generations in order to optimize the prediction of the target variable, just as a natural population evolves to optimize fitness under a specific set of environmental conditions. The vectors with the best predictive ability are overrepresented in the next generation, while a smaller number of sub-optimal vectors are maintained to give rise to the following generation. Some instability, in the form of vectors with low predictive ability, is introduced in the process to avoid the problem of finding a solution which is optimal only under a narrow set of conditions, also known as a local optimum. This step ensures that the attributes do not include redundant information or noise variables that would decrease the accuracy of the map and increase both the computing time and the number of examples necessary during learning. In addition, feature selection permits easier interpretation of the graph of relationships between the variables.
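The evolutionary selection of variables described above can be illustrated with a minimal sketch. It is not the Genetic Doping Algorithm or the TWIST system, whose internals are not given here: it is a generic genetic algorithm over 0/1 masks, written under several assumptions made only for illustration. The data are synthetic, the training/testing split is fixed rather than evolved, a simple nearest-centroid classifier stands in for the ANNs, and all function names and parameter values (`make_data`, `fitness`, `evolve`, the penalty of 0.01 per variable) are hypothetical.

```python
import random


def make_data(rng, n_samples=120, n_features=8):
    """Synthetic records: only features 0 and 1 carry signal, the rest are noise."""
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 2.0 if label else -2.0
        rows.append([rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(n_features)])
        labels.append(label)
    return rows, labels


def centroid(rows, cols):
    """Mean of the selected columns over a list of records."""
    return [sum(r[j] for r in rows) / len(rows) for j in cols]


def fitness(mask, train, test, penalty=0.01):
    """Test-set accuracy of a nearest-centroid classifier minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    (x_tr, y_tr), (x_te, y_te) = train, test
    cents = {c: centroid([x for x, y in zip(x_tr, y_tr) if y == c], cols) for c in (0, 1)}
    hits = 0
    for x, y in zip(x_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in (0, 1)}
        hits += int(min(dist, key=dist.get) == y)
    return hits / len(y_te) - penalty * len(cols)


def evolve(train, test, n_features, rng, pop_size=20, generations=30,
           n_elite=2, n_random=2, mutation_rate=0.05):
    """Evolve 0/1 masks over the variables; returns (best_mask, best_fitness)."""
    def score(mask):
        return fitness(mask, train, test)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [m[:] for m in pop[:n_elite]]  # elitism: best masks survive unchanged
        # a few random masks inject instability, to help escape local optima
        nxt += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_random)]
        while len(nxt) < pop_size:
            # fitter masks are over-represented: parents come from the top half
            a, b = rng.sample(pop[:pop_size // 2], 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]  # bit flips
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    rng = random.Random(0)
    rows, labels = make_data(rng)
    train, test = (rows[:80], labels[:80]), (rows[80:], labels[80:])
    mask, fit = evolve(train, test, n_features=8, rng=rng)
    print("selected mask:", mask, "fitness:", round(fit, 3))
```

The size penalty plays the role of the requirement to keep the minimum number of variables: a mask that adds a noise variable without improving accuracy scores lower than the same mask without it.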
2. Prediction of rare unwanted events
Drug-induced injuries are a growing concern for health authorities. The world population is continuously growing older because of an increased life expectancy and is thus using more and more drugs, whether prescription or over-the-counter drugs.
Therefore, chances of drug-induced injuries are rising. Over the years, a number of postmarketing labelling changes or drug withdrawals from the market due to postmarketing discoveries have occurred. Even the best planned and carefully designed clinical studies have limitations. To detect all potential adverse drug reactions, you need quite a large number of subjects exposed to the drug and the number of subjects participating in the clinical studies might not be large enough to detect especially rare adverse drug reactions.
To minimise the risk of postmarketing discoveries such as unrecognised adverse drug reactions, certain risk factors, e.g. laboratory or ECG abnormalities, are subject to increased regulatory review.
The most frequent cause of safety-related withdrawal of medications (e.g. bromfenac, troglitazone) from the market and of FDA non-approval is drug-induced liver injury (DILI). Different degrees of liver enzyme elevation after drug intake can result in hepatotoxicity, which can be fatal because of irreversible damage to the liver. Since animal models cannot always predict human toxicity, drug-induced hepatotoxicity is often detected after market approval. In the United States, DILI contributes to more than 50% of acute liver failure cases (data from WM Lee and colleagues from the Acute Liver Failure Study Group).
The second leading cause of withdrawal of approved drugs from the market is QT interval prolongation, which can be measured on the electrocardiogram (ECG). Some non-cardiovascular drugs (e.g. terfenadine) have the potential to delay cardiac repolarisation and to induce potentially fatal ventricular tachyarrhythmias such as Torsades de Pointes.
Drug toxicity is also a common cause of acute or chronic kidney injury and can be minimised or prevented by vigilance and early treatment. NSAIDs, aminoglycosides and calcineurin inhibitors are examples of drugs known to induce kidney dysfunction. Most events are reversible, with kidney function returning to normal when the drug is discontinued.
Consequently, the pharmaceutical industry has a strong interest in identifying drugs that carry a risk of adverse drug reactions as early as possible, in order to improve the drug development programme.
A patient developing a severe side effect to a particular medication can be considered an outlier. Suppose that you are deriving probabilities of future occurrences of severe side effects from the data collected in large clinical trials carried out before the commercialization of your product. These trials provide health authorities with evidence that your product is effective and safe, and therefore that it deserves registration or marketing authorization.
Now, say that you estimate that an event happens once every 1,000 patients treated. You will need data on many more than 1,000 patients to ascertain its frequency, say 3,000 patients. Now, what if the event happens once every 5,000 patients? The estimation of this probability requires a larger number, 15,000 or more. The smaller the probability, the more observations you need, and the greater the estimation error for a set number of observations. Therefore, to estimate a rare event you need a sample whose size grows in inverse proportion to the frequency of the event.
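The figures above are consistent with a simple calculation, sketched here under assumptions that are not stated in the text: patients are independent and each has the same event probability p. The chance of seeing no event among n patients is then (1 − p)^n, so the smallest n giving at least one observed event with 95% probability is ln(0.05) / ln(1 − p), roughly 3/p. The function names and the 95% level are illustrative choices.

```python
import math


def patients_needed(event_rate, confidence=0.95):
    """Smallest n such that P(at least one event among n patients) >= confidence.

    Assumes independent patients with an identical event probability
    (a simplifying assumption made for this illustration).
    """
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no event in n patients) = (1 - p)^n must not exceed 1 - confidence
    n = math.log(1.0 - confidence) / math.log(1.0 - event_rate)
    return math.ceil(n)


def prob_at_least_one(event_rate, n_patients):
    """Probability of observing one or more events among n_patients."""
    return 1.0 - (1.0 - event_rate) ** n_patients


if __name__ == "__main__":
    for one_in in (1_000, 5_000, 100_000):
        n = patients_needed(1.0 / one_in)
        print(f"1 event per {one_in} patients: about {n} patients needed")
```

Observing a single event is only a lower bound on what is required: estimating the frequency with a useful margin of error needs several events, and therefore an even larger sample.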
If small-probability events carry large impacts (in this example, death) and, at the same time, these small-probability events are more difficult to compute from past data, then our empirical knowledge about the potential contribution, or role, of rare events (probability × consequence) is inversely proportional to their impact. The future challenge in this setting will be to derive, from the limited amount of information available in the pre-registration phase of drug development, subtle, weak but true signals that something will go wrong after marketing approval, when the new drug will be given in the real world to hundreds of thousands of subjects, an increase of two orders of magnitude over the pre-registration experience. These real-world patients will be very different from the patients enrolled in phase 3 clinical trials, who are generally “clean patients”, i.e. with no concomitant disease, few concomitant treatments, age below a certain limit, good compliance, and so on. In the post-marketing phase, on the other hand, the new drug will be given to “dirty patients”, i.e. subjects with substantial co-morbidity, many concomitant treatments, extreme age and poor compliance (which also means taking an excess of the drug, by mistake or intentionally, in an attempt to compensate for missed doses). Some artificial adaptive systems based on a new mathematics could be able to learn, from a large phase 3 study, the hidden links between rare events and a particular patient profile, even if no patient with that profile actually exists in the data set.
There are basically two possibilities: the first is to use an “associative memory”, or autoassociative artificial neural network, able to navigate the hypersurface of a dataset in search of rare occurrences linked to a particular assembly of variables; the second is to use a pseudo-inverse function coupled with an evolutionary algorithm able to repopulate a specific probability density function with virtual records, enabling the search for rare events not available in the original data set.
2.1 Autoassociative artificial neural networks.
The NR-NN is a new recurrent network provided with a powerful new algorithm (“Re-Entry”, by the Semeion Research Centre) that can dynamically adapt its trajectory during the recall phase according to the question posed.
This new artificial neural network, by developing an associative memory, can identify the best possible connections between variables and can generate alternative data scenarios in order to follow their dynamic effects. During the training phase, the algorithm optimizes the weights of all the possible interconnections between variables in order to minimize the error. The training phase is followed by a rigorous validation protocol, which requires the correct reconstruction of variables that are randomly deleted from each record.
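The validation idea described above, reconstructing variables that have been deleted from a record, can be illustrated with a much simpler associative memory. The sketch below is a classic Hopfield-style network, not the NR-NN or the Re-Entry algorithm, whose details are not given here. Its weights are set by the Hebbian rule rather than by error minimization, variables are coded as +1 (present) or -1 (absent), a deleted variable is coded as 0, and the two stored records are hypothetical.

```python
def train_hebbian(patterns):
    """Hebbian weights for a Hopfield-style associative memory (zero diagonal)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, probe, max_steps=10):
    """Fill in deleted entries (coded 0) by repeated synchronous updates."""
    state = list(probe)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w_ij * s_j for w_ij, s_j in zip(row, state))
            # keep the current value when the net input is exactly zero
            new_state.append(1 if field > 0 else -1 if field < 0 else state[i])
        if new_state == state:  # a fixed point has been reached
            break
        state = new_state
    return state


if __name__ == "__main__":
    # Two hypothetical stored records over eight binary variables.
    record_a = [1, 1, 1, 1, -1, -1, -1, -1]
    record_b = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = train_hebbian([record_a, record_b])

    damaged = [0, 1, 1, 1, -1, 0, -1, -1]  # record_a with two variables deleted
    restored = recall(weights, damaged)
    print("restored record matches original:", restored == record_a)
```

With only two mutually orthogonal records stored, the network restores the deleted values in a single update; a realistic validation would repeat the deletion at random over every record in the data set and count how often the reconstruction is correct.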
During the querying phase of the database, the NR-NN can answer the following questions:
• prototypical question (the characteristic prototype of a patient with a particular side effect or without a particular side effect),
• virtual question (the prototypical profile of a patient having a side effect with specific characteristics, even if no subject with these variables is actually present in the data set).
These special dynamics of NR-NN allow us to distinguish 3 types of variables:
• Discriminant variables: variables that are "switched on" only for a specific prototype;
• Indifferent variables: variables that are "switched off" for both prototypes;
• Metastable variables: variables that are "switched on" for both prototypes; in other words t