Big Data promises to advance science through data-driven discovery. However, many standard lab protocols rely on manual examination, which is not feasible for large-scale datasets. Meanwhile, automated approaches lack the accuracy of expert examination. We propose to 1) start with expertly labeled data, 2) amplify labels through web applications that engage citizen scientists, and 3) train machine learning on amplified labels, to emulate the experts. Demonstrating this, we developed a system to quality control brain magnetic resonance images. Expert-labeled data were amplified by citizen scientists through a simple web interface. A deep learning algorithm was then trained to predict data quality, based on citizen scientist labels. Deep learning performed as well as specialized algorithms for quality control (AUC=0.99). Combining citizen science and deep learning can generalize and scale expert decision making; this is particularly important in disciplines where specialized, automated tools do not yet exist.
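As a hedged illustration of steps 2 and 3 above, the sketch below (all names and values are hypothetical, not taken from the study) aggregates citizen-scientist pass/fail votes into a soft label per image and scores a classifier's predictions with ROC AUC using the rank-sum (Mann-Whitney) formulation:

```python
# Minimal sketch, assuming binary pass/fail votes per image.
# Function names and the voting scheme are illustrative only.

def amplify_labels(votes):
    """Turn per-image votes (1 = pass, 0 = fail) into a soft label in [0, 1]."""
    return sum(votes) / len(votes)

def roc_auc(labels, scores):
    """ROC AUC via the rank-sum formulation: the probability that a random
    positive example is scored above a random negative example (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `amplify_labels([1, 1, 0, 1])` yields a soft label of 0.75, and a classifier whose scores perfectly separate passes from fails attains an AUC of 1.0.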
OVERVIEW
There is a disconnect between clinical disability in multiple sclerosis (MS) and the structural damage seen on MRI, called the clinico-radiological paradox. Even though focal white matter lesions seen on MRI largely characterize multiple sclerosis, lesion volumes are not strongly correlated with clinical motor disability. Possible explanations for this paradox include lesion location and gray matter atrophy; however, their correlations with disability are modest (r=0.3). Another hypothesis is that functional adaptation plays a role: brains adapt to the damage caused by MS in order to minimize disability. My preliminary results have shown that changes in functional MRI network connections correlate with performance on a complex motor dexterity task, even after accounting for structural damage. However, poor performance on a complex motor task may not be attributable to motor network damage and reorganization alone; for example, damage to the visual pathway involved in a complex task may confound results. Therefore, I propose to study how performance on simpler motor tasks relates to changes in functional network connectivity, and to develop a functional biomarker that predicts motor performance. I intend to measure the central motor conduction time (CMCT), which is sensitive to corticospinal tract damage, by recording motor evoked potentials (MEPs) using transcranial magnetic stimulation (TMS). Additionally, finger tapping speed (FT), which has been shown to be more impaired in MS patients than measures of manual dexterity, will be collected. Functional biomarkers will be developed using a traditional, hypothesis-driven approach, followed by a dynamic functional network analysis focused on the posteromedial cortex (PMC). Features of the functional network will be extracted based on CMCT and FT.
This will result in a biomarker that reflects the ability of a subject to functionally adapt to MS-related damage to the motor system, which could lead to personalized medical treatment of their disease.
Specific Aim 1: Develop an fMRI metric that relates to CMCT and FT using a hypothesis-driven analysis.
Specific Aim 2: Improve on the prediction of simple and complex motor tasks by developing an fMRI metric based on dynamic functional connectivity of the PMC.
Tissue classification plays a crucial role in the investigation of normal neural development, brain-behavior relationships, and the disease mechanisms of many psychiatric and neurological illnesses. Ensuring the accuracy of tissue classification is important for quality research and, in particular, for the translation of imaging biomarkers to clinical practice. Assessment with the human eye is vital to correct various errors inherent to all currently available segmentation algorithms. Manual quality assurance becomes methodologically difficult at a large scale, a problem of increasing importance as the number of datasets continues to rise. To make this process more efficient, we have developed Mindcontrol, an open-source web application for the collaborative quality control of neuroimaging processing outputs. The Mindcontrol platform consists of a dashboard to organize data, descriptive visualizations to explore the data, an imaging viewer, and an in-browser annotation and editing toolbox for data curation and quality control. Mindcontrol is flexible and can be configured for the outputs of any software package in any data organization structure. Example configurations for three large, open datasets are presented: the 1000 Functional Connectomes Project (FCP), the Consortium for Reliability and Reproducibility (CoRR), and the Autism Brain Imaging Data Exchange (ABIDE) Collection. These demo applications link descriptive quality control metrics, regional brain volumes, and thickness scalars to a 3D imaging viewer and editing module, resulting in an easy-to-implement quality control protocol that can be scaled for any size and complexity of study.
Rationale
Advances in MRI technology and image segmentation algorithms have enabled researchers to begin to understand the mechanisms of healthy brain development and of neurological disorders such as multiple sclerosis. Due to the wide variability of brain morphology, coupled with pathological processes in the case of neurological disorders, increasingly large sample sizes are necessary to confidently answer the progressively complex biomedical questions the research community is interested in. Automated algorithms have been developed to reduce information-rich 3D MR images to one-dimensional summary measures that describe tissue properties and are easy to interpret, such as total gray matter volume. Automated segmentation algorithms save considerable time compared to manual human inspection, but they lack the advanced visual system of humans. As a result, these algorithms often make systematic errors, especially when analyzing brains with pathology or those in the early stages of development. Data science is poised to facilitate complex neuroscience research by fusing a crowdsourcing strategy with machine learning methods: automatic quantification can perform the bulk of the work efficiently, and errors can be resolved by non-expert “citizen scientists” with the advantage of the human visual system. Crowdsourcing has been successful in many other disciplines, including mathematics, astronomy, and biochemistry. Recently, over 200,000 “citizen neuroscientists” from over 147 countries helped identify neuronal connections in a mouse retina through the Eyewire game. This crowdsourced game led to a new understanding of how mammalian retinal cells detect motion. I propose to implement three key features of the Eyewire paradigm and adapt them for the segmentation of MRI data. First, by breaking the problem up into smaller “micro-tasks”, Eyewire was able to draw on a much larger pool of non-expert users.
In a similar vein, 3D MRI data can be divided into 2D slices to be segmented by users. Second, machine learning algorithms were trained to help with the task, which improved the speed of manual neuronal tracing and validated non-expert input in the Eyewire game. Deep learning methods have already been shown to be successful at segmenting MRI data, and similar models could be built to support manual segmentation. Lastly, Eyewire transformed a dull, monotonous task for experts into a fun, competitive game that trained non-experts and acquired valuable scientific data. The University of Washington is an ideal place to develop a similar game platform for MRI segmentation, using the resources at the Center for Game Science, led by Zoran Popovic. I propose to create an open-source platform for efficiently crowdsourcing brain tissue classification problems in order to answer neuroscience research questions with more precision.
Specific Aims
1. SCALABLE AND SECURE MICRO-TASKS: A scalable database system and server backend that keeps data private by dividing it into small “micro-tasks”
2. LEARNING BY EXAMPLE: A machine learning algorithm that learns from human curation to improve the efficiency of manual tasks
3. TRAINING THROUGH GAMIFICATION: A user interface that trains users to solve a specific problem and keeps them engaged through a reward system
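The slice-based micro-task idea described above can be sketched in a few lines. This is a minimal illustration only; the task structure, field names, and axis choice are hypothetical, not a specification of the proposed platform:

```python
# Hedged sketch: split a 3D volume into independent 2D slice tasks
# (here along the first, "axial" axis) so each slice can be served to a
# non-expert as its own micro-task. The dict layout is illustrative.

def make_microtasks(volume, subject_id):
    """volume: 3D nested list indexed [z][y][x]; returns one task per slice."""
    return [
        {"subject": subject_id, "slice_index": z, "data": plane}
        for z, plane in enumerate(volume)
    ]

# Usage: a toy 3-slice volume yields three independent tasks.
toy_volume = [[[0] * 4 for _ in range(4)] for _ in range(3)]
tasks = make_microtasks(toy_volume, "sub-01")
```

Because each task carries only one slice and an opaque subject identifier, no single worker sees the full 3D dataset, which is one way the privacy goal of Aim 1 could be approached.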
We would like to thank the reviewers for their insightful comments. The major points that have been addressed are as follows:
1. It was not our intention to give the impression that one needs to scan human calibration phantoms at each site, which is very costly, in order to properly power a multisite study with nonstandardized parameters. Instead, we have emphasized the statistical model that takes MRI bias into account. The bias that was measured and validated via calibration served to corroborate the scaling assumption of the statistical model. For other researchers planning multisite studies, the statistical model we proposed, together with the biases we reported, should help plan and power a study.
2. Our measurements have been compared with other harmonization efforts, specifically and .
3. The scanning parameters of our consortium have been specified in more detail.
4. The independence assumption between the unobserved effect and the scaling factor for a particular site has been addressed. Specifically, we emphasized that this assumption could hold for MS patients based on our experiment. We recommended that this assumption be validated for other situations by scanning human phantoms, and the equation for the variance without the independence assumption has been provided for readers.
REVIEWER 1
- I am pleased with the changes in the paper; however, I would like a more direct answer to the previously stated question on repositioning consistency, since gradient-nonlinearity-induced volume changes depend on positioning inside the scanner. The previously stated question was:
- It is mentioned in the Methods section that the repositioning consistency of each site’s scanning procedure was captured; however, there is no metric in the Results section focusing explicitly on that (i.e., the consistency of subject positioning). It would be important to compare it to the consistency of subject positioning in Caramanos 2010, since it will affect the assumption that gradient-distortion-caused variations can be corrected with a scaling factor derived from a different acquisition.
We would like to thank Reviewer 1 for the feedback and will address this concern by reporting the consistency of Z-positioning in relation to . In , researchers found that variations in z-position significantly affected the percent brain volume change (PBVC) measurements of a longitudinal SIENA pipeline. Specifically, they compared results of “as accurate as possible” repositioning with a 50 mm displacement repositioning, and found significant differences between the measurements. When comparing the “as accurate as possible” repositioning to the phantom-corrected result, the absolute error was much lower than that of the 50 mm displacement. They calculated an average Z-displacement of 4.3 mm (-9.0 to 21.1). We ran rigid-body registration to calculate the Z-translations for each subject at each site in our dataset. Overall, our average absolute Z-displacement across all sites was 3.5 mm ± 3.7 mm, which falls within the range of the “as accurate as possible” repositioning from . The average Z-shift for each site separately is provided in the supplemental materials.
The following text was added to the Methods: _“By repositioning in our study, a realistic measure of test-retest variability, which includes the repositioning consistency of each site’s scanning procedure, was captured. Because gradient distortion effects correspond to differences in z-positioning , the average translation in the Z-direction between the two runs of each subject at each site was estimated with a rigid body registration.”_ And in the Results we report: _“In addition, the average translation in the Z-direction across all sites was 3.5 mm ± 3.7 mm, which falls within the accuracy range reported by . The repositioning Z-translation measurements for each site separately are reported in the supplemental materials.”_
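The summary statistic reported above (mean ± SD of absolute Z-displacement) can be computed as follows. This is a hedged sketch of the arithmetic only; the function name and the sample Z-translations are hypothetical, and the population standard deviation is used as a simple summary (the paper does not state which SD convention was used):

```python
# Hedged sketch: summarize per-scan Z-translations (in mm) from
# rigid-body registration as mean ± SD of the absolute displacement.
# Input values below are illustrative, not the study's data.

def abs_z_summary(z_shifts_mm):
    """Return (mean, population SD) of absolute Z-displacements."""
    abs_z = [abs(z) for z in z_shifts_mm]
    n = len(abs_z)
    mean = sum(abs_z) / n
    variance = sum((a - mean) ** 2 for a in abs_z) / n
    return mean, variance ** 0.5

# Usage with made-up translations (mm), signed toward/away from isocenter:
mean_mm, sd_mm = abs_z_summary([1.0, -3.0, 2.5, -0.5])
```

A result would then be reported in the same form as the manuscript text, e.g. "mean ± SD mm".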