MindGames: A Crowd-Sourcing Game Platform for Brain MRI Segmentation
Advances in MRI technology and image segmentation algorithms have enabled researchers to begin to understand the mechanisms of healthy brain development (Giedd 1999) and neurological disorders such as multiple sclerosis (Bakshi 2008). Due to the wide variability of brain morphology, coupled with pathological processes in the case of neurological disorders, increasingly large sample sizes are necessary to confidently answer the progressively complex biomedical questions the research community is interested in. Automated algorithms have been developed to reduce information-rich 3D MRI images to one-dimensional summary measures that describe tissue properties and are easy to interpret, such as total gray matter volume. Automated segmentation algorithms save considerable time compared to manual human inspection, but lack the advanced visual system of humans. As a result, these algorithms often make systematic errors, especially when analyzing brains with pathology or those in the early stages of development. Data science is poised to facilitate complex neuroscience research by fusing a crowdsourcing strategy with machine learning methods: automatic quantification can perform the bulk of the work efficiently, and errors can be resolved by non-expert "citizen-scientists" with the advantage of the human visual system.
Crowdsourcing has been successful in many other disciplines (Wiggins 2011), including mathematics (Cranshaw 2011), astronomy (Lintott 2008), and biochemistry (Eiben 2012). Recently, over 200,000 "citizen-neuroscientists" from over 147 countries helped identify neuronal connections in a mouse retina through the EyeWire game (Kim 2014). This crowdsourced game led to a new understanding of how mammalian retinal cells detect motion. I propose to implement three key features of the EyeWire paradigm and adapt them for the segmentation of MRI data. First, by breaking up the problem into smaller "micro-tasks", EyeWire scientists were able to access a much larger user pool of non-experts. In a similar vein, 3D MRI data can be divided into 2D slices to be segmented by users. Second, machine learning algorithms were trained to help with the task, which improved the speed of manual neuronal tracing and validated non-expert input in the EyeWire game. Deep learning methods have already been shown to be successful at segmenting MRI data, and similar models could be built to support manual segmentation. Lastly, EyeWire transformed a dull, monotonous task for experts into a fun, competitive game that trained non-experts and acquired valuable scientific data. The University of Washington is an ideal place to develop a similar game platform for MRI segmentation, using the resources at the Center for Game Science, led by Zoran Popovic. I propose to create an open-source platform for efficiently crowdsourcing brain tissue classification problems in order to answer neuroscience research questions with more precision.
Scalable and Secure Micro-Tasks: A scalable database system and server backend that keeps data private by dividing it into small "micro-tasks"
Learning by Example: A machine learning algorithm that learns from human curation to improve the efficiency of manual tasks
Training through Gamification: A user interface that trains users to solve a specific problem and keeps them engaged through a reward system
This Aim will address two key challenges: 1) partitioning 3D data into micro-tasks that keep data private, and 2) serving micro-tasks at scale. While there are many large-scale open-source data collection efforts, many datasets are kept private within research institutions due to IRB restrictions, so presenting a full 3D MRI volume to the public would be a violation. Serving smaller "chunks" of data serves two purposes: it keeps data private (because users cannot see the whole brain), and it reduces the fatigue of non-experts (because each user only needs to edit a small section), which enables us to engage a larger user base. A scalable server will be implemented on a commercial cloud computing platform, with an API that allows researchers to upload MRI micro-tasks to the server database and serves micro-tasks to users. Researchers will be asked to provide the following to the API: 1) an initial segmentation file from an automated algorithm, 2) any original images (T1, T2, PD) that users need to properly edit the segmentation, and 3) directions on how the images should be sliced into micro-tasks (including the slicing plane and the number of slices). Additionally, researchers must provide a validation dataset of "correctly" segmented images, which will be used to train non-experts in Aim 3. The resources and faculty at the eScience Institute will help me implement state-of-the-art database and cloud computing technologies in order to increase the delivery of micro-tasks to "citizen-scientists."
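The slicing step above could be sketched as follows. This is a minimal illustration, not the proposed API: the function name `make_micro_tasks`, its parameters, and the toy volume are all assumptions standing in for whatever the researcher specifies through the upload interface.

```python
import numpy as np

def make_micro_tasks(volume, axis=2, n_slices=None):
    """Split a 3D segmentation volume into 2D slice micro-tasks.

    volume   : 3D numpy array (e.g., data loaded from a NIfTI file)
    axis     : slicing plane (0, 1, or 2), per the researcher's directions
    n_slices : optionally keep only the central n_slices slices
    """
    slices = np.moveaxis(volume, axis, 0)  # shape: (n, H, W)
    if n_slices is not None:
        mid = slices.shape[0] // 2
        half = n_slices // 2
        slices = slices[mid - half:mid - half + n_slices]
    # Each micro-task carries only one 2D slice plus an index for
    # reassembly; no single task reveals the whole brain.
    return [{"slice_index": i, "data": s} for i, s in enumerate(slices)]

# Example: a toy 64x64x32 "volume" sliced along the third axis
vol = np.zeros((64, 64, 32))
tasks = make_micro_tasks(vol, axis=2)
print(len(tasks))  # 32 micro-tasks, one per slice
```

In practice each micro-task would also reference the corresponding slices of the original T1/T2/PD images, so users see the anatomy they are editing.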
This Aim will address three challenges: 1) resolving user input to create a final 3D volume, 2) prioritizing which micro-tasks to serve based on user consensus, and 3) predicting the user-edited segmentation image. To reconstruct the micro-tasks back into a 3D image, a weighted consensus map will be computed, based on how accurately each user performed edits on training data. Micro-tasks with lower consensus scores will be served more frequently to users, until the consensus is high. Participants will also be scored based on how well their segmentations match those of other users on the same image, and this will be used to reward users in Aim 3. Finally, improving automated segmentation algorithms based on human input will save time and reduce the number of editors assigned to each micro-task. For example, a dataset of 100 3D volumes could be broken into 20,000 patches, each of which would need to be manually edited. Alternatively, convolutional neural networks (CNNs) have been very successful at pattern recognition when trained on similarly large sample sizes, and could reduce the time spent editing each patch. I propose to build a CNN using an existing framework, such as TensorFlow or Theano, to predict segmentation results, under the guidance of the machine learning experts at the eScience Institute.
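The weighted consensus map described above could take a form like the sketch below. The function `weighted_consensus` and its thresholding rule are illustrative assumptions; the actual weighting scheme would be tuned against the researchers' validation data.

```python
import numpy as np

def weighted_consensus(masks, weights, threshold=0.5):
    """Combine binary 2D segmentation edits from several users.

    masks   : list of binary arrays of identical shape, one per user
    weights : per-user accuracy scores from the training data
    Returns the consensus mask and a per-pixel agreement map; slices
    with low agreement would be re-served to additional users.
    """
    masks = np.stack(masks).astype(float)  # (n_users, H, W)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize user weights
    vote = np.tensordot(w, masks, axes=1)  # weighted mean in [0, 1]
    consensus = vote >= threshold
    # Agreement is highest where the weighted vote is near 0 or 1
    agreement = 1.0 - 2.0 * np.minimum(vote, 1.0 - vote)
    return consensus, agreement

# Three users edit the same 2x2 patch; the first user has the best
# training-data accuracy, so their vote counts the most.
users = [np.array([[1, 0], [1, 1]]),
         np.array([[1, 0], [0, 1]]),
         np.array([[1, 1], [0, 1]])]
consensus, agreement = weighted_consensus(users, weights=[0.9, 0.5, 0.6])
```

Stacking the consensus slices back along the original slicing axis then yields the final 3D volume.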
For individuals with minimal neuroanatomy knowledge, the difficulty of manual neuroimaging segmentation will depend on the contrast of the image as well as the location and complexity of the target structure. An example of an easy task would be the segmentation of brain tissue from non-brain tissue, whereas a more difficult task would be the segmentation of multiple sclerosis lesions. This Aim will address simple as well as challenging problems through varying levels of training and rewards. A web application will be developed that connects to the server developed in Aim 1. The app will include an in-browser brain editor (similar to the Mindcontrol application (Keshavan 2016)), a reward structure and a scoreboard for the top users, and an optional link to Amazon Mechanical Turk, where users can be paid (in micro-payments) for completing micro-tasks. Initially, the user will only be presented with training tasks until they reach an adequate accuracy score. Next, the training tasks will be interspersed with new tasks in order to detect performance drift. The frequency of training tasks will increase based on the researcher's specification of task difficulty. The reward structure will be based on 1) how well the user edits training data, 2) how well the user's segmentations match those of other users, and 3) how many voxels are edited by the user. The time spent on the task, along with the number of edited voxels, will also be used to validate whether or not the user completed the task with some thought. For example, a user's score would be penalized if a large number of voxels were edited too quickly for a difficult task. I plan to collaborate with the data scientists at the eScience Institute to build an intuitive and engaging crowdsourcing user interface on the Amazon Mechanical Turk platform.
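The three reward criteria and the speed penalty could be combined as in the sketch below. The function `score_micro_task`, its point weights, and the `min_sec_per_voxel` plausibility threshold are all hypothetical values for illustration; the real reward structure would be calibrated during playtesting.

```python
def score_micro_task(training_accuracy, peer_agreement, n_voxels_edited,
                     seconds_spent, difficulty, min_sec_per_voxel=0.05):
    """Hypothetical reward score combining the three criteria in the text.

    training_accuracy : 0..1, match against the "correct" training images
    peer_agreement    : 0..1, match against other users on the same slice
    n_voxels_edited   : voxel count of the user's edit (capped below)
    seconds_spent     : time on task, used as a plausibility check
    difficulty        : researcher-specified, >= 1; harder tasks need more time
    """
    base = (50 * training_accuracy          # criterion 1: training accuracy
            + 30 * peer_agreement           # criterion 2: peer agreement
            + min(n_voxels_edited, 200) / 10)  # criterion 3: edit volume
    # Penalize implausibly fast edits: many voxels changed in little time
    # on a difficult task suggests careless clicking.
    expected = n_voxels_edited * min_sec_per_voxel * difficulty
    if seconds_spent < expected:
        base *= seconds_spent / expected
    return round(base, 1)

# A careful edit keeps its full score; the same edit done in 5 seconds
# on a difficulty-2 task is penalized.
careful = score_micro_task(0.9, 0.8, 100, seconds_spent=60, difficulty=2)
rushed = score_micro_task(0.9, 0.8, 100, seconds_spent=5, difficulty=2)
```

Capping the voxel term keeps users from inflating their score by editing indiscriminately, which would otherwise reward exactly the behavior the time check is meant to catch.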