Abstract

A recent editorial, “Giving Software its Due,”1 described challenges related to the development of research software and highlighted, in particular, the challenge of software publication and citation. Here, we call attention to a system that we have developed enabling community-driven software review, publication, and citation. The Journal of Open Source Software (JOSS) is an open-source project and an open-access journal providing a lightweight publishing process for research software. Built on open platforms and sustained by a community of contributors, JOSS satisfies a pressing need, as evidenced by its publication of more than 500 articles in approximately three years of existence.
Big Data promises to advance science through data-driven discovery. However, many standard lab protocols rely on manual examination, which is not feasible for large-scale datasets, while automated approaches lack the accuracy of expert examination. We propose to 1) start with expertly labeled data, 2) amplify the labels through web applications that engage citizen scientists, and 3) train machine learning models on the amplified labels to emulate the experts. To demonstrate this approach, we developed a system for quality control of brain magnetic resonance images. Expert-labeled data were amplified by citizen scientists through a simple web interface. A deep learning algorithm was then trained to predict data quality based on the citizen scientist labels. The deep learning network performed as well as specialized quality-control algorithms (AUC = 0.99). Combining citizen science and deep learning can generalize and scale expert decision making; this is particularly important in disciplines where specialized, automated tools do not yet exist.
Big Data in neuroimaging holds promise for answering important questions about the brain. However, many standard lab protocols, which rely on experts examining each sample, do not work with large-scale datasets: they are difficult to scale, and automated approaches lack the accuracy of highly trained scientists. Our proposed solution is to 1) start with a small, expertly labelled dataset, 2) amplify the labels through citizen science via web-based tools, and 3) train machine learning models on the amplified labels to emulate expert decision making. As a proof of concept, we developed a system for quality control of more than 700 T1-weighted images from the Healthy Brain Network. An initial expertly labelled dataset (of 200 images) was amplified by citizen scientists to the entire dataset (724 images) with over 60,000 ratings collected through a simple web interface. A deep learning algorithm was trained, on a subset of the data, to predict data quality from the aggregate citizen scientist labels. In an ROC analysis on left-out test data, the deep learning network performed as well as a state-of-the-art, specialized algorithm (MRIQC) for T1-weighted images, each with an area under the curve of 0.99. We therefore assert that combining citizen science and deep learning can generalize and scale neuroimaging expert decision making; this is particularly important in cases where specialized, automated tools do not already exist. Finally, as a specific practical application of the method, we explore how brain image quality relates to the replicability of a well-established relationship between brain volumes and age over development.
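The pipeline above has two steps that can be sketched concisely: aggregating many citizen-scientist pass/fail ratings into one label per image, and scoring a classifier against those labels with ROC AUC. The sketch below is a minimal, hypothetical illustration under stated assumptions (binary ratings, mean aggregation, a rank-sum AUC); it does not reproduce the paper's actual web interface, deep network, or MRIQC comparison.

```python
# Hypothetical sketch: amplify expert labels via citizen-scientist votes,
# then evaluate classifier scores with ROC AUC. Names and data are invented.
from collections import defaultdict

def aggregate_ratings(ratings):
    """Average binary pass(1)/fail(0) votes per image into a soft label."""
    votes = defaultdict(list)
    for image_id, rating in ratings:
        votes[image_id].append(rating)
    return {img: sum(v) / len(v) for img, v in votes.items()}

def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) statistic: the probability
    that a random positive scores higher than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three raters per image.
ratings = [("img1", 1), ("img1", 1), ("img1", 0),
           ("img2", 0), ("img2", 0), ("img2", 1)]
agg = aggregate_ratings(ratings)            # img1 -> 0.667, img2 -> 0.333
labels = [1 if agg[i] >= 0.5 else 0 for i in ("img1", "img2")]

# Hypothetical classifier scores for the two images.
scores = [0.9, 0.2]
print(roc_auc(scores, labels))              # 1.0 on this toy data
```

Mean aggregation is the simplest choice; in practice, rater-weighting or probabilistic consensus models can be substituted without changing the evaluation step.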