Collaborative Data Science Proposal - The Faces of Microbial Communities


A single microbial community can comprise many thousands of species, and the tools most commonly used to visualize the relative abundances of species in communities (pie charts and stacked bar graphs) are inadequate. The human brain is not adept at estimating the areas of wedges in a pie or of rectangles in a bar. Even if it were, the color palette and graph size required to faithfully represent the relative abundances of thousands of species in even a single community would be prohibitively large.

There is a great need for more intuitive visualization tools, especially for comparing microbial community composition across large numbers of samples. Fortunately, human evolution, via natural selection, has already engineered a solution to this problem. The human brain has a region, the fusiform face area, that is specialized for recognizing faces. This region allows us to process a very complex image in an instant, with minimal decomposition into component parts; instead, faces are perceived holistically, as a gestalt. Faces vary endlessly, and we quickly pick up on even very subtle differences and similarities between them.

Preliminary Work and Potential Directions

MakeHuman is a computer graphics tool for creating 3D models of human bodies; it is used, for example, to generate character models for video games. MakeHuman is open-source software written in Python, and the models it produces are released to the public domain under CC0. Its GUI has sliders that let the user vary a large number of facial parameters. As a proof of concept, I hardcoded a few MakeHuman parameter files to produce facial models representing microbial community data. I chose 8 facial parameters that are easy to pick out (see Figure 1). I used some of my own data, from three marine sediment communities and three water column communities (see Figure 2). I used the scaled relative abundances of the 8 organisms that varied most between the two sample types to define the 8 facial parameters (see Figure 3).
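The selection and scaling step described above could be automated. The following is a minimal Python sketch, not the procedure used for the figures: it assumes "most variable" means the highest variance in relative abundance across all samples, and it min-max scales each selected organism across samples onto a slider range assumed to run from -1 to 1. The function names and toy values are hypothetical.

```python
from statistics import pvariance


def most_variable_taxa(rel_abund, k=8):
    """Return the k taxa whose relative abundance varies most across samples.

    rel_abund maps sample ID -> {taxon: relative abundance}.
    Assumption: "most variable" = largest population variance across all samples.
    """
    taxa = sorted({t for profile in rel_abund.values() for t in profile})
    variances = {
        t: pvariance([profile.get(t, 0.0) for profile in rel_abund.values()])
        for t in taxa
    }
    # Descending variance; ties broken alphabetically for reproducibility.
    return sorted(taxa, key=lambda t: (-variances[t], t))[:k]


def scale_to_slider(rel_abund, taxa, lo=-1.0, hi=1.0):
    """Min-max scale each taxon across samples onto the range [lo, hi].

    Assumption: facial sliders accept values in [lo, hi]. A taxon whose
    abundance is constant across samples maps to the midpoint of the range.
    """
    scaled = {sample: {} for sample in rel_abund}
    for t in taxa:
        values = [profile.get(t, 0.0) for profile in rel_abund.values()]
        vmin, vmax = min(values), max(values)
        for sample, profile in rel_abund.items():
            if vmax == vmin:
                scaled[sample][t] = (lo + hi) / 2
            else:
                frac = (profile.get(t, 0.0) - vmin) / (vmax - vmin)
                scaled[sample][t] = lo + frac * (hi - lo)
    return scaled


if __name__ == "__main__":
    # Toy values for illustration only, not real community data.
    data = {
        "sed1": {"taxonA": 0.50, "taxonB": 0.30, "taxonC": 0.20},
        "wat1": {"taxonA": 0.10, "taxonB": 0.30, "taxonC": 0.60},
    }
    top = most_variable_taxa(data, k=2)
    print(top, scale_to_slider(data, top))
```

Other definitions of "most variable between the two sample types", such as the largest difference in group means, would only change the sort key in most_variable_taxa.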

I think the results are pretty interesting! You can easily pick out a facial feature and see how it maps back to the bar chart. More interesting still is how easy it is to see and remember the abundance patterns that distinguish the sample types. For example, arched eyebrows and a down-turned mouth are clearly characteristic of the water samples; if a new sample face with those qualities comes along, you could classify it at a glance as a water sample, without referring back to the original lineup of faces. With a new stacked bar, that would not be possible without a photographic memory. Imagine visualizing time-course data, where facial features gradually change over time until they look nothing like the original, or converge on a similar structure. Or imagine scanning a page of 100 faces, pulling out the ones with a big left eye and a smile, and then asking what metadata features they share. Finally, I can imagine working with an additional tool, Blender, which would allow rich customization of auxiliary features such as clothing and hairstyles; these could be used to visualize sample metadata alongside facial structure. Every time I pitch this idea to someone, a new possibility emerges.
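The "page of 100 faces" query amounts to filtering samples by their facial parameters and then tallying the metadata of the matches. A minimal sketch, assuming each face is stored as a dictionary of slider values; the feature names left_eye_size and mouth_smile, the 0.5 thresholds, the habitat field, and the toy values are hypothetical placeholders, not MakeHuman parameter names.

```python
from collections import Counter


def select_faces(faces, criteria):
    """Return sample IDs whose facial parameters meet every minimum in criteria.

    faces: sample ID -> {feature name: slider value}
    criteria: feature name -> minimum slider value (hypothetical threshold rule)
    """
    return [
        sample
        for sample, params in faces.items()
        if all(params.get(feature, float("-inf")) >= minimum
               for feature, minimum in criteria.items())
    ]


def shared_metadata(samples, metadata, field):
    """Count how often each value of one metadata field occurs among samples."""
    return Counter(metadata[s][field] for s in samples if s in metadata)


if __name__ == "__main__":
    # Toy faces and metadata for illustration only.
    faces = {
        "s1": {"left_eye_size": 0.8, "mouth_smile": 0.6},
        "s2": {"left_eye_size": 0.9, "mouth_smile": -0.4},
        "s3": {"left_eye_size": 0.7, "mouth_smile": 0.9},
    }
    metadata = {"s1": {"habitat": "sediment"}, "s2": {"habitat": "water"},
                "s3": {"habitat": "sediment"}}
    hits = select_faces(faces, {"left_eye_size": 0.5, "mouth_smile": 0.5})
    print(hits, shared_metadata(hits, metadata, "habitat"))
```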

Why do I need the DSI's help?

I do not have the programming chops to even feed the abundance data into a parameter file. I also have no idea how to produce many faces and have them all displayed simultaneously. I certainly don't have the skills to build a GUI that would allow me to group and sort faces or to adjust which taxa are represented by various facial features. I'd like to be able to write a grant proposal to fund developing a tool that could be used by others, but I need a collaborator who could do the coding.
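As a starting point for a collaborator, feeding abundance-derived values into parameter files might look like the sketch below, which writes one file per sample so that many faces can be generated in a single run. It assumes MakeHuman's plain-text .mhm format stores each modifier on its own line as "modifier <name> <value>"; that assumption and the modifier name used here are hypothetical and should be checked against a file saved from the MakeHuman GUI. The function names and file naming scheme are also hypothetical.

```python
import tempfile
from pathlib import Path


def _is_managed(line, managed):
    """True if line is a modifier line for one of the modifiers we set."""
    parts = line.split()
    return len(parts) >= 2 and parts[0] == "modifier" and parts[1] in managed


def write_face_files(faces, feature_to_modifier, out_dir, template_lines=()):
    """Write one MakeHuman-style parameter file per sample; return the paths.

    faces: sample ID -> {feature: slider value}
    feature_to_modifier: feature -> modifier name (hypothetical; copy real
        names from a file saved by the MakeHuman GUI)
    template_lines: optional lines from a saved file, used as a starting point
    Assumption: each modifier is stored as "modifier <name> <value>".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    managed = set(feature_to_modifier.values())
    paths = []
    for sample, params in sorted(faces.items()):
        # Keep template lines, except modifiers this function sets itself.
        lines = [ln for ln in template_lines if not _is_managed(ln, managed)]
        for feature, modifier in feature_to_modifier.items():
            # Features missing for a sample fall back to 0.0 (assumed neutral).
            lines.append(f"modifier {modifier} {params.get(feature, 0.0):.6f}")
        path = out / f"{sample}.mhm"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    faces = {"sed1": {"smile": 0.5}, "wat1": {"smile": -0.5}}
    mapping = {"smile": "mouth/hypothetical-smile"}  # hypothetical modifier name
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_face_files(faces, mapping, tmp):
            print(p.name, p.read_text().strip())
```

Displaying the resulting faces side by side is a separate problem that this sketch does not address.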

Goals and Timeline

This project would involve building a tool that allows the user to 1) upload microbial community data: a samples × species matrix populated with species abundances, plus a text file mapping metadata to samples; 2) transform the species abundance data into facial parameters; and 3) tweak some parameters on the fly.
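Step 1 might begin with something like the following minimal sketch, which reads a tab-delimited samples × species table and converts raw abundances to relative abundances. The file layout (a header row of species names, then one row per sample) is an assumption, since the proposal does not fix a format; the function names and toy values are hypothetical.

```python
import csv
import io


def read_abundance_table(handle):
    """Read a tab-delimited samples x species table into nested dicts.

    Assumed layout: the header row is a sample-ID column name followed by
    species names; each later row is a sample ID followed by its abundances.
    """
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample, values = row[0], [float(v) for v in row[1:]]
        table[sample] = dict(zip(species, values))
    return table


def to_relative_abundance(table):
    """Rescale each sample's abundances so they sum to 1."""
    rel = {}
    for sample, profile in table.items():
        total = sum(profile.values())
        rel[sample] = {sp: (v / total if total else 0.0) for sp, v in profile.items()}
    return rel


if __name__ == "__main__":
    # Toy counts for illustration only.
    demo = io.StringIO("sample\tspA\tspB\nsed1\t30\t10\nwat1\t5\t15\n")
    print(to_relative_abundance(read_abundance_table(demo)))
```

When reading from disk, open the file with newline="" as the csv module documentation recommends.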

I expect that the minimal goal, allowing me to load abundance data and produce faces, could be achieved within a single quarter. That progress might be enough to support a competitive grant proposal. I have no idea how long it would take to develop a fully functional tool with a GUI.

Required Data and Tools

User input data are in the form of 1) a samples × species abundance matrix and 2) a samples × metadata matrix.
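Because the two inputs are separate files, a tool would need to confirm that they describe the same samples before faces are labeled with metadata. A minimal sketch, assuming the metadata file is tab-delimited with a header row and sample IDs in its first column; the function names and toy values are hypothetical.

```python
import csv
import io


def read_metadata(handle):
    """Read a tab-delimited samples x metadata table keyed by its first column."""
    reader = csv.DictReader(handle, delimiter="\t")
    key = reader.fieldnames[0]
    return {row[key]: {k: v for k, v in row.items() if k != key} for row in reader}


def check_sample_ids(abundance_ids, metadata_ids):
    """Report sample IDs that appear in one input but not the other."""
    abundance_ids, metadata_ids = set(abundance_ids), set(metadata_ids)
    return {
        "missing_metadata": sorted(abundance_ids - metadata_ids),
        "missing_abundance": sorted(metadata_ids - abundance_ids),
    }


if __name__ == "__main__":
    # Toy metadata for illustration only.
    meta = read_metadata(io.StringIO("sample\thabitat\nsed1\tsediment\nwat1\twater\n"))
    print(check_sample_ids(["sed1", "wat1", "wat2"], meta))
```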

MakeHuman is open-source software written in Python, and the models it produces are released to the public domain under CC0. Blender is "cross platform, with an OpenGL GUI that is uniform on all major platforms (and customizable with Python scripts)."