Figure 2: Strategies used to find optimal environmental
conditions. (a) Schematic showing five environmental
components that can be combined (1-present, 0-absent) to design an
optimal environment that maximizes a target microbial community
function. (b) Map between environmental composition and
function. The diagram shows all 32 (2^5) combinations
that can be constructed from five environmental components. Color
gradient from white to purple indicates increasing function level. (c-f) Different strategies used to design the
environment that maximizes a function of interest. (c) Genetic
Algorithms (GA) begin by empirically quantifying the function
(‘fitness’) in a subset of environments selected at random. Environments
are then ranked by decreasing function. Environments mapping to the highest
function values are used to design a new ‘population’ of environments.
Top environments are carried over to the new population and more
variation is added by recombining and mutating those top environments. A
new round of function quantification and selection of top environments
begins. The whole process can be repeated until the function no longer
increases or for a predefined number of generations. (d) In fractional factorial design, a carefully selected subset of
environments is used to quantify function. The results are then used to
build regression models which allow predicting the function of
out-of-sample environments. (e) Full factorial design consists
of quantifying function in every possible environmental combination,
therefore revealing the full composition-function map. This method is
restricted to environments made up of few components as the number of
environments increases exponentially with the number of environmental
components. (f) Patterns of global epistasis observed in
genetics allow predicting the effect a mutation has on organismal
fitness, knowing the fitness of the background where the mutation
occurred (see Diaz-Colunga et al.,
2023; Johnson et al., 2023, and references therein). An ecological
parallel to global epistasis has also been demonstrated, where the
effect of a species addition on a community-level function is
predictable from the function of the original community without the
species (Diaz-Colunga
et al., 2023; Ruiz et al., 2023; Sanchez et al., 2023). Here we propose
that global patterns akin to global epistasis may also allow us to
predict the change in function induced by the addition of a new
component to an environment. If such patterns exist, they would
provide a new method for predicting function.
Genetic algorithms can find optimal environments that
maximize desirable traits of clonal and multi-genotype communities.
One approach to identify optimal combinations of environmental factors
is the use of genetic algorithms (GA) to explore their combinatorial
functional landscape (Fig. 2c)
(Kucharzyk et al.,
2012; Pacheco et al., 2021; Vandecasteele et al., 2008). As an example,
Kucharzyk et al. demonstrated the utility of a generational GA for
identifying optimal environmental conditions for the degradation of
perchlorate both in enrichment communities and in pure cultures of Dechlorosoma sp. strains
(Kucharzyk et al.,
2012). In this study, the authors formed a multidimensional functional
landscape including environmental variables such as the pH, salinity
([KCl]), buffering capacity ([NaH2PO4] and [NaHCO3]),
concentration of electron donor (acetate) and electron acceptor (the
perchlorate itself), and the concentrations of other microbial nutrients
such as vitamins and trace minerals. Each environment was represented by a
nine-entry string (vector), with each entry holding the value of one
variable, and the fitness of each environment (i.e. its
probability of being selected for reproduction in the next generation) was
given by the perchlorate degradation rate, which was determined
empirically. The strings that made up the next generation were derived
by recombination (via uniform crossover with probability 0.5) from those
that were selected. In the daughter strings, each environmental variable
could also randomly “mutate”, i.e. increase or decrease by a magnitude
equal to a pre-determined step size. The mutation rate was set so that,
on average, one variable would change per individual per generation. The
GA was run for 11 generations and yielded environmental conditions that
increased perchlorate degradation by over 16-fold for individual strains
and over 5-fold for the consortia.
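The generational scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `measure_function` is a hypothetical stand-in for the empirical measurement of the degradation rate, and the population size, number of parents, step size, and variable bounds are assumed values. For simplicity, selection here keeps the top-ranked environments (as in Fig. 2c), whereas in the study fitness set the probability of being selected.

```python
import random

N_VARS = 9              # one entry per environmental variable
STEP = 1.0              # assumed mutation step size (arbitrary units)
LOW, HIGH = 0.0, 10.0   # assumed bounds shared by every variable


def measure_function(env):
    """Hypothetical stand-in for an empirical measurement.

    Peaks when every variable equals 7; a real experiment would
    return the measured degradation rate instead.
    """
    return -sum((x - 7.0) ** 2 for x in env)


def uniform_crossover(a, b, rng):
    # Each entry is inherited from either parent with probability 0.5.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(env, rng):
    # Per-variable rate 1/N_VARS, so one variable changes on average.
    out = []
    for x in env:
        if rng.random() < 1.0 / N_VARS:
            x += rng.choice((-STEP, STEP))
        out.append(min(HIGH, max(LOW, x)))
    return out


def run_ga(generations=11, pop_size=20, n_parents=5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(LOW, HIGH) for _ in range(N_VARS)]
           for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank environments by decreasing function.
        ranked = sorted(pop, key=measure_function, reverse=True)
        best_per_generation.append(measure_function(ranked[0]))
        parents = ranked[:n_parents]
        # Top environments are carried over unchanged ...
        children = [list(p) for p in parents]
        # ... and the rest are produced by recombination and mutation.
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(uniform_crossover(a, b, rng), rng))
        pop = children
    return best_per_generation


history = run_ga()
```

Because the top environments are carried over unchanged, the best function value recorded in `history` cannot decrease from one generation to the next. In a real experiment each call to `measure_function` would correspond to one culture, so the population size and the number of generations set the experimental cost.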
This work followed previous attempts to use genetic algorithms to
optimize community functions through environmental manipulations. In the
earliest such study we are aware of, Vandecasteele et al. used a
microbial community derived from human saliva as an inoculum, and built
a genetic algorithm to optimize a collective function consisting of azo
dye decoloration
(Vandecasteele et al.,
2008). Rather than manipulating the concentration of a set of
resources, the environmental space that was explored in this work was
the presence or absence of 10 different chemical supplements, including
nutrients (e.g. glucose or glycerol) as well as various buffers, acids,
and bases. Each combination of chemicals was added to a 12.5x dilution
of a saliva sample and the fitness of the environment was given by the
amount of dye decoloration over 24 h of culture. Dye decoloration
increased after 15 optimization steps, and the authors convincingly
demonstrated that besides the average increase in the metapopulation,
the function of the best environments in each generation also responded
to selection (an important benchmark in artificial community-level
selection experiments
(Chang et al.,
2021)). A strength of this study is that the authors examined the
ability of their approach to find a better ecosystem than the one they
started with.
An exciting prospect for manipulating microbial community functions
through their metabolic environment consists of combining evolutionary
algorithms with genome-scale metabolic models. In pioneering work
(Harcombe et al.,
2014; Pacheco and Segrè, 2021), Pacheco and Segrè used dynamic flux
balance analysis
(Dukovski et al.,
2021) to model the growth and other metabolic properties of in
silico microbial communities. Using this computational platform, they
were able to combinatorially generate thousands of different
environments, each containing combinations of up to 20 different
resources, which were then inoculated with the same microbial consortium
consisting of 13 microorganisms. The same approach was then expanded to
a larger combinatorial space of over 150 limiting carbon sources.
Similar to previous work, the authors implemented a genetic algorithm
where a subset of environments was selected according to a fitness score
determined by their proximity to a community-level objective function.
The selected environments were then used to generate the next generation
through cross-over recombination and mutation (addition of new
metabolites or removal of existing ones). Among the objective functions,
the authors included compositional metrics such as community evenness,
the abundance of target bacteria, as well as metabolic traits such as
the secretion of particular metabolites or the degree of metabolic
exchange among coexisting species
(Pacheco and Segrè,
2021).
Altogether, these experiments support the feasibility and promise of
evolutionary engineering approaches to explore the combinatorial space
of environmental factors, in search of optimal culture conditions for
microbial consortia. Importantly, these approaches do not require us to
have a mechanistic understanding of environmental interactions. As we
will see, however, bottom-up models that are built up from these
interactions can be promising too.
Learning the landscape of environmental effects to infer
optimal habitats (e.g. diets) for microbial communities. An
alternative approach to finding optimal environments is to statistically
learn the relationship between environment and community function for a
given inoculum community. This is essentially the reverse problem to
that of inferring the relationship between composition and function in a
given environment (Eng
and Borenstein, 2019; Sanchez et al., 2023). Because the two problems are
so similar, they share the same difficulties (chiefly, the presence
of interactions between components, which we reviewed in previous
sections), and the approaches taken to solve them are
similar as well (Eng and
Borenstein, 2019). Because the number of combinations grows exponentially
with the number of factors, a full factorial
assessment of environmental factors (Fig. 2e) has been
challenging to execute experimentally. Most studies have instead focused
on fractional factorial design (Fig. 2d), where a subset of all
possible environmental combinations is used to train a statistical model
(Chen et al., 2009;
Jiménez et al., 2014; Kikot et al., 2010; Skonieczny and Yargeau, 2009;
Zhou et al., 2023). Typically, these models consisted of linear
regression on either the presence/absence or the magnitude of different
environmental factors, with the occasional inclusion of interaction
terms. Once a statistical model of the community function landscape is
available, it can be used to predict out of sample and thus locate
environments that optimize the target function.
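As an illustration of this workflow, the sketch below fits a linear model with pairwise interaction terms to half of the 32 environments that can be built from five components, and then predicts the function of all 32. Everything specific in it is an assumption made for the example: `true_function` is a hypothetical, noise-free landscape used only to simulate measurements, and the half-fraction (environments with an even number of components) is one convenient choice of subset, not a design taken from the cited studies.

```python
import itertools
import numpy as np

N = 5  # number of environmental components (1 = present, 0 = absent)


def true_function(env):
    """Hypothetical ground truth, used only to simulate measurements."""
    effects = np.array([1.0, 0.5, -0.3, 0.8, 0.2])      # assumed additive effects
    return 2.0 + effects @ env - 1.5 * env[0] * env[3]  # one assumed interaction


def design_matrix(envs):
    """Columns: intercept, main effects, all pairwise interaction terms."""
    envs = np.asarray(envs, dtype=float)
    pairs = [envs[:, i] * envs[:, j]
             for i, j in itertools.combinations(range(N), 2)]
    return np.column_stack([np.ones(len(envs)), envs] + pairs)


all_envs = np.array(list(itertools.product([0, 1], repeat=N)))  # 2^5 = 32

# Half-fraction: keep the 16 environments with an even number of components.
train = all_envs[all_envs.sum(axis=1) % 2 == 0]
y_train = np.array([true_function(e) for e in train])

# Least-squares fit of 16 coefficients (1 intercept + 5 main + 10 pairwise).
coef, *_ = np.linalg.lstsq(design_matrix(train), y_train, rcond=None)

# Predict every environment, including the 16 that were never measured.
predicted = design_matrix(all_envs) @ coef
best_env = all_envs[np.argmax(predicted)]
```

In this constructed example the predicted optimum, `best_env`, contains three components and is therefore not among the measured environments. The prediction is exact only because the simulated landscape has no noise and no interactions above second order; with real data, unmodeled higher-order interactions would bias the out-of-sample predictions.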
This approach has been employed successfully by various authors. In the
aforementioned work by Jiménez et al., for instance, the authors
predicted the effect of the concentrations of three input
substrates on the methanogenic activity of an anaerobic digester, thus
identifying an optimal operation point
(Jiménez et al.,
2014). The other studies employing this strategy also focused on small
combinatorial spaces with up to six different variables. Ideally, one
would like to extend the approach to larger combinatorial spaces, but as
the dimensionality increases so does the number of potential
interactions, and therefore the number of measurements one must make to
estimate their effect. A potential approach to handle this combinatorial
explosion of interactions is to, once again, draw inspiration from
genetics (Sanchez et
al., 2023). In genetics, it has been found that simple quantitative
patterns often emerge from myriad microscopic interactions between
genes, allowing us to predict the fitness effect of a mutation without
having to first parameterize its pairwise and higher-order epistatic
interactions (see Diaz-Colunga et al.,
2023; Johnson et al., 2023, and references therein). We have recently
shown that this idea also extends to ecological communities, so that
the functional effect of a species often follows simple quantitative
patterns that do not require one to parameterize all possible
interactions with every other member of the consortium
(Diaz-Colunga et al.,
2023; Ruiz et al., 2023; Sanchez et al., 2023). Future work will have
to determine whether these global epistasis-like patterns also
describe the effects of environmental factors in different contexts
(Fig. 2f).
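One way such a test could be set up is sketched below: for a focal component, the change in function caused by adding it is regressed against the function of the background environment without it. The landscape `function_of` and its `GAINS` are hypothetical, chosen to be multiplicative so that the linear pattern holds exactly; whether real environmental components behave this way is precisely the open question.

```python
import itertools
from statistics import mean

N = 5
FOCAL = 0  # index of the component whose effect we want to predict

# Hypothetical multiplicative landscape, used only to generate example data.
GAINS = [0.5, 0.2, -0.1, 0.3, 0.1]


def function_of(env):
    value = 1.0
    for present, gain in zip(env, GAINS):
        if present:
            value *= 1.0 + gain
    return value


def effect_vs_background(focal):
    """Pairs (background function, change in function on adding `focal`)."""
    points = []
    for env in itertools.product([0, 1], repeat=N):
        if env[focal] == 1:
            continue  # keep only backgrounds that lack the focal component
        with_focal = list(env)
        with_focal[focal] = 1
        f_bg = function_of(env)
        points.append((f_bg, function_of(with_focal) - f_bg))
    return points


def fit_line(points):
    """Ordinary least squares for delta = intercept + slope * background."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


intercept, slope = fit_line(effect_vs_background(FOCAL))
```

With an empirical dataset, the quality of this fit (rather than its mere existence) would indicate whether the effect of a component can be predicted from the background function alone. In that case, a handful of measured backgrounds could suffice to estimate `intercept` and `slope` for each component.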