[1907.08226] Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model
Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones.
Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.
Our framework is based on the Kac-Rice analysis of stationary points and a closed-form analysis of gradient-flow originating from statistical physics. We show that there is a well defined region of parameters where the gradient-flow algorithm finds a good global minimum despite the presence of exponentially many spurious local minima.
We show that this is achieved by surfing on saddles that have strong negative direction towards the global minima, a phenomenon that is connected to a BBP-type threshold in the Hessian describing the critical points of the landscapes.
machine-learning  optimization  fitness-landscapes  metaheuristics  representation  to-understand  topology  dynamical-systems  consider:performance-measures  consider:better-faster
august 2019 by Vaguery
[1902.04724] A characterisation of S-box fitness landscapes in cryptography
Substitution Boxes (S-boxes) are nonlinear objects often used in the design of cryptographic algorithms. The design of high quality S-boxes is an interesting problem that attracts a lot of attention. Many attempts have been made in recent years to use heuristics to design S-boxes, but the results were often far from the previously known best obtained ones. Unfortunately, most of the effort went into exploring different algorithms and fitness functions while little attention has been given to understanding why this problem is so difficult for heuristics. In this paper, we conduct a fitness landscape analysis to better understand why this problem can be difficult. Among other things, we find that almost each initial starting point has its own local optimum, even though the networks are highly interconnected.
evolutionary-algorithms  fitness-landscapes  rather-interesting  hard-problems  constructibility  representation  to-write-about  consider:looking-to-see
april 2019 by Vaguery
Learning from protein fitness landscapes: a review of mutability, epistasis, and evolution - ScienceDirect
Proteins carry out many diverse functions in nature and are increasingly used in non-native contexts, such as in medical or industrial applications. A wide array of synthetic biology techniques can be used both to study proteins in their native context and to identify new variants with useful properties for non-native functions. High-resolution protein fitness landscapes, generated via deep scanning mutagenesis, are an emerging technology that can be used to model evolution and identify useful variants. Interestingly, many differences exist between mutability quantified by evolutionary studies and deep scanning mutagenesis. Here, we review several contributing factors to this difference, highlighting epistasis, binding partners, and selection conditions as key contributors. Through this lens, we describe what can be learned, both about evolution and protein function more broadly, from fitness landscape studies.
cannot-access  dammit  fitness-landscapes  scanning-mutagenesis  bioinformatics  theoretical-biology  looking-to-see
march 2019 by Vaguery
[1804.02045] Approximating Functions on Boxes
The vector space of all polynomial functions of degree k on a box of dimension n is of dimension $\binom{n}{k}$. A consequence of this fact is that a function can be approximated on vertices of the box using other vertices to higher degrees than expected. This approximation is useful for various biological applications such as predicting the effect of a treatment with drug combinations and computing values of fitness landscape.
approximation  fitness-landscapes  dimension-reduction  rather-interesting  statistics  modeling  bioinformatics
february 2019 by Vaguery
[1812.06162] An Empirical Model of Large-Batch Training
In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency. However the limits of this massive data parallelism seem to differ from domain to domain, ranging from batches of tens of thousands in ImageNet to batches of millions in RL agents that play the game Dota 2. To our knowledge there is limited conceptual understanding of why these limits to batch size differ or how we might choose the correct batch size in a new domain. In this paper, we demonstrate that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets (MNIST, SVHN, CIFAR-10, ImageNet, Billion Word), reinforcement learning domains (Atari and Dota), and even generative model training (autoencoders on SVHN). We find that the noise scale increases as the loss decreases over a training run and depends on the model size primarily through improved model performance. Our empirically-motivated theory also describes the tradeoff between compute-efficiency and time-efficiency, and provides a rough model of the benefits of adaptive batch-size training.
machine-learning  algorithms  fitness-landscapes  feature-construction  hardness  to-write-about
january 2019 by Vaguery
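Note on the entry above: the abstract does not define the gradient noise scale. The sketch below assumes the "simple" form, the trace of the per-example gradient covariance divided by the squared norm of the mean gradient. The function name `simple_noise_scale` and the synthetic gradients are illustrative only, and the naive estimator used here is slightly biased for small samples.

```python
import numpy as np

def simple_noise_scale(per_example_grads):
    """Naive estimate of tr(Sigma) / |G|^2 from a (num_examples, num_params) array.

    Assumption: this is the 'simple' noise scale; the abstract itself gives no formula.
    """
    g = np.asarray(per_example_grads, dtype=float)
    mean_grad = g.mean(axis=0)                   # estimate of the true gradient G
    trace_cov = g.var(axis=0, ddof=1).sum()      # sum of per-coordinate variances = tr(Sigma)
    return trace_cov / (mean_grad @ mean_grad)   # tr(Sigma) / |G|^2

# Synthetic check: true gradient is all ones in 50 dimensions, noise std 3 per coordinate,
# so tr(Sigma) = 50 * 9 = 450 and |G|^2 = 50, giving a true noise scale of 9.
rng = np.random.default_rng(0)
true_grad = np.ones(50)
grads = true_grad + 3.0 * rng.standard_normal((20000, 50))
estimate = simple_noise_scale(grads)
```

A large value suggests that individual gradients are dominated by noise, which is the regime where averaging over a bigger batch would be expected to help.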
Differential Strengths Of Molecular Determinants Guide Environment Specific Mutational Fates | bioRxiv
Under the influence of selection pressures imposed by natural environments, organisms maintain competitive fitness through underlying molecular evolution of individual genes across the genome. For molecular evolution, how multiple interdependent molecular constraints play a role in determination of fitness under different environmental conditions is largely unknown. Here, using Deep Mutational Scanning (DMS), we quantitated empirical fitness of ~2000 single site mutants of Gentamicin-resistant gene (GmR). This enabled a systematic investigation of effects of different physical and chemical environments on the fitness landscape of the gene. Molecular constraints of the fitness landscapes seem to bear differential strengths in an environment dependent manner. Among them, conformity of the identified directionalities of the environmental selection pressures with known effects of the environments on protein folding proves that along with substrate binding, protein stability is the common strong constraint of the fitness landscape. Our study thus provides mechanistic insights into the molecular constraints that allow accessibility of mutational fates in environment dependent manner.
contingency  fitness-landscapes  biophysics  evolutionary-algorithms  structure-function-relations  climb-the-citation-tree  to-write-about  to-understand
september 2017 by Vaguery
[1504.04909] Illuminating search spaces by mapping elites
Many fields use search algorithms, which automatically explore a search space to find high-performing solutions: chemists search through the space of molecules to discover new drugs; engineers search for stronger, cheaper, safer designs; scientists search for models that best explain data, etc. The goal of search algorithms has traditionally been to return the single highest-performing solution in a search space. Here we describe a new, fundamentally different type of algorithm that is more useful because it provides a holistic view of how high-performing solutions are distributed throughout a search space. It creates a map of high-performing solutions at each point in a space defined by dimensions of variation that a user gets to choose. This Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm illuminates search spaces, allowing researchers to understand how interesting attributes of solutions combine to affect performance, either positively or, equally of interest, negatively. For example, a drug company may wish to understand how performance changes as the size of molecules and their cost-to-produce vary. MAP-Elites produces a large diversity of high-performing, yet qualitatively different solutions, which can be more helpful than a single, high-performing solution. Interestingly, because MAP-Elites explores more of the search space, it also tends to find a better overall solution than state-of-the-art search algorithms. We demonstrate the benefits of this new algorithm in three different problem domains ranging from producing modular neural networks to designing simulated and real soft robots. Because MAP-Elites (1) illuminates the relationship between performance and dimensions of interest in solutions, (2) returns a set of high-performing, yet diverse solutions, and (3) improves finding a single, best solution, it will advance science and engineering.
september 2017 by Vaguery
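Note on the entry above: the core loop the abstract describes (keep the best solution found so far in each cell of a user-chosen feature space) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the single 1-D descriptor, the bin count, and every `toy_*` function are assumptions made up for the example.

```python
import random

def map_elites(fitness, descriptor, random_solution, mutate,
               n_bins=10, n_init=100, n_iters=2000, seed=0):
    """Minimal MAP-Elites loop over one behaviour descriptor in [0, 1]."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, solution): the current elite of each cell

    def try_insert(x):
        b = min(int(descriptor(x) * n_bins), n_bins - 1)  # which cell x falls in
        f = fitness(x)
        if b not in archive or f > archive[b][0]:         # keep only the best per cell
            archive[b] = (f, x)

    for _ in range(n_init):                                # random initialisation
        try_insert(random_solution(rng))
    for _ in range(n_iters):                               # mutate a randomly chosen elite
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))
    return archive

# Hypothetical toy problem: two real genes, descriptor is the first gene.
def toy_fitness(x):
    return -sum((xi - 0.5) ** 2 for xi in x)

def toy_descriptor(x):
    return x[0]

def toy_random(rng):
    return [rng.random(), rng.random()]

def toy_mutate(x, rng):
    return [min(max(xi + rng.gauss(0, 0.1), 0.0), 1.0) for xi in x]

archive = map_elites(toy_fitness, toy_descriptor, toy_random, toy_mutate)
```

The returned `archive` is the "map": one elite per occupied descriptor bin, so it shows how the best achievable fitness varies along the chosen dimension instead of reporting a single optimum.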
[1705.01568] Adaptive Fitness Landscape for Replicator Systems: To Maximize or not to Maximize
Sewall Wright's adaptive landscape metaphor penetrates a significant part of evolutionary thinking. Supplemented with Fisher's fundamental theorem of natural selection and Kimura's maximum principle, it provides a unifying and intuitive representation of the evolutionary process under the influence of natural selection as the hill climbing on the surface of mean population fitness. On the other hand, it is also well known that for many more or less realistic mathematical models this picture is a severe misrepresentation of what actually occurs. Therefore, we are faced with two questions. First, it is important to identify the cases in which adaptive landscape metaphor actually holds exactly in the models, that is, to identify the conditions under which system's dynamics coincides with the process of searching for a (local) fitness maximum. Second, even if the mean fitness is not maximized in the process of evolution, it is still important to understand the structure of the mean fitness manifold and see the implications of this structure on the system's dynamics. Using as a basic model the classical replicator equation, in this note we attempt to answer these two questions and illustrate our results with simple well studied systems.
fitness-landscapes  replicators  rather-interesting  nonlinear-dynamics  theoretical-biology  philosophy-of-science  to-write-about
august 2017 by Vaguery
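Note on the entry above: a small numerical sketch of the classical replicator equation, dx_i/dt = x_i((Ax)_i - x·Ax), for the one case where hill climbing on mean fitness is known to hold, a symmetric payoff matrix. The matrix `A_sym`, the step size, and the forward-Euler integrator are assumptions chosen for illustration; nothing here is taken from the paper itself.

```python
import numpy as np

def replicator_step(x, A, dt):
    """One forward-Euler step of dx_i/dt = x_i * ((A x)_i - x.A.x)."""
    f = A @ x              # fitness of each type at the current frequencies
    mean_f = x @ f         # mean population fitness
    return x + dt * x * (f - mean_f)

def simulate(x0, A, dt=0.01, steps=2000):
    """Integrate the replicator equation and record mean fitness along the way."""
    x = np.asarray(x0, dtype=float)
    mean_fitness = [x @ A @ x]
    for _ in range(steps):
        x = replicator_step(x, A, dt)
        mean_fitness.append(x @ A @ x)
    return x, np.array(mean_fitness)

# Hypothetical symmetric payoff matrix: here mean fitness should never decrease.
A_sym = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 1.5]])
x_final, mean_fitness = simulate([0.5, 0.3, 0.2], A_sym)
```

For a non-symmetric `A` there is no such guarantee, which is the situation the abstract's question "to maximize or not to maximize" is about.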
Breakdown Of Modularity In Complex Networks | bioRxiv
The presence of modular organisation is a common property of a wide range of complex systems, from cellular or brain networks to technological graphs. Modularity allows some degree of segregation between different parts of the network and has been suggested to be a prerequisite for the evolvability of biological systems. In technology, modularity defines a clear division of tasks and it is an explicit design target. However, many natural and artificial systems experience a breakdown in their modular pattern of connections, which has been associated with failures in hub nodes or the activation of global stress responses. In spite of its importance, no general theory of the breakdown of modularity and its implications has been advanced yet. Here we propose a new, simple model of network landscape where it is possible to exhaustively characterise the breakdown of modularity in a well-defined way. We found that evolution cannot reach maximally modular networks under the presence of functional and cost constraints, implying the breakdown of modularity is an adaptive feature.
fitness-landscapes  network-theory  modularity  rather-interesting  boolean-networks  to-write-about  complexology  simple-models  nudge-targets  evolvability
august 2017 by Vaguery
[1703.07915] Perspective: Energy Landscapes for Machine Learning
Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.
machine-learning  introspection  rather-interesting  fitness-landscapes  energy-landscapes  visualization  to-write-about  consider:performance-measures  algorithms  feature-construction
may 2017 by Vaguery
[1512.03466] Computing factorized approximations of Pareto-fronts using mNM-landscapes and Boltzmann distributions
NM-landscapes have been recently introduced as a class of tunable rugged models. They are a subset of the general interaction models where all the interactions are of order less than or equal to M. The Boltzmann distribution has been extensively applied in single-objective evolutionary algorithms to implement selection and study the theoretical properties of model-building algorithms. In this paper we propose the combination of the multi-objective NM-landscape model and the Boltzmann distribution to obtain Pareto-front approximations. We investigate the joint effect of the parameters of the NM-landscapes and the probabilistic factorizations in the shape of the Pareto front approximations.
Kauffmania  fitness-landscapes  rather-interesting  to-write-about  nudge-targets  consider:looking-to-see
april 2017 by Vaguery
[1309.1837] Evolution and non-equilibrium physics. A study of the Tangled Nature Model
We argue that the stochastic dynamics of interacting agents which replicate, mutate and die constitutes a non-equilibrium physical process akin to aging in complex materials. Specifically, our study uses extensive computer simulations of the Tangled Nature Model (TNM) of biological evolution to show that punctuated equilibria successively generated by the model's dynamics have increasing entropy and are separated by increasing entropic barriers. We further show that these states are organized in a hierarchy and that limiting the values of possible interactions to a finite interval leads to stationary fluctuations within a component of the latter. A coarse-grained description based on the temporal statistics of quakes, the events leading from one component of the hierarchy to the next, accounts for the logarithmic growth of the population and the decaying rate of change of macroscopic variables. Finally, we question the role of fitness in large scale evolution models and speculate on the possible evolutionary role of rejuvenation and memory effects.
theoretical-biology  artificial-life  complexology  ecology  Bak-Sneppen-stuff  fitness-landscapes  Oh  Physics!
march 2017 by Vaguery
[1608.04568] Decision Making on Fitness Landscapes
We discuss fitness landscapes and how they can be modified to account for co-evolution. We are interested in using the landscape as a way to model rational decision making in a toy economic system. We develop a model very similar to the Tangled Nature Model of Christensen et al. that we call the Tangled Decision Model. This is a natural setting for our discussion of co-evolutionary fitness landscapes. We use a Monte Carlo step to simulate decision making and investigate two different decision making procedures.
Kauffmania  fitness-landscapes  coevolution  theoretical-biology  evolutionary-economics  to-write-about  consider:looking-to-see
march 2017 by Vaguery
[1701.09175] Skip Connections as Effective Symmetry-Breaking
Skip connections made the training of very deep neural networks possible and have become an indispensable component in a variety of neural architectures. A completely satisfactory explanation for their success remains elusive. Here, we present a novel explanation for the benefits of skip connections in training very deep neural networks. We argue that skip connections help break symmetries inherent in the loss landscapes of deep networks, leading to drastically simplified landscapes. In particular, skip connections between adjacent layers in a multilayer network break the permutation symmetry of nodes in a given layer, and the recently proposed DenseNet architecture, where each layer projects skip connections to every layer above it, also breaks the rescaling symmetry of connectivity matrices between different layers. This hypothesis is supported by evidence from a toy model with binary weights and from experiments with fully-connected networks suggesting (i) that skip connections do not necessarily improve training unless they help break symmetries and (ii) that alternative ways of breaking the symmetries also lead to significant performance improvements in training deep networks, hence there is nothing special about skip connections in this respect. We find, however, that skip connections confer additional benefits over and above symmetry-breaking, such as the ability to deal effectively with the vanishing gradients problem.
neural-networks  learning  fitness-landscapes  engineering-design  rather-interesting  symmetry  nudge-targets  consider:looking-to-see
february 2017 by Vaguery
[1607.00318] The Evolution of Sex through the Baldwin Effect
This paper suggests that the fundamental haploid-diploid cycle of eukaryotic sex exploits a rudimentary form of the Baldwin effect. With this explanation for the basic cycle, the other associated phenomena can be explained as evolution tuning the amount and frequency of learning experienced by an organism. Using the well-known NK model of fitness landscapes it is shown that varying landscape ruggedness varies the benefit of the haploid-diploid cycle, whether based upon endomitosis or syngamy. The utility of mechanisms such as pre-meiotic doubling and recombination during the cycle are also shown to vary with landscape ruggedness. This view is suggested as underpinning, rather than contradicting, many existing explanations for sex.
february 2017 by Vaguery
[1510.08697] Systems poised to criticality through Pareto selective forces
Pareto selective forces optimize several targets at the same time, instead of single fitness functions. Systems subjected to these forces evolve towards their Pareto front, a geometrical object akin to the thermodynamical Gibbs surface and whose shape and differential geometry underlie the existence of phase transitions. In this paper we outline the connection between the Pareto front and criticality and critical phase transitions. It is shown how, under definite circumstances, Pareto selective forces drive a system towards a critical ensemble that separates the two phases of a first order phase transition. Different mechanisms implementing such Pareto selective dynamics are reviewed.
hey-I-know-this-guy  multiobjective-optimization  fitness-landscapes  my-thesis-stuff  theoretical-biology  complexology  small-world
january 2017 by Vaguery
Genotypic complexity of Fisher's geometric model | bioRxiv
Fisher's geometric model was originally introduced to argue that complex adaptations must occur in small steps because of pleiotropic constraints. When supplemented with the assumption of additivity of mutational effects on phenotypic traits, it provides a simple mechanism for the emergence of genotypic epistasis from the nonlinear mapping of phenotypes to fitness. Of particular interest is the occurrence of sign epistasis, which is a necessary condition for multipeaked genotypic fitness landscapes. Here we compute the probability that a pair of randomly chosen mutations interacts sign-epistatically, which is found to decrease algebraically with increasing phenotypic dimension n, and varies non-monotonically with the distance from the phenotypic optimum. We then derive asymptotic expressions for the mean number of fitness maxima in genotypic landscapes composed of all combinations of L random mutations. This number increases exponentially with L, and the corresponding growth rate is used as a measure of the complexity of the genotypic landscape. The dependence of the complexity on the parameters of the model is found to be surprisingly rich, and three distinct phases characterized by different landscape structures are identified. The complexity generally decreases with increasing phenotypic dimension, but a non-monotonic dependence on n is found in certain regimes. Our results inform the interpretation of experiments where the parameters of Fisher's model have been inferred from data, and help to elucidate which features of empirical fitness landscapes can (or cannot) be described by this model.
population-biology  theoretical-biology  theory-and-practice-sitting-in-a-tree  fitness-landscapes  models-and-modes  to-write-about  nudge-targets  consider:rediscovery  consider:robustness  consider:multiobjective-versions
january 2017 by Vaguery
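Note on the entry above: for a pair of mutations, sign epistasis can be checked directly from the four fitness values of the double-mutant square. The sketch below uses the standard definitions (an effect changes sign depending on the other mutation's presence; "reciprocal" means both do). The function names and the numeric fitness values are made up for illustration and do not come from the paper.

```python
def has_sign_epistasis(w00, w10, w01, w11):
    """True if the sign of either mutation's effect depends on the other mutation.

    w00: wild type, w10: mutation A only, w01: mutation B only, w11: both.
    """
    a_flips = (w10 - w00) * (w11 - w01) < 0   # effect of A without B vs. with B
    b_flips = (w01 - w00) * (w11 - w10) < 0   # effect of B without A vs. with A
    return a_flips or b_flips

def has_reciprocal_sign_epistasis(w00, w10, w01, w11):
    """True if both mutations change the sign of their effect (needed for two peaks)."""
    a_flips = (w10 - w00) * (w11 - w01) < 0
    b_flips = (w01 - w00) * (w11 - w10) < 0
    return a_flips and b_flips

additive = has_sign_epistasis(1.0, 1.1, 1.2, 1.3)               # no interaction
one_sided = has_sign_epistasis(1.0, 1.1, 0.9, 1.2)              # B is bad alone, good with A
two_peaks = has_reciprocal_sign_epistasis(1.0, 0.8, 0.8, 1.2)   # fitness valley between peaks
```

In the last case both single mutants are less fit than either the wild type or the double mutant, so the two-locus landscape has two local maxima.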
[1612.00193] Learning Potential Energy Landscapes using Graph Kernels
Recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab-initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. We show on a standard benchmark that our Graph Approximated Energy (GRAPE) method is competitive with state of the art kernel methods. Furthermore, the GRAPE framework is flexible and admits many possible extensions.
energy-landscapes  fitness-landscapes  machine-learning  approximation  kernel-methods  graph-kernels  to-understand  representation  nudge-targets  consider:representation
december 2016 by Vaguery
Exploring the mutational robustness of nucleic acids by searching genotype neighbourhoods in sequence space | bioRxiv
To assess the mutational robustness of nucleic acids, many genome- and protein-level studies have been performed; in these investigations, nucleic acids are treated as genetic information carriers and transferrers. However, the molecular mechanism through which mutations alter the structural, dynamic and functional properties of nucleic acids is poorly understood. Here, we performed a SELEX in silico study to investigate the fitness distribution of the nucleic acid genotype neighborhood in a sequence space for the L-Arm-binding aptamer. Although most mutants of the L-Arm-binding aptamer failed to retain their ligand-binding ability, two novel functional genotype neighborhoods were isolated by SELEX in silico and experimentally verified to have similar binding affinity (Kd = 69.3 μM and 110.7 μM) as the wild-type aptamer (Kd = 114.4 μM). Based on data from the current study and previous research, mutational robustness is strongly influenced by the local base environment and ligand-binding mode, whereas bases distant from the binding pocket provide potential evolutionary pathways to approach the global fitness maximum. Our work provides an example of successful application of SELEX in silico to optimize an aptamer and demonstrates the strong sensitivity of mutational robustness to the site of genetic variation.
biophysics  structural-biology  fitness-landscapes  combinatorial-libraries  aptamers  rather-interesting  to-write-about  experiment
december 2016 by Vaguery
[1607.00318] The Evolution of Sex through the Baldwin Effect
This paper suggests that the fundamental haploid-diploid cycle of eukaryotic sex exploits a rudimentary form of the Baldwin effect. With this explanation for the basic cycle, the other associated phenomena can be explained as evolution tuning the amount and frequency of learning experienced by an organism. Using the well-known NK model of fitness landscapes it is shown that varying landscape ruggedness varies the benefit of the haploid-diploid cycle, whether based upon endomitosis or syngamy. The utility of mechanisms such as pre-meiotic doubling and recombination during the cycle are also shown to vary with landscape ruggedness. This view is suggested as underpinning, rather than contradicting, many existing explanations for sex.
Kauffmania  fitness-landscapes  theoretical-biology  complexology  simulation  to-write-about  nudge-targets  consider:fixing
september 2016 by Vaguery
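Note on the entry above: the abstract relies on the NK model, where K tunes ruggedness. The sketch below builds a small NK landscape and counts its local optima by brute force. It is only a generic NK construction (random interaction partners, uniform random contributions); it does not reproduce the paper's haploid-diploid simulations, and all names and parameter values are illustrative assumptions.

```python
import itertools
import random

def make_nk_landscape(n, k, seed=0):
    """Return a fitness function over binary genomes of length n.

    Each locus contributes a random value that depends on its own state
    and on the states of k other randomly chosen loci.
    """
    rng = random.Random(seed)
    inputs = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
              for _ in range(n)]

    def fitness(genome):
        return sum(tables[i][tuple(genome[j] for j in inputs[i])] for i in range(n)) / n

    return fitness

def count_local_optima(fitness, n):
    """Count genomes that are strictly fitter than all n one-bit-flip neighbours."""
    count = 0
    for genome in itertools.product((0, 1), repeat=n):
        f = fitness(genome)
        if all(fitness(genome[:i] + (1 - genome[i],) + genome[i + 1:]) < f
               for i in range(n)):
            count += 1
    return count

smooth_peaks = count_local_optima(make_nk_landscape(10, 0, seed=1), 10)  # K = 0: additive
rugged_peaks = count_local_optima(make_nk_landscape(10, 5, seed=1), 10)  # K = 5: epistatic
```

With K = 0 every locus can be optimised independently, so there is a single peak; raising K typically multiplies the number of local optima, which is the "ruggedness" the abstract varies.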
From epigenetic landscape to phenotypic fitness landscape: evolutionary effect of pathogens on host traits | bioRxiv
The epigenetic landscape illustrates how cells differentiate into different types through the control of gene regulatory networks. Numerous studies have investigated epigenetic gene regulation but there are limited studies on how the epigenetic landscape and the presence of pathogens influence the evolution of host traits. Here we formulate a multistable decision-switch model involving many possible phenotypes with the antagonistic influence of parasitism. As expected, pathogens can drive dominant (common) phenotypes to become inferior, such as through negative frequency-dependent selection. Furthermore, novel predictions of our model show that parasitism can steer the dynamics of phenotype specification from multistable equilibrium convergence to oscillations. This oscillatory behavior could explain pathogen-mediated epimutations and excessive phenotypic plasticity. The Red Queen dynamics also occur in certain parameter space of the model, which demonstrates winnerless cyclic phenotype-switching in hosts and in pathogens. The results of our simulations elucidate how epigenetic landscape is associated with the phenotypic fitness landscape and how parasitism facilitates non-genetic phenotypic diversity.
epigenetics  fitness-landscapes  eve-devo  theoretical-biology  the-minefield  rather-interesting  consider:looking-to-see
may 2016 by Vaguery
[1601.00313] When more of the same is better
Problem solving (e.g., drug design, traffic engineering, software development) by task forces represents a substantial portion of the economy of developed countries. Here we use an agent-based model of cooperative problem solving systems to study the influence of diversity on the performance of a task force. We assume that agents cooperate by exchanging information on their partial success and use that information to imitate the more successful agent in the system -- the model. The agents differ only in their propensities to copy the model. We find that, for easy tasks, the optimal organization is a homogeneous system composed of agents with the highest possible copy propensities. For difficult tasks, we find that diversity can prevent the system from being trapped in sub-optimal solutions. However, when the system size is adjusted to maximize performance the homogeneous systems outperform the heterogeneous systems, i.e., for optimal performance, sameness should be preferred to diversity.
problem-solving  Kauffmania  fitness-landscapes  planning  fish-or-get-off-the-can-problems  nudge-targets  consider:looking-to-see
march 2016 by Vaguery
[1508.01453] Asymptotic Green's function for the stochastic reproduction of competing variants via Fisher's angular transformation
The Wright-Fisher Fokker-Planck equation describes the stochastic dynamics of self-reproducing, competing variants at fixed population size. We use Fisher's angular transformation, which defines a natural length for this stochastic process, to remove the co-ordinate dependence of its diffusive dynamics, resulting in simple Brownian motion in an unstable potential, driving variants to extinction or fixation. This insight allows calculation of very accurate asymptotic formulae for the Green's function under neutrality and selection, using a novel heuristic Gaussian approximation.
population-biology  theoretical-biology  fitness-landscapes  nudge-targets  consider:looking-to-see
february 2016 by Vaguery
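Note on the entry above: a short worked derivation of why the angular transformation flattens the noise, for the neutral case only. The normalisation (variance x(1-x)/N per generation) and the Itô form are assumptions chosen for this sketch; the paper's own conventions may differ by constant factors.

```latex
% Neutral Wright-Fisher diffusion for a variant at frequency x (assumed normalisation):
dx = \sqrt{\frac{x(1-x)}{N}}\, dW_t .

% Fisher's angular transformation and its first derivative:
\theta = \arccos(1 - 2x), \qquad \frac{d\theta}{dx} = \frac{1}{\sqrt{x(1-x)}} .

% Ito's lemma, using x(1-x) = \tfrac{1}{4}\sin^2\theta and 1 - 2x = \cos\theta:
d\theta = -\frac{1}{2N}\cot\theta \, dt + \frac{1}{\sqrt{N}}\, dW_t .

% The noise amplitude no longer depends on position, and the drift derives from
% a potential with a maximum at \theta = \pi/2 (x = 1/2):
U(\theta) = \frac{1}{2N} \ln \sin\theta , \qquad d\theta = -U'(\theta)\, dt + \frac{1}{\sqrt{N}}\, dW_t .
```

Under these assumptions the drift pushes θ away from π/2 toward 0 (extinction) or π (fixation), which matches the abstract's description of Brownian motion in an unstable potential.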
[1307.2506] Landscape construction in non-gradient dynamics: A case from evolution
Adaptive landscape has been a fundamental concept in many branches of modern biology since Wright's first proposition in 1932. Meanwhile, the general existence of landscape remains controversial. The causes include the mixed uses of different landscape definitions with their own different aims and advantages. Sometimes the difficulty and the impossibility of the landscape construction for complex models are also equated. To clarify these confusions, based on a recent formulation of Wright's theory, the current authors construct generalized adaptive landscape in a two-loci population model with non-gradient dynamics, where the conventional gradient landscape does not exist. On the generalized landscape, a population moves along an evolutionary trajectory which always increases or conserves adaptiveness but does not necessarily follow the steepest gradient direction. Comparisons of different aspects of various landscapes lead to a conclusion that the generalized landscape is a possible direction to continue the exploration of Wright's theory for complex dynamics.
fitness-landscapes  theoretical-biology  formalization  to-understand
february 2016 by Vaguery
[1509.06453] Complex ordering in spin networks: Critical role of adaptation rate for dynamically evolving interactions
Many complex systems can be represented as networks of dynamical elements whose states evolve in response to interactions with neighboring elements, noise and external stimuli. The collective behavior of such systems can exhibit remarkable ordering phenomena such as chimera order corresponding to coexistence of ordered and disordered regions. Often, the interactions in such systems can also evolve over time responding to changes in the dynamical states of the elements. Link adaptation inspired by Hebbian learning, the dominant paradigm for neuronal plasticity, has been earlier shown to result in structural balance by removing any initial frustration in a system that arises through conflicting interactions. Here we show that the rate of the adaptive dynamics for the interactions is crucial in deciding the emergence of different ordering behavior (including chimera) and frustration in networks of Ising spins. In particular, we observe that small changes in the link adaptation rate about a critical value result in the system exhibiting radically different energy landscapes, viz., smooth landscape corresponding to balanced systems seen for fast learning, and rugged landscapes corresponding to frustrated systems seen for slow learning.
nonlinear-dynamics  fitness-landscapes  energy-landscapes  physics  optimization  relaxation  complexology  nudge-targets  Hebbian-learning  to-replicate
february 2016 by Vaguery
[1507.08245] Epistasis and the structure of fitness landscapes: are experimental fitness landscapes compatible with Fisher's model?
The fitness landscape defines the relationship between genotypes and fitness in a given environment, and underlies fundamental quantities such as the distribution of selection coefficient, or the magnitude and type of epistasis. A better understanding of variation of landscape structure across species and environments is thus necessary to understand and predict how populations adapt. An increasing number of experiments access the properties of fitness landscapes by identifying mutations, constructing genotypes with combinations of these mutations, and measuring fitness of these genotypes. Yet these empirical landscapes represent a very small sample of the vast space of all possible genotypes, and this sample is biased by the protocol used to identify mutations. Here we develop a rigorous and flexible statistical framework based on Approximate Bayesian Computation to address these concerns, and use this framework to fit a broad class of phenotypic fitness models (including Fisher's model) to 24 empirical landscapes representing 9 diverse biological systems. In spite of uncertainty due to the small size of most published empirical landscapes, the inferred landscapes have similar structure in similar biological systems. Surprisingly, goodness of fit tests reveal that this class of phenotypic models, which has been successful so far in interpreting experimental data, is a plausible model in only 3 out of 9 biological systems. In most cases, including notably the landscapes of drug resistance, Fisher's model is not able to explain the structure of empirical landscapes and patterns of epistasis.
fitness-landscapes  looking-to-see  theory-and-practice-sitting-in-a-tree  theoretical-biology  population-biology  modeling-is-not-mathematics
february 2016 by Vaguery
[1506.09091] Exploring the Quantum Speed Limit with Computer Games
Humans routinely solve problems of immense computational complexity by intuitively forming simple, low-dimensional heuristic strategies. Citizen science exploits this intuition by presenting scientific research problems to non-experts. Gamification is an effective tool for attracting citizen scientists and allowing them to provide novel solutions to the research problems. Citizen science games have been used successfully in Foldit, EteRNA and EyeWire to study protein and RNA folding and neuron mapping. However, gamification has never been applied in quantum physics. Everyday experiences of non-experts are based on classical physics and it is a priori not clear that they should have an intuition for quantum dynamics. Does this premise hinder the use of citizen scientists in the realm of quantum mechanics? Here we report on Quantum Moves, an online platform gamifying optimization problems in quantum physics. Quantum Moves aims to use human players to find solutions to a class of problems associated with quantum computing. Players discover novel solution strategies which numerical optimizations fail to find. Guided by player strategies, a new low-dimensional heuristic optimization method is formed, efficiently outperforming the most prominent established methods. We have developed a low-dimensional rendering of the optimization landscape showing a growing complexity when the player solutions get fast. These fast results offer new insight into the nature of the so-called Quantum Speed Limit. We believe that an increased focus on heuristics and landscape topology will be pivotal for general quantum optimization problems beyond the type presented here.
heuristics  crowdsourcing  intuition  problem-solving  fitness-landscapes  rather-interesting  quantums
february 2016 by Vaguery
[1509.01194] A mathematical analysis of the evolutionary benefits of sexual reproduction
The question as to why most higher organisms reproduce sexually has remained open despite extensive research, and has been called "the queen of problems in evolutionary biology". Theories dating back to Weismann have suggested that the key must lie in the creation of increased variability in offspring, causing enhanced response to selection. Rigorously quantifying the effects of assorted mechanisms which might lead to such increased variability, and establishing that these beneficial effects outweigh the immediate costs of sexual reproduction has, however, proved problematic. Here we introduce an approach which does not focus on particular mechanisms influencing factors such as the fixation of beneficial mutants or the ability of populations to deal with deleterious mutations, but rather tracks the entire distribution of a population of genotypes as it moves across vast fitness landscapes. In this setting simulations now show sex robustly outperforming asex across a broad spectrum of finite or infinite population models. Concentrating on the additive infinite populations model, we are able to give a rigorous mathematical proof establishing that sexual reproduction acts as a more efficient optimiser of mean fitness, thereby solving the problem for this model. Some of the key features of this analysis carry through to the finite populations case.
population-biology  theoretical-biology  simple-models-of-the-evolution  fitness-landscapes  open-questions-that-aren't-especially-open
february 2016 by Vaguery
[1602.03093] The effect of environmental stochasticity on species richness in neutral communities
Environmental stochasticity is known to be a destabilizing factor, increasing abundance fluctuations and extinction rates of populations. However, the stability of a community may benefit from the differential response of species to environmental variations due to the storage effect. This paper provides a systematic and comprehensive discussion of these two contradicting tendencies, using the metacommunity version of the recently proposed time-average neutral model of biodiversity which incorporates environmental stochasticity and demographic noise and allows for extinction and speciation. We show that the incorporation of demographic noise into the model is essential to its applicability, yielding realistic behavior of the system when fitness variations are relatively weak. The dependence of species richness on the strength of environmental stochasticity changes sign when the correlation time of the environmental variations increases. This transition marks the point at which the storage effect no longer succeeds in stabilizing the community.
fitness-landscapes  community-formation  diversity  theoretical-biology  population-biology  rather-interesting  to-write-about
february 2016 by Vaguery
[1510.08697] Systems poised to criticality through Pareto selective forces
Pareto selective forces optimise several targets at the same time, instead of single fitness functions. Systems subjected to these forces evolve towards their Pareto front, a geometrical object akin to the thermodynamical Gibbs surface and whose shape and differential geometry underlie the existence of phase transitions. In this paper we outline the connection of the Pareto front with criticality and critical phase transitions. It is shown how, under definite circumstances, Pareto selective forces drive a system towards a critical ensemble that separates the two phases of a first order phase transition. Different mechanisms implementing such Pareto selective dynamics are revised.
multiobjective-optimization  hey-I-know-this-guy  took-long-enough  fitness-landscapes  nudge-targets  to-write-about
february 2016 by Vaguery
[1601.02712] IRLS and Slime Mold: Equivalence and Convergence
In this paper we present a connection between two dynamical systems arising in entirely different contexts: one in signal processing and the other in biology. The first is the famous Iteratively Reweighted Least Squares (IRLS) algorithm used in compressed sensing and sparse recovery while the second is the dynamics of a slime mold (Physarum polycephalum). Both of these dynamics are geared towards finding a minimum l1-norm solution in an affine subspace. Despite its simplicity the convergence of the IRLS method has been shown only for a certain regularization of it and remains an important open problem. Our first result shows that the two dynamics are projections of the same dynamical system in higher dimensions. As a consequence, and building on the recent work on Physarum dynamics, we are able to prove convergence and obtain complexity bounds for a damped version of the IRLS algorithm.
optimization  abstraction  maybe-a-bit-much  fitness-landscapes  algorithms  nudge-targets  consider:trying-again
february 2016 by Vaguery
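To make the IRLS idea in the abstract above concrete, here is a minimal sketch restricted to the simplest affine subspace, a single linear constraint `a . x = b`, so that no linear solver is needed. This is plain regularised IRLS, not the damped version the paper analyses; the function name and the `eps` smoothing are assumptions made for illustration.

```python
def irls_min_l1_single_constraint(a, b, iterations=60, eps=1e-12):
    """IRLS sketch for: minimise sum |x_i| subject to a . x = b (one constraint)."""
    if not any(a):
        raise ValueError("constraint vector a must be non-zero")
    # Equal weights first, so the first iterate is the minimum l2-norm solution.
    d = [1.0] * len(a)
    x = []
    for _ in range(iterations):
        # Weighted least squares: minimise sum x_i^2 / d_i subject to a . x = b.
        # The Lagrange condition gives x_i = d_i * a_i * b / sum_j d_j * a_j^2.
        denom = sum(di * ai * ai for di, ai in zip(d, a))
        x = [di * ai * b / denom for di, ai in zip(d, a)]
        # Reweight with d_i ~ |x_i|, so that sum x_i^2 / d_i approximates sum |x_i|.
        d = [(xi * xi + eps * eps) ** 0.5 for xi in x]
    return x


if __name__ == "__main__":
    # For x0 + 2*x1 = 2 the minimum l1-norm point puts all weight on x1.
    print(irls_min_l1_single_constraint([1.0, 2.0], 2.0))
```

On this toy problem the ratio `x0 / x1` halves at every step, so the iterates approach `(0, 1)` geometrically. That behaviour is specific to the example and says nothing about the general convergence question the abstract calls open.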
[1411.6322] The Complexity of Genetic Diversity
A key question in biological systems is whether genetic diversity persists in the long run under evolutionary competition or whether a single dominant genotype emerges. Classic work by Kalmus in 1945 has established that even in simple diploid species (species with two chromosomes) diversity can be guaranteed as long as the heterozygote individuals enjoy a selective advantage. Despite the classic nature of the problem, as we move towards increasingly polymorphic traits (e.g. human blood types) predicting diversity and understanding its implications is still not fully understood. Our key contribution is to establish complexity theoretic hardness results implying that even in the textbook case of single locus diploid models predicting whether diversity survives or not given its fitness landscape is algorithmically intractable. We complement our results by establishing that under randomly chosen fitness landscapes diversity survives with significant probability. Our results are structurally robust along several dimensions (e.g., choice of parameter distribution, different definitions of stability/persistence, restriction to typical subclasses of fitness landscapes). Technically, our results exploit connections between game theory, nonlinear dynamical systems, complexity theory and biology and establish hardness results for predicting the evolution of a deterministic variant of the well known multiplicative weights update algorithm in symmetric coordination games which could be of independent interest.
february 2016 by Vaguery
[1502.00726] The context-dependence of mutations: a linkage of formalisms
Defining the extent of epistasis - the non-independence of the effects of mutations - is essential for understanding the relationship of genotype, phenotype, and fitness in biological systems. The applications cover many areas of biological research, including biochemistry, genomics, protein and systems engineering, medicine, and evolutionary biology. However, the quantitative definitions of epistasis vary among fields, and its analysis beyond just pairwise effects remains obscure in general. Here, we show that different definitions of epistasis are versions of a single mathematical formalism - the weighted Walsh-Hadamard transform. We argue that one of the definitions, the background-averaged epistasis, is the most informative when the goal is to uncover the general epistatic structure of a biological system, a description that can be rather different from the local epistatic structure of specific model systems. Key issues are the choice of effective ensembles for averaging and how to contend in practice with the vast combinatorial complexity of mutations. In this regard, we discuss possible approaches for optimally learning the epistatic structure of biological systems.
simple-mathematical-formalism  fitness-landscapes  theoretical-biology  unification  starting-to-really-worry-about-fitness  to-revise
february 2016 by Vaguery
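A small sketch of the background-averaged epistasis mentioned in the abstract above, for a biallelic landscape. The Walsh-Hadamard signs are standard; the normalisation (dividing by the number of genetic backgrounds) is one common choice and is assumed here rather than taken from the paper, as is the function name.

```python
from itertools import product


def background_averaged_epistasis(fitness, n_loci):
    """Walsh-Hadamard style epistasis coefficients for a biallelic landscape.

    `fitness` maps every genotype (a tuple of 0/1 values, one per locus) to a number.
    Returns a dict keyed by a 0/1 mask marking which loci take part in each term.
    Assumed normalisation: each term is averaged over the 2**(n_loci - order)
    backgrounds formed by the loci outside the mask.
    """
    genotypes = list(product((0, 1), repeat=n_loci))
    coefficients = {}
    for mask in genotypes:
        order = sum(mask)
        total = 0.0
        for g in genotypes:
            # Count how many of the masked loci carry the mutant allele.
            mutated = sum(m * gi for m, gi in zip(mask, g))
            # Alternating sign: + when an even number of masked loci are unmutated.
            sign = -1.0 if (order - mutated) % 2 else 1.0
            total += sign * fitness[g]
        coefficients[mask] = total / 2 ** (n_loci - order)
    return coefficients


if __name__ == "__main__":
    # Two loci with effects 2 and 3 plus a pairwise interaction of 5.
    landscape = {(g0, g1): 1 + 2 * g0 + 3 * g1 + 5 * g0 * g1
                 for g0, g1 in product((0, 1), repeat=2)}
    for mask, value in background_averaged_epistasis(landscape, 2).items():
        print(mask, value)
```

The empty mask gives the mean fitness, single-locus masks give each mutation's effect averaged over backgrounds, and the full mask recovers the pairwise interaction. The loop costs 4**n_loci operations, which is fine for a handful of loci; a fast Walsh-Hadamard transform would be the natural replacement for larger landscapes.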
[1511.02088] Realization of Waddington's Metaphor: Potential Landscape, Quasi-potential, A-type Integral and Beyond
Motivated by Waddington's famous epigenetic landscape metaphor in developmental biology, biophysicists and applied mathematicians made different proposals to realize this metaphor in a rationalized way. We adopt comprehensive perspectives to systematically investigate three different but closely related realizations in recent literature: namely the potential landscape theory from the steady state distribution of stochastic differential equations (SDEs), the quasi-potential from the large deviation theory, and the construction through SDE decomposition and A-type integral. The connections among these theories are established in this paper. We demonstrate that the quasi-potential is the zero noise limit of the potential landscape. We also show that the potential function in the third proposal coincides with the quasi-potential. The most probable transition path by minimizing the Onsager-Machlup or Freidlin-Wentzell action functional is discussed as well. Furthermore, we compare the difference between local and global quasi-potential through the exchange of limit order for time and noise amplitude. As a consequence of such explorations, we arrive at the existence result for the SDE decomposition while denying its uniqueness in general cases. It is also clarified that the A-type integral is more appropriate to be applied to the decomposed SDEs rather than the original one. Our results contribute to a better understanding of existing landscape theories for biological systems.
fitness-landscapes  oh-dear  theoretical-biology  physics!  maybe-not
december 2015 by Vaguery
[1512.05213] The Tangled Nature Model of evolutionary dynamics reconsidered: structural and dynamical effects of trait inheritance
The Tangled Nature Model of biological and cultural evolution features interacting agents which compete for limited resources and reproduce in an error prone fashion and at a rate depending on the 'tangle' of interactions they maintain with others.
The set of interactions linking a TNM individual to others is key to its reproductive success and arguably constitutes its most important property. Yet, in many studies, the interactions of an individual and those of its mutated off-spring are unrelated, a rather unrealistic feature corresponding to a point mutation turning a giraffe into an elephant. To bring out the structural and dynamical effects of trait inheritance, we introduce and numerically analyze a family of TNM models where a positive integer K parametrises correlations between the interactions of an agent and those of its mutated offspring. For K=1 a single point mutation randomizes all the interactions, while increasing K up to the length of the genome ensures an increasing level of trait inheritance. We show that the distribution of the interactions generated by our rule is nearly independent of the value of K. Changing K strengthens the core structure of the ecology, leads to population abundance distributions which are better approximated by log-normal probability densities and increases the probability that a species extant at time tw is also extant at a later time t. In particular, survival probabilities are shown to decay as powers of the ratio t/tw, similarly to the pure aging behaviour approximately describing glassy systems of physical origin. Increasing the value of K decreases the numerical value of the decay exponent of the power law, which is a clear quantitative dynamical effect of trait inheritance.
evolutionary-biology  fitness-landscapes  theoretical-biology  define-your-terms  rather-interesting  nudge-targets  use-for:what-a-program-does
december 2015 by Vaguery
[1410.1493] Scaling properties of evolutionary paths in a biophysical model of protein adaptation
The enormous size and complexity of genotypic sequence space frequently requires consideration of coarse-grained sequences in empirical models. We develop scaling relations to quantify the effect of this coarse-graining on properties of fitness landscapes and evolutionary paths. We first consider evolution on a simple Mount Fuji fitness landscape, focusing on how the length and predictability of evolutionary paths scale with the coarse-grained sequence length and alphabet. We obtain simple scaling relations for both the weak- and strong-selection limits, with a non-trivial crossover regime at intermediate selection strengths. We apply these results to evolution on a biophysical fitness landscape that describes how proteins evolve new binding interactions while maintaining their folding stability. We combine the scaling relations with numerical calculations for coarse-grained protein sequences to obtain quantitative properties of the model for realistic binding interfaces and a full amino acid alphabet.
fitness-landscapes  approximation  physics  Kauffmania  representation  smh  nudge-targets  against-fitness-landscapes  consider:representation
august 2015 by Vaguery
Metacommunities in Dynamic Landscapes | bioRxiv
Predictions from theory, field data, and experiments have shown that high landscape connectivity promotes higher species richness than low connectivity. However, examples demonstrating high diversity in low connected landscapes also exist. Here we describe the many factors that drive landscape connectivity at different spatiotemporal scales by varying the amplitude and frequency of changes in the dispersal radius of spatial networks. We found that the fluctuations of landscape connectivity support metacommunities with higher species richness than static landscapes. Our results also show a dispersal radius threshold below which species richness drops dramatically in static landscapes. Such a threshold is not observed in dynamic landscapes for a broad range of amplitude and frequency values determining landscape connectivity. We conclude that merging amplitude and frequency as drivers of landscape connectivity together with patch dynamics into metacommunity theory can provide new testable predictions about species diversity in rapidly changing landscapes.
against-fitness-landscapes  fitness-landscapes  evolutionary-biology  Kauffmania  theoretical-biology  looking-to-see  nudge-targets  low-hanging-fruit
august 2015 by Vaguery
Coalescent models for developmental biology and the spatio-temporal dynamics of growing tissues. | bioRxiv
Development is a process that needs to be tightly coordinated in both space and time. Cell tracking and lineage tracing have become important experimental techniques in developmental biology and allow us to map the fate of cells and their progeny in both space and time. A generic feature of developing (as well as homeostatic) tissues that these analyses have revealed is that relatively few cells give rise to the bulk of the cells in a tissue; the lineages of most cells come to an end fairly quickly. This has also spurred the interest of computational and theoretical biologists/physicists, who have developed a range of modelling approaches, perhaps most notably agent-based modelling (ABM). These can become computationally prohibitively expensive but seem to capture some of the features observed in experiments. Here we develop a complementary perspective that allows us to understand the dynamics leading to the formation of a tissue (or colony of cells). Borrowing from the rich population genetics literature we develop genealogical models of tissue development that trace the ancestry of cells in a tissue back to their most recent common ancestors. We apply this approach to tissues that grow under confined conditions (as would, for example, be appropriate for the neural crest) and unbounded growth (illustrative of the behaviour of 2D tumours or bacterial colonies). The classical coalescent model from population genetics is readily adapted to capture tissue genealogies for different models of tissue growth and development. We show that simple but universal scaling relationships allow us to establish relationships between the coalescent and different fractal growth models that have been extensively studied in many different contexts, including developmental biology. Using our genealogical perspective we are able to study the statistical properties of the processes that give rise to tissues of cells, without the need for large-scale simulations.
theoretical-biology  developmental-biology  evo-devo  artificial-life  population-biology  self-organization  rather-interesting  morphology  fitness-landscapes  nudge-targets  consider:detailed-reexamination
july 2015 by Vaguery
[1502.00916] Learning Planar Ising Models
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a simple greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. We demonstrate our method in simulations and for the application of modeling senate voting records.
machine-learning  fitness-landscapes  the-good-old-days  optimization  rather-interesting  graph-theory  nudge-targets  did-that
april 2015 by Vaguery
Repeatability of evolution on epistatic landscapes | bioRxiv
Evolution is a dynamic process. The two classical forces of evolution are mutation and selection. Assuming small mutation rates, evolution can be predicted based solely on the fitness differences between phenotypes. Predicting an evolutionary process under varying mutation rates as well as varying fitness is still an open question. Experimental procedures, however, do include these complexities along with fluctuating population sizes and stochastic events such as extinctions. We investigate the mutational path probabilities of systems having epistatic effects on both fitness and mutation rates using a theoretical and computational framework. In contrast to previous models, we do not limit ourselves to the typical strong selection, weak mutation (SSWM)-regime or to fixed population sizes. Rather we allow epistatic interactions to also affect mutation rates. This can lead to qualitatively non-trivial dynamics. Pathways, that are negligible in the SSWM-regime, can overcome fitness valleys and become accessible. This finding has the potential to extend the traditional predictions based on the SSWM foundation and bring us closer to what is observed in experimental systems.
epistasis  theoretical-biology  fitness-landscapes  Kauffmania  simulation  nudge-targets  against-fitness-landscapes
april 2015 by Vaguery
[1502.05589] Book Review of 'Evolutionary and Interpretive Archaeologies' Edited by Ethan E. Cochrane and Andrew Gardner
Evolutionary and Interpretive Archaeologies, edited by Ethan E. Cochrane and Andrew Gardner, grew out of a seminar at the Institute for Archaeology at University College London in 2007. It consists of 15 chapters by archaeologists who self-identify as practitioners who emphasize the benefits of evolutionary or interpretive approaches to the study of the archaeological record. While the authors' theoretical views are dichotomous, the editors' aim for the book as a whole is not to expound on the differences between these two kinds of archaeology but to bring forward a richer understanding of the discipline and to highlight areas of mutual concern. Some chapters come across as a bit of a sales pitch, but the majority of the contributions emphasize how each approach can be productively used to address the goals of the other. The book seeks to contribute to a mutually beneficial and more productive discipline, and overall, it succeeds in this effort.
february 2015 by Vaguery
[1501.07414] Approximations and bounds for binary Markov random fields
We develop an easy to compute second order interaction removal approximation for pseudo-Boolean functions and use this to define a new approximation and upper and lower bounds for the normalising constant of binary Markov random fields (MRFs). As a by-product of the approximation procedure we obtain also a partially ordered Markov model (POMM) approximation of the MRF.
fitness-landscapes  energy-landscapes  approximation  Markov-random-fields  modeling  nudge-targets  consider:approximation
february 2015 by Vaguery
[1501.04708] Modularity Enhances the Rate of Evolution in a Rugged Fitness Landscape
Biological systems are modular, and this modularity affects the evolution of biological systems over time and in different environments. We here develop a theory for the dynamics of evolution in a rugged, modular fitness landscape. We show analytically how horizontal gene transfer couples to the modularity in the system and leads to more rapid rates of evolution at short times. The model, in general, analytically demonstrates a selective pressure for the prevalence of modularity in biology. We use this model to show how the evolution of the influenza virus is affected by the modularity of the proteins that are recognized by the human immune system. Approximately 25% of the observed rate of fitness increase of the virus could be ascribed to a modular viral landscape.
Kauffmania  fitness-landscapes  evolutionary-biology  theoretical-biology  complexology  Misrepresentations  consider:the-flawed-narrative-of-externalities
february 2015 by Vaguery
[1412.5738] Watersheds in disordered media
What is the best way to divide a rugged landscape? Since ancient times, watersheds separating adjacent water systems that flow, for example, toward different seas, have been used to delimit boundaries. Interestingly, serious and even tense border disputes between countries have relied on the subtle geometrical properties of these tortuous lines. For instance, slight and even anthropogenic modifications of landscapes can produce large changes in a watershed, and the effects can be highly nonlocal. Although the watershed concept arises naturally in geomorphology, where it plays a fundamental role in water management, landslide, and flood prevention, it also has important applications in seemingly unrelated fields such as image processing and medicine. Despite the far-reaching consequences of the scaling properties on watershed-related hydrological and political issues, it was only recently that a more profound and revealing connection has been disclosed between the concept of watershed and statistical physics of disordered systems. This review initially surveys the origin and definition of a watershed line in a geomorphological framework to subsequently introduce its basic geometrical and physical properties. Results on statistical properties of watersheds obtained from artificial model landscapes generated with long-range correlations are presented and shown to be in good qualitative and quantitative agreement with real landscapes.
fitness-landscapes  watersheds  hey-I-remember-that-sketch-from-I-shouldn't-say  LOL-at-"only-recently"  feature-construction  nudge-targets  rather-interesting  can-be-useful-as-I-recall  consider:check-work-citing-this
february 2015 by Vaguery
[1211.2878] Defensive complexity and the phylogenetic conservation of immune control
One strategy for winning a coevolutionary struggle is to evolve rapidly. Most of the literature on host-pathogen coevolution focuses on this phenomenon, and looks for consequent evidence of coevolutionary arms races. An alternative strategy, less often considered in the literature, is to deter rapid evolutionary change by the opponent. To study how this can be done, we construct an evolutionary game between a controller that must process information, and an adversary that can tamper with this information processing. In this game, a species can foil its antagonist by processing information in a way that is hard for the antagonist to manipulate. We show that the structure of the information processing system induces a fitness landscape on which the adversary population evolves. Complex processing logic can carve long, deep fitness valleys that slow adaptive evolution in the adversary population. We suggest that this type of defensive complexity on the part of the vertebrate adaptive immune system may be an important element of coevolutionary dynamics between pathogens and their vertebrate hosts. Furthermore, we cite evidence that the immune control logic is phylogenetically conserved in mammalian lineages. Thus our model of defensive complexity suggests a new hypothesis for the lower rates of evolution for immune control logic compared to other immune structures.
theoretical-biology  coevolution  evolutionary-biology  fitness-landscapes  game-theory  information-theory  nudge-targets  consider:actually-checking
february 2015 by Vaguery
[1309.1898] Fitness and entropy production in a cell population dynamics with epigenetic phenotype switching
Motivated by recent understandings in the stochastic natures of gene expression, biochemical signaling, and spontaneous reversible epigenetic switchings, we study a simple deterministic cell population dynamics in which subpopulations grow with different rates and individual cells can bi-directionally switch between a small number of different epigenetic phenotypes. Two theories in the past, the population dynamics and thermodynamics of master equations, separately defined two important concepts in mathematical terms: the fitness in the former and the (non-adiabatic) entropy production in the latter. Both play important roles in the evolution of the cell population dynamics. The switching sustains the variations among the subpopulation growth and thus continuous natural selection. As a form of Price's equation, the fitness increases with (i) natural selection through variations and (ii) a positive covariance between the per capita growth and switching, which represents a Lamarckian-like behavior. A negative covariance balances the natural selection in a fitness steady state ("the red queen" scenario). At the same time the growth keeps the proportions of subpopulations away from the "intrinsic" switching equilibrium of individual cells, and thus leads to a continuous entropy production. A covariance, between the per capita growth rate and the "chemical potential" of subpopulation, counter-acts the entropy production. Analytical results are obtained for the limiting cases of growth dominating switching and vice versa.
evolutionary-biology  Kauffmania  fitness-landscapes  theoretical-biology  philosophy-of-science  complexology  I-really-have-to-tear-the-roots-out-of-this-crap-some-day-soon
january 2015 by Vaguery
Scaling properties of evolutionary paths in a biophysical model of protein adaptation | bioRxiv
The enormous size and complexity of genotypic sequence space frequently requires consideration of coarse-grained sequences in empirical models. We develop scaling relations to quantify the effect of this coarse-graining on properties of fitness landscapes and evolutionary paths. We first consider evolution on a simple Mount Fuji fitness landscape, focusing on how the length and predictability of evolutionary paths scale with the coarse-grained sequence length and number of alleles. We obtain simple scaling relations for both the weak- and strong-selection limits, with a non-trivial crossover regime at intermediate selection strengths. We apply these results to evolution on a biophysical fitness landscape designed to describe how proteins evolve new binding interactions while maintaining their folding stability. We combine numerical calculations for coarse-grained protein sequences with the scaling relations to obtain quantitative properties of the model for realistic binding interfaces and a full amino acid alphabet.
fitness-landscapes  Kauffmania  theoretical-biology  molecular-design  complexology  nudge-targets  multiscale-representations?  approximation
october 2014 by Vaguery
[1410.0576] Mapping Energy Landscapes of Non-Convex Learning Problems
In many statistical learning problems, the target functions to be optimized are highly non-convex in various model spaces and thus are difficult to analyze. In this paper, we compute \emph{Energy Landscape Maps} (ELMs) which characterize and visualize an energy function with a tree structure, in which each leaf node represents a local minimum and each non-leaf node represents the barrier between adjacent energy basins. The ELM also associates each node with the estimated probability mass and volume for the corresponding energy basin. We construct ELMs by adopting the generalized Wang-Landau algorithm and multi-domain sampler that simulates a Markov chain traversing the model space by dynamically reweighting the energy function. We construct ELMs in the model space for two classic statistical learning problems: i) clustering with Gaussian mixture models or Bernoulli templates; and ii) bi-clustering. We propose a way to measure the difficulties (or complexity) of these learning problems and study how various conditions affect the landscape complexity, such as separability of the clusters, the number of examples, and the level of supervision; and we also visualize the behaviors of different algorithms, such as K-mean, EM, two-step EM and Swendsen-Wang cuts, in the energy landscapes.
fitness-landscapes  energy-landscapes  algorithms  machine-learning  visualization  rather-interesting  talking-about-doing  theory-and-practice-sitting-in-a-tree
october 2014 by Vaguery
Convergent Evolution During Local Adaptation to Patchy Landscapes | bioRxiv
fitness-landscapes  evolution  ecology  diversity  models  nudge-targets  consider:recognizers  wondering-about-archiving
september 2014 by Vaguery
[1409.1143] Tunably Rugged Landscapes with Known Maximum and Minimum
We propose NM landscapes as a new class of tunably rugged benchmark problems. NM landscapes are well-defined on alphabets of any arity, including both discrete and real-valued alphabets, include epistasis in a natural and transparent manner, are proven to have known value and location of the global maximum and, with some additional constraints, are proven to also have a known global minimum. Empirical studies are used to illustrate that, when coefficients are selected from a recommended distribution, the ruggedness of NM landscapes is smoothly tunable and correlates with several measures of search difficulty. We discuss why these properties make NM landscapes preferable to both NK landscapes and Walsh polynomials as benchmark landscape models with tunable epistasis.
fitness-landscapes  Kauffmania  theoretical-biology  hey-I-know-that-lady  parametrization  complexology  nudge-targets  consider:horse-race
september 2014 by Vaguery
[1408.4856] Phase transition in random adaptive walks on correlated fitness landscapes
We study biological evolution on a random fitness landscape where correlations are introduced through a linear fitness gradient of strength c. When selection is strong and mutations rare the dynamics is a directed uphill walk that terminates at a local fitness maximum. We analytically calculate the dependence of the walk length on the genome size L. When the distribution of the random fitness component has an exponential tail we find a phase transition of the walk length D between a phase at small c where walks are short (D∼lnL) and a phase at large c where walks are long (D∼L). For all other distributions only a single phase exists for any c>0. The considered process is equivalent to a zero temperature Metropolis dynamics for the random energy model in an external magnetic field, thus also providing insight into the aging dynamics of spin glasses.
fitness-landscapes  complexology  theoretical-biology  nudge-targets  phase-transitions  oh-that-old-thing
august 2014 by Vaguery
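The directed uphill walk described in the [1408.4856] entry above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's analytical method: the function name `rmf_adaptive_walk`, the all-ones reference genotype, the start at the all-zeros genotype, and the pluggable `noise` sampler are all hypothetical choices made for the sketch.

```python
import random


def rmf_adaptive_walk(L, c, noise, rng):
    """Random adaptive walk on F(s) = -c * d(s, peak) + eta(s).

    Assumptions (not from the source): genotypes are L-bit integers, the
    reference genotype ("peak") is all ones, the walk starts at all zeros,
    and each step moves to a uniformly chosen fitter one-mutant neighbour.
    Returns the number of steps taken before a local maximum is reached.
    """
    eta = {}  # lazily sampled random fitness component per genotype

    def fitness(s):
        if s not in eta:
            eta[s] = noise(rng)
        d = L - bin(s).count("1")  # Hamming distance to the all-ones genotype
        return -c * d + eta[s]

    s, steps = 0, 0
    while True:
        f = fitness(s)
        fitter = [s ^ (1 << i) for i in range(L) if fitness(s ^ (1 << i)) > f]
        if not fitter:
            return steps  # no fitter neighbour: local fitness maximum
        s = rng.choice(fitter)
        steps += 1


if __name__ == "__main__":
    rng = random.Random(0)
    # Exponentially distributed noise, the tail class for which the abstract
    # reports a phase transition in walk length.
    for c in (0.1, 1.0, 3.0):
        lengths = [rmf_adaptive_walk(10, c, lambda r: r.expovariate(1.0), rng)
                   for _ in range(200)]
        print(c, sum(lengths) / len(lengths))
```

With bounded noise and a gradient `c` larger than the noise range, every step toward the reference genotype is uphill and every step away is downhill, so the walk length equals `L`; that limiting case is a convenient sanity check.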
[1304.0246] The number of accessible paths in the hypercube
Motivated by an evolutionary biology question, we study the following problem: we consider the hypercube {0,1}^L where each node carries an independent random variable uniformly distributed on [0,1], except (1,1,…,1) which carries the value 1 and (0,0,…,0) which carries the value x∈[0,1]. We study the number Θ of paths from vertex (0,0,…,0) to the opposite vertex (1,1,…,1) along which the values on the nodes form an increasing sequence. We show that if the value on (0,0,…,0) is set to x=X/L then Θ/L converges in law as L→∞ to e^{−X} times the product of two standard independent exponential variables.
As a first step in the analysis we study the same question when the graph is that of a tree where the root has arity L, each node at level 1 has arity L−1, …, and the nodes at level L−1 have only one offspring which are the leaves of the tree (all the leaves are assigned the value 1, the root the value x∈[0,1]).
fitness-landscapes  Kauffmania  combinatorics  probability-theory  counting
may 2014 by Vaguery
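The quantity Θ in the [1304.0246] entry above can be counted directly on small hypercubes with dynamic programming. The sketch below is a brute-force illustration, not the paper's asymptotic analysis; the function name `count_accessible_paths` and the bitmask encoding of vertices are assumptions made here.

```python
import random


def count_accessible_paths(L, x, rng):
    """Count increasing paths from (0,...,0) to (1,...,1) on {0,1}^L.

    Vertices are encoded as L-bit integers (an encoding chosen for this
    sketch). Node values are i.i.d. uniform on [0, 1), except the start
    vertex (value x) and the end vertex (value 1).
    """
    n = 1 << L
    full = n - 1
    values = [rng.random() for _ in range(n)]
    values[0] = x
    values[full] = 1.0

    # paths[v] = number of increasing paths from vertex 0 to vertex v
    paths = [0] * n
    paths[0] = 1
    # Visit vertices by number of set bits so every predecessor of a vertex
    # is finished before the vertex itself is expanded.
    for v in sorted(range(n), key=lambda u: bin(u).count("1")):
        if paths[v] == 0:
            continue
        for i in range(L):
            bit = 1 << i
            if not v & bit:
                w = v | bit
                if values[w] > values[v]:
                    paths[w] += paths[v]
    return paths[full]


if __name__ == "__main__":
    rng = random.Random(0)
    L, X = 10, 1.0
    samples = [count_accessible_paths(L, X / L, rng) for _ in range(200)]
    print(sum(samples) / len(samples) / L)  # empirical mean of Θ/L
```

The count is exact for each sampled landscape, but the cost grows as L·2^L, so this is only practical for small L.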
[1404.1061] Inferring fitness landscapes by regression produces biased estimates of epistasis
The genotype-fitness map plays a fundamental role in shaping the dynamics of evolution. However, it is difficult to directly measure a fitness landscape in practice, because the number of possible genotypes is astronomical. One approach is to sample as many genotypes as possible, measure their fitnesses, and fit a statistical model of the landscape that includes additive and pairwise interactive effects between loci. Here we elucidate the pitfalls of using such regressions, by studying artificial but mathematically convenient fitness landscapes. We identify two sources of bias inherent in these regression procedures that each tends to under-estimate high fitnesses and over-estimate low fitnesses. We characterize these biases for random sampling of genotypes, as well as for samples drawn from a population under selection in the Wright-Fisher model of evolutionary dynamics. We show that common measures of epistasis, such as the number of monotonically increasing paths between ancestral and derived genotypes, the prevalence of sign epistasis, and the number of local fitness maxima, are distorted in the inferred landscape. As a result, the inferred landscape will provide systematically biased predictions for the dynamics of adaptation. We identify the same biases in a computational RNA-folding landscape, as well as in regulatory sequence binding data, treated with the same fitting procedure. Finally, we present a method that may ameliorate these biases in some cases.
april 2014 by Vaguery
[1307.1918] The Changing Geometry of a Fitness Landscape Along an Adaptive Walk
It has recently been noted that the relative prevalence of the various kinds of epistasis varies along an adaptive walk. This has been explained as a result of mean regression in NK model fitness landscapes. Here we show that this phenomenon occurs quite generally in fitness landscapes. We propose a simple and general explanation for this phenomenon, confirming the role of mean regression. We provide support for this explanation with simulations, and discuss the empirical relevance of our findings.
fitness-landscapes  Kauffmania  theoretical-biology  nudge-targets  consider:inverse-problem  consider:stress-testing
march 2014 by Vaguery
[1312.0688] Biophysical Fitness Landscapes for Transcription Factor Binding Sites
Evolutionary trajectories and phenotypic states available to cell populations are ultimately dictated by intermolecular interactions between DNA, RNA, proteins, and other molecular species. Here we study how evolution of gene regulation in a single-cell eukaryote S. cerevisiae is affected by the interactions between transcription factors (TFs) and their cognate genomic sites. Our study is informed by high-throughput in vitro measurements of TF-DNA binding interactions and by a comprehensive collection of genomic binding sites. Using an evolutionary model for monomorphic populations evolving on a fitness landscape, we infer fitness as a function of TF-DNA binding energy for a collection of 12 yeast TFs, and show that the shape of the predicted fitness functions is in broad agreement with a simple thermodynamic model of two-state TF-DNA binding. However, the effective temperature of the model is not always equal to the physical temperature, indicating selection pressures in addition to biophysical constraints caused by TF-DNA interactions. We find little statistical support for the fitness landscape in which each position in the binding site evolves independently, showing that epistasis is common in evolution of gene regulation. Finally, by correlating TF-DNA binding energies with biological properties of the sites or the genes they regulate, we are able to rule out several scenarios of site-specific selection, under which binding sites of the same TF would experience a spectrum of selection pressures depending on their position in the genome. These findings argue for the existence of universal fitness landscapes which shape evolution of all sites for a given TF, and whose properties are determined in part by the physics of protein-DNA interactions.
fitness-landscapes  bioinformatics  biochemistry  biological-engineering  experiment  simulation  interesting  theoretical-biology  of-course-it's-physics-now
march 2014 by Vaguery
[1402.3065] Adaptation in tunably rugged fitness landscapes: The Rough Mount Fuji Model
Much of the current theory of adaptation is based on Gillespie's mutational landscape model (MLM), which assumes that the fitness values of genotypes linked by single mutational steps are independent random variables. On the other hand, a growing body of empirical evidence shows that real fitness landscapes, while possessing a considerable amount of ruggedness, are smoother than predicted by the MLM. In the present article we propose and analyse a simple fitness landscape model with tunable ruggedness based on the Rough Mount Fuji (RMF) model originally introduced by Aita et al. [Biopolymers 54:64-79 (2000)] in the context of protein evolution. We provide a comprehensive collection of results pertaining to the topographical structure of RMF landscapes, including explicit formulae for the expected number of local fitness maxima, the location of the global peak, and the fitness correlation function. The statistics of single and multiple adaptive steps on the RMF landscape are explored mainly through simulations, and the results are compared to the known behavior in the MLM model. Finally, we show that the RMF model can explain the large number of second-step mutations observed on a highly-fit first step background in a recent evolution experiment with a microvirid bacteriophage [Miller et al., Genetics 187:185-202 (2011)].
fitness-landscapes  Kauffmania  Nk-models  complexology  nudge-targets  consider:edge-case-exploration
march 2014 by Vaguery
[1312.1983] Satisfiability and Evolution
We show that, if truth assignments on n variables reproduce through recombination so that satisfaction of a particular Boolean function confers a small evolutionary advantage, then a polynomially large population over polynomially many generations (polynomial in n and the inverse of the initial satisfaction probability) will end up almost certainly consisting exclusively of satisfying truth assignments. We argue that this theorem sheds light on the problem of novelty in Evolution.
fitness-landscapes  having-your-own-names-for-things  errrmm...
march 2014 by Vaguery
[1305.6231] Evolutionary Predictability and Complications with Additivity
fitness-landscapes  theoretical-biology  models  probability-theory  inference  oh-dear  are-people-actually-doing-this-in-biology?!  nudge-targets  consider:stress-testing
january 2014 by Vaguery
[1301.1439] Adaptive walks and distribution of beneficial fitness effects
We study the adaptation dynamics of a maladapted asexual population on rugged fitness landscapes with many local fitness peaks. The distribution of beneficial fitness effects is assumed to belong to one of the three extreme value domains, viz. Weibull, Gumbel and Fréchet. We work in the strong selection-weak mutation regime in which beneficial mutations fix sequentially, and the population performs an uphill walk on the fitness landscape until a local fitness peak is reached. A striking prediction of our analysis is that the fitness difference between successive steps follows a pattern of diminishing returns in the Weibull domain and accelerating returns in the Fréchet domain, as the initial fitness of the population is increased. These trends are found to be robust with respect to fitness correlations. We believe that this result can be exploited in experiments to determine the extreme value domain of the distribution of beneficial fitness effects. Our work here differs significantly from the previous ones that assume the selection coefficient to be small. On taking large effect mutations into account, we find that the length of the walk shows different qualitative trends from those derived using small selection coefficient approximation.
fitness-landscapes  Kauffmania  nudge-targets  theoretical-biology  oh-no-you-didn't
december 2013 by Vaguery
[1309.1152] The inevitability of unconditionally deleterious substitutions during adaptation
Studies on the genetics of adaptation typically neglect the possibility that a deleterious mutation might fix. Nonetheless, here we show that, in many regimes, the first substitution is most often deleterious, even when fitness is expected to increase in the long term. In particular, we prove that this phenomenon occurs under weak mutation for any house-of-cards model with an equilibrium distribution. We find that the same qualitative results hold under Fisher's geometric model. We also provide a simple intuition for the surprising prevalence of unconditionally deleterious substitutions during early adaptation. Importantly, the phenomenon we describe occurs on fitness landscapes without any local maxima and is therefore distinct from "valley-crossing". Our results imply that the common practice of ignoring deleterious substitutions leads to qualitatively incorrect predictions in many regimes. Our results also have implications for the substitution process at equilibrium and for the response to a sudden decrease in population size.
fitness-landscapes  theoretical-biology  population-biology  alma-maters  nudge-targets
november 2013 by Vaguery
[1310.8592] Slow protein fluctuations explain the emergence of growth phenotypes and persistence in clonal bacterial populations
One of the most challenging problems in microbiology is to understand how a small fraction of microbes that resists killing by antibiotics can emerge in a population of genetically identical cells, the phenomenon known as persistence or drug tolerance. Its characteristic signature is the biphasic kill curve, whereby microbes exposed to a bactericidal agent are initially killed very rapidly but then much more slowly. Here we relate this problem to the more general problem of understanding the emergence of distinct growth phenotypes in clonal populations. We address the problem mathematically by adopting the framework of the phenomenon of so-called weak ergodicity breaking, well known in dynamical physical systems, which we extend to the biological context. We show analytically and by direct stochastic simulations that distinct growth phenotypes can emerge as a consequence of slow-down of stochastic fluctuations in the expression of a gene controlling growth rate. In the regime of fast gene transcription, the system is ergodic, the growth rate distribution is unimodal, and accounts for one phenotype only. In contrast, at slow transcription and fast translation, weakly non-ergodic components emerge, the population distribution of growth rates becomes bimodal, and two distinct growth phenotypes are identified. When coupled to the well-established growth rate dependence of antibiotic killing, this model describes the observed fast and slow killing phases, and reproduces much of the phenomenology of bacterial persistence. The model has major implications for efforts to develop control strategies for persistent infections.
systems-biology  microbiology  fitness-landscapes  dynamics  models  theoretical-biology  nudge-targets
november 2013 by Vaguery
[1310.6372] A multiobjective optimization approach to statistical mechanics
Optimization problems have been the subject of statistical physics approximations, given the natural connection between fitness optima in rugged landscapes, disordered systems and their underlying Hamiltonians. In this paper we present a novel approach to Multi-Objective Optimization (MOO), a general optimization method that encompasses all optimal solutions as long as no bias toward any of the objectives exists. The potential solutions are scattered over the so called Pareto front, and it is shown here that thermodynamics can be treated as a MOO using the associated free energy with constraints. The nature of phase transitions is tied to the Pareto front organization and we illustrate this using both the Ising and Potts models.
via:cshalizi  multiobjective-optimization  statistical-mechanics  fitness-landscapes  representation  interesting
october 2013 by Vaguery
[1309.3312] Universality and predictability in the evolution of molecular quantitative traits
Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. Here we discuss recent developments in the evolutionary theory of quantitative traits that reach beyond classical quantitative genetics. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extends over a range of cellular scales and opens new avenues of quantitative evolutionary systems biology.
fitness-landscapes  evolutionary-biology  population-biology  quantitative-biology  complexology  neutral-networks  meh?
september 2013 by Vaguery
[1309.2979] Fitness Probability Distribution of Bit-Flip Mutation
Bit-flip mutation is a common mutation operator for evolutionary algorithms applied to optimize functions over binary strings. In this paper, we develop results from the theory of landscapes and Krawtchouk polynomials to exactly compute the probability distribution of fitness values of a binary string undergoing uniform bit-flip mutation. We prove that this probability distribution can be expressed as a polynomial in p, the probability of flipping each bit. We analyze these polynomials and provide closed-form expressions for an easy linear problem (Onemax), and an NP-hard problem, MAX-SAT. We also discuss some implications of the results for runtime analysis.
fitness-landscapes  search-algorithms  evolutionary-algorithms  performance-measure  theoretical-biology  nudge-targets  computer-science
september 2013 by Vaguery
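For the Onemax case mentioned in the [1309.2979] entry above, the fitness distribution after uniform bit-flip mutation can also be written as a convolution of two binomials. The sketch below uses that elementary route rather than the Krawtchouk-polynomial machinery of the paper; the function name `onemax_mutation_distribution` is a hypothetical label for this illustration.

```python
from math import comb


def onemax_mutation_distribution(n, k, p):
    """Exact fitness distribution of a mutated Onemax string.

    The parent has length n and k ones; each bit flips independently with
    probability p. If a ones are lost and b zeros are gained, the offspring
    fitness is k - a + b, with a ~ Binomial(k, p) and b ~ Binomial(n - k, p).
    Returns a list dist where dist[f] = P(offspring fitness == f).
    """
    dist = [0.0] * (n + 1)
    for a in range(k + 1):
        pa = comb(k, a) * p ** a * (1 - p) ** (k - a)
        for b in range(n - k + 1):
            pb = comb(n - k, b) * p ** b * (1 - p) ** (n - k - b)
            dist[k - a + b] += pa * pb
    return dist


if __name__ == "__main__":
    n, k = 20, 15
    dist = onemax_mutation_distribution(n, k, 1 / n)
    print(sum(dist[k + 1:]))  # probability that mutation improves fitness
```

Each entry of the result is a sum of terms of the form p^(a+b) (1-p)^(n-a-b), which matches the abstract's statement that the distribution is a polynomial in p.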
[1306.2538] Delayed self-regulation leads to novel states in epigenetic landscapes
The epigenetic pathway of a cell as it differentiates from a stem cell state to a mature lineage-committed one has been historically understood in terms of Waddington's landscape, consisting of hills and valleys. The smooth top and valley-strewn bottom of the hill represent the undifferentiated and differentiated states respectively. Although mathematical ideas rooted in nonlinear dynamics and bifurcation theory have been used to quantify this picture, the importance of time delays arising from multistep chemical reactions or cellular shape transformations has been ignored so far. We argue that this feature is crucial in understanding cell differentiation and explore the role of time delay in a model of a single gene regulatory circuit. We show that the interplay of time-dependent drive and delay introduces a new regime where the system shows sustained oscillations between the two admissible steady states. We interpret these results in the light of recent perplexing experiments on inducing the pluripotent state in mouse somatic cells. We also comment on how such an oscillatory state can provide a framework for understanding more general feedback circuits in cell development.
theoretical-biology  fitness-landscapes  epigenetics  nudge-targets  complexology
june 2013 by Vaguery
[1306.1938] Evolutionary accessibility of modular fitness landscapes
A fitness landscape is a mapping from the space of genetic sequences, which is modeled here as a binary hypercube of dimension $L$, to the real numbers. We consider random models of fitness landscapes, where fitness values are assigned according to some probabilistic rule, and study the statistical properties of pathways to the global fitness maximum along which fitness increases monotonically. Such paths are important for evolution because they are the only ones that are accessible to an adapting population when mutations occur at a low rate. The focus of this work is on the block model introduced by A.S. Perelson and C.A. Macken [Proc. Natl. Acad. Sci. USA 92:9657 (1995)] where the genome is decomposed into disjoint sets of loci ('modules') that contribute independently to fitness, and fitness values within blocks are assigned at random. We show that the number of accessible paths can be written as a product of the path numbers within the blocks, which provides a detailed analytic description of the path statistics. The block model can be viewed as a special case of Kauffman's NK-model, and we compare the analytic results to simulations of the NK-model with different genetic architectures. We find that the mean number of accessible paths in the different versions of the model is quite similar, but the distribution of the path number is qualitatively different in the block model due to its multiplicative structure. A similar statement applies to the number of local fitness maxima in the NK-models, which has been studied extensively in previous works. The overall evolutionary accessibility of the landscape, as quantified by the probability to find at least one accessible path to the global maximum, is dramatically lowered by the modular structure.
Kauffmania  NK-landscapes  fitness-landscapes  artificial-life  graph-theory  sadly-single-objective
june 2013 by Vaguery
[1303.3842] Antibiotic resistance landscapes: a quantification of theory-data incompatibility for fitness landscapes
Fitness landscapes are central in analyzing evolution, in particular for drug resistance mutations for bacteria and viruses. We show that the fitness landscapes associated with antibiotic resistance are not compatible with any of the classical models: additive, uncorrelated and block fitness landscapes. The NK model is also discussed. It is frequently stated that virtually nothing is known about fitness landscapes in nature. We demonstrate that available records of antimicrobial drug mutations can reveal interesting properties of fitness landscapes in general. We apply the methods to analyze the TEM family of $\beta$-lactamases associated with antibiotic resistance. Laboratory results agree with our observations. The qualitative tools we suggest are well suited for comparisons of empirical fitness landscapes. Fitness landscapes are central in the theory of recombination and there is a potential for finding relations between the tools and recombination strategies.
fitness-landscapes  theoretical-biology  evolution  Kauffmania  empirical-theorizing  nudge-targets  interesting
may 2013 by Vaguery
[1304.5003] An Efficient Linear Programming Algorithm to Generate the Densest Lattice Sphere Packings
Finding the densest sphere packing in $d$-dimensional Euclidean space $\mathbb{R}^d$ is an outstanding fundamental problem with relevance in many fields, including the ground states of molecular systems, colloidal crystal structures, coding theory, discrete geometry, number theory, and biological systems. Numerically generating the densest sphere packings becomes very challenging in high dimensions due to an exponentially increasing number of possible sphere contacts and sphere configurations, even for the restricted problem of finding the densest lattice sphere packings. In this paper, we apply the Torquato-Jiao packing algorithm, which is a method based on solving a sequence of linear programs, to robustly reproduce the densest known lattice sphere packings for dimensions 2 through 19. We show that the TJ algorithm is appreciably more efficient at solving these problems than previously published methods. Indeed, in some dimensions, the former procedure can be as much as three orders of magnitude faster at finding the optimal solutions than earlier ones. We also study the suboptimal local density-maxima solutions (inherent structures or "extreme" lattices) to gain insight about the nature of the topography of the "density" landscape.
algorithms  computational-geometry  geometry  linear-programming  nudge-targets  fitness-landscapes
may 2013 by Vaguery
[1207.6431] Optimal reconstruction of the folding landscape using differential energy surface analysis
In experiments and in simulations, the free energy of a state of a system can be determined from the probability that the state is occupied. However, it is often necessary to impose a biasing potential on the system so that high energy states are sampled with sufficient frequency. The unbiased energy is typically obtained from the data using the weighted histogram analysis method (WHAM). Here we present differential energy surface analysis (DESA), in which the gradient of the energy surface, dE/dx, is extracted from data taken with a series of harmonic biasing potentials. It is shown that DESA produces a maximum likelihood estimate of the folding landscape gradient. DESA is demonstrated by analyzing data from a simulated system as well as data from a single-molecule unfolding experiment in which the end-to-end distance of a DNA hairpin is measured. It is shown that the energy surface obtained from DESA is indistinguishable from the energy surface obtained when WHAM is applied to the same data. Two criteria are defined which indicate whether the DESA results are self-consistent. It is found that these criteria can detect a situation where the energy is not a single-valued function of the measured reaction coordinate. The criteria were found to be satisfied for the experimental data analyzed, confirming that end-to-end distance is a good reaction coordinate for the experimental system. The combination of DESA and the optical trap assay in which a structure is disrupted under harmonic constraint facilitates an extremely accurate measurement of the folding energy surface.
biochemistry  protein-folding  fitness-landscapes  nudge-targets  algorithms  performance-measure
may 2013 by Vaguery
[1304.3681] A path-based approach to random walks on networks characterizes how proteins evolve new function
We develop a path-based approach to continuous-time random walks on networks with arbitrarily weighted edges. We describe an efficient numerical algorithm for calculating statistical properties of the stochastic path ensemble. After demonstrating our approach on two reaction rate problems, we present a biophysical model that describes how proteins evolve new functions while maintaining thermodynamic stability. We use our methodology to characterize dynamics of evolutionary adaptation, reproducing several key features observed in directed evolution experiments. We find that proteins generally fall into two qualitatively different regimes of adaptation depending on their binding and folding energetics.
fitness-landscapes  theoretical-biology  exaptation  nudge-targets  simulation  interesting
may 2013 by Vaguery
[1301.0004] Population genetics of gene function
This paper shows that differentiating the lifetimes of two phenotypes independently from their fertility can lead to a qualitative change in the equilibrium of a population: since survival and reproduction are distinct functional aspects of an organism, this observation contributes to extend the population-genetical characterisation of biological function. To support this statement a mathematical relation is derived to link the lifetime ratio T_1/T_2, which parametrizes the different survival ability of two phenotypes, with population variables that quantify the amount of neutral variation underlying a population's phenotypic distribution.
population-biology  theoretical-biology  evolution  life-histories  nudge-targets  fitness-landscapes  contingency  define-your-terms-project
april 2013 by Vaguery
[1304.3738] Effects of Epistasis and Pleiotropy on Fitness Landscapes
The factors that influence genetic architecture shape the structure of the fitness landscape, and therefore play a large role in the evolutionary dynamics. Here the NK model is used to investigate how epistasis and pleiotropy -- key components of genetic architecture -- affect the structure of the fitness landscape, and how they affect the ability of evolving populations to adapt despite the difficulty of crossing valleys present in rugged landscapes. Populations are seen to make use of epistatic interactions and pleiotropy to attain higher fitness, and are not inhibited by the fact that valleys have to be crossed to reach peaks of higher fitness.
NK-models  Kauffmania  fitness-landscapes  nudge-targets  theoretical-biology  epigenetics  looking-under-a-lightpost
april 2013 by Vaguery
[1302.3541] An analysis of NK and generalized NK landscapes
Simulated landscapes have been used for decades to evaluate search strategies whose goal is to find the landscape location with maximum fitness. Applications include modeling the capacity of enzymes to catalyze reactions and the clinical effectiveness of medical treatments. Understanding properties of landscapes is important for understanding search difficulty. This paper presents a novel and transparent characterization of NK landscapes.

We prove that NK landscapes can be represented by parametric linear interaction models where model coefficients have meaningful interpretations. We derive the statistical properties of the model coefficients, providing insight into how the NK algorithm parses importance to main effects and interactions. An important insight derived from the linear model representation is that the rank of the linear model defined by the NK algorithm is correlated with the number of local optima, a strong determinant of landscape complexity and search difficulty. We show that the maximal rank for an NK landscape is achieved through epistatic interactions that form partially balanced incomplete block designs. Finally, an analytic expression representing the expected number of local optima on the landscape is derived, providing a way to quickly compute the expected number of local optima for very large landscapes.
NK-landscapes  fitness-landscapes  Kauffmania  matrices  representation  but-they-didn't-do-the-obvious-relaxation-experiment  nudge-targets
march 2013 by Vaguery
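The link between epistasis and the number of local optima discussed in the [1302.3541] entry above can be checked by brute force on small NK landscapes. The sketch below is an enumeration under assumptions of its own (adjacent cyclic neighbourhoods, uniform random contribution tables, hypothetical function names `make_nk_landscape` and `count_local_optima`); it does not implement the paper's linear-model representation or its analytic expression.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Build an NK fitness function on binary genomes of length n.

    Assumption for this sketch: locus i interacts with the next k loci
    (cyclically), and each contribution is drawn uniformly from [0, 1).
    """
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genome):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | genome[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness


def count_local_optima(n, fitness):
    """Count genomes with no strictly fitter one-bit-flip neighbour."""
    count = 0
    for genome in itertools.product((0, 1), repeat=n):
        f = fitness(genome)
        if all(
            fitness(genome[:i] + (1 - genome[i],) + genome[i + 1:]) <= f
            for i in range(n)
        ):
            count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (0, 1, 2, 4):
        print(k, count_local_optima(10, make_nk_landscape(10, k, rng)))
```

With `k = 0` the landscape is additive and has a single optimum; larger `k` typically yields more optima. Enumeration costs on the order of n·2^n fitness evaluations, which is why a closed-form expectation such as the one the abstract describes matters for very large landscapes.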
[1211.3609] Neutral selection
Hubbell's neutral theory of biodiversity has successfully explained the observed composition of many ecological communities but it relies on strict demographic equivalence among species and provides no room for evolutionary processes like selection, adaptation and speciation. Here we show how to embed the neutral theory within the Darwinian framework. In a fitness landscape with a quadratic maximum, typical of quantitative traits, selection restricts the extant species to have traits close to optimal, so that the fitness differences between surviving species are small. For sufficiently small mutation steps, the community structure fits perfectly to the Fisher log-series species abundance distribution. The theory is relatively insensitive to moderate amounts of environmental noise, wherein the location of the fitness maximum changes by amounts of order the width of the noise-free distribution. Adding very large environmental noise to the model qualitatively changes the abundance distributions, converting the exponential fall-off of large species to a power-law decay, typical of a neutral model with environmental noise.
fitness-landscapes  population-biology  theoretical-biology  nudge-targets  simulation
march 2013 by Vaguery
[1208.5954] How to infer relative fitness from a sample of genomic sequences
Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we shall demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator which identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3 depending on the mutation/selection parameters. The ranking also makes it possible to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.
genetics  fitness-landscapes  inference  statistics  phylogenetics  inverse-problems  nudge-targets  modeling
march 2013 by Vaguery
[1207.4452] Pareto Local Optima of Multiobjective NK-Landscapes with Correlated Objectives
"In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension to the multiobjective case, based on the Pareto dominance. We study the co-influence of the problem dimension, the degree of non-linearity, the number of objectives and the correlation degree between objective functions on the number of Pareto local optima."
NK-landscapes  multiobjective-optimization  fitness-landscapes  thank-god-somebody-finally-finished-my-thesis-project-15-years-later
august 2012 by Vaguery
[1207.1253] Interpolating between Random Walks and Shortest Paths: a Path Functional Approach
"General models of network navigation must contain a deterministic or drift component, encouraging the agent to follow routes of least cost, as well as a random of diffusive component, enabling free wandering. This paper proposes a thermodynamic formalism involving two path functionals, namely an energy functional governing the drift and an entropy functional governing the diffusion. A freely adjustable parameter, the temperature, arbitrates between the conflicting objectives of minimising travel costs and maximising spatial exploration. The theory is illustrated on various graphs and various temperatures. The resulting optimal paths, together with presumably new associated edges and nodes centrality indices, are analytically and numerically investigated."
fitness-landscapes  network-theory  useful-parametrization  exploration  exploitation  models-of-search  nudge  for-the-book
august 2012 by Vaguery
[1207.4631] Analyzing the Effect of Objective Correlation on the Efficient Set of MNK-Landscapes
"In multiobjective combinatorial optimization, there exists two main classes of metaheuristics, based either on multiple aggregations, or on a dominance relation. As in the single objective case, the structure of the search space can explain the difficulty for multiobjective metaheuristics, and guide the design of such methods. In this work we analyze the properties of multiobjective combinatorial search spaces. In particular, we focus on the features related the efficient set, and we pay a particular attention to the correlation between objectives. Few benchmark takes such objective correlation into account. Here, we define a general method to design multiobjective problems with correlation. As an example, we extend the well-known multiobjective NK-landscapes. By measuring different properties of the search space, we show the importance of considering the objective correlation on the design of metaheuristics."
multiobjective-optimization  NK-landscapes  metaheuristics  fitness-landscapes  nudge-targets  thank-god-somebody-finally-finished-my-thesis-project-15-years-later
august 2012 by Vaguery
[1112.5218] Patterns of neutral diversity under general models of selective sweeps
"Two major sources of stochasticity in the dynamics of neutral alleles result from resampling of finite populations (genetic drift) and the random genetic background of nearby selected alleles on which the neutral alleles are found (linked selection). There is now good evidence that linked selection plays an important role in shaping polymorphism levels in a number of species. One of the best investigated models of linked selection is the recurrent full sweep model, in which newly arisen selected alleles fix rapidly. However, the bulk of selected alleles that sweep into the population may not be destined for rapid fixation. Here we develop a general model of recurrent selective sweeps in a coalescent framework, one that generalizes the recurrent full sweep model to the case where selected alleles do not sweep to fixation. We show that in a large population, only the initial rapid increase of a selected allele affects the genealogy at partially linked sites, which under fairly general assumptions are unaffected by the subsequent fate of the selected allele. We also apply the theory to a simple model to investigate the impact of recurrent partial sweeps on levels of neutral diversity, and find that for a given reduction in diversity, the impact of recurrent partial sweeps on the frequency spectrum at neutral sites is determined primarily by the frequencies achieved by the selected alleles. Consequently, recurrent sweeps of selected alleles to low frequencies can have a profound effect on levels of diversity but can leave the frequency spectrum relatively unperturbed. In fact, the limiting coalescent model under a high rate of sweeps to low frequency is identical to the standard neutral model. The general model of selective sweeps we describe goes some way towards providing a more flexible framework to describe genomic patterns of diversity than is currently available."
neutral-networks  evolutionary-dynamics  fitness-landscapes  diversity  theoretical-biology  evolution
june 2012 by Vaguery
[1006.2908] Critical properties of complex fitness landscapes
"Evolutionary adaptation is the process that increases the fit of a population to the fitness landscape it inhabits. As a consequence, evolutionary dynamics is shaped, constrained, and channeled, by that fitness landscape. Much work has been expended to understand the evolutionary dynamics of adapting populations, but much less is known about the structure of the landscapes. Here, we study the global and local structure of complex fitness landscapes of interacting loci that describe protein folds or sets of interacting genes forming pathways or modules. We find that in these landscapes, high peaks are more likely to be found near other high peaks, corroborating Kauffman's "Massif Central" hypothesis. We study the clusters of peaks as a function of the ruggedness of the landscape and find that this clustering allows peaks to form interconnected networks.…"
NK-landscapes  fitness-landscapes  Stuart-Kauffman  thesis  complexology
june 2010 by Vaguery
