Stanford University - Authorea

https://www.stanford.edu

Estimation and validation of LOF-intolerant variants in essential transcription facto...

Nasa Sinnott-Armstrong

August 14, 2017

The human transcription factor repertoire is a complex collection of paralogous proteins that represent the major portion of lineage commitment and differentiation. Estimates of mitochondrial DNA content in wastewater are on the order of 10⁵ copies per mL, substantially higher than other accessible sources (~10⁴ copies per 100 mL in river water, for instance), and thus wastewater will be used for this initial study \cite{Caldwell_2009,20208426}. The copy number in direct fecal samples is 10⁹ per gram, and wastewater is approximately 99.5% water; thus in 1 L of wastewater we expect 5 g of feces, which corresponds to ~5e9 mtDNA and 5e6 nuclear DNA copies per liter. This is comparable to the estimates obtained from direct wastewater samples and will be used going forward.
Aim 1: A sensitive and specific method for isolation of human DNA sequences from wastewater.
Goal: derive accurate, rare allele frequency estimates.
Simulation of allele frequencies
Show off pretty figures.
DNA purification
Get some gallons of wastewater. Isopropanol extraction, followed by ethanol precipitation, maybe followed by beads?
Library preparation and sequencing
#figure
Comparison to ExAC
So much better, so much cheaper, beat that.
Aim 2: Population modeling of functional diversification in the homeobox transcription factor HOXC6.
Prime & go home. Basically we have a classic case of subfunctionalization going on.
Aim 3: Downstream functional characterization of constrained HOXC6 variants with a high throughput sequencing assay.
Clone library of plasmids using the essential gene protocol or technique from AB2.2 cells. Sequence that shit.
Conclusions
There is also the possibility of detecting disease in samples \cite{Putaporntip_2011}.
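The back-of-the-envelope copy-number estimate in this entry can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the wastewater density of roughly 1000 g per liter is an assumption, and the 1000:1 mtDNA-to-nuclear ratio is simply the ratio implied by the quoted 5e9 and 5e6 figures.

```python
# Figures quoted in the text above; names are illustrative, not from the source.
MTDNA_COPIES_PER_GRAM_FECES = 1e9   # copy number in direct fecal samples
WATER_FRACTION = 0.995              # wastewater is ~99.5% water
GRAMS_PER_LITER = 1000.0            # assumption: wastewater density ~ that of water
MTDNA_PER_NUCLEAR_COPY = 1000.0     # assumption: ratio implied by 5e9 vs 5e6


def expected_copies_per_liter(
    copies_per_gram=MTDNA_COPIES_PER_GRAM_FECES,
    water_fraction=WATER_FRACTION,
    grams_per_liter=GRAMS_PER_LITER,
    mtdna_per_nuclear=MTDNA_PER_NUCLEAR_COPY,
):
    """Return (grams of feces, mtDNA copies, nuclear DNA copies) per liter."""
    feces_grams = grams_per_liter * (1.0 - water_fraction)
    mtdna_copies = feces_grams * copies_per_gram
    nuclear_copies = mtdna_copies / mtdna_per_nuclear
    return feces_grams, mtdna_copies, nuclear_copies


feces_grams, mtdna_copies, nuclear_copies = expected_copies_per_liter()
print(f"{feces_grams:.1f} g feces, {mtdna_copies:.1e} mtDNA, {nuclear_copies:.1e} nuclear copies per liter")
```

Changing `water_fraction` or `copies_per_gram` shows how sensitive the per-liter estimate is to either input.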

Mining Theorems in Compressed Sensing for NMR Gold

Jeffrey Hoch

and 1 more

May 15, 2017

Abstract
Theoretical results from the field of compressed sensing (CS), principally theorems specifying the minimum number of samples needed to reconstruct a sparse spectrum, led to an explosion of activity in NMR on efficient sampling strategies and spectral reconstruction methods. While these results firmly established the notion that sparse multidimensional NMR spectra can be accurately recovered using far fewer samples and with higher resolution than is possible using conventional uniform sampling, quantitative agreement between the theorems and empirical observations remains elusive. Contributing to this discordance, NMR spectra do not satisfy the strict definitions of sparseness assumed by CS, and practical algorithms may not achieve the limiting behavior prescribed by the theorems. In addition, Monajemi and Donoho \cite{MoDo17} recently showed that certain specifics of NMR experiment design require modification of CS theories. For example, when uniform sampling is conducted in a dimension orthogonal to dimensions that are sampled nonuniformly, an excess coherence is introduced that increases the minimum number of samples required to recover the spectrum. Similarly, the simple quantification of coherence used in CS requires modification when phase subdimensions are sampled nonuniformly (as in random phase detection or partial component sampling \cite{21949370,23246651}). Beyond the minimum bounds on the number of samples needed to recover a sparse multidimensional NMR spectrum, the notion of phase transitions in CS, the sharp transition from successful recovery to failure, holds important implications for the design of NMR experiments, including the impact of higher magnetic fields. Here we consider the broader implications of CS theorems for nonuniform sampling in multidimensional NMR via a dialog between mathematicians and NMR spectroscopists. Promising avenues for further investigation emerge from the dialog.
Introduction
NMR spectroscopists have long intuited, with ample empirical evidence, that sparse multidimensional NMR spectra can be faithfully reconstructed from far fewer measurements than are required in the Jeener paradigm of parametric uniform sampling of indirect time dimensions. The field of compressed sensing (CS) didn’t so much emerge as coalesce beginning in 2006, around rigorous theorems placing bounds on the minimum number of samples needed to reconstruct a spectrum. CS theorems have continued to evolve, decreasing the lower bounds below the initial rather conservative limits. Beyond the quantitative lower bounds provided by CS theorems, the concepts of a phase diagram linking sampling coverage and spectral sparsity and the phase transition separating the regimes corresponding to successful and unsuccessful recovery have important implications for experiment design in multidimensional NMR, including the effect of higher magnetic fields. In addition to discussing these implications, we consider some recent results from CS that
Reconstruction guarantees
Researchers in the field of compressed sensing have carefully studied the number of samples required for successful reconstruction of unknown signals from undersampled data. The most well-known theoretical arguments include coherence \cite{Donoho_2006,Tropp_2004,Monajemi2016ACHA}, restricted isometry property (RIP) \cite{CandesStable} and phase transition \cite{DoTa10,MoDo17,DoTa05} theories. Coherence and RIP arguments provide sufficient conditions for a successful reconstruction, which often lead to pessimistic lower bounds on the number of required samples. Phase transition theories, on the other hand, measure the exact probability of successful reconstruction and lead to accurate theoretical lower bounds that match the empirical results exactly \cite{Monajemi_2012}.
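As a concrete illustration of the coherence arguments mentioned above, the sketch below computes the standard mutual coherence (the largest normalized inner product between distinct columns) of a discrete Fourier matrix restricted to a set of sampled time points. This is the textbook CS quantity, not the modified NMR-specific coherence discussed in this article; the function names and the particular sampling schedules are hypothetical choices for illustration.

```python
import cmath
import itertools


def partial_dft_columns(n, sample_idx):
    """Columns of the n-point DFT matrix, keeping only the sampled rows."""
    return [
        [cmath.exp(-2j * cmath.pi * t * k / n) for t in sample_idx]
        for k in range(n)
    ]


def mutual_coherence(columns):
    """Largest normalized |<a_i, a_j>| over all pairs of distinct columns."""
    def inner(a, b):
        return sum(x * y.conjugate() for x, y in zip(a, b))

    norms = [abs(inner(c, c)) ** 0.5 for c in columns]
    return max(
        abs(inner(columns[i], columns[j])) / (norms[i] * norms[j])
        for i, j in itertools.combinations(range(len(columns)), 2)
    )


n = 16
# Full sampling: DFT columns are orthogonal, so coherence is numerically zero.
full = mutual_coherence(partial_dft_columns(n, list(range(n))))
# Uniform undersampling (every 4th point): aliased frequencies become identical.
uniform = mutual_coherence(partial_dft_columns(n, [0, 4, 8, 12]))
# Nonuniform sampling with the same number of points (arbitrary illustrative schedule).
nonuniform = mutual_coherence(partial_dft_columns(n, [0, 1, 5, 11]))
print(full, uniform, nonuniform)
```

With the same number of samples, the uniformly undersampled schedule reaches the worst possible coherence because aliased frequencies are indistinguishable, whereas the nonuniform schedule stays strictly below it.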
New results for multidimensional NMR
Traditional CS theories are not directly applicable to the reconstruction of NMR spectra due to the anisotropy of sampling and the phase-sensitive nature of the experiments. Accommodating these additional considerations requires a new notion of coherence and further theoretical investigation of phase transition phenomena, which has recently been examined in detail [cite Hatef's Stanford thesis]. Here, we discuss their implications for the field of NMR spectroscopy.
Coherence guarantee

This is an example paper

Tim Howes

January 06, 2017

Welcome to Authorea!

Tim Howes

December 01, 2016

Hey, welcome. Double click anywhere on the text to start writing. In addition to simple text you can also add text formatted in BOLDFACE, _italic_, and yes, math too: E = mc²! Add images by drag’n’drop or click on the “Insert Figure” button.

Implementing Socially Aware LSTMs for effective Crowd Navigation

Karthik Raju

and 1 more

November 21, 2016

Abstract
In this paper, we hope to suggest and experiment with novel ways to predict the movement of individuals in a shared space. We use an LSTM as the primary framework behind the prediction model and test alternative ways to modify the predictions, namely a Social Pooling layer and a Dynamic Bayesian Network.
Introduction
In recent years, autonomous navigation has become increasingly prominent. Companies including Uber, Google, and Tesla have worked on and tested their state-of-the-art self-driving cars. This space has changed rapidly over the last decade. One important constituent of the 'self-driving' problem is motion planning: breaking down a desired action into discrete motions that satisfy movement constraints and optimize movement to avoid collisions. There are other opportunities for self-navigating vehicles in non-road settings, including social robots that can navigate crowded spaces. Applicable examples are driving through malls, walking dogs through populated parks, or helping blind pedestrians navigate around others. In such spaces with a non-trivial density of human activity, it's crucial for robots to navigate crowds organically. Humans follow multiple unconscious rules when interacting with others' trajectories. They adopt numerous common sense rules and comply with social convention. For example, they plan where to move next by considering their immediate neighbors' personal spaces as well as understanding who has the right-of-way. Autonomous robots should therefore adopt a similar model of movement that fluidly circumvents humans by predicting their trajectories and roughly complying with the same common sense rules as those they share space with. This problem is a good application of recurrent neural nets. The solution will be that of a sequence generation problem.
Related Works

Stochastic Deconvolution

Mihir Mongia

and 2 more

February 17, 2016

INTRODUCTION
In the deep learning literature there have been many methods to produce images that correspond to certain classes or specific neurons in the CNN [Zeiler]. There are two main methods in the literature. Deconvolution methods rely on an input image and highlight pixels in an image that activate a neuron of interest. Deconvolution requires the presence of an image. There are other methods that try to maximize the class scores or activations of neurons with respect to pixel intensities [Simonyan]. However, these methods only work for lower level features or more shallow neurons. At higher layers, the neurons represent more abstract concepts such as a dog. Thus an optimal image may have 10 dogs all over the image in different orientations and also have tails and dog ears that are actually not part of a dog. We propose several potential methods that do not rely on an input image and that can also create realistic abstract concepts that correspond to certain neurons “deep” in the CNN. The key reason abstract concepts such as “dog” cannot be generated using the above method is that there are multiple features in multiple locations that may fire the “dog” neuron. In real life, however, dog images do not occur with dogs all over the sky and gigantic ears that exist by themselves without an attached body. Since, intuitively, shallower neurons correspond to smaller features and the higher level neurons correspond to combinations of shallower features, a natural approach to fix the issue of generating unrealistic images would be to gather statistics of joint distributions of shallower features. We could use these statistics in a variety of ways. We could, for example, use the optimization method mentioned in class and then look at the activations that the optimization method generates. If the activations of the shallow features seem to be outliers of the joint distribution, we can decide that we need to reduce the activations of certain neurons.
Once those neurons have been decided, we can back propagate from any one of those unneeded neurons, and instead take gradient steps to decrease the activation rather than increase it. This could be seen as a method combining both Deconv and the method introduced by Simonyan. One could also conceptually have joint distributions of layer k and layer k+1 for all k less than the number of layers. Now suppose we want to generate the abstract concept that a neuron N represents. Initially, we could find which activations of neurons in the previous layer are associated with N firing. This most likely follows some distribution. Thus we can sample the activations from the joint distribution where we fix the activation of N. Now we can use this same method over and over again and proceed back into the image, where each time we fix in the joint distribution the activations of layer k+1 and sample the marginal for layer k. As one can see, many potential ideas seem plausible with the extra information of statistics generated from many images going through the convnet. We aim to try a few methods, improve our understanding, and then iterate to think of improved methods that might generate better images.
PROBLEM STATEMENT
In our problem we will aim to use a pretrained CNN to generate random images corresponding to abstract concepts. We will use the pretrained VGGNET model with 16 layers from Oxford University. We will pass many images corresponding to a specific class (that we will get from ImageNet) to capture statistics of activations for neurons. We then will use our methods to generate random images corresponding to abstract concepts. We expect to be able to generate more realistic images than images generated by Simonyan, and we can test this by simply comparing our generated images to those created by Simonyan.
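The layer-by-layer idea in the introduction (fix the activations of layer k+1, then sample layer k from the joint distribution) can be sketched in its simplest form with one scalar activation per layer. The proposal only says the activations follow "some distribution"; the Gaussian model used here is an assumption made purely for illustration, and all function and parameter names are hypothetical.

```python
import random


def conditional_gaussian(mu_k, mu_k1, var_k, var_k1, cov, observed_k1):
    """Mean and variance of the layer-k activation given the layer-(k+1) activation.

    Assumes (illustratively) that the two activations are jointly Gaussian.
    """
    mean = mu_k + cov / var_k1 * (observed_k1 - mu_k1)
    var = var_k - cov ** 2 / var_k1
    return mean, var


def sample_layer_k(mu_k, mu_k1, var_k, var_k1, cov, observed_k1, rng):
    """Draw one layer-k activation conditioned on a fixed layer-(k+1) activation."""
    mean, var = conditional_gaussian(mu_k, mu_k1, var_k, var_k1, cov, observed_k1)
    return rng.gauss(mean, var ** 0.5)


rng = random.Random(0)
# Hypothetical statistics: zero means, unit variances, covariance 0.5,
# with the deeper neuron fixed at a strong activation of 2.0.
mean, var = conditional_gaussian(0.0, 0.0, 1.0, 1.0, 0.5, 2.0)
draws = [sample_layer_k(0.0, 0.0, 1.0, 1.0, 0.5, 2.0, rng) for _ in range(5000)]
print(mean, var, sum(draws) / len(draws))
```

Repeating this step from the deepest layer back toward the input would give one sampled activation pattern per layer; a real network would need a multivariate version with covariances estimated from the collected activation statistics.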

Proposal cs221 project: Sheepherding

Sven Schmit

October 17, 2014

SHEEPHERDING
Sheepherding is one of the oldest trades, where a herder (with the help of one or multiple dogs) tries to bring sheep together at some target location. In this project, we define a simplified model of sheepherding where dogs try to move sheep to a target. We then use simulations and apply reinforcement learning to train an AI for the dogs that is capable of performing several tasks related to sheepherding. In particular, we are interested in cooperative behavior: can dogs cooperate efficiently without communication, or is a central authority that commands the dogs much better at herding?
Literature
First, we give a brief overview of some related literature. One study proposes a method to gather ducklings using a robot in a simple circular area, but the approach is not based on any learning. Another discusses methods for herding sheep from an AI perspective, using a hierarchical and stack-based finite state machine. A third proposes using genetic algorithms to find decision rules that model robot behavior; in a case study, one robot tries to guide another robot to a target location. An interesting perspective is given in a discussion of so-called specialists: sometimes it is more useful for agents to specialize in specific tasks (such as in playing soccer), while for other tasks this is not useful. Finally, one work discusses an approach where agents start by acting individually, but learn how and when to cooperate if needed. This, the authors argue, leads to a small state space because only necessary interaction is taken into account.

Strong Lens Time Delay Challenge: I. Experimental Design

Greg Dobler

and 7 more

May 28, 2013

ABSTRACT: The time delays between point-like images in gravitational lens systems can be used to measure cosmological parameters as well as probe the dark matter (sub-)structure within the lens galaxy. The number of lenses with measured time delays is growing rapidly due to dedicated efforts. In the near future, the upcoming _Large Synoptic Survey Telescope_ (LSST) will monitor ∼10³ lens systems consisting of a foreground elliptical galaxy producing multiple images of a background quasar. In an effort to assess the present capabilities of the community to accurately measure the time delays in strong gravitational lens systems, and to provide input to dedicated monitoring campaigns and future LSST cosmology feasibility studies, we pose a “Time Delay Challenge” (TDC). The challenge is organized as a set of “ladders,” each containing a group of simulated datasets to be analyzed blindly by participating independent analysis teams. Each rung on a ladder consists of a set of realistic mock observed lensed quasar light curves, with the rungs’ datasets increasing in complexity and realism to incorporate a variety of anticipated physical and experimental effects. The initial challenge described here has two ladders, TDC0 and TDC1. TDC0 has a small number of datasets, and is designed to be used as a practice set by the participating teams as they set up their analysis pipelines. The non-mandatory deadline for completion of TDC0 will be December 1, 2013. The teams that perform sufficiently well on TDC0 will then be able to participate in the much more demanding TDC1. TDC1 will consist of 10³ light curves, a sample designed to provide the statistical power to make meaningful statements about the sub-percent accuracy that will be required to provide competitive Dark Energy constraints in the LSST era.
In this paper we describe the simulated datasets in general terms, lay out the structure of the challenge and define a minimal set of metrics that will be used to quantify the goodness-of-fit, efficiency, precision, and accuracy of the algorithms. The results for TDC1 from the participating teams will be presented in a companion paper to be submitted after the closing of TDC1, with all TDC1 participants as co-authors.
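The abstract names four quantities (goodness-of-fit, efficiency, precision, accuracy) without giving formulas in this excerpt. The sketch below shows one plausible way such metrics could be computed from submitted time-delay estimates; the specific definitions (efficiency as the submitted fraction, a mean squared normalized residual, a mean fractional uncertainty, and a mean signed fractional error) are assumptions for illustration and are not taken from the text above, as are all names.

```python
def delay_metrics(true_delays, estimates, uncertainties, n_total):
    """Summary metrics for a set of submitted time-delay estimates.

    Assumed definitions (illustrative only):
      efficiency      = submitted estimates / total light curves
      goodness_of_fit = mean of ((estimate - truth) / uncertainty)^2
      precision       = mean of uncertainty / |truth|
      accuracy        = mean of (estimate - truth) / truth
    """
    n = len(estimates)
    triples = list(zip(true_delays, estimates, uncertainties))
    return {
        "efficiency": n / n_total,
        "goodness_of_fit": sum(((e - t) / s) ** 2 for t, e, s in triples) / n,
        "precision": sum(s / abs(t) for t, e, s in triples) / n,
        "accuracy": sum((e - t) / t for t, e, s in triples) / n,
    }


# Hypothetical submission: 2 estimates (in days) out of 4 available light curves.
metrics = delay_metrics(
    true_delays=[10.0, -20.0],
    estimates=[11.0, -19.0],
    uncertainties=[1.0, 2.0],
    n_total=4,
)
print(metrics)
```

Under these assumed definitions, a team can trade efficiency for accuracy by submitting only the light curves it is confident about, which is why reporting all four quantities together is informative.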