Public Articles
Library of Words
This blog post describes the rationale and motivation behind the Library of Words, a digital collection of pages filled with every possible combination of 320 words.
Concinnitas: The Art of the Equation
and 3 collaborators
San Francisco, CA – On view at Crown Point Press is an exhibition of etchings by scientists and mathematicians, September 4 - October 27, 2015.
We came across this set of beautiful etchings on Artsy depicting mathematical equations. We decided to reproduce them on Authorea, using our equation editor and some LaTeX. Here’s the result.
Authorea Raises a new round of funding to Advance Open, Reproducible, Data-Driven Research
Authorea is rapidly growing in fields outside of the hard sciences, such as genomics, environmental science, and computational biology. For example, in June 2015, a dedicated global team of epidemiology researchers began an ambitious project to track the Ebola virus using large-scale genome sequencing. Their groundbreaking research, written on Authorea, was published in the journal Cell and covered by the New York Times. The Authorea version of their article is the only place where readers can peruse the history, workflows, and research data connected with the study. Authorea is poised to shake up the stale academic publishing industry via an online platform that encourages data sharing, and a more open and transparent dissemination of research results complete with all the data sources necessary to reproduce them. Authorea plans to use the proceeds of this funding to encourage more open, data-driven research of this kind.
Introducing real time chat for user support
and 1 collaborator
Authorea goes to Paris
and 1 collaborator
Great news! We're happy to announce that Authorea is one of the winners of the NYC-Paris Business Exchange competition. We'll be opening an Authorea office in Paris in March 2016. C'est génial!
Gravitational Waves and the Death of the PDF
and 1 collaborator
In 1916, Einstein published a paper predicting the existence of gravitational waves. It has just one author (A.E. himself) and consists of a few pages of text and equations \citep{1916SPAW.......688E}. Fast forward exactly 100 years: the LIGO collaboration announced in a paper that they had observed what Einstein predicted. The paper has more than 1000 co-authors and it condenses, in just a few pages of text, equations and figures, an enormous amount of technical information \citep{PhysRevLett.116.061102}.
MathML on the Web -- Please!
Today I merged a pull request which introduced the following setup for equation editing, as an alpha feature for our RichText editor:
The “status quo” renderer, displaying the mathematics on all “read-mode” article components.
A new renderer, specifically loaded in the iframe of our editor widget. Why? Because loading MathJax twice is too slow for our show, but we still want our displayed richtext equations to be, well, rich.
An additional math renderer, part of our equation-specific editing widget, so that authors can also input formulas in an appealing richtext flow.1 See the great demos by for examples.
You read that correctly - not one, not two, but three separate math renderers on the same HTML page, each of them different because each strikes its own balance between performance, coverage and visualization.
I hear you cry:
– Well, this is clearly horrible design, simplify and streamline it!
Indeed! My thoughts exactly. But the great solution, the one that solves this problem not only for me, but for the entire math-on-the-web developer ecosystem, is not for me or my team to implement.
This renderer medley can be traced to a single root cause - the absence of ubiquitous MathML support in modern browsers. If you are not familiar with MathML, it is a W3C and ISO standard and a core part of HTML5. MathML does a great job of providing a single language for representing mathematics in structured documents, especially web pages. But while we have that great language, we lack major browser implementations: in fact, only Firefox has great MathML support, and it has long been the leading browser for math support.
A different perspective tells us that we are just two browsers short of having the tide turn overwhelmingly towards native rendering. I am referring specifically to Edge and Chrome. Having native support would allow us – the mortal developers interested in providing exciting and powerful math-enabled web applications – to sleep calmly at night and work proudly by day. And hence my sincere plea to all major browser vendors:
Please, do the math.
P.S. How is the native MathML solution better?
Best. Performance. Possible.
Your browser will be able to render MathML the moment it loads, just as it can CSS. No extra load times needed.
The DOM will set you free
As math-on-the-web developers, we need to select into and manipulate mathematical objects, just as all web developers need to manipulate forms and input fields. I want my cool math interactivity widget to be an easy drop-in for any webpage, just the same way that a jQuery widget is. And we can’t have that without equations being a proper participant in the HTML DOM – CSS would never have taken off if, say, <div> and <span> elements only existed for sites that had first loaded a third-party css.js library.
Out-of-the-box Accessibility
Exposing the MathML source of an equation directly in its web page will be the default state of any HTML5 web page. Math-to-speech and Braille adaptors can then simply use the raw HTML as-is.
P.P.S. If you are interested in showing your personal support for adding native MathML, add your vote and voice to the public issues:
Edge MathML support:
https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/6508572-mathml
Chrome MathML support:
https://code.google.com/p/chromium/issues/detail?id=152430
Personally, I have joined an effort to promote MathML publicly and to remind developers of its many strong suits and far-reaching benefits to the web development ecosystem. You can visit our MathML Association website, or follow us on Twitter at @mathml3.
Data Visualization: Intro to Infographics
and 1 collaborator
Mosquito Counter
and 3 collaborators
Mosquito-borne diseases such as dengue fever, malaria, and chikungunya have always been among the leading causes of morbidity in the Philippines \cite{dohLeadingMorbidity}. In fact, the Global Dengue Initiative identifies dengue fever as one of the national notifiable diseases in any country \cite{whoGlobalStrategy}. More than that, the country’s Department of Health created priority control and prevention programs for the aforementioned diseases \cite{dohDengueControl,dohMalariaControl}. Despite these measures, the country continues to face outbreaks of these mosquito-borne diseases. In fact, in 2014, a surge in the mosquito population was reported \cite{dohMosquitoElimination,dohChikungunyaOutbreaks,philstarDogProbing}. To prevent outbreaks of dengue, effective vector control measures should be in place.
A report from the Asia-Pacific and Americas Dengue Prevention Boards has identified that an initial step to combat dengue is to improve surveillance systems \cite{whoGlobalStrategy}. A particular aspect of such a system requires enhanced mosquito-vector surveillance. Despite these suggestions, only a handful of research endeavors are currently implementing such schemes properly integrated into their mosquito-borne disease surveillance systems \cite{Beatty_2010}.
In the Philippines, an initiative by the Department of Science and Technology has focused on ovitraps supplemented with manual landing rate counts, but these efforts are still in their infancy and remain insufficient \cite{dohIntegratedDiseaseSurveillance,noah,dostPredictAbundance}. While several studies have established the importance of entomological surveillance in supplementing disease surveillance and response, only pupal and adult vector counts are considered reliable because of their high correlation with actual disease cases. Moreover, many studies have found that ovicyte and larval indices offer little value for surveillance because of the low survival rates of eggs and larvae \cite{focks2003review}.
Thus, this project proposes a cost-effective tool that is able to automatically collect, identify, and count adult mosquitoes. The automation feature of the tool allows data collection with minimal human intervention and is suitable even in remote areas where resources are limited. It provides a solution for generating reliable entomological indices, which, in turn, strengthens the disease surveillance system.
Mean Field Model of Infection
and 1 collaborator
$N \equiv$ number of host cells
$N_I \equiv$ number of infected cells
$N_B \equiv$ number of bacteria
$N_R \equiv$ number of ruffles
$N_r \equiv$ number of ruffling cells ($\geq 1$ ruffle)
$t_{max} \equiv$ total incubation time
$m \equiv$ multiplicity of infection (MOI) $= \frac{N_B(t=0)}{N}$
$c \equiv$ confluency $= \frac{N a}{L^2}$
$a \equiv$ mean cellular area
$L \equiv$ side length of square well
$x \equiv$ fraction of host cells infected $= \frac{N_I}{N}$
$b \equiv$ fraction of bacteria remaining (i.e. not landed on a host) $= \frac{N_B}{N_B(0)}$
$f \equiv$ fraction of attached bacteria that form ruffles
$r \equiv$ fraction of host cells with ruffling ($\geq 1$ ruffle)
$\tilde{r} \equiv$ ruffles per cell $= \frac{N_R}{N}$
$\tilde{b}_R \equiv$ bacteria per ruffle
$\quad \tilde{b}_R(t=0) = 1$
$\Gamma_0 \equiv$ primary attachment rate per bacterial density
$\Gamma_1 \equiv$ ruffle recruitment rate per bacterial density
About Bayes
Probabilities can be viewed as frequencies of outcomes of an event, or
Probabilities can be used to describe degrees of beliefs of outcomes.
The Cox axioms of consistency: degrees of belief can be mapped to probabilities if they satisfy the following axioms:
Degrees of belief are transitive, i.e. if $B(x) \geq B(y)$ and $B(y) \geq B(z)$ then $B(x) \geq B(z)$.
The degree of belief of x and that of its negation are related, i.e. there exists a function f such that $B(\bar{x}) = f(B(x))$.
The degree of belief of a conjunction x and y is related to the degree of belief of the conditional proposition x|y and to B(y), i.e. there exists a function g s.t. $B(x, y) = g(B(x|y), B(y))$.
Here x is a proposition with a true/false outcome, B(x) is the degree of belief in that proposition, the negation of x is $\bar{x}$, and the degree of belief in proposition x given that y is true is B(x|y).
The Bayesian view is subjective in the sense that probabilities depend on assumptions, and you can’t make inferences without assumptions. In this way probabilities can be used to describe different assumptions and to make inferences based on them.
Forward probability problems: a generative model, in which a process is described and a model is given to characterize how the data at hand was generated. For example, taking white and black balls from urns. The model gives an explicit definition of the data’s distribution or of certain moments (such as expectation, variance, etc).
Inverse probability problems: also based on a generative model, but instead of computing the probability distribution of the quantity produced by the process, one computes the conditional probability of one or more unobserved variables in the process, given the observed variables, i.e. the data.
Prior probability: the probability assigned to a belief before “evidence” is taken into account, i.e. the probability distribution of the parameters. It is the marginal probability of that proposition.
Likelihood function (of the parameters): P(x|θ) is the conditional probability of the data given the parameters, but it is always taken as a function of θ (the parameters). Observe that, as a function of θ, it is not a probability distribution, since it doesn’t “add up” to 1. But if we fix θ, then P(x|θ) is indeed a probability distribution over x. Don’t say “the likelihood of the data”!
Posterior probability: the probability of the params, given the data.
Hypothesis: we hypothesize the different alternatives to the parameter values.
Difference from the classical view: in the classical view one hypothesizes over the model’s parameters and then tests that hypothesis (or a bunch of them) for plausibility, whereas in the Bayesian view the different hypotheses are all ‘marginalized over’.
Subjective priors: in general, we need to make assumptions about the prior probabilities of the parameters. Their values are unknown (or just fixed as some data) and a model needs to be assigned to test the hypothesis. The same goes for the likelihoods: assigning a distribution to the parameters in a subjective way will change our likelihood function.
The likelihood principle: given a generative model for data d, given parameters θ, the likelihood is defined as P(d|θ), and having observed a particular outcome d1 , all inferences and predictions should depend only on the function P(d1|θ) i.e. they depend only on the data at hand, on what actually happened.
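The prior, likelihood and posterior defined above can be tied together in a small inverse-probability calculation. The sketch below is illustrative only: the two-urn setup, the numbers, and the function name `posterior` are assumptions made up for this example, not taken from the notes.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses.

    prior:      dict hypothesis -> P(theta)
    likelihood: dict hypothesis -> P(d1 | theta) for the one observed outcome d1
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # P(d1), marginalized over the hypotheses
    return {h: joint[h] / evidence for h in joint}


# Hypothetical setup: urn A holds 3 black balls out of 4, urn B holds 2 out of 4.
# An urn is picked with equal probability and one black ball is drawn.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihood_black = {"A": Fraction(3, 4), "B": Fraction(1, 2)}

post = posterior(prior, likelihood_black)

# As a function of the hypothesis, the likelihood need not sum to 1.
likelihood_total = sum(likelihood_black.values())
```

In line with the likelihood principle, `posterior` only ever sees the likelihood of the outcome that actually occurred; the probabilities of outcomes that were not observed never enter the calculation.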
Shannon information content of an outcome: let x be an event/outcome; then h(x) is defined as $h(x) = \log_2 \frac{1}{P(x)}$. Note that it is measured in bits and that less probable events carry more “information”. This number is a measure of the information content of the outcome x.
Entropy: defined as $H(X) = E\left[\log_2 \frac{1}{P(x)}\right]$, where X is a random variable and, by convention, $0 \cdot \log_2 \frac{1}{0} = 0$. It is clear from the definition that $H(X) \geq 0$, with equality only if $P(x) = 1$ for some outcome x.
Entropy is maximized when P(x) is uniform, so for any given X, $H(X) \leq \log_2 |A_X|$, where $A_X$ is the set of possible outcomes of X.
Entropy is additive for two independent random variables i.e. H(X, Y)=H(X)+H(Y)
Decomposability of entropy: if p1, p2, ..,pn
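The information content and entropy definitions above are easy to check numerically. This is a minimal sketch with hypothetical function names (`information_content`, `entropy`); it uses base-2 logarithms and applies the 0·log(1/0) = 0 convention by skipping zero-probability outcomes.

```python
import math


def information_content(p):
    """Shannon information content, in bits, of an outcome with probability p > 0."""
    return math.log2(1.0 / p)


def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Zero-probability outcomes contribute nothing, by convention.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Two independent variables and their joint distribution (made-up numbers).
p_x = [0.5, 0.5]
p_y = [0.25, 0.75]
p_xy = [px * py for px in p_x for py in p_y]

h_joint = entropy(p_xy)
h_sum = entropy(p_x) + entropy(p_y)
```

Comparing `h_joint` with `h_sum` illustrates additivity for independent variables, and `entropy([0.25] * 4)` reaches the uniform upper bound of log2(4) bits.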
Boolean DeGroot Processes, Fixpoint Logics and Liquid Democracy
and 1 collaborator
The paper focuses on a specific class of opinion diffusion processes in which opinions are binary, and agents are influenced by exactly one influencer, possibly themselves, of which they copy the opinion. This is an extremely simple model of opinion diffusion on networks, and it is of interest for two reasons. First, it corresponds to a class of processes which lies at the interface of two classes of diffusion processes that have remained so far unrelated: the stochastic opinion diffusion model known as DeGroot’s \cite{Degroot_1974}, and the more recent propositional opinion diffusion model due to \cite{Grandi:2015:POD:2772879.2773278}. The processes we study, called here Boolean DeGroot processes, are the $\{0,1\}$ limit case of the DeGroot stochastic processes and, at the same time, the limit case of propositional opinion diffusion processes where each agent has access to the opinion of exactly one neighbor (cf. Figure [figure:intersection]). Second, it provides an abstract model with which to analyze some aspects of the popular, and currently much discussed, aggregation system called liquid democracy \cite{liquid_feedback}. We will see that Boolean DeGroot processes offer a novel and natural angle on the issue of delegation cycles in liquid democracy.
The paper studies the convergence of Boolean DeGroot processes, characterizing them with necessary and sufficient conditions. In doing so the paper uses standard graph-theoretic tools as well as techniques from modal fixpoint logics, thereby establishing a fruitful interface between such logics and qualitative models of opinion diffusion. The results we obtain on the characterization of convergence are then applied to provide novel insights into liquid democracy, which remains a rather underexplored system in the social-choice literature.
Section [sec:preliminaries] introduces the paper’s notation and the key definition of Boolean DeGroot process. Section [sec:convergence] studies necessary and sufficient conditions for those processes to converge. Section [sec:logic] shows how off-the-shelf fixpoint logics (specifically the modal μ-calculus) can be used to specify the properties of such processes formally. Section [sec:coloring] elaborates further on the link of our work with the propositional opinion diffusion model. It studies convergence conditions for a simple generalization of Boolean DeGroot processes, where several influencers are allowed and opinions change under unanimity of the influencers. Section [sec:liquid] shows how Boolean DeGroot processes relate to liquid democracy, contributing some novel insights into the understanding of delegation cycles. Section [sec:conclusions] concludes the paper and sketches some on-going lines of research.
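The process described above (binary opinions, each agent copying the opinion of exactly one influencer) can be simulated in a few lines. The following Python sketch is an illustration under assumptions, not the paper's formalization: it assumes all agents update synchronously, and the names `step` and `run` and the list-of-indices encoding of influencers are hypothetical.

```python
def step(opinions, influencer):
    """One synchronous update: agent i copies the opinion of influencer[i]."""
    return tuple(opinions[influencer[i]] for i in range(len(opinions)))


def run(opinions, influencer):
    """Iterate until a fixed point is reached or an earlier state repeats.

    Returns (converged, final_state, steps). The state space is finite and
    the update is deterministic, so the loop always terminates.
    """
    current = tuple(opinions)
    seen = {current}
    steps = 0
    while True:
        nxt = step(current, influencer)
        if nxt == current:
            return True, current, steps
        if nxt in seen:
            # The process has entered a cycle of length > 1.
            return False, nxt, steps
        seen.add(nxt)
        current = nxt
        steps += 1


# Agent 0 follows itself, agent 1 follows agent 0, agent 2 follows agent 1.
chain = run((1, 0, 0), [0, 0, 1])

# Two agents following each other with opposite opinions keep swapping.
cycle = run((1, 0), [1, 0])
```

In this toy setup the chain stabilizes once the opinion of the self-influencing agent has propagated, while the two-agent loop with disagreeing opinions oscillates, which gives a concrete picture of why cycles matter for convergence.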
Preparing contributions to CERN reports (school, workshop, and conference proceedings)
Low-Noise Frequency Translation of Single Photons via Four Wave Mixing Bragg Scattering
We present a single-photon frequency translation setup based on Four Wave Mixing Bragg Scattering in fiber, able to simultaneously achieve near-unity conversion and very low noise.
A survey of a number of galaxies
and 3 collaborators
Objects Nearby: the number of objects returned by a search on http://ned.ipac.caltech.edu, within +/- 750 kpc and +/- 500 km/s.
Research Center as Distant Publisher: Developing Non-Consumptive Compliant Open Data Worksets to Support New Modes of Inquiry
The HathiTrust Research Center (HTRC), founded in 2010, is managed by Indiana University Bloomington and the University of Illinois at Urbana-Champaign under an agreement with the HathiTrust Board of Governors and the University of Michigan. The HTRC mission supports new knowledge creation through novel computational uses of the HathiTrust Digital Library (HTDL). Through the introduction of the concept of distant publishing, this short paper will discuss ideas for data and software publication that support the HTRC non-consumptive research methodologies and offer scholars new methods for research inquiry.
A Model of the Geometric Structure of a Synset
and 1 collaborator
Abstract
The article raises the question of formalizing the notion of synonymy. Based on vector representations of words, the paper proposes a geometric approach to the mathematical modeling of sets of synonyms (synsets). A computable attribute of synsets, the interior of a synset (IntS), is defined. The notions of rank and centrality of words in a synset are introduced, making it possible to identify the more significant, “central” words in a synset. A mathematical formulation is given for rank and centrality, and a procedure for computing them is proposed. The computations use neural models (Skip-gram, CBOW) created with T. Mikolov's word2vec program. Using synsets from the Russian Wiktionary as examples, IntS are constructed from neural models of the corpora of the RusVectores project. The results obtained on two corpora (the Russian National Corpus and a news corpus) largely coincide. This suggests a certain universality of the proposed mathematical model.
Keywords: synonym, synset, neural network, corpus linguistics, word2vec, RusVectores, gensim, Russian Wiktionary
Testing the Stability of a Method for Computing the Error Distance Between Two Ordered Lists
and 1 collaborator
Academic ranking is the process of building a rating of higher education institutions that takes various factors into account. Rankings are produced by universities, journals, governments, and independent experts. When a large number of universities are ranked, the number of a country's universities that make it into the list of the world's best becomes an important indicator characterizing its higher education system \cite{Karpenko_2014}. There are quite a few university rankings in the world. Rankings are created to increase competition, both between individual universities and between national higher education systems. In compiling each ranking, the research group uses its own methodology: different criteria, combinations of criteria, and methods of collecting information are taken as the basis. Across existing rankings, terms such as “quality of education”, “level of research”, and “academic reputation” can have different meanings. International university rankings set the standards of a modern university, which many institutions try to follow, and seek to influence researchers. However, far from all researchers view university rankings positively \cite{Shtyhno_2014}.
At present there is no “ideal” ranking, that is, one that covers all existing universities, has a transparent methodology, and leaves everyone satisfied with the results. Compilers of rankings pursue particular goals and target a particular audience. Thus a given university may hold a leading position in one ranking and fall well outside the top ten in another. It is not possible to measure up to all of them at once. The key factor affecting a ranking value is the presence (or absence) of a particular indicator. Therefore, any list of indicators used in ranking should rest on a scientific basis \cite{Azgaldov_2012}.
The main goal of the study is to build a new ranking from Wikipedia data and to compare it with existing rankings by computing the “error distance” metric. The best-known models of global rankings include \cite{Skalaban_2013}:
the Academic Ranking of World Universities (ARWU),
the international university ranking of the British publication Times Higher Education (THE),
the webometric ranking of the Spanish Cybermetrics laboratory (Webometrics).
The aim of the work is to compare existing global university rankings by computing the “error distance” and to test the stability of this method by permuting objects (in this case universities) within a list (ranking).
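The stability check described above (permute objects inside a ranking and see how the distance reacts) can be sketched in Python. This excerpt does not define the “error distance” metric, so the sketch substitutes Spearman's footrule, the sum of absolute position differences, as a plainly named stand-in; the function names and the toy list are hypothetical.

```python
import random


def footrule_distance(ranking_a, ranking_b):
    """Spearman's footrule over the items that appear in both ordered lists.

    Stand-in metric: this is NOT the article's "error distance".
    """
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - pos_b[item])
               for i, item in enumerate(ranking_a) if item in pos_b)


def swap_adjacent(ranking, k, rng):
    """Return a copy of the ranking after k random swaps of adjacent items."""
    out = list(ranking)
    for _ in range(k):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
base = ["U1", "U2", "U3", "U4", "U5"]  # placeholder university labels
perturbed = swap_adjacent(base, 1, rng)
shift = footrule_distance(base, perturbed)
```

Under this stand-in metric a single adjacent swap always moves the distance by exactly 2, so repeating the perturbation with increasing `k` shows how quickly the distance grows as the list is shuffled.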
Low Power Wireless Sensor Networks - Market Overview
and 1 collaborator
Wireless Sensor Networks (WSNs) are crucial to the development of the Internet of Things, yet they pose various challenges in terms of multiplexing, power efficiency, range and transmission speed. This document delivers a high-level comparison of Zigbee, 6LoWPAN, Bluetooth Low Energy, LoRa and Narrowband-IoT in these areas.
An Exploration of the Statistical Signatures of Stellar Feedback
and 3 collaborators
All molecular clouds are observed to be turbulent, but the origin, means of sustenance, and evolution of the turbulence remain debated. One possibility is that stellar feedback injects enough energy into the cloud to drive observed motions on parsec scales. Recent numerical studies of molecular clouds have found that feedback from stars, such as protostellar outflows and winds, injects energy and impacts turbulence. We expand upon these studies by analyzing magnetohydrodynamic simulations of winds interacting with molecular clouds which vary the stellar mass-loss rates and magnetic field strength. We generate synthetic 12CO(1-0) maps assuming that the simulations are at the distance of the nearby Perseus molecular cloud. By comparing the outputs from different initial conditions and evolutionary times, we identify differences in the synthetic observations and characterize these using common astrostatistics. We quantify the different statistical responses using a variety of metrics proposed in the literature. We find that multiple astrostatistics, such as principal component analysis, velocity component spectrum, and dendrograms, are sensitive to changes in stellar mass-loss rates and/or magnetic field strength. This demonstrates that stellar feedback influences molecular cloud turbulence and can be identified and quantified observationally using such statistics.
Here Be Dragons: Characterization of ACS/WFC Scattered Light Anomalies
ACS/WFC images can suffer from a number of optical and scattered light anomalies. Most of the optical anomalies that affect ACS have been well characterized. Hardware, software, and optical anomalies are discussed in ISR 2008-01. This is not the case for the scattered light anomalies known as “dragon’s breath” and edge glow. Dragon’s breath is caused by reflections being scattered back to the detector. There is a knife-edged mask in front of the CCD that scatters light back to the detector when its back side is illuminated by reflections from the CCD surface. These phenomena were discovered in early testing of ACS and were mitigated by sharpening the knife edges and coating them black. However, when point sources fall on the edge of the mask, scattering still occurs (Hartig et al.).
Authorea: An Online Editor for LaTeX
and 1 collaborator