Public Articles

How might libraries serve 21st century information needs? Authorea's proposal.

and 3 collaborators

This document summarizes Authorea's submission to the Knight Foundation's open call for ideas on advancing libraries to better serve individuals and communities in the 21st century. Here is the call for proposals.

Open Science Meetup on April 1 with computational biologist Holly Bik

and 3 collaborators

We're excited to invite you to our THIRD NY Open Science Meetup hosted at our new location in the Rise Labs in the Flatiron District.

Our guest speaker is Holly Bik, an awesome Project Scientist at the Center for Genomics and Systems Biology at NYU. Her work uses environmental DNA sequencing to track changes and patterns in microbial communities, such as the impact of the Deepwater Horizon oil spill on marine microbes in the Gulf of Mexico. She is also heavily involved in software development projects, including Phinch, an open source data visualization framework for large genomics datasets.

Holly will talk about her experience as an interdisciplinary computational biologist and how she contributes to the Open Access movement (check out her ImpactStory profile).

Don't be an April fool; come geek out with Holly and Authorea in our new office space instead! Wine and light bites will be offered.

When: **Friday April 1 at 6pm**

Gravitational Waves and the Death of the PDF

and 1 collaborator

In 1916, Einstein published a paper predicting the existence of gravitational waves. It has just one author (A.E. himself) and consists of a few pages of text and equations \citep{1916SPAW.......688E}. Fast forward exactly 100 years: the LIGO collaboration announced in a paper that they had observed what Einstein predicted. The paper has more than 1000 co-authors and condenses, in just a few pages of text, equations and figures, an enormous amount of technical information \citep{PhysRevLett.116.061102}.

Authorea User Spotlight - Casey Law

and 2 collaborators

I took physics as a senior in high school and found it thrilling. It excited me to find a subject that tries to tackle the most fundamental laws of the universe. When I realized I could study that full time in college, I didn't hesitate to declare my major.

My current research focuses on data-intensive uses of radio interferometers. Interferometers have a rather peculiar way of seeing (Fourier transforms abound!) and there is a wide range of algorithms that can be applied to get at the underlying signal. I am tackling projects to perform large surveys, real-time data analysis, and high-speed imaging.

MathML on the Web -- Please!

Today I merged a pull request which introduced the following setup for equation editing, as an alpha feature of our RichText editor:

The “status quo” renderer, displaying the mathematics on all “read-mode” article components.

A new renderer, specifically loaded in the iframe of our editor widget. Why? Because loading MathJax twice is too slow for our show, but we still want our displayed richtext equations to be, well, rich.

An additional math renderer, part of our equation-specific editing widget, so that authors can also **input** formulas in an appealing richtext flow.^{1} See the great demos by for examples.

You read that correctly: not one, not two, but **three separate math renderers on the same HTML page**, each different because it strikes its own balance between performance, coverage and visualization.

I hear you cry:

– Well, this is clearly horrible design, simplify and streamline it!

Indeed! My thoughts exactly. But the **great solution**, the one that solves this problem not only for me, but for the **entire math-on-the-web developer ecosystem**, is not for me or my team to implement.

This renderer medley can be traced to a single root cause: the absence of ubiquitous MathML support in modern browsers. If you are not familiar with MathML, it is a W3C and ISO standard and a core part of HTML5. MathML does a great job of providing a single language for representing mathematics in structured documents, especially web pages. But while we have that great language, we lack major browser implementations; in fact, only Firefox has **great** MathML support, and it has long been the leading browser for math support.

A different perspective tells us that we are just two browsers short of having the tide turn overwhelmingly towards native rendering. I am referring specifically to and . Having native support would allow us, the mortal developers interested in providing exciting and powerful math-enabled web applications, to sleep calmly at night and work proudly by day. And hence my sincere plea to all major browser vendors:

**Please, do the math.**

P.S. How is the native MathML solution better?

**Best. Performance. Possible.** Your browser will be able to render MathML the moment it loads, just as it can CSS. No extra load times needed.

**The DOM will set you free.** As math-on-the-web developers, we need to select into and manipulate mathematical objects, just as all web developers need to manipulate forms and input fields. I want my cool math interactivity widget to be an easy drop-in for any webpage, just the same way that a jQuery widget is. And we can’t have that without equations being a proper participant in the HTML DOM. CSS would never have taken off if, say, `<div>` and `<span>` elements only existed for sites that had first loaded a third-party `css.js` library.

**Out-of-the-box Accessibility.** Exposing the MathML source of an equation directly in its web page^{2} will be the default state of any HTML5 web page. Math-to-speech and Braille adaptors can then simply use the raw HTML as-is.

P.P.S. If you are interested in showing your personal support for adding native MathML, add your vote and voice to the public issues:

Edge MathML support:

https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/6508572-mathml

Chrome MathML support:

https://code.google.com/p/chromium/issues/detail?id=152430

Personally, I have joined an effort to promote MathML publicly and to remind developers of its many strong suits and far-reaching benefits to the web development ecosystem. You can visit our MathML Association website, or follow us on Twitter at @mathml3.

Data Visualization: Intro to Infographics

and 1 collaborator

Today, science and R&D social media channels have become just as cluttered as consumer social media channels. For academic researchers, getting the word out about a research paper has come to resemble digital marketing. As if communicating a paper's main points effectively wasn’t hard enough, now even staying afloat on Twitter, let alone going viral, is a challenge.

This is where data visualizations come into play. Visualized data, such as charts, infographics, and interactive figures, can represent large amounts of complicated data more coherently. It is significantly faster to analyze information in graphical form than in spreadsheets. Consequently, scientists, government bodies, and businesses can spot correlations, patterns, trends, and outliers with greater ease.

Data visualization also makes communication possible, effective, and interesting. The subject-specific learning curve (e.g. jargon) often makes it hard to share findings with the general public, and even with other researchers! Visually impactful representations of data get the message across quickly, engage new audiences, **encourage sharing and visibility**, and open the floor to new research opportunities. See also: How the Scientific Community Reacts to Newly Submitted Preprints.

According to Buffer, **content with visuals gets 94% more total views** and **visual content is more than 40X more likely to get shared on social media than other types of content**. In fact, **infographics are liked and shared on social media 3X more than any other type of content** (MassPlanner). So here are a few common types of data visualizations that help the writer explain, and the reader explore, large quantities of data.

Mosquito Counter

and 3 collaborators

Mosquito-borne diseases such as dengue fever, malaria, and chikungunya have long been among the leading causes of morbidity in the Philippines \cite{dohLeadingMorbidity}. In fact, the Global Dengue Initiative identifies dengue fever as one of the national notifiable diseases in any country \cite{whoGlobalStrategy}. Moreover, the country’s Department of Health has created priority control and prevention programs for these diseases \cite{dohDengueControl,dohMalariaControl}. Despite these measures, the country continues to face outbreaks of mosquito-borne diseases. In 2014, a surge in the mosquito population was reported \cite{dohMosquitoElimination,dohChikungunyaOutbreaks,philstarDogProbing}. To prevent outbreaks of dengue, effective vector control measures should be in place.

A report from the Asia-Pacific and Americas Dengue Prevention Boards identified improving surveillance systems as an initial step in combating dengue \cite{whoGlobalStrategy}. One aspect of such a system is enhanced mosquito-vector surveillance. Despite these suggestions, only a handful of research endeavors currently implement such schemes properly integrated into their mosquito-borne disease surveillance systems \cite{Beatty_2010}.

In the Philippines, an initiative by the Department of Science and Technology has focused on ovitraps supplemented with manual landing rate counts, but these efforts are still in their infancy and are rather insufficient \cite{dohIntegratedDiseaseSurveillance,noah,dostPredictAbundance}. While several studies have established the importance of entomological surveillance in supplementing disease surveillance and response, only pupal and adult vector counts are considered reliable because of their high correlation with actual disease cases. Moreover, many studies have found that ovicyte and larval indices offer little value for surveillance because of the low survival rates of eggs and larvae \cite{focks2003review}.

Thus, this project proposes a cost-effective tool that automatically collects, identifies, and counts adult mosquitoes. Automation allows data collection with minimal human intervention and makes the tool suitable even in remote areas where resources are limited. The tool provides a way to generate reliable entomological indices, which in turn strengthens the disease surveillance system.

Mean Field Model of Infection

and 1 collaborator

$N \equiv$ number of host cells

$N_I \equiv$ number of infected cells

$N_B \equiv$ number of bacteria

$N_R \equiv$ number of ruffles

$N_r \equiv$ number of ruffling cells ($\geq 1$ ruffles)

$t_{max} \equiv$ total incubation time

$m \equiv$ multiplicity of infection (MOI) $= \frac{N_B(t=0)}{N}$

$c \equiv$ confluency $= \frac{N a}{L^2}$

$a \equiv$ mean cellular area

$L \equiv$ side length of square well

$x \equiv$ fraction of host cells infected $= \frac{N_I}{N}$

$b \equiv$ fraction of bacteria remaining (i.e. not landed on a host) $= \frac{N_B}{N_B(t=0)}$

$f \equiv$ fraction of attached bacteria that form ruffles

$r \equiv$ fraction of host cells with ruffling ($\geq 1$ ruffle)

$\tilde{r} \equiv$ ruffles per cell $= \frac{N_R}{N}$

$\tilde{b}_R \equiv$ bacteria per ruffle

$\quad \tilde{b}_R(t=0) = 1$

$\Gamma_0 \equiv$ primary attachment rate per bacterial density

$\Gamma_1 \equiv$ ruffle recruitment rate per bacterial density
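As a quick check of how the dimensionless quantities above follow from raw counts, here is a minimal Python sketch. The function name, argument names, and all numerical values are hypothetical and chosen only for illustration; the formulas are the ones given in the definitions of $m$, $c$, $x$, $b$ and $\tilde{r}$.

```python
def derived_quantities(n_cells, n_infected, n_bacteria_0, n_bacteria_t,
                       n_ruffles, cell_area, well_side):
    """Compute the ratios defined above from raw counts.

    cell_area and well_side must use consistent units
    (e.g. mm^2 and mm) so that the confluency is dimensionless.
    """
    return {
        "m": n_bacteria_0 / n_cells,                # MOI = N_B(t=0) / N
        "c": n_cells * cell_area / well_side ** 2,  # confluency = N a / L^2
        "x": n_infected / n_cells,                  # infected fraction = N_I / N
        "b": n_bacteria_t / n_bacteria_0,           # remaining bacteria = N_B / N_B(t=0)
        "r_tilde": n_ruffles / n_cells,             # ruffles per cell = N_R / N
    }


# Made-up example values, not taken from any experiment.
q = derived_quantities(
    n_cells=1000, n_infected=250,
    n_bacteria_0=10000, n_bacteria_t=4000,
    n_ruffles=500, cell_area=1.0e-3, well_side=2.0,
)
for name, value in q.items():
    print(f"{name} = {value:.3f}")
```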

About Bayes

Probabilities can be viewed as frequencies of outcomes of an event, or they can be used to describe degrees of belief in outcomes.

The Cox axioms of consistency map beliefs to probabilities, provided the beliefs satisfy the following:

Degrees of belief are transitive, i.e. if $B(x) \geq B(y)$ and $B(y) \geq B(z)$ then $B(x) \geq B(z)$.

The degree of belief in a proposition $x$ and in its negation $\bar{x}$ are related, i.e. there exists a function $f$ such that $B(x) = f(B(\bar{x}))$.

The degree of belief in the conjunction of $x$ and $y$ is related to the degree of belief in the conditional proposition $x|y$ and to $B(y)$, i.e. there exists a function $g$ such that $B(x, y) = g(B(x|y), B(y))$.

Here $x$ is a proposition with a true/false outcome, $B(x)$ is the degree of belief in that proposition, $\bar{x}$ is the negation of $x$, and $B(x|y)$ is the degree of belief in $x$ given that $y$ is true.

The Bayesian view is subjective in the sense that probabilities depend on assumptions, and you can’t make inferences without assumptions. Probabilities can thus be used to describe different assumptions and to make inferences given those assumptions.

**Forward probability problems**: these involve a generative model, which describes the process assumed to have generated the data at hand. For example, drawing white and black balls from urns. The model gives an explicit definition of the data’s distribution or of certain moments (such as the expectation or variance).

**Inverse probability problems**: these also involve a generative model, but instead of computing the probability distribution of the quantity the process produces, we compute the conditional probability of one or more unobserved variables in the process, given the observed variables, i.e. the data.

**Prior probability**: the probability assigned to a belief before “evidence” is taken into account, i.e. the probability distribution of the parameters. It is the marginal probability of that proposition.

**Likelihood function (of the parameters)**: $P(x|\theta)$ is the conditional probability of the data given the parameters, but it is always taken as a function of $\theta$ (the parameters). Observe that, as a function of $\theta$, it is not a probability distribution, since it doesn’t “add up” to 1. But if we fix $\theta$, then $P(x|\theta)$ is indeed a probability distribution over $x$. Don’t say “the likelihood of the data”!

**Posterior probability**: the probability of the parameters given the data, $P(\theta|x)$.
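To make the relation between prior, likelihood and posterior concrete, here is a minimal Python sketch of an inverse probability problem with urns. It assumes two hypothetical urns (one with a fraction 1/4 of white balls, one with 3/4), equal prior probability for each, and draws with replacement; the function names `posterior` and `ball_likelihood` are made up for this example. The posterior is computed as prior times likelihood, normalized over the hypotheses.

```python
from fractions import Fraction


def posterior(prior, likelihood, data):
    """Return P(theta | data) over a discrete set of hypotheses.

    prior:      dict mapping theta -> P(theta)
    likelihood: function (x, theta) -> P(x | theta)
    data:       list of independent observations
    """
    unnormalized = {}
    for theta, p_theta in prior.items():
        like = Fraction(1)
        for x in data:
            like *= likelihood(x, theta)  # independent draws multiply
        unnormalized[theta] = p_theta * like
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {theta: u / evidence for theta, u in unnormalized.items()}


def ball_likelihood(x, theta):
    """theta is the (assumed) fraction of white balls in the urn."""
    return theta if x == "white" else 1 - theta


# Two hypothetical urns, equally probable a priori.
prior = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

post = posterior(prior, ball_likelihood, ["white", "white", "black"])
for theta, p in post.items():
    print(f"P(theta = {theta} | data) = {p}")
```

Note that the likelihood values for the two hypotheses (3/64 and 9/64) do not sum to 1 over $\theta$; only after normalization by the evidence do we get a probability distribution over the parameters.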

**Hypothesis**: each alternative value of the parameters is a hypothesis.

**Difference to classical view**: in the classical view one hypothesizes over the model’s parameters and then tests that hypothesis (or a bunch of them) to assess its plausibility. In the Bayesian view, by contrast, the different hypotheses are all ‘marginalized over’.

**Subjective priors**: in general, we need to make assumptions about the prior probabilities of the parameters. Their values are unknown (or just fixed as some data), and a model needs to be assigned to test the hypothesis. The same goes for the likelihood: assigning a distribution in a subjective way will change our likelihood function.

**The likelihood principle**: given a generative model for data *d*, given parameters *θ*, the likelihood is defined as *P*(*d*|*θ*), and having observed a particular outcome *d*_{1} , all inferences and predictions should depend only on the function *P*(*d*_{1}|*θ*) i.e. they depend only on the data at hand, on what actually happened.

**Shannon information content of an outcome**: let $x$ be an event/outcome; then $h(x)$ is defined as $h(x) = \log_2 \frac{1}{P(x)}$. Note that it is measured in bits and that less probable events carry more “information”. This number is a measure of the information content of the outcome $x$.

**Entropy**: defined as $H(X) = E\left[\log_2 \frac{1}{P(x)}\right]$, where $X$ is a random variable and by convention $0 \cdot \log \frac{1}{0} = 0$. It is clear from the definition that $H(X) \geq 0$, with equality only if $P(x) = 1$ for some outcome $x$.

Entropy is maximized when $P(x)$ is uniform, so for any given $X$ we have $H(X) \leq \log(|A(x)|)$, where $|A(x)|$ is the number of possible outcomes.

Entropy is additive for two independent random variables, i.e. $H(X, Y) = H(X) + H(Y)$.
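The properties of entropy listed above can be checked numerically. The following is a minimal Python sketch; the function names and the example distributions are chosen for illustration only. It uses base-2 logarithms, so all values are in bits, and it skips zero-probability outcomes in line with the convention $0 \cdot \log \frac{1}{0} = 0$.

```python
import math


def information_content(p):
    """Shannon information content h(x) = log2(1 / P(x)), in bits."""
    return math.log2(1 / p)


def entropy(probs):
    """Entropy H(X) in bits; zero-probability outcomes contribute 0."""
    return sum(p * information_content(p) for p in probs if p > 0)


fair = [0.5, 0.5]        # uniform over 2 outcomes
biased = [0.9, 0.1]      # same outcomes, non-uniform
certain = [1.0, 0.0]     # one outcome has probability 1
uniform4 = [0.25] * 4    # uniform over 4 outcomes

# Joint distribution of two independent variables: P(x, y) = P(x) P(y).
joint = [px * py for px in fair for py in uniform4]

print("H(fair)     =", entropy(fair))
print("H(biased)   =", entropy(biased))
print("H(certain)  =", entropy(certain))
print("H(uniform4) =", entropy(uniform4))
print("H(joint)    =", entropy(joint))
```

The biased coin has lower entropy than the fair one (the uniform distribution is the maximum), the certain outcome has zero entropy, and the entropy of the joint distribution equals the sum of the two individual entropies.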

**Decomposability of entropy**: if $p_1, p_2, \ldots, p_n$

Boolean DeGroot Processes, Fixpoint Logics and Liquid Democracy

and 1 collaborator

The paper focuses on a specific class of opinion diffusion processes in which opinions are binary, and agents are influenced by exactly one influencer, possibly themselves, whose opinion they copy. This is an extremely simple model of opinion diffusion on networks, and it is of interest for two reasons. First, it corresponds to a class of processes which lies at the interface of two classes of diffusion processes that have remained so far unrelated: the stochastic opinion diffusion model known as DeGroot’s \cite{Degroot_1974}, and the more recent propositional opinion diffusion model due to \cite{Grandi:2015:POD:2772879.2773278}. The processes we study, called here Boolean DeGroot processes, are the $\{0,1\}$ limit case of the DeGroot stochastic processes and, at the same time, the limit case of propositional opinion diffusion processes where each agent has access to the opinion of exactly one neighbor (cf. Figure [figure:intersection]). Second, it provides an abstract model with which to analyze some aspects of the popular, and currently much discussed, aggregation system called liquid democracy \cite{liquid_feedback}. We will see that Boolean DeGroot processes offer a novel and natural angle on the issue of delegation cycles in liquid democracy.

The paper studies the convergence of Boolean DeGroot processes, characterizing them with necessary and sufficient conditions. In doing so the paper uses standard graph-theoretic tools as well as techniques from modal fixpoint logics, thereby establishing a fruitful interface between such logics and qualitative models of opinion diffusion. The results we obtain on the characterization of convergence are then applied to provide novel insights into liquid democracy, which remains a rather underexplored system in the social-choice literature.

Section [sec:preliminaries] introduces the paper’s notation and the key definition of Boolean DeGroot process. Section [sec:convergence] studies necessary and sufficient conditions for those processes to converge. Section [sec:logic] shows how off-the-shelf fixpoint logics (specifically the modal *μ*-calculus) can be used to specify the properties of such processes formally. Section [sec:coloring] elaborates further on the link of our work with the propositional opinion diffusion model. It studies convergence conditions for a simple generalization of Boolean DeGroot processes, where several influencers are allowed and opinions change under unanimity of the influencers. Section [sec:liquid] shows how Boolean DeGroot processes relate to liquid democracy, contributing some novel insights into the understanding of delegation cycles. Section [sec:conclusions] concludes the paper and sketches some on-going lines of research.
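The process described in this abstract (binary opinions, each agent copying the opinion of exactly one influencer, possibly itself) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's formal definition: it assumes that all agents update synchronously, and the function names `step` and `run` as well as the example networks are hypothetical.

```python
def step(opinions, influencer):
    """One synchronous update: agent i copies the current opinion of influencer[i]."""
    return tuple(opinions[influencer[i]] for i in range(len(opinions)))


def run(opinions, influencer, max_steps=1000):
    """Iterate the update until a fixed point or a repeated state.

    Returns (state, converged). converged is False if the process
    revisits an earlier state (it cycles) or max_steps is exhausted.
    """
    state = tuple(opinions)
    seen = {state}
    for _ in range(max_steps):
        nxt = step(state, influencer)
        if nxt == state:
            return state, True      # fixed point: opinions are stable
        if nxt in seen:
            return nxt, False       # earlier state revisited: the process cycles
        seen.add(nxt)
        state = nxt
    return state, False


# Chain: agent 0 influences itself, agent 1 copies 0, agent 2 copies 1.
chain_result = run((1, 0, 0), influencer=[0, 0, 1])
print(chain_result)

# Two agents who copy each other while holding different opinions.
cycle_result = run((1, 0), influencer=[1, 0])
print(cycle_result)
```

In the chain, the opinion of the self-influencing agent spreads to everyone and the process stabilizes. In the two-agent example, the agents swap opinions forever, a toy analogue of a delegation cycle.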

Preparing contributions to CERN reports (school, workshop, and conference proceedings)

This document gives instructions for authoring CERN Reports, which are processed by the CERN E-Publishing Service.

Low-Noise Frequency Translation of Single Photons via Four Wave Mixing Bragg Scattering

We present a single-photon frequency translation setup based on Four Wave Mixing Bragg Scattering in fiber, able to simultaneously achieve close to unitary conversion while maintaining very low noise.

A survey of a number of galaxies

and 3 collaborators

*Objects Nearby:* the number of objects returned by a search on http://ned.ipac.caltech.edu, within +/- 750 kpc and +/- 500 km/s.

Research Center as Distant Publisher: Developing Non-Consumptive Compliant Open Data Worksets to Support New Modes of Inquiry

The HathiTrust Research Center (HTRC), founded in 2010, is managed by Indiana University Bloomington and the University of Illinois at Urbana-Champaign under an agreement with the HathiTrust Board of Governors and the University of Michigan. The HTRC mission supports new knowledge creation through novel computational uses of the HathiTrust Digital Library (HTDL). By introducing the concept of *distant publishing*, this short paper discusses ideas for data and software publication that support the HTRC non-consumptive research methodologies and offer scholars new methods for research inquiry.

A Model of the Geometric Structure of a Synset

and 1 collaborator

**Abstract**

The paper raises the question of formalizing the notion of synonymy. Based on vector representations of words, it proposes a geometric approach to the mathematical modeling of sets of synonyms (synsets). A computable attribute of synsets, the *interior of a synset* (IntS), is defined. The notions of the *rank* and *centrality* of words in a synset are introduced, which make it possible to identify the more significant, “central” words in a synset. A mathematical formulation of rank and centrality is given, and a procedure for computing them is proposed. The computations use neural models (Skip-gram, CBOW) created with T. Mikolov's word2vec program. Using synsets from the Russian Wiktionary as an example, IntS are constructed from the neural models of the corpora of the RusVectores project. The results obtained for the two corpora (the Russian National Corpus and a news corpus) largely coincide. This suggests a certain universality of the proposed mathematical model.

Keywords: synonym, synset, neural network, corpus linguistics, word2vec, RusVectores, gensim, Russian Wiktionary

Testing the Robustness of a Method for Computing the Error Distance Between Two Ordered Lists

and 1 collaborator

Academic ranking is the process of constructing a ranking of higher education institutions based on a variety of factors. Rankings are produced by universities, journals, governments, and independent experts. When a large number of universities are ranked, the number of a country's universities that make it into the world's best becomes an important indicator characterizing its higher education system \cite{Karpenko_2014}. There is a fairly large number of university rankings in the world. Rankings are created to increase competition, both between individual universities and between national higher education systems. In compiling each ranking, the research group uses its own methodology: different criteria, combinations of criteria, and methods of collecting information are taken as the basis. Across existing rankings, terms such as “quality of education”, “level of research”, and “academic reputation” can have different meanings. International university rankings set the standards of a modern university, which many institutions try to follow, and try to influence researchers. However, by no means all researchers view university rankings positively \cite{Shtyhno_2014}.

At present there is no “ideal” ranking, that is, a ranking that covers all existing universities, has a transparent methodology, and leaves everyone satisfied with the results. Ranking compilers pursue particular goals and target a particular audience. As a result, a university may hold a leading position in one ranking and fall well outside the top ten in another. It is not possible to measure up to all of them at once. The key factor affecting a ranking score is the presence (or absence) of a given indicator. Therefore, any list of indicators used in a ranking should rest on a scientific basis \cite{Azgaldov_2012}.

The main goal of the study is to construct a new ranking from Wikipedia data and to compare it with existing rankings by computing the “error distance” metric. The best-known models of global rankings include \cite{Skalaban_2013}:

the Academic Ranking of World Universities (ARWU),

the international university ranking of the British publication Times Higher Education (THE),

the webometric ranking of the Spanish Cybermetrics laboratory (Webometrics).

The aim of the work is to compare existing global university rankings by computing the “error distance” and to test the robustness of this method by permuting objects (in this case, universities) within a list (ranking).

Low Power Wireless Sensor Networks - Market Overview

and 1 collaborator

Wireless Sensor Networks (WSNs) are crucial to the development of the Internet of Things, yet they pose various challenges in terms of multiplexing, power efficiency, range and transmission speed. This document delivers a high-level comparison of **Zigbee**, **6LoWPAN**, **Bluetooth Low Energy**, **LoRa** and **Narrowband-IoT** in these areas.

An Exploration of the Statistical Signatures of Stellar Feedback

and 3 collaborators

All molecular clouds are observed to be turbulent, but the origin, means of sustenance, and evolution of the turbulence remain debated. One possibility is that stellar feedback injects enough energy into the cloud to drive observed motions on parsec scales. Recent numerical studies of molecular clouds have found that feedback from stars, such as protostellar outflows and winds, injects energy and impacts turbulence. We expand upon these studies by analyzing magnetohydrodynamic simulations of winds interacting with molecular clouds which vary the stellar mass-loss rates and magnetic field strength. We generate synthetic ^{12}CO(1-0) maps assuming that the simulations are at the distance of the nearby Perseus molecular cloud. By comparing the outputs from different initial conditions and evolutionary times, we identify differences in the synthetic observations and characterize these using common astrostatistics. We quantify the different statistical responses using a variety of metrics proposed in the literature. We find that multiple astrostatistics, such as principal component analysis, velocity component spectrum, and dendrograms, are sensitive to changes in stellar mass-loss rates and/or magnetic field strength. This demonstrates that stellar feedback influences molecular cloud turbulence and can be identified and quantified observationally using such statistics.

Here Be Dragons: Characterization of ACS/WFC Scattered Light Anomalies

ACS/WFC images can suffer from a number of optical and scattered light anomalies. Most of the optical anomalies that affect ACS have been well characterized; hardware, software, and optical anomalies are discussed in ISR 2008-01. This is not the case for the scattered light anomalies known as “dragon’s breath” and edge glow. Dragon’s breath is caused by reflections being scattered back to the detector. There is a knife-edged mask in front of the CCD that scatters light back to the detector when its back side is illuminated by reflections from the CCD surface. These phenomena were discovered in early testing of ACS and were mitigated by sharpening the knife edges and coating them black. However, when point sources fall on the edge of the mask, scattering still occurs (Hartig et al.).

Authorea: An Online Editor for LaTeX

and 1 collaborator

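If you are looking for a web-based LaTeX editor, Authorea is a good choice.

Authorea is a powerful online LaTeX editor with many useful features, including ready-to-use templates, collaboration tools, document revision history, chat, automatic bibliography creation, and easy insertion of images, links, tables, and more. Authorea also supports multiple file formats, including LaTeX, HTML, and Markdown. Index page creation, export to PDF, sharing on social media, quick editing, toggling comments on and off, and word count are among the main features Authorea offers. This is why it is one of the best editors for LaTeX, letting you edit LaTeX conveniently in your favorite browser. Authorea makes inserting mathematical formulas, images, and tables very simple. All in all, these features make Authorea a simple, easy-to-use LaTeX editing tool that does not take long to learn.

Main features: easy insertion of images, mathematical formulas, tables, and other objects; collaboration; citing articles; automatic bibliography creation; comments; interactive figures; embedding IPython/Jupyter notebooks.

Supported platforms: web-based; supports all major browsers.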