Public Articles
Low-Noise Frequency Translation of Single Photons via Four Wave Mixing Bragg Scattering
We present a single-photon frequency-translation setup based on four-wave-mixing Bragg scattering in fiber, able to achieve near-unity conversion efficiency while maintaining very low noise.
A survey of a number of galaxies
and 3 collaborators
Objects Nearby: the number of objects returned by a search on http://ned.ipac.caltech.edu, within +/- 750 kpc and +/- 500 km/s.
Research Center as Distant Publisher: Developing Non-Consumptive Compliant Open Data Worksets to Support New Modes of Inquiry
The HathiTrust Research Center (HTRC), founded in 2010, is managed by Indiana University Bloomington and the University of Illinois at Urbana-Champaign under an agreement with the HathiTrust Board of Governors and the University of Michigan. The HTRC mission supports new knowledge creation through novel computational uses of the HathiTrust Digital Library (HTDL). Through the introduction of the concept of distant publishing, this short paper will discuss ideas for data and software publication that support the HTRC non-consumptive research methodologies and offer scholars new methods for research inquiry.
A Model of the Geometric Structure of a Synset
and 1 collaborator
Abstract
This article raises the question of formalizing the notion of synonymy. Based on vector representations of words, we propose a geometric approach to the mathematical modeling of sets of synonyms (synsets). We define a computable attribute of synsets, the interior of a synset (IntS). We introduce the notions of rank and centrality of words in a synset, which make it possible to identify the more significant, "central" words of a synset. We give a mathematical formulation of rank and centrality and propose a procedure for computing them. The computations use neural models (Skip-gram, CBOW) built with T. Mikolov's word2vec program. Using synsets of the Russian Wiktionary as examples, we construct IntS from the neural models of the RusVectores project corpora. The results obtained on two corpora (the Russian National Corpus and a news corpus) largely coincide, which suggests a certain universality of the proposed mathematical model.
Keywords: synonym, synset, neural network, corpus linguistics, word2vec, RusVectores, gensim, Russian Wiktionary
Testing the Stability of a Method for Computing the Distance Error Between Two Ordered Lists
and 1 collaborator
Academic ranking is the process of building a rating of higher-education institutions based on a variety of factors. Rankings are produced by universities, journals, governments, and independent experts. When many universities are ranked, the number of national universities among the world's best becomes an important indicator of a country's higher-education system \cite{Karpenko_2014}. A fairly large number of university rankings exist worldwide. Rankings are created to increase competition, both between individual universities and between national higher-education systems. Each ranking group uses its own methodology, built on different criteria, combinations of criteria, and methods of data collection. Across existing rankings, terms such as "quality of education", "level of research", and "academic reputation" can carry different meanings. International university rankings set the standards of the modern university that many institutions try to follow, and they also try to influence researchers. However, by no means all researchers view university rankings positively \cite{Shtyhno_2014}.
To date, no "ideal" ranking exists, that is, a ranking that covers all existing universities, has a transparent methodology, and produces results that satisfy everyone. Ranking compilers pursue specific goals and target specific audiences. As a result, a given university may occupy a leading position in one ranking and fall well outside the top ten in another. It is impossible to follow all of them at once. A key factor influencing a ranking score is the presence (or absence) of a given indicator; therefore, any list of indicators used for ranking should rest on a scientific basis \cite{Azgaldov_2012}.
The main goal of this study is to build a new ranking from Wikipedia data and to compare it with existing rankings by computing the "error distance" metric. The best-known global ranking models include \cite{Skalaban_2013}:
the Academic Ranking of World Universities (ARWU),
the international university ranking by the British publication Times Higher Education (THE),
the webometric ranking by the Spanish Cybermetrics Lab (Webometrics).
The aim of this work is to compare existing global university rankings by computing the "error distance" and to test the stability of this method under permutations of objects (in this case, universities) within a list (ranking).
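As a sketch of such a stability check: assuming (an assumption of this sketch, not the paper's definition) that the "error distance" between two rankings is the mean absolute difference in rank positions over the universities both lists share, a minimal Python experiment could look like:

```python
import random

def error_distance(ranking_a, ranking_b):
    # Assumed definition (not from the paper): mean absolute
    # difference in rank positions over shared universities.
    pos_b = {name: i for i, name in enumerate(ranking_b)}
    common = [name for name in ranking_a if name in pos_b]
    if not common:
        return 0.0
    return sum(abs(i - pos_b[name])
               for i, name in enumerate(ranking_a)
               if name in pos_b) / len(common)

# Stability check: swap two adjacent universities in one list and
# observe how much the metric moves.
base = [f"univ{i}" for i in range(20)]   # hypothetical ranking
other = base[:]
random.seed(0)
random.shuffle(other)                    # a second, different ranking
d0 = error_distance(base, other)
perturbed = base[:]
perturbed[3], perturbed[4] = perturbed[4], perturbed[3]
d1 = error_distance(perturbed, other)
```

A stable metric should change only slightly under such a swap of neighbors, which is exactly the kind of within-list permutation the paper tests.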
Low Power Wireless Sensor Networks - Market Overview
and 1 collaborator
Wireless sensor networks (WSNs) are crucial to the development of the Internet of Things, yet they pose various challenges in terms of multiplexing, power efficiency, range, and transmission speed. This document delivers a high-level comparison of Zigbee, 6LoWPAN, Bluetooth Low Energy, LoRa, and Narrowband IoT in these areas.
An Exploration of the Statistical Signatures of Stellar Feedback
and 3 collaborators
All molecular clouds are observed to be turbulent, but the origin, means of sustenance, and evolution of the turbulence remain debated. One possibility is that stellar feedback injects enough energy into the cloud to drive observed motions on parsec scales. Recent numerical studies of molecular clouds have found that feedback from stars, such as protostellar outflows and winds, injects energy and impacts turbulence. We expand upon these studies by analyzing magnetohydrodynamic simulations of winds interacting with molecular clouds which vary the stellar mass-loss rates and magnetic field strength. We generate synthetic 12CO(1-0) maps assuming that the simulations are at the distance of the nearby Perseus molecular cloud. By comparing the outputs from different initial conditions and evolutionary times, we identify differences in the synthetic observations and characterize these using common astrostatistics. We quantify the different statistical responses using a variety of metrics proposed in the literature. We find that multiple astrostatistics, such as principal component analysis, velocity component spectrum, and dendrograms, are sensitive to changes in stellar mass-loss rates and/or magnetic field strength. This demonstrates that stellar feedback influences molecular cloud turbulence and can be identified and quantified observationally using such statistics.
Here Be Dragons: Characterization of ACS/WFC Scattered Light Anomalies
ACS/WFC images can suffer from a number of optical and scattered light anomalies. Most of the optical anomalies that affect ACS have been well characterized; hardware, software, and optical anomalies are discussed in ISR 2008-01. This is not the case for the scattered light anomalies known as "dragon's breath" and edge glow. Dragon's breath is caused by reflections being scattered back to the detector. There is a knife-edged mask in front of the CCD that scatters light back to the detector when its back side is illuminated by reflections from the CCD surface. These phenomena were discovered in early testing of ACS and were mitigated by sharpening the knife edges and coating them black. However, when point sources fall on the edge of the mask, scattering still occurs (Hartig et al.).
Authorea: An Online Editor for Editing LaTeX
and 1 collaborator
Project Sketch VARIED: Prevalence, Experiences, and Attitudes Regarding Voluntary Stopping of Eating and Drinking (VSED) in Switzerland
and 1 collaborator
Location and Routing Algorithms. The Kalman–Melnikov Algorithm
Let $x_t$ denote the quantity we will measure and then filter: the coordinate of a falling stone. The motion of the stone is given by the formula: $$x_{t}=400\cdot t+4.9\cdot t^2$$ Let us express the coordinate of the stone through the acceleration and its previous position: $$x_{t+1}=400\cdot (t+1)+4.9\cdot (t+1)^2=400\cdot t+400+4.9\cdot t^2+9.8\cdot t+4.9=x_{t}+9.8\cdot t+404.9$$ Since differentiating $x_t$ twice gives an acceleration of 9.8, the coordinate of the stone changes according to the law: $$x_{t+1}=x_{t}+a\cdot t+404.9$$ where $a=9.8$. In real life, however, we cannot account in our calculations for the small perturbations acting on the stone, such as wind, air resistance, etc., so the true coordinate of the stone will differ from the computed one. A random variable $E_t$ is added to the right-hand side of the equation: $$x_{t+1}=x_{t}+a\cdot t+404.9+E_{t}$$ We have placed a rangefinder on the ground beneath the stone. The rangefinder measures the coordinate $x_t$, but unfortunately it cannot do so exactly; it measures with an error $N_t$, which is also a random variable: $$z_{t}=x_{t}+N_{t}$$ The task is, knowing the inaccurate rangefinder readings $z_t$, to find a good approximation to the true coordinate of the stone $x_t$. We denote this approximation $x^{opt}_t$. The equations for the coordinate and the rangefinder reading thus take the form: $$ \begin{cases} x_{t+1}=x_{t}+a\cdot t+404.9+E_{t}\\ z_{t}=x_{t}+N_{t} \end{cases} $$
Kalman's idea is that, to obtain the best approximation to the true coordinate $x_{t+1}$, we should choose a middle ground between the reading $z_{t+1}$ of the inaccurate sensor and $x^{opt}_{t}+a\cdot t+404.9$, our prediction of what we expected to see. We give the sensor reading a weight $K$, and the predicted value gets the remaining weight $(1-K)$: $$x^{opt}_{t+1}=K_{t+1}\cdot z_{t+1}+(1-K_{t+1})\cdot (x^{opt}_{t}+a\cdot t+404.9)$$ where $K_{t+1}$ is the Kalman coefficient, which depends on the iteration step.
We must choose $K_{t+1}$ so that the resulting optimal coordinate $x^{opt}_{t+1}$ is as close as possible to the true coordinate $x_{t+1}$. In the general case, to find the exact value of the Kalman coefficient $K_{t+1}$ we simply minimize the error: $$e_{t+1}=x_{t+1}-x^{opt}_{t+1}$$ Substituting expression (8) into equation (7) and simplifying: $$
e_{t+1}=x_{t+1}-K_{t+1}\cdot z_{t+1}-(1-K_{t+1})\cdot (x^{opt}_{t}+a\cdot t+404.9)=\\=x_{t+1}-K_{t+1}\cdot (x_{t+1}+N_{t+1})-(1-K_{t+1})\cdot (x^{opt}_{t}+a\cdot t+404.9)=\\=x_{t+1}\cdot (1-K_{t+1})-K_{t+1}\cdot N_{t+1}-(1-K_{t+1})\cdot (x^{opt}_{t}+a\cdot t+404.9)=\\=(1-K_{t+1})\cdot (x_{t+1}-x^{opt}_{t}-a\cdot t-404.9)-K_{t+1}\cdot N_{t+1}=\\=(1-K_{t+1})\cdot (x_{t}+a\cdot t+404.9+E_{t}-x^{opt}_{t}-a\cdot t-404.9)-K_{t+1}\cdot N_{t+1}=\\=(1-K_{t+1})\cdot (x_{t}-x^{opt}_{t}+E_{t})-K_{t+1}\cdot N_{t+1}=\\=(1-K_{t+1})\cdot (e_{t}+E_{t})-K_{t+1}\cdot N_{t+1}
$$ Thus we obtain: $$e_{t+1}=(1-K_{t+1})\cdot (e_{t}+E_{t})-K_{t+1}\cdot N_{t+1}$$ We will minimize the mean of the squared error: $E(e^{2}_{t+1})\to\min$. Since all the random variables entering $e_{t+1}$ are independent, the mean errors of the sensor and of the model are zero ($E[E_{t}]=E[N_{t+1}]=0$), and all cross terms vanish ($E[E_{t}\cdot N_{t+1}]=E[e_{t}\cdot E_{t}]=E[e_{t}\cdot N_{t+1}]=0$), we obtain: $$E(e^{2}_{t+1})=(1-K_{t+1})^{2}\cdot (E(e^{2}_{t})+D(E_{t}))+K^{2}_{t+1}\cdot D(N_{t+1})$$ where $D(E_{t})$ and $D(N_{t+1})$ are the variances of the random variables $E_{t}$ and $N_{t+1}$. Let us find the minimum of expression (11) by setting its derivative with respect to $K_{t+1}$ to zero: $$-2\cdot (1-K_{t+1})\cdot (E(e^{2}_{t})+D(E_{t}))+2\cdot K_{t+1}\cdot D(N_{t+1})=0$$ $$-E(e^{2}_{t})-D(E_{t})+K_{t+1}\cdot E(e^{2}_{t})+K_{t+1}\cdot D(E_{t})+K_{t+1}\cdot D(N_{t+1})=0$$ $$K_{t+1}=\frac{E(e^{2}_{t})+D(E_{t})}{E(e^{2}_{t})+D(E_{t})+D(N_{t+1})}$$ This is the value of $K_{t+1}$ for which $E(e^{2}_{t+1})$ is minimal. The random variables are normally distributed, and we know their variances, $\delta^{2}_{E}$ and $\delta^{2}_{N}$. Note that the variances do not depend on $t$, because the distribution laws do not. Substituting this minimizing value of the Kalman coefficient $K_{t+1}$ into the mean squared error $E(e^{2}_{t+1})$, we obtain: $$
E(e^{2}_{t+1})=(1-\frac{E(e^{2}_{t})+\delta^{2}_{E}}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}})^{2}\cdot (E(e^{2}_{t})+\delta^{2}_{E})+(\frac{E(e^{2}_{t})+\delta^{2}_{E}}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}})^{2}\cdot \delta^{2}_{N}
$$ $$
E(e^{2}_{t+1})=(1-\frac{E(e^{2}_{t})+\delta^{2}_{E}}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}})^{2}\cdot (E(e^{2}_{t})+\delta^{2}_{E})+(\frac{E(e^{2}_{t})+\delta^{2}_{E}}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}})^{2}\cdot \delta^{2}_{N}=\frac{(\delta^{2}_{N})^{2}\cdot (E(e^{2}_{t})+\delta^{2}_{E})}{(E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N})^{2}}+\frac{\delta^{2}_{N}\cdot (E(e^{2}_{t})+\delta^{2}_{E})^{2}}{(E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N})^{2}}=\\=\frac{\delta^{2}_{N}\cdot (E(e^{2}_{t})+\delta^{2}_{E})\cdot (E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N})}{(E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N})^{2}}=\frac{\delta^{2}_{N}\cdot (E(e^{2}_{t})+\delta^{2}_{E})}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}}
$$ Thus, we obtain: $$
E(e^{2}_{t+1})=\frac{\delta^{2}_{N}\cdot (E(e^{2}_{t})+\delta^{2}_{E})}{E(e^{2}_{t})+\delta^{2}_{E}+\delta^{2}_{N}} \quad \text{(the mean squared error)}
$$ Thus, we have obtained the formula for computing the Kalman coefficient.
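The recursion above is easy to check numerically. A minimal Python sketch (separate from the Delphi program in this article; the parameter values are illustrative) iterates the mean squared error and the Kalman coefficient:

```python
# Iterate the scalar recursion derived above:
#   K_{t+1} = (E_t + dE2) / (E_t + dE2 + dN2)
#   E_{t+1} = dN2 * (E_t + dE2) / (E_t + dE2 + dN2)
# where E_t = E(e_t^2), and dE2, dN2 are the model and sensor
# noise variances (delta_E^2, delta_N^2 in the text).

def kalman_gain_sequence(dE2, dN2, e0, steps):
    errors, gains = [e0], []
    e = e0
    for _ in range(steps):
        k = (e + dE2) / (e + dE2 + dN2)
        e = dN2 * (e + dE2) / (e + dE2 + dN2)
        gains.append(k)
        errors.append(e)
    return errors, gains

errors, gains = kalman_gain_sequence(dE2=1.0, dN2=4.0, e0=16.0, steps=50)
```

The error sequence settles to the fixed point $E^{*}$ of the recursion, solving $E^{*}=\delta^{2}_{N}(E^{*}+\delta^{2}_{E})/(E^{*}+\delta^{2}_{E}+\delta^{2}_{N})$, and the gain settles accordingly.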
Now that we are familiar with the Kalman filter, let us implement an example in Delphi. We will use Delphi 7 to test the program.
Code:
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, StdCtrls, ExtCtrls;
type
TForm1 = class(TForm)
img1: TImage;
edt1: TEdit;
edt2: TEdit;
lbl1: TLabel;
lbl2: TLabel;
edt3: TEdit;
lbl3: TLabel;
edt4: TEdit;
lbl4: TLabel;
btn1: TButton;
procedure btn1Click(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
var
Form1: TForm1;
implementation
{$R *.dfm}
procedure TForm1.btn1Click(Sender: TObject);
var
N, i: Integer;
a, sigma_psi, sigma_eta: Real;
x, z, xOpt, eOpt, k: array [1..100] of Real;
begin
Randomize;
N:=StrToInt(edt1.Text);
a:=StrToFloat(edt2.Text);
sigma_psi:=Sqrt(StrToFloat(edt3.Text));
sigma_eta:=Sqrt(StrToFloat(edt4.Text));
x[1]:=0;
z[1]:=x[1]+Random(Trunc(sigma_eta))-sigma_eta/2;
{ Simulate the true trajectory x and the noisy rangefinder readings z }
for i:=1 to N-1 do
begin
x[i+1]:=x[i]+404.9+a*i+random(Trunc(sigma_psi))-sigma_psi/2;
z[i+1]:=x[i+1]+random(Trunc(sigma_eta))-sigma_eta/2;
if x[i+1]<0 then x[i+1]:=0;
if z[i+1]<0 then z[i+1]:=0;
end;
xOpt[1]:=1;
eOpt[1]:=sigma_eta;
{ Kalman filtering: update the error estimate eOpt, the gain k, and the filtered coordinate xOpt }
for i:=1 to N-1 do
begin
eOpt[i+1]:=sqrt((Sqr(sigma_eta))*(Sqr(eOpt[i])+sqr(sigma_psi))/(sqr(sigma_eta)+sqr(eOpt[i])+sqr(sigma_psi)));
k[i+1]:=(Sqr(eOpt[i+1]))/sqr(sigma_eta);
xOpt[i+1]:=(xOpt[i]+a*i)*(1-k[i+1])+k[i+1]*z[i+1];
if xOpt[i+1]<0 then xOpt[i+1]:=0;
end;
img1.Picture:=nil;
img1.Canvas.MoveTo(30,40);
img1.Canvas.LineTo(30,450);
img1.Canvas.LineTo(540,450);
img1.Canvas.Pen.Color:=clLime;
img1.Canvas.MoveTo(30,450-round(xOpt[1]));
for i:=2 to N do
begin
img1.Canvas.LineTo(30+(i-1)*5,450-round(xOpt[i]*0.05));
end;
img1.Canvas.Pen.Color:=clRed;
img1.Canvas.MoveTo(30,450-round(z[1]));
for i:=2 to N do
begin
img1.Canvas.LineTo(30+(i-1)*5,450-round(z[i]*0.05));
end;
img1.Canvas.Pen.Color:=clBlue;
img1.Canvas.MoveTo(30,450-round(x[1]));
for i:=2 to N do
begin
img1.Canvas.LineTo(30+(i-1)*5,450-round(x[i]*0.05));
end;
Img1.Canvas.Textout(15, 455, '0');
Img1.Canvas.Textout(10, 390, '1200');
Img1.Canvas.Textout(10, 330, '2400');
Img1.Canvas.Textout(10, 270, '3600');
Img1.Canvas.Textout(10, 210, '4800');
Img1.Canvas.Textout(5, 150, '6000');
Img1.Canvas.Textout(5, 90, '7200');
Img1.Canvas.Textout(5, 30, '8400');
Img1.Canvas.Textout(30, 20, 'V');
Img1.Canvas.Textout(130, 455, '20');
Img1.Canvas.Textout(230, 455, '40');
Img1.Canvas.Textout(330, 455, '60');
Img1.Canvas.Textout(430, 455, '80');
Img1.Canvas.Textout(530, 455, '100');
Img1.Canvas.Textout(550, 445, 't');
end;
end.
Running the program, we obtain a plot with three lines: the true position of the stone, the sensor readings, and the line produced by the Kalman filter.
Program output:
sfw Projektbericht
and 1 collaborator
In February and March 2016, CorrelAid and streetfootballworld (sfw) co-organized a workshop series on using data to unleash the power of football for social good. In three workshops focused on streetfootballworld's network-member communications, campaigns, and performance metrics, nine experts from CorrelAid's network provided pro bono consultancy to streetfootballworld. The three workshops focused on the following themes:
Challenges for the CIO
and 1 collaborator
SWOT | Helpful | Harmful |
Internal | Strengths · Cost reduction · Scalable IT resources · Technology is up to date · No investment risk · Server management is outsourced | Weaknesses · Dependent on provider · Risk of renouncing own IT competence · Working internet connection needed · Reliability of the cloud |
External | Opportunities · Green IT · Drives innovation | Threats · Safety and data protection · Data protection regulations |
An Incompressible Inhomogeneous Fluid under Heat Exchange
Which fluids are considered incompressible? This is often taken to mean a fluid whose velocity divergence vanishes everywhere, $\operatorname{div} \vec u \equiv 0$. In fact, this is true only in two cases: (1) thermodynamic equilibrium is reached faster than mechanical equilibrium, so the temperature distribution is quasi-homogeneous; or (2) fluid particles do not exchange heat with one another at all. In other words, in the first case heat exchange is very intense, in the second it is absent entirely. For fluids with an inhomogeneous temperature distribution and internal heat-exchange mechanisms, the incompressibility condition takes a different form, i.e. for them $\operatorname{div} \vec u \ne 0$. In this work, the incompressibility condition is derived from the more fundamental continuity condition and the heat-transport equation.
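For orientation, the standard starting point of such a derivation can be sketched (the thermal-expansion coefficient $\beta$ and the material derivative $D/Dt$ are notation assumed here, not taken from the abstract). From the continuity equation,

```latex
\frac{D\rho}{Dt}+\rho\,\operatorname{div}\vec{u}=0
\quad\Longrightarrow\quad
\operatorname{div}\vec{u}=-\frac{1}{\rho}\frac{D\rho}{Dt},
```

so if density varies with temperature as $\rho \approx \rho_0\bigl(1-\beta(T-T_0)\bigr)$, then $\operatorname{div}\vec{u} \approx \beta\,DT/Dt$: the divergence vanishes only when particles exchange no heat ($DT/Dt = 0$) or when the temperature field is quasi-homogeneous, matching the two cases named above.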
In closing, the author considers two applications of the incompressibility condition obtained here, to fluid kinematics and to hydrostatics: computing the vertical velocity of a free surface and the pressure at the bottom of a fluid. The result is that the vertical velocity of the free surface is determined not only by the divergence of the velocity in the column beneath it, but also by the combined effects of thermal expansion and the intensity of heat exchange. The pressure is unaffected by thermal expansion and heat exchange.
ARTIFICIAL NEURAL NETWORKS FOR MODELING THE PREDICTION OF THE DEFORMATION OF A MATERIAL EXPOSED TO SOLAR RADIATION
and 1 collaborator
An Artificial Neural Network (ANN) is a mathematical model that attempts to emulate biological neural systems in the processing of information \cite{alejo2010analisis}.
ANNs are based on a structure of neurons joined by links that transmit information to other neurons, which produce a result via mathematical functions. ANNs learn from historical information through training, a process in which the network's parameters are adjusted so as to produce the desired response, thereby acquiring the capacity to predict responses of the same phenomenon. The behavior of a network thus depends on the weights of the links, on the activation functions specified for the neurons, which fall into three categories: linear, threshold (or step), and sigmoid, and on the way the error is propagated \cite{freeman1991algorithms}.
Several algorithms exist for progressively correcting the prediction error; one of the most widely used is "backpropagation", which essentially propagates the error backwards, from the output layer to the input layer, allowing the weights to be adapted so as to reduce that error \cite{hilera2000redes}.
In simplified form, a backpropagation network learns a predefined set of input-output pairs given as examples, using a two-phase propagate-adapt cycle: first, a pattern applied as a stimulus to the input layer of the network is propagated through the following layers to generate the output, which yields the error value when compared with the desired output. These errors are then transmitted backwards, starting from the output layer, to all the neurons of the intermediate hidden layer that contribute directly to the output, each receiving a share of the error approximately proportional to its contribution to the original output \cite{ovando2005redes}.
This process is repeated backwards, layer by layer, until every neuron in the network has received an error that describes its relative contribution to the total error. Based on this information, all the connection weights are readjusted so that the next time the same pattern is presented, the difference between the computed and desired outputs decreases \cite{ovando2005redes}.
The importance of the backpropagation network lies in its capacity to self-adapt the weights of the neurons in the intermediate layers in order to learn the relationship between a set of example patterns and their corresponding outputs.
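The propagate-adapt cycle described above can be sketched in a few lines of NumPy (the network size, learning rate, and XOR toy data are illustrative choices, not from the article):

```python
import numpy as np

# Minimal backpropagation sketch: one hidden layer, sigmoid
# activations, squared-error loss, full-batch gradient descent.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # inputs
Y = np.array([[0], [1], [1], [0]], float)              # XOR targets
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)        # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)        # hidden -> output
lr = 0.5
losses = []
for _ in range(2000):
    # Forward phase: propagate the patterns to the output layer.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((y - Y) ** 2)))
    # Backward phase: each layer receives its share of the error.
    dy = (y - Y) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    # Adapt the weights to reduce the error on the next pass.
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(0)
```

Each iteration performs exactly the two phases named in the text: forward propagation of a pattern, then backward distribution of the error and weight adjustment.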
Depending on the type of application and its characteristics, different types of neural networks have been developed and applied successfully to the prediction of diverse problems in different areas of knowledge, such as biology, medicine, economics, engineering, and psychology, among others \cite{pol2000prediccion}.
Decision-making is a key issue for the areas mentioned above, since decisions must be evaluated against criteria of evidence and experience. Artificial neural network (ANN) models have been used as an evidence-based tool for making predictions.
Some works, such as that of Javier Trujillano in \cite{trujillano2004aproximacion}, apply these methods to predicting medical outcomes, for example in renal failure, with the aim of drawing sound conclusions about the likely course of the disease. That article compared ANNs against linear regression; the result favored the ANNs, since linear regression requires adding more dependencies to approximate results similar to those obtained with neural networks.
Alfonso Palmer, in \cite{pol2000prediccion}, predicts ecstasy consumption with artificial neural networks in order to discriminate who consumes ecstasy and who does not. The results show that the ANN developed is able to predict ecstasy consumption from the responses given to a questionnaire, with an accuracy of 96.66%.
Juan David, in \cite{henao2006modelado}, uses an artificial neural network model to represent the dynamics of the Colombian real exchange rate index, because it describes the dynamics of the series better than a linear autoregressive model, as shown by the result of the likelihood-ratio test. The model was accepted after a series of standard tests and after comparing its results with those obtained using a linear autoregressive model. The results indicate that the current value of the series depends only on its previous value.
In particular, ANNs have attracted great interest owing to their capacity to represent unknown relationships from the data themselves.
The rest of the article is organized as follows. Section [preliminares] gives a general description of the concepts used in this work, while the methodology is presented in Section 3. The experiments and results are described in Section 4, followed by the conclusions.
A6
Form a point set of all circle centers. Use a WSPD to find the closest pair of centers in \(O(n\log n)\). Return "disjoint" if the distance between these points is > 1, and "non-disjoint" otherwise.
Disjoint circles
\(\iff\) all pairwise circle centers are > 1 dist apart (unit circles, r=1)
\(\iff\) the closest pair of points is > 1 dist apart \(\square\).
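As a quick sanity-check of this decision procedure, here is a Python sketch that uses the \(O(n^2)\) brute-force closest pair in place of the \(O(n\log n)\) WSPD, with the > 1 threshold stated above:

```python
from itertools import combinations
from math import dist

def circles_disjoint(centers, min_gap=1.0):
    """Return True iff every pair of circle centers is more than
    min_gap apart (the disjointness threshold assumed above).
    Brute force O(n^2); a WSPD-based closest pair gives O(n log n)."""
    closest = min(dist(p, q) for p, q in combinations(centers, 2))
    return closest > min_gap
```

For instance, `circles_disjoint([(0, 0), (3, 0), (0, 3)])` is True, while two centers only 0.5 apart yield False.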
BSHMM: A model for Markov-based DNA methylation profiling and case study in diatoms.
and 3 collaborators
The internship took place in the Laboratory of Quantitative and Computational Biology in Paris. The lab is led by A. Carbone and is affiliated with both UPMC and CNRS. The research focuses on interdisciplinary computational biology, promoting a tight collaboration between theoretical and experimental approaches, both conducted in the same lab within seven different teams composed of biologists, computer scientists, statisticians, and biophysicists. Under the supervision of Hugues Richard, I was part of the analytical genomics team, whose area of research spans two main subjects: protein evolution and modelling, and sequence evolution.
The idea of studying methylation patterns based on a statistical method was initiated during the first year of the master's degree as a compulsory project. The goal was to construct and implement a model inspired by the Ph.D. thesis of Bogdan Mirauta, with his active help and supervision. Guillaume Viejo, a fellow student at the time, and I had to repurpose Parseq \cite{Mirauta_2014}, a model aimed at RNA-Seq data analysis, and modify it into a reliable DNA methylation profiling tool, starting from a library of sequencing data called BS-Seq.
A six-month voluntary internship further extended this work. Although the main motivation of the project remained the same, the statistical methods were heavily simplified: from a sophisticated Monte Carlo method combined with Gibbs particle sampling to a more practical first-order, 3-layer hidden Markov process, more suited to the scope of an internship research project. The tool was almost entirely implemented during this period and dubbed BSHMM, for BS-Seq Hidden Markov Model. It proved effective on simulated data, but no validation had yet been conducted in real-world conditions. In addition, during this year I presented a poster on the tool at the CJC (Jeunes Chercheurs des Cordeliers) meeting, which is mainly aimed at Ph.D. students.
This second internship was an immediate follow-up to the development of BSHMM. We first sought to validate our results by comparing them to those of a different methylation experiment, based on microarrays, used to draw the 5-methylcytosine (5mC) profile of Phaeodactylum tricornutum. The second part consisted of using the tool we implemented in a larger analysis pipeline. Recent publications have shown that methylation profiles exhibit spatial periodicity and play an important role in chromosome arrangement inside the nucleus of some diatom species via nucleosome linkage \cite{Huff_2014}. Besides, the same type of periodicity has been observed in the expression level of small RNAs, although it is still unclear whether these two events are related to the same biological process. The goal is to figure out whether this periodicity is also present in P. tricornutum, and whether it is linked in any way to the placement patterns of small non-coding RNA (snRNA) derived fragments.
Working title: Variation in grassland community trait patterns over climate gradients.
and 1 collaborator
A central goal in ecology is to identify and understand the processes that influence the distributions of species in space and time. Often, these assembly processes are not directly observable over feasible time scales and must instead be inferred through pattern \cite{Levin_1992}. One increasingly popular approach is to use the values and abundances of species traits in a community as evidence for the influence of particular assembly processes \cite{Cavender_Bares_2004,Ackerly_2007,Kraft_2008}. Trait-based approaches have several advantages over strictly taxonomic approaches in that they are quantitative, easily generalizable, and have explicit ties to ecological strategy and performance \cite{McGill2006,Violle_2007}.
Unfortunately, inferring process from community trait patterns is not always straightforward because different processes can lead to similar patterns, multiple processes can operate simultaneously on multiple traits, and patterns can be affected by exogenous forces. For example: community assembly is sometimes depicted as a balance between environmental filtering, in which species unable to tolerate environmental conditions are filtered out resulting in a clustering of trait values, and niche differentiation, in which competition and limiting similarity result in trait values that are more evenly spaced than expected by chance \cite{Cavender_Bares_2004,Kraft_2007}. But recent work has shown that environmentally-filtered communities can result in random or overdispersed trait patterns (e.g. when there is sufficient within-community environmental heterogeneity) \cite{DAndrea2016}, and competition-structured communities can result in clustering patterns \cite{Mayfield_2010}. In addition, pattern-based evidence of assembly processes can be obfuscated by propagule pressure from adjacent communities \cite{Leibold_2004}, or by fluctuating environmental conditions that favor different species over time \cite{Chesson_1981,Chesson_1994}.
Although it is unlikely that a single pattern-based test will ever provide incontrovertible evidence for niche differentiation, analysis of community trait structure can still shed light on assembly processes if used properly. Different metrics should be used in complementary ways to provide more detailed, and thus more interpretable characterizations of community trait structure. In one recent study, \cite{DAndrea2017} suggest a stepwise analysis pipeline in which potential niches along trait axes are identified using a clustering algorithm, and if clusters are identified, then the fine-scale abundance structure within each cluster is examined for evidence of distance-based competition. Next, tests of community trait structure should be conducted along environmental gradients where they can potentially be tied to mechanistic predictions derived from existing ecological theory \cite{Webb_2010}. Lastly, analyses of community trait structure should be used to develop and select hypotheses for experimental testing in the field, rather than be considered as compelling standalone evidence.
Here, we apply a suite of newly developed and classical metrics of community trait structure to a network of twelve grasslands positioned along temperature and precipitation gradients in southern Norway. Our tests include measures of clustering, fine-scale trait abundance structure, and whole-community trait abundance structure. We look for community-level patterns in four traits: leaf area, maximum potential canopy height, seed mass, and specific leaf area (SLA). Based on our knowledge of the system, we predict a gradual shift in importance from competitive interactions at the coldest sites to environmental filtering at the most stressful sites. We expect that competition for light will be the strongest competitive factor at the warmest sites, and thus that there will be competition-derived clustering in maximum height and leaf area. We expect there to be niche differentiation in SLA at the coldest sites, where there could be a tradeoff between risky fast-growth strategies and the ability to tolerate/avoid early season frosts. Ultimately, our work uses trait-based predictions of community assembly processes to glean information about the relative influence of assembly mechanisms on grassland community composition.
We measured four traits: leaf area (LFA), specific leaf area (SLA), maximum plant height (MXH), and seed mass (SDM). We standardized each trait by taking the logarithm of the trait values and rescaling the logarithms to range between 0 and 1. We applied our tests on each trait individually, as well as on the Euclidean space formed by these traits, which is a four-dimensional hypercube of side 1.
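As an illustration, the log-rescaling described above can be sketched as follows (a minimal Python sketch; the leaf-area values are hypothetical):

```python
import math

def standardize(values):
    """Log-transform trait values and rescale to the unit interval:
    y_i = (log x_i - log x_min) / (log x_max - log x_min)."""
    logs = [math.log(v) for v in values]
    lo, hi = min(logs), max(logs)
    return [(l - lo) / (hi - lo) for l in logs]

# Hypothetical leaf-area measurements (cm^2), evenly spaced on a log scale
leaf_area = [2.0, 8.0, 32.0, 128.0]
print(standardize(leaf_area))  # [0.0, 0.333..., 0.666..., 1.0]
```

Standardizing each trait to the unit interval this way is what makes the combined four-trait space a unit hypercube.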
For each site we calculate its Rao quadratic entropy, defined as $Q=\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} d_{ij}p_i p_j$, where $p_i$ and $p_j$ are the relative abundances of species $i$ and $j$, $d_{ij}$ is the absolute trait difference between them, and the sum runs over all species pairs. It corresponds to the expected trait difference between two individuals randomly sampled (with replacement) from the community. We also used the functional dispersion metric proposed in \cite{Laliberte2010}, defined as the abundance-weighted mean distance $d_i$ between each species $i$ and the community trait centroid, that is, $\mathrm{FDis} = \sum_i p_i d_i$. When a single trait is considered, this is simply $\sum_i p_i\,|x_i - \sum_j p_j x_j|$, where $x_i$ is the trait value of species $i$. Both indices have been used to quantify community functional diversity \cite{Botta-Dukat2005,Laliberte2010,Ricotta2011}. A high value indicates trait overdispersion, i.e. species cover a wider region of trait space than expected by chance. In contrast, a low value suggests that species are being filtered toward a particular trait value, possibly due to selection for optimal tolerance to local environmental conditions \cite{Keddy1992}.
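A minimal single-trait sketch of both indices (Python; the trait values and abundances are hypothetical, not from the data):

```python
import numpy as np

def rao_q(traits, abundances):
    """Rao quadratic entropy: expected absolute trait difference between
    two individuals drawn at random (with replacement)."""
    p = np.asarray(abundances, float)
    p = p / p.sum()
    x = np.asarray(traits, float)
    d = np.abs(x[:, None] - x[None, :])      # pairwise distances d_ij
    # The full double sum counts each pair twice (diagonal is zero),
    # so halving it gives the sum over i < j.
    return 0.5 * float((p[:, None] * p[None, :] * d).sum())

def fdis(traits, abundances):
    """Functional dispersion: abundance-weighted mean distance to the
    abundance-weighted trait centroid."""
    p = np.asarray(abundances, float)
    p = p / p.sum()
    x = np.asarray(traits, float)
    centroid = float((p * x).sum())
    return float((p * np.abs(x - centroid)).sum())

print(rao_q([0.0, 1.0], [1, 1]))   # 0.25
print(fdis([0.0, 1.0], [1, 1]))    # 0.5
```

For the multi-trait case, $d_{ij}$ would be the Euclidean distance in the four-dimensional trait space instead of an absolute difference.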
In addition to test statistics based on trait dispersion, we also used a measure of the degree of even spacing between adjacent species on the trait axis. The metric is defined as $\mathrm{CV} = \sigma/\mu$, where $\mu$ and $\sigma$ are respectively the mean and standard deviation of the distances between closest neighbors in trait space. When a single trait is considered, species can be ordered by trait value, and the distances are $d_i = |x_i - x_{i+1}|$ between adjacent species $i$ and $i+1$. A low CV indicates even spacing. Even spacing has been proposed as indicative of niche differentiation, as it maximizes exploration of niche space \cite{Mason2005} and minimizes competitive interactions caused by trait similarity \cite{MacArthur1967}. On the other hand, recent work has raised the possibility that resource partitioning may actually lead to species clustering on the trait axis \cite{Scheffer2006}. In particular, clusters in trait space are expected if competitive exclusion is slow or if immigration replenishes species that are not niche-differentiated \cite{DAndrea2016}. Given this possibility, the coefficient of variation may actually be higher than expected by chance.
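The spacing statistic itself is short enough to sketch directly (Python; the two example communities are hypothetical):

```python
import numpy as np

def spacing_cv(traits):
    """CV = sigma/mu of the gaps between adjacent species along a single
    trait axis; a low value indicates even spacing."""
    x = np.sort(np.asarray(traits, float))
    gaps = np.diff(x)            # d_i = |x_i - x_{i+1}| after sorting
    return float(gaps.std() / gaps.mean())

print(spacing_cv([0.0, 0.25, 0.5, 0.75, 1.0]))   # 0.0: perfectly even
print(spacing_cv([0.0, 0.05, 0.5, 0.55, 1.0]))   # 0.8: two tight pairs
```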
Although species may be clustered, they may still sort into niches that in turn are evenly spaced. This could occur if competition is caused by trait similarity \cite{Scheffer2006,DAndrea2017}. In that case, the most abundant species in the community might be expected to be evenly spaced even though the community as a whole is clustered. Based on these considerations, we used the CV in two metrics. First, we considered all species in the community without regard for abundance. A similar test statistic, the variance divided by the range, is commonly used to quantify evenness \cite{Stubbs2004,Kraft2008,Ingram2009}. Second, we gradually remove species from the community in increasing order of abundance, at each step calculating the CV among the remaining species. If the CV declines as the least abundant species are progressively removed, this suggests even spacing between niches concomitant with clustering between species.
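The second, abundance-stripping use of the CV can be sketched as follows (a self-contained Python sketch with a hypothetical community in which the dominant species are evenly spaced and the rare species cluster next to them):

```python
import numpy as np

def spacing_cv(traits):
    """CV of gaps between adjacent species on the trait axis."""
    gaps = np.diff(np.sort(np.asarray(traits, float)))
    return float(gaps.std() / gaps.mean())

def cv_removal_series(traits, abundances):
    """CVs obtained by removing species one at a time in increasing
    order of abundance. A declining series suggests even spacing among
    the dominant species even when the full community is clustered."""
    traits = np.asarray(traits, float)
    order = np.argsort(abundances)[::-1]    # most abundant first
    series = []
    for k in range(len(traits), 2, -1):     # keep at least 3 species
        series.append(spacing_cv(traits[order[:k]]))
    return series

# Dominants at 0, 0.5, 1.0; rare species clustered at 0.05 and 0.55
cvs = cv_removal_series([0.0, 0.05, 0.5, 0.55, 1.0], [10, 1, 10, 1, 10])
print(cvs)   # declines toward 0 as the rare species are stripped away
```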
Finally, we test for the presence of clusters directly by applying a cluster-finding method. Our metric uses a k-medoid clustering algorithm, which partitions trait space into groups (clusters) of species, each group with a specific medoid, i.e. the species that is closest to all other members of its group. It is an iterative process which alternately decides cluster membership and medoid identity by minimizing the average distances in trait space between species and the medoids of their clusters \cite{Kaufman1990}. We implement the algorithm using the function clara in R package cluster \cite{Maechler2016}. For each community-year, we find the number of clusters that best fits the data using R’s optim function for Markov chain Monte Carlo optimization \cite{RCoreTeam2015}. The quantity being optimized is the average silhouette width, a measure of how similar individuals are to their own cluster compared to neighboring clusters \cite{Kaufman1990}. Once the optimal number of clusters is found, the test statistic is the optimized average silhouette width. We then test for clustering by comparing the test statistic against the set of null communities.
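The paper implements this step with `clara` from the R package `cluster`; a naive Python stand-in (PAM-style k-medoids plus the average silhouette width, run on hypothetical one-dimensional trait data) might look like:

```python
import numpy as np

def kmedoids(d, k, iters=100, seed=0):
    """Naive k-medoid clustering on a distance matrix d (a stand-in for
    the clara algorithm used in the paper)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(d.shape[0], size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(d[:, medoids], axis=1)
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            sub = d[np.ix_(members, members)]
            # the medoid minimizes total distance to its cluster
            new_medoids.append(members[np.argmin(sub.sum(axis=1))])
        new_medoids = np.array(new_medoids)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return np.argmin(d[:, medoids], axis=1)

def mean_silhouette(d, labels):
    """Average silhouette width: mean of (b - a) / max(a, b), where a is
    the mean distance to the point's own cluster and b the mean distance
    to the nearest other cluster."""
    widths = []
    for i in range(d.shape[0]):
        own = (labels == labels[i])
        own[i] = False
        if not own.any():
            widths.append(0.0)
            continue
        a = d[i, own].mean()
        b = min(d[i, labels == c].mean()
                for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b))
    return float(np.mean(widths))

traits = np.array([0.00, 0.01, 0.02, 1.00, 1.01, 1.02])  # two tight groups
d = np.abs(traits[:, None] - traits[None, :])
labels = kmedoids(d, 2)
print(mean_silhouette(d, labels))   # close to 1: strong clustering
```

In the actual analysis the number of clusters is itself optimized over the silhouette width; here it is fixed at 2 for brevity.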
In order to create null communities against which to compare our data, we used a mainland-island approach, where each site undergoes zero-sum birth-death neutral dynamics and immigration from a fixed regional species pool \cite{Hubbell2001}. For each site, the regional pool includes all species falling within the observed trait range, with the regional abundance of each species calculated as the mean across all sites. For each site we estimated immigration rates by fitting a neutral model to the observed relative species cover, and estimated community size by matching the neutral simulated communities to observed species richness. Estimated community size ranged from 215 individuals for Fauske to 567 for Gudmedalen, and immigration rate ranged from 0.03 for Ovstedal to 0.53 for Lavisdalen. For each site we simulated 1,000 neutral communities.
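The zero-sum birth-death dynamics with immigration can be sketched as follows (Python; the pool, community size, and immigration rate are hypothetical but lie in the ranges reported above):

```python
import numpy as np

def neutral_null(pool_freq, J, m, steps=5000, seed=0):
    """Zero-sum birth-death neutral dynamics with immigration from a
    fixed regional pool (mainland-island model). Returns local species
    abundances after `steps` single-death events."""
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool_freq, float)
    pool = pool / pool.sum()
    S = len(pool)
    comm = rng.choice(S, size=J, p=pool)    # initial local community
    for _ in range(steps):
        dead = rng.integers(J)              # one individual dies ...
        if rng.random() < m:                # ... replaced by an immigrant
            comm[dead] = rng.choice(S, p=pool)
        else:                               # ... or by a local birth
            comm[dead] = comm[rng.integers(J)]
    return np.bincount(comm, minlength=S)

# Hypothetical pool of 20 equally common species; J and m within the
# estimated ranges (215-567 individuals, m between 0.03 and 0.53).
counts = neutral_null(np.ones(20), J=300, m=0.2)
print(counts.sum())   # 300: zero-sum dynamics preserve community size
```

Repeating this 1,000 times per site and recomputing each metric on the simulated communities yields the null distributions used below.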
To test for significance, for each of our sites in a given year we compare the metric value to the (1 − α)-quantile of the corresponding set of null communities. Of our five metrics, three (Rao, FDis, CV) are two-tailed, as both low and high values can be interpreted to suggest specific community assembly processes, while the other two (CVtrend, Clara) are one-tailed. We use significance level α = 0.025 for the two-tailed tests and α = 0.05 for the one-tailed tests.
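The significance rule can be sketched directly (Python; the uniform null distribution is a stand-in for the simulated neutral communities):

```python
import numpy as np

def is_significant(observed, null_values, alpha, two_tailed):
    """Flag an observed metric as significant relative to its null
    distribution: above the (1 - alpha) quantile (one-tailed), or
    outside the (alpha, 1 - alpha) quantiles (two-tailed)."""
    nulls = np.asarray(null_values, float)
    hi = np.quantile(nulls, 1 - alpha)
    if not two_tailed:
        return bool(observed > hi)
    lo = np.quantile(nulls, alpha)
    return bool(observed < lo or observed > hi)

nulls = np.linspace(0.0, 1.0, 1001)   # stand-in null distribution
print(is_significant(0.999, nulls, alpha=0.025, two_tailed=True))   # True
print(is_significant(0.500, nulls, alpha=0.025, two_tailed=True))   # False
print(is_significant(0.970, nulls, alpha=0.05, two_tailed=False))   # True
```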
Fig. 2 summarizes our results for the 2009 census. Bar plots show the percentage of the 12 sites that tested significant against the set of null communities. We focus on the 2009 census, but our results were consistent across the years (see Figure S2 in the Supplement), indicating that deterministic factors are playing a role in the trait structure of our communities.
Leaf area and SLA, which are related traits, had similar results across most tests. Between 30% and 50% of sites were significantly overdispersed according to Rao and FDis. A smaller percentage (20%) of sites were significantly underdispersed in SLA. The CV was significantly high for leaf area in 50% of the sites, indicating uneven spacing between adjacent species. Results were weaker and more ambiguous for SLA: spacing between adjacent species was significantly even in 20% of sites, and significantly uneven in another 10%.
In contrast, seed mass showed the strongest indication of underdispersion: 30% and 50% of sites had significantly low Rao and FDis indices, respectively. Furthermore, there was no significant evenness in any of the sites according to the CV metric, and 25% of the sites showed a significant negative trend in CV as low-abundance species were removed.
Results were ambiguous for maximum plant height. Rao and FDis results were relatively strong but split between significant overdispersion and underdispersion, with the latter being a slight majority. Our CV result was also ambivalent, with 30% of sites indicating even spacing between species while another 20% indicated the opposite pattern. 20% of sites had a significant negative trend in CV as rare species were removed.
When all four traits were considered together in a Euclidean trait space, results were somewhat ambiguous for the functional dispersion metrics but tended towards overdispersion (30% overdispersion against 20% underdispersion). According to the CV, species were evenly spaced in this multidimensional space in 20% of the sites, and were not significantly uneven in any site.
Rao and FDis results were largely consistent with each other for all traits and the Euclidean space, corroborating previous results that indicate these two statistics are related \cite{Laliberte2010}.
A low percentage of sites, between 10% and 25%, showed evidence of significant clustering according to the CV trend and Clara metrics. Particularly for Clara, numbers were consistently low across traits and the Euclidean space, averaging just above 10% detection of significance. Given the null expectation of significance in 5% of the sites because of our α = 0.05 significance cutoff, these results suggest that species are not sorting into distinguishable clusters in our sites.
Figure 3 shows the variation in the standard score of our Rao results against the mean summer temperature of our sites. We see a significant trend in Rao scores against temperature for SLA and max height, plus the Euclidean space. The trend is negative in all cases, indicating that colder sites tended to be more overdispersed than warmer sites.
Results for the other metrics across the years are shown in Figure S3 in the Supplement and summarized in Table 1. Aside from FDis, which showed similar trends as Rao for the same traits, we found a negative trend in CV for leaf area in two years and for seed mass in one year, and a positive trend in Clara for SLA and the Euclidean space in one year. We also see that for SLA the positive slope in CV as low-abundance species are removed was slightly steeper in higher temperatures, whereas in leaf area, max height, and the Euclidean space the opposite was observed. It should be noted that although consistent across years, those trends were weak and the standard scores involved had small magnitude.
Trends were for the most part consistent across years. No trait showed opposite trends in different years, and many trends were observed in all four years, while some occurred in one, two, or three years (Table 1, see also Fig. 3S). We also checked for trends against mean annual precipitation, but found largely nonsignificant results (Fig. 3S).
Our results indicate that, relative to the regional pool, the leaf traits were often overdispersed in our local alpine communities, in the sense that species with extreme trait values tend to be more abundant than would be expected from a random draw from the pool. We found no evidence that species in local communities are evenly spaced on the leaf trait axes; on the contrary, species tended to be unevenly dispersed in leaf area. There was some suggestion however that even spacing occurred between the most abundant species. Lastly, species rarely seemed to form recognizable groups in these leaf traits. The trait overdispersion concomitant with the lack of even spacing are compatible with the hypothesis that species are being selected into distinct functional groups or niches, but within each niche species either compete neutrally or are selected for a particular trait value.
Seed mass showed the opposite behavior of the leaf traits, as a sizeable fraction of our local communities were underdispersed in seed mass: species with a particular seed mass tended to occur more frequently or be more abundant than those deviating from the optimum, possibly because they are better adapted to local conditions or because they are better competitors. There was also some suggestion of even spacing among the most abundant species, as a quarter of the sites showed a significant decline in CV as low-abundance species were removed.
Even spacing between adjacent species was distinctly observed in the Euclidean space formed by leaf area, SLA, maximum plant height, and seed mass. Spacing seemed more even between the more abundant species in about one in five sites. Trait dispersion results were ambiguous in the Euclidean space, with the number of significantly underdispersed communities roughly matching that of overdispersed communities. Overall, these results are compatible with the classical idea that species avoid competition by maximizing interspecies distances in niche space.
\label{fig:Fig1}
\label{fig:Fig2}
\label{fig:Fig3}
\label{table:Table1}
Mathematically, we standardize by defining $y_i = (\log(x_i)-\log(x_{\min}))/(\log(x_{\max})-\log(x_{\min}))$, where $x_i$ is the trait value measured for species $i$, and $x_{\min}$ and $x_{\max}$ are the lowest and highest trait values observed in the data.
The standard score measures the difference between the data and the null communities relative to the variation across the nulls. If the test score in a site was $x$, and the mean and standard deviation of the null scores were respectively $\mu$ and $\sigma$, then the standard score is $z = (x - \mu)/\sigma$.
Location and Routing Algorithms. The Kalman–Petrov Algorithm
and 1 collaborator
Suppose the input signal is described by a first-order autoregressive process: \begin{equation}\label{lab1} x_{t} = \phi \cdot x_{t-1} + u \cdot t + \upsilon_{t} = 0.26 \cdot x_{t-1} + 0.8 \cdot t + \upsilon_{t}, \end{equation} where $\upsilon_{t} \sim N(\bar{m}_{\upsilon},\sigma^2_{\upsilon})$ is the signal noise, a random variable with a normal distribution whose mean is $\bar{m}_{\upsilon} = 0$ and whose variance is $\sigma^2_{\upsilon} = 0.2$. The observation equation is as follows: \begin{equation}\label{lab2} y_{t} = \gamma \cdot x_{t} + \epsilon_{t} = 0.72 \cdot x_{t} + \epsilon_{t}, \end{equation} where $\epsilon_{t} \sim N(\bar{m}_{\epsilon},\sigma^2_{\epsilon})$ is likewise a random variable, the observation noise, normally distributed with mean $\bar{m}_{\epsilon} = 0$ and variance $\sigma^2_{\epsilon} = 5$.
Moreover, the observation noise and the signal noise are assumed to be uncorrelated, i.e. $E(\epsilon_{t-i}\,\upsilon_{t-j}) = 0$ for all $i, j$.
The goal of this work is to simulate the operation of a Kalman filter \cite{Gen_ay_2002} on the signal \eqref{lab1}, taking as input the signal noise variance $\sigma^2_{\upsilon}$ and the observation noise variance $\sigma^2_{\epsilon}$.
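Such a simulation can be sketched as follows (Python; the model parameters are taken from the equations above, while the series length and the initial estimate variance are assumptions):

```python
import numpy as np

phi, u, gamma = 0.26, 0.8, 0.72   # model parameters from the text
q, r = 0.2, 5.0                   # noise variances sigma_v^2, sigma_eps^2
rng = np.random.default_rng(0)

# Simulate the AR(1) signal and the noisy observations
T = 200
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + u * t + rng.normal(0.0, np.sqrt(q))
    y[t] = gamma * x[t] + rng.normal(0.0, np.sqrt(r))

# Scalar Kalman filter: predict from the AR dynamics, then correct
# with the noisy observation.
xf = np.zeros(T)                  # filtered state estimates
P = 1.0                           # initial estimate variance (assumed)
for t in range(1, T):
    x_pred = phi * xf[t - 1] + u * t                  # prediction
    P_pred = phi ** 2 * P + q
    K = P_pred * gamma / (gamma ** 2 * P_pred + r)    # Kalman gain
    xf[t] = x_pred + K * (y[t] - gamma * x_pred)      # correction
    P = (1.0 - K * gamma) * P_pred

mse_filter = float(np.mean((xf - x) ** 2))
mse_raw = float(np.mean((y / gamma - x) ** 2))
print(mse_filter, mse_raw)   # the filtered estimate is far less noisy
```

Because the observation noise variance (5) is much larger than the signal noise variance (0.2), the filter's error is an order of magnitude smaller than that of the raw rescaled observations.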
Location and Routing Algorithms. The Kalman–Salov Algorithm
and 1 collaborator
Let $x_t$ denote the quantity that we measure and then filter. We will measure the coordinate of an armored train, which can only move forward and backward. The train's motion is given by the formula: $$ x_{t}=5+2\cdot t+0.1\cdot t^{2} $$ Let us express the train's coordinate through its acceleration and its previous position: $$ x_{t+1}=5+2\cdot (t+1)+0.1\cdot (t+1)^{2}=5+2\cdot t+0.1\cdot t^{2}+2+0.2\cdot t+0.1=\\=x_{t}+0.2\cdot t+2.1 $$ Since differentiating $x_t$ twice gives an acceleration of 0.2, the train's coordinate evolves according to: $$ x_{t+1}=x_{t}+a\cdot t+2.1 $$ where $a=0.2$. In real life, however, we cannot account for the small disturbances acting on the train, such as wind or the quality of the rails, so the true coordinate will deviate from the computed one. A random variable $E_t$ is added to the right-hand side of the equation: $$ x_{t+1}=x_{t}+a\cdot t+2.1+E_{t} $$ We have mounted a GPS sensor on the train. The sensor measures the coordinate $x_t$, but unfortunately it cannot do so exactly: it measures with an error $N_t$, which is also a random variable: $$ z_{t}=x_{t}+N_{t} $$ The task is, given the inaccurate sensor readings $z_t$, to find a good approximation of the train's true coordinate $x_t$. We denote this approximation $x_t^{opt}$. Thus, the equations for the coordinate and the sensor reading are as follows: $$ \begin{cases} x_{t+1}=x_{t}+a\cdot t+2.1+E_{t}\\ z_{t}=x_{t}+N_{t} \end{cases} $$
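A minimal simulation of this system and of the scalar Kalman recursion for $x_t^{opt}$ might look like the following (Python; the noise standard deviations are assumptions, since the text does not specify them):

```python
import numpy as np

a = 0.2                          # acceleration from the derivation above
sigma_E, sigma_N = 0.5, 10.0     # assumed noise std devs (not specified)
rng = np.random.default_rng(1)

# Simulate the true track x_t and the GPS readings z_t
T = 100
x = np.zeros(T)
z = np.zeros(T)
x[0] = 5.0
z[0] = x[0] + rng.normal(0.0, sigma_N)
for t in range(T - 1):
    x[t + 1] = x[t] + a * t + 2.1 + rng.normal(0.0, sigma_E)
    z[t + 1] = x[t + 1] + rng.normal(0.0, sigma_N)

# Scalar Kalman recursion for x_opt
x_opt = np.zeros(T)
x_opt[0] = z[0]                  # initialize from the first reading
P = sigma_N ** 2
for t in range(T - 1):
    x_pred = x_opt[t] + a * t + 2.1          # prediction from the model
    P_pred = P + sigma_E ** 2
    K = P_pred / (P_pred + sigma_N ** 2)     # Kalman gain
    x_opt[t + 1] = x_pred + K * (z[t + 1] - x_pred)
    P = (1.0 - K) * P_pred

mse_opt = float(np.mean((x_opt - x) ** 2))
mse_sensor = float(np.mean((z - x) ** 2))
print(mse_opt, mse_sensor)   # filtered error vs raw GPS error
```

With a sensor error much larger than the process disturbance, the filtered estimate $x_t^{opt}$ tracks the true coordinate far better than the raw readings $z_t$.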