Public Articles
UHI in Fortaleza and trends on screen-level air temperature and humidity
Mode Test By GMM and Excess Mass Methods
and 2 collaborators
The GMM (Gaussian mixture modeling) method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.
1. Assume the data follow a unimodal distribution: $x \sim N(\mu, \sigma^2)$. Estimate $\mu$ and $\sigma^2$.
2. Assume the data follow a bimodal distribution: a two-component Gaussian mixture with parameters $(\mu_1, \mu_2, \sigma^2_1, \sigma^2_2, p)$.
Initial guess: $\mu_1 = \mu - \sigma$, $\mu_2 = \mu + \sigma$, $\sigma^2_1 = \sigma^2_2 = \sigma^2$, $p = 0.5$.
$n$ = number of observations.
$\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, p)$; $\mathbf{z} = (z_1, \ldots, z_n)$ is a latent categorical vector with $z_i \in \{1, 2\}$.
$\mathbf{x} = (x_1, \ldots, x_n)$ are the observations, with $(x_i \mid z_i = j) \sim N(\mu_j, \sigma^2_j)$ for $j = 1, 2$.
E-step: $P(z_i = 1) = p$, $P(z_i = 2) = 1 - p$.
Complete-data likelihood: $L(\theta; \mathbf{x}, \mathbf{z}) = P(\mathbf{x}, \mathbf{z} \mid \theta) = \prod\limits_{i=1}^n P(Z_i = z_i) f(x_i \mid \mu_{z_i}, \sigma^2_{z_i})$.
The membership probabilities (responsibilities) are
$T^{(t)}_{j,i} = P(Z_i = j \mid X_i = x_i, \theta^{(t)}) = \frac{p^{(t)}_{j} f(x_i \mid \mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i \mid \mu^{(t)}_{1}, \sigma^{2(t)}_{1}) + (1 - p^{(t)}) f(x_i \mid \mu^{(t)}_{2}, \sigma^{2(t)}_{2})}$,
where $p^{(t)}_{1} = p^{(t)}$ and $p^{(t)}_{2} = 1 - p^{(t)}$, and the expected complete-data log-likelihood is
$Q(\theta \mid \theta^{(t)}) = E_{\mathbf{z} \mid \mathbf{x}, \theta^{(t)}}\big[\log L(\theta; \mathbf{x}, \mathbf{z})\big] = \sum\limits_{i=1}^n E\big[\log L(\theta; x_{i}, z_{i})\big] = \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}\Big[\log p_{j} - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i}-\mu_{j})^2}{2\sigma^{2}_{j}}\Big]$.
M-step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$, which gives
$\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}$, $\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$, $\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}(x_i-\mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$,
with the analogous updates for $\mu^{(t+1)}_{2}$ and $\sigma^{2(t+1)}_{2}$ using $T^{(t)}_{2,i}$.
Iterate over $t$ until $|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}$.
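As an illustrative sketch (not the authors' code), the two-component EM iteration above can be written in Python with NumPy. The function name `fit_gmm2` and its interface are assumptions for this example; the initialization and the stopping rule follow the text:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Gaussian density f(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def fit_gmm2(x, tol=1e-3, max_iter=500):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Initialization as in the text: mu1 = mu - sigma, mu2 = mu + sigma,
    sigma1^2 = sigma2^2 = sigma^2, p = 0.5.
    Returns (mu1, mu2, var1, var2, p, loglik).
    """
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    s = np.sqrt(var)
    mu1, mu2, var1, var2, p = mu - s, mu + s, var, var, 0.5

    loglik_old = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities T_{1,i} = P(Z_i = 1 | x_i, theta^(t))
        f1 = p * normal_pdf(x, mu1, var1)
        f2 = (1.0 - p) * normal_pdf(x, mu2, var2)
        total = f1 + f2
        t1 = f1 / total

        # M-step: responsibility-weighted updates of p, means, variances
        p = t1.mean()
        mu1 = np.sum(t1 * x) / np.sum(t1)
        mu2 = np.sum((1.0 - t1) * x) / np.sum(1.0 - t1)
        var1 = np.sum(t1 * (x - mu1) ** 2) / np.sum(t1)
        var2 = np.sum((1.0 - t1) * (x - mu2) ** 2) / np.sum(1.0 - t1)

        # Convergence: stop when |logL(t+1) - logL(t)| < tol
        loglik = np.sum(np.log(total))
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return mu1, mu2, var1, var2, p, loglik
```

On well-separated synthetic data (e.g. equal-weight samples from $N(-3,1)$ and $N(3,1)$) the fit recovers the two component means and $p \approx 0.5$ within a few dozen iterations.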
The conclusion about the data is based on three tests, with $H_0$: the distribution is unimodal, and $H_1$: the distribution is bimodal:
1. LRT (likelihood-ratio test): $-2\ln\lambda = 2[\ln L_{\mathrm{bimodal}} - \ln L_{\mathrm{unimodal}}] \sim \chi^2$. The LRT is the main test among the three for concluding that the data are bimodal: the larger $-2\ln\lambda$ is, the stronger the evidence for bimodality.
2. Bandwidth test: $D = \frac{|\mu_1 - \mu_2|}{[(\sigma^2_1+\sigma^2_2)/2]^{0.5}}$. A distance $D > 2$ is necessary for a clear separation of the two peaks.
3. Kurtosis test: the kurtosis should be negative for a bimodal distribution.
In some hard cases $D$ and the kurtosis fail to detect bimodality, which is why the LRT is our main test. For example, in the following two plots both distributions are bimodal, yet in the first plot $D < 2$ (the two peaks are hard to distinguish), and in the second plot the kurtosis is positive, which would indicate a unimodal distribution (this happens because the distribution is skewed):
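The three test statistics are simple to compute once the unimodal and bimodal fits are available. The sketch below uses hypothetical helper names; `lrt_statistic` merely forms $-2\ln\lambda$ from two already-computed log-likelihoods, and `excess_kurtosis` is the sample excess kurtosis (negative for well-separated bimodal data):

```python
import numpy as np

def lrt_statistic(loglik_unimodal, loglik_bimodal):
    """-2 ln(lambda) = 2 [ln L_bimodal - ln L_unimodal]."""
    return 2.0 * (loglik_bimodal - loglik_unimodal)

def peak_separation(mu1, mu2, var1, var2):
    """D = |mu1 - mu2| / sqrt((var1 + var2) / 2); D > 2 for two clear peaks."""
    return abs(mu1 - mu2) / np.sqrt((var1 + var2) / 2.0)

def excess_kurtosis(x):
    """Sample excess kurtosis (kurtosis - 3); negative suggests bimodality."""
    x = np.asarray(x, dtype=float)
    m, s2 = x.mean(), x.var()
    return np.mean((x - m) ** 4) / s2 ** 2 - 3.0
```

For an equal-weight mixture of $N(-3,1)$ and $N(3,1)$, for instance, $D = 6$ and the excess kurtosis is negative, so both auxiliary tests agree with the LRT.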
Example document
and 2 collaborators
Investigation X
Leveraging circadian rhythms to study host-gut microbe interactions in wildlife
and 5 collaborators
How big should your data be?
and 1 collaborator
Clinical indicators for asthma-COPD overlap: a systematic review and meta-analysis
and 4 collaborators
Extracting organ-specific radiobiological model parameters from clinical data for radiation therapy planning of head and neck cancers
Automated & Easy Diagnosis of Cervical Cancer From Onsite Easy Colposcopy Images
and 1 collaborator
Explainable Fragment-based Molecular Property Attribution
and 6 collaborators
WisDM Green: Harnessing Artificial Intelligence to Design and Prioritize Compound Combinations in Peat Moss for Sustainable Farming Applications
and 8 collaborators
Cannabis amnesia – Indian hemp parley at the Office International d’Hygiène Publique in 1935
and 2 collaborators
Emerging green routes to nanocellulose
and 3 collaborators
Quantify precipitation contributions of ecosystems on the Tibetan Plateau
and 2 collaborators
BrM 5.2.3 - Internal Workings
UHI in Fortaleza and trends on screen-level air temperature and precipitation
Leech-Inspired Shape-Encodable Liquid Metal Robots for Reconfigurable Circuit Welding and Transient Electronics
and 8 collaborators
The dilemma of fibrous dysplasia versus chronic osteomyelitis of the posterior mandible: a case report
and 3 collaborators
An individually-controlled multi-tined expandable electrode using active-cannula-based shape morphing for on-demand conformal radiofrequency ablation lesions
and 7 collaborators
The dilated triple
This article was published as The dilated triple. Marko A. Rodriguez, Alberto Pepe, Joshua Shinavier. In: Emergent Web Intelligence: Advanced Semantic Technologies, Advanced Information and Knowledge Processing series, Pages 3-16, ISBN:978-1-84996-076-2, Springer-Verlag. 2010.
Abstract. The basic unit of meaning on the Semantic Web is the RDF statement, or triple, which combines a distinct subject, predicate and object to make a definite assertion about the world. A set of triples constitutes a graph, to which they give a collective meaning. It is upon this very simple foundation that the rich, complex knowledge structures of the Semantic Web are built. Yet the very expressiveness of RDF, by inviting comparison with real-world knowledge, highlights a fundamental shortcoming of RDF: that it is limited to statements of absolute fact, in contrast to the thoroughly context-sensitive nature of human thought. However, when a statement is interpreted from beyond the scope of its local graph representation, other statements augment its meaning and identify its uniqueness. Following this line of thought, a model is presented in which each statement in an RDF graph is supplemented by some subjectively related subgraph of the same RDF graph, thereby framing the meaning of the statement within a broader context.
Reproducibility checklist for computational science and engineering
and 1 collaborator
Protein-protein interaction analysis of 2DE proteomic data of desiccation responsive Xerophyta viscosa leaf proteins
Quantifying Environmental and Line-of-Sight Effects in Models of Strong Gravitational Lens Systems
and 3 collaborators
Mass in the immediate environment of a gravitational lens galaxy or projected along the line of sight (LOS) can affect strong lensing observables more than current measurement errors. To quantify the resulting biases and uncertainties in lens model determined quantities like the Hubble constant H0, we consider three lens models that treat the environment and sightline to the lens in different ways: the first ignores mass external to the lens (Lens-Only), the second adds an external shear to the lens plane (Lens+Shear), and the third employs our new framework for multi-plane lensing \citep{McCully14} to build the full three-dimensional mass configuration (3-D Lens), including the additional mass from structures in the lens environment and sightline, as well as foreground and background voids. The Lens-Only model yields poor and biased fits. While the Lens+Shear model can account for tidal stretching from perturbing galaxies at the lens redshift and in the background, it requires corrections for external convergence and cannot fully reproduce the more complicated effects from perturbing foreground galaxies. Critically, our 3-D Lens model, which explicitly includes convergence, recovers lens model parameters without bias and with a scatter driven only by the lens profile degeneracy. For computational efficiency, we quantify which galaxies—by a combination of mass, projected distance from the lens, and offset from the lens redshift—can be treated with the tidal approximation and which need to be treated exactly in the 3-D Lens model. While it is not surprising that massive galaxies lying close to the lens are significant perturbers, we also find that foreground structures affect the lens potential more. There is a dramatic variation in the strength of the LOS and environment effects across different lens fields. By modeling each field individually, we produce stronger priors on H0 than by ray-tracing through N-body simulations.
We show that lens systems with very asymmetric lens configurations, i.e., those produced by highly elliptical main lens galaxies, are less sensitive to the lens profile degeneracy, thus producing stronger constraints on H0 and making appealing targets for LSST follow-up surveys.
Reproducible and replicable CFD: it's harder than you think
and 1 collaborator
Completing a full replication study of our previously published findings on bluff-body aerodynamics was harder than we thought, despite our good reproducible-research practices of sharing our code and data openly. Here's what we learned from three years, four CFD codes, and hundreds of runs.