Public Articles
The Pressure Structure of Molecular Clouds
and 2 collaborators
Abstract. Broadly, we seek to understand the role of pressure in star-forming molecular clouds. We examine molecular line data of the Perseus region from the COMPLETE survey alongside radiative transfer-processed ‘observations’ of the turbulent simulations of S. Offner to (1) understand to what extent we can actually measure pressure through observations, and (2) study how pressure changes within a cloud’s substructure.
A simplified feature vector obtained by wavelets method for fast and accurate recognition of handwritten characters off-line
and 2 collaborators
The study of character recognition is mainly divided into off-line and on-line methods \cite{simistira2015recognition}. The difference between them lies in how the handwriting is captured and analyzed. In off-line recognition, the data are treated as a static representation of text, since the order in which the strokes were produced, whether by a machine or by hand, cannot be established \cite{tapia2007survey}. In on-line recognition, on the other hand, the original data are glyphs and points, normally stored at regular time intervals \cite{tapia2005understanding}. Character recognition is one of the most important topics in pattern recognition, especially the recognition of digits, characters, and symbols, as well as mathematical expressions. The classification and recognition of vehicle license plates, postal codes, and similar items are also of great interest to pattern recognition researchers \cite{hallale2013twelve}. Handwriting recognition has been an ongoing research topic for decades, but only recently has it become of great interest in other areas \cite{zanibbi2012recognition}.
The study of on-line characteristics has been one of the main interests; for that reason, researchers have combined several existing methods for extracting on-line and off-line characteristics to recognize characters \cite{keshari2007hybrid}, \cite{winkler1996hmm}, \cite{alvaro2014recognition}.
This paper focuses on the off-line recognition of handwritten characters. The study is based on descriptors already in use, such as FKI \cite{marti2001using,alvaro2014offline}, and on descriptors based on discrete wavelets \cite{obaidullah2015numeral}. The dataset used in this work was generated by \cite{de2009character}, whose main concern is the recognition of characters in images of advertisements, warning signs, and magazines; to that end, the authors created a database of digits and characters (0-9, A-Z, a-z) with the aim of gathering and making available images of each individual element. To evaluate feature extraction on this database, the FKI descriptors, the discrete wavelet descriptors, and our simplified wavelet method are compared in terms of accuracy and time, using the nearest neighbour rule (1-NN) as the classifier.
The paper is organized as follows. Section 2 reviews the descriptors of interest, FKI and the wavelet descriptors, against which we compare our simplified wavelet method. Section 3 defines a set of characteristics obtained by the simplified wavelet method, which is the basis of this work. Section 4 presents the results of comparing the three methods. Finally, Section 5 presents the conclusions and future work.
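As a rough illustration of the evaluation protocol described above, the following Python sketch classifies a toy image with the 1-NN rule. The feature extractor `haar_approx` (2x2 block averages, which equals a one-level Haar approximation up to a constant factor) is an assumption made for illustration and is not the simplified wavelet method of this paper; the function names and toy images are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def haar_approx(image):
    """Average each non-overlapping 2x2 block of a 2-D list of pixel values.

    Assumption: this stands in for a wavelet descriptor. It is one level of
    the Haar approximation (up to scaling), flattened into a feature vector.
    """
    rows, cols = len(image), len(image[0])
    return [
        (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
        for r in range(0, rows - 1, 2)
        for c in range(0, cols - 1, 2)
    ]


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training vector nearest to `query`."""
    nearest = min(
        range(len(train_features)),
        key=lambda i: dist(train_features[i], query),
    )
    return train_labels[nearest]


# Hypothetical 4x4 binary "characters": ink on the left vs. ink on the right.
left = [[1, 1, 0, 0]] * 4
right = [[0, 0, 1, 1]] * 4
noisy_left = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]

features = [haar_approx(left), haar_approx(right)]
labels = ["left", "right"]
prediction = predict_1nn(features, labels, haar_approx(noisy_left))
```

Swapping `haar_approx` for another descriptor while keeping `predict_1nn` fixed is one way to compare descriptors under the same classifier.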
Unsupervised learning: Clustering and density estimation
and 2 collaborators
When it comes to determining and explaining information within a large, complicated, or multi-dimensional dataset, it can often be difficult to see patterns and relationships. In the case of supervised learning, where there is input data and output data, it is possible to artificially construct and optimize a model that can, with time and several iterations, predict and improve performance through its own experience by splitting the input data into training and testing sets. Naturally, supervised learning can yield a lot of information and results; however, in the case of unsupervised learning, or when output data is not available, many other methods exist to try and capture the natural structure of the data and make useful observations. This report will attempt to use unsupervised methods to try and infer further information about the cars dataset that was used in the previous exercise.
Interferometric Array Multi-Objective Visual Analytics
\label{sec:intro} This document presents a parametric model to help design an interferometric array. It focuses on the value vs. cost trade-off inherent to many of its architecture definitions, which is a multiple-objective problem. The document describes the design parameters to consider in §\ref{sec:var} and a set of equations for research and cost objectives in §\ref{sec:obj}. A spreadsheet that uses these design parameters and produces a CSV file for analysis of the emerging Pareto front is introduced in §\ref{sec:spreadsheet}. This output enables the use of Multiple Objective Visual Analytics (MOVA) for complex engineered systems, as proposed in \cite{mova}.
\label{sec:var} This section presents selected design parameters that influence the objectives in §\ref{sec:obj}. We select design parameters that are specification agnostic; for example, the parameters are relevant to multiple antenna specifications, including offset Gregorian and symmetric Cassegrain.
We will use $A$ in this document as the collecting area of each array element (thus we could also write $\pi (D/2)^2$, with $D$ being the dish diameter).
We will use $\eta_a$ in this document as the antenna efficiency with \begin{equation}\label{eq:antenna_efficiency} \eta_a = \eta_{\text{surface eff.}} \cdot \eta_{\text{aperture blockage}} \cdot \eta_{\text{feed spillover eff.}} \cdot \eta_{\text{illumination taper eff.}} \end{equation} as defined in \cite{antenna}.
We will use N in this document as the number of array elements.
We will use $P$ in this document as the number of pads built for the array. If re-configuration of the array is envisioned, there might be a larger number of pads ready for aperture connection to the system.
We will use the geographic latitudes and longitudes to establish pad location in this document. We will calculate the length of the possible baselines using pad positions. We will also calculate length and complexity of the roads, fiber and power networks needed using pad positions. We will use B as the maximum array element separation in any single configuration.
We will use $R$ as the number of frequency bands, with $R_i$ denoting the individual bands. If the array bandwidth is $\lambda_{max} - \lambda_{min}$, it is useful for our analysis to use the wavelength $\lambda = \lambda_{min}$.
Notes: a high bandwidth ratio of up to 7 might be practical, but could compromise $A_e/T_{sys}$. A high absolute bandwidth is challenging for digitization; up to 20 GHz might be practical.
Notes: directly at RF (no reference), single-sideband down conversion (LO and timing reference), double-sideband (IQ) down conversion (two LOs, two references, tunable LO).
Bits per sample (dynamic range)
We will use geographic latitude and longitude to establish the correlator location in this document. We will calculate fiber, power and road network aspects based on this information.
We will use $\eta_c$ as the correlator efficiency in this document, with \begin{equation}\label{eq:correlator_efficiency} \eta_c(t_{int}) = \frac{\text{correlator sensitivity}}{\text{sensitivity of a perfect analog correlator having the same } t_{int}} \end{equation} as defined in \cite{sensitivity}.
\label{sec:obj} This section aims to include array performance objectives that might be influenced by the design variables in §\ref{sec:var}.
As derived in \cite{design}, the antenna diameter determines its beam size $\theta_{ant} \approx \frac{\lambda}{D}$. If the plane area of radius $\frac{B}{\lambda}$ is divided into cells of size $\frac{D}{\lambda}$, then the number of occupied cells $N_{occ}$ satisfies \begin{equation}\label{eq:fourier} N_{occ} \leqslant \pi \left(\frac{B}{D}\right)^2 \end{equation}
An overall measure of performance is the System Equivalent Flux Density, SEFD, defined in \cite{sensitivity} as the flux density of a source that would deliver the same amount of power: \begin{equation}\label{eq:system_equivalent_flux_density} SEFD = {\frac{T_{sys}}{\frac{\eta_a A}{2k_B}}} \end{equation} in units of Janskys, where $T_{sys}$ is the system temperature including contributions from receiver noise, feed losses, spillover, atmospheric emission, galactic background and cosmic background, and $k_B = 1.380 \times 10^{-23}$ Joule K$^{-1}$ is the Boltzmann constant. According to \cite{sensitivity}, if we assume $N$ apertures with the same SEFD, observing the same bandwidth $\Delta\nu$ during the same integration time $t_{int}$, then the weak-source limit on the sensitivity of a synthesis image of a single polarization is \begin{equation}\label{eq:sens} \Delta I_m = {\frac{1}{\eta_s }}{\frac{SEFD}{\sqrt{N(N-1) \Delta \nu \, t_{int}}}} \end{equation} in units of Janskys per synthesized beam area, where the most important factor in the system efficiency $\eta_s$ is the correlator efficiency $\eta_c$.
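The two expressions above can be evaluated numerically. The following Python sketch is only an illustration: the function names, the default $\eta_s = 1$, and the example values are assumptions, and the factor $10^{-26}$ converts W m$^{-2}$ Hz$^{-1}$ to Janskys.

```python
from math import sqrt

K_B = 1.380e-23  # Boltzmann constant in J/K, value as quoted in the text
JANSKY = 1e-26   # 1 Jy expressed in W m^-2 Hz^-1


def sefd_jy(t_sys_k, eta_a, area_m2):
    """System Equivalent Flux Density, SEFD = T_sys / (eta_a * A / (2 k_B)), in Jy."""
    return t_sys_k / (eta_a * area_m2 / (2.0 * K_B)) / JANSKY


def image_sensitivity_jy(sefd, n, bandwidth_hz, t_int_s, eta_s=1.0):
    """Weak-source image sensitivity for n identical apertures, single polarization.

    eta_s defaults to 1.0 (a lossless system), which is an assumption.
    """
    return sefd / (eta_s * sqrt(n * (n - 1) * bandwidth_hz * t_int_s))


# Hypothetical example: T_sys = 50 K, eta_a = 0.5, A = 100 m^2.
example_sefd = sefd_jy(50.0, 0.5, 100.0)
example_noise = image_sensitivity_jy(example_sefd, n=10, bandwidth_hz=1e9, t_int_s=3600.0)
```

Because the noise scales as $1/\sqrt{t_{int}}$, quadrupling the integration time halves `example_noise`.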
According to \cite{moran}, a commonly used rule of thumb for the cost of an antenna is that it is proportional to $D^{\alpha}$, where $\alpha \approx 2.7$ for values of $D$ from a few meters to tens of meters. For $N$ antennas of diameter $D$ meters with accuracy $\frac{\lambda}{16}$, where $\lambda$ is in millimeters, we could use \cite{mmadesign} as an upper limit for the antenna construction cost: \begin{equation}\label{eq:antenna_cost} \text{Antenna Cost} = \frac{890N(\frac{D}{10})^{2.7}}{(\lambda^{0.7})} + 500 \end{equation} in K\$.
For $M$ frequency bands, each 30\% wide, and dual polarization, we could use \cite{mmadesign} as an upper limit for the Front-End System Cost: \begin{equation}\label{eq:fe_cost} \text{Front-End System Cost} = 45MN + 200M \end{equation} in K\$.
We could use \cite{mmadesign} as an upper limit for the LO System Cost: \begin{equation}\label{eq:lo_cost} \text{LO System Cost} = 80N+100 \end{equation} in K\$.
We could use \cite{mmadesign} as an upper limit for the IF Transmission Cost: \begin{equation}\label{eq:IF_Tx_cost} \text{IF Transmission Cost} = 8BN + 30N + 400 \end{equation} in K\$.
We could use \cite{mmadesign} as an upper limit for the Correlator Cost: \begin{equation}\label{eq:correlator} \text{Correlator cost} = 2N^2 + 112N +1360 \end{equation} in K\$.
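The cost equations above translate directly into code. The Python sketch below is a minimal illustration; the function names are hypothetical, summing the five terms into a single total is an assumption (the text lists them separately), and the unit of $B$ is whatever the source cost model assumes, since it is not stated here.

```python
def antenna_cost(n, d_m, lam_mm):
    """Antenna construction cost in K$ for n dishes of diameter d_m (meters)."""
    return 890.0 * n * (d_m / 10.0) ** 2.7 / lam_mm ** 0.7 + 500.0


def front_end_cost(m, n):
    """Front-end system cost in K$ for m frequency bands and n antennas."""
    return 45.0 * m * n + 200.0 * m


def lo_cost(n):
    """LO system cost in K$."""
    return 80.0 * n + 100.0


def if_transmission_cost(b, n):
    """IF transmission cost in K$ for maximum element separation b."""
    return 8.0 * b * n + 30.0 * n + 400.0


def correlator_cost(n):
    """Correlator cost in K$."""
    return 2.0 * n ** 2 + 112.0 * n + 1360.0


def total_cost(n, d_m, lam_mm, m, b):
    """Sum of the five upper-limit terms (assumption: the terms are additive)."""
    return (
        antenna_cost(n, d_m, lam_mm)
        + front_end_cost(m, n)
        + lo_cost(n)
        + if_transmission_cost(b, n)
        + correlator_cost(n)
    )


# Hypothetical design point: 10 antennas of 10 m, lambda = 1 mm, 2 bands, B = 1.
example_total = total_cost(n=10, d_m=10.0, lam_mm=1.0, m=2, b=1.0)
```

Evaluating `total_cost` over a grid of $N$ and $D$ alongside a sensitivity figure is one way to produce the rows of a Pareto-front table.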
\label{sec:spreadsheet} This section presents a spreadsheet that produces data in the right format for performing visual analytics, consistent with the variables in §\ref{sec:var} and the objectives in §\ref{sec:obj}.
Tricky because you can compensate antenna quality with software. So the equations must capture this trade off.
On the Newly Discovered Tomb of "Lady Yang of the Hong Family, Imperial Ming" (皇明洪門楊氏) in Xiyu, Penghu
and 4 collaborators
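Oliver Streiter (奧利華), National University of Kaohsiung; Sandy Lin (林莉倫), National University of Kaohsiung; Nai-Yu Chen (陳乃瑜), National Chung Hsing University; James X. Morris, National Chengchi University; Yaqing Zhan (詹雅晴), National Taipei University of Education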
Microbial Community Structure of Submerged Aquatic Vegetation in the Potomac River
and 1 collaborator
Submerged aquatic vegetation (SAV) are plants that are rooted in sediment and fully submerged most of the time, and have many adaptations for coping with varied salinity and osmotic conditions. We focus here on one aspect of SAV - their microbiome - which was studied in the Potomac River along a salinity gradient as the river empties into the Chesapeake Bay. The goal was to find a link between the microbial communities on different SAV species and the changing salinity across the river.
One of the four successfully sampled sites was very different from the rest in terms of microbial community and water/sediment chemistry, clustering separately from the other sites on PCoA plots. Methylotenera, Planctomyces, Rhodobacter, and Providencia are commonly found amongst most SAV species across all sites, and sulfur oxidizing bacteria were present in high relative abundance in the roots of Potamogeton perfoliatus at one site.
Site location, with its distinct water and sediment chemistries, was a main driver of microbial community structure. Host species of SAV and sample type (leaves or roots) were also associated with different microbial communities. Due to the small sample size in this study, it is difficult to draw robust conclusions about the impact of salinity on microbial community structure. Therefore, future efforts will sample more thoroughly along the Potomac River, as well as along the length of the James River, which provides a nearby, parallel salinity gradient.
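A Brief Introduction to Polar Coordinate Graphing
This article introduces, as concisely as possible, how to graph polar equations. It targets the examples commonly found in calculus textbooks and does not necessarily apply to the general case. We hope that after reading it you will have a basic idea of how polar graphing works. Below we use the graph of $r = 2\cos 3\theta$ as an illustration.
Step 1. Determine the range of $\theta$. The examples commonly found in calculus textbooks mostly have periodic graphs; that is, we only need to consider a finite range of $\theta$ to draw the complete graph. To determine the range of $\theta$, first sketch the graph of $f(\theta) = 2\cos 3\theta$. Figure 1
In the figure above, the x-axis is marked in units of $\dfrac{\pi}{6}$, and the graph over the first two periods is divided into segments numbered (1) through (8). Why number the segments this way? Because from (1) to (2) $r$ changes from positive to negative, while from (3) to (4) it changes from negative to positive. One point requires care: since $x = r\cos\theta$ and $y = r\sin\theta$, the range of $\theta$ also affects where the plotted points fall. Segments (1) to (3) correspond to the $\theta$ ranges $\left[0, \dfrac{\pi}{6}\right]$, $\left[\dfrac{\pi}{6}, \dfrac{\pi}{3}\right]$, and $\left[\dfrac{\pi}{3}, \dfrac{\pi}{2}\right]$, where $\theta$ lies in the first quadrant, so the position of each point in the x-y plane is determined entirely by the sign of $r$. Segments (4) to (6) correspond to the $\theta$ ranges $\left[\dfrac{\pi}{2}, \dfrac{2\pi}{3}\right]$, $\left[\dfrac{2\pi}{3}, \dfrac{5\pi}{6}\right]$, and $\left[\dfrac{5\pi}{6}, \pi\right]$, where $\theta$ lies in the second quadrant, so the x-coordinate has the opposite sign to $r$ while the y-coordinate has the same sign as $r$.
We can observe that the values of $\theta$ in segments (7) and (8) differ from those in segments (1) and (2) by exactly $\pi$, and that the sign of $r$ is exactly reversed; in other words, segment (7) retraces the curve of segment (1), and segment (8) retraces the curve of segment (2). We therefore reach the following conclusion: for the polar equation $r = 2\cos 3\theta$, the range of $\theta$ is $[0, \pi]$, and the graph can be drawn in six segments.
Step 2. Plot representative reference points. These reference points are essentially the endpoints of the six segments obtained in Step 1. Once these seven points are marked on the coordinate plane, the shape of the graph becomes apparent.
Step 3. Draw the final graph. Connecting the reference points from Step 2 in order gives the final graph. Figure 2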
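The seven reference points from Step 2 can be computed with a short Python sketch. The function name and the tuple layout are hypothetical choices made for illustration; the equation and the $\pi/6$ spacing come from the steps above.

```python
from math import cos, sin, pi


def reference_points():
    """Return (theta, r, x, y) at the endpoints of the six segments on [0, pi]."""
    points = []
    for k in range(7):  # theta = 0, pi/6, ..., pi
        theta = k * pi / 6
        r = 2 * cos(3 * theta)
        # Polar-to-Cartesian conversion: x = r cos(theta), y = r sin(theta)
        points.append((theta, r, r * cos(theta), r * sin(theta)))
    return points


for theta, r, x, y in reference_points():
    print(f"theta={theta:.4f}  r={r:+.4f}  x={x:+.4f}  y={y:+.4f}")
```

The first and last points both land at $(x, y) = (2, 0)$, which is consistent with the curve closing after $\theta$ runs over $[0, \pi]$.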
RvEBV
and 1 collaborator
The community of astronomical strawmen says that $\RV$ should correlate with ISM density – dust grows and/or agglomerates in dense structures. We can look for this correlation in the intersection of the Valencic+ (year) and Jenkins (2009) samples. Valencic+ provides extinction information, including $\RV$, and Jenkins provides ISM density and dust-to-gas ratio information along the same lines of sight. However, while there is a clear correlation between density and the dust-to-gas ratio (Figure [fig:nH_F]), there does not appear to be a correlation between density and $\RV$ (Figure [fig:lognH_RV]).
By looking at Eddie’s map of $\RV$ over a large area, we can generate the hypothesis that there’s not a clear density-$\RV$ correlation because much of the $\RV$ variation in Eddie’s map happens on much larger spatial scales than the angular size of a dense ISM structure. There could be a density-$\RV$ relationship on top of this large-scale variation, but there isn’t a clear $\EBV$ vs. $\RV$ correlation because the large-scale variation has a higher magnitude.
So, if we want to look for an $\EBV$ vs. $\RV$ correlation, we need to filter out the large scale structure. One way to do this is to look at differences in $\EBV$ vs. differences in $\RV$ between pairs of sightlines as a function of the sightlines’ separations.
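A minimal Python sketch of this pairwise-difference idea follows. The input layout (one tuple of position, $\EBV$, and $\RV$ per sightline), the flat-sky separation, and the function name are assumptions made for illustration, not part of any existing pipeline.

```python
from itertools import combinations
from math import hypot


def pairwise_differences(sightlines, max_separation=None):
    """Return (separation, delta_ebv, delta_rv) for every pair of sightlines.

    sightlines: list of (x_deg, y_deg, ebv, rv) tuples.
    Assumption: positions are close enough that a flat-sky (Euclidean)
    separation in degrees is adequate.
    """
    rows = []
    for (x1, y1, e1, r1), (x2, y2, e2, r2) in combinations(sightlines, 2):
        separation = hypot(x2 - x1, y2 - y1)
        if max_separation is None or separation <= max_separation:
            rows.append((separation, e2 - e1, r2 - r1))
    return rows


# Made-up values, only to show the call pattern.
toy = [(0.0, 0.0, 0.10, 3.0), (3.0, 4.0, 0.30, 3.5), (0.0, 1.0, 0.20, 2.8)]
all_pairs = pairwise_differences(toy)
close_pairs = pairwise_differences(toy, max_separation=2.0)
```

Restricting to small `max_separation` keeps only pairs that share the same large-scale environment, so any remaining trend between the two differences is less affected by the large-scale $\RV$ variation.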
Speed Dating Tool- Authorea
and 1 collaborator
\cite{di_ferrante_ehlers-danlos_1975}
Hi, I am Aliza and I will tell you a little bit about my friend Authorea!
Are you writing a thesis?
Are you a frequent LaTeX user? It’s okay, we support Markdown, BibTeX, etc. (let me just show you)
Would you like to learn LaTeX? Here is a link that will help you out!
Are you using a programming language or tool to handle your quantitative data, e.g. JavaScript or an IPython notebook? Here is a cool link you will love!!
Authorea is a collaborative research tool. It will save you from doing mundane tasks.
Authorea provides a platform for collaborative writing and review of your manuscript
It has an easy automated citation mechanism
It is a one-stop repository for all your figures’ data, code, and editing, and even lets you get pre-publication feedback from your peers.
Version control helps you keep track of the changes you have made. Have you heard of Git?
Produce neat readable work
Invite co-authors and work on the document at the same time
Attach interactive graphs to your article
Work offline using git (need to know a little about git and github)
Add and manage citations
Write mathematical equations
Add comments!
Export files in PDF and other formats.
Follow and unfollow documents to be notified by email of who made changes
If you need help, go here
But that's all the boring stuff... let's do something cool
Proposal idea for a new experiment
SMMP - Stochastic Methods for Molecular Properties
Possible titles:
Stochastic Methods for Molecular Properties (SMMP)
Stochastic Methods for Chiroptical Properties (SMCP or ChiroStoch)
Deterministic methods need large Hilbert spaces for effective expansions of the many-electron wave function
This is however largely redundant \cite{Ivanic_2001}
Stochastic algorithms are highly parallelizable in the number of walkers.
I will develop my skills in parallel programming techniques by developing this project.
Research questions:
Objectives of the project:
The calculation of molecular properties with high accuracy and for systems of relevant size
Devise the appropriate stochastic approach to the solution of response equations
The creation of the appropriate software toolbox with good scalability.
General background on quantum chemistry:
State of the art:
QMC:
Properties by QMC
Chiroptical properties:
Response theory:
Problems to address:
TODO:
King Chicken Theorem
Detailed Reviewer Responses
and 3 collaborators
We would like to thank the reviewers for their insightful comments. The major points that have been addressed are as follows:
It was not our intention to give the impression that one needs to scan human calibration phantoms at each site to properly power a multisite study with nonstandardized parameters, which is very costly. The statistical model which takes MRI bias into account has been emphasized instead. The bias that was measured and validated via calibration served to corroborate the scaling assumption of the statistical model. For other researchers planning multisite studies, the statistical model we proposed with the biases we reported should help plan and power a study.
Our measurements have been compared with other harmonization efforts, specifically \cite{cannon2014, jovicich2013brain} and \cite{Schnack_2004}.
The scanning parameters of our consortium have been better specified.
The independence assumption between the unobserved effect and the scaling factor for a particular site has been addressed. Specifically, we emphasized that this assumption could hold for MS patients based on our experiment. We recommended validating this assumption for other situations by scanning human phantoms, and we have provided the equation of variance without the independence assumption for the readers.
Blog Post 10
1 $\underline{\text{Trees-A Branch of Discrete Mathematics}}$ Trees provide poets with inspiration as they sway through the breeze and their leaves, bursting with color, rustle in the wind. It is no wonder, then, that mathematicians coined the term “tree” in describing special classes of structured graphs. One author, Joe Malkevitch, makes it his goal to “convince you (readers) that mathematical trees are no less lovely than their biological counterparts.”
In discrete mathematics, and more specifically graph theory, a tree is a connected graph with no cycles. A graph with no cycles that is not necessarily connected is, naturally, called a forest. In addition, a vertex of degree 1 is called a leaf. These kinds of mathematical structures were first studied by the mathematician Arthur Cayley. In 1889 Cayley published a formula stating that for $n \geq 1$, the number of labeled trees with $n$ vertices is $n^{n-2}$.
A few other properties of trees include the following:
Given two vertices, x and y, there is a unique path from x to y
If we remove any edge of a tree, the graph is no longer connected
If a tree has n vertices, then it has n − 1 edges
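Cayley's formula and the edge-count property can be checked by brute force for small $n$. The Python sketch below is an illustration with hypothetical function names: it enumerates every set of $n - 1$ edges on $n$ labeled vertices and uses a union-find structure to keep only the sets without a cycle, which are exactly the trees.

```python
from itertools import combinations


def is_tree(n, edges):
    """True if `edges` on vertices 0..n-1 is acyclic and has n - 1 edges."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return len(edges) == n - 1


def count_labeled_trees(n):
    """Count trees on n labeled vertices by testing every (n - 1)-edge subset."""
    all_edges = list(combinations(range(n), 2))
    return sum(is_tree(n, subset) for subset in combinations(all_edges, n - 1))


for n in range(2, 7):
    print(n, count_labeled_trees(n), n ** (n - 2))
```

The enumeration grows quickly with $n$, so this is only practical as a sanity check on small cases.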
The concept of mathematical trees has applications in various fields including science, the enumeration of saturated hydrocarbons, the study of electrical circuits, and many more (Harary, 1994, p. 4).
Functional consequence of SNPs on the Tuberculosis drug metabolising enzyme, human arylamine N-acetyltransferase 1
Tournament Graphs
Autopledge
and 2 collaborators
Even when good coding practices are followed, arbitrary code execution bugs can still exist. By leveraging pledge(2) system calls and a static analysis framework, we attempt to mitigate these bugs by automatically inserting pledge statements. Although an algorithm was devised to do this, time limitations prevented its full implementation.