Jack O'Brien edited untitled.tex  almost 8 years ago


\section{Data}

Our sample consists of 56 spectroscopically confirmed quasar light curves from the seventh data release of the Sloan Digital Sky Survey (Schneider et al. 2010). These quasars range in redshift from $z = 0.19$ to $z = 3.83$ and in luminosity from $L = 10^{15} L_{\odot}$ to $L = 10^{20} L_{\odot}$. The samples were taken from the Southern Equatorial Stripe, known as Stripe 82, a 275~deg$^{2}$ area of sky with repeated sampling centered on the celestial equator. Stripe 82 reaches 2 magnitudes deeper than SDSS single-pass data, going as deep as magnitude 23.5 in the r-band for galaxies, with median seeing of $1.1''$ (Annis et al. 2014).

Each light curve contains photometric information in two bands (g and r) spanning as much as 10 years. The number of epochs ranges from 29 to 81 observations across all photometric bands, with sampling intervals ranging from one day to two years. This irregular sampling leads to complications in our analysis, which we discuss later.

PSF magnitudes are calibrated against a set of standard stars (Ivezic et al. 2007), reducing the photometric error to 1\%. We convert these magnitudes to fluxes for our analysis and convert the observed sampling intervals to the rest frame of each quasar. We use asinh magnitudes (also referred to as ``Luptitudes'') for the flux conversion (York et al. 2000; Lupton, Gunn, \& Szalay 1999), as is standard for SDSS.

These quasars were chosen because they also lie within the field of the Kepler K2 mission's Campaign 8. The Kepler space telescope's original purpose was planet finding, which required it to take many exposures quickly and consistently over long periods of time in order to look for small periodic dips in the light from potential planetary systems. After the failure of two of its four reaction wheels, the telescope's pointing was limited to its orbital plane, which is approximately the ecliptic.
Because of this limited mobility, the K2 mission was begun, in which the telescope observes a single region of the sky for approximately 75 days at a time, with observations every 30 minutes (Howell et al. 2014). Fortunately, one of these regions overlaps with Stripe 82, which means short-term time-series data will soon be available for these objects. This increase in sampling rate will allow us to probe far deeper into the short-term variability of these objects than SDSS can.
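The magnitude-to-flux conversion described above can be sketched directly. In the following minimal Python example the function name is our own, and the asinh softening parameters $b$ are the per-band values quoted in the SDSS photometry documentation (treat them as assumptions to be checked against the survey release in use):

```python
import numpy as np

# Asinh ("Luptitude") softening parameters b for the SDSS bands we use;
# values as quoted in the SDSS photometry documentation (assumed here).
SOFTENING = {"g": 0.9e-10, "r": 1.2e-10}

def luptitude_to_flux_ratio(m, band):
    """Invert the asinh magnitude of Lupton, Gunn, & Szalay (1999) to the
    flux as a fraction f/f0 of the zero-point flux, using
        m = -(2.5 / ln 10) * (asinh((f/f0) / (2b)) + ln b).
    """
    b = SOFTENING[band]
    return 2.0 * b * np.sinh(-m * np.log(10.0) / 2.5 - np.log(b))
```

For bright objects the result agrees with the classical Pogson relation $f/f_0 = 10^{-m/2.5}$ to high precision; the two differ appreciably only near the survey flux limit.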

Analysis of astronomical time-dependent data sets requires methods of quantifying and parameterizing the properties of the observed process in order to learn about the underlying physics driving it. The most common parameterization of light curve variability is currently the structure function, which relates the variation in brightness to the time interval over which the change is observed. The structure function is characterized by a power law with slope and y-intercept as free parameters. While a useful tool, it lacks the sophistication required to probe the complex behavior of AGN, which, due to their complicated structure, require far more parameters to model effectively. Instead, we look to other tools commonly used in time-series analysis.

There exists a class of finite difference equations used in the analysis of discrete time series known as autoregressive-moving average (ARMA) processes. These processes give us a way of inspecting the behavior of a time series with a simple but thoroughly descriptive parametric structure. A stationary process $\{X_t\}$ can be modeled by an ARMA(p,q) process if at every time $t$ $$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$ where $\{Z_t\}$ is a white noise process with zero mean and variance $\sigma^2$. For any autocovariance function $\gamma(h)$ for which $\lim_{h\to\infty}\gamma(h) = 0$ and any integer $k > 0$, there exists an ARMA(p,q) process with autocovariance function $\gamma_X(h)$ such that $\gamma_X(h) = \gamma(h)$ for all $h \leq k$. These properties make the ARMA process a very useful tool in the analysis and modeling of many different time series.
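The discrete ARMA recursion above can be simulated directly. The following minimal Python sketch (function name and burn-in length are our own illustrative choices) generates an ARMA(p,q) sample path driven by Gaussian white noise:

```python
import numpy as np

def simulate_arma(phi, theta, sigma, n, rng=None):
    """Simulate X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p}
                     + Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q},
    with {Z_t} IID N(0, sigma^2) white noise."""
    rng = np.random.default_rng(rng)
    p, q = len(phi), len(theta)
    burn = 200                        # discard the transient so the output is ~stationary
    z = rng.normal(0.0, sigma, n + burn)
    x = np.zeros(n + burn)
    for t in range(max(p, q + 1), n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = z[t] + sum(theta[j] * z[t - 1 - j] for j in range(q))
        x[t] = ar + ma
    return x[burn:]

# e.g. an ARMA(1,1) path: simulate_arma([0.7], [0.2], 1.0, 1000)
```

Stationarity requires the roots of the autoregressive polynomial to lie outside the unit circle (e.g. $|\phi_1| < 1$ in the ARMA(1,1) case).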
Unfortunately, the phenomena driving AGN variability are not discrete processes. In order to properly understand the underlying physics and structure of AGN, we need a continuous-time analog to the ARMA process (Brockwell \& Davis 1991).

$$d^{p}f(t) + \alpha_{p-1}\, d^{p-1}f(t) + \cdots + \alpha_{0}\, f(t) = \beta_{q}\, d^{q}w(t) + \beta_{q-1}\, d^{q-1}w(t) + \cdots + w(t)$$ where $d$ represents the change in a variable between times $t$ and $t + dt$, $f$ represents the state of the system minus its mean, and $w \sim IID\ N(0, \sigma^{2})$ is a continuous-time white noise process representing the driving noise in the system due to non-linear effects. In our case, it represents temperature fluctuations due to magnetohydrodynamic instabilities in the accretion disk. In order for the process to be stationary, we require $q < p$. The most well-known example is the case where $p=1$ and $q=0$, the CAR(1) process, also known in the astronomical community as a damped random walk (DRW). We construct the autoregressive polynomial as the characteristic polynomial of the autoregressive side of the CARMA process.
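For the $p=1$, $q=0$ case, the DRW can be sampled exactly at arbitrary (even irregular) observation times, because its autocovariance decays exponentially with the damping timescale $\tau$. A minimal sketch with hypothetical names, where $\sigma_x$ is the long-term standard deviation of the process:

```python
import numpy as np

def simulate_drw(times, mu, tau, sigma_x, rng=None):
    """Exactly sample a CAR(1) / damped random walk at the given times.
    mu: mean level, tau: damping timescale, sigma_x: long-term std. dev."""
    rng = np.random.default_rng(rng)
    x = np.empty(len(times))
    x[0] = mu + sigma_x * rng.normal()       # draw from the stationary distribution
    for i in range(1, len(times)):
        a = np.exp(-(times[i] - times[i - 1]) / tau)   # decay over the time gap
        x[i] = mu + a * (x[i - 1] - mu) + sigma_x * np.sqrt(1 - a**2) * rng.normal()
    return x
```

Because each step uses the exact conditional distribution over the elapsed gap, this handles the irregular SDSS cadences without interpolation.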

\section{The Kalman Filter}

AGN light curves only give us a picture of the effects that the underlying physical processes have on the AGN itself; they do not directly tell us what those physical processes are or how they evolve. Furthermore, observational error makes it even more difficult to determine their structure. To understand these processes, we need a method of inferring the state of the system from our observations that can also take our observational error into account. The Kalman filter is a digital filter that attempts to reduce observational error and intrinsic system noise by making predictions about future values of the time series based on an assumed state-space model. Using a Kalman filter together with our CARMA model, we can accurately model these underlying physical processes and begin analyzing the true structure of the driving system. In order to use a Kalman filter, we must relate our observations, $y(t)$, to the state space, $x(t)$:
$$ y(t) = b\, x(t) + \epsilon(t)$$
where $\epsilon(t)$ is an $IID\ N(0, \sigma_{\epsilon}^{2})$ random process representing measurement error. The Kalman filter can then be used to estimate the likelihood of a proposed CARMA model by calculating the residuals of that model against the input time series.

\section{Fitting CARMA Models}

\subsection{CARMA\_Pack}

Computationally analyzing and fitting CARMA models requires a fairly sophisticated package. Though many scripting and data analysis languages, such as R and Matlab, offer libraries for ARMA analysis, few offer tools for the analysis of continuous-time data sets. For our analysis, we use carma\_pack, a Python package with a C++ backend originally written for the purpose of analyzing astronomical time series, specifically AGN.
Carma\_pack has been used successfully to derive the parametric structure of AGN light curves taken by the Kepler satellite, and has been shown to be a useful tool in the analysis of SDSS light curves (Kelly et al. 2014). The package allows us to conveniently fit CARMA processes to light curves, determine the optimal model orders, and perform basic stochastic analysis of our best-fit models. Carma\_pack fits CARMA processes to light curves using the Metropolis-Hastings Markov chain Monte Carlo method (citation needed), with a Kalman filter providing the likelihood estimates for the proposals.

\subsection{Model Orders}

In order to determine the CARMA(p,q) model that best fits a specific light curve, we need to know the best possible values of $p$ and $q$. One approach is to fit CARMA models for every combination of $p$ and $q$ up to some limit and compare the models by some metric. That metric should take into account the complexity of the chosen model, as an overly complex model will fit the noise rather than the underlying process. A commonly used metric for the comparison of time-series models is the Akaike Information Criterion (AIC). This criterion selects the simplest model with the best fit by comparing likelihood estimates for each model while penalizing model complexity. The best model is the one which minimizes the AIC. The AIC for a specific CARMA model is
$$AIC = 2k - 2\ln(L)$$
where $k$ is the number of model parameters, in this case $k = p + q + 1$, and $L$ is the likelihood of the CARMA process produced by the Kalman filter. Since we have a finite number of samples for each light curve, we use a corrected version of the AIC, called the AICc, which takes into account the sample size $n$.
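To make the likelihood machinery concrete, the following minimal sketch (our own illustrative code with hypothetical names, not the carma\_pack API) implements the Kalman filter predict/correct recursion for the simplest CARMA(1,0) (DRW) case on an irregularly sampled light curve with measurement errors, and computes the corrected information criterion from the resulting log-likelihood:

```python
import numpy as np

def drw_loglike(t, y, yerr, mu, tau, sigma_x):
    """Kalman-filter log-likelihood of irregularly sampled data y(t) with
    measurement errors yerr, under a CAR(1)/DRW model with mean mu,
    damping timescale tau, and long-term standard deviation sigma_x."""
    xhat, P = 0.0, sigma_x**2                 # prior state estimate and variance
    loglike, t_prev = 0.0, t[0]
    for ti, yi, ei in zip(t, y, yerr):
        a = np.exp(-(ti - t_prev) / tau)      # decay factor over the time gap
        xhat, P = a * xhat, a**2 * P + sigma_x**2 * (1 - a**2)   # predict
        S = P + ei**2                         # innovation (residual) variance
        r = yi - mu - xhat                    # innovation: data minus prediction
        loglike += -0.5 * (np.log(2 * np.pi * S) + r**2 / S)
        K = P / S                             # Kalman gain
        xhat, P = xhat + K * r, (1 - K) * P   # correct
        t_prev = ti
    return loglike

def aicc(loglike, k, n):
    """Corrected Akaike Information Criterion; k = p + q + 1 for CARMA(p,q)."""
    return 2 * k - 2 * loglike + 2 * k * (k + 1) / (n - k - 1)
```

In an order-selection loop one would evaluate the maximized log-likelihood for each candidate $(p, q)$, set $k = p + q + 1$, and keep the model with the smallest AICc.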
$$AICc = AIC + \frac{2k(k+1)}{n-k-1} $$

\section{Canonical Light Curves}

\subsection{SDSS J134342.5-004243.8 (sample 36 in r-band)}