Public Articles

Statistics 10 Chapter 8 notes

CHAPTER 8: HYPOTHESIS TESTING FOR POPULATION PROPORTIONS

Testing a claim

We calculated a 95% confidence interval for the true population proportion of UCLA students who travelled outside the US: \[(0.26, 0.44)\]

A 95% confidence interval of 26% to 44% means that
- We are 95% confident that the true population proportion of UCLA students who travelled outside the US is between 26% and 44%.
- 95% of random samples of size n = 100 will produce confidence intervals that contain the true population proportion.
- The true population proportion, p, may be outside the interval, but we would expect it to be somewhat close to $\hat{p}$.
- In our random sample of 100 students we found that 35 of them have at some point in their lives travelled outside the US, so $\hat{p} = 0.35$.
- It is difficult to decide how close is close enough, or how far is too far, and this decision should not be made subjectively.

HYPOTHESIS TESTING
- In statistics, when testing claims we use an objective method called hypothesis testing.
- Given a sample proportion, $\hat{p}$, and sample size, n, we can test claims about the population proportion, p.
- We call these claims hypotheses.
- Our starting point, the status quo, is called the null hypothesis, and the alternative claim is called the alternative hypothesis.
- If our null hypothesis is that p = 0.35 and our sample yields $\hat{p} = 0.35$, then the data are consistent with the null hypothesis, and we have no reason not to believe it.
- This doesn't prove the hypothesis, but we can say that the data support it.
- If our null hypothesis were different, say p = 0.30, and our sample yields $\hat{p} = 0.35$, then the data are not consistent with the hypothesis, and we need to decide whether this inconsistency is large enough to stop believing the hypothesis.
- If the inconsistency is significant, we reject the null hypothesis.

Example of hypothesis testing for one proportion

Research conducted a few years ago showed that 35% of UCLA students had travelled outside the US.
UCLA has recently implemented a new study abroad program, and results of a new survey show that out of 100 randomly sampled students, 42 have travelled abroad. Is there significant evidence to suggest that the proportion of students at UCLA who have travelled abroad has increased after the implementation of the study abroad program?
- The population proportion used to be 0.35.
- The new sample proportion is 0.42.
- We are testing the claim that the population proportion is now greater than 0.35.
- But is this difference statistically significant, i.e. are the data inconsistent enough with the old value?
- We do a formal hypothesis test to answer this question.

SETTING UP HYPOTHESES
- The null hypothesis, denoted by H_0, specifies a population model parameter of interest and proposes a value for that parameter (p). \[H_0: p=0.35\]
- The alternative hypothesis, denoted by H_A, is the claim we are testing for. \[H_A: p > 0.35\]
- Even though we are testing for the alternative hypothesis, we check whether or not the null hypothesis is plausible.
- If the null hypothesis is not plausible, we reject it and conclude that there is sufficient evidence to support the alternative. If the null hypothesis is plausible, we fail to reject it and conclude that there isn't sufficient evidence to support the alternative.

Why do we check whether H_0 is plausible and not whether H_A is plausible? The same logic used in jury trials is used in statistical tests of hypotheses:
- We begin by assuming that the null hypothesis is true.
- Next we consider whether the data are consistent with this hypothesis.
- If they are, all we can do is retain the hypothesis we started with. If they are not, then like a jury, we ask whether they are unlikely beyond a reasonable doubt.
- Hypothesis testing is very much like a court trial.
- H_0: Defendant is innocent. H_A: Defendant is guilty.
- We then present the evidence, i.e. collect data.
- Then we judge the evidence: "Could these data plausibly have happened by chance if the null hypothesis were true?"
- If they were very unlikely to have occurred, then the evidence raises more than a reasonable doubt in our minds about the null hypothesis.
- Ultimately we must make a decision. How unlikely is unlikely?

If the evidence is not strong enough to reject the presumption of innocence, the jury returns a verdict of "not guilty".
- The jury does not say that the defendant is innocent.
- All it says is that there is not enough evidence to convict, to reject innocence.
- The defendant may, in fact, be innocent, but the jury has no way to be sure.
- Said statistically, we fail to reject the null hypothesis.
- We never declare the null hypothesis to be true, because we simply do not know whether it's true or not.
- Therefore we never "accept the null hypothesis".

How do we determine whether H_0 is plausible?
- In statistics we can quantify our level of doubt.
- How unlikely is it to get a random sample of 100 students where 42 have travelled abroad if in fact the true population proportion is 35%?
- To answer this question we use the model proposed by the null hypothesis as a given and calculate the probability that the event we have witnessed could happen.
Prob(observed or more extreme outcome | H_0 true) = Prob($\hat{p} \ge 0.42$ | p = 0.35)
- This probability quantifies exactly how surprised we are to see our results and is called the P-VALUE.

Calculating the p-value
- First we calculate the test statistic (a z-score).
- The test statistic used for hypothesis testing for proportions is a z-score: \[z = \frac{\hat{p} - p}{SD}\]
- Remember the CLT: in calculating the z-score we use the mean and SD given by the CLT, and the observed value is $\hat{p}$: \[\hat{p} \approx N\left(\text{mean} = p,\ SD = \sqrt{\frac{p(1-p)}{n}}\right)\] p COMES FROM THE NULL HYPOTHESIS.

Check the conditions for the CLT:
- Random and independent: the sample is collected randomly and the trials are independent of each other.
- Large sample: the sample has at least 10 successes, np ≥ 10, and at least 10 failures, n(1 − p) ≥ 10.
- Large population: if the sample is collected without replacement, then the population size is at least 10 times the sample size.

Example of calculating the p-value for the study abroad data

We have H_0: p = 0.35 and H_A: p > 0.35, so \[z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.42 - 0.35}{\sqrt{\frac{0.35(1-0.35)}{100}}} = 1.47\] and \[\text{p-value} = Prob(\hat{p} > 0.42 \mid p = 0.35) = Prob(z > 1.47) = 1 - 0.9292 = 0.0708\]

Decision based on the p-value
- When the data are consistent with the model from the null hypothesis, the p-value is high and we are unable to reject the null hypothesis.
- In that case, we have to "retain" the null hypothesis.
- We can't claim to have proved it; instead we fail to reject the null hypothesis and conclude that the difference we observed between the null hypothesis (p = 0.35) and the observed outcome ($\hat{p} = 0.42$) is due to natural sampling variability (or chance).
- If the p-value is low enough, we reject the null hypothesis, since what we observed would be very unlikely if in fact the null model were true.
- We call such results statistically significant.

How low a p-value is low enough?
- We compare the p-value to a given significance level α:
  - If p-value < α → reject H_0. There is sufficient evidence to suggest that H_A is plausible.
  - If p-value > α → fail to reject H_0. There is not sufficient evidence to suggest that H_A is plausible. The difference we are seeing between the null model and the observed outcome is due to natural sampling variability.
- When the p-value is low, it indicates that obtaining the observed $\hat{p}$ or an even more extreme outcome is highly unlikely under the assumption that H_0 is true, so we reject that assumption.

The α level is the complement of the confidence level. Remember: WHEN CONSTRUCTING CONFIDENCE INTERVALS, IF A CONFIDENCE LEVEL IS NOT SPECIFIED, USE 95 PERCENT CONFIDENCE.
- Since 1 − 0.95 = 0.05, if an α level is not specified, use α = 0.05.

Decision based on the p-value: with a p-value of 0.0708, do we reject or fail to reject H_0? Since the p-value is greater than α, we fail to reject H_0.

What does the conclusion of the hypothesis test mean in the context of the research question? Answer: The data do not provide convincing evidence to suggest that the true population proportion of UCLA students who have travelled outside the US has increased.

Interpreting the p-value
- A p-value is a conditional probability: the probability of the observed or a more extreme outcome given that the null hypothesis is true. Prob(observed or more extreme outcome | H_0 true)
- The p-value is NOT the probability that the null hypothesis is true.
- It is not even the conditional probability that the null hypothesis is true given the data.

The correct interpretation of the p-value: if in fact the true population proportion of UCLA students who have travelled outside the US is 0.35, the probability of getting a random sample where the sample proportion is 0.42 or higher would be 0.0708.
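The whole test for the study abroad example can be sketched in Python (a minimal illustration using only the standard library; the function name is ours, not from the notes):

```python
from math import sqrt, erfc

def one_prop_ztest(phat, p0, n):
    """One-sided (greater-than) z-test for a population proportion.
    The null value p0 is used in the standard deviation, per the CLT."""
    sd = sqrt(p0 * (1 - p0) / n)        # SD under H0
    z = (phat - p0) / sd                # test statistic
    p_value = 0.5 * erfc(z / sqrt(2))   # Prob(Z > z) for a standard normal
    return z, p_value

z, p = one_prop_ztest(0.42, 0.35, 100)
print(round(z, 2), round(p, 3))  # 1.47 0.071
alpha = 0.05
print("reject H0" if p < alpha else "fail to reject H0")  # fail to reject H0
```

The exact p-value (0.0711) differs slightly from the notes' 0.0708 only because the notes round z to 1.47 before using the normal table.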
- When testing for population proportions, there are three possible alternative hypotheses:
  - H_A: p < p_0
  - H_A: p > p_0
  - H_A: p ≠ p_0
- We decide which alternative hypothesis to use based on what we hypothesize (or what the wording of the question instructs us to hypothesize):
  - Smaller, less, decreased, fewer → H_A: p < p_0
  - Larger, greater, more, increased → H_A: p > p_0
  - Different, not equal to, changed → H_A: p ≠ p_0
- We DO NOT decide which alternative hypothesis to use based on what the data suggest.
- H_A: parameter ≠ hypothesized value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value.
- For two-sided alternatives, the p-value is the probability of deviating in either direction from the null hypothesis value.
- The other two alternative hypotheses are called one-sided alternatives.
- A one-sided alternative focuses on deviations from the null hypothesis value in only one direction.
- Thus, the p-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.

THERE'S MORE TO DO HERE, BUT FOR NOW SKIP TO CHAPTER 9.
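The relationship between one- and two-sided p-values can be sketched as follows (a standard-library sketch; the z-score 1.47 is the study abroad example's):

```python
from math import sqrt, erfc

def normal_sf(z):
    """Prob(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

z = 1.47
p_greater = normal_sf(z)             # for H_A: p > p0
p_less = 1 - p_greater               # for H_A: p < p0
p_two_sided = 2 * normal_sf(abs(z))  # for H_A: p != p0, deviations in either direction
print(round(p_greater, 4), round(p_two_sided, 4))  # 0.0708 0.1416
```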

Autonomous and Scalable Computation of Spatial Correlation Function from Spatiotemporal Materials Data

and 2 collaborators

Materials science is experiencing pressure to deliver new and improved materials to the marketplace in half the time and at half the cost.[ref2] To reach this goal, a hurdle that must be overcome is the increasing dissonance between the field's strength in generating massive stores of empirical and simulated physical information and its weakness in utilizing those data at scale. Materials science is challenged by the three V's of Big Data: the Volumes of unstructured datasets are reaching terabyte sizes, high-performance computing is increasing the Velocity at which information is generated, and a wide Variety of information is being produced across the field by both experiment and simulation. The current subjective, undocumented, and ad hoc data analysis tools cannot be scaled to suit these needs. New data science techniques will need to be explored to offer objective statistical representations of materials science information.[ref]

Microstructure Informatics is an emerging framework that provides a growing suite of data science tools to extract bi-directional structure-property/processing relationships for most classes of materials science information. It uses digital signal processing and advanced statistics to encode material information into forms useful for machine learning, together with data structures and algorithms that extract added value from the information. In microstructure informatics, the material structure, or microstructure, is treated as the independent variable and the property or processing response as the dependent variable. Spatial statistics (e.g. pair correlations, n-point statistics) are commonly used statistical tools because they provide an objective statistical description of the material structure. Case studies have shown the effectiveness of spatial statistics in (1) making objective microstructure comparisons in heat-treated α-β titanium [ref], (2) building regression models for the homogenized structure-property connections between the internal structure of fuel cell materials and their diffusivity [ref], and (3) determining the variance of properties associated with individual microstructures [ref].

This paper discusses fast algorithms to encode material structure information using parameterized basis functions and spatial correlation functions. The concept of the microstructure function is employed to parameterize the material information and produce a digital microstructure signal; this encoding can be performed on most classes of experimental and simulated material structure information.[ref] The digital signals are convolved using embarrassingly parallel Fast Fourier Transform methods to compute the spatial statistics of the digitized microstructure. The following properties of the spatial correlation functions make them a worthy candidate for an objective material structure descriptor: several widely used statistical metrics, such as volume fraction [ref] and specific surface area [ref], are embedded in the correlations; for binary images, they contain the original material structure information up to a translation [ref]; and they can describe most types of materials science information if the raw signal is processed appropriately. N case studies will be presented to illustrate the generality of the technique across different classes of material structure data.

This paper begins with a discussion of classifications of materials science information and their proposed conversion to a digital signal using the microstructure function. Once the raw material structure information is digitized, it forms the foundation that allows spatial statistics to be computed using fast, scalable FFT algorithms. N case studies will be shown to demonstrate the diversity and general applicability of the spatial correlation functions in structure-structure comparison.
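The FFT route to spatial statistics can be sketched as follows (a minimal illustration, not the paper's implementation; the 64×64 random two-phase structure is a stand-in for real microstructure data):

```python
import numpy as np

def two_point_autocorr(ms):
    """Periodic two-point autocorrelation of a digitized (binary) microstructure,
    computed by convolving the signal with itself via the FFT in O(N log N)."""
    F = np.fft.fftn(ms)
    return np.fft.ifftn(F * np.conj(F)).real / ms.size

rng = np.random.default_rng(0)
ms = (rng.random((64, 64)) < 0.3).astype(float)  # indicator signal for one phase
S2 = two_point_autocorr(ms)
# The zero-vector statistic recovers the phase volume fraction,
# one of the embedded metrics mentioned above:
print(np.isclose(S2[0, 0], ms.mean()))  # True
```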

VLA FRB May 2014 Report

We report on the state of observing for the VLA FRB project, also known as 13B-409 and 14A-425. We detected no FRBs in the first 76 hours observed under an approved DDT proposal (13B-409). Comparing our rate constraint to published rates revealed inconsistencies in the published rates that led to an overestimate of the chance of a VLA detection. We discuss an improved rate estimate and find that completing the 147-hour campaign under program 14A-425 will give us a 50% chance of detecting an FRB. The completed VLA observations are the first interferometric search for FRBs and have already highlighted the value of using an interferometer to define a robust FRB rate limit. Ensuring that 14A-425 is observed to completion will significantly improve our chances of making the first interferometric detection of an FRB, an extremely exciting scientific result.
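The quoted 50% chance follows from Poisson counting statistics; a sketch (the event rate below is back-solved from the 50% figure purely for illustration, not the report's actual rate model):

```python
from math import exp, log

def detection_prob(rate_per_hour, hours):
    """Probability of at least one detection for a Poisson process:
    P(>=1) = 1 - exp(-rate * T)."""
    return 1 - exp(-rate_per_hour * hours)

# A 50% chance over the full 147-hour campaign implies a rate of:
rate = log(2) / 147  # detectable events per hour (illustrative)
print(round(detection_prob(rate, 147), 2))  # 0.5
print(round(detection_prob(rate, 76), 2))   # chance over a 76-hour block at that rate
```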

Observations took place between late September 2013 and mid January 2014 (see Table [fields]). The array was in CnB configuration for the first 10 hours observed, was being reconfigured during the next 10 hours of observing, and was in B configuration for the final 56 hours observed. We observed for a total of 76 hours and were on our target fields for 63.3 hours for an observing efficiency of 83%.

Name      | RA (J2000)  | Dec (J2000)  | Time (hrs)
RA02      | 2:27:52.7   | +9:13:24.3   | 2
CDF-South | 3:32:28.0   | –27:48:30.0  | 4
RA05      | 5:04:37.0   | –30:50:0.1   | 16
COSMOS    | 10:00:28.6  | +2:12:21.0   | 10
RA12      | 12:00:7.2   | +5:53:12.0   | 4
FRB120127 | 23:15:00    | –18:25:00    | 40

\label{fields}

All 63.3 hours of time on extragalactic fields have been searched for transients with dispersion measures from 0 to 3000 pc cm^{−3} at a timescale of 5 ms. Figure [snrhist] shows the typical SNR histogram of candidates greater than 6.5σ, which are saved for analysis. Nearly all candidates are consistent with thermal noise. Eight candidates deviated slightly from the thermal noise distribution and were inspected in detail. All of these candidates were found to be affected by RFI or were highly sensitive to flagging or imaging parameters.

Our analysis shows that we can exclude the presence of astrophysical transients on timescales of 5 milliseconds and below. We measured data quality at regular intervals throughout the search and found that roughly 1% of images had noise that was more than twice the median image noise. Our flux-calibrated observations have a median image noise of 12–14 mJy, as expected for 5-ms, L-band images made with data from 26 good antennas and 230 MHz of bandwidth. To include variance in the noise measurements, we define a 96% completeness for a 1σ image sensitivity of 15 mJy, or an 8σ flux limit of 120 mJy. Observations of pulsar B0355+54 at a range of offset positions show that imaging sensitivity scales as expected for the VLA primary beam gain pattern. This end-to-end test also confirms that our transient search pipeline works as expected.

Figure [rate_pub] summarizes the published FRB event rates and the VLA rate limit to FRBs shorter than our integration time of 5 ms. In constructing this figure, we discovered that the sensitivity of published surveys are defined inconsistently. \citet{2014arXiv1404.2934S} calculate the mean beam gain within the FWHM. Burke-Spolaor & Bannister (2014, submitted; hereafter “BSB14”) use half the main beam gain. \citet{2007Sci...318..777L} use the measured fluence of their detection to define a fluence limit. Finally, \citet{2013Sci...341...53T} don’t report a fluence limit at all, but instead measure the mean fluence of all detections. For this figure, we use the mean primary beam gain, as in \citet{2014arXiv1404.2934S}, although this clearly overestimates the sensitivity at the half-power point, as demonstrated in our pulsar tests.

BlowJob: Blowable Interaction without Hands for Smartwatches

and 2 collaborators

ABSTRACT A central problem in convex algebra is the extension of left-smooth functions. Let $$ be a combinatorially right-multiplicative, ordered, standard function. We show that ℓI, Λ ∋ 𝒴U, 𝔳 and that there exists a Taylor and positive definite sub-algebraically projective triangle. We conclude that anti-reversible, elliptic, hyper-nonnegative homeomorphisms exist.

Validation of methods for Low-volume RNA-seq

and 1 collaborator

Recently, a number of protocols extending RNA-sequencing to the single-cell regime have been published. However, we were concerned that the additional steps to deal with such minute quantities of input sample would introduce serious biases that would make analysis of the data using existing approaches invalid. In this study, we performed a critical evaluation of several of these low-volume RNA-seq protocols, and found that they performed slightly less well in metrics of interest to us than a more standard protocol, but with at least two orders of magnitude less sample required. We also explored a simple modification to one of these protocols that, for many samples, reduced the cost of library preparation to approximately $20/sample.

**A575 Final Project: The occurrence rate of planets in the habitable zone based on data from the NASA Kepler mission**

and 7 collaborators

INTRODUCTION Petigura et al. (2013) calculated the frequency of Earth-like planets ($R=1-2\Rearth$) in the habitable zone (HZ) of their host star to be 22 ± 8%, with their HZ defined to be $0.25-4.0\Fe$, where $\Fe$ is the incident flux Earth receives from the Sun. The authors created an independent transit search algorithm and fitting routine called TERRA and applied it to the _Kepler_ light curves of the “Best42k”, the brightest (Kp = 10 − 15) 42,557 Sun-like stars (Teff = 4100 − 6100K, logg = 4.0 − 4.9) exhibiting the lowest photometric noise, selecting those stars on the main sequence or just starting the subgiant phase, according to the initial stellar parameters. After false-positive checks and other vetting procedures, 603 “eKOIs” remained, in analogy to the _Kepler_ Objects of Interest (KOIs). The key method that allows Petigura et al. (2013) to calculate the occurrence rate of Earth-like planets in the HZ is a thorough analysis of their completeness. They measured completeness by injecting 40,000 planets at different periods and planet radii into the actual _Kepler_ light curves of the Best42k stars. They then applied TERRA to check how many of the injected transits they could recover, and corrected for transiting planets missed by TERRA by dividing the number of planets detected by the completeness value. One can then correct this number for the geometric probability that a planet would transit: PT = R*/a, where R* is the stellar radius and a is the semi-major axis of the planet. They use these corrections to calculate the frequency of planets in the inner HZ, which they define as $1-4\Fe$, where they claim a high enough completeness to get an accurate measurement of the occurrence rate of Earth-like planets. Their estimate of the occurrence rate for Earth-like planets in the inner HZ is 11 ± 4%. For the outer HZ, which they define to be $0.25-1.00\Fe$, they must rely on an extrapolation. Petigura et al.
(2013) find a roughly constant occurrence rate in equally binned log(P), which translates into a roughly constant occurrence rate in equally binned log(F). Therefore, since $1-4\Fe$ and $0.25-1\Fe$ are equally spaced in log(F) (a factor of 4), the occurrence rate in the outer HZ is also 11 ± 4%. That makes the overall occurrence rate in the HZ 22 ± 8%. We aim to use the data available in Petigura et al. (2013) to redo the analysis as best we can by attempting to calculate the occurrence rate ourselves with different assumptions. For example, a HZ extending to $4\Fe$ is almost certainly too hot to be habitable for most terrestrial worlds, although particularly dry planets may be an exception. Another problem is that the completeness drops drastically at the exact point where the inner HZ begins. As such, Petigura et al. (2013) had to extrapolate to achieve an occurrence rate for their outer HZ, $0.25-1\Fe$, which is actually where the bulk of the HZ is according to many HZ calculations, such as Kopparapu et al. (2013). We would like to explore different definitions of what is habitable, what is Earth-like, and the methods to calculate the occurrence rate in the full HZ.
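The completeness and geometric corrections described above can be sketched as follows (all numbers are illustrative placeholders, not values from Petigura et al. 2013):

```python
def corrected_occurrence(n_detected, completeness, pt, n_stars):
    """Correct a raw planet count for pipeline completeness and for the
    geometric transit probability P_T = R*/a, then normalize per star."""
    n_after_completeness = n_detected / completeness  # planets missed by the pipeline
    n_true = n_after_completeness / pt                # non-transiting geometries
    return n_true / n_stars

# Illustrative only: 10 detections, 80% completeness, P_T = 0.005, 40,000 stars
rate = corrected_occurrence(10, 0.8, 0.005, 40000)
print(round(rate, 4))  # 0.0625, i.e. 6.25% of stars would host such a planet
```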

A Novel Machine Learning Based Approach for Retrieving Information from Receipt Images

In this paper we approach, from a machine learning perspective, the problem of performing optical character recognition on receipt images and then extracting structured information from the resulting text. Tools that have not been trained specifically for this kind of image usually do not handle it well, because receipts use custom fonts and, due to size constraints, many letters are close to each other. We adapt existing OCR methods in order to achieve better performance than off-the-shelf commercial OCR engines and to extract the most accurate information from receipts. Document layout analysis is performed on the receipts, then lines are segmented into characters using Random Forests, and finally the characters are classified using linear Support Vector Machines. We provide an experimental evaluation of the proposed approach, as well as an analysis of the obtained results.
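The shape of the two-stage pipeline (Random Forest segmentation, linear SVM classification) can be sketched with scikit-learn; the feature vectors and labels below are synthetic stand-ins, not the paper's data or features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stage 1 (sketch): a Random Forest decides, per candidate cut point in a
# text line, whether it is a character boundary -- here on synthetic features.
X_seg = rng.random((200, 8))
y_seg = (X_seg[:, 0] > 0.5).astype(int)  # toy "boundary / not boundary" labels
segmenter = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_seg, y_seg)

# Stage 2 (sketch): a linear SVM classifies the segmented glyphs, represented
# here by synthetic 64-dimensional feature vectors with 10 toy classes.
X_chr = rng.random((300, 64))
y_chr = (X_chr[:, 0] * 10).astype(int)   # toy character classes, 0..9
classifier = LinearSVC(max_iter=5000).fit(X_chr, y_chr)

print(segmenter.predict(X_seg[:5]), classifier.predict(X_chr[:5]))
```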

Soil Moisture Predictions Using Mixed Effects Models and Kriging

INTRODUCTION

Soil moisture patterns are important for predicting the water quality of the runoff generated. Much attention in hydrology is paid to soil moisture as a risk indicator of surface runoff generation. Surface runoff quickly transports water to streams, causing pulses of high flow that increase flooding potential, impact wildlife habitat, and transport sediment and dissolved contaminants from the land surface. The ability to predict where and when runoff is likely to be generated guides the planning and implementation of management practices that reduce these negative impacts. Traditionally, the temporal and spatial variability of soil moisture patterns is identified through two approaches. Complex, distributed hydrologic models such as the Soil Moisture Routing model predict patterns fairly well, but require climate and landscape data at fine resolutions to be reliable. In contrast, indices that draw on watershed-level hydrologic drivers such as terrain and soil properties can provide fast, simple estimates of soil moisture patterns. These topographic indices can be used to quickly estimate patterns in regions where soil moisture spatial variability is driven by topographic changes and shallow soil depths. In the Northeastern United States, topographic indices have been shown to work well for predicting soil moisture patterns. Soil moisture measurements can be collected rapidly and easily in the field; however, due to significant spatial and temporal variability, achieving full coverage of a watershed is a time-intensive endeavor. Remote sensing techniques are in development, but the resolution of their predictions is too coarse for meaningful guidance in management practices. In this project we are interested in using geostatistics and the topographic index models to extrapolate soil moisture measurements for comparison with patterns predicted by a more complicated, uncalibrated hydrologic model.
To do this, we use a geostatistical tool for characterizing spatial patterns: the semivariogram model. Semivariograms quantify the change in variance between two points in a field as a function of their separation distance. The sill is the variance reached at very large distances (large being relative to the data). The distance at which the semivariogram levels off at the sill is the range; beyond this distance, spatial correlation no longer exists. The variance in repeated measurements at the same location is given by the nugget of the semivariogram. Kriging is a geostatistical technique that uses the semivariogram to interpolate data where they do not exist. Kriging with additional information, such as the output of the topographic index models, can be used to inform the spatial pattern. This is particularly useful where linear distance is not the only important correlation structure for a spatial pattern. In the case of soil moisture, a location in a stream may be continually saturated to a similar degree as a location far downstream that also lies along a high-accumulating flowpath, while a location much closer in spatial distance to the saturated area may be significantly drier, perhaps because of a steep slope that transmits water away quickly. For this reason, straight kriging of the soil moisture data alone will give limited spatial results: in some variograms the range will occur at very close distances, or spatial correlation will not be apparent at all. In this project we investigate the locations and dates where good spatial correlation is evident and use topographic index information to improve the kriging predictions.
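The empirical semivariogram described above can be sketched as follows (a minimal illustration on a synthetic 1-D transect; the field, noise level, and bin edges are all assumptions for the example, not project data):

```python
import numpy as np

def empirical_semivariogram(coords, values, bins):
    """gamma(h) = 0.5 * mean[(z_i - z_j)^2] over point pairs binned by
    separation distance h (the classical Matheron estimator)."""
    i, j = np.triu_indices(len(values), k=1)
    h = np.abs(coords[i] - coords[j])            # pairwise separations
    sq = 0.5 * (values[i] - values[j]) ** 2      # semivariance of each pair
    which = np.digitize(h, bins)
    return np.array([sq[which == b].mean() for b in range(1, len(bins))])

rng = np.random.default_rng(1)
x = np.linspace(0, 100, 200)
z = np.sin(x / 15) + 0.1 * rng.standard_normal(200)  # spatially correlated field + noise
gamma = empirical_semivariogram(x, z, np.array([0.0, 5.0, 10.0, 20.0, 40.0]))
# Semivariance grows with separation until the range is reached; the small-lag
# value reflects the nugget (here, the measurement noise).
print(gamma[0] < gamma[-1])  # True
```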

Thermohaline mixing in evolved low-mass stars

Thermohaline mixing has recently been proposed to occur in low-mass red giants, with large consequences for the chemical yields of low-mass stars. We investigate the role of thermohaline mixing during the evolution of stars between 1$\mso$ and 3$\mso$, in comparison with other mixing processes acting in these stars. We use a stellar evolution code which includes rotational mixing, internal magnetic fields and thermohaline mixing. We confirm that during the red giant stage, thermohaline mixing has the potential to decrease the abundance of ³He, which is produced earlier on the main sequence. In our models we find that this process operates on the RGB only in stars with initial mass $\le 1.5\mso$. Moreover, we report that thermohaline mixing is also present during core He-burning and beyond, and has the potential to change the surface abundances of AGB stars. While we find rotational and magnetic mixing to be negligible compared to thermohaline mixing in the relevant layers, the interaction of thermohaline motions with the differential rotation may be essential to establish the timescale of thermohaline mixing in red giants. To explain the surface abundances observed at the bump in the luminosity function, the speed of the mixing process needs to be more than two orders of magnitude higher than in our models. However, it is not clear whether thermohaline mixing is the only physical process responsible for these surface-abundance anomalies. Therefore it is not possible at this stage to calibrate the efficiency of thermohaline mixing against the observations.

Splitting of the Number Line

and 2 collaborators

We introduce the novel concept of a split number line, in which the positive part is as usual and the negative part undergoes an either symmetric or asymmetric bifurcation at zero. An algebra is formed between the two kinds of negative components and the positive components in the form of a stable triplex. The implications of this create a further abstraction on the concept of rings over number fields, so as to narrow the application of the Frobenius Theorem.

According to [Source: Ben], a stable simplex on a number field cannot be created, an implication of the Frobenius Theorem, which requires any real division algebra to be isomorphic to either ℝ, ℂ, or ℍ, where the set of quaternions ℍ is the only non-Abelian algebra. We exclude hypercomplex, and therefore non-associative, algebras, as we restrict ourselves to real division algebras only.

We form an abstraction on the signality of a number *n* ∈ *N*, where *N* is a ring over real numbers.

Instead of the binary +, − number prefixes indicating greater or less than zero, we allow +, −, |, where + indicates a number greater than zero and − and | both indicate a number less than zero. There is a component form for these negative numbers; the “full” value is defined to be \begin{equation} -n \to a\hat{e_{-}} + b\hat{e_{|}} \end{equation}

with a, b positive constants, and $\hat{e_{-}}, \hat{e_{|}}$ basis elements that transform according to some algebraic rule along with the basis element $\hat{e_{+}}$.

Algorithms - Lab 4, Problem 1

and 1 collaborator

INTRODUCTION This is a hand-in written by GROUP 7 (FIRE) for the Algorithms course (TIN092) at Chalmers University of Technology. Group 7 consists of:
- MAZDAK FARROKHZAD - 901011-0279 - twingoow@gmail.com - Program: IT
- NICLAS ALEXANDERSSON - 920203-0111 - nicale@student.chalmers.se - Program: IT

Deep learning

In the last couple of years a new approach has been used with great success, called deep learning. Deep learning usually consists of neural networks with at least 3 layers, which are used to build hierarchical representations of data. Some deep learning methods are unsupervised and are used to learn better features that are then used to solve a machine learning problem. Other deep learning methods are supervised and can be used directly to solve classification problems. Deep learning systems have been used to obtain the state of the art in different problems, such as object recognition in images or speech recognition, tested on standard datasets such as MNIST, CIFAR-10, Pascal VOC and TIMIT.