Loop Functions

  • lapply -> loop over a list and evaluate a function on each element

  • sapply -> same as lapply but tries to simplify the result

  • apply -> apply a function over the margins of an array

  • tapply -> apply a function over subsets of a vector

  • mapply -> multivariate version of lapply

lapply takes three arguments:

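  1. a list X (any other object is coerced to a list with as.list())

  2. a function FUN (or the name of a function)

  3. ... (other arguments passed on to FUN)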

It always returns a list regardless of the primary input.

You can define a function “on the fly” inside the lapply call. It has no name and exists only for that call; this is an anonymous function.
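A minimal sketch of lapply with a named function, an anonymous function, and an extra argument passed through `...`. The list `scores` and the variable names are made up for illustration.

```r
# Hypothetical example data: a named list of numeric vectors
scores <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# Named function: the mean of each element; the result is always a list
means <- lapply(scores, mean)

# Anonymous function defined inline; it is never bound to a name
first_doubled <- lapply(scores, function(v) v[1] * 2)

# Arguments after FUN are passed on to it through ...
trimmed <- lapply(scores, mean, trim = 0.2)

str(means)
```

The names of the input list carry over to the result, so elements can be read back as means$a, means$b and so on.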

sapply is similar to lapply but simplifies the results:

  • If the result is a list where every element has length 1, then a vector is returned

  • If the result is a list where every element is a vector of the same length (>1), a matrix is returned

  • If it can’t figure it out, a list is returned
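The three simplification cases above can be sketched as follows. The data and variable names are illustrative only.

```r
# Hypothetical example data
scores <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# Every result has length 1 -> named numeric vector
as_vector <- sapply(scores, mean)

# Every result has length 2 -> matrix with one column per list element
as_matrix <- sapply(scores, range)

# Results have different lengths -> sapply falls back to a list
as_list <- sapply(scores, function(v) v[v > 2])

print(as_vector)
print(dim(as_matrix))
print(class(as_list))
```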

apply is used to evaluate a function over the margins of an array:

  • Often used to apply a function to the rows or columns of a matrix

  • Can be used with general arrays

  • Not faster than writing a loop, but shorter

apply arguments are:

  1. an array X

  2. an integer vector MARGIN indicating which margins should be retained

  3. a function FUN to apply

  4. ... (other arguments passed on to FUN)
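A short sketch of apply on a small matrix, assuming MARGIN = 1 means "keep the rows" and MARGIN = 2 means "keep the columns". The matrix `m` is made up for illustration.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2, ncol = 3)

# MARGIN = 1 retains the rows: one sum per row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 retains the columns: one mean per column
col_means <- apply(m, 2, mean)

# Extra arguments after FUN are passed on through ...
col_quartiles <- apply(m, 2, quantile, probs = c(0.25, 0.75))

print(row_sums)
print(col_means)
print(dim(col_quartiles))
```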

mapply is a multivariate apply that applies a function in parallel over a set of arguments. Its arguments are:

  1. the function FUN

  2. ... (the arguments to apply over, e.g. vectors or lists)

  3. MoreArgs = NULL, a list of other arguments passed to FUN

  4. SIMPLIFY, which indicates whether the result should be simplified
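A minimal sketch of mapply walking over two vectors in parallel. The helper `make_label` is hypothetical and only serves to show MoreArgs.

```r
# Calls rep(1, 3), rep(2, 2), rep(3, 1); lengths differ, so a list is returned
reps <- mapply(rep, 1:3, 3:1)

# Hypothetical helper used only for this illustration
make_label <- function(name, n, sep) paste(name, n, sep = sep)

# name and n vary element by element; sep is fixed for every call via MoreArgs
labels <- mapply(make_label, c("a", "b"), 1:2, MoreArgs = list(sep = "-"))

# SIMPLIFY = FALSE always keeps the result as a list
labels_list <- mapply(make_label, c("a", "b"), 1:2,
                      MoreArgs = list(sep = "-"), SIMPLIFY = FALSE)

print(reps)
print(labels)
```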

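tapply is used to apply a function over subsets of a vector.

Arguments:

  1. a vector X

  2. a factor or a list of factors INDEX (anything that is not a factor is coerced with as.factor())

  3. a function FUN to apply

  4. ... (other arguments passed on to FUN)

  5. simplify, which indicates whether the result should be simplified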
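A small sketch of tapply computing one summary per group. The vectors `values` and `group` are invented; `group` is a character vector, so it is coerced to a factor.

```r
# Hypothetical data: six measurements in two groups
values <- c(1, 2, 3, 10, 20, 30)
group <- c("low", "low", "low", "high", "high", "high")

# One mean per group, named by factor level
group_means <- tapply(values, group, mean)

# Results of length 2 cannot be collapsed into a plain vector;
# simplify = FALSE keeps them as a list
group_ranges <- tapply(values, group, range, simplify = FALSE)

print(group_means)
print(group_ranges$high)
```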

split takes a vector or other object and splits it into groups determined by a factor or a list of factors. After splitting, you can apply a function to each group with a loop function (often lapply).

Arguments:

  1. a vector, list or data frame x

  2. a factor f (anything else is coerced with as.factor()) or a list of factors

  3. drop = FALSE, which indicates whether empty factor levels should be dropped

You can split on more than one factor by passing a list of factors as the argument f.
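A sketch of the split-then-lapply pattern, including a split on two factors. The data frame `df` and its columns are made up; the combination of sex "m" and site "y" is deliberately absent to show the effect of drop.

```r
# Hypothetical data frame
df <- data.frame(
  value = c(1, 2, 3, 4, 5, 6),
  sex   = c("f", "m", "f", "m", "f", "m"),
  site  = c("x", "x", "y", "x", "x", "x")
)

# Split on one factor, then summarize each group
by_sex <- split(df$value, df$sex)
sex_means <- lapply(by_sex, mean)

# Split on two factors; every combination of levels gets a group,
# including the empty one ("m" at site "y")
by_both <- split(df$value, list(df$sex, df$site))

# drop = TRUE removes combinations that never occur
by_both_dropped <- split(df$value, list(df$sex, df$site), drop = TRUE)

print(sex_means)
print(names(by_both))
print(names(by_both_dropped))
```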

Debugging tools: Diagnosing the problem

How to know that there is a problem?

  1. message - A generic notification produced by the message function – execution of the function continues

  2. warning - An indication that something is wrong but not necessarily fatal produced by the warning function – execution continues

  3. error - An indication that a fatal problem has occurred produced by the stop function – execution stops

All of these are conditions, a generic concept for indicating that something unexpected has occurred. Programmers can also create and signal their own conditions.
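A minimal sketch of the three kinds of condition and how they can be handled. The function `check_value` is hypothetical; tryCatch and withCallingHandlers are one common way to react to conditions, not the only one.

```r
# Hypothetical function that signals all three kinds of condition
check_value <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")        # error: execution stops
  if (x < 0) warning("x is negative; using abs(x)")    # warning: execution continues
  message("computing square root")                     # message: execution continues
  sqrt(abs(x))
}

# An error aborts the call; tryCatch returns the handler's value instead
error_text <- tryCatch(check_value("a"),
                       error = function(e) conditionMessage(e))

# Warnings and messages can be recorded while the function keeps running
seen <- character(0)
value <- withCallingHandlers(
  check_value(-4),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  },
  message = function(m) {
    seen <<- c(seen, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)

print(error_text)
print(value)
print(seen)
```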

Debugging Tools in R:

  1. traceback - prints out the function call stack after an error occurs; does nothing if there is no error

  2. recover - allows you to modify the error behavior so that you can browse the function call stack

  3. debug - flags a function for “debug” mode which allows you to step through execution of a function one line at a time

  4. browser - suspends the execution of a function wherever it is called and puts the function in “debug” mode

  5. trace - allows you to insert debugging code into a function at specific places
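Most of these tools are interactive, so the sketch below only runs the non-interactive parts (flagging and unflagging a function) and leaves the interactive calls as comments. The function `shifted_log` is hypothetical.

```r
# Hypothetical function to inspect
shifted_log <- function(x) {
  y <- log(x)
  y + 1
}

# debug() only sets a flag; the browser opens the next time the function is called
debug(shifted_log)
flagged <- isdebugged(shifted_log)

# undebug() removes the flag again
undebug(shifted_log)
cleared <- !isdebugged(shifted_log)

print(c(flagged = flagged, cleared = cleared))

# Interactive use, shown as comments because these calls wait for keyboard input:
#   shifted_log(-1)            # produces a warning, not an error
#   shifted_log("a")           # produces an error
#   traceback()                # call right after the error to see the call stack
#   options(error = recover)   # from now on, errors open a call-stack browser
#   browser()                  # place inside a function body to pause there
```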

Generating Random Numbers

Functions for probability distributions in R:

  • rnorm -> generate random Normal variates with a given mean and SD

  • dnorm -> evaluate the Normal probability density (with a given mean and SD) at a point (or vector of points)

  • pnorm -> evaluate the cumulative distribution function for a Normal distribution

  • rpois -> generate random Poisson variates with a given rate

set.seed() -> always set the seed before generating random numbers so that the results can be reproduced later

sample() -> draw randomly from a specified set of objects
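A short sketch of the functions above. The seed values and parameters are arbitrary choices for illustration.

```r
# Same seed, same draws: this is what makes a simulation reproducible
set.seed(42)
first_draw <- rnorm(5, mean = 10, sd = 2)
set.seed(42)
second_draw <- rnorm(5, mean = 10, sd = 2)
same_draws <- identical(first_draw, second_draw)

# Density and cumulative probability of the standard Normal at 0
density_at_zero <- dnorm(0)     # equals 1 / sqrt(2 * pi)
prob_below_zero <- pnorm(0)     # half of the distribution lies below the mean

# Ten Poisson counts with rate 3
counts <- rpois(10, lambda = 3)

# sample() draws without replacement by default
picked <- sample(1:10, 4)
shuffled <- sample(letters[1:5])            # a random permutation
with_repeats <- sample(1:3, 10, replace = TRUE)

print(same_draws)
print(picked)
```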

R Profiler

Profiling is a systematic way to examine how much time is spent in different parts of a program (useful for optimization).

system.time() -> takes an R expression as input and returns the amount of time taken to evaluate the expression (if there is an error, it returns the time until the error occurred).

  1. user time: time charged to the CPU(s) for this expression

  2. elapsed time: “wall clock” time

  • Elapsed time > user time -> CPU spends a lot of time waiting around

  • Elapsed time < user time -> machine has multiple cores/processors (and the code is able to use them)
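A minimal sketch of system.time() on a CPU-bound loop and on a call that mostly waits. The exact timings depend on the machine, so none are shown.

```r
# CPU-bound work: user time and elapsed time are usually close
loop_time <- system.time({
  squares <- numeric(1e5)
  for (i in seq_along(squares)) squares[i] <- i^2
})

# Waiting rather than computing: elapsed time is much larger than user time
sleep_time <- system.time(Sys.sleep(0.2))

print(loop_time[["user.self"]])
print(loop_time[["elapsed"]])
print(sleep_time[["elapsed"]])
```

The expression is evaluated in the calling environment, so objects created inside it (here `squares`) remain available afterwards.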

Rprof() -> starts the profiler in R

summaryRprof() -> summarizes the output from Rprof() in a readable form

Rprof() keeps track of the function call stack at regularly sampled intervals and tabulates how much time is spent in each function.

summaryRprof() normalizes the data in two ways:

  1. “by.total” divides the time spent in each function by the total run time

  2. “by.self” does the same but first subtracts out the time spent in the functions it calls (those above it in the call stack)
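A sketch of a full profiling run, assuming an R build with profiling support. The workload `busy_work`, the temporary file and the sampling interval are arbitrary choices for illustration.

```r
# Hypothetical workload: keeps the CPU busy for roughly `seconds` seconds
busy_work <- function(seconds = 0.5) {
  start <- proc.time()[["elapsed"]]
  total <- 0
  while (proc.time()[["elapsed"]] - start < seconds) {
    total <- total + sum(sqrt(seq_len(1e4)))
  }
  total
}

profile_file <- tempfile(fileext = ".out")

Rprof(profile_file, interval = 0.01)   # start sampling the call stack
invisible(busy_work())
Rprof(NULL)                            # stop the profiler

profile <- summaryRprof(profile_file)

print(head(profile$by.total))   # time in each function, including its callees
print(head(profile$by.self))    # time in each function, excluding its callees
print(profile$sampling.time)    # total time covered by the samples
```

The by.self table is usually the more useful one for finding the function where the time is actually spent.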