Loop Functions

lapply -> loop over a list and evaluate a function on each element
sapply -> same as lapply but tries to simplify the result
apply -> apply a function over the margins of an array
tapply -> apply a function over subsets of a vector
mapply -> multivariate version of lapply

lapply takes three arguments:

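a list X (if X is not a list, it is coerced to one with as.list())
a function FUN (or the name of a function)
... (other arguments passed on to FUN)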

It always returns a list regardless of the primary input.

You can define a function “on the fly” in the call to lapply. It has no name and exists only for that call; this is called an anonymous function.
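A minimal sketch of lapply with a named function and with an anonymous function. The list `x` and the multiplier `k` are made-up values for illustration only.

```r
# A small list of numeric vectors (illustrative data).
x <- list(a = 1:5, b = c(2, 4, 6))

# Named function: the result is a list with one element per input element.
means <- lapply(x, mean)

# Anonymous function defined inline; the extra argument k is passed via `...`.
scaled <- lapply(x, function(v, k) v * k, k = 10)

print(means)
print(scaled)
```

Both results are lists and keep the names of `x`, even though each mean is a single number.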

sapply is similar to lapply but simplifies the results:

If the result is a list where every element has length 1, a vector is returned
If the result is a list where every element is a vector of the same length (>1), a matrix is returned
If it can’t figure it out, a list is returned
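The three simplification cases can be sketched as follows. The input lists are invented for the example.

```r
x <- list(a = 1:4, b = 5:8)

# Every result has length 1 -> a named numeric vector.
v <- sapply(x, mean)

# Every result has length 2 -> a matrix with one column per list element.
m <- sapply(x, range)

# Results have different lengths -> sapply falls back to a list.
l <- sapply(list(1:2, 1:3), function(v) v * 2)

print(v)
print(m)
print(l)
```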

apply is used to evaluate a function over the margins of an array:

Often used to apply a function to the rows or columns of a matrix
Can be used with general arrays
Not faster than a loop, but it is shorter to write

apply arguments are:

an array X
an integer vector MARGIN indicating which margins should be retained
the function FUN
... (other arguments passed on to FUN)
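A short sketch of apply on a matrix and on a 3-dimensional array. The data are made up; MARGIN = 1 keeps the rows and MARGIN = 2 keeps the columns.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# Keep the rows (margin 1), collapse the columns.
row_sums <- apply(m, 1, sum)

# Keep the columns (margin 2), collapse the rows.
col_means <- apply(m, 2, mean)

# Extra arguments after FUN are passed on through `...`.
row_quartiles <- apply(m, 1, quantile, probs = c(0.25, 0.75))

# General array: keep margins 1 and 2, average over the third dimension.
a <- array(1:24, dim = c(2, 3, 4))
slab_means <- apply(a, c(1, 2), mean)

print(row_sums)
print(col_means)
print(slab_means)
```

For plain row and column sums or means, rowSums, colSums, rowMeans and colMeans do the same job and are usually faster.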

mapply is a multivariate apply that applies a function in parallel over a set of arguments.

Arguments:

the function FUN
... (the arguments to apply over)
MoreArgs = NULL, a list of other arguments for the function
SIMPLIFY indicates whether the result should be simplified
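A minimal sketch of mapply walking over several arguments in parallel. The inputs and the anonymous functions are illustrative assumptions.

```r
# Element i of each argument is passed to FUN together:
# rep(1, 3), rep(2, 2), rep(3, 1). Lengths differ, so a list is returned.
reps <- mapply(rep, 1:3, 3:1)

# Results of length 1 are simplified to a vector.
sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# MoreArgs holds arguments that are the same for every call.
squares <- mapply(function(base, p) base^p, 2:4, MoreArgs = list(p = 2))

# SIMPLIFY = FALSE always returns a list.
sums_list <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)

print(reps)
print(sums)
print(squares)
```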

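tapply is used to apply a function over subsets of a vector.

Arguments:

a vector X
a factor or a list of factors INDEX (if they are not factors, they are coerced with as.factor())
the function FUN
... (other arguments passed on to FUN)
simplify indicates whether the result should be simplified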
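A small sketch of tapply computing a statistic per group. The vector `x` and the grouping variable `g` are invented; `g` is a character vector, so it is coerced to a factor.

```r
x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")

# One mean per group, simplified to a named array.
group_means <- tapply(x, g, mean)

# range() returns two values per group, so the result holds a list per group.
group_ranges <- tapply(x, g, range)

# simplify = FALSE keeps a list even for scalar results.
group_means_list <- tapply(x, g, mean, simplify = FALSE)

print(group_means)
print(group_ranges)
```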

split takes a vector or other object and splits it into groups determined by a factor or a list of factors. After splitting, you can apply a loop function (often lapply) to the resulting groups.

Arguments:

a vector, list or data frame x
a factor f (anything else is coerced with as.factor()), or a list of factors
drop = FALSE, indicates whether empty factor levels should be dropped

You can split on more than one factor by passing a list of factors as the argument f; the groups are then the combinations of their levels.
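A sketch of split followed by a loop function, and of splitting on two factors. All data and factor levels here are made up for illustration.

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- factor(c("a", "a", "b", "b", "c", "c"))

# Split into a list of groups, then loop over the groups.
groups <- split(x, f)
group_means <- sapply(groups, mean)

# Two factors: groups are the combinations of levels, named like "m.p".
f1 <- factor(c("m", "m", "m", "n", "n", "n"))
f2 <- factor(c("p", "p", "q", "q", "q", "q"))
by_both <- split(x, list(f1, f2))

# The combination n.p never occurs; drop = TRUE removes the empty group.
by_both_dropped <- split(x, list(f1, f2), drop = TRUE)

print(group_means)
print(names(by_both))
print(names(by_both_dropped))
```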

Debugging tools: Diagnosing the problem

How to know that there is a problem?

message - A generic notification produced by the message function; execution of the function continues
warning - An indication that something is wrong but not necessarily fatal, produced by the warning function; execution continues
error - An indication that a fatal problem has occurred, produced by the stop function; execution stops

All of these are conditions, a generic concept for indicating that something unexpected has occurred. Programmers can also create their own conditions.
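The three kinds of condition can be sketched with a small function. `check_value` and `describe` are hypothetical names invented for this example, and tryCatch (base R, not covered in these notes) is used only to show which condition was raised.

```r
# Hypothetical helper that can raise each kind of condition.
check_value <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")   # error: execution stops
  if (x < 0) warning("x is negative")             # warning: execution continues
  message("checked x")                            # message: execution continues
  invisible(x)
}

# Hypothetical wrapper that reports warnings and errors as strings.
describe <- function(x) {
  tryCatch(
    {
      check_value(x)
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

print(describe(1))
print(describe(-1))
print(describe("a"))
```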

Debugging Tools in R:

traceback - prints out the function call stack after an error occurs; does nothing if there is no error
recover - allows you to modify the error behavior so that you can browse the function call stack
debug - flags a function for “debug” mode, which allows you to step through execution of the function one line at a time
browser - suspends the execution of a function wherever it is called and puts the function in “debug” mode
trace - allows you to insert debugging code into a function at specific places
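Most of these tools are interactive, so the sketch below only runs the flagging step and leaves the interactive calls as comments. The function `f` is a hypothetical example.

```r
# Hypothetical function to debug.
f <- function(x) {
  y <- x + 1
  y * 2
}

# Flag f for debug mode; the next call to f would open the line-by-line browser.
debug(f)
flagged <- isdebugged(f)

# Remove the flag so f runs normally again.
undebug(f)
unflagged <- isdebugged(f)

print(c(flagged, unflagged))
print(f(1))

# Interactive use, left commented out so the script runs unattended:
# traceback()               # call right after an error to see the call stack
# options(error = recover)  # browse the call stack whenever an error occurs
# browser()                 # place inside a function body to pause there
# trace(f, browser)         # insert a browser() call at the start of f
```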

Generating Random Numbers

rnorm -> generate random Normal variates with a given mean and SD
dnorm -> evaluate the Normal probability density (with a given mean and SD) at a point (or vector of points)
pnorm -> evaluate the cumulative distribution function for a Normal distribution
rpois -> generate random Poisson variates with a given rate

set.seed() -> always set the seed before generating random numbers so the results can be reproduced later

sample() -> draw randomly from a specified set of objects
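A brief sketch of these functions together. The seed value and the distribution parameters are arbitrary choices for illustration.

```r
# Setting the same seed twice reproduces the same draws.
set.seed(42)
a <- rnorm(5, mean = 10, sd = 2)
set.seed(42)
b <- rnorm(5, mean = 10, sd = 2)
same <- identical(a, b)

# Density and cumulative probability of the standard Normal at 0.
d <- dnorm(0)
p <- pnorm(0)

# Poisson counts with rate 3.
counts <- rpois(5, lambda = 3)

# sample(): draw without replacement, permute, or draw with replacement.
draw <- sample(1:10, 4)
perm <- sample(1:10)
flips <- sample(c("H", "T"), 6, replace = TRUE)

print(same)
print(c(d, p))
print(counts)
```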

R Profiler

Profiling is a systematic way to examine how much time is spent in different parts of a program (useful for optimization).

system.time() -> takes an R expression as input and returns the amount of time taken to evaluate the expression (if there is an error, it returns the time until the error occurred).

user time: time charged to the CPU(s) for this expression
elapsed time: “wall clock” time

Elapsed time > user time -> the CPU spends a lot of time waiting around
Elapsed time < user time -> the machine has multiple cores/processors (and is able to use them)
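A minimal sketch of system.time on a CPU-bound loop and on a call that mostly waits. The workloads are arbitrary and the actual timings depend on the machine.

```r
# CPU-bound work: user time and elapsed time should be close.
timing <- system.time({
  total <- 0
  for (i in 1:1e5) total <- total + sqrt(i)
})

# Waiting, not computing: elapsed time is much larger than user time.
sleepy <- system.time(Sys.sleep(0.2))

print(timing)
print(sleepy)

# Individual components can be extracted by name.
user_time <- timing[["user.self"]]
elapsed_time <- timing[["elapsed"]]
```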

Rprof() -> start the profiler in R
summaryRprof() -> summarizes the output from Rprof() into a readable form

Rprof() keeps track of the function call stack at regularly sampled intervals and tabulates how much time is spent in each function

summaryRprof() has two methods for normalizing the data:

“by.total” divides the time spent in each function by the total run time
“by.self” does the same but first subtracts out the time spent in functions above it in the call stack (the functions it calls)
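A sketch of a profiling session, assuming R was built with profiling support. `busy` is a hypothetical workload that simply keeps the CPU occupied long enough for the profiler to collect samples.

```r
# Hypothetical workload: compute for roughly the given number of seconds.
busy <- function(seconds) {
  start <- Sys.time()
  n <- 0
  while (as.numeric(difftime(Sys.time(), start, units = "secs")) < seconds) {
    n <- n + sum(sqrt(runif(1e4)))
  }
  n
}

# Write the profiler samples to a temporary file.
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)      # start profiling
invisible(busy(0.5))  # code to be profiled
Rprof(NULL)           # stop profiling

prof <- summaryRprof(prof_file)
unlink(prof_file)

# Time including called functions, then time in each function alone.
print(head(prof$by.total))
print(head(prof$by.self))
```

Avoid using system.time() and Rprof() on the same code at the same time, since the profiler itself adds overhead to the timing.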