Statistical Computing for Data Analysis
if() and else
Use if() and else to decide whether to evaluate one block of code or another, depending on a condition.
## [1] 0.5
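A minimal sketch of the if()/else pattern, using a hypothetical variable x (its name and value are assumptions for illustration). It computes an absolute value by choosing between two branches:

```r
x <- -0.5  # hypothetical input value

# Exactly one of the two branches is evaluated, depending on the sign of x
if (x >= 0) {
  abs.x <- x
} else {
  abs.x <- -x
}
abs.x
```

Only the branch whose condition holds is run, so the other assignment is never evaluated.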
- The condition in if() needs to give one TRUE or FALSE value
- The else statement is optional
else if()
We can use else if() arbitrarily many times following an if() statement
## [1] 5
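A minimal sketch of chained else if() branches, assuming a hypothetical input x and a piecewise rule invented for illustration:

```r
x <- -2  # hypothetical input value

# Conditions are checked from top to bottom; the first TRUE one is used
if (x^2 < 1) {
  y <- x^2
} else if (x >= 1) {
  y <- 2 * x - 1
} else {
  y <- -2 * x - 1
}
y
```

Here neither of the first two conditions holds for x = -2, so the final else branch supplies the value.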
- Each else if() only gets considered if the conditions above it were not TRUE
- The else statement gets evaluated if none of the above conditions were TRUE
- The else statement is optional
Another example:
today <- "Tuesday"
if (today == "Monday") {
  writeLines("Tell me a joke")
} else if (today == "Tuesday") {
  writeLines("Work work")
} else if (today == "Friday") {
  writeLines("Ready to break")
} else {
  writeLines("Have some fun")
}
## Work work
In the ifelse() function we specify a condition, then a
value if the condition holds, and a value if the condition fails
## [1] 2
Exactly equivalent to:
## [1] 2
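The equivalence between ifelse() and a full if()/else statement can be sketched as follows. The variable x, its value, and the two result names are assumptions for illustration:

```r
x <- 2  # hypothetical value

# Compact form: condition, value if TRUE, value if FALSE
a <- ifelse(x > 0, x, 0)

# The same decision written out with if() and else
if (x > 0) {
  b <- x
} else {
  b <- 0
}

identical(a, b)
```

For a single value the two forms give the same result; ifelse() is simply shorter to write.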
One advantage of ifelse() is that it vectorizes
nicely
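As a sketch of what vectorizing means here, ifelse() can apply the same decision to every element of a vector in one call. The vector x.vec below is a made-up example:

```r
x.vec <- c(-1.5, 0.3, 2.0, -0.2)  # hypothetical data

# Keep positive entries, replace the rest with 0, element by element
clipped <- ifelse(x.vec > 0, x.vec, 0)
clipped
```

A plain if() could not be used this way, since its condition must be a single TRUE or FALSE (a longer condition is an error in R 4.2.0 and later).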
We now look at && and ||, which are short-circuiting logical operations
## [1] 0.88384946 0.66995706 -0.82529685 -0.97272948 -0.09800783 0.60526684
## [7] -0.82985530 0.42879982 0.38784757 -0.75970069
## [1] 0.8838495 0.6699571 -0.8252969 -0.9727295 999.0000000 0.6052668
## [7] -0.8298553 999.0000000 999.0000000 -0.7597007
# Safe guard using &&: If length(u.vec) > 0 is FALSE, R never evaluates min(u.vec)
if (length(u.vec) > 0 && min(u.vec) < 0) {
  print("Vector has at least one negative value.")
}
## [1] "Vector has at least one negative value."
## [1] 0.8838495 0.6699571 -0.8252969 -0.9727295 999.0000000 0.6052668
## [7] -0.8298553 999.0000000 999.0000000 -0.7597007
# Check whether we should skip further computation
if (any(u.vec > 0.9) || mean(u.vec) > 0.5) {
  print("Large values detected.")
}
## [1] "Large values detected."
Similarly, for ||, if the first condition evaluates to TRUE, then the second condition will not be evaluated
Rule of thumb: use & and | for
indexing or subsetting, and && and ||
for conditionals
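The rule of thumb can be sketched with a small made-up vector. The names w.vec, in.band, empty.vec and is.neg are assumptions for illustration:

```r
w.vec <- c(0.9, -0.1, 0.4, -0.8)  # hypothetical data

# & works elementwise, so it produces a logical vector suitable for indexing
in.band <- w.vec > -0.5 & w.vec < 0.5
w.vec[in.band]

# && returns a single value and short-circuits: because the length check
# is FALSE, min(empty.vec) is never evaluated
empty.vec <- numeric(0)
is.neg <- length(empty.vec) > 0 && min(empty.vec) < 0
is.neg
```

Using & inside if() would produce a vector-valued condition, which is why && and || are the natural fit for conditionals.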
Computers excel at doing the same simple operation repeatedly without getting tired or distracted. Humans do not. Iteration is therefore one of the core ideas behind programming: we write a rule once, and let the computer apply it many times.
In R, there are several ways to express iteration. Choosing the right one affects clarity, correctness, and performance.
Main iteration paradigms in R:
- for() and while() loops: explicit, step-by-step control of repetition
- Vectorization: operating on entire vectors at once (preferred whenever feasible)
- The apply() family: base R alternatives to explicit loops
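As a rough sketch, the three paradigms listed above can compute the same result. The task (logs of 1 to n) and the variable names are assumptions chosen for illustration:

```r
n <- 5

# 1. Explicit loop: fill a preallocated vector one element at a time
loop.out <- numeric(n)
for (i in 1:n) {
  loop.out[i] <- log(i)
}

# 2. Vectorization: log() operates on the whole vector at once
vec.out <- log(1:n)

# 3. apply() family: sapply() calls log() on each element
apply.out <- sapply(1:n, log)

all.equal(loop.out, vec.out)
all.equal(vec.out, apply.out)
```

All three give the same numbers; they differ in how much of the iteration is written out by hand.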
We will start with explicit loops, because they make the logic of iteration transparent.
for() loops
A for() loop iterates a counter variable over a vector. On each iteration, the loop executes a block of code (the body) using the current value of the counter.
n <- 10
log.vec <- vector(length = n, mode = "numeric")
for (i in 1:n) {
  log.vec[i] <- log(i)
}
log.vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
- i is the counter
- 1:n is the vector being iterated over
- The code inside {} is the body of the loop
This pattern (initialize, iterate, update) is extremely common.
break
Sometimes we want to stop iterating as soon as a condition is met. The break statement immediately exits the loop.
n <- 10
log.vec <- vector(length = n, mode = "numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("Stopping early: log(i) exceeded 2\n")
    break
  }
  log.vec[i] <- log(i)
}
## Stopping early: log(i) exceeded 2
log.vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 0.0000000 0.0000000 0.0000000
This is useful when:
- continuing the loop would be wasteful, or
- the stopping point is data-dependent
The counter variable does not have to be numeric. It simply takes values from a vector.
This is often clearer than looping over indices.
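A minimal sketch of looping directly over the values of a character vector. The vector days and the accumulator n.chars are made up for illustration:

```r
days <- c("Monday", "Tuesday", "Friday")  # hypothetical values

n.chars <- integer(0)
for (day in days) {
  # day takes each string in turn; no index is needed
  cat("Today is", day, "\n")
  n.chars <- c(n.chars, nchar(day))
}
n.chars
```

The loop variable holds the element itself rather than its position, which keeps the body free of subscripts.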
An even safer version is to use seq_along(), which gives an empty sequence for an empty vector, whereas 1:length(x) would give c(1, 0).
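The difference shows up with an empty vector. In this sketch, empty.vec and the counter n.iter are names assumed for illustration:

```r
empty.vec <- character(0)  # hypothetical empty input

# 1:length(x) counts down from 1 to 0 when x is empty
bad.idx <- 1:length(empty.vec)
length(bad.idx)

# seq_along(x) is empty when x is empty
good.idx <- seq_along(empty.vec)
length(good.idx)

# With seq_along(), the loop body is never run for empty input
n.iter <- 0
for (k in seq_along(empty.vec)) {
  n.iter <- n.iter + 1
}
n.iter
```

A loop over bad.idx would run twice and index positions that do not exist, while the seq_along() loop is skipped entirely.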
A loop body can itself contain another loop.
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
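One way to write a nested loop that prints rows of growing length is sketched below. The loop bounds and the counter total are assumptions for illustration, not necessarily the code that produced the output above:

```r
n.rows <- 4
total <- 0  # counts how many times the inner body runs

for (i in 1:n.rows) {      # outer loop: one printed row per value of i
  for (j in 1:i^2) {       # inner loop: runs i^2 times (^ binds before :)
    cat(j, "")
    total <- total + 1
  }
  cat("\n")                # end the row after the inner loop finishes
}
total
```

The inner loop runs to completion on every pass of the outer loop, so the total work is the sum of the inner lengths.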
while() loops
A while() loop repeats its body as long as a condition remains true. The number of iterations is not fixed in advance.
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
Key features:
- The stopping rule is evaluated before each iteration
- You must manually update variables so that the condition eventually becomes false
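Both features can be seen in a minimal sketch. The threshold of 2 and the variable names are assumptions for illustration:

```r
i <- 1
log.vec <- numeric(0)

# The condition is checked before every pass, including the first
while (log(i) <= 2) {
  log.vec[i] <- log(i)
  i <- i + 1  # manual update; without it the loop would never end
}
log.vec
```

The loop stops as soon as log(i) exceeds 2, so the number of stored values is determined by the data rather than fixed up front.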
for() versus while()
Use a for() loop when:
- the number of iterations is known in advance
Use a while() loop when:
- you only know how to stop once you get there
Conceptually:
- Every for() loop can be rewritten as a while() loop
- Not every while() loop can be rewritten as a for() loop
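The first claim can be sketched by writing the same computation both ways. The task (squares of 1 to n) and the variable names are assumptions for illustration:

```r
n <- 5

# for() version: the counter is managed by the loop itself
for.result <- numeric(n)
for (i in 1:n) {
  for.result[i] <- i^2
}

# while() version: the counter is initialized, tested and updated by hand
while.result <- numeric(n)
i <- 1
while (i <= n) {
  while.result[i] <- i^2
  i <- i + 1
}

identical(for.result, while.result)
```

The while() version needs three extra pieces of bookkeeping, which is why for() is usually preferred when the range is known.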
- for() loops are the default choice when structure is clear
- while() loops handle data-dependent stopping rules