Writing Functions in R

Statistical Computing for Data Analysis

Why write functions?

As your code grows, copy/paste becomes a problem:

A function lets you:

The basic template

name <- function(arg1, arg2) {
  # compute something
  result <- ...
  return(result)  # optional (R returns the last expression)
}
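
As a concrete instance of the template, here is a minimal sketch. The name rect_area and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical example: area of a rectangle, following the template above
rect_area <- function(width, height) {
  # compute something
  result <- width * height
  return(result)  # explicit return; ending with a bare `result` would also work
}

rect_area(3, 4)
## [1] 12
```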

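Key pieces: the function name, the arguments in parentheses, the body in braces, and the return value (the last expression evaluated, or an explicit return()).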

A first example: Celsius → Fahrenheit

c_to_f <- function(celsius) {
  9/5 * celsius + 32
}

c_to_f(0)
## [1] 32
c_to_f(c(0, 10, 20))
## [1] 32 50 68
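
The same pattern works in the other direction. The sketch below adds a hypothetical inverse, f_to_c (not part of the original example), and checks that a round trip recovers the input.

```r
c_to_f <- function(celsius) {
  9/5 * celsius + 32
}

# Hypothetical inverse: Fahrenheit -> Celsius
f_to_c <- function(fahrenheit) {
  (fahrenheit - 32) * 5/9
}

f_to_c(c(32, 50, 68))
## [1]  0 10 20

# A round trip should recover the input (up to floating-point error)
all.equal(f_to_c(c_to_f(c(0, 10, 20))), c(0, 10, 20))
## [1] TRUE
```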

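Notes: there is no explicit return(), so the value of the last expression is returned; and because R arithmetic is vectorized, the same function handles a single number or a whole vector.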

Arguments and defaults

You can give arguments default values.

clip <- function(x, lo = 0, hi = 1) {
  pmin(pmax(x, lo), hi)
}

clip(c(-2, -0.1, 0.3, 1.2))
## [1] 0.0 0.0 0.3 1.0
clip(c(-2, -0.1, 0.3, 1.2), lo = -1, hi = 2)
## [1] -1.0 -0.1  0.3  1.2

Defaults make a function easier to call while keeping it flexible.
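
To illustrate that flexibility, this sketch overrides one default at a time and leaves the other alone. It reuses clip() exactly as defined above.

```r
clip <- function(x, lo = 0, hi = 1) {
  pmin(pmax(x, lo), hi)
}

x <- c(-2, -0.1, 0.3, 1.2)

# Override only the upper bound; lo keeps its default of 0
clip(x, hi = 0.5)
## [1] 0.0 0.0 0.3 0.5

# Override only the lower bound; hi keeps its default of 1
clip(x, lo = -1)
## [1] -1.0 -0.1  0.3  1.0
```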

Named arguments vs position

clip(x = c(-1, 0.2, 5), lo = 0, hi = 1)
## [1] 0.0 0.2 1.0
clip(c(-1, 0.2, 5), 0, 1)
## [1] 0.0 0.2 1.0
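
A short sketch of why the distinction matters: named arguments can be given in any order, while positional arguments are matched strictly by position.

```r
clip <- function(x, lo = 0, hi = 1) {
  pmin(pmax(x, lo), hi)
}

# Named arguments can appear in any order
clip(hi = 1, lo = 0, x = c(-1, 0.2, 5))
## [1] 0.0 0.2 1.0

# Positional arguments are matched by order, so swapping them changes the
# meaning: here lo = 1 and hi = 0, which is probably not what was intended
clip(c(-1, 0.2, 5), 1, 0)
## [1] 0 0 0
```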

Recommendation for readability:

Returning multiple values

To return more than one value, return a named list.

summarize_vec <- function(x) {
  list(
    n = length(x),
    mean = mean(x),
    sd = sd(x),
    min = min(x),
    max = max(x)
  )
}

out <- summarize_vec(c(1, 2, 3, 10))
out
## $n
## [1] 4
## 
## $mean
## [1] 4
## 
## $sd
## [1] 4.082483
## 
## $min
## [1] 1
## 
## $max
## [1] 10
out$mean
## [1] 4
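
Besides `$`, there are other ways to get at the pieces of a returned list. A minimal sketch using base R only; summarize_vec() is repeated so the block runs on its own.

```r
summarize_vec <- function(x) {
  list(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

out <- summarize_vec(c(1, 2, 3, 10))

# Double brackets extract one element by name, like `$`
out[["max"]]
## [1] 10

# If every element is a single number, unlist() flattens the list
# into a named numeric vector
out_vec <- unlist(out)
out_vec["n"]
```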

Scope: variables inside vs outside

Variables created inside a function typically stay inside.

f <- function(x) {
  y <- x^2
  y
}

f(3)
## [1] 9
# y  # would error here

This is a feature: functions help avoid accidental name collisions.
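
A sketch of the collision case: if a variable named y already exists outside the function, the assignment inside f() creates a separate local y and leaves the outer one untouched.

```r
y <- 100          # a variable in the global environment

f <- function(x) {
  y <- x^2        # creates a separate, local y
  y
}

f(3)
## [1] 9
y                 # the global y is unchanged
## [1] 100
```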

Input checks (defensive programming)

A function should fail early with a useful message.

safe_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  if (any(x <= 0, na.rm = TRUE)) stop("x must be positive")
  log(x)
}

safe_log(c(1, 2, 10))
## [1] 0.0000000 0.6931472 2.3025851
# Interactively this just prints the error. When knitting, an error stops the
# whole document build unless you explicitly allow errors (e.g., chunk option error = TRUE).
safe_log(c(1, -2, 3))
## Error in `safe_log()`:
## ! x must be positive
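
One way to inspect the error without halting a script is to wrap the call in tryCatch(). This is a minimal base R sketch, not the only approach. The na.rm = TRUE in the check is an added assumption, so that inputs containing NA do not break the test itself.

```r
safe_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  # na.rm = TRUE: NA inputs pass the check and simply yield NA from log()
  if (any(x <= 0, na.rm = TRUE)) stop("x must be positive")
  log(x)
}

# Catch the error and return its message instead of stopping
msg <- tryCatch(
  safe_log(c(1, -2, 3)),
  error = function(e) conditionMessage(e)
)
msg
## [1] "x must be positive"
```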

stop(), warning(), and message()
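
These three base R functions signal conditions of different severity: stop() raises an error and halts the function, warning() reports a problem but lets execution continue, and message() prints an informational note. A hedged sketch; the helper safe_sqrt is hypothetical and exists only to show all three in one place.

```r
# Hypothetical helper showing all three signalling functions
safe_sqrt <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")   # error: execution halts
  if (any(x < 0, na.rm = TRUE)) {
    warning("negative values set to NA")          # warning: execution continues
    x[x < 0] <- NA
  }
  message("taking square roots of ", length(x), " values")  # informational note
  sqrt(x)
}

# Silence the warning and message here so only the value is shown
res <- suppressMessages(suppressWarnings(safe_sqrt(c(4, -1, 9))))
res
## [1]  2 NA  3
```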

A slightly bigger example: standardize a numeric vector

Goal: convert to z-scores, \(z_i = (x_i - \bar{x})/s\), where \(\bar{x}\) is the sample mean and \(s\) is the sample standard deviation.

zscore <- function(x, na.rm = TRUE) {
  if (!is.numeric(x)) stop("x must be numeric")

  mu <- mean(x, na.rm = na.rm)
  s  <- sd(x, na.rm = na.rm)

  if (is.na(s) || s == 0) stop("sd is 0 or undefined; cannot standardize")

  (x - mu) / s
}

zscore(c(10, 12, 14))
## [1] -1  0  1
zscore(c(10, NA, 14))
## [1] -0.7071068         NA  0.7071068
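
A quick sanity check, sketched under the assumption that the input has at least two distinct non-missing values: standardized values should have mean 0 and standard deviation 1, and should agree with base R's scale(). The guard on s also covers an undefined (NA) standard deviation.

```r
zscore <- function(x, na.rm = TRUE) {
  if (!is.numeric(x)) stop("x must be numeric")
  mu <- mean(x, na.rm = na.rm)
  s  <- sd(x, na.rm = na.rm)
  # sd() is NA for fewer than two non-missing values, so check that too
  if (is.na(s) || s == 0) stop("sd is 0 or undefined; cannot standardize")
  (x - mu) / s
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
z <- zscore(x)

# Standardized values should have mean 0 and sd 1 (up to rounding error)
all.equal(mean(z), 0)
## [1] TRUE
all.equal(sd(z), 1)
## [1] TRUE

# scale() returns a one-column matrix; after flattening it should agree
all.equal(z, as.numeric(scale(x)))
## [1] TRUE
```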

Writing loop-based functions (when needed)

Sometimes you need explicit loops (e.g., custom algorithms).

cumprod_loop <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")

  out <- numeric(length(x))
  prod_so_far <- 1

  for (i in seq_along(x)) {
    prod_so_far <- prod_so_far * x[i]
    out[i] <- prod_so_far
  }

  out
}

cumprod_loop(c(2, 3, 4))
## [1]  2  6 24
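
When a built-in equivalent exists, it makes a convenient check on a hand-written loop. The sketch below compares cumprod_loop() with base R's cumprod() and also tries an empty input.

```r
cumprod_loop <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  out <- numeric(length(x))
  prod_so_far <- 1
  for (i in seq_along(x)) {
    prod_so_far <- prod_so_far * x[i]
    out[i] <- prod_so_far
  }
  out
}

x <- c(2, 3, 4, 0.5)
all.equal(cumprod_loop(x), cumprod(x))
## [1] TRUE

# seq_along() keeps the loop safe for empty input: the body never runs
cumprod_loop(numeric(0))
## numeric(0)
```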

A common undergrad pitfall: growing objects in a loop

This is slow, because every call to c() copies the whole vector built so far:

# DON'T
x <- rnorm(10)
out <- c()
for (i in seq_along(x)) {
  out <- c(out, f(x[i]))
}

Do this instead (pre-allocate):

x <- rnorm(10)
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- f(x[i])
}
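
Two alternatives to writing the loop by hand, sketched here as options rather than a recommendation from the original slides: vapply() allocates the result for you and checks that each value is a single number, and when the function is already vectorized (as f from the scope example is), one call on the whole vector is enough.

```r
f <- function(x) {
  y <- x^2
  y
}

x <- rnorm(10)

# Pre-allocated loop, as above
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- f(x[i])
}

# vapply() allocates the result and checks each value is one number
out_vapply <- vapply(x, f, numeric(1))

# Because f is vectorized, calling it once on the whole vector also works
out_vec <- f(x)

all.equal(out_loop, out_vapply)
## [1] TRUE
all.equal(out_loop, out_vec)
## [1] TRUE
```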

Designing good functions

A good function is: