Data Structures: Lists and Data Frames

Statistical Computing for Data Analysis

Data Structures

Data types are the fundamental building blocks. Data structures organize these data types.

Common data structures in R

Lists

A list is the most general form of vector in R.

List entries can be of any type, and a single list can mix types:

l <- list(1:2, c("hat","mat","dat"))
l
## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] "hat" "mat" "dat"

List entries can be named:

lNamed <- list(foo = 1:2, bar = c("hat","mat","dat"))

lNamed
## $foo
## [1] 1 2
## 
## $bar
## [1] "hat" "mat" "dat"

Most of what you can do with vectors you can also do with lists.
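
As a rough illustration of that claim, the sketch below re-creates `l` so it runs on its own. The variable names `rev_l`, `drop_first` and `entry_lengths` are made up for this example. Functions such as length() and rev() and negative indexing behave as they do for atomic vectors; element-wise work is usually done with sapply() or lapply().

```r
# Re-create the example list so this block is self-contained
l <- list(1:2, c("hat", "mat", "dat"))

length(l)                 # number of entries: 2
rev_l <- rev(l)           # reversing works as for vectors
drop_first <- l[-1]       # negative indexing drops entries

# Apply a function to every entry; sapply() simplifies the result to a vector
entry_lengths <- sapply(l, length)
entry_lengths
```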

Accessing pieces of lists

l[2]
## [[1]]
## [1] "hat" "mat" "dat"
l[[2]]
## [1] "hat" "mat" "dat"
l[[2]][1]
## [1] "hat"
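
The difference between the two bracket forms above can be checked directly. This is a small sketch with illustrative variable names (`sub_list`, `element`): single brackets return a shorter list, while double brackets return the entry itself.

```r
l <- list(1:2, c("hat", "mat", "dat"))

sub_list <- l[2]      # single brackets: a list of length 1
element  <- l[[2]]    # double brackets: the character vector itself

class(sub_list)       # "list"
class(element)        # "character"
length(sub_list)      # 1
length(element)       # 3
```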

Expanding and contracting lists

Add to lists with c() (also works with vectors):

l1 <- c(list(TRUE),l)
l1
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] "hat" "mat" "dat"
str(l1)
## List of 3
##  $ : logi TRUE
##  $ : int [1:2] 1 2
##  $ : chr [1:3] "hat" "mat" "dat"

append(x, values, after) works similarly:

l2 <- append(l, list(TRUE), after = 0)
# after is the subscript after which values are inserted; 0 puts them first
l2
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] "hat" "mat" "dat"
str(l2)
## List of 3
##  $ : logi TRUE
##  $ : int [1:2] 1 2
##  $ : chr [1:3] "hat" "mat" "dat"
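
To see how the after argument moves the insertion point, here is a short sketch that inserts in the middle and at the end. The names `l_mid` and `l_end` are only for illustration.

```r
l <- list(1:2, c("hat", "mat", "dat"))

# after = 1 places the new entry between the first and second entries
l_mid <- append(l, list(TRUE), after = 1)
str(l_mid)

# With after left at its default, length(x), the entry goes at the end,
# which matches c(l, list(TRUE))
l_end <- append(l, list(TRUE))
identical(l_end, c(l, list(TRUE)))
```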

Set a list entry to NULL to remove it:

l1[2:3] <- NULL
str(l1)
## List of 1
##  $ : logi TRUE
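
One consequence worth sketching: because assigning NULL deletes, keeping NULL as an actual entry needs a different form. The example below rebuilds a three-entry list and wraps NULL in list() to store it.

```r
l1 <- list(TRUE, 1:2, c("hat", "mat", "dat"))

l1[[2]] <- NULL        # removes the second entry; later entries shift up
length(l1)             # 2

l1[2] <- list(NULL)    # stores NULL as an entry instead of deleting
length(l1)             # still 2
is.null(l1[[2]])       # TRUE
```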

Flattening lists

unlist() flattens a list into a vector. If the list contains mixed types, the entries are coerced to a common type automatically.

l3 <- c(list(1) ,l)
l3
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] "hat" "mat" "dat"
unlist(l3)
## [1] "1"   "1"   "2"   "hat" "mat" "dat"
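
The coercion is not always to character. As a small sketch with made-up inputs, logical, integer and double entries end up as double, and names on the list are carried into the result.

```r
# Logical, integer and double entries are coerced to the most general type, double
mixed_numeric <- unlist(list(TRUE, 2L, 3.5))
mixed_numeric            # 1.0 2.0 3.5
class(mixed_numeric)     # "numeric"

# Names are carried over, with a numeric suffix when an entry has several elements
named <- unlist(list(foo = 1:2, bar = "hat"))
names(named)             # "foo1" "foo2" "bar"
```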

Names in lists

We can name some or all of the elements of a list:

my.dist = list("exponential", 7, FALSE)
my.dist
## [[1]]
## [1] "exponential"
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] FALSE
names(my.dist) = c("family","mean","is.symmetric")
my.dist
## $family
## [1] "exponential"
## 
## $mean
## [1] 7
## 
## $is.symmetric
## [1] FALSE
my.dist[["family"]]
## [1] "exponential"
my.dist["family"]
## $family
## [1] "exponential"

In addition to indexing, lists have a special shortcut way of using names, with $:

my.dist[["family"]]
## [1] "exponential"
my.dist$family
## [1] "exponential"

Adding named elements:

my.dist$was.estimated = FALSE
my.dist[["last.updated"]] = "2026-01-01"
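
A quick way to confirm the additions is to inspect the names. The sketch below rebuilds `my.dist` so it runs on its own, and also shows one practical difference between the two forms: `[[ ]]` accepts a name stored in a variable (here the illustrative `key`), while `$` takes the name literally.

```r
my.dist <- list(family = "exponential", mean = 7, is.symmetric = FALSE)
my.dist$was.estimated <- FALSE
my.dist[["last.updated"]] <- "2026-01-01"

names(my.dist)
length(my.dist)     # 5

key <- "mean"
my.dist[[key]]      # 7
my.dist$key         # NULL: looks for an entry literally named "key"
```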

Key-value pairs

# Defining a list with named components (key-value pairs)
normal_dist <- list(
  family = "Gaussian",
  mean = 0,
  sd = 1,
  link = "identity"
)

# Retrieval by name (position-agnostic)
normal_dist$family
## [1] "Gaussian"

# Alternative syntax using character strings
normal_dist[["family"]]
## [1] "Gaussian"
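
One possible use of such key-value pairs, sketched here with the illustrative names `params` and `q975`: select several keys at once and pass them as named arguments to a function via do.call(). This works because the keys mean and sd happen to match qnorm()'s argument names.

```r
normal_dist <- list(family = "Gaussian", mean = 0, sd = 1, link = "identity")

# Look up several keys at once; the result is a named sub-list
params <- normal_dist[c("mean", "sd")]

# Pass the named entries as arguments: equivalent to qnorm(p = 0.975, mean = 0, sd = 1)
q975 <- do.call(qnorm, c(list(p = 0.975), params))
round(q975, 2)      # 1.96
```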

Data frames

Matrix vs. Data frame

a.mat = matrix(c(35,8,10,4), nrow=2)
colnames(a.mat) = c("v1","v2")
a.mat
##      v1 v2
## [1,] 35 10
## [2,]  8  4
a.mat[,"v1"] 
## [1] 35  8
# Try a.mat$v1 and see what happens
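a.mat$v1
## Error in a.mat$v1 : $ operator is invalid for atomic vectors

A matrix is an atomic vector with dimensions, so $ does not apply. A data frame is a list of columns, so both $ and matrix-style indexing work:

a.df = data.frame(a.mat,logicals=c(TRUE,FALSE))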
a.df
##   v1 v2 logicals
## 1 35 10     TRUE
## 2  8  4    FALSE
a.df$v1
## [1] 35  8
a.df[,"v1"]
## [1] 35  8
a.df[1,]
##   v1 v2 logicals
## 1 35 10     TRUE
colMeans(a.df)
##       v1       v2 logicals 
##     21.5      7.0      0.5

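Working with a data frame is similar to working with a matrix: row and column indexing and functions such as colMeans() carry over.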
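
The two views of a data frame can be checked side by side. This sketch rebuilds `a.df` so it runs on its own (`col_classes` is an illustrative name) and shows that the object is a list of columns that also reports matrix-like dimensions.

```r
a.mat <- matrix(c(35, 8, 10, 4), nrow = 2)
colnames(a.mat) <- c("v1", "v2")
a.df <- data.frame(a.mat, logicals = c(TRUE, FALSE))

is.list(a.df)                        # TRUE: a data frame is a list of columns
col_classes <- sapply(a.df, class)   # each column keeps its own type
col_classes
dim(a.df)                            # 2 3, as for a matrix
a.df[["v1"]]                         # list-style access to a column
```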