Statistical Computing for Data Analysis
Data types are the fundamental building blocks; data structures organize values of these types.
## Common data structures in R
Vector
A one-dimensional sequence of values, all of the same type
Matrix
A two-dimensional array of values, all of the same type
List
A collection of objects that can be of different types and
structures
Data frame
A table-like structure where each column is a vector; different columns
may have different types
Factor
A special object used to represent categorical data
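As a quick orientation, here is a minimal sketch that builds one object of each kind and asks R for its class. The variable names and values are illustrative choices, not taken from the original notes.

```r
# One example of each common data structure (names and values are illustrative)
v   <- c(2.5, 3.1, 4.7)                             # numeric vector
m   <- matrix(1:6, nrow = 2)                        # 2 x 3 integer matrix
l   <- list(3, "a", TRUE)                           # list with mixed types
dat <- data.frame(x = 1:3, y = c("a", "b", "c"))    # columns of different types
f   <- factor(c("low", "high", "low"))              # categorical data

# class() reports what kind of object each one is
sapply(list(v, m, l, dat, f), function(obj) class(obj)[1])
# "numeric" "matrix" "list" "data.frame" "factor"

levels(f)   # the categories, sorted alphabetically: "high" "low"
```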
A list is the most general kind of vector in R.
List entries can be of any type, and a single list can mix types:
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] "hat" "mat" "dat"
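The code that produced this output is not shown. A call along the following lines gives the same printout; the name my.list is a hypothetical choice.

```r
# A list whose entries have different types (my.list is a hypothetical name)
my.list <- list(1:2, c("hat", "mat", "dat"))
my.list
str(my.list)   # compact summary: one line per entry, with its type
```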
List entries can be named:
## $foo
## [1] 1 2
##
## $bar
## [1] "hat" "mat" "dat"
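Names can be supplied when the list is created, or attached afterwards with names(). A minimal sketch with the same values as above (variable names are hypothetical):

```r
# Name entries at creation time ...
named.list <- list(foo = 1:2, bar = c("hat", "mat", "dat"))

# ... or attach names to an existing unnamed list
unnamed.list <- list(1:2, c("hat", "mat", "dat"))
names(unnamed.list) <- c("foo", "bar")

identical(named.list, unnamed.list)   # TRUE: both routes give the same object
```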
Most of what you can do with vectors you can also do with lists. To access entries:
- use [ ] as with vectors;
- or use [[ ]], but only with a single index;
- [[ ]] drops names and structures, [ ] does not.
## [[1]]
## [1] "hat" "mat" "dat"
## [1] "hat" "mat" "dat"
## [1] "hat"
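The three outputs above come from indexing with [ ], with [[ ]], and then indexing into the extracted vector. Below is a sketch of the same idea, plus a named list that shows [[ ]] dropping the name. The list contents are assumptions matching the printout.

```r
my.list <- list(1:2, c("hat", "mat", "dat"))   # hypothetical example list

sub  <- my.list[2]     # [ ] returns a list (here, of length 1)
elem <- my.list[[2]]   # [[ ]] returns the entry itself: a character vector
elem[1]                # index into that vector: "hat"

class(sub)             # "list"
class(elem)            # "character"

# [ ] keeps names, [[ ]] drops them
nl <- list(foo = 1:2, bar = c("hat", "mat", "dat"))
names(nl["bar"])       # "bar"
names(nl[["bar"]])     # NULL
```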
Add to lists with c() (also works with vectors):
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] "hat" "mat" "dat"
## List of 3
## $ : logi TRUE
## $ : int [1:2] 1 2
## $ : chr [1:3] "hat" "mat" "dat"
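One detail matters when using c() with lists. Wrapping the new value in list() keeps it together as a single entry. Passing a bare multi-element vector splits it into one entry per element. A sketch (names are hypothetical):

```r
my.list <- list(1:2, c("hat", "mat", "dat"))   # hypothetical example list

# Wrap the new value in list() so it is added as a single entry
longer <- c(list(TRUE), my.list)
str(longer)                      # List of 3: logi, int [1:2], chr [1:3]

# Without list(), each element of a vector becomes its own entry
split.up <- c(1:2, my.list)
length(split.up)                 # 4 (1, 2, then the two original entries)
length(c(list(1:2), my.list))    # 3
```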
append(x, values, after) works similarly:
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] "hat" "mat" "dat"
## List of 3
## $ : logi TRUE
## $ : int [1:2] 1 2
## $ : chr [1:3] "hat" "mat" "dat"
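The after argument controls where the new entries go. after = 0 prepends, which reproduces the output above. An intermediate value inserts in the middle. A sketch with a hypothetical list:

```r
my.list <- list(1:2, c("hat", "mat", "dat"))   # hypothetical example list

# after = 0 puts the new entry at the front (same as c(list(TRUE), my.list))
front <- append(my.list, list(TRUE), after = 0)

# after = 1 inserts between the first and second entries
middle <- append(my.list, list(TRUE), after = 1)
str(middle)   # int [1:2], then logi TRUE, then chr [1:3]
```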
Set a list entry to NULL to remove it:
## List of 1
## $ : logi TRUE
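A sketch of removal with a hypothetical list. The second half shows the related idiom for keeping a slot that holds NULL instead of deleting it: assign list(NULL) through [ ].

```r
x <- list(TRUE, 1:2, c("hat", "mat", "dat"))   # hypothetical example list

x[[3]] <- NULL       # assigning NULL with [[ ]] deletes the entry
length(x)            # 2

# To keep the slot but store NULL in it, assign list(NULL) with [ ]
y <- list(TRUE, 1:2)
y[2] <- list(NULL)
length(y)            # still 2
is.null(y[[2]])      # TRUE
```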
unlist() flattens a list into a vector. If the list contains mixed types, the entries are automatically coerced to a common type (below, everything becomes character):
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] "hat" "mat" "dat"
## [1] "1" "1" "2" "hat" "mat" "dat"
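Coercion follows R's usual order: logical, then integer, then double, then character. unlist() picks the most general type present. A sketch:

```r
mixed <- list(1, 1:2, c("hat", "mat", "dat"))
unlist(mixed)              # character: the numbers become "1", "1", "2"

unlist(list(1, 1:2))       # double: the integers are promoted
unlist(list(TRUE, 2L))     # integer: TRUE becomes 1L
```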
We can name some or all of the elements of a list:
## [[1]]
## [1] "exponential"
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] FALSE
## $family
## [1] "exponential"
##
## $mean
## [1] 7
##
## $is.symmetric
## [1] FALSE
## [1] "exponential"
## $family
## [1] "exponential"
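When only some entries are named, names() returns an empty string for the unnamed ones. A sketch reusing the values above (the variable name partly is hypothetical):

```r
# Name only the first entry
partly <- list(family = "exponential", 7, FALSE)
names(partly)            # "family" "" ""

# Names can also be filled in afterwards
names(partly) <- c("family", "mean", "is.symmetric")
partly[["family"]]       # "exponential"
partly["family"]         # a one-entry list, printed with $family
```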
In addition to indexing, lists offer a shortcut for accessing entries by name, the $ operator:
## [1] "exponential"
## [1] "exponential"
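$ behaves much like [[ ]] with a quoted name, with two differences. On lists, $ partially matches an unambiguous prefix of a name, whereas [[ ]] matches exactly by default. Also, $ cannot use a name stored in a variable. A sketch (exp.dist and key are hypothetical names):

```r
exp.dist <- list(family = "exponential", mean = 7, is.symmetric = FALSE)

exp.dist$family          # "exponential"
exp.dist[["family"]]     # same value

# $ partially matches "fam" to "family"; [[ ]] does not (exact = TRUE by default)
exp.dist$fam             # "exponential"
exp.dist[["fam"]]        # NULL

# A name held in a variable needs [[ ]]; exp.dist$key would look for an entry named "key"
key <- "mean"
exp.dist[[key]]          # 7
```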
Named entries can also be added to an existing list by assigning to a new name, with either $ or [[ ]].
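A minimal sketch with a hypothetical starting list. For an exponential distribution the standard deviation equals the mean, hence sd = 7.

```r
x <- list(family = "exponential", mean = 7)   # hypothetical starting list

x$sd <- 7                     # $ with a new name appends a named entry
x[["is.symmetric"]] <- FALSE  # [[ ]] with a quoted name does the same
names(x)                      # "family" "mean" "sd" "is.symmetric"
```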
Named entries let a list act as a set of key-value pairs: if we know we want the entry named family, we can look it up by name, without caring where it is (in what position it lies) in the list.
# Defining a list with named components (key-value pairs)
normal_dist <- list(
family = "Gaussian",
mean = 0,
sd = 1,
link = "identity"
)
# Retrieval by name (position-agnostic)
normal_dist$family
## [1] "Gaussian"
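To see that lookup really is position-agnostic, the sketch below reorders the entries and looks the key up again. It also shows what happens with a key that is absent. The key "variance" is a hypothetical example.

```r
normal_dist <- list(family = "Gaussian", mean = 0, sd = 1, link = "identity")

# Reordering the entries does not change what a name lookup returns
reordered <- normal_dist[c("link", "sd", "mean", "family")]
identical(reordered$family, normal_dist$family)   # TRUE

# A key that is not present returns NULL rather than raising an error
normal_dist[["variance"]]                          # NULL

# Check for a key before relying on it
"variance" %in% names(normal_dist)                 # FALSE
```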
Many modeling functions (e.g., lm()) and packages (e.g., ggplot2) are designed specifically to operate on data frames as the primary input. Use matrix-style functions (e.g., rowSums(), dim()) and summary tools (e.g., summary(), str()) to audit your data efficiently.
## v1 v2
## [1,] 35 10
## [2,] 8 4
## [1] 35 8
# Try a.mat$v1 and see what happens
a.mat$v1
Error in a.mat$v1 : $ operator is invalid for atomic vectors
## v1 v2 logicals
## 1 35 10 TRUE
## 2 8 4 FALSE
## [1] 35 8
## [1] 35 8
## v1 v2 logicals
## 1 35 10 TRUE
## v1 v2 logicals
## 21.5 7.0 0.5
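The code behind these outputs is not shown. The sketch below is a reconstruction under stated assumptions: a 2 x 2 numeric matrix a.mat with columns v1 and v2 (values read off the printout), then a data frame a.df built from it with an added logical column. These objects reproduce the results above, but the original construction may have differed.

```r
# Assumed reconstruction: matrix with named columns, values from the printout
a.mat <- cbind(v1 = c(35, 8), v2 = c(10, 4))
a.mat[, "v1"]                      # 35 8: matrix columns can be indexed by name with [ ]

# $ fails on a matrix; capture the error message instead of stopping
msg <- tryCatch(a.mat$v1, error = function(e) conditionMessage(e))
msg                                # "$ operator is invalid for atomic vectors"

# Data frame built from the matrix, plus a logical column
a.df <- data.frame(a.mat, logicals = c(TRUE, FALSE))
a.df$v1                            # 35 8: $ works on data frames
a.df[a.df$logicals, ]              # keep only rows where logicals is TRUE
colMeans(a.df)                     # logicals count as 0/1, so their mean is 0.5
```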
- df[,"col"] returns the column as a vector.
- df[,"col", drop=FALSE] returns a data frame containing a single column.
- For a given computing task, decide which of the two forms it requires.
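A sketch of the difference, using a small hypothetical data frame. Functions that expect two dimensions, such as colMeans(), need the drop = FALSE form. On the plain vector they stop with an error.

```r
a.df <- data.frame(v1 = c(35, 8), v2 = c(10, 4))   # hypothetical example

as.vec <- a.df[, "v1"]                 # default drop = TRUE: a numeric vector
as.df  <- a.df[, "v1", drop = FALSE]   # stays a one-column data frame

is.vector(as.vec)      # TRUE
is.data.frame(as.df)   # TRUE
dim(as.df)             # 2 1

colMeans(as.df)        # works: v1 = 21.5
# colMeans(as.vec) would stop with an error: it needs at least two dimensions
```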