R Basics: Vectors and Matrices

Statistical Computing for Data Analysis

Outline

Data Structures

Data types are the fundamental building blocks. Data structures organize these data types.

Common data structures in R

Vectors

x = c(7, 8, 10, 45)
x
## [1]  7  8 10 45
is.vector(x)
## [1] TRUE

vector(length=n) returns a placeholder vector of length n (logical by default, so every element starts as FALSE); helpful for filling in values later

weekly.hours = vector(length=5)
weekly.hours
## [1] FALSE FALSE FALSE FALSE FALSE
weekly.hours[5] = 8
weekly.hours
## [1] 0 0 0 0 8
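
The switch from FALSE to 0 above happens because the placeholder starts out logical and is coerced to numeric once 8 is assigned. A minimal sketch of preallocating with an explicit type instead, using base R's numeric() and character(); the names hours.typed and labels.typed are hypothetical, introduced only for illustration:

```r
# Preallocate a numeric vector directly, so no type change happens on assignment
hours.typed = numeric(5)          # equivalent to vector("numeric", length = 5)
hours.typed
## [1] 0 0 0 0 0
hours.typed[5] = 8
hours.typed
## [1] 0 0 0 0 8

# character(n) gives n empty strings
labels.typed = character(2)
labels.typed
## [1] "" ""
```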

Other Functions for Building a Vector
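
A minimal sketch of some base R helpers for building vectors. The selection here (the `:` operator, seq(), and rep()) is an illustrative assumption, not an exhaustive list, and the variable names are hypothetical:

```r
ints = 1:5                          # consecutive integers
ints
## [1] 1 2 3 4 5
steps = seq(0, 1, by = 0.25)        # evenly spaced values
steps
## [1] 0.00 0.25 0.50 0.75 1.00
sevens = rep(7, 3)                  # repeat a value
sevens
## [1] 7 7 7
ab = rep(c("a", "b"), times = 2)    # repeat a whole vector
ab
## [1] "a" "b" "a" "b"
```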

Vector arithmetic

Arithmetic operators apply to vectors in a “componentwise” fashion

y = c(-7, -8, -10, -45)
x + y
## [1] 0 0 0 0
x * y
## [1]   -49   -64  -100 -2025

Can vectors in R contain only numeric data types?

No, but all elements of a vector must be of the same type

c(TRUE,FALSE,TRUE,FALSE)
## [1]  TRUE FALSE  TRUE FALSE
c("red","blue","green","orange")
## [1] "red"    "blue"   "green"  "orange"
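
To see the same-type rule in action, the sketch below mixes types inside c(). R does not raise an error; it silently converts every element to one common type. The variable names mixed and nums are hypothetical:

```r
mixed = c(1, "a", TRUE)    # a number, a string, and a logical
mixed
## [1] "1"    "a"    "TRUE"
class(mixed)               # everything became a character string
## [1] "character"

nums = c(1, TRUE, FALSE)   # logicals mixed with numbers become 1 and 0
nums
## [1] 1 1 0
```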

Vectorized Operations

Most R operations and functions are vectorized: they apply to whole vectors, not just scalars.

(1:10)^2
##  [1]   1   4   9  16  25  36  49  64  81 100

Basic R operations follow a recycling rule: if two vectors have different lengths, the shorter one will be repeated to match the length of the longer one.

x + c(-7,-8)
## [1]  0  0  3 37
x^c(1,0,-1,0.5)
## [1] 7.000000 1.000000 0.100000 6.708204
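
One way to picture recycling is to write out the repetition by hand with rep(). The sketch below checks that the explicit version gives the same result as the automatic one; shorter and recycled are hypothetical names:

```r
x = c(7, 8, 10, 45)                              # same x as above
shorter = c(-7, -8)
recycled = rep(shorter, length.out = length(x))  # repeat until it is as long as x
recycled
## [1] -7 -8 -7 -8
identical(x + shorter, x + recycled)
## [1] TRUE
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.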

Single numbers are vectors of length 1 for purposes of recycling:

2 * x
## [1] 14 16 20 90

Comparisons on vectors are also done componentwise:

x > 9
## [1] FALSE FALSE  TRUE  TRUE

Logical operators also work elementwise:

(x > 9) & (x < 20)
## [1] FALSE FALSE  TRUE FALSE
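
A short sketch of the other elementwise logical operators (`|` for OR, `!` for NOT), together with base R functions that summarize a Boolean vector. The variable names are hypothetical:

```r
x = c(7, 8, 10, 45)              # same x as above
either = (x < 8) | (x > 40)      # elementwise OR
either
## [1]  TRUE FALSE FALSE  TRUE
negated = !(x > 9)               # elementwise NOT
negated
## [1]  TRUE  TRUE FALSE FALSE
any(x > 9)                       # is at least one element TRUE?
## [1] TRUE
all(x > 9)                       # are all elements TRUE?
## [1] FALSE
n.big = sum(x > 9)               # TRUE counts as 1, FALSE as 0
n.big
## [1] 2
```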

Functions on vectors

Many functions can take vectors as arguments:
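
As an illustration, here are a few base R functions applied to the vector x from earlier. The choice of functions is an assumption made for this sketch, not a complete list:

```r
x = c(7, 8, 10, 45)          # same x as above
length(x)                    # number of elements
## [1] 4
sum(x)
## [1] 70
mean(x)
## [1] 17.5
median(x)
## [1] 9
range(x)                     # smallest and largest value
## [1]  7 45
x.desc = sort(x, decreasing = TRUE)
x.desc
## [1] 45 10  8  7
```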

Indexing vectors

Vector of indices:

x[c(2,4)]
## [1]  8 45

Vector of negative indices:

x[c(-1,-3)]
## [1]  8 45

Indexing with a Boolean vector:

x
## [1]  7  8 10 45
x > 9
## [1] FALSE FALSE  TRUE  TRUE
x[x > 9]
## [1] 10 45
y[x > 9]
## [1] -10 -45

which() gives the positions (indices) of the elements of a Boolean vector that are TRUE:

places = which(x > 9)
places
## [1] 3 4
y[places]
## [1] -10 -45

Named components

We can give names to elements/components of vectors, and index vectors accordingly

names(x) = c("v1","v2","v3","fred")
names(x)
## [1] "v1"   "v2"   "v3"   "fred"
x[c("fred","v1")]
## fred   v1 
##   45    7

Note: here R is printing the labels; they are not additional components of x

names() returns another vector (of characters):

names(y) = names(x)
sort(names(x))
## [1] "fred" "v1"   "v2"   "v3"
which(names(x) == "fred")
## [1] 4

Matrix

A matrix is a two-dimensional array (a 2D generalization of a vector).

- Handy for algebraic operations.
- Highly efficient for numerical computation.
- Matrices do not need to be numeric (a matrix of characters is possible), but all the elements must be of the same data type.

Creating a matrix

Useful to think of a matrix as a long vector that is being wrapped into a pre-specified number of rows and columns.

z.mat = matrix(c(40,1,60,3), nrow=2)
z.mat
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
is.array(z.mat)
## [1] TRUE
is.matrix(z.mat)
## [1] TRUE
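
The "wrapping" above fills the matrix column by column, which is R's default. A minimal sketch of filling row by row instead, using the byrow argument of matrix(); z.byrow is a hypothetical name:

```r
# Same data as z.mat, but filled one row at a time
z.byrow = matrix(c(40, 1, 60, 3), nrow = 2, byrow = TRUE)
z.byrow
##      [,1] [,2]
## [1,]   40    1
## [2,]   60    3
dim(z.byrow)               # number of rows, number of columns
## [1] 2 2
```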

Combining matrices and converting matrices
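
A minimal sketch of what combining and converting can look like in base R, assuming the usual tools cbind() (add columns), rbind() (add rows), and as.vector() (flatten back to a vector). The names wide, tall, and flat are hypothetical:

```r
z.mat = matrix(c(40, 1, 60, 3), nrow = 2)   # same z.mat as above
wide = cbind(z.mat, c(5, 6))                # append a column: 2 x 3
dim(wide)
## [1] 2 3
tall = rbind(z.mat, c(5, 6))                # append a row: 3 x 2
dim(tall)
## [1] 3 2
flat = as.vector(z.mat)                     # unwrap column by column
flat
## [1] 40  1 60  3
```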

Matrix multiplication

Matrices have their own special multiplication operator, written %*%:

six.sevens = matrix(rep(7,6), ncol=3)
six.sevens
##      [,1] [,2] [,3]
## [1,]    7    7    7
## [2,]    7    7    7
z.mat %*% six.sevens # [2x2] * [2x3]
##      [,1] [,2] [,3]
## [1,]  700  700  700
## [2,]   28   28   28

Can also multiply a matrix and a vector
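
A small sketch of matrix-vector multiplication with %*%. R treats the plain vector as a column or a row, whichever makes the dimensions match; w is a hypothetical vector introduced for this example:

```r
z.mat = matrix(c(40, 1, 60, 3), nrow = 2)   # same z.mat as above
w = c(1, 2)
mat.vec = z.mat %*% w      # w is treated as a 2x1 column
mat.vec
##      [,1]
## [1,]  160
## [2,]    7
vec.mat = w %*% z.mat      # w is treated as a 1x2 row
vec.mat
##      [,1] [,2]
## [1,]   42   66
```

Both results are matrices, not plain vectors.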

Row/column manipulations

Row/column sums, or row/column means:

rowSums(z.mat)
## [1] 100   4
colSums(z.mat)
## [1] 41 63
rowMeans(z.mat)
## [1] 50  2
colMeans(z.mat)
## [1] 20.5 31.5
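
For row or column summaries other than sums and means, one common approach is base R's apply(), where the second argument is 1 for rows and 2 for columns. A sketch under that assumption, with hypothetical names row.totals and col.max:

```r
z.mat = matrix(c(40, 1, 60, 3), nrow = 2)   # same z.mat as above
row.totals = apply(z.mat, 1, sum)           # same values as rowSums(z.mat)
row.totals
## [1] 100   4
col.max = apply(z.mat, 2, max)              # largest entry in each column
col.max
## [1] 40 60
```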

Indexing matrices

Very similar to a vector, except now you need [row,col].

z.mat 
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
z.mat[1,2]
## [1] 60
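
Leaving the row or column position empty selects everything along that dimension. A short sketch; the drop = FALSE argument, which keeps the result as a matrix, is a detail added here for illustration:

```r
z.mat = matrix(c(40, 1, 60, 3), nrow = 2)   # same z.mat as above
first.row = z.mat[1, ]                      # entire first row, as a vector
first.row
## [1] 40 60
second.col = z.mat[, 2]                     # entire second column, as a vector
second.col
## [1] 60  3
col.as.mat = z.mat[, 2, drop = FALSE]       # keep it as a 2x1 matrix
col.as.mat
##      [,1]
## [1,]   60
## [2,]    3
```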

Can also index by names or by a Boolean vector.

which(z.mat>10)
## [1] 1 3
which(z.mat>10, arr.ind=TRUE)
##      row col
## [1,]   1   1
## [2,]   1   2
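
A minimal sketch of indexing a matrix by names and by Boolean values. It works on a copy called named.mat, and the labels "r1", "r2", "c1", "c2" are hypothetical, chosen only for this example:

```r
named.mat = matrix(c(40, 1, 60, 3), nrow = 2)   # same values as z.mat
rownames(named.mat) = c("r1", "r2")
colnames(named.mat) = c("c1", "c2")

by.name = named.mat["r1", "c2"]       # index by row and column name
by.name
## [1] 60
big = named.mat[named.mat > 10]       # Boolean matrix as index: returns a vector
big
## [1] 40 60
top = named.mat[c(TRUE, FALSE), ]     # Boolean vector selecting rows
top
## c1 c2 
## 40 60 
```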