Statistical Computing for Data Analysis
Data types are the fundamental building blocks. Data structures organize these data types.
## Common data structures in R
Vector
A one-dimensional sequence of values, all of the same type
Matrix
A two-dimensional array of values, all of the same type
List
A collection of objects that can be of different types and
structures
Data frame
A table-like structure where each column is a vector; different columns
may have different types
Factor
A special object used to represent categorical data
## [1] 7 8 10 45
## [1] TRUE
- c() function returns a vector containing all its arguments in specified order
- 1:5 is shorthand for c(1,2,3,4,5), and so on
- x[1] would be the first element, x[4] the fourth element, and x[-4] is a vector containing all but the fourth element
- vector(length=n) returns an empty vector of length n; helpful for filling things up later
## [1] FALSE FALSE FALSE FALSE FALSE
## [1] 0 0 0 0 8
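As a minimal sketch of the points above, the following reproduces the printed outputs under the assumption that the vector was built as `c(7, 8, 10, 45)`. The names `x` and `weights` are illustrative choices, not something the original fixes.

```r
# Assumed example vector (values match the printed output above)
x <- c(7, 8, 10, 45)
x                # 7 8 10 45
is.vector(x)     # TRUE

1:5              # same values as c(1, 2, 3, 4, 5)

x[1]             # first element: 7
x[4]             # fourth element: 45
x[-4]            # everything except the fourth element: 7 8 10

# vector(length = n) creates a logical vector of FALSEs by default
weights <- vector(length = 5)
weights          # FALSE FALSE FALSE FALSE FALSE
weights[5] <- 8  # assigning a number coerces the whole vector to numeric
weights          # 0 0 0 0 8
```

Note that the "empty" vector starts out as logical; it only becomes numeric once a number is assigned into it.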
Functions for creating vectors:
- c(): the concatenate function
- seq(): for creating a sequence of values
- rep(): for repeating a value
- ifelse(): for creating a vector where the value of each entry depends on a logical rule

Arithmetic operators apply to vectors in a “componentwise” fashion:
## [1] 0 0 0 0
## [1] -49 -64 -100 -2025
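The functions listed above and componentwise arithmetic can be sketched as follows. The vector `x` and the particular arguments are assumptions for illustration; the expressions that produced the outputs shown are not recoverable from the text, so `x - x` and `-x^2` are simply expressions consistent with those outputs.

```r
x <- c(7, 8, 10, 45)                    # assumed example vector

s <- seq(1, 10, by = 2)                 # 1 3 5 7 9
r <- rep(0, times = 4)                  # 0 0 0 0
size <- ifelse(x > 9, "big", "small")   # "small" "small" "big" "big"

# Arithmetic is applied element by element
x - x                                   # 0 0 0 0
-x^2                                    # -49 -64 -100 -2025 (^ binds tighter than unary minus)
```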
Vectors do not have to be numeric, but all their elements must be of the same type:
## [1] TRUE FALSE TRUE FALSE
## [1] "red" "blue" "green" "orange"
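A small sketch of non-numeric vectors follows. The variable names are hypothetical, and the last lines illustrate what appears to be the consequence of the same-type rule: mixing types in `c()` coerces everything to a common type.

```r
flags   <- c(TRUE, FALSE, TRUE, FALSE)            # logical vector
colours <- c("red", "blue", "green", "orange")    # character vector

class(flags)     # "logical"
class(colours)   # "character"

# Mixing types does not fail; R coerces to a single common type
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
class(mixed)     # "character"
```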
Most R operations and functions are vectorized: they apply to whole vectors, not just scalars.
## [1] 1 4 9 16 25 36 49 64 81 100
^2 applies to each element in the vector.
Basic R operations follow a recycling rule: if two vectors have different lengths, the shorter one will be repeated to match the length of the longer one.
## [1] 0 0 3 37
## [1] 7.000000 1.000000 0.100000 6.708204
Single numbers are vectors of length 1 for purposes of recycling:
## [1] 14 16 20 90
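The recycling rule can be sketched with expressions that are consistent with the outputs above. The shorter vectors `c(7, 8)` and `c(1, 0, -1, 0.5)` are assumptions inferred from those outputs, not taken from the original code.

```r
x <- c(7, 8, 10, 45)     # assumed example vector

(1:10)^2                 # 1 4 9 16 25 36 49 64 81 100

# c(7, 8) is recycled to c(7, 8, 7, 8) before subtracting
x - c(7, 8)              # 0 0 3 37

# Same-length vectors are combined element by element
x^c(1, 0, -1, 0.5)       # 7 1 0.1 6.708204

# A single number is a length-1 vector, recycled across all of x
2 * x                    # 14 16 20 90
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.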
We can do componentwise comparisons with vectors:
## [1] FALSE FALSE TRUE TRUE
Logical operators also work elementwise:
## [1] FALSE FALSE TRUE FALSE
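A brief sketch of comparisons and logical operators applied to a whole vector. The thresholds 9 and 20 are assumptions chosen to be consistent with the outputs above.

```r
x <- c(7, 8, 10, 45)     # assumed example vector

x > 9                    # FALSE FALSE TRUE TRUE
(x > 9) & (x < 20)       # FALSE FALSE TRUE FALSE
(x > 9) | (x < 8)        # TRUE FALSE TRUE TRUE
!(x > 9)                 # TRUE TRUE FALSE FALSE
```

The single-character forms `&` and `|` work elementwise; the doubled forms `&&` and `||` are meant for single TRUE/FALSE values, for example inside an `if` condition.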
Many functions can take vectors as arguments:
- mean(), median(), sd(), var(), max(), min(), length(), and sum() return single numbers
- sort() returns a new vector
- hist() takes a vector of numbers and produces a histogram, a highly structured object, with the side effect of making a plot
- ecdf() similarly produces an empirical cumulative-distribution-function object
- summary() gives a five-number summary (plus the mean) of numerical vectors
- any() and all() are useful on Boolean vectors

Vector of indices:
## [1] 8 45
Vector of negative indices:
## [1] 8 45
## [1] 7 8 10 45
## [1] FALSE FALSE TRUE TRUE
## [1] 10 45
## [1] -10 -45
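The three indexing styles shown above (positive indices, negative indices, and a Boolean vector) can be sketched together. The companion vector `y <- -x` is an assumption that is consistent with the `-10 -45` output; the last lines use the Boolean-vector functions mentioned earlier.

```r
x <- c(7, 8, 10, 45)     # assumed example vector
y <- -x                  # assumed companion vector: -7 -8 -10 -45

x[c(2, 4)]               # keep positions 2 and 4: 8 45
x[c(-1, -3)]             # drop positions 1 and 3: 8 45

x > 9                    # FALSE FALSE TRUE TRUE
x[x > 9]                 # elements of x where the condition holds: 10 45
y[x > 9]                 # a condition on x can index another vector: -10 -45

any(x > 40)              # TRUE: at least one element exceeds 40
all(x > 9)               # FALSE: not every element exceeds 9
sum(x > 9)               # 2: TRUE counts as 1 when summed
mean(x[x > 9])           # 27.5
```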
which() gives the positions (indices) of the elements of a Boolean vector that are TRUE:
## [1] 3 4
## [1] -10 -45
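A short sketch of `which()`, again assuming `x <- c(7, 8, 10, 45)` and `y <- -x`; the name `places` is hypothetical.

```r
x <- c(7, 8, 10, 45)       # assumed example vector
y <- -x                    # assumed companion vector

places <- which(x > 9)     # positions where the condition is TRUE
places                     # 3 4
y[places]                  # -10 -45

# When nothing matches, which() returns a zero-length integer vector
length(which(x > 100))     # 0
```

Storing the positions is convenient when the same selection has to be applied to several vectors.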
We can give names to elements/components of vectors, and index vectors accordingly
## [1] "v1" "v2" "v3" "fred"
## fred   v1
##   45    7
Note: here R is printing the labels; these are not additional
components of x.
names() returns another vector (of characters):
## [1] "fred" "v1" "v2" "v3"
## [1] 4
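The naming steps above can be sketched as follows, assuming the same example vector `x`. The names themselves are taken from the printed output.

```r
x <- c(7, 8, 10, 45)                      # assumed example vector
names(x) <- c("v1", "v2", "v3", "fred")   # attach a name to each element

names(x)                    # "v1" "v2" "v3" "fred"
x[c("fred", "v1")]          # index by name: fred = 45, v1 = 7
length(x)                   # still 4: names are labels, not extra elements

# names(x) is an ordinary character vector, so vector functions apply to it
sort(names(x))              # "fred" "v1" "v2" "v3"
which(names(x) == "fred")   # 4
```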
A matrix is a specialization of a 2d array (or a 2d generalization of a vector).
- Handy for algebraic operations
- Highly efficient for numerical computation
- Matrices do not need to be numeric (could be a matrix of characters), but all the elements must be of the same data type
Useful to think of a matrix as a long vector that is being wrapped into a pre-specified number of rows and columns.
matrix(x, nrow = , ncol =):
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
## [1] TRUE
## [1] TRUE
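A minimal sketch of the construction above. The input vector `c(40, 1, 60, 3)` is inferred from the printed matrix, and the name `A` is an illustrative choice.

```r
# The long vector is wrapped column by column into 2 rows
A <- matrix(c(40, 1, 60, 3), nrow = 2)
A
#      [,1] [,2]
# [1,]   40   60
# [2,]    1    3

is.array(A)     # TRUE: a matrix is a 2d array
is.matrix(A)    # TRUE
dim(A)          # 2 2 (rows, columns)
A[2, 1]         # row 2, column 1: 1
```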
- Could also specify ncol for the number of columns
- matrix() fills column by column by default; to fill row by row, use byrow=TRUE
- matrix will recycle the inputs until it has nrow*ncol elements, following the recycling rule.
- cbind combines two (or more) matrices by column (i.e. matrices are put side-by-side and stuck together).
- rbind combines two (or more) matrices by row (i.e. matrices are vertically stacked together).
- c(A) will flatten a matrix into a vector. Equivalently, as.vector(A) does the same thing.
- as.matrix type-casts a vector into a matrix with a single column (values remain unchanged).

Matrices have their own special multiplication operator, written
%*%:
##      [,1] [,2] [,3]
## [1,]    7    7    7
## [2,]    7    7    7
##      [,1] [,2] [,3]
## [1,]  700  700  700
## [2,]   28   28   28
We can also multiply a matrix and a vector.
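The helpers and the `%*%` operator described above can be tried on small matrices. This is a sketch under assumptions: `A` and `B` are inferred from the printed outputs, and the other names (`by_row`, `recycled`) are hypothetical.

```r
# Assumed example matrices (values chosen to match the printed output above)
A <- matrix(c(40, 1, 60, 3), nrow = 2)   # filled column by column
B <- matrix(rep(7, 6), ncol = 3)         # 2 x 3 matrix of sevens

# Filling by row, and recycling a short input
by_row   <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
recycled <- matrix(1:2, nrow = 2, ncol = 3)       # rows are 1 1 1 and 2 2 2

# Combining and flattening
dim(cbind(A, B))      # 2 5: placed side by side
dim(rbind(A, A))      # 4 2: stacked vertically
c(A)                  # 40 1 60 3
dim(as.matrix(1:3))   # 3 1: a single column

# Matrix multiplication
A %*% B               # rows are 700 700 700 and 28 28 28
A %*% c(1, 1)         # the vector acts as a column: a 2 x 1 matrix with 100 and 4
A * A                 # by contrast, * multiplies element by element
```

For `%*%` the inner dimensions must agree: here `A` is 2 x 2 and `B` is 2 x 3, so the product is 2 x 3.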
Row/column sums, or row/column means:
## [1] 100 4
## [1] 41 63
## [1] 50 2
## [1] 20.5 31.5
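The four summaries above can be reproduced as follows, assuming the same 2 x 2 matrix `A` used for the earlier output. The final `apply()` line is an addition for illustration, showing a more general way to apply a function over rows.

```r
A <- matrix(c(40, 1, 60, 3), nrow = 2)   # assumed example matrix

rowSums(A)          # 100 4
colSums(A)          # 41 63
rowMeans(A)         # 50 2
colMeans(A)         # 20.5 31.5

# apply() generalizes this: margin 1 means rows, margin 2 means columns
apply(A, 1, max)    # largest entry in each row: 60 3
```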