Statistical Computing for Data Analysis
Data types are the fundamental building blocks. Data structures organize these data types.
## Common data structures in R
Vector
A one-dimensional sequence of values, all of the same type
Matrix
A two-dimensional array of values, all of the same type
List
A collection of objects that can be of different types and structures
Data frame
A table-like structure where each column is a vector; different columns may have different types
Factor
A special object used to represent categorical data
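The five structures listed above can be sketched in a few lines of R. The variable names and values here are made up purely for illustration.

```r
# Vector: one dimension, all values of the same type
v <- c(1.5, 2.0, 3.5)

# Matrix: two dimensions, all values of the same type
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may differ in type and structure
l <- list(name = "Ann", scores = v, grid = m)

# Data frame: each column is a vector; columns may differ in type
df <- data.frame(id = 1:3, income = v, student = c("No", "Yes", "No"))

# Factor: categorical data with a fixed set of levels
f <- factor(c("No", "Yes", "No"))

str(df)   # compact summary of the data frame's columns and types
```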
A factor is a data structure for categorical variables.
It stores both the set of levels and the order of those levels.
It is more than a character vector, which stores no level or order information.
The factor levels affect the behavior of downstream functions, such as the row order of a table or the reference category of a model.
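A minimal sketch of the difference between a character vector and a factor, using a hypothetical `sizes_chr` vector invented for this example. It shows that the level order is stored with the factor and is picked up by `table()`.

```r
sizes_chr <- c("medium", "small", "large", "small")

# By default, levels are the sorted unique values
sizes_default <- factor(sizes_chr)
levels(sizes_default)    # "large" "medium" "small"

# Levels can be given explicitly to encode a meaningful order
sizes_fct <- factor(sizes_chr, levels = c("small", "medium", "large"))
levels(sizes_fct)        # "small" "medium" "large"

# Internally, a factor is a vector of integer codes indexing the levels
as.integer(sizes_fct)    # 2 1 3 1

# Downstream functions follow the level order
table(sizes_chr)         # columns in alphabetical order
table(sizes_fct)         # columns in the order small, medium, large
```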
## 'data.frame': 400 obs. of 11 variables:
## $ Income : num 14.9 106 104.6 148.9 55.9 ...
## $ Limit : num 3606 6645 7075 9504 4897 ...
## $ Rating : num 283 483 514 681 357 569 259 512 266 491 ...
## $ Cards : num 2 3 4 3 2 4 2 2 5 3 ...
## $ Age : num 34 82 71 36 68 77 37 87 66 41 ...
## $ Education: num 11 15 11 11 16 10 12 9 13 19 ...
## $ Own : Factor w/ 2 levels "No","Yes": 1 2 1 2 1 1 2 1 2 2 ...
## $ Student : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
## $ Married : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
## $ Region : Factor w/ 3 levels "East","South",..: 2 3 3 3 2 2 1 3 2 1 ...
## $ Balance : num 333 903 580 964 331 ...
factor() creates a new factor with specified levels.
table() and str() inspect a factor.
levels() extracts the levels, and levels<- sets them.
reorder(factor, numbers, FUN) reorders the levels of a factor according to a summary statistic FUN of numbers, computed within each level.
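The functions above can be tried on a small example. The `region` and `balance` vectors below are hypothetical values chosen for illustration, not taken from the data set shown earlier.

```r
region <- factor(c("South", "West", "West", "East", "South"),
                 levels = c("East", "South", "West"))
balance <- c(300, 900, 700, 1000, 500)

# Inspect the factor
table(region)    # counts per level: East 1, South 2, West 2
str(region)

# Extract the levels, then rename them in place
levels(region)                       # "East" "South" "West"
levels(region) <- c("E", "S", "W")

# Reorder the levels by the mean balance within each level
# (means: E = 1000, S = 400, W = 800)
region_by_balance <- reorder(region, balance, FUN = mean)
levels(region_by_balance)            # "S" "W" "E"
```

Note that `reorder()` changes only the order of the levels; the values of the factor stay the same.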
Keep in mind that character variables, numeric variables, and factors are
treated differently in R.
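Many R functions handle factors sensibly: for example, summary() and table() report
counts per level, and lm() converts a factor into indicator variables automatically.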
The order of factor levels can affect statistical interpretations and visualizations.
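As one illustration of how level order changes interpretation, the sketch below fits the same linear model twice with a different first (reference) level. The `student` and `balance` values are invented for this example. `relevel()` is one base R way to change the reference level.

```r
student <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
balance <- c(400, 500, 450, 800, 900, 850)

# "No" is the first level, so it is the reference category
fit_no <- lm(balance ~ student)
coef(fit_no)     # (Intercept) = 450, studentYes = 400

# Make "Yes" the first level instead
student_yes <- relevel(student, ref = "Yes")
fit_yes <- lm(balance ~ student_yes)
coef(fit_yes)    # (Intercept) = 850, student_yesNo = -400
```

Both fits describe the same data, but the intercept and the sign of the slope depend on which level comes first. Plotting functions such as `boxplot()` likewise arrange groups in level order.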