Data Structures: Factors in R

Statistical Computing for Data Analysis

Data Structures

Data types are the fundamental building blocks. Data structures organize these data types.

Common data structures in R

Factor

Credit dataset

library(ISLR2)  # provides the Credit data set
str(Credit)     # compact overview of each column's type and first values
## 'data.frame':    400 obs. of  11 variables:
##  $ Income   : num  14.9 106 104.6 148.9 55.9 ...
##  $ Limit    : num  3606 6645 7075 9504 4897 ...
##  $ Rating   : num  283 483 514 681 357 569 259 512 266 491 ...
##  $ Cards    : num  2 3 4 3 2 4 2 2 5 3 ...
##  $ Age      : num  34 82 71 36 68 77 37 87 66 41 ...
##  $ Education: num  11 15 11 11 16 10 12 9 13 19 ...
##  $ Own      : Factor w/ 2 levels "No","Yes": 1 2 1 2 1 1 2 1 2 2 ...
##  $ Student  : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
##  $ Married  : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
##  $ Region   : Factor w/ 3 levels "East","South",..: 2 3 3 3 2 2 1 3 2 1 ...
##  $ Balance  : num  333 903 580 964 331 ...
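
The `Factor w/ 2 levels "No","Yes": 1 2 1 ...` lines in the output above hint at how a factor is stored: a vector of integer codes plus a set of level labels. A minimal sketch, using a small made-up vector (the `student` values are illustrative and not taken from `Credit`):

```r
# Hypothetical toy data, not taken from Credit
student <- factor(c("No", "Yes", "No", "No", "Yes"))

levels(student)      # the distinct labels: "No" "Yes"
as.integer(student)  # the integer codes indexing into levels(student): 1 2 1 1 2
nlevels(student)     # number of levels: 2
```

By default the levels are the sorted unique values, which is why "No" gets code 1 and "Yes" gets code 2.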

Working with factors

Factors in R functions

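Many R functions handle factors specially: for example, summary() and table() report counts per level, plot() draws a bar chart, and modeling functions such as lm() convert factors to indicator variables automatically.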
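
As a rough illustration of this factor-aware behavior, the sketch below applies two base R functions to a small made-up factor (the `region` values are hypothetical, chosen only to resemble the `Region` column):

```r
# Hypothetical toy data resembling the Region column
region <- factor(c("South", "West", "West", "East", "South", "South"))

summary(region)  # counts per level rather than a numeric summary
table(region)    # the same counts as a contingency table

# plot(region) would draw a bar chart of these counts
```

For a character vector, `summary()` would only report length and mode, so converting categorical columns to factors is what makes the per-level counts available.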

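The order of factor levels can affect statistical interpretations and visualizations: under R's default contrasts, the first level serves as the reference category in models such as lm(), and plots arrange categories in level order.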
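
One way to control level order is sketched here with base R's `relevel()` and the `levels` argument of `factor()`. The `region` vector is again made up for illustration:

```r
# Hypothetical toy data resembling the Region column
region <- factor(c("South", "West", "West", "East", "South", "South"))
levels(region)         # "East" "South" "West" (alphabetical default)

# Move "South" to the front so it becomes the reference level
region_south <- relevel(region, ref = "South")
levels(region_south)   # "South" "East" "West"

# Or spell out the full order explicitly
region_custom <- factor(region, levels = c("West", "South", "East"))
table(region_custom)   # counts are reported in the new level order
```

Reordering changes only the levels and their codes, not the underlying labels, so the data values themselves stay the same.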