R Basics: Data Types and Operators

Statistical Computing for Data Analysis

How R Runs Code

R runs code interactively
- You type a line of code and R evaluates it immediately.
- Results appear right away in the Console.
There is no separate “compile” step
- You do not need to compile an entire program before running code.
- Each command is evaluated as it is submitted.
Your work lives in a session
- Objects you create (data, models, plots) stay in memory.
- You can build on previous commands step by step.
This supports exploration
- Easy to inspect data, try ideas, fix mistakes, and rerun code.
- Well-suited for data analysis and statistics.
R Markdown uses the same model—but with discipline
- When you knit, R starts a fresh session and runs the document from top to bottom.
- This forces all data, packages, and objects to be created explicitly, improving reproducibility.

Outline

Data Types
Numerical and logical operations
Inspecting data
Assignment
Workspace and reproducibility

Two basic types of things/objects: data and functions

Data: things like 7, “seven”, \(7.000\), and \(\left[ \begin{array}{ccc} 7 & 7 & 7 \\ 7 & 7 & 7\end{array}\right]\)
Functions: things like log, +, <, %% , and mean.

A function is a tool that takes input (arguments), applies a rule, and returns an output. Some functions also produce side effects, such as making a plot or printing output.
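
As a minimal sketch of this idea, the function below takes one argument, applies a rule, and returns a value. The name `circle_area` is made up for this illustration and is not part of R; the call to `print()` shows a side effect.

```r
# circle_area() is a hypothetical helper written for this illustration.
circle_area <- function(radius) {
  pi * radius^2          # the value of the last expression is returned
}

area <- circle_area(2)   # input: 2, output: a number stored in `area`
print(area)              # side effect: prints the value to the Console
```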

Programming is writing functions to transform inputs into outputs.

In R, everything is an object: data, functions, and results
Objects can exist with or without names
Functions take objects as input and return new objects
Complex programming tasks are built by combining simple functions and understanding how objects in R behave.

The trick to good programming is to take a big transformation and break it down into smaller ones, and then break those down, until you come to tasks that are easy (using built-in functions). This is referred to as modularity. It also makes code much more readable and easier to debug.
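
One way to picture modularity is a small conversion task split into two simple steps. The task and all function names below are hypothetical, chosen only to illustrate the idea.

```r
# Hypothetical task: convert a temperature from Fahrenheit to Kelvin.
# Each small step gets its own function; the names are made up for this sketch.
fahrenheit_to_celsius <- function(temp_f) (temp_f - 32) * 5 / 9
celsius_to_kelvin <- function(temp_c) temp_c + 273.15

# The bigger transformation is just a combination of the two simple ones.
fahrenheit_to_kelvin <- function(temp_f) {
  celsius_to_kelvin(fahrenheit_to_celsius(temp_f))
}

fahrenheit_to_kelvin(32)   # 32 F is 0 C, i.e. 273.15 K
```

If one step turns out to be wrong, only that small function needs to be fixed and retested.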

Data Types

At a low level, computers store all data as bits and manipulate those bits.

In R, data has specific types. When we talk about data types in R, we mean things like:

"logical" (e.g. TRUE, FALSE)
"integer"
"numeric" (e.g. 3.245, pi; floating-point values, which typeof() reports as "double")
"character" (e.g. "hello world")

These describe how values are stored in memory, not how they are interpreted.
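
A quick way to check the storage type of a value is `typeof()`. The short sketch below runs it on one value of each kind; the `L` suffix is how R marks an integer literal.

```r
typeof(TRUE)            # "logical"
typeof(7L)              # "integer" (the L suffix requests an integer)
typeof(7)               # "double"  (plain numbers are stored as doubles)
typeof(3.245)           # "double"
typeof("hello world")   # "character"
```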

Operators are functions that act on data
The type of the data determines:
- Which operators are allowed
- What the result will be
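
The sketch below illustrates both points with `+` and `/`. The use of `try()` is only a convenience so the example keeps running after the deliberate error; the variable name `result` is made up for this illustration.

```r
7 + 5            # arithmetic on two numbers gives a number: 12
TRUE + TRUE      # logicals are treated as 1 and 0 in arithmetic: 2
typeof(7L / 2L)  # dividing two integers gives a "double"

# Arithmetic is not defined for character data, so this call fails.
# try() is used here only so the script keeps running after the error.
result <- try("7" + 5, silent = TRUE)
inherits(result, "try-error")   # TRUE
```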

Operators

Unary: take just one argument. E.g., - for arithmetic negation, ! for Boolean negation
Binary: take two arguments. E.g., +, -, *, and / (though this is only a partial operator). Also, %% (for mod), and ^ (again partial)

-7

## [1] -7

7 + 5

## [1] 12

7 - 5

## [1] 2


7 * 5

## [1] 35

7 ^ 5

## [1] 16807

7 / 5

## [1] 1.4

7 %% 5

## [1] 2
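
Because operators are functions, they can also be called in function form by wrapping the name in backticks. The sketch below shows that, and pairs `%%` with the integer-division operator `%/%`, which does not appear in the examples above.

```r
`+`(7, 5)        # same as 7 + 5, i.e. 12
`-`(7)           # unary minus called as a function: -7

7 %/% 5          # integer quotient: 1
7 %% 5           # remainder: 2
(7 %/% 5) * 5 + (7 %% 5) == 7   # TRUE: quotient and remainder rebuild 7

(-7) %% 5        # 3, because the result takes the sign of the divisor
```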

Comparison operators

These are also binary operators; they take two objects, and give back a Boolean

7 > 5

## [1] TRUE

7 < 5

## [1] FALSE

7 >= 7

## [1] TRUE


7 <= 5

## [1] FALSE

7 == 5

## [1] FALSE

7 != 5

## [1] TRUE

Warning: == is a comparison operator, = is not!
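
A small sketch of the difference, using a throwaway variable `x`: a single `=` stores a value under a name, while `==` asks a question and leaves the value unchanged.

```r
x = 5        # assignment: stores 5 under the name x, prints nothing
x == 5       # comparison: asks whether x equals 5, gives TRUE
x == 7       # comparison: gives FALSE, and x is still 5
x
```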

Logical operators

The basic ones are & (and) and | (or)

(5 > 7) & (6 * 7 == 42)

## [1] FALSE

(5 > 7) | (6 * 7 == 42)

## [1] TRUE

Note: The double forms && and || are different! We’ll see them later.
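
A typical use of these operators is to combine comparisons into a single condition, for instance a range check. The variable names and values below are hypothetical.

```r
age <- 7                          # hypothetical value for illustration
in_range <- (age > 5) & (age < 10)
in_range                          # TRUE: both comparisons hold

!in_range                         # FALSE: ! negates a logical value
(age < 5) | (age > 10)            # FALSE: neither comparison holds
```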

Type-telling functions

Certain functions can tell you whether the data is of a certain type:

The typeof() function returns the data type
is.foo() functions return Booleans for whether the argument is of type foo
is.numeric()
is.character()

This can also be extended to data structures (as we will see later).

typeof(7)

## [1] "double"

is.numeric(7)

## [1] TRUE

is.na(7)

## [1] FALSE

is.na(7/0)

## [1] FALSE

is.na(0/0)

## [1] TRUE

is.character("FALSE")

## [1] TRUE

is.character(FALSE)

## [1] FALSE


is.character(7)

## [1] FALSE

is.character("7")

## [1] TRUE

is.character("seven")

## [1] TRUE

is.na("seven")

## [1] FALSE
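
The results for `7/0` and `0/0` above are easier to read next to a few related checks. This sketch uses `is.infinite()`, `is.nan()`, and `is.integer()`, which are base R functions not shown in the examples above.

```r
is.infinite(7 / 0)   # TRUE: 7/0 is Inf, which is why is.na(7/0) is FALSE
is.nan(0 / 0)        # TRUE: 0/0 is NaN ("not a number")
is.na(NA)            # TRUE: NA is R's missing-value marker
is.nan(NA)           # FALSE: a missing value is not the same as NaN

is.integer(7)        # FALSE: 7 is stored as a double
is.integer(7L)       # TRUE
```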

Type-casting functions

These functions convert data of one type to another type. We’ll see later that this is incredibly useful when you need objects to be of certain types.

as.character(5/6)

## [1] "0.833333333333333"

as.numeric(as.character(5/6))

## [1] 0.8333333

6 * as.numeric(as.character(5/6))

## [1] 5

5/6 == as.numeric(as.character(5/6))

## [1] FALSE
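
The last result shows that converting to text and back can lose digits, so exact comparison with `==` is fragile for floating-point values. One common alternative is `all.equal()`, which compares up to a small tolerance. The sketch also shows two other conversions; the variable names `x` and `y` are made up for this illustration.

```r
x <- 5 / 6
y <- as.numeric(as.character(x))   # round trip through text

x == y                     # exact comparison, as in the example above
isTRUE(all.equal(x, y))    # TRUE: equal up to a small numerical tolerance

as.integer("7")            # 7L: text that looks like a number converts cleanly
as.integer(3.9)            # 3: conversion to integer drops the fractional part
suppressWarnings(as.numeric("seven"))   # NA: the text is not a number
```

Without `suppressWarnings()`, the last line also prints a warning that NAs were introduced by coercion.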

Data can have names

We can give names to data objects; these give us variables. Some variables are built-in:

pi

## [1] 3.141593

Variables can be arguments to functions or operators, just like constants:

pi * 10

## [1] 31.41593

cos(pi)

## [1] -1


We create variables with the assignment operator, <- or =

approx.pi = 22/7
approx.pi

## [1] 3.142857

diameter = 10
approx.pi * diameter

## [1] 31.42857

The assignment operator also changes values:

circumference = approx.pi * diameter
circumference

## [1] 31.42857

circumference = 30
circumference

## [1] 30
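
Two details of assignment are worth a short sketch: a variable can be updated from its own current value, and a variable computed from another one is not recalculated when that other variable changes. The name `count` is hypothetical.

```r
count <- 10          # <- is the other assignment operator; same effect as =
count <- count + 1   # the right-hand side is evaluated first, then stored
count                # 11

diameter <- 10
circumference <- (22 / 7) * diameter
diameter <- 20       # changing diameter does not update circumference
circumference        # still the value computed from the old diameter
```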


The code you write will use variables with descriptive names
Easier to design, easier to debug, easier to improve, and easier for others to read
This is a first step toward abstraction: focusing on what a value represents, rather than how it is computed

R workspace

Your current R environment is called the workspace. It contains all the variables (named data objects), data structures, and functions you have defined.

Your workspace lives in your computer's memory and is cleared when you exit RStudio, provided you do not save the workspace image.

To see what variables are in your workspace, use ls():

x = 5
ls()

## [1] "approx.pi"     "circumference" "diameter"      "x"

Getting rid of variables:

rm(x)
ls()

## [1] "approx.pi"     "circumference" "diameter"

rm(list=ls()) # Be warned! This erases everything
ls()

## character(0)
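
To check for a single name instead of listing everything, base R provides `exists()`. The sketch below uses a hypothetical variable `radius_cm`; `inherits = FALSE` restricts the check to the current environment.

```r
radius_cm <- 5                           # hypothetical variable name
exists("radius_cm", inherits = FALSE)    # TRUE: the name is defined here
"radius_cm" %in% ls()                    # TRUE: ls() lists it by name

rm(radius_cm)
exists("radius_cm", inherits = FALSE)    # FALSE: the name is gone
```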

Reproducibility

For reproducibility, you want any meaningful work to be fully repeatable by others. Namely, results should be reproduced by running the code you saved, not by saving the results directly.
Thus, as already mentioned, saving your workspace onto your hard disk is not a good idea.
What is a good idea? Writing clear, well-organized, well-documented code in an .R or .Rmd file.
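
As a rough sketch of what such a file might look like, the script below recomputes the circumference example from scratch. The file name `circumference.R` is hypothetical; the point is that every value the result depends on is created inside the script, so running it in a fresh session gives the same answer.

```r
# circumference.R (hypothetical file name)
# Every input is defined here, so nothing depends on a saved workspace.

diameter <- 10                           # input, stated explicitly
approx_pi <- 22 / 7                      # approximation used in this example
circumference <- approx_pi * diameter    # result is recomputed on every run

print(circumference)
```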