Statistical Computing for Data Analysis
Examples of functions: log, +, <, %%, and mean.
A function is a tool that takes input (arguments), applies a rule, and returns an output. Some functions also produce side effects, such as making a plot or printing output.
The trick to good programming is to take a big transformation and break it down into smaller ones, and then break those down, until you come to tasks which are easy (using built-in functions). This is referred to as modularity. It also makes code much more readable and easier to debug.
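As a minimal sketch of modularity, the example below builds a larger transformation (standardizing a vector) out of two small helpers. The names center, rescale, and standardize are hypothetical and chosen only for illustration; mean() and sd() are built-in.

```r
# Small, single-purpose helpers (hypothetical names, for illustration only).
center  <- function(x) x - mean(x)   # subtract the mean
rescale <- function(x) x / sd(x)     # divide by the standard deviation

# The larger transformation is just the composition of the small ones.
standardize <- function(x) rescale(center(x))

z <- standardize(c(2, 4, 6, 8))
mean(z)   # approximately 0
sd(z)     # approximately 1
```

Each helper can be tested and debugged on its own, which is the practical payoff of breaking the task down.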
At a low level, computers store all data using bits and manipulate data values.
In R, data has specific types. When we talk about data types in R, we mean things like:
"logical" (e.g. TRUE, FALSE)
"integer"
"numeric" (e.g. 3.245, pi, floating-point values; typeof() reports these as "double")
"character" (e.g. "hello world")
These describe how values are stored in memory, not how they are interpreted.
Unary operators: - for arithmetic negation, ! for Boolean negation
Binary operators: +, -, *, and / (though this is only a partial operator). Also, %% (for mod), and ^ (again partial)
## [1] -7
## [1] 12
## [1] 2
## [1] 35
## [1] 16807
## [1] 1.4
## [1] 2
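The printed results above are consistent with applying each operator to the operands 7 and 5. The inputs below are an inference from those outputs, not a copy of the original code:

```r
-7        # unary negation: -7
7 + 5     # 12
7 - 5     # 2
7 * 5     # 35
7 ^ 5     # 16807
7 / 5     # 1.4
7 %% 5    # 2 (remainder after dividing 7 by 5)
```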
Comparison operators are also binary operators; they take two objects and give back a Boolean.
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
Warning: == is a comparison operator; = is not (it performs assignment)!
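The comparison results above are consistent with the inputs below, which again assume the operands 7 and 5. The last two lines illustrate the warning, using a hypothetical variable y:

```r
7 > 5     # TRUE
7 < 5     # FALSE
7 >= 7    # TRUE
7 <= 5    # FALSE
7 == 5    # FALSE
7 != 5    # TRUE

y = 5     # assignment: y now holds 5, and nothing is printed
y == 5    # comparison: TRUE
```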
The basic logical operators are & (and) and | (or).
## [1] FALSE
## [1] TRUE
Note: The double forms && and || are different! We'll see them later.
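As a small sketch, the expressions below combine two comparisons with & and |. The particular comparisons are assumed for illustration; they produce the same FALSE and TRUE pattern as the output above.

```r
(5 > 7) & (6 * 7 == 42)   # FALSE: the first comparison is FALSE
(5 > 7) | (6 * 7 == 42)   # TRUE: the second comparison is TRUE
!(5 > 7)                  # TRUE: Boolean negation of FALSE
```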
Certain functions can tell you whether the data is of a certain type:
typeof() function returns the data type
is.foo() functions return Booleans for whether the argument is of type foo
is.numeric()
is.character()
This can also be extended to data structures (as we will see later).
## [1] "double"
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE
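The calls below are a short sketch of typeof() and the is.foo() family. The inputs are chosen for illustration and are not necessarily the ones that produced the output above.

```r
typeof(7)               # "double"
typeof(7L)              # "integer" (the L suffix requests an integer)
typeof(TRUE)            # "logical"
typeof("hello world")   # "character"

is.numeric(7)           # TRUE
is.numeric("7")         # FALSE: a string, even if it looks like a number
is.character(7)         # FALSE
is.character("7")       # TRUE
is.logical(TRUE)        # TRUE
```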
The as.foo() functions (e.g. as.character(), as.numeric()) convert one data type to another. We'll see that this is incredibly useful later when you need objects to be of certain types.
## [1] "0.833333333333333"
## [1] 0.8333333
## [1] 5
## [1] FALSE
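The output above is consistent with a round trip of 5/6 through a character string. The inputs below are inferred from that output and assume as.character() and as.numeric() were the functions used:

```r
as.character(5 / 6)                        # a string of digits, not a number
as.numeric(as.character(5 / 6))            # prints as 0.8333333
6 * as.numeric(as.character(5 / 6))        # prints as 5
5 / 6 == as.numeric(as.character(5 / 6))   # FALSE in the output above
```

The final comparison comes out FALSE because the string keeps only a limited number of digits, so converting back does not recover exactly the same floating-point value.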
We can give names to data objects; these give us variables. Some variables are built-in:
## [1] 3.141593
Variables can be arguments to functions or operators, just like constants:
## [1] 31.41593
## [1] -1
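The three results printed in this part are consistent with the built-in variable pi being printed, multiplied by a constant, and passed to a function. The exact inputs are an inference from the outputs:

```r
pi          # 3.141593
pi * 10     # 31.41593
cos(pi)     # -1
```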
We create variables with the assignment operator, <- or =.
## [1] 3.142857
## [1] 31.42857
The assignment operator also changes values:
## [1] 31.42857
## [1] 30
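A sketch of assignment that is consistent with the outputs above. The variable names are the ones that appear in the workspace listing later in these notes; the assigned values are inferred from the printed results.

```r
approx.pi <- 22 / 7                     # create a variable
approx.pi                               # 3.142857
diameter <- 10
approx.pi * diameter                    # 31.42857

circumference <- approx.pi * diameter   # store the result under a new name
circumference                           # 31.42857
circumference <- 30                     # assigning again overwrites the old value
circumference                           # 30
```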
Your current R environment is called the workspace. It contains all the variables (named data objects), data structures, and functions you have defined.
Your workspace lives in your computer's memory and will be cleared once you exit RStudio, provided you do not save the workspace image.
To see what variables are in your workspace, use ls():
## [1] "approx.pi" "circumference" "diameter" "x"
Getting rid of variables:
## [1] "approx.pi" "circumference" "diameter"
## character(0)
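A sketch of listing and removing variables with ls() and rm(), assuming a fresh session. The variable names match the listing above; the values assigned to them are illustrative.

```r
approx.pi <- 22 / 7
diameter <- 10
circumference <- approx.pi * diameter
x <- 1

ls()              # "approx.pi" "circumference" "diameter" "x"
rm("x")           # remove a single variable by name
ls()              # "approx.pi" "circumference" "diameter"

# Caution: the next line removes everything in the workspace.
rm(list = ls())
ls()              # character(0)
```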
For reproducibility, you want any meaningful work to be fully repeatable by others. Namely, results should be reproduced by running the code you saved, not by saving the results directly.
Thus, as already mentioned, saving your workspace onto your hard disk is not a good idea.
What is a good idea? Writing clear, well-organized, well-documented code in a .R or .Rmd file.