x <- 5
x
[1] 5
R was created by Ross Ihaka and Robert Gentleman of the University of Auckland in 1993. It was derived from the commercial S programming language (no kidding), which was created in 1976.
Version 1.0.0 was released in 2000. Current version is 4.1.1.
In 2017, CRAN (the official package repository) had more than 10,000 packages. Today it has 18,214 packages.
Ranked as the 9th most popular language in the TIOBE index as of September 2021.
(Personal opinions)
Along with Python, one of the two most powerful scripting languages for data analysis. (Julia, first released in 2012, is an emerging third.)
Syntax and style are geared more toward non-computer scientists (especially in the tidyverse).
Excellently curated and managed package manager (CRAN).
A powerhouse focused on data analytics. Many packages include implementations of novel research papers which cannot be found elsewhere.
Supported by a powerful IDE (RStudio).
Low learning curve for data analysis, visualization, publishing and interactive analysis.
Note: Python and R are not competitors. In many cases they complement each other. It is highly recommended to learn both.
Not quite as popular as Python in the CS community. Support is lagging behind in some areas (especially in cloud computing) compared to Python.
Despite a very convenient web framework (shiny), not greatly suited for scalable web applications without heavy modifications. (Still a great start)
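R runs single-threaded by default; parallel computing needs extra packages or setup. So, speed can be an issue.
R keeps data in memory, so data sets larger than the available RAM need special handling.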
Each disadvantage can be alleviated using a package or a solution. Its benefits far outweigh its disadvantages.
R is a vector-based language. When you call a function or do an operation, it is usually applied to every member of the vector. (It is a powerful feature which requires some time to learn.)
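To make the vector-based idea concrete, here is a minimal sketch. The variable names and values (prices, quantities) are made up for illustration and are not part of the course material.

```r
## Hypothetical example values
prices <- c(10, 20, 30)
quantities <- c(1, 2, 3)

## Arithmetic is applied element by element
totals <- prices * quantities
totals                ## 10 40 90

## A single value is recycled across the whole vector
discounted <- prices * 0.9
discounted            ## 9 18 27

## Many built-in functions work on every element as well
roots <- sqrt(c(4, 9, 16))
roots                 ## 2 3 4
```

No explicit loop is needed in any of these lines; that is what "vector-based" means in practice.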
Main data types are numeric, character and logical. But factor, integer, date, dttm (date-time) and some other types are also very common.
Main object types are vector, matrix, data.frame and list.
Assignment operators are “<-” and “=”. Aside from rare exceptions, they are the same (x <- 5 is the same as x = 5). Please be consistent in their use.
x <- 5
x
[1] 5
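One of the rare exceptions shows up inside function calls. The sketch below uses the base function mean(); the variable name values is made up for illustration.

```r
## "=" inside a call only names an argument; it does not create a variable
m1 <- mean(x = c(1, 2, 3))
m1                           ## 2

## "<-" inside a call assigns first, then passes the value on
m2 <- mean(values <- c(4, 5, 6))
m2                           ## 5
values                       ## 4 5 6
```

At the top level of a script the two operators behave the same, so picking one and sticking with it is usually enough.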
Numeric (double): 1.33, 5422.22…
Integer (integer): 3, 5, 6…
Character (character): “a”, “course”, “pizza”…
Boolean (logical): Either TRUE or FALSE.
Date (date) and date-time (dttm): “2020-07-28”, “2020-07-29 14:00:05.12 UTC+3”
Factor (factor): Numeric levels with labels of any kind.
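You can check the type of any value with class() and typeof(). The values below are illustrative. Note that date and dttm are the labels tibbles print; in base R the corresponding classes are Date and POSIXct.

```r
num_class <- class(1.33)       ## "numeric"
num_type  <- typeof(1.33)      ## "double"
int_class <- class(3L)         ## "integer" (the L suffix asks for an integer)
plain_3   <- class(3)          ## "numeric": a plain 3 is stored as a double
chr_class <- class("pizza")    ## "character"
lgl_class <- class(TRUE)       ## "logical"

d  <- as.Date("2020-07-28")
dt <- as.POSIXct("2020-07-29 14:00:05", tz = "UTC")
class(d)                       ## "Date"
class(dt)                      ## "POSIXct" "POSIXt"

## A factor stores integer codes plus a label for each level
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
as.integer(sizes)              ## 1 2 1
levels(sizes)                  ## "small" "large"
```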
The vector is the foundation stone of R object types. Even a variable with a single value is a vector (of length one); vectors whose elements all share one basic type are called “atomic” vectors.
Vectors with multiple values can be defined using the c() (“combine”) function.
x <- c("a","b","c")
x
[1] "a" "b" "c"
A vector can have only a single data type. R conveniently converts vectors to the most appropriate data type.
x <- c(1,"hi",FALSE) ## Vector of numeric, character and logical values
x ## converted to all character
[1] "1" "hi" "FALSE"
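The conversion follows a simple order: logical gives way to numeric, and numeric gives way to character. A short sketch with made-up vectors (v1, v2, v3 are illustrative names):

```r
## logical + numeric -> numeric (TRUE becomes 1, FALSE becomes 0)
v1 <- c(TRUE, FALSE, 2)
v1                       ## 1 0 2
class(v1)                ## "numeric"

## numeric + character -> character
v2 <- c(1, "hi")
class(v2)                ## "character"

## Explicit conversion back; values that cannot be converted become NA
## (R also raises a warning, silenced here)
v3 <- suppressWarnings(as.numeric(c("1", "hi")))
v3                       ## 1 NA

## A handy side effect: summing logicals counts the TRUE values
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true                   ## 2
```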
A matrix is simply a special two-dimensional vector.
mat1 <- matrix(1:9, ncol=3, nrow=3)
mat1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
We can get a value from a matrix by providing its location as row/column coordinates or simply by treating it as a vector (values are counted column by column).
mat1[2,2]
[1] 5
mat1[5]
[1] 5
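The centre cell is a special case because both ways of counting land on 5. A sketch with an off-centre cell makes the column-by-column order visible; mat2 is an extra illustrative matrix filled row by row.

```r
mat1 <- matrix(1:9, ncol = 3, nrow = 3)

cell  <- mat1[2, 3]    ## row 2, column 3 -> 8
as_v  <- mat1[8]       ## 8th value counting down the columns -> 8
row_2 <- mat1[2, ]     ## whole second row: 2 5 8
col_3 <- mat1[, 3]     ## whole third column: 7 8 9

## Filling by row instead changes where each value lands
mat2 <- matrix(1:9, ncol = 3, byrow = TRUE)
by_row_cell <- mat2[2, 3]   ## 6
```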
A data frame is also two-dimensional, but each column can be of a different data type.
df1 <- data.frame(some_numbers=1:3,
                  some_names=c("Blood","Sweat","Tears"),
                  some_logical=c(TRUE,FALSE,TRUE))
df1
  some_numbers some_names some_logical
1            1      Blood         TRUE
2            2      Sweat        FALSE
3            3      Tears         TRUE
Data frames are extremely powerful structures. Most of our work will be on data frames.
Note: In the dplyr package we will see a special version of data frames: the tibble.
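A few common ways to look inside a data frame, sketched in base R only. The data frame is rebuilt here so the example runs on its own; stringsAsFactors = FALSE is set explicitly as an assumption so that names stay character in older R versions too.

```r
df1 <- data.frame(some_numbers = 1:3,
                  some_names = c("Blood", "Sweat", "Tears"),
                  some_logical = c(TRUE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

## One column as a plain vector
names_col <- df1$some_names          ## "Blood" "Sweat" "Tears"

## One cell by row number and column name
one_cell <- df1[2, "some_names"]     ## "Sweat"

## Class of every column
col_classes <- sapply(df1, class)

## Keep only the rows where some_logical is TRUE
kept <- df1[df1$some_logical, ]
nrow(kept)                           ## 2
```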
Lists are like vectors but they can hold any object (including lists). You can also add names to lists.
list1 <- list(data_frame = df1,matrix = mat1,vector= x)
list1
$data_frame
  some_numbers some_names some_logical
1            1      Blood         TRUE
2            2      Sweat        FALSE
3            3      Tears         TRUE
$matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
$vector
[1] "1" "hi" "FALSE"
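Getting elements back out of a list takes a little care. The sketch below uses a small made-up list (the name list2 and its contents are illustrative) so that it runs on its own.

```r
list2 <- list(numbers = 1:3, word = "hi", nested = list(flag = TRUE))

by_name   <- list2$word           ## "hi"
by_double <- list2[["numbers"]]   ## the element itself: 1 2 3
by_single <- list2["numbers"]     ## still a list, of length one
deep      <- list2$nested$flag    ## lists inside lists: TRUE

## Elements can be added after creation
list2$extra <- c(2.5, 3.5)
length(list2)                     ## 4
names(list2)                      ## "numbers" "word" "nested" "extra"
```

Double brackets (or $) return the element; single brackets return a smaller list.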
Functions are very useful types as they allow you to run reusable code with dynamic inputs. For example, let’s write a function to calculate the area of a triangle.
area_of_triangle <- function(height,base_length){
  area <- height*base_length/2
  return(area) ## Return value using return command
}
## You can assign the result of a function to a variable
x <- area_of_triangle(height = 3, base_length = 4)
x
[1] 6
Rule of thumb is “If you need to copy paste the same code three times, write a function instead.”
R has thousands of predefined functions to make life easier.
If you want to return multiple values, return a list.
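A minimal sketch of returning several values at once. The function describe_triangle is hypothetical (not from the course material) and assumes a right triangle whose two legs are the height and the base.

```r
## Hypothetical helper: returns two results in one named list
describe_triangle <- function(height, base_length){
  area <- height * base_length / 2
  ## Assumes a right triangle with legs height and base_length
  hypotenuse <- sqrt(height^2 + base_length^2)
  return(list(area = area, hypotenuse = hypotenuse))
}

result <- describe_triangle(height = 3, base_length = 4)
result$area          ## 6
result$hypotenuse    ## 5
```

Naming the list elements lets the caller pick out each result with $.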
Complete the base R document before attempting to solve these.
Write a function to convert Fahrenheit to Celsius and Celsius to Fahrenheit.
(X°C × 9/5) + 32 = Y°F
convert_temperature <- function(x, F_to_C = TRUE){
  if(F_to_C){
    return((x-32)*5/9)
  }else{
    return(x*9/5 + 32)
  }
}
convert_temperature(30,F_to_C = FALSE)
[1] 86
convert_temperature(86,F_to_C = TRUE)
[1] 30
Write a function to calculate the future value of an investment given annually compounded interest over a number of years.
\[FV = X \times (1 + i)^{T}\]
calculate_future_value <-
  function(investment, interest, duration_in_years){
    return(investment * ((1 + interest) ^ duration_in_years))
  }
## 100 units of investments 7% interest rate over 5 years
calculate_future_value(
  investment = 100, interest = 0.07, duration_in_years = 5)
[1] 140.2552
Write a function to randomly generate n color hex codes. You can use the predefined letters vector.
generate_hex_code <- function(n=1){
  hex_vec <- c(0:9,letters[1:6])
  colors <- c()
  for(i in 1:n){
    colors <- c(colors,
                paste0("#",
                       paste0(sample(hex_vec,6,replace=TRUE),collapse="")))
  }
  return(colors)
}
generate_hex_code(n=3)
[1] "#364395" "#e210ea" "#cce007"
Write a function which calculates the probability of getting k sixes in n throws of a die. Hint: Use binomial distribution.
get_prob_dice <- function(k,n){
  combination <- factorial(n)/(factorial(k) * factorial(n-k))
  probability <- (1/6)^k * (5/6)^(n-k)
  return(combination*probability)
}
get_prob_dice(3,5)
[1] 0.03215021
dbinom(3,5,prob=1/6) ## or simply use dbinom
[1] 0.03215021
Write a rock-scissors-paper game in which the computer chooses randomly.
rsp_game <- function(user,choices=c("rock","scissors","paper")){
  if(!(user %in% choices))
    return("Choose only rock, scissors or paper as input.")
  response <- sample(choices,1)
  if(user == response)
    return("I chose the same. Tie!")
  if((user == "rock" & response == "scissors") |
     (user == "scissors" & response == "paper") |
     (user == "paper" & response == "rock")){
    return(paste0("I chose ", response, ". You win!"))
  }else{
    return(paste0("I chose ", response, ". You lose!"))
  }
}
rsp_game("rock")
[1] "I chose scissors. You win!"
Check course webpage for more exercises!