1  Base R

1.1 Brief History of R

  • Created by Ross Ihaka and Robert Gentleman of the University of Auckland in 1993. It was derived from the commercial S programming language (no kidding), which was created in 1976.

  • Version 1.0.0 was released in 2000. The current version is 4.1.1.

  • In 2017, CRAN (the official package repository) had more than 10,000 packages. Today it has 18,214 packages.

  • Ranked as the 9th most popular language in the TIOBE index as of September 2021.

1.2 What is there to like about R?

(Personal opinions)

  • Along with Python, one of the two most powerful scripting languages for data analysis. (Julia, first released in 2012, is an emerging third.)

  • Syntax and style are geared more toward non-computer scientists. (Especially in the tidyverse.)

  • Excellently curated and managed package repository (CRAN).

  • A powerhouse focused on data analytics. Many packages include implementations of novel research papers which cannot be found elsewhere.

  • Supported by a powerful IDE (RStudio).

  • Low learning curve for data analysis, visualization, publishing and interactive analysis.

Note: Python and R are not competitors. In many cases they complement each other. It is highly recommended to learn both.

1.3 What are the disadvantages of R?

  • Not quite as popular as Python in the CS community. Support lags behind Python in some areas (especially cloud computing).

  • Despite a very convenient web framework (shiny), R is not well suited for scalable web applications without heavy modifications. (Still a great start.)

  • R code runs on a single thread by default. Parallel computing has to be set up explicitly, so speed can be an issue.

  • R keeps data in memory, so data sets larger than the available RAM are hard to work with.

Each disadvantage can be alleviated using a package or a workaround. R's benefits far outweigh its disadvantages.

1.4 Basic Features

  • R is a vector-based language. When you call a function or perform an operation, it is usually applied to every element of the vector. (It is a powerful feature which requires some time to learn.)

  • The main data types are numeric, character and logical, but factor, integer, date, dttm (date-time) and some other types are also very common.

  • The main object types are vector, matrix, data.frame and list.

  • The assignment operators are “<-” and “=”. Aside from rare exceptions, they behave the same (x <- 5 is the same as x = 5). Whichever you choose, use it consistently.

x <- 5
x
[1] 5
  • The R console is completely interactive. You can run anything line by line.
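
To illustrate the vector-based behavior described above, here is a minimal sketch. The variable names (prices, discounted, roots, is_expensive) are made up for illustration only.

```r
## A numeric vector with four elements
prices <- c(10, 20, 30, 40)

## Arithmetic is applied to every element at once, no loop needed
discounted <- prices * 0.9

## Most built-in functions are vectorized as well
roots <- sqrt(c(1, 4, 9))

## Comparisons return a logical vector of the same length
is_expensive <- prices > 25

discounted
roots
is_expensive
```

Each result has one value per element of the input vector, which is why explicit loops are needed less often in R than in many other languages.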

1.5 Data Types

  • Numeric (double): 1.33, 5422.22…

    • There is also integer: 3L, 5L, 6L… (The L suffix marks an integer; a plain 3 is stored as a double.)
  • Character (character): “a”, “course”, “pizza”…

  • Boolean (logical): Either TRUE or FALSE.

  • Date (date) and date-time (dttm): “2020-07-28”, “2020-07-29 14:00:05.12 UTC+3”

    • This part is a bit complicated with POSIXct and POSIXt types.
  • Factor (factor): Integer codes paired with labels (called levels).

    • Encountered rarely in this course.
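
One way to check the type of a value is the class() function (typeof() shows the underlying storage type). The following is a small sketch using only base R. The example values are arbitrary.

```r
num_class  <- class(1.33)                      ## "numeric"
num_type   <- typeof(1.33)                     ## "double"
int_class  <- class(3L)                        ## "integer"
chr_class  <- class("pizza")                   ## "character"
lgl_class  <- class(TRUE)                      ## "logical"
date_class <- class(as.Date("2020-07-28"))     ## "Date"
fct_class  <- class(factor(c("low", "high")))  ## "factor"

## Date-times carry two classes: "POSIXct" and "POSIXt"
dttm_class <- class(as.POSIXct("2020-07-29 14:00:05", tz = "UTC"))
dttm_class
```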

1.6 Object Types

1.6.1 Vector

The vector is the foundation of R's object types. Even a variable with a single value is a vector (of length one). Vectors whose elements all share one basic data type are called “atomic” vectors.

Vectors with multiple values can be defined using the c() (“combine”) function.

x <- c("a","b","c")
x
[1] "a" "b" "c"

A vector can hold only a single data type. When types are mixed, R silently converts (coerces) all elements to the most general type among them.

x <- c(1,"hi",FALSE) ## Vector of numeric, character and logical values
x ## converted to all character
[1] "1"     "hi"    "FALSE"
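
The coercion order is logical, then integer, then double, then character. A short sketch with made-up vectors (v1, v2, v3) shows each step:

```r
## logical + numeric -> numeric (TRUE becomes 1, FALSE becomes 0)
v1 <- c(TRUE, FALSE, 2.5)

## integer + double -> double
v2 <- c(1L, 2.5)

## anything + character -> character
v3 <- c(1, TRUE, "hi")

class(v1)
typeof(v2)
class(v3)
```

Because the conversion is silent, it is worth checking class() when a vector does not behave as expected.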

1.6.2 Matrix

A matrix is simply a vector with two dimensions.

mat1<-matrix(1:9, ncol=3, nrow=3)
mat1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

We can get a value from a matrix by providing its location as row/column coordinates or simply by treating it as a vector. Values are stored column by column, so the fifth element is the one in row 2, column 2.

mat1[2,2]
[1] 5
mat1[5]
[1] 5
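
Leaving one coordinate empty selects a whole row or column. The sketch below also shows the byrow argument of matrix(), which changes the fill order. The name mat2 is introduced here only for illustration.

```r
mat1 <- matrix(1:9, ncol = 3, nrow = 3)

second_row   <- mat1[2, ]   ## 2 5 8
third_column <- mat1[, 3]   ## 7 8 9
dim(mat1)                   ## 3 rows, 3 columns

## byrow = TRUE fills the matrix row by row instead of column by column
mat2 <- matrix(1:9, ncol = 3, byrow = TRUE)
mat1[1, 2]   ## 4
mat2[1, 2]   ## 2
```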

1.6.3 Data Frame

A data frame is also two-dimensional, but each column can be of a different data type.

df1 <- data.frame(some_numbers=1:3,
                  some_names=c("Blood","Sweat","Tears"),
                  some_logical=c(TRUE,FALSE,TRUE))
df1
  some_numbers some_names some_logical
1            1      Blood         TRUE
2            2      Sweat        FALSE
3            3      Tears         TRUE

Data frames are extremely powerful structures. Most of our work will be on data frames.

Note: In the dplyr package we will see a special version of the data frame: the tibble.
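
As a brief sketch of everyday data frame access in base R, the example below rebuilds df1 so that it runs on its own. The helper names (true_rows, second_number) are hypothetical.

```r
df1 <- data.frame(some_numbers = 1:3,
                  some_names = c("Blood", "Sweat", "Tears"),
                  some_logical = c(TRUE, FALSE, TRUE))

## A single column as a vector, using $
df1$some_names

## A single cell, using row number and column name
second_number <- df1[2, "some_numbers"]

## Rows that satisfy a condition (here: some_logical is TRUE)
true_rows <- df1[df1$some_logical, ]

nrow(df1)   ## number of rows
ncol(df1)   ## number of columns
```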

1.6.4 List

Lists are like vectors, but they can hold any object (including other lists). You can also name the elements of a list.

list1 <- list(data_frame = df1,matrix = mat1,vector= x)
list1
$data_frame
  some_numbers some_names some_logical
1            1      Blood         TRUE
2            2      Sweat        FALSE
3            3      Tears         TRUE

$matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

$vector
[1] "1"     "hi"    "FALSE"
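
List elements can be retrieved by name or by position. The following sketch uses a small made-up list (list2) so it can be run independently of the objects above.

```r
list2 <- list(numbers = 1:3, label = "hi", nested = list(flag = TRUE))

by_dollar   <- list2$numbers       ## by name with $
by_name     <- list2[["label"]]    ## by name with [[ ]]
by_position <- list2[[1]]          ## by position with [[ ]]
inner_flag  <- list2$nested$flag   ## element of a nested list

## Single brackets return a (shorter) list, not the element itself
sub_list <- list2["label"]
class(sub_list)
names(list2)
```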

1.6.5 Functions

Functions are very useful objects because they let you run reusable code with dynamic inputs. For example, let’s write a function to calculate the area of a triangle.

area_of_triangle <- function(height,base_length){
  area <- height*base_length/2
  return(area) ## Return value using return command
}
## You can assign the result of a function to a variable
x <- area_of_triangle(height = 3, base_length = 4) 
x
[1] 6
  • A rule of thumb: “If you need to copy and paste the same code three times, write a function instead.”

  • R has thousands of predefined functions to make life easier.

  • If you want to return multiple values, return a list.
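
As a minimal sketch of returning multiple values in a list, the hypothetical function below (right_triangle_summary is not part of R or of this course's code) computes both the area and the hypotenuse of a right triangle.

```r
right_triangle_summary <- function(height, base_length){
  area <- height * base_length / 2
  hypotenuse <- sqrt(height^2 + base_length^2)
  ## Bundle both results in a named list
  return(list(area = area, hypotenuse = hypotenuse))
}

result <- right_triangle_summary(height = 3, base_length = 4)
result$area         ## 6
result$hypotenuse   ## 5
```

Naming the list elements lets the caller pick out each value with $ instead of relying on position.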

1.7 Exercises

Complete the base R document before attempting to solve these.

1.7.1 Temperature Conversion

Write a function to convert Fahrenheit to Celsius and Celsius to Fahrenheit.

(X°C × 9/5) + 32 = Y°F

convert_temperature <- function(x, F_to_C = TRUE){
  if(F_to_C){
    return((x-32)*5/9)
  }else{
    return(x*9/5 + 32)
  }
}
convert_temperature(30,F_to_C = FALSE)
[1] 86
convert_temperature(86,F_to_C = TRUE)
[1] 30

1.7.1.1 Future Value

Write a function to calculate the future value of an investment given annually compounded interest over a number of years. Here X is the initial investment, i is the annual interest rate and T is the duration in years.

\[FV = X (1 + i)^{T}\]

calculate_future_value <- 
function(investment, interest, duration_in_years){
  return(investment * ((1 + interest) ^ duration_in_years)) 
}
## 100 units invested at a 7% interest rate over 5 years
calculate_future_value(
  investment = 100, interest = 0.07, duration_in_years = 5)
[1] 140.2552
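
Because arithmetic in R is vectorized, the same function can be given a vector of durations. The sketch below repeats the function definition so it runs on its own. The name fv_by_year is chosen for illustration.

```r
calculate_future_value <- function(investment, interest, duration_in_years){
  return(investment * ((1 + interest) ^ duration_in_years))
}

## Future value after 1, 2 and 3 years in a single call
fv_by_year <- calculate_future_value(
  investment = 100, interest = 0.07, duration_in_years = 1:3)
fv_by_year   ## 107, 114.49 and 122.5043
```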

1.7.2 Color Hex Code

Write a function to randomly generate n color hex codes. You can use the predefined letters vector.

generate_hex_code <- function(n=1){
  hex_vec <- c(0:9,letters[1:6])
  colors <- character(0)
  for(i in seq_len(n)){ ## seq_len(0) is empty, so n = 0 returns no codes
    colors <- c(colors,
      paste0("#",
      paste0(sample(hex_vec,6,replace=TRUE),collapse="")))
  }
  return(colors)
}
generate_hex_code(n=3)
[1] "#364395" "#e210ea" "#cce007"
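
An alternative sketch avoids growing a vector inside a loop by using vapply(). The function name generate_hex_codes is hypothetical, and set.seed() is used only to make the random draw repeatable.

```r
generate_hex_codes <- function(n = 1){
  hex_vec <- c(0:9, letters[1:6])
  ## Build one code per index; vapply() guarantees a character result
  vapply(seq_len(n), function(i){
    paste0("#", paste0(sample(hex_vec, 6, replace = TRUE), collapse = ""))
  }, character(1))
}

set.seed(42)   ## fix the random seed so the result can be reproduced
codes <- generate_hex_codes(n = 3)
codes
```

The output differs from run to run unless the seed is fixed, so no particular codes should be expected.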

1.7.3 Calculate Probability of Dice

Write a function which calculates the probability of getting k sixes in n throws of a die. Hint: Use binomial distribution.

get_prob_dice <- function(k,n){
  combination <- factorial(n)/(factorial(k) * factorial(n-k))
  probability <- (1/6)^k * (5/6)^(n-k)
  return(combination*probability)
}
get_prob_dice(3,5)
[1] 0.03215021
dbinom(3,5,prob=1/6) ## or simply use dbinom
[1] 0.03215021
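
The calculation above follows the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) with p = 1/6. As a hedged variant, the sketch below uses base R's choose() for the combination term, since factorial() overflows to Inf for large n. The function name get_prob_dice_choose and its p argument are assumptions added for illustration.

```r
get_prob_dice_choose <- function(k, n, p = 1/6){
  ## choose(n, k) is the number of ways to pick k throws out of n
  choose(n, k) * p^k * (1 - p)^(n - k)
}

prob_3_of_5 <- get_prob_dice_choose(3, 5)
prob_3_of_5

## The probabilities of 0, 1, ..., 5 sixes must add up to 1
total <- sum(get_prob_dice_choose(0:5, 5))
total
```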

1.7.4 Rock, Scissors, Paper

Write a rock, scissors, paper game in which the computer chooses its move at random.

rsp_game <- function(user,choices=c("rock","scissors","paper")){
  if(!(user %in% choices))
    return("Choose only rock, scissors or paper as input.") 
  response <- sample(choices,1)
  if(user == response)
    return("I chose the same. Tie!")
  if((user == "rock" && response == "scissors") || 
     (user == "scissors" && response == "paper") ||
     (user == "paper" && response == "rock")){
    return(paste0("I chose ", response, ". You win!"))
  }else{
    return(paste0("I chose ", response, ". You lose!"))
  }
}
rsp_game("rock")
[1] "I chose scissors. You win!"

Check the course webpage for more exercises!