R

### Create a vector of size 10, having the values 5,7,9,11,13,13,11,9,7,5. Compute the sum, mean, highest and lowest of these values. Compute the length of this vector. Find the variance and standard deviation for the data of this vector, using the formulas for variance and standard deviation. Compare these values with the variance and standard deviation computed by the R functions. Sort the values of this vector in decreasing order.

Step 1: Create the vector and compute the sum, mean, highest, and lowest values.

vector <- c(5, 7, 9, 11, 13, 13, 11, 9, 7, 5)
sum_value <- sum(vector)
mean_value <- mean(vector)
highest_value <- max(vector)
lowest_value <- min(vector)
length_value <- length(vector)
cat("Sum:", sum_value, "\n")
cat("Mean:", mean_value, "\n")
cat("Highest:", highest_value, "\n")
cat("Lowest:", lowest_value, "\n")
cat("Length:", length_value, "\n")


Step 2: Compute the variance and standard deviation manually.

mean_diff_sq <- (vector - mean_value)^2
variance_value <- sum(mean_diff_sq) / (length_value - 1)
standard_deviation_value <- sqrt(variance_value)
cat("Variance (Manual):", variance_value, "\n")
cat("Standard Deviation (Manual):", standard_deviation_value, "\n")


Step 3: Compute the variance and standard deviation using R functions.

variance_r <- var(vector)
standard_deviation_r <- sd(vector)
cat("Variance (R):", variance_r, "\n")
cat("Standard Deviation (R):", standard_deviation_r, "\n")

Step 4: Sort the vector in decreasing order.

sorted_vector <- sort(vector, decreasing = TRUE)
cat("Sorted Array (Decreasing Order):", sorted_vector, "\n")
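
The manual calculation in Step 2 divides by n - 1, which is the sample variance that `var()` also computes. As a minimal sketch (the names `x`, `sample_var`, and `population_var` are illustrative and not part of the exercise), the two possible divisors can be compared like this:

```r
x <- c(5, 7, 9, 11, 13, 13, 11, 9, 7, 5)
n <- length(x)
squared_dev <- (x - mean(x))^2             # squared deviations from the mean

sample_var <- sum(squared_dev) / (n - 1)   # divisor n - 1, the same as var()
population_var <- sum(squared_dev) / n     # divisor n, for a complete population

# all.equal() compares with a numeric tolerance, which is safer than ==
print(isTRUE(all.equal(sample_var, var(x))))
print(isTRUE(all.equal(sqrt(sample_var), sd(x))))
```

Dividing by n instead gives a slightly smaller value, so the manual result matches `var()` only when the n - 1 divisor is used.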
------------------------------------------------------------------------------------------------------------------------------------
### Create a vector of first 50 even numbers, starting from 2. Also create a vector having values 30 down to 1, as 30, 29, …,1 in R

# Vector of first 50 even numbers starting from 2
even_numbers <- seq(from = 2, by = 2, length.out = 50)

# Vector of values from 30 down to 1
values_down_to_1 <- seq(from = 30, to = 1, by = -1)
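
The same two vectors can probably be built more compactly with vector arithmetic and the `:` operator. This is only a sketch; the names `even_numbers_alt` and `countdown_alt` are made up for illustration:

```r
even_numbers_alt <- 2 * (1:50)   # 2, 4, ..., 100
countdown_alt <- 30:1            # 30, 29, ..., 1

# Both forms should hold the same values as the seq() versions
print(all(even_numbers_alt == seq(from = 2, by = 2, length.out = 50)))
print(all(countdown_alt == seq(from = 30, to = 1, by = -1)))
```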
------------------------------------------------------------------------------------------------------------------------------------

### Create a vector of size 10 with 5th and 7th values as missing (store these values as NA). Use the “is.na()” to find locations of missing data.

my_vector <- c(1, 2, 3, 4, NA, 6, NA, 8, 9, 10)
missing_locations <- which(is.na(my_vector))
print(my_vector)
print(missing_locations)
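
Once the missing positions are known, a common next step is to count them or to compute statistics that skip them. The sketch below assumes the same layout as the vector above; `scores` and the other names are illustrative:

```r
scores <- c(1, 2, 3, 4, NA, 6, NA, 8, 9, 10)

n_missing <- sum(is.na(scores))                  # number of NA entries
mean_ignoring_na <- mean(scores, na.rm = TRUE)   # mean of the observed values only
complete_values <- scores[!is.na(scores)]        # the vector without NA entries

print(n_missing)
print(mean_ignoring_na)
```

Without `na.rm = TRUE`, `mean(scores)` returns `NA`, because any arithmetic involving a missing value is itself missing.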

------------------------------------------------------------------------------------------------------------------------------------
### Create a vector of characters of size 5, consisting of values: “This” “is” “a” “character” “vector”. Find the index of value “is” in the vector using which() or match().


my_vector <- c("This", "is", "a", "character", "vector")
index_with_which <- which(my_vector == "is")
index_with_match <- match("is", my_vector)
print("Index of 'is' using which():")
print(index_with_which)
print("Index of 'is' using match():")
print(index_with_match)
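
The two functions agree here because "is" occurs only once. A small sketch with a hypothetical vector that repeats the value shows where they differ:

```r
words <- c("This", "is", "a", "vector", "is")   # hypothetical: "is" appears twice

all_positions <- which(words == "is")        # every matching index
first_position <- match("is", words)         # only the first matching index
missing_position <- match("absent", words)   # NA when the value is not found

print(all_positions)
print(first_position)
```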

------------------------------------------------------------------------------------------------------------------------------------
### It is always good to store numerical values rather than textual data. However, for input and output, textual values are easier to understand. An example of this in R is as follows:
> Fivepointscale=c(1:5)
> names(Fivepointscale) = c("Not Satisfactory", "Satisfactory", "Fair", "Good", "Very Good")
> Feedback = Fivepointscale[c("Good", "Satisfactory")]
Create a 7-point scale of information input and use this scale to input feedback of 5 students about a question like “Feedback of experience of using an application (Bad, Somewhat bad, not good, ok, good, very good, excellent)”. Find the average of the feedback.

# Create a 7-point scale
Sevenpointscale <- c(1:7)
names(Sevenpointscale) <- c("Bad", "Somewhat bad", "Not good", "OK", "Good", "Very good", "Excellent")

# Input feedback from 5 students using the scale
Feedback <- Sevenpointscale[c("Good", "Somewhat bad", "Not good", "OK", "Good")]

# Find the average feedback
average_feedback <- mean(Feedback)
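
Because the scale is a named vector, the numeric average can also be mapped back to the nearest label. This is a sketch under the assumption that rounding to the nearest scale point is acceptable; `scale7` and `nearest_label` are illustrative names:

```r
scale7 <- 1:7
names(scale7) <- c("Bad", "Somewhat bad", "Not good", "OK", "Good", "Very good", "Excellent")
feedback <- scale7[c("Good", "Somewhat bad", "Not good", "OK", "Good")]

avg <- mean(feedback)                        # numeric average of the five scores
nearest_label <- names(scale7)[round(avg)]   # label of the closest scale point
cat("Average:", avg, "which is closest to", nearest_label, "\n")
```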

------------------------------------------------------------------------------------------------------------------------------------
### Create two strings and concatenate them in R.

string1 <- "Hello"
string2 <- "World"
concatenated_string <- paste(string1, string2)
print(concatenated_string)
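
`paste()` inserts a space by default. A brief sketch of the other common variants, using illustrative variable names:

```r
first <- "Hello"
second <- "World"

with_space <- paste(first, second)                  # default separator is a space
no_space <- paste0(first, second)                   # no separator at all
custom_sep <- paste(first, second, sep = ", ")      # explicit separator
joined <- paste(c(first, second), collapse = "-")   # collapse a vector into one string
```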

------------------------------------------------------------------------------------------------------------------------------------
### Create a long string of words separated by punctuation marks. Replace all the punctuation marks in the string using gsub("[[:punct:]]", "", stringName) function. Find the number of words in the string without punctuation marks. Find the number of distinct words and its count, if possible in R.

longString <- "Hello, how are you? I hope you're doing well. This is an example string with some punctuations!! Can you handle it??"
cleanString <- gsub("[[:punct:]]", "", longString)
numWordsWithoutPunctuation <- length(strsplit(cleanString, "\\s+")[[1]])
distinctWords <- unique(strsplit(cleanString, "\\s+")[[1]])
numDistinctWords <- length(distinctWords)
cat("Cleaned string without punctuation marks:\n", cleanString, "\n")
cat("Number of words in the string without punctuation marks:", numWordsWithoutPunctuation, "\n")
cat("Number of distinct words:", numDistinctWords, "\n")
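
The exercise also asks for the count of each distinct word, which `table()` can provide. The sketch below lowercases the text first; treating "This" and "this" as the same word is an assumption, not something the exercise specifies:

```r
long_string <- "Hello, how are you? I hope you're doing well. This is an example string with some punctuations!! Can you handle it??"
clean_string <- gsub("[[:punct:]]", "", long_string)

# tolower() makes the count case-insensitive
words <- strsplit(tolower(clean_string), "\\s+")[[1]]
word_counts <- sort(table(words), decreasing = TRUE)   # most frequent words first
print(head(word_counts))
```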
------------------------------------------------------------------------------------------------------------------------------------

### Store content in external files for the following types of data in R: (i) Vectors (ii) Lists (iii) Arrays (iv) Data frames (v) Factors. Read those contents into R. Perform operations like sorting on vector data, finding the length of lists and adding data items to a list, accessing different elements of an array and comparing them to other values, accessing different components of data frames and factors.

(i) Vectors: To store a vector in an external file, you can use a plain text file with one element per line or save it as a CSV file. Let's assume we have a numeric vector called **`my_vector`**.

my_vector <- c(10, 5, 8, 3, 6)
# writeLines() accepts only character data, so convert the numbers first
writeLines(as.character(my_vector), "my_vector.txt")
# write.csv() stores an unnamed vector in a column called "x"
write.csv(my_vector, file = "my_vector.csv", row.names = FALSE)

#To read the vector back into R:
my_vector_from_file <- as.numeric(readLines("my_vector.txt"))
my_vector_from_csv <- as.numeric(read.csv("my_vector.csv")$x)


(ii) Lists: A list can mix types and nest vectors, so a plain text file can hold only its flattened values as text. The RDS format keeps the structure and element types intact. Let's assume we have a list called **`my_list`**.

my_list <- list("apple", 25, TRUE, c(1, 2, 3))
# The text file stores the six flattened values as strings (types and nesting are lost)
writeLines(as.character(unlist(my_list)), "my_list.txt")
# saveRDS() preserves the list exactly
saveRDS(my_list, "my_list.rds")

#To read the list back into R:
my_list_from_file <- as.list(readLines("my_list.txt"))  # list of 6 character strings
my_list_from_rds <- readRDS("my_list.rds")              # same structure as my_list


(iii) Arrays: Arrays can be saved in external files using formats like RDS (R Data Serialization) or CSV.

my_array <- array(1:24, dim = c(2, 3, 4))
saveRDS(my_array, "my_array.rds")
write.csv(my_array, file = "my_array.csv", row.names = FALSE)


#To read the array back into R:

my_array_from_rds <- readRDS("my_array.rds")
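# write.csv() flattened the array to a 2 x 12 table, so restore the dimensions explicitly
my_array_from_csv <- array(as.matrix(read.csv("my_array.csv")), dim = c(2, 3, 4))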


(iv) Data frames: Data frames are commonly stored in CSV files, but you can also use RDS format.

my_dataframe <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Grade = c("A", "B", "C")
)

saveRDS(my_dataframe, "my_dataframe.rds")
write.csv(my_dataframe, file = "my_dataframe.csv", row.names = FALSE)


#To read the data frame back into R:

my_dataframe_from_rds <- readRDS("my_dataframe.rds")
my_dataframe_from_csv <- read.csv("my_dataframe.csv")


(v) Factors: Factors can be stored using RDS format or as integers in CSV files.

my_factor <- factor(c("Male", "Female", "Male", "Female"))
saveRDS(my_factor, "my_factor.rds")
write.csv(as.integer(my_factor), file = "my_factor.csv", row.names = FALSE)

#To read the factor back into R:

my_factor_from_rds <- readRDS("my_factor.rds")

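# The CSV holds only the integer codes, so the level labels must be supplied again
my_factor_from_csv <- factor(read.csv("my_factor.csv")$x, levels = 1:2, labels = c("Female", "Male"))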


Once you have the data back in R, you can perform various operations as requested:

1. Sorting a vector:
    sorted_vector <- sort(my_vector_from_file)
2. Finding the length of a list:
    list_length <- length(my_list_from_file)
3. Adding data items to a list:
    my_list_from_file[[length(my_list_from_file) + 1]] <- "banana"
4. Accessing different elements of an array and comparing it to other values:
    element_1_2_3 <- my_array_from_rds[1, 2, 3]
    comparison_result <- element_1_2_3 > 10
5. Accessing different components of data frames:
    name_column <- my_dataframe_from_csv$Name
    age_row_2 <- my_dataframe_from_csv[2, "Age"]
6. Accessing different levels of factors:
    factor_level <- levels(my_factor_from_rds)
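
To check that a saved object really survives the trip to disk and back, the object read from the file can be compared with the original. The following is a sketch using RDS files in a temporary location; `tempfile()` is used here only to avoid cluttering the working directory, and all variable names are illustrative:

```r
original_list <- list("apple", 25, TRUE, c(1, 2, 3))
original_factor <- factor(c("Male", "Female", "Male", "Female"))

list_path <- tempfile(fileext = ".rds")     # temporary file paths
factor_path <- tempfile(fileext = ".rds")
saveRDS(original_list, list_path)
saveRDS(original_factor, factor_path)

# identical() confirms that values, types, and attributes all survived
print(identical(readRDS(list_path), original_list))
print(identical(readRDS(factor_path), original_factor))
```

Text and CSV files generally do not pass this check for lists, arrays, or factors, since they keep the values but not the structure.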

------------------------------------------------------------------------------------------------------------------------------------
### Create two matrices of 5*5 size using R; add, subtract and multiply these two matrices.

Step 1: Generate two 5x5 matrices with random values.
Step 2: Perform matrix addition, subtraction, and multiplication.

Here's example R code to do this:
# Step 1: Generate two 5x5 matrices with random values
set.seed(42)  # Setting a seed for reproducibility
matrix1 <- matrix(runif(25), nrow = 5, ncol = 5)
matrix2 <- matrix(runif(25), nrow = 5, ncol = 5)

# Step 2: Perform matrix addition, subtraction, and multiplication
addition_result <- matrix1 + matrix2
subtraction_result <- matrix1 - matrix2
multiplication_result <- matrix1 %*% matrix2

# Print the matrices and their results
cat("Matrix 1:\n")
print(matrix1)

cat("\nMatrix 2:\n")
print(matrix2)

cat("\nAddition Result:\n")
print(addition_result)

cat("\nSubtraction Result:\n")
print(subtraction_result)

cat("\nMultiplication Result:\n")
print(multiplication_result)
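
Note that `%*%` is the matrix product, while `*` multiplies matching entries. A small sketch with illustrative 2x2 matrices makes the difference visible:

```r
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
b <- diag(2)                 # 2 x 2 identity matrix

elementwise <- a * b         # multiplies matching entries
matrix_product <- a %*% b    # row-by-column product; equals a because b is the identity

print(elementwise)
print(matrix_product)
```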
------------------------------------------------------------------------------------------------------------------------------------

### Perform transpose of a matrix in R.

original_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
print("Original Matrix:")
print(original_matrix)
transposed_matrix <- t(original_matrix)
print("Transposed Matrix:")
print(transposed_matrix)

------------------------------------------------------------------------------------------------------------------------------------
### *Find the inverse of a matrix in R.*

A <- matrix(c(2, 1, 1, 3), nrow = 2)
A_inv <- solve(A)
print(A_inv)
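
A quick way to validate the result is to multiply the matrix by its inverse and compare with the identity matrix. This sketch also checks the determinant first, since `solve()` stops with an error for a singular matrix; the tolerance `1e-12` is an arbitrary illustrative choice:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)

if (abs(det(A)) > 1e-12) {
  A_inv <- solve(A)
  # A %*% A_inv should equal the identity up to floating-point error
  print(isTRUE(all.equal(A %*% A_inv, diag(2))))
}
```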
------------------------------------------------------------------------------------------------------------------------------------

### *Create a list of a factor. Find the occurrences of each factor in the list in R.*

To create a list of factors and find the occurrences of each factor in R, you can follow these steps:

Step 1: Create a list of factors.
Step 2: Use the **`table()`** function to count the occurrences of each factor.

Here's the R code to achieve this:

my_factors <- factor(c("Apple", "Banana", "Apple", "Orange", "Banana", "Apple", "Grapes", "Grapes", "Apple"))
factor_occurrences <- table(my_factors)
print(factor_occurrences)
------------------------------------------------------------------------------------------------------------------------------------


### Write a function to find the largest and smallest values in a 3-dimensional array of size 3*3*3. You should use parameter passing in R.

find_largest_and_smallest <- function(arr) {
  if (is.array(arr) && all(dim(arr) == c(3, 3, 3))) {
    # Initialize variables to store the maximum and minimum values
    max_val <- arr[1, 1, 1]
    min_val <- arr[1, 1, 1]

    # Iterate through the elements of the array
    for (i in 1:3) {
      for (j in 1:3) {
        for (k in 1:3) {
          if (arr[i, j, k] > max_val) {
            max_val <- arr[i, j, k]
          }
          if (arr[i, j, k] < min_val) {
            min_val <- arr[i, j, k]
          }
        }
      }
    }

    # Return the results as a list
    result <- list(largest = max_val, smallest = min_val)
    return(result)
  } else {
    stop("Invalid array size. The array should be 3x3x3.")
  }
}

# Example usage:
# Create a sample 3-dimensional array
my_array <- array(data = 1:27, dim = c(3, 3, 3))

# Call the function and store the result
result <- find_largest_and_smallest(my_array)

# Print the results
print(result$largest)
print(result$smallest)
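
The explicit loops above show the logic step by step. For comparison, here is a shorter sketch that relies on the built-in `range()` function; `find_range` is a hypothetical name, and unlike the function above it accepts numeric arrays of any dimensions:

```r
find_range <- function(arr) {
  stopifnot(is.array(arr), is.numeric(arr))   # validate the argument passed in
  r <- range(arr)                             # c(minimum, maximum) over all elements
  list(largest = r[2], smallest = r[1])
}

sample_array <- array(1:27, dim = c(3, 3, 3))
result_vectorized <- find_range(sample_array)
print(result_vectorized$largest)
print(result_vectorized$smallest)
```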


------------------------------------------------------------------------------------------------------------------------------------
### Find the eigen values and eigen vectors of a symmetric matrix *in R*.

1. Create a symmetric matrix.
2. Use the **`eigen()`** function to find the eigenvalues and eigenvectors.

matrix_data <- matrix(c(1, 2, 2, 5), nrow = 2, ncol = 2, byrow = TRUE)

eigen_result <- eigen(matrix_data)
eigenvalues <- eigen_result$values
eigenvectors <- eigen_result$vectors
print("Eigenvalues:")
print(eigenvalues)
print("Eigenvectors:")
print(eigenvectors)
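
The output can be checked against the definition of an eigenpair: multiplying the matrix by an eigenvector should give the same vector scaled by its eigenvalue. A minimal sketch, with `S`, `e`, `lambda1`, and `v1` as illustrative names:

```r
S <- matrix(c(1, 2, 2, 5), nrow = 2, byrow = TRUE)
e <- eigen(S, symmetric = TRUE)   # assume symmetry instead of testing for it

lambda1 <- e$values[1]            # largest eigenvalue
v1 <- e$vectors[, 1]              # its eigenvector (unit length)

# For an eigenpair, S %*% v equals lambda * v up to rounding
print(isTRUE(all.equal(as.vector(S %*% v1), lambda1 * v1)))
# The eigenvalues sum to the trace and multiply to the determinant
print(c(sum(e$values), prod(e$values)))
```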

------------------------------------------------------------------------------------------------------------------------------------

### Create a table of showing the States of 20 students, assume these students stay in 5 different states. Now create a factor of these states and then compute the frequency of each factor *in R* (Hint: You may use factor() and tapply() functions)

Step 1: Generate a random list of 20 students and assign them to 5 different states.
Step 2: Create a factor for the states.
Step 3: Compute the frequency of each state factor using the **`tapply()`** function.


# Step 1: Generate random data for 20 students and 5 different states
set.seed(42)  # Setting seed for reproducibility
students <- paste("Student", 1:20)
states <- sample(c("State A", "State B", "State C", "State D", "State E"), 20, replace = TRUE)

# Step 2: Create a factor for the states
states_factor <- factor(states, levels = c("State A", "State B", "State C", "State D", "State E"))

# Step 3: Compute the frequency of each state factor
frequency_table <- tapply(states_factor, states_factor, length)

# Display the frequency table
print(frequency_table)
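
`table()` gives the same frequencies more directly. One difference worth knowing: for a level with no students, `table()` reports 0 while `tapply()` with `length` reports `NA`. A sketch with illustrative names:

```r
set.seed(42)
state_names <- c("State A", "State B", "State C", "State D", "State E")
states <- factor(sample(state_names, 20, replace = TRUE), levels = state_names)

counts_table <- table(states)                     # 0 for a level that never occurs
counts_tapply <- tapply(states, states, length)   # NA for a level that never occurs
print(counts_table)
```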
------------------------------------------------------------------------------------------------------------------------------------

### Consider a state wise list of income of few persons. Use factor function to create a frequency division of income into 5 factor classes e.g. 10000-50000; 50000-100000 etc *in R*.

In R, you can use the **`cut()`** function, which returns a factor, to divide income into classes. Here's a step-by-step guide on how to do it:

1. Create a vector with the income data (replace this with your actual income data):
    income <- c(25000, 40000, 75000, 90000, 30000, 60000, 80000, 120000, 55000, 65000)
2. Use the **`cut()`** function to divide the income into 5 factor classes (6 break points define 5 intervals, so there must be one label fewer than there are breaks):
    income_breaks <- c(10000, 50000, 100000, 150000, 200000, Inf)
    income_factor <- cut(income, breaks = income_breaks, labels = c("10000-50000", "50000-100000", "100000-150000", "150000-200000", "200000+"))
3. Print the frequency table of the income factor classes:
    income_frequency <- table(income_factor)
    print(income_frequency)
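
By default `cut()` builds intervals that are open on the left and closed on the right, so a value equal to the lowest break falls outside every class. The sketch below uses hypothetical boundary values to show this and how `include.lowest = TRUE` changes it:

```r
edge_incomes <- c(10000, 50000, 50001)   # hypothetical boundary values
breaks <- c(10000, 50000, 100000)

default_bins <- cut(edge_incomes, breaks = breaks)   # intervals are (lower, upper]
closed_bins <- cut(edge_incomes, breaks = breaks, include.lowest = TRUE)

print(is.na(default_bins))       # the lowest break itself is excluded by default
print(as.integer(closed_bins))   # class index of each value
```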


------------------------------------------------------------------------------------------------------------------------------------
### Explore different functions in R about strings, arrays, vectors, factors. You may also explore different methods of plotting the data in R.

1. Strings:

    R provides several functions for handling strings, also known as character vectors. Here are some commonly used functions:

- **`paste()`** and **`paste0()`**: Concatenate strings together.
- **`strsplit()`**: Split a string into substrings based on a delimiter.
- **`gsub()`** and **`sub()`**: Replace substrings in a string using regular expressions.
- **`toupper()`** and **`tolower()`**: Convert characters to uppercase or lowercase.
- **`nchar()`**: Count the number of characters in a string.
- **`grepl()`** and **`grep()`**: Pattern matching using regular expressions.
- **`substring()`**: Extract a substring from a string.

Example:
# Concatenate strings
string1 <- "Hello"
string2 <- "World"
result <- paste(string1, string2)
cat(result)

# Split a string
text <- "R programming is fun"
words <- strsplit(text, " ")
print(words)

# Replace substring
text <- "Hello, my name is John"
new_text <- gsub("John", "Alice", text)
cat(new_text)
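
The remaining functions from the list can be tried in the same way. A short sketch on one illustrative string:

```r
phrase <- "R programming is fun"

upper_phrase <- toupper(phrase)          # convert to uppercase
n_chars <- nchar(phrase)                 # number of characters, spaces included
has_fun <- grepl("fun", phrase)          # TRUE if the pattern occurs
first_char <- substring(phrase, 1, 1)    # characters from position 1 to 1
first_only <- sub("m", "M", phrase)      # sub() replaces the first match only
```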


2. Arrays and Vectors:

    Arrays and vectors are fundamental data structures in R.

- **`c()`**: Create a vector by combining elements.
- **`vector()`**: Create a vector of a specified length and mode.
- **`length()`**: Get the length of a vector or array.
- **`dim()`**: Get or set the dimensions of an array.
- **`cbind()`** and **`rbind()`**: Combine vectors by column or row, respectively.
- **`rep()`**: Replicate elements of a vector.
- **`sort()`**: Sort a vector in ascending order.
- **`max()`**, **`min()`**, **`sum()`**, **`mean()`**, **`median()`**: Basic statistical functions.

Example:
# Create a vector
vec <- c(10, 20, 30, 40, 50)

# Length of the vector
length(vec)

# Replicate elements
rep_vec <- rep(1:3, times = 3)
print(rep_vec)

# Sorting
sorted_vec <- sort(vec)
print(sorted_vec)
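
`cbind()`, `rbind()`, and `dim()` from the list above are not covered by that example. A brief sketch with illustrative vectors:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

by_column <- cbind(v1, v2)   # 3 x 2 matrix, one vector per column
by_row <- rbind(v1, v2)      # 2 x 3 matrix, one vector per row

reshaped <- 1:6
dim(reshaped) <- c(2, 3)     # setting dim() turns the vector into a 2 x 3 matrix
print(reshaped)
```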


3. Factors:

    Factors are used to represent categorical data in R. They are useful for handling data with a limited number of unique values.

- **`factor()`**: Create a factor from a vector.
- **`levels()`**: Get or set the levels of a factor.
- **`table()`**: Create a frequency table of factor levels.

Example:

# Create a factor
gender <- c("Male", "Female", "Male", "Male", "Female")
gender_factor <- factor(gender)
print(gender_factor)
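
A factor stores integer codes that point into its set of levels. The sketch below, reusing the same illustrative data, shows the levels, their frequencies, and the underlying codes:

```r
gender_factor <- factor(c("Male", "Female", "Male", "Male", "Female"))

print(levels(gender_factor))          # levels are sorted alphabetically by default
print(table(gender_factor))           # frequency of each level
codes <- as.integer(gender_factor)    # underlying integer codes
print(codes)
```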

4. Plotting Data:

    R provides various libraries for data visualization. Some common libraries and methods for plotting data are:

- Base R Graphics: The default R graphics system for creating plots using functions like **`plot()`**, **`hist()`**, **`barplot()`**, etc.
- ggplot2: A popular package for creating elegant and flexible data visualizations.
- lattice: Another package for creating trellis plots and conditioned plots.
- plotly: A library for creating interactive and web-based visualizations.
- ggvis: Interactive data visualizations with the grammar of graphics.

Example using ggplot2:

library(ggplot2)
data <- data.frame(x = c(1, 2, 3, 4, 5), y = c(3, 5, 2, 6, 4))
ggplot(data, aes(x = x, y = y)) +
  geom_point()
ggplot(data, aes(x = x, y = y)) +
  geom_line()
ggplot(data, aes(x = x, y = y)) +
  geom_bar(stat = "identity")

------------------------------------------------------------------------------------------------------------------------------------

### Find the details of all the vectors and other variables. Also find the data type of all variables *in R*. (Hint: use summary() and typeof(), you can also use stem().)


vector1 <- c(1, 2, 3, 4, 5)
vector2 <- c("apple", "banana", "orange")
variable1 <- 42
variable2 <- TRUE
summary(vector1)
summary(vector2)
typeof(vector1)
typeof(vector2)
typeof(variable1)
typeof(variable2)
stem(vector1)
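
When there are many variables, collecting them in a list and applying `typeof()` to each one avoids repeating the call. A sketch with made-up variables; note that `c(1, 2, 3)` is stored as double while `1:3` is stored as integer:

```r
variables <- list(
  doubles = c(1, 2, 3),
  integers = 1:3,
  text = c("apple", "banana"),
  flag = TRUE
)

types <- sapply(variables, typeof)   # named character vector of storage types
print(types)
```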
------------------------------------------------------------------------------------------------------------------------------------

### A class has a student strength of 50 students. The marks obtained (out of 100) by the students of the class are as per the binomial distribution. You should create the sample data of marks for the 50 students using binomial distribution. Convert these marks to grades as follows: < 40 D, >= 40 but < 60 C, >= 60 but < 80 B, >= 80 A. Also, create random data for seriousness towards studies having the categories: Very Serious, Serious, Not Serious. Use chi-square testing to determine if there is a relation between seriousness towards learning and the grades of students, as per your data. Show and explain the results in R.


set.seed(42)
num_students <- 50
p_passing <- 0.7
marks <- rbinom(num_students, 100, p_passing)
convert_to_grade <- function(marks) {
  if (marks < 40) {
    return("D")
  } else if (marks < 60) {
    return("C")
  } else if (marks < 80) {
    return("B")
  } else {
    return("A")
  }
}

grades <- sapply(marks, convert_to_grade)
seriousness <- sample(c("Very Serious", "Serious", "Not Serious"), num_students, replace = TRUE)
simulated_data <- data.frame(Marks = marks, Grades = grades, Seriousness = seriousness)
head(simulated_data)
chisq_result <- chisq.test(simulated_data$Grades, simulated_data$Seriousness)
print(chisq_result)
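
To explain the result, it helps to look at the contingency table that the test is based on. The sketch below is a variant under stated assumptions: it uses `prob = 0.6` (an assumed value, chosen so that more than one grade is likely to occur), derives grades with `cut()`, and drops grades that never occur before testing:

```r
set.seed(42)
n <- 50
marks <- rbinom(n, size = 100, prob = 0.6)   # prob = 0.6 is an assumption

# right = FALSE makes intervals [lower, upper), matching "< 40", "< 60", "< 80"
grades <- cut(marks, breaks = c(-Inf, 40, 60, 80, Inf),
              labels = c("D", "C", "B", "A"), right = FALSE)
seriousness <- sample(c("Very Serious", "Serious", "Not Serious"), n, replace = TRUE)

# Drop grades that never occur so the table has no empty rows
contingency <- table(droplevels(grades), seriousness)
print(contingency)

test_result <- chisq.test(contingency)
print(test_result$p.value)
```

The null hypothesis is that grade and seriousness are independent. A p-value above the chosen significance level (commonly 0.05) means the data give no evidence of a relation; since seriousness is sampled here without reference to marks, that is the outcome to expect most of the time. R may warn that the approximation is unreliable when some expected cell counts are small.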

------------------------------------------------------------------------------------------------------------------------------------
### The marks of a class of 50 students are recorded as the final percentage of marks. Assuming that the percentage data is normally distributed. In addition, gender data is also stored. Create the data for the class and draw side by side box plot of Girls and Boys marks. Explain the output of the boxplots *in R*.

library(ggplot2)
set.seed(42)
num_students <- 50
# The class has 50 students in total, so generate 25 marks for each group
girls_marks <- rnorm(num_students / 2, mean = 75, sd = 10)
boys_marks <- rnorm(num_students / 2, mean = 80, sd = 8)
gender <- rep(c("Girls", "Boys"), each = num_students / 2)
data <- data.frame(Gender = gender, Marks = c(girls_marks, boys_marks))
ggplot(data, aes(x = Gender, y = Marks, fill = Gender)) +
  geom_boxplot() +
  labs(title = "Marks Comparison: Girls vs. Boys",
       x = "Gender",
       y = "Marks Percentage") +
  theme_minimal()
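
To explain the plot, the numbers behind each box can be extracted with base R. This sketch assumes 25 girls and 25 boys with the same means and standard deviations as above; `class_data` and `box_stats` are illustrative names:

```r
set.seed(42)
class_data <- data.frame(
  Gender = rep(c("Girls", "Boys"), each = 25),
  Marks = c(rnorm(25, mean = 75, sd = 10), rnorm(25, mean = 80, sd = 8))
)

# plot = FALSE returns the statistics without drawing anything
box_stats <- boxplot(Marks ~ Gender, data = class_data, plot = FALSE)
rownames(box_stats$stats) <- c("lower whisker", "lower hinge", "median",
                               "upper hinge", "upper whisker")
colnames(box_stats$stats) <- box_stats$names
print(box_stats$stats)
```

In each box, the thick line is the median, the box spans the lower and upper hinges (roughly the first and third quartiles), and the whiskers reach the most extreme values within 1.5 times the box height; points beyond that are drawn individually as outliers. Comparing the medians shows which group scored higher on the whole, and comparing box heights shows which group's marks are more spread out.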