Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously.
01/11/2021
How do we do this?
Multi-core processors. Most computers today have multiple cores.
parallel [R baseline]
Rmpi
future
foreach
The parallel package is now part of the core distribution of R. It includes a number of different mechanisms that let you exploit parallelism, using the multiple cores in your processor(s) as well as compute resources distributed across a network as a cluster of machines.
However, in this talk, we will stick to making the most of the resources available on the machine on which you are running R.
How many cores do you have?
# Load the package
library(parallel)
detectCores()
## [1] 8
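The core count can feed directly into the choice of cluster size. The following is a minimal sketch, not a rule: the names n_cores and n_workers are illustrative, and leaving one core free for the rest of the system is an assumption that may not suit every workload. detectCores() can return NA on some platforms, so the sketch falls back to a single worker in that case.

```r
library(parallel)

# Number of cores reported by the operating system (may be NA on some platforms)
n_cores <- detectCores()

# Assumption: fall back to 1 if the count is unknown
if (is.na(n_cores)) n_cores <- 1L

# Assumption: keep one core free, but always use at least one worker
n_workers <- max(1L, n_cores - 1L)

n_workers
```

The result could then be passed to makeCluster(n_workers) instead of a hard-coded number.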
Starting clusters
cl <- makeCluster(2)
Sending libraries to clusters
clusterEvalQ(cl, { library(tidyverse) })
## [[1]]
##  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"
##  [7] "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics"  "grDevices"
## [13] "utils"     "datasets"  "methods"   "base"
##
## [[2]]
##  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"
##  [7] "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics"  "grDevices"
## [13] "utils"     "datasets"  "methods"   "base"
Sending variables and functions to clusters
a <- 2
square <- function(num) num^2
clusterExport(cl, c("a", "square"))
# To test that they were received, run another clusterEvalQ
clusterEvalQ(cl, { print(c(a, square(a))) })
## [[1]]
## [1] 2 4
##
## [[2]]
## [1] 2 4
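Once a variable and a function have been exported, the workers can use them inside a parallel apply. The sketch below repeats the export step on a separate cluster (the name cl_demo is hypothetical, chosen so it does not interfere with cl) and then calls the exported square() and a from parSapply. Passing envir = environment() is a defensive choice for this sketch; at the top level it is the same as the default global environment.

```r
library(parallel)

# Hypothetical cluster name for this sketch
cl_demo <- makeCluster(2)

a <- 2
square <- function(num) num^2

# Copy both objects to every worker
clusterExport(cl_demo, c("a", "square"), envir = environment())

# Each worker applies the exported function to its share of 1:4
result <- parSapply(cl_demo, 1:4, function(x) square(x) + a)

stopCluster(cl_demo)

result
## [1]  3  6 11 18
```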
Stopping a cluster
stopCluster(cl)
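If the code between makeCluster() and stopCluster() fails, the workers can be left running. One common way to guard against this is to create the cluster inside a function and register the shutdown with on.exit(). The helper below, with_cluster(), is a hypothetical name for this sketch and not part of the parallel package.

```r
library(parallel)

# Hypothetical helper: apply f to each element of xs on a temporary cluster
with_cluster <- function(xs, f, n_workers = 2) {
  cl <- makeCluster(n_workers)
  # Runs when the function exits, whether normally or through an error
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, xs, f)
}

squares <- with_cluster(1:3, function(x) x^2)
unlist(squares)
## [1] 1 4 9
```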
Make the computer sleep for 3 seconds, and repeat this 5 times…
Running in series
ptm <- proc.time()
for (i in 1:5) Sys.sleep(3)
print(proc.time() - ptm)
##    user  system elapsed
##    0.02    0.00   15.23
Running in parallel
library(parallel)
ptm <- proc.time()
cl <- makeCluster(8)
# invisible() here just hides the list of NULLs returned by parSapply
invisible(parSapply(cl, rep(3, 5), Sys.sleep))
stopCluster(cl)
print(proc.time() - ptm)
##    user  system elapsed
##    0.05    0.11    5.39
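With five 3-second sleeps spread over eight workers, the ideal elapsed time would be about 3 seconds, yet the run above takes 5.39. The timer also covers makeCluster() and stopCluster(), so part of the gap is probably worker start-up. A rough way to check is to time the two phases separately. This sketch uses shorter sleeps and only two workers to stay quick, and the names cl_timing, startup and work are illustrative.

```r
library(parallel)

# Time cluster start-up on its own
startup <- system.time(cl_timing <- makeCluster(2))[["elapsed"]]

# Time only the parallel work: four 0.5 s sleeps shared by 2 workers
work <- system.time(
  invisible(parSapply(cl_timing, rep(0.5, 4), Sys.sleep))
)[["elapsed"]]

stopCluster(cl_timing)

c(startup = startup, work = work)
```

Here each worker handles two sleeps, so work should come out near 1 second rather than the 2 seconds a serial loop would need, while startup shows the fixed cost paid before any task runs. Actual numbers will vary by machine.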
Parallel processing in R by Hadley Wickham
Parallel R by Q. Ethan McCallum and Stephen Weston.
Parallel Computing for Data Science by Norm Matloff.
Parallelization in R talk by Victor Feagins with real-world coding examples.