Parallelized functions in R

I have released a new package on CRAN called ‘parfossil‘ where I have been experimenting with a number of parallelized functions. While R is a readily accessible and fast-to-code language when it comes to stats, it can be really slow when it comes to large analyses, especially on non-desktop computers. However, for my research most of the analysis time is spent in repetitive loops, like Monte Carlo or bootstrap analysis. Currently, R runs everything in serial, with each permutation in a resampling analysis done one after another. This was the only way when most machines only had one core, but with the proliferation of multicore chips even on laptops, this means we are only using a portion of our computer’s power when we run an analysis. With multicore chips we can assign different tasks to different cores, but this is often a difficult thing to code for. Luckily Revolution Analytics has made available a simple to use package called ‘foreach‘ with an included function of the same name that makes the process of parallelization much easier. So far, with the functions I have recoded to run in parallel, I am seeing a speed up of 1.5 to 1.8 times just on my dual core laptop. I would imagine that using a quad core chip would see somewhere above a 3 times speedup; that could mean several hours to even days less of waiting in some cases for some really large data sets. And over the next few years most computer chips will have even more cores available to them. The future of R computing is in parallel.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s