Tag Archives: statistics

Where am I? Or, adding a counter to a running R process

A quick post in response to a question that came up on Facebook via Liz Freedman and Nic Campione: how do you add a counter/ticker to a running process in R? This is something that can be really helpful if you’re still in the process of trying to debug some code, as a process that involves thousands of iterations may be either stuck and broken, or it could be just taking its sweet time to finish, and without any output during the iterations, you can’t know which it’s doing. So, in one line, here’s a simple way of adding a counter to a for loop (although while and other style of loops follow the same logic):

for (i in 1:1000) {
  if (i%%100==0) print(paste("This is iteration number ", i))
  #all your complex code can go down here, whatever it is
}

In this case, I just added a rule that printed the iteration number every 100 loops (the code can be run directly in a terminal if you want to see what I mean). The two percent signs right beside each other are the modulo operation, so that when the remainder of the iteration is zero (that is, every 100 loops), it prints out a message. The number can be changed easily so that it is every 1000 instead of 100, and you can also use different rules, but I find that this is a relatively quick and easy way to add a counter.

As for making the whole process run faster, using something like the foreach package (and associated packages like doMC or doParallel) can significantly speed things up if you have a multicore processor on your computer.

Advertisements

Parallelized functions in R

I have released a new package on CRAN called ‘parfossil‘ where I have been experimenting with a number of parallelized functions. While R is a readily accessible and fast-to-code language when it comes to stats, it can be really slow when it comes to large analyses, especially on non-desktop computers. However, for my research most of the analysis time is spent in repetitive loops, like Monte Carlo or bootstrap analysis. Currently, R runs everything in serial, with each permutation in a resampling analysis done one after another. This was the only way when most machines only had one core, but with the proliferation of multicore chips even on laptops, this means we are only using a portion of our computer’s power when we run an analysis. With multicore chips we can assign different tasks to different cores, but this is often a difficult thing to code for. Luckily Revolution Analytics has made available a simple to use package called ‘foreach‘ with an included function of the same name that makes the process of parallelization much easier. So far, with the functions I have recoded to run in parallel, I am seeing a speed up of 1.5 to 1.8 times just on my dual core laptop. I would imagine that using a quad core chip would see somewhere above a 3 times speedup; that could mean several hours to even days less of waiting in some cases for some really large data sets. And over the next few years most computer chips will have even more cores available to them. The future of R computing is in parallel.