Tag Archives: R

Where am I? Or, adding a counter to a running R process

A quick post in response to a question that came up on Facebook via Liz Freedman and Nic Campione: how do you add a counter/ticker to a running process in R? This can be really helpful if you’re still trying to debug some code, because a process that involves thousands of iterations may be stuck and broken, or it may just be taking its sweet time to finish; without any output during the iterations, you can’t tell which. So, in one line, here’s a simple way of adding a counter to a for loop (while loops and other styles of loop follow the same logic):

for (i in 1:1000) {
  if (i %% 100 == 0) print(paste("This is iteration number", i))
  # all your complex code can go here, whatever it is
}

In this case, I just added a rule that prints the iteration number every 100 loops (you can run the code directly in an R session if you want to see what I mean). The two percent signs side by side are the modulo operator, so whenever the iteration number divided by 100 leaves a remainder of zero (that is, every 100 loops), a message gets printed. The 100 can easily be changed to 1000 or some other interval, and you can use different rules entirely, but I find this is a relatively quick and easy way to add a counter.

As for making the whole process run faster, using something like the foreach package (and associated packages like doMC or doParallel) can significantly speed things up if you have a multicore processor on your computer.
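
For example, here’s a minimal sketch of what that kind of conversion might look like, using foreach with the doParallel backend (the core count and the body of the loop are just placeholders for your own setup):

library(foreach)
library(doParallel)

registerDoParallel(cores = 2)  # placeholder: register however many cores you actually have

# the serial for loop becomes a foreach() call with %dopar%,
# and .combine collects the per-iteration results into one vector
res <- foreach(i = 1:1000, .combine = c) %dopar% {
  sqrt(i)  # stand-in for whatever your per-iteration code is
}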

pdflatex errors and R

I was getting an error when checking packages in R on an Ubuntu 10.10 machine with the ‘R CMD check’ command, which said:

LaTeX errors when creating PDF version. This typically indicates Rd problems.

I have no idea what R’s problem with pdflatex was (I’ve been using pdflatex to write papers just fine), but once I installed the texlive-full package (a metapackage that pulls in all sorts of other packages) the check ran without any errors. If in doubt, this package seems to install everything that might be missing (along with the kitchen sink).
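
For anyone hitting the same thing, the fix was roughly the following (the package name was current as of Ubuntu 10.10, and ‘yourPackage’ here is just a placeholder for your own package directory):

sudo apt-get install texlive-full   # big metapackage, but it covers whatever LaTeX bits R needs
R CMD check yourPackage             # the check should now build the PDF manual without errors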

ape and geiger in R

Just a quick tip: if you want to use the geiger package in R on an Ubuntu system, there are a few things you need to make sure you do. First, make sure you’ve installed the gfortran and liblapack-dev packages so that the source packages can be built. You may also need to reinstall the r-base package after you’ve done this; I was getting an error message like this:

/usr/lib/liblapack.so.3gf: undefined symbol: ATL_chemv

It seemed to go away after I reinstalled r-base and restarted the R session, although I’m not sure I needed to do both. Either way, it works now.
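
For reference, the steps boil down to something like this (package names can vary between Ubuntu releases, so treat it as a sketch rather than a recipe):

sudo apt-get install gfortran liblapack-dev   # build dependencies for the source packages
sudo apt-get install --reinstall r-base       # reinstalling r-base cleared the liblapack error for me

After that, installing the packages from source in a fresh R session should go through:

install.packages(c("ape", "geiger"))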

R and fossils

I recently had a paper published in Palaeontologia Electronica on my ‘fossil’ package for R. The journal is open access, so anyone can read it for free.

Vavrek, Matthew J. 2011. fossil: palaeoecological and palaeogeographical analysis tools. Palaeontologia Electronica, 14:1T. http://palaeo-electronica.org/2011_1/238/index.html

Parallelized functions in R

I have released a new package on CRAN called ‘parfossil’, where I have been experimenting with a number of parallelized functions. While R is a readily accessible and fast-to-code language for stats, it can be really slow for large analyses, especially on non-desktop computers like laptops. In my research, most of the analysis time is spent in repetitive loops, such as Monte Carlo or bootstrap analyses.

Currently, R runs everything in serial, with each permutation in a resampling analysis done one after another. That was the only option when most machines had a single core, but with the proliferation of multicore chips, even in laptops, it means we are only using a portion of our computer’s power when we run an analysis. With multiple cores we can assign different tasks to different cores, but this is often a difficult thing to code for. Luckily, Revolution Analytics has made available a simple-to-use package called ‘foreach’, with a function of the same name, that makes parallelization much easier.

So far, with the functions I have recoded to run in parallel, I am seeing a speed-up of 1.5 to 1.8 times just on my dual-core laptop. I would imagine a quad-core chip would see somewhere above a threefold speed-up; for some really large data sets that could mean several hours, or even days, less waiting. And over the next few years most chips will have even more cores available to them. The future of R computing is in parallel.
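
To make the idea concrete, here is a sketch of the kind of resampling loop this applies to, written first in serial and then in parallel with foreach and the doParallel backend (the data, statistic, and core count are all placeholders; the actual speed-up will depend on your machine and on how much work each iteration does):

library(foreach)
library(doParallel)
registerDoParallel(cores = 2)  # placeholder: use the number of cores you have

x <- rnorm(10000)  # toy data set
nboot <- 1000      # number of bootstrap replicates

# serial: each bootstrap replicate runs one after another
boot.serial <- replicate(nboot, mean(sample(x, replace = TRUE)))

# parallel: the replicates are farmed out across the registered cores
boot.par <- foreach(i = 1:nboot, .combine = c) %dopar% {
  mean(sample(x, replace = TRUE))
}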

fossil Package Updated to 0.2.4

I just uploaded a new version of fossil to CRAN, with a number of changes. There are some fixes to the way the spp.est() function handles abundance data, and I’ve added a small species/locality dataset that I used for a number of new examples in the package. I’m also working on a new clustering method to include, but it isn’t quite finished yet; hopefully it’ll be in the package before too long.

Enjoy!