pdflatex errors and R

I was getting an error checking packages in R on an Ubuntu 10.10 machine using the ‘R CMD check’ command saying:

LaTeX errors when creating PDF version. This typically indicates Rd problems.

I have no idea what R’s problem with pdflatex was (I’ve been using pdflatex for writing papers just fine), but once I installed the texlive-full package (a metapackage that pulls in all sorts of other packages), the check ran without any errors. If in doubt, this package seems to install everything that might be missing (along with the kitchen sink).

ape and geiger in R

Just a quick tip: if you want to use the geiger package in R on an Ubuntu system, there are a few things you need to do. First, install the gfortran and lapack-dev packages so that you can build the source packages. You may also need to reinstall the r-base package after you’ve done this. I was getting an error message like this:

/usr/lib/liblapack.so.3gf: undefined symbol: ATL_chemv

It seemed to go away after I reinstalled r-base and restarted the R session, although I’m not sure I needed to do both. Either way, it works now.
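
As a rough pre-flight check before installing geiger from source, something like the following could confirm that the build tools are on the PATH. This is only a sketch: `report_missing` is a hypothetical helper name, and it checks for commands only, so it says nothing about whether the LAPACK library itself is installed.

```shell
#!/bin/sh
# Sketch only: report_missing is a hypothetical helper, not part of R or Ubuntu.

# Print the name of every command in the argument list that is not on PATH.
report_missing() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf 'missing: %s\n' "$tool"
    done
}

# Example (not run automatically): check the compiler and R before installing geiger.
#   report_missing gfortran R
```

If the example prints nothing, both commands were found.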

64-bit MrBayes

Just a quick note: if you’re running a 64-bit install of MrBayes in parallel, you will probably need to use this page to apply some patches to the source code:

http://technical.bestgrid.org/index.php/Bioinformatics_applications_at_University_of_Canterbury_HPC#Installing_MrBayes

I was having problems getting MrBayes to work on a workstation running 64-bit Ubuntu, but once I started fresh and first applied those patches, things seemed to work without throwing a ‘segmentation fault’ error during the ‘sump’ function.

R and fossils

I recently had a paper published in Palaeontologia Electronica on my ‘fossil’ package for R. The journal is open access, so anyone can read it for free.

Vavrek, Matthew J. 2011. fossil: palaeoecological and palaeogeographical analysis tools. Palaeontologia Electronica, 14:1T. http://palaeo-electronica.org/2011_1/238/index.html

MrBayes and multicore processors

Turns out setting up and using MrBayes on an Ubuntu system is much easier than I had thought. If all you want is the normal (serial) version of MrBayes, you can just download it from the repositories. However, if you want a serious speed up in the time it takes to get a good result, you can also run it in parallel on a multicore system (that is, pretty much any computer made in the last 4 years). To get it set up and running on Linux, I used some information I found in a forum post. To recap from there:

  1. Install the parallel libraries you need from the repositories. The package names I used were: mpich2, libmpich2-dev, libmpich2-1.2, and libreadline6-dev.
  2. Download the source code file for MrBayes and unarchive it (on Ubuntu you can just right-click and select ‘Extract Here’).
  3. Find the ‘Makefile’ in the source code and change the line that says ‘MPI ?= no’ so that it says ‘MPI = yes’.
  4. Open a terminal, navigate to the MrBayes folder (e.g. type in ‘cd /path/to/folder/mrbayes-3.1.2/’) and then build the program (type ‘make’ at the prompt). It might also be a good idea to rename the file called ‘mb’ that is created to something like ‘mbpar’ so that you know it’s the parallel version. Also, I needed to make the file executable, so I typed ‘chmod +x mbpar’ to do that.
  5. Now, you’ll need to create a file in your home folder called ‘.mpd.conf’ with the line MPD_SECRETWORD=<secretword> in it. Change the <secretword> to something else though; it can be pretty much any word you like.
  6. Type ‘mpd &’ to launch the MPICH daemon, which needs to be running in order to handle communication between the different cores.
  7. After all this, I was able to type ‘mpirun -np 2 /path/to/mrbayes/mbpar’ to run the parallel version of MrBayes on both cores of my dual core system. If you have more cores, you can always change the -np argument (e.g. to run it using 4 cores, type ‘mpirun -np 4 /path/to/mrbayes/mbpar’).
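
For anyone who would rather script the two file edits (steps 3 and 5), here is a minimal sketch. The function names `enable_mpi` and `write_mpd_conf` are made up for illustration, and the `chmod 600` is my assumption based on the MPICH2 install notes rather than something from the forum post. The remaining steps are shown only as comments, using the same placeholder paths as above.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are hypothetical.

# Step 3: switch the MrBayes Makefile from serial to MPI.
# $1 = path to the Makefile
enable_mpi() {
    mpi_makefile=$1
    mpi_tmp="$mpi_makefile.tmp.$$"
    # Rewrite 'MPI ?= no' as 'MPI = yes'; every other line is copied unchanged.
    sed 's/^MPI[[:space:]]*?=[[:space:]]*no/MPI = yes/' "$mpi_makefile" > "$mpi_tmp" &&
        mv "$mpi_tmp" "$mpi_makefile"
}

# Step 5: write the mpd configuration file.
# $1 = target file (normally "$HOME/.mpd.conf"), $2 = secret word
write_mpd_conf() {
    mpd_conf=$1
    mpd_secret=$2
    printf 'MPD_SECRETWORD=%s\n' "$mpd_secret" > "$mpd_conf"
    # Assumption: keep the file private to its owner, since it holds a secret.
    chmod 600 "$mpd_conf"
}

# Typical use, following the steps above (not run automatically):
#   enable_mpi /path/to/folder/mrbayes-3.1.2/Makefile
#   (cd /path/to/folder/mrbayes-3.1.2 && make && mv mb mbpar && chmod +x mbpar)
#   write_mpd_conf "$HOME/.mpd.conf" "<secretword>"
#   mpd &
#   mpirun -np 2 /path/to/mrbayes/mbpar
```

Keeping the edits in functions that take the file path as an argument means they can be tried on a copy of the Makefile first.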

With the few tests I’ve done so far, I’ve seen about an 80% speed-up just by using 2 cores instead of 1. It’s nice not to have to wait nearly so long to get my results. We’ll see what kind of time savings this could bring if I did it on an 8-core computer.

BibTeX and the Canadian Journal of Earth Sciences

While prepping a manuscript for the Canadian Journal of Earth Sciences, I needed a BibTeX style (.bst) file that I could use to format my references properly if I wanted to use BibTeX and LaTeX to write the manuscript. CJES does allow you to submit LaTeX (.tex) files; however, I couldn’t find a reference style file online. Instead, I used the custom-bib package to create a custom file that matches the CJES reference format. To hopefully save others the extra work, I’ve posted the resulting file here for anyone to use. Be forewarned, though: some of the more awkward types of references may not come out exactly as they should, and for the final submission I think you may need to copy the bibliography and paste it directly into the LaTeX file you submit.

Enjoy.

philosophiae doctor

I have officially received my PhD from McGill University, and can now identify myself as Dr. Vavrek, although I still keep on filling out forms and checking off the ‘Mr.’ box. Along with that, I’ve managed to get hired on at the Royal Ontario Museum as an Assistant Curator, though the contract is only for a year and a half. I’ll be at the ROM helping to set up a big new exhibit that’s set to open in the summer of 2012. Big things happening!

Parallelized functions in R

I have released a new package on CRAN called ‘parfossil’ where I have been experimenting with a number of parallelized functions. While R is a readily accessible and fast-to-code language when it comes to stats, it can be really slow when it comes to large analyses, especially on non-desktop computers. However, for my research most of the analysis time is spent in repetitive loops, like Monte Carlo or bootstrap analysis. Currently, R runs everything in serial, with each permutation in a resampling analysis done one after another. This was the only way when most machines had just one core, but with the proliferation of multicore chips even on laptops, it means we are only using a portion of our computer’s power when we run an analysis. With multicore chips we can assign different tasks to different cores, but this is often a difficult thing to code for. Luckily, Revolution Analytics has made available a simple-to-use package called ‘foreach’, with an included function of the same name, that makes the process of parallelization much easier. So far, with the functions I have recoded to run in parallel, I am seeing a speed-up of 1.5 to 1.8 times just on my dual core laptop. I would imagine that a quad core chip would see somewhere above a 3 times speed-up; for some really large data sets, that could mean several hours or even days less of waiting. And over the next few years most computer chips will have even more cores available to them. The future of R computing is in parallel.

Maastrichtian Dinosaur Provinciality

Hans Larsson and I had our paper published recently in PNAS on the low beta diversity of Maastrichtian dinosaurs in the Western Interior. Hopefully a few people will read the methods and notice that we used the fossil package to do a lot of the stats in it, and others might start using it as well. If you have any questions about the paper or the package, feel free to contact me any time.

Teaching

I have been reading a bunch of articles lately on education, and how to improve it. It’s been making me think a lot about my own education, especially in university. I just want to know if the major universities are ever going to give education more than a fleeting thought. Obviously, research is very important, but at what point should it be to the exclusion of teaching? If you never take the time to teach others what you know, then what? Knowledge not passed along is knowledge lost. I would hate to have toiled away at my research, uncovering new things, but not have shared that with others before I died. And I mean more than what can be passed along in a journal article. Too many journal articles are never read anyhow; if you actually take the time to teach someone what you know, then you know that knowledge will have been passed on. And when I talk about teaching, I mean more than just standing in front of a room and saying words. There was a class in university that I went to less than half the time, and I did better for it because I didn’t get confused by the abysmal teacher reading nothing but equations off a PowerPoint slide. I think that too often, profs forget that the people they are teaching could become some of the most important people in their lives: namely, politicians. If these future politicians (and their constituents) are never taught how exciting or amazing a subject is, that subject might not get as much funding the next year, because nobody cares about it. Teaching can be in your own self-interest.

Building a Better Teacher (NYT)

Why We Must Fire Bad Teachers

Fixing US STEM education is possible, but will take money