Software for Ecologists

There are three classes of software that I consider to be essential for ecologists: general tools, data analysis and management tools, and ecology-specific tools.

There are commercial solutions for all of these (except maybe the third category), but the Open Source tools are often as powerful as the commercial packages, or more so. I also prefer tools that are available for multiple operating systems, and tools that offer a command-line interface or some kind of batch mode.

That last point, the need for a non-graphical interface, tends to baffle people whose sole computer experience is on Windows or a similar graphics-based operating system. It's simple, though: like most ecologists, I work with large volumes of data - vegetation, climate, soils, GIS - and I'm firmly convinced that anything I need to do more than three times should be automated. Setting up a batch script, macro, or whatever is appropriate takes some time initially, to get everything working right. In use, though, it has several advantages. The error rate is much lower because the computer will do the same thing each time. My time is saved, since I don't have to spend hours clicking through menus or making manual edits. And finally, after a few years of doing this, I've built up a large collection of scripts for various situations. Most of them can be reused with only minor modifications to match a new year's field data, or to rerun an analysis on a different dataset.
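
As a rough illustration of that kind of reuse, here is a minimal Python sketch of a batch script that applies one summary to every year's field data file. The file naming pattern (veg_*.csv) and the column names (plot, cover) are assumptions made up for this example, not a description of my actual files.

```python
import csv
from pathlib import Path
from statistics import mean


def summarize_cover(csv_path):
    """Return the mean of the 'cover' column for each plot in one CSV file.

    Assumes (hypothetically) that the file has 'plot' and 'cover' columns.
    """
    by_plot = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            by_plot.setdefault(row["plot"], []).append(float(row["cover"]))
    return {plot: mean(values) for plot, values in sorted(by_plot.items())}


def summarize_all(data_dir, pattern="veg_*.csv"):
    """Apply the same summary to every matching file, in filename order."""
    return {
        path.name: summarize_cover(path)
        for path in sorted(Path(data_dir).glob(pattern))
    }
```

When a new year's data arrives, dropping one more file into the directory and calling summarize_all again is all that changes; every file goes through exactly the same steps.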


General Tools

A good text editor like Vim is essential. Emacs is also popular among people who use text editors, but I've never gotten along as well with it. For either of these, the learning curve can be steep, but they are incredibly useful tools for anyone who works with large, complex data files, or does any programming. Two important features are pattern-based find and replace, and syntax highlighting.

Linux provides many tools for working with text files - sorting, finding items and so on - that just aren't available under Windows unless you install an add-on like the UnxUtils port of some key utilities, or the more elaborate Cygwin. In conjunction with Vim (above), one of these packages gives me the ability to do complex data management on either Linux or Windows.

Everyone needs an office suite, if only to open all those Word attachments that people send you. I use OpenOffice.org, which includes a spreadsheet, word processor and presentation software (like PowerPoint), and does a decent job of reading and writing the Microsoft Office versions of these file types.

For the usual online tasks, I prefer Firefox for web browsing and Thunderbird for email. Both are Mozilla products.

Like everyone else, I need to keep track of my piles of reprints and organize my bibliographies. I keep everything in the BibTeX format (text files, of course), using JabRef as the front end, and Pybliographer for the guts, like formatting bibliographies for manuscripts.

There is a wide variety of applications for organizing, editing and analyzing images. I take a lot of photographs in the field, and also do occasional work analyzing, say, green cover in an image. For organizing images, Picasa does a nice job on Windows. I use igal to turn my collections of photos into web pages, complete with thumbnails, but haven't gotten it to work on Windows. I consider ImageMagick (Windows and Linux) to be essential for batch converting, resizing and other transformations of images. I haven't used IrfanView a lot, but it seems to provide a graphical interface to similar tasks for use on Windows. For certain kinds of analysis tasks, like counting blobs and estimating area, the NIH tool ImageJ is useful.
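
To give a sense of what a green cover estimate can involve, here is a minimal Python sketch. It uses the excess-green index (2G - R - B), which is one common way to separate vegetation from background. It is not necessarily the method used by any of the tools above, and the threshold of 20 is an arbitrary assumption that would need tuning for real photographs. The function takes plain (R, G, B) tuples, so the pixels can come from whatever image library you prefer.

```python
def green_fraction(pixels, threshold=20):
    """Fraction of pixels whose excess-green value (2G - R - B) exceeds threshold.

    `pixels` is any iterable of (R, G, B) tuples with 0-255 channel values.
    The default threshold is an illustrative guess, not a calibrated value.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    green = sum(1 for r, g, b in pixels if 2 * g - r - b > threshold)
    return green / len(pixels)
```

Because the function is plain Python with no menus involved, it can be run over a whole folder of field photos from a batch script.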


Data Analysis and Management

My single most important tool (well, maybe tied with a proper text editor) is the statistical software R. It is possible to do pretty much anything using this statistical programming language, and the active user community has quite likely already written a package to do whatever it is that you are looking for. R runs from the command line and in batch mode, but R Commander is one option for adding a graphical interface for commonly used statistics. RStudio is also popular.

Spreadsheets are lousy for managing large, complex datasets. A relational database is vastly better. I've used both MySQL and PostgreSQL. For the data from an ongoing regional study, I've put together a PostgreSQL database with a web-accessible Zope frontend so that the other scientists and technicians who need to use the data don't need to learn SQL to get at it. Update: I'm no longer using Zope after an upgrade broke everything and I didn't have time to completely redo it, but Zope remains popular.
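
To show why a relational database handles this better than a spreadsheet, here is a small Python sketch. It uses SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL so that it runs without a database server. The table layout, column names and values are all invented for illustration and are not taken from my regional study.

```python
import sqlite3

# In-memory database; a real project would connect to a server or a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plots (
        plot_id INTEGER PRIMARY KEY,
        site    TEXT NOT NULL
    );
    CREATE TABLE observations (
        obs_id  INTEGER PRIMARY KEY,
        plot_id INTEGER NOT NULL REFERENCES plots(plot_id),
        year    INTEGER NOT NULL,
        species TEXT NOT NULL,
        cover   REAL NOT NULL
    );
""")

# Made-up example rows: site details are stored once, not repeated per record.
conn.executemany("INSERT INTO plots VALUES (?, ?)", [(1, "ridge"), (2, "valley")])
conn.executemany(
    "INSERT INTO observations (plot_id, year, species, cover) VALUES (?, ?, ?, ?)",
    [(1, 2004, "Festuca", 12.5), (1, 2004, "Carex", 7.5), (2, 2004, "Carex", 30.0)],
)

# One query joins the two tables and totals cover by site and year.
rows = conn.execute("""
    SELECT p.site, o.year, SUM(o.cover) AS total_cover
    FROM observations AS o
    JOIN plots AS p ON p.plot_id = o.plot_id
    GROUP BY p.site, o.year
    ORDER BY p.site
""").fetchall()
```

The same summary in a spreadsheet would mean copying site information onto every row and rebuilding the totals by hand each time new data arrived.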

The final piece is GIS software. I've used a variety of options, but am currently using GRASS because it has good batch facilities and the capability to link directly to R. This is not easy software - it took me a long time to get used to it - but it is very powerful, especially in conjunction with R.


Ecology Tools

I may eventually mention specific tools here, but for now I'll point you to two sites that list relevant software: