Scientific computing with python
with an emphasis on ocean modeling.
Python class information can be found here.
Quick one-stop shopping:
Here are some quick links for the folks who have been here before, and (almost) know what they are looking for.
Array types in python:
- NumPy. (RECOMMENDED) NumPy is the standard for array types in Python. SciPy adds quite a bit of functionality to the standard array class. You will most likely want to install SciPy as well. NumPy contains the basic (multi-dimensional) array manipulation tools and basic mathematical functions for working with these arrays.
- SciPy contains many of the more advanced numerical tools, like image manipulation, function fitting, optimization, ODE integration. SciPy is not trivial to install (mostly because of the BLAS/LAPACK/ATLAS libraries from netlib, and fftw libraries); you had better read the instructions. Both are available via svn, but many platforms have pre-built binaries. In particular, you may want to get the Enthought Python Distribution that comes with all of the standard packages used in scientific programming with Python. There is also the Python(x,y) package that contains many pre-compiled components.
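As a rough illustration of the basic array manipulation NumPy provides, here is a minimal sketch. The temperature field, grid sizes, and variable names are made up for this example and do not come from any real model output.

```python
import numpy as np

# Hypothetical grid: 5 depth levels and 4 latitude points.
depth = np.linspace(0.0, 100.0, 5)   # metres
lat = np.linspace(25.0, 30.0, 4)     # degrees north

# Broadcasting builds the 2-D (depth, lat) field without explicit loops:
# 25 degC at the surface and southern edge, cooling by 0.1 degC per metre
# of depth and by 0.5 degC per degree of latitude.
temp = 25.0 - 0.1 * depth[:, np.newaxis] - 0.5 * (lat[np.newaxis, :] - 25.0)

# Reductions and comparisons act on whole arrays at once.
depth_mean = temp.mean(axis=0)   # depth-averaged temperature at each latitude
warm = temp > 20.0               # boolean mask, same shape as temp

print(temp.shape)
print(depth_mean)
print(warm.sum(), "grid points are warmer than 20 degC")
```

The same whole-array style (no loops over indices) is what "vectorizing" means in MATLAB, so the habit carries over directly.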
Plotting tools:
- matplotlib (RECOMMENDED) is an excellent two-dimensional plotting utility, with an interface similar to MATLAB. matplotlib plays nice with NumPy. Don't forget to install the Basemap package to add support for geographic plotting. All this, and more, is available via svn from sourceforge.
- OTHER: Just about every imaginable (unix style) plotting library has been ported to Python in one way or another. For example, there is
PyGMT,
gnuplot-py,
pyngl, and
pyglet
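To give a feel for how MATLAB-like the matplotlib interface is, here is a small sketch that contours a made-up field and writes the figure to an in-memory PNG. The field and labels are invented for illustration, and the non-interactive "Agg" backend is an assumption chosen so that the example runs without any of the windowing toolkits.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no GUI toolkit required
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 2-D field on a regular grid.
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.linspace(0.0, np.pi, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)

# Filled contours with a colorbar, much like contourf/colorbar in MATLAB.
fig, ax = plt.subplots()
contours = ax.contourf(X, Y, Z)
fig.colorbar(contours, ax=ax)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("sin(x) cos(y)")

# Save to an in-memory buffer; pass a filename instead to write a file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)
```

For interactive use you would drop the `matplotlib.use("Agg")` line and call `plt.show()`, which relies on one of the windowing environments.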
Windowing python (used by e.g., matplotlib):
- Tkinter is included in most standard Python distributions and seems to be the one windowing environment that works on every platform. It is not as flexible as the others listed below, but still has some very powerful features that can be used from matplotlib, and it is probably the simplest to install.
- pyqt is a good alternative, although installing Qt is a bit more involved.
- wxPython is available for most major platforms, and is the foundation of many of the Enthought GUI tools.
- pygtk is the gold standard of windowing environments, but GTK can be complicated to install in non-Linux environments (like Darwin).
NetCDF support:
- netcdf4 (RECOMMENDED) is a utility to read and write NetCDF4 files. It can also be used to read and write 'classic' (version 3) NetCDF files.
- pycdf is a simple NetCDF (version 3) reader. It uses a slightly different API than the Scientific.IO package.
- pytables is a very powerful utility for working with HDF5 files. NetCDF4 files are basically HDF5 files, but it is not yet clear to me if pytables will be able to read and write NetCDF4.
Introduction
MATLAB is presently the de facto standard for analysis of ROMS model output. This is in part because MATLAB is the standard analysis tool for observational oceanography (although not always), but mostly because of the Herculean efforts of Rich Signell (with much help from Chuck Denham and John Evans). Signell et al. created a useful, flexible, and comprehensive set of tools -- even the ones that are hard to write, like Seagrid, which sucks less than any other similar tool out there. However, MATLAB's dominance was not predestined. Many other numerical ocean circulation sects use ferret, or some similarly goofy tool. In the early days of ROMS, Hernan Arango created a suite of tools based on NCAR Graphics, but that never really caught on outside of Rutgers.
There are many things wrong with MATLAB, and we all know what they are: memory leaks, slowness, strange PostScript, license servers, etc. We all put up with these problems because it has been the best thing out there. It is flexible, dynamic, and once you learn how to think within its framework (i.e., vectorizing) it can be fast and powerful. In short, we use MATLAB for the same reason we use any tool: there is nothing better available.
In fact, the MATLAB stranglehold is not as tight as you might think. There are no aspects of MATLAB that are necessary for model analysis. In particular, there are no essential toolboxes -- the tools that we use for model analysis have been created by the user community, and are available for free. Most notably, these include the MexCDF toolbox and m_map.
Why use python?
Switching to another computer language involves learning a whole new set of programming techniques and writing a new suite of tools. In order to make this initial investment in time worthwhile, there needs to be some clear (and large) benefit. I believe Python is worth the effort to switch for the following reasons:
- Python is an extremely powerful programming language. This means that you can use Python for a wide variety of programming tasks, not just model analysis.
- Python is naturally object oriented. This means programs will also tend to be stronger and more flexible because of the nature of Python programming. It is simple to convert a haphazard set of tools into a cohesive toolbox.
- There are many freely available packages that can perform tasks you might want to do. In particular, matplotlib has a plotting interface very similar to that of MATLAB, meaning that the learning curve is considerably reduced.
Advantages of Python over MATLAB:
- As mentioned above, Python is a real, powerful scripting language in its own right. Scientific modules built on top of it draw on this power. For example, creating classes is very easy, functions may be passed as arguments, and there are a number of very powerful built-in types besides the array types -- e.g., lists, dictionaries, and sets.
- It is easy to wrap existing C and FORTRAN code in python. For example, an existing FORTRAN function requires only a comment or two to specify which arguments are intended to be output. There are a number of tools for doing this, and most of the scientific python tools use some sort of wrapping so that the code runs very fast.
- Python is open source, free, and portable. Python is improving all the time. Theoretically, the tools you build could run on your PDA. Practically, Python runs on all modern computers, and there is an active development community.
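The language features mentioned in the list above (easy class creation, functions passed as arguments, and the built-in dictionary and set types) can be sketched in a few lines of plain Python. The `Station` class, the station names, and the temperature values are hypothetical, chosen only for illustration.

```python
class Station:
    """A hypothetical mooring station holding a few temperature readings."""

    def __init__(self, name, temps):
        self.name = name
        self.temps = list(temps)   # built-in list of readings

    def summarize(self, func):
        # Functions are ordinary objects, so any callable (max, min, or a
        # user-defined function) can be passed in as an argument.
        return func(self.temps)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# A dictionary maps station names to Station objects.
stations = {
    "A": Station("A", [20.0, 22.0, 24.0]),
    "B": Station("B", [18.0, 19.0]),
}

means = {name: s.summarize(mean) for name, s in stations.items()}
maxima = {name: s.summarize(max) for name, s in stations.items()}

# Sets support operations such as difference directly.
surveyed = set(stations)     # {"A", "B"}
planned = {"B", "C"}
missing = planned - surveyed

print(means)
print(maxima)
print(missing)
```

Turning a loose collection of functions into a class like this is one way a haphazard set of tools can grow into a cohesive toolbox.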
The basics:
Pre-installation (These instructions are primarily for Mac OS X, to get you to the point where you can install the tools listed above in the one-stop shopping.) First, you will need to install a Python with all the goodies (e.g., Tkinter and readline) installed. In the future, I believe this will be standard, but for now get it at undefined.org. Then you need some sort of windowing (Tk, Wx, or GTK) support. The simplest is perhaps the 'batteries-included' distribution of Tcl/Tk Aqua to get Tk. Now, install NumPy/SciPy, matplotlib, and a NetCDF IO utility.
You should be hacking away in no time, but you may wish to learn more about how to actually program in Python. I think you will be amazed how easy it is to learn. There are some links to get you started listed on python.org. For the programming neophyte, two of my favorites are instant hacking and how to think like a computer scientist. If you already know how to program, check out instant python, a (scientific) Python Short Course, and Python for Science. All of these links are available from the python.org Python Intros section.
For an introduction to programming in Python using NumPy, check out the tutorial and the Cookbook.