When I need to manipulate large data files (i.e., hundreds of GBs or TBs), I usually write a script in NCL or Python to do the job. Recently I’ve realized that this is often a huge waste of time! The problem is that my approach requires breaking the task into chunks, and the overhead of iterating over those chunks can slow the process by several orders of magnitude, on top of the time I spend writing the code.
A much faster and cleaner way to manipulate large datasets is to use command line operators that are specifically designed for climate datasets. Two libraries have been independently developed to do this: CDO and NCO.
The “Climate Data Operators” (CDO) library was developed at the Max Planck Institute for Meteorology in Hamburg, Germany, whereas the “NetCDF Operators” (NCO) library was developed as an open-source project by various people. They can do many of the same things, but the commands look very different. They also work differently under the hood, which can result in different performance for the same calculation.
Here’s a nice list of simple NCO examples. The nice thing about NCO is that there is a short list of basic commands. In spite of this simplicity, NCO commands can be more complicated than CDO commands.
- ncap – NetCDF Arithmetic Processor
- ncatted – NetCDF Attribute Editor
- ncbo – NetCDF Binary Operator (ex. ncadd, ncmultiply)
- ncea – NetCDF Ensemble Averager
- ncecat – NetCDF Ensemble Concatenator
- ncflint – NetCDF File Interpolator
- ncks – NetCDF Kitchen Sink
- ncpdq – NetCDF Permute Dimensions Quickly, Pack Data Quietly
- ncra – NetCDF Record Averager
- ncrcat – NetCDF Record Concatenator
- ncrename – NetCDF Renamer
- ncwa – NetCDF Weighted Averager
CDO has a long list of operators, which can be hard to remember. I still need to look them up every time I use them, but I imagine I’ll start to remember a few over time.
Here’s a simple example of combining a list of files with different timesteps into a single output file:
cdo copy ifile1 ifile2 ifile3 ofile
This is pretty straightforward. Doing the same thing with NCO is also pretty simple:
ncrcat -h ifile1 ifile2 ifile3 ofile
One instance where CDO wins over NCO is converting GRIB files to NetCDF:
cdo -f nc copy file.grb file.nc
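The same conversion can be applied to a whole directory of GRIB files with a short loop. This is only a sketch: it assumes the input files end in ".grb", and "grb_to_nc_name" is a hypothetical helper written for this example, not a CDO command. If cdo is not on the PATH, the loop just prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical helper: derive the NetCDF file name by swapping the
# ".grb" extension for ".nc" (assumes inputs end in ".grb").
grb_to_nc_name() {
    echo "${1%.grb}.nc"
}

for f in *.grb; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    out=$(grb_to_nc_name "$f")
    if command -v cdo >/dev/null 2>&1; then
        # Same command as above, applied to one file at a time.
        cdo -f nc copy "$f" "$out"
    else
        echo "cdo not found; would run: cdo -f nc copy $f $out"
    fi
done
```

Each file is converted independently, so a failed conversion does not stop the remaining files from being processed.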
An instance where NCO seems to have the advantage is editing variable attributes. The NCO attribute editor “ncatted” makes this pretty simple:
ncatted -O -a units,U,m,c,"m/s" file.nc
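The comma-separated argument is dense, so it may help to see it built up one field at a time. ncatted takes the attribute specification through its -a option, with the fields in the order: attribute name, variable name, mode, attribute type, value. The shell variable names below and the DRY_RUN switch are assumptions made for this sketch; by default it only prints the command instead of touching file.nc.

```shell
#!/bin/sh
# Fields of the ncatted -a argument, in order.
att_name="units"   # attribute to edit
var_name="U"       # variable that owns the attribute
mode="m"           # m = modify an existing attribute
att_type="c"       # c = character (string) value
att_value="m/s"    # new attribute value

spec="${att_name},${var_name},${mode},${att_type},${att_value}"

# DRY_RUN is a hypothetical switch for this sketch; it defaults to 1
# so that the command is printed rather than executed.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ncatted -O -a $spec file.nc"
else
    # -O overwrites file.nc in place without prompting.
    ncatted -O -a "$spec" file.nc
fi
```

Run with DRY_RUN=0 to actually apply the change once the printed command looks right.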
In the end, it’s not a matter of choosing the “best” library to add to the toolbox. Both have various strengths and weaknesses that should be exploited. Either way, they can save us huge amounts of time!
Over time I will be posting a series of short articles with examples and tricks. These are mostly just for my own reference, but I hope other people will find them useful. A list of these can be found here on my Publications page.