CDO: Extracting a variable across several files

Let’s say you have a bunch of daily NetCDF files for several years, each with many variables, and you want to extract a single variable and concatenate it into a single output file in one step. I recently had this scenario dealing with model output from the NCAR Community Earth System Model (CESM).

AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-01-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-02-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-03-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-04-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-05-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-06-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-07-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-08-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-09-00000.nc
AQUA_ZM_00_1.9x2.5_344.cam.h1.0000-01-10-00000.nc
...

Most of the time I manipulate and analyze data with NCL, which has this really great function, addfiles(), that concatenates a variable across hundreds of files. It’s really fast and works great… until you get to about 3 years’ worth of files. At that point I was getting a segmentation fault because there were too many files!

Concatenating NetCDF files with CDO is pretty straightforward:

cdo cat *.nc outfile.nc

This will include all variables and concatenate them along the time dimension.

But what if you only want to pull out one variable? That by itself would be something like:

cdo -select,name=PRECC   ifile.nc  ofile.nc

And what if you also want to limit the domain to save space? That is generally done with:

cdo -sellonlatbox,0,360,-40,40 ifile.nc  ofile.nc

We can combine these two requests as follows:

cdo   -sellonlatbox,0,360,-40,40 -select,name=PRECC  ifile.nc  ofile.nc

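Notice here that the order matters. CDO applies chained operators from right to left, so here -select runs first and -sellonlatbox works on its output. I don’t quite understand why, but switching the order of the operators here gives me a segmentation fault.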

Now, to get what we really want. I found that I don’t need the “cat” operator if I’m selecting a specific variable:

cdo   -select,name=PRECC   AQUA*.nc    ofile.nc

However, when I try a similar request with the -sellonlatbox option I get an error:

cdo   -sellonlatbox,0,360,-40,40   AQUA*.nc    ofile.nc 

>cdo sellonlatbox (Abort): Too many streams! Operator needs 1 input and 1 output streams.

So in this case, it appears you need two steps if you really want to reduce the area of the data.
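
A minimal sketch of that two-step approach, wrapped in a shell function so the intermediate file is cleaned up automatically. The function name extract_var_region and the CDO variable are hypothetical names chosen for illustration, not part of CDO; the two cdo calls are the same -select and -sellonlatbox commands shown above.

```shell
#!/bin/sh
# Sketch only: extract_var_region and the CDO variable are illustrative
# names, not part of CDO. Set CDO to override the executable that is run.
CDO="${CDO:-cdo}"

# Usage:   extract_var_region VAR LONLATBOX OUTFILE INFILE...
# Example: extract_var_region PRECC 0,360,-40,40 ofile.nc AQUA*.nc
# The body runs in a subshell ( ... ) so its variables stay private.
extract_var_region() (
    var="$1"; box="$2"; out="$3"
    shift 3

    # Scratch directory for the intermediate file; removed before returning.
    tmpdir="$(mktemp -d)" || exit 1
    tmp="$tmpdir/selected.nc"

    # Step 1: pull one variable out of all input files into one file.
    # Step 2: cut that single intermediate file down to the requested box.
    "$CDO" -select,name="$var" "$@" "$tmp" &&
        "$CDO" -sellonlatbox,"$box" "$tmp" "$out"
    status=$?

    rm -rf "$tmpdir"
    exit "$status"
)
```

The CDO user guide also describes quoting the wildcard (for example 'AQUA*.nc') so that CDO expands it itself, which it says lets -select be chained with other operators. Whether that works here likely depends on the CDO version, so treat it as something to try rather than a tested recipe.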

11 thoughts on “CDO: Extracting a variable across several files”

  1. Dave

    I am running into the same problem (Too many input files). Did you find a way to do it in one step?
    Also, what would be a similar code in NCL, maybe I should switch to that program?

  2. Walter (post author)

    I don’t have a good way to do it in one step. NCL is a great programming language that plays really nice with NetCDF files, but has a learning curve. You would have to write a script to load the data and calculate the average or plot it, similar to what you would do with MATLAB or Python. If you’re not in the atmospheric sciences then you might be better off writing a script with a more general language, like Python. I use Python a lot, but not for working with NetCDF files. However, I’ve talked to people who use Python for things like this, so I know it can be done.

    The other thing to consider is that it might be worth it to just use multiple steps and write to a temporary output file. This would avoid having to learn a new language. If you’re familiar with shell scripts you could probably write a shell script to run several CDO commands in a row.

  3. Meriem Deli

    Thank you very much, I was really struggling with an out-of-memory problem in MATLAB when trying to extract a variable.
    To concatenate many files you can use either cat, as you said, or the mergetime command to merge many NetCDF files into one.
    I have a question about how to calculate the mean temperature from a NetCDF file over land only. In MATLAB, mean(T) calculates the mean over the whole region, both sea and land, so I do not get good agreement with observations.

    1. Walter (post author)

      Glad I could help! If you’re getting an “out of memory” error from MATLAB, then I would recommend trying to do your analysis in smaller chunks. I usually work with yearly data files of sub-daily data. I also create yearly files of monthly means for when I need a long-term average. I’ve also resorted to calculating running means or running standard deviations sometimes, but it definitely takes longer.

      I’m not sure how you would produce a land-only average with CDO commands. I think you’ll need MATLAB or NCL for that.

  4. Meriem Deli

    Thanks for your quick reply. Working with yearly files could help. I also tried extracting the region I need with “cdo -sellonlatbox,0,360,-40,40 AQUA*.nc file.nc” and the out-of-memory problem was solved.
    I have another question, if possible. I downloaded some NetCDF files, but when I try to visualize them with ncview or apply any CDO operator, I get this message:
    Ncview: can’t recognize format of input file. NC or
    Unsupported file type (library support not compiled in)
    CDO built with a NetCDF version which doesn’t support NetCDF4 data!
    Thanks and I am sorry for any disturbance,

    1. Walter (post author)

      That’s a strange error about not recognizing “.NC”. Have you tried changing it to lower case “.nc”? I kinda doubt that would work, but worth a try I suppose.

      Otherwise I would try building a NetCDF4 library. If you use Linux then you can get it through the package manager. If you’re on a Mac I would recommend using the “homebrew” package manager. I’ve had great success with homebrew. If you’re on Windows then… I’m not sure what to do.

      Ncview is a fantastic piece of software once you get it working!

      1. Meriem Deli

        Hello,

        I have a NetCDF file T(time,lat,lon) and I need to compute the mean of T over a specific region (my country) using CDO. Do you have any idea how to make a region mask?

        Thanks in advance.

  5. Meriem Deli

    I worked with ncview before, when building my NetCDF files from WRF simulations, and it was a really good tool for quick checks. Now I am using CORDEX datasets, so I download NetCDF files directly from the websites. Files from one website, medcordex, work fine, but files from another, eurocordex, work with neither CDO nor ncview.
    Sorry for my bad English, I am trying to explain the problem.
    Thanks

  6. Meriem Deli

    Hello,

    Please, I have an issue. I am trying to concatenate some nc files using the cdo operator cat, but I got this error:

    cdo cat: 75%
    cdo cat (Abort): Grid size of the input parameters do not match!

    Could you please help me fix this?

  7. oumaima ghanimi

    Hey guys, I’ve just started using CDO. I’m wondering if there is a way to calculate a percentile in each grid cell from many NetCDF files that differ only in time, and get one output file.

    Could you please help me?

    1. Walter (post author)

      I think there might be a way to do this with CDO or NCO, but I’m not sure. This type of calculation seems better suited to traditional analysis with a standard language like Python or R.
