CESM: Common Errors when Building CESM 1.2 (OSX)

In a previous post, I exhaustively outlined the process of setting up the CESM model version 1.2 to run in single column mode on a Mac running OSX 10.9.5 (i.e. Darwin) with GCC 4.9.2.

This post documents some errors I ran into along the way. If I find more in the future, I’ll probably make a new post instead of updating this one, so be sure to check for new posts here if you are looking for solutions to CESM bugs.


Deprecated parentheses warning

Use of qw(...) as parentheses is deprecated at....

This isn’t so much an “error” as a minor annoyance. The ‘qw()’ is actually a Perl quote-like operator that stands for “quote words”. The warning only references 2-3 files, and there turns out to be a really simple fix. The message indicates the file name and line that’s causing the warning, so just open up those 2-3 files and go to the indicated line. The line will look something like this:

for $x qw(a b c) { ... }

All you need to do is add parentheses around the ‘qw()’ expression:

for $x (qw(a b c)) { ... }

Simple as that… except that the offending code lives in the case directory, so every new case will have the same problem! However, this code is just copied from the source code directory every time a case is created, so the way to make the warning go away permanently is to fix the corresponding file in the source code directory.

As an example, this file has the problem:

SCAM_TEST_T42/Tools/ConfigCase.pm

The corresponding source file can be found here:

CESM_SRC/cesm1_2_0/scripts/ccsm_utils/Case.template/ConfigCase.pm
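If several files are affected, the same edit can be scripted. The following is a minimal sketch using sed, assuming each offending ‘qw()’ list sits on a single line after a `for` or `foreach` loop variable and contains no closing parenthesis. The function name `fix_qw_parens` is hypothetical, and it keeps a `.bak` copy of each file so the result can be compared against the original before you rely on it.

```shell
# Hypothetical helper: wrap "for/foreach $var qw(...)" loop lists in an extra
# pair of parentheses, which silences Perl's
# "Use of qw(...) as parentheses is deprecated" warning.
# Assumes each qw() list is on one line and contains no ")" characters.
fix_qw_parens() {
  for f in "$@"; do
    # Keep a backup copy of the original file.
    cp "$f" "$f.bak" || return 1
    # Write to a temporary file instead of using "sed -i", because the -i
    # syntax differs between BSD sed (OSX) and GNU sed.
    sed -E 's/(for(each)?[[:space:]]+(my[[:space:]]+)?\$[[:alnum:]_]+[[:space:]]+)qw\(([^)]*)\)/\1(qw(\4))/g' \
      "$f" > "$f.tmp" && mv "$f.tmp" "$f" || return 1
  done
}

# Example: patch the template in the source tree so new cases inherit the fix.
# fix_qw_parens CESM_SRC/cesm1_2_0/scripts/ccsm_utils/Case.template/ConfigCase.pm
```

Because the pattern requires whitespace directly before `qw(`, lines that have already been fixed no longer match, so running the helper twice on the same file should leave it unchanged.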

gmake can’t be found

gmake: Command not found.
cp: Makefile.conf: No such file or directory
gmake: Command not found.

This is telling you that the build system is trying to use gmake, but on OSX GNU make is installed as make, so we need to use make instead. To fix this, edit the build settings in the env_build.xml file, which is found in the case directory. It’s best to use the xmlchange script rather than editing the file by hand, especially if you want to automate the build process:

./xmlchange -file env_build.xml -id GMAKE -val make
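When the same build scripts are shared between OSX and a Linux machine that does provide gmake, the value can be chosen automatically. This is a small sketch under that assumption; the helper name `pick_gmake` is hypothetical, and the xmlchange call is the same one shown above.

```shell
# Hypothetical helper: print the name of the make command to use.
# The build looks for "gmake", but on OSX GNU make is installed as "make".
pick_gmake() {
  if command -v gmake >/dev/null 2>&1; then
    echo gmake
  else
    echo make
  fi
}

# Run from inside the case directory:
# ./xmlchange -file env_build.xml -id GMAKE -val "$(pick_gmake)"
# Then confirm the new value:
# grep GMAKE env_build.xml
```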

C compiler cannot create executables

checking whether the C compiler works... no
configure: error: in `/Users/whannah/Model/CESM/scratch/SCAM_ZM_TEST_00_T42/bld/mct':
configure: error: C compiler cannot create executables

I actually forgot how I solved this and didn’t write it down. If I remember, I’ll come back and update this post. My guess is that it had something to do with the GCC version…?
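Since I don’t have the fix, here is only a diagnostic sketch. Scripts generated by autoconf record the exact failing command in a config.log file inside the directory named in the error (here the bld/mct directory), so that is the first place to look. The check itself can also be repeated by hand: configure decides whether “the C compiler works” by compiling and linking a trivial program. The helper `check_c_compiler` below is hypothetical and assumes the compiler name is a single word; pass whichever compiler your CESM build is configured with.

```shell
# Hypothetical helper: repeat the check behind configure's
# "checking whether the C compiler works" by compiling and linking a
# trivial program, so the compiler's own error message is visible.
# The compiler defaults to $CC, then gcc.
# Assumes the compiler is a single word (e.g. gcc or mpicc), not "gcc -m64".
check_c_compiler() {
  cc_cmd=${1:-${CC:-gcc}}
  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/cccheck.XXXXXX") || return 1
  printf 'int main(void) { return 0; }\n' > "$tmpdir/conftest.c"
  if "$cc_cmd" "$tmpdir/conftest.c" -o "$tmpdir/conftest"; then
    echo "OK: $cc_cmd can create executables"
    status=0
  else
    echo "FAIL: $cc_cmd cannot create executables" >&2
    status=1
  fi
  rm -rf "$tmpdir"
  return "$status"
}

# Example: check_c_compiler gcc
```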


Segmentation fault at startup

mpirun noticed that process rank 0 with PID 67038 on node <computer name> exited on signal 11 (Segmentation fault: 11).

This took me a long time to figure out, but a big part of it was that the “stack” was too small, which is also known as “stack space exhaustion”. On OSX the stack size can’t be set to “unlimited” like it can on Linux, but it can be increased to the maximum of about 64 MB (ulimit -s takes the size in kilobytes):

ulimit -s 65532
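A stack limit only applies to the shell where it is set and to the processes that shell starts, so it has to be raised in the same shell (or script) that launches mpirun, not in a separate terminal. The sketch below, using a hypothetical helper named `raise_stack_limit`, raises the soft limit toward a target in kilobytes, caps it at the hard limit, and never lowers a limit that is already larger.

```shell
# Hypothetical helper: raise the soft stack-size limit (in KB) toward a
# target, capped at the hard limit. On OSX the hard limit is usually about
# 64 MB (65532 KB); on Linux it is often "unlimited".
raise_stack_limit() {
  want=${1:-65532}
  soft=$(ulimit -S -s)
  hard=$(ulimit -H -s)
  # Leave an unlimited soft limit alone rather than lowering it.
  [ "$soft" = unlimited ] && return 0
  # Cap the request at the hard limit, which a normal user cannot exceed.
  if [ "$hard" != unlimited ] && [ "$hard" -lt "$want" ]; then
    want=$hard
  fi
  # Only ever raise the limit, never lower it.
  if [ "$soft" -lt "$want" ]; then
    ulimit -S -s "$want"
  fi
}

# Example: raise_stack_limit 65532 && ulimit -s
```

Note that ulimit is a sh/bash builtin. The CESM run script is a csh script, and in csh or tcsh the equivalent builtin is `limit stacksize 65532` (kilobytes are the default unit there as well).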


18 thoughts on “CESM: Common Errors when Building CESM 1.2 (OSX)”

  1. shoon

    I followed your instructions and changed GMAKE to make.
    The model successfully builds mct, gptl, pio, and csm_share. However, it fails when it builds atm (log: /cesm1_1_1/results/Test01/bld/atm.bldlog.160119-160724).

    The error is very simple. Only one line:
    gmake: Command not found.

    Do you have any idea why?

    1. Walter Post author

      Like I said above in the post, you need to set the build options in the env_build.xml file to use “make” instead of “gmake”, like so:
      ./xmlchange -file env_build.xml -id GMAKE -val make

        1. Walter Post author

          Hmmm, that’s very strange.
          Are you sure you changed the file for the right case?
          Or perhaps you changed it and then ran one of the “clean” commands?
          Try going to the case directory and running a command like:
          grep -r "GMAKE" ./*
          or
          grep -r "gmake" ./*
          to make sure gmake is not showing up anywhere.

  2. shoon

    So far, I can finish building the model. Thank you for all the help!!!
    But it is still not able to run.
    Could I ask where I should add “ulimit -s 65532”? Is it in CESM or in my .cshrc file?

    Thanks!

  3. shoon

    Hello Walter,

    I received this error message:
    Fatal error in MPI_Allreduce: Message truncated, error stack:
    MPI_Allreduce(907)................: MPI_Allreduce(sbuf=0x7f8ebb816500, rbuf=0x7f8ebb8164c0, count=2, MPI_INTEGER, MPI_MIN, comm=0xc4000062) failed
    MPIR_Allreduce_impl(764)..........:
    MPIR_Allreduce_intra(268).........:
    MPIR_Bcast_impl(1452).............:
    MPIR_Bcast(1476)..................:
    MPIR_Bcast_intra(1287)............:
    MPIR_Bcast_binomial(239)..........:
    MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated; 712 bytes received but buffer size is 8

    I thought it was the same problem you described under “Segmentation fault at startup”, so I typed “ulimit -s 65532” before running the *.run script. But it doesn’t solve this problem.
    Do you know what this is?

    Thanks!

    1. Walter Post author

      That’s a new one for me! I don’t know what any of that error message means. Are you running the model in serial or with multiple processors?

      1. shoon

        I see in your previous post that I should uncomment this line?
        #mpirun -np 1 $EXEROOT/cesm.exe >&! cesm.log.$LID

        But mine is different, it is:
        mpirun -np 64 $EXEROOT/cesm.exe >&! cesm.log.$LID

        I don’t know why mine shows 64. Does it mean I run with 64 processors? I find I cannot simply change 64 to 1 here. If I do, the run shows some other errors:
        Fatal error in MPI_Group_range_incl: Invalid argument, error stack:
        MPI_Group_range_incl(195).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7fff5eec6b70, new_group=0x7fff5eec6840) failed
        MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1

        I did set MAX_TASKS_PER_NODE = 1.

        1. Walter Post author

          Did you make sure to do this step: “./xmlchange -file env_build.xml -id MPILIB -val mpi-serial”?
          Perhaps you should try installing an alternate MPI library. If you are using MPICH or Open MPI, try the other one. That’s the only thing I can think to try.

          1. shoon

            I found I had put a space after mpi-serial, so I guess it was not working correctly before.
            Now, when I run the model, it shows some other errors… I don’t know what the problem is…
            Sorry, I keep having problems. And thank you so much for spending so much time helping me.

            The last few lines are:

            Program received signal SIGFPE: Floating-point exception – erroneous arithmetic operation.

            Backtrace for this error:
            #0 0x11b4e4092
            #1 0x11b4e33b0
            #2 0x7fff8db9df19
            #3 0x1041a7e1f
            #4 0x1039e5a34
            #5 0x1037a6307
            #6 0x1037a8b23
            #7 0x103ae490d
            #8 0x1037a8cbe
            #9 0x1036617cb
            #10 0x1036204ee
            #11 0x103592460
            #12 0x1043082db

            ===================================================================================
            = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
            = PID 62798 RUNNING AT MacL3084.local
            = EXIT CODE: 8
            = CLEANING UP REMAINING PROCESSES
            = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
            ===================================================================================
            YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception: 8 (signal 8)
            This typically refers to a problem with your application.
            Please see the FAQ page for debugging suggestions

          2. shoon

            Hello Walter,
            I eventually get my model run! Thank you so much!
            Could I ask another question? If I want to run with multiple processes on a Mac, which parameter should I set? It seems that with the settings you posted, NTASKS and NTHRDS are automatically set to 1 in env_mach_pes.xml. Even if I change them, they go back to 1 after configure and build.

            Thanks!

          3. Walter Post author

            I’m not sure, but at the very least you will need to set the MPILIB variable in the env_build.xml file to something other than “mpi-serial”. Did you already try changing that?

  4. Natsuki

    Hello Walter,
    I ran into the “C compiler cannot create executables” problem,
    and I tried installing a new version of PGI, but it didn’t work.

    1. Walter Post author

      I don’t use PGI anymore because I don’t have a license. I only use the GNU compiler on OSX, so I would suggest that you try installing different versions of GCC. Then make sure your compiler is set to “gnu” (see my previous post).

  5. Shweta Das

    Hello,

    I am also facing the same seg fault issue that shoon faced. Kindly let me know the fix. The last lines say:

    ===================================================================================
    = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    = PID 89374 RUNNING AT cn14
    = EXIT CODE: 136
    = CLEANING UP REMAINING PROCESSES
    = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================
    YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
    This typically refers to a problem with your application.
    Please see the FAQ page for debugging suggestions

    1. Walter Post author

      Shweta, you should look into using a debugger to figure out where the floating point exception is happening. You’ll need to figure out how to get a core dump file, which can be tricky on some systems.
