Gaea, Analysis and GFDL notes
This document is a collection of all the poorly documented or
undocumented things that I have come across while hacking around with
the HSt42 model in idealized.xml (basic_hs.xml and zonalacc.xml are
the resulting files that I have written). I assume that most of these
principles are fairly general to the GFDL/gaea system.
1 Storage and Data
- long term fast scratch (ltfs) items are deleted automatically after 90 days (roughly?) if not modified. Therefore keep modified code in the home directory; I use symbolic links from ltfs to my code to avoid having to deal with paths.
- fast scratch (fs) is cleared more frequently, so it shouldn't be used for anything persistent.
- archive (see below) is backed up and permanent.
1.1 Analysis and Archive
/archive/Thomas.Flannaghan is where data is delivered after runs
finish (automatic). This directory is on the analysis machine (also
mounted as read-only on public). Data analysis should be done on the
analysis machine. Log in via public using a different RSA code.
Data is delivered to rather cryptic folders. I've written a little
tool called archtool.py to automate the organising of runs. This tool
displays a list of all runs on archive along with when they were
created. At present, the only thing that can be done is automatically
extracting the data (and run namelists) to a new sub-directory ("run
code" in the run index). The tool automatically recognises and handles
restart runs.
1.2 Copying data
- gcp allows movement between gaea, analysis and GFDL, but not outside them.
- Normal scp and ssh are available on GFDL workstations (i.e. public).
- archive is mounted on public at /archive/$USER/ (read only), so we can do public> scp /archive/$USER/.... atmos02:/... to move data to sayre.
2 XML, Compiling and Running
This section is valid for running the HSt42 spectral dynamical core
(defined in idealized.xml). Other models may be more complicated but
I suppose the principles are the same.
2.1 Commands
2.1.1 fremake
Performs a CVS operation (see below) used to make a new experiment. This command does not actually do the make, but instead returns a command that will run the make script. The returned command should be run after any source modifications are made.
fremake -x <xmlfile> <experiment>
2.1.2 frerun
frerun does two things: first, it constructs the input namelists and
diag-table files; then it submits the experiment as a job.
frerun --overwrite -r basic --submit-staged -x <xmlfile> <experiment>
The --overwrite flag refers to overwriting the results output on
analysis. It does not overwrite the run environment on fast scratch.
Alternatively, --unique can be used instead; this saves results to
unique locations on analysis. Use --unique to run multiple copies of
the same experiment simultaneously. The --unique flag uses the same
binary as the previous run, so don't use when code updated.
The -r basic option specifies the regression run. Omit it and frerun
will do a production run. Currently I only know about regression runs,
which are good for any job that takes less than 16 hours to complete
(typically 5000 days at T42 with 60 levels).
2.1.3 Other useful commands
- showq -u $USER lists the current jobs belonging to $USER.
- mjobctl -c gaea.xxxxxxx cancels a job.
- gcp copies stuff between gaea and analysis. It cannot copy directories, so tar first.
2.1.4 My custom commands
- archtool.py is a tool for managing archive. It detects restart runs, extracts the tar files, and saves the restart information.
- add_restart.py is a tool for setting up regression runs with restart information. See Restart runs for documentation.
2.2 Notes
All settings for compiling experiments and for running jobs can be
specified in the XML file. frerun remakes all namelist files so if
only runtime options in the XML are modified, we do not need to
remake.
The "group letter" is found in the XML inside the project tag. This
is "gfdlw". Any references to c1 and c2 should be removed from this
string. The job will silently fail if this is wrong or not set :-(. An
XML example is
... <platform name="ncrc2.default"> <directory stem="$(FRE_STEM)"/> <project>gfdl_w</project> ...
2.3 Making a new experiment
An experiment is defined as a unique codebase; experiments are defined in the XML. We do not need a new experiment if we are just changing namelists.
- Copy an existing experiment in the XML
- Change the name and path of the experiment (see example below)
- Run fremake but don't run the command given at the end of fremake
- Check that CVS operations took place during fremake. If not, the name or path was not changed correctly (see step 2) and an existing experiment was found at the given path.
The experiment XML should end up like this:
<experiment name="{experiment name}">
  <description>
    Blah blah blah
  </description>
  <component name="fms_spectral_solo" 
             paths="$(rootDir)/{experiment name}/src" 
             includeDir="$(srcDir)/shared/include">
    <source versionControl="cvs">
      ...
where {experiment name} is the name of the experiment (must be unique
in the XML file).
2.4 Modifying the model code
2.4.1 Basic Procedure
If modifying an existing experiment, skip to step 2 below. Running
fremake again is not a good idea (although it will detect existing
code and not overwrite it).
- Make a new experiment (see above)
- Change the Fortran code on ltfs. Save changes in the home directory (ltfs is volatile).
- Run the command returned by step 1 (no need to use msub – run it directly). It's a make script and it will compile the code on ltfs.
- Fiddle with namelists, output tables in the XML, etc.
- Submit the model job using frerun with the --overwrite flag set. For some reason, using the --unique flag uses the old executable.
2.4.2 Comments
When first run, fremake downloads the model code to ltfs. Other invocations of fremake do not do this. It then gives a script to compile the code. Don't run this until you've made changes to the code stored on ltfs under the correct name.
ltfs stuff is sometimes deleted. Always keep a copy of changed code (or maybe the whole tree) in your home directory.
The make script just compiles the binaries. It does not read the namelist parameters set in the xml file.
The run script does not change the ltfs code either. It uses fs.
2.5 Editing and making directly
There is an interactive mode which is good for debugging compiler and runtime errors. To get an interactive session, use
msub -I -l walltime=01:00:00,size=16
This gives a terminal that will work for 1 hour with 16 cores (the
number of cores must be the same as npes in the XML – see Horizontal
resolution). From this interactive terminal, we can make the model
using make (with env script) or using the build script, and we can run
the model executable using
aprun -n 16 fms_HSt42_.....x
Running the model produces a lot of netcdf files which need combining
and sending to archive. I'm not sure how this is done yet. Again, the
-n option refers to the number of cores and must be the same as npes
in the XML.
2.6 Restart runs
All runs (regression and production) can be restarted.
2.6.1 Regression runs
The process for regression runs has now been automated by my script
xml/add_restart.py. It is used in the following manner:
frerun -u <frerun-options> | ./add_restart.py <restart-dir>
where <frerun-options> are standard frerun options but cannot include
--submit-staged, and <restart-dir> is a directory containing the
relevant restart files. The script has been tested and works
well. Suitable restart directories are stored in ~/restarts on gaea.
The procedure that the script implements is as follows:
- Find the original regression run on /fs/work/.... It should contain a RESTART directory. Note down its path using pwd.
- Run frerun with the --unique flag and without the --submit-staged flag. It will exit with an msub command to run. Do not run this msub command yet!
- Open the file that is the argument to msub (the run script). It is located in ~/siena_201207/.../scripts/run, and will contain something like this:
cd $workDir
set dataFilesNotOK = ( )
if ( $dataFilesNotOK > 0 ) then
A search for "dataFilesNotOK" will find this region.
- Edit this region of the run script:
cd $workDir
set dataFilesNotOK = ( )
# add this line. it copies the restart files to the INPUT folder of
# the new run. <path_to_run> should be an absolute path.
cp <path_to_run>/RESTART/* INPUT/.
###
if ( $dataFilesNotOK > 0 ) then
where <path_to_run> is the full path from step 1.
- Issue the msub command given at the end of step 2.
- Details
The files needed for a correct restart are atmos_model.res, atmosphere.res.nc and spectral_dynamics.res.nc. These just contain the model state and the number of days elapsed. Namelists are not loaded from the restarted run, so we must use restarts appropriate to the namelist being used.
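As a quick sanity check before pointing add_restart.py at a directory, here is a minimal sketch (my own helper, not part of the tools above) that verifies these three files are present:
import os

# files that section 2.6.1 says are needed for a correct restart
REQUIRED = ('atmos_model.res', 'atmosphere.res.nc', 'spectral_dynamics.res.nc')

def check_restart_dir(path):
    """Report any required restart files missing from the given directory."""
    missing = [f for f in REQUIRED if not os.path.exists(os.path.join(path, f))]
    if missing:
        print('missing restart files:', ', '.join(missing))
    else:
        print('all restart files present')

check_restart_dir(os.path.expanduser('~/restarts/some_run'))  # hypothetical path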
3 Model Configuration
3.1 Horizontal resolution
The HSt42 model defined in idealized.xml gives a typical T42
resolution run, but we can modify the xml to give different
resolutions. The key spectral_dynamics_nml namelist parameters are
| namelist parameter | T21 | T42 | T85 | 
|---|---|---|---|
| num_fourier | 21 | 42 | 85 | 
| num_spectral | 22 | 43 | 86 | 
| lat_max | 32 | 64 | 128 | 
| lon_max | 64 | 128 | 256 | 
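These grid sizes follow the standard alias-free ("3n+1") rule for quadratic terms, with lon_max rounded up to a power of two and lat_max = lon_max / 2. A small sketch of my own (not part of FRE) that reproduces the table:
def spectral_grid(num_fourier):
    # minimum number of longitudes for alias-free quadratic terms is
    # 3n + 1; round up to the next power of two to match the table above
    lon = 1
    while lon < 3 * num_fourier + 1:
        lon *= 2
    return {'num_fourier': num_fourier,
            'num_spectral': num_fourier + 1,
            'lon_max': lon,
            'lat_max': lon // 2}

for n in (21, 42, 85):
    print(spectral_grid(n))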
In addition, we must change the regression or production run specification to indicate the number of processors. For regression runs, the specification for T42 is something like
<regression name="...">
  <run days="..." npes="16" runTimePerJob="..."/>
</regression>
The npes field specifies the number of processors requested for the
run (this is also the number used when requesting an interactive
session). For T85, increase npes to "32". When the run is actually
submitted, it typically uses twice the number of cores specified here.
3.2 Vertical coordinate
The spectral_dynamics_nml controls the vertical coordinate. The
vert_coord_option allows various different ways to define vertical
coordinates. The simplest of these are "even_sigma" and
"uneven_sigma".
"even_sigma" simply divides the vertical coordinate evenly in sigma,
from 0 to 1. num_levels gives the number of levels.
"uneven_sigma" allows more complicated sigma coordinates. Again,
num_levels gives the number of levels. The configuration namelist
settings are as follows
| option | meaning | Default | 
|---|---|---|
| surf_res | Confusing – see exact definitions | 0.1 | 
| exponent | Confusing – see exact definitions | 2.5 | 
| scale_heights | Top level expressed in scale heights | 4 | 
| zero_top | Forces top sigma level to be 0 | .true. | 
Reasonable (i.e. approx 1 km) resolution at the TTL/lower stratosphere
can be obtained with 60 levels, and scale_heights = 10 (the remaining
parameters are set to the defaults). Polvani-Kushner use a set of
custom specified levels, but these lie closest to
scale_heights = 11.0, surf_res = 0.5, exponent = 3.
These values are from Martin Jucker (he fitted these parameters to the PK levels).
The exact definitions (from atmos_spectral/init/vert_coordinate.F90):
p = a * p_ref + b * p_surf
where p_surf is the instantaneous surface pressure and p_ref is some
constant. uneven_sigma defines a and b as follows
! a is set to zero
a = 0
! b is defined by:
do k = 1, num_levels
  zeta = 1 - float(k - 1) / float(num_levels)
  z = surf_res * zeta + (1 - surf_res) * zeta**exponent
  b(k) = exp(-z * scale_heights)
enddo
! sort out the model surface level and top level
b(num_levels+1) = 1.0
if (zero_top) b(1) = 0.0
Therefore, when surf_res = 1 we have roughly evenly spaced levels in
height (exactly even in log-pressure coordinates). When surf_res = 0
we have exponentially spaced levels. This means that surf_res doesn't
really have anything to do with the surface – it is more like saying
"how close to linear are we?". It's value affects the resolution
everywhere.
Here is a simple calculator which also gives the approximate height (log-pressure height).
import math

# namelist parameters:
num_levels = 10
scale_heights = 11
surf_res = 0.5
exponent = 3

for k in range(1, num_levels + 1):
    zeta = 1 - (k - 1) / float(num_levels)
    z = surf_res * zeta + (1 - surf_res) * zeta ** exponent
    b = math.exp(-z * scale_heights)
    b_km = 7 * z * scale_heights  # log-pressure height with a 7 km scale height
    print('b(%d) =\t%0.3f (approx %0.1f km)' % (k, b, b_km))
3.3 Sponge
There is no sponge by default in the HSt42 model defined in
idealized.xml. There is only an optional very simple sponge available,
which acts on only the top level (defined in spectral_damping.F90). It
is defined by the following parameters in spectral_dynamics_nml:
| option | meaning | default | 
|---|---|---|
| eddy_sponge_coeff | Eddy damping in top layer | 0 | 
| zmu_sponge_coeff | Zonal mean u damping | 0 | 
| zmv_sponge_coeff | Zonal mean v damping | 0 | 
All coefficients are in units of m^2/s, and are applied as Laplacian damping terms of the form \(\mathrm{coeff} \, \nabla^2\).
The Polvani-Kushner hs_forcing.F90 file (given to me by Martin) has a
more advanced sponge. I have added it to my hs_forcing_zonalacc.F90
file. The following options control the sponge, and are given in the
hs_forcing_nml namelist.
| option | meaning | default | 
|---|---|---|
| sponge_flag | Is sponge used? | .false. | 
| sponge_tau_days | The sponge timescale (days) | 1 | 
| sponge_pbottom | The lowest level to apply sponge (Pascals) | 100 | 
The sponge is a Rayleigh friction with coefficient given by
sponge_coeff * (1 - p / sponge_pbottom) ** 2. sponge_coeff is the
inverse timescale derived from sponge_tau_days.
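To see what this profile looks like, here is a minimal sketch of my reading of the formula above (parameter names mirror hs_forcing_nml; the real implementation is in hs_forcing_zonalacc.F90):
import numpy as np

sponge_tau_days = 1.0
sponge_pbottom = 100.0  # Pa

# inverse timescale in s^-1 derived from sponge_tau_days
sponge_coeff = 1.0 / (sponge_tau_days * 86400.0)

p = np.array([1.0, 10.0, 50.0, 100.0, 500.0])  # pressure levels in Pa
# Rayleigh damping rate: quadratic in (1 - p / sponge_pbottom) above
# sponge_pbottom, zero at and below it
damping = np.where(p < sponge_pbottom,
                   sponge_coeff * (1.0 - p / sponge_pbottom) ** 2,
                   0.0)
print(damping)  # strongest at the model top, zero at and below sponge_pbottom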
3.4 Held-Suarez Forcing
| option | meaning | default |
|---|---|---|
| t_zero | Surface temperature at equator | 315 |
| t_strat | Stratospheric temperature | 200 |
| delh | Equator to pole temperature difference | 60 |
| delv | Delta theta per scale height | 10 |
| eps | Hemispheric asymmetry | 0 |
| sigma_b | Boundary layer sigma | 0.7 |
| ka | Newtonian damping in free atmosphere | -40 |
| ks | Newtonian damping in boundary layer | -4 |
| kf | Rayleigh damping in boundary layer | -1 |
| do_conserve_energy | Conserves energy dissipated via Rayleigh damping (and other momentum forcings) by heating | .true. |
The damping rates (ka, ks, kf) are given as timescales in days if negative. I'm not sure about the units if positive, but probably day^-1. Temperatures are all in Kelvin.
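A small sketch of that sign convention (my reading; the positive branch is an assumption):
def damping_rate(k):
    # negative k: a timescale in days, e.g. ka = -40 is a 40-day relaxation
    if k < 0:
        return 1.0 / (-k * 86400.0)  # rate in s^-1
    # positive k: assumed to be in day^-1 (unconfirmed)
    return k / 86400.0

print(damping_rate(-40))  # ka: free-atmosphere Newtonian damping rate in s^-1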
do_conserve_energy does not really affect the solution of the
local_forcing-type problems; both settings have been tested with
minimal differences (<5%).
3.5 Local Momentum Forcing
This is my custom code documentation. I have added this behaviour to
hs_forcing_zonalacc.F90, which replaces hs_forcing.F90 (using a
symbolic link). The namelist parameters go into the hs_forcing
namelist. All options are prefixed with local_forcing and are given
here:
| local_forcing* | meaning | units | default | 
|---|---|---|---|
| _option | Defines the forcing mode | | '' |
| _xwidth | Longitudinal half-width | degrees | 10 | 
| _ywidth | Latitudinal half-width | degrees | 10 | 
| _ycenter | Central latitude | degrees | 0 | 
| _zwidth | Half-width in height | scale height | 0.5 | 
| _zcenter | Central height. | scale height | 2.2 | 
| _amp | Amplitude of forcing. | m/s/day | 1 | 
| _start | Start time of forcing. | days | -1 | 
| _end | End time of forcing. | days | -1 | 
local_forcing_start and local_forcing_end allow control of when
the forcing is turned on and off. A value of -1 (the default) removes
that time constraint. For example, switching on from 100 days would
give local_forcing_start = 100 and local_forcing_end = -1.
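A minimal sketch of how I understand the start/end logic (the real check lives in hs_forcing_zonalacc.F90):
def forcing_active(t_days, start=-1.0, end=-1.0):
    # -1 (the default) disables that bound
    if start >= 0 and t_days < start:
        return False
    if end >= 0 and t_days > end:
        return False
    return True

print(forcing_active(50, start=100))   # False: forcing not yet switched on
print(forcing_active(150, start=100))  # True: on from 100 days, no end bound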
local_forcing_option is a string and can have several values:
| local_forcing_option | meaning | 
|---|---|
| '' | No forcing. The local_forcing code is not run. |
| 'Cos2Dipole' | A two-signed forcing in the vertical. For positive amplitude, negative forcing is on the top and positive forcing is on the bottom. The forcing has the form cos^2(x) cos^2(y) sin(z). |
| 'CosDipole' | Vertical structure as above, but with cos(x) cos(y) horizontal structure. |
| 'Cos2Single' | Single-signed forcing (cos(z) vertical). |
| 'CosSingle' | – ditto – |
| 'CosDipoleZM' | Zonal mean CosDipole – see Zonal mean caveats. |
| 'Cos2DipoleZM' | Zonal mean Cos2Dipole – see Zonal mean caveats. |
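For concreteness, here is a hedged sketch of the 'Cos2Dipole' shape as I read the table (the box confinement and the exact vertical sign convention are my assumptions; check hs_forcing_zonalacc.F90 for the real thing):
import numpy as np

xwidth, ywidth, ycenter = 10.0, 10.0, 0.0  # degrees
zwidth, zcenter, amp = 0.5, 2.2, 1.0       # scale heights; amp in m/s/day

def cos2dipole(x, y, z):
    # confined to the box defined by the half-widths
    inside = ((np.abs(x) < xwidth)
              & (np.abs(y - ycenter) < ywidth)
              & (np.abs(z - zcenter) < zwidth))
    # cos^2 horizontal structure; sin vertical structure gives the two
    # signs (for amp > 0, positive below zcenter and negative above)
    return amp * inside * (np.cos(0.5 * np.pi * x / xwidth) ** 2
                           * np.cos(0.5 * np.pi * (y - ycenter) / ywidth) ** 2
                           * np.sin(0.5 * np.pi * (zcenter - z) / zwidth))

print(cos2dipole(0.0, 0.0, zcenter - 0.5 * zwidth))  # positive lobe below centre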
3.5.1 Zonal mean caveats
local_forcing_amp in the *ZM cases is the peak zonal mean forcing but
in the other cases, it is the peak local amplitude. Here's a
calculator that works out the equivalent local amplitude equivalent to
the zonal mean case (i.e. with same zonal mean), and to get the
10S-10N equivalent amplitude,
from numpy import *

Lx = 30   # degrees
Ly = 10   # degrees
A = 0.3   # zonal mean amplitude

x = linspace(-180, 180, 5000)
y = linspace(-10, 10, 100)
f = cos(0.5 * pi * x / Lx) * cos(0.5 * pi * y[:, None] / Ly) \
    * ((abs(x) < Lx) & (abs(y[:, None]) < Ly))
f *= A / f.mean(axis=-1).max()
print('zonal mean amplitude, A     ', f.mean(axis=-1).max())
print('10N-10S zonal mean amplitude', f.mean())
print('local amplitude             ', f.max())
print('10N-10S local amplitude     ', f.mean(axis=0).max())
print('Theoretical local amplitude ', A * pi * 360 / (4. * Lx))
3.6 Local Heating
The base version of hs_forcing.F90 includes Isidoro's local
heating. In my version, I've removed this and replaced it with a local
heating routine that mirrors the local forcing routine. All namelist
variables do the same thing here as they do for local forcing and have
the same defaults, but are named local_heating_xxxx in place of
local_forcing_xxxx.
The only difference is that local_heating_amp is in units of K/day.
See above for all local_forcing settings.