Gaea, Analysis and GFDL notes
This document is a collection of all the poorly documented or
undocumented things that I have come across while hacking around with
the HSt42 model in idealized.xml (basic_hs.xml and zonalacc.xml are
the resulting files that I have written). I assume that most of these
principles are fairly general to the GFDL/gaea system.
1 Storage and Data
- long term fast scratch (ltfs) items are deleted automatically after 90 days (roughly?) if not modified. Therefore keep modified code in the home directory; I use symbolic links from ltfs to my code to avoid having to deal with paths.
- fast scratch (fs) is cleared more frequently, so it shouldn't be used for anything persistent.
- archive (see below) is backed up and permanent.
1.1 Analysis and Archive
/archive/Thomas.Flannaghan is where data is delivered after runs
finish (automatic). This directory is on the analysis machine (also
mounted as read-only on public). Data analysis should be done on the
analysis machine. Log in via public using a different RSA code.
Data is delivered to rather cryptic folders. I've written a little
tool called archtool.py to automate the organising of runs. This tool
displays a list of all runs on archive along with when they were
created. At present, the only thing that can be done is automatically
extracting the data (and run namelists) to a new sub-directory ("run
code" in the run index). The tool automatically recognises and handles
restart runs.
1.2 Copying data
- gcp allows movement between gaea, analysis and GFDL, but not outside them.
- Normal scp and ssh are available on GFDL workstations (i.e. public).
- archive is mounted on public at /archive/$USER/ (read only), so we can do public> scp /archive/$USER/.... atmos02:/... to move data to sayre.
2 XML, Compiling and Running
This section is valid for running the HSt42 spectral dynamical core
(defined in idealized.xml). Other models may be more complicated but
I suppose the principles are the same.
2.1 Commands
2.1.1 fremake
Performs a CVS operation (see below) used to make a new experiment. This command does not actually do the make, but instead returns a command that will run the make script. The returned command should be run after any source modifications are made.
fremake -x <xmlfile> <experiment>
2.1.2 frerun
frerun does two things: first, it constructs the input namelists and
diag-table files; then it submits the experiment as a job.
frerun --overwrite -r basic --submit-staged -x <xmlfile> <experiment>
The --overwrite flag refers to overwriting the results output on
analysis. It does not overwrite the run environment on fast scratch.
Alternatively, --unique can be used instead; this saves results to
unique locations on analysis. Use --unique to run multiple copies of
the same experiment simultaneously. The --unique flag uses the same
binary as the previous run, so don't use when code updated.
The -r basic option specifies the regression run. Omit it and frerun
will do a production run. Currently I only know about regression runs,
which are good for any job that takes less than 16 hours to complete
(typically 5000 days at T42 with 60 levels).
2.1.3 Other useful commands
- showq -u $USER lists the current jobs belonging to $USER.
- mjobctl -c gaea.xxxxxxx cancels a job.
- gcp copies stuff between gaea and analysis. It cannot copy directories, so tar first.
2.1.4 My custom commands
- archtool.py is a tool for managing archive. It detects restart runs, extracts the tar files, and saves the restart information.
- add_restart.py is a tool for setting up regression runs with restart information. See Restart runs for documentation.
2.2 Notes
All settings for compiling experiments and for running jobs can be
specified in the XML file. frerun remakes all namelist files so if
only runtime options in the XML are modified, we do not need to
remake.
The "group letter" is found in the XML inside the project tag. This
is "gfdlw". Any references to c1 and c2 should be removed from this
string. The job will silently fail if this is wrong or not set :-(. An
XML example is
... <platform name="ncrc2.default"> <directory stem="$(FRE_STEM)"/> <project>gfdl_w</project> ...
2.3 Making a new experiment
An experiment is defined as a unique codebase; experiments are defined in the XML. We do not need a new experiment if we are just changing namelists.
- Copy an existing experiment in the XML
- Change the name and path of the experiment (see example below)
- Run fremake but don't run the command given at the end of fremake
- Check that CVS operations took place during fremake. If not, the name or path was not changed correctly (see step 2) and an existing experiment was found at the given path.
The experiment XML should end up like this:
<experiment name="{experiment name}">
  <description>
    Blah blah blah
  </description>
  <component name="fms_spectral_solo" 
             paths="$(rootDir)/{experiment name}/src" 
             includeDir="$(srcDir)/shared/include">
    <source versionControl="cvs">
      ...
where {experiment name} is the name of the experiment (must be unique
in the XML file).
2.4 Modifying the model code
2.4.1 Basic Procedure
If modifying an existing experiment, skip to step 2 below. Running
fremake again is not a good idea (although it will detect existing
code and not overwrite it).
- Make a new experiment (see above)
- Change the Fortran code on ltfs. Save changes in the home directory (ltfs is volatile).
- Run the command returned by step 1 (no need to use msub – run it directly). It's a make script and it will compile the code on ltfs.
- Fiddle with namelists, output tables in the XML, etc.
- Submit the model job using frerun with the --overwrite flag set. For some reason, using the --unique flag uses the old executable.
2.4.2 Comments
When first run, fremake downloads the model code to ltfs. Other invocations of fremake do not do this. It then gives a script to compile the code. Don't run this until you've made changes to the code stored on ltfs under the correct name.
ltfs stuff is sometimes deleted. Always keep a copy of changed code (or maybe the whole tree) in your home directory.
The make script just compiles the binaries. It does not read the namelist parameters set in the xml file.
The run script does not change the ltfs code either. It uses fs.
2.5 Editing and making directly
There is an interactive mode which is good for debugging compiler and runtime errors. To get an interactive session, use
msub -I -l walltime=01:00:00,size=16
This gives a terminal that will work for 1 hour with 16 cores (the
number of cores must be the same as npes in the XML – see Horizontal
resolution). From this interactive terminal, we can make the model
using make (with env script) or using the build script, and we can run
the model executable using
aprun -n 16 fms_HSt42_.....x
Running the model produces a lot of netcdf files which need combining
and sending to archive. I'm not sure how this is done yet. Again, the
-n option refers to the number of cores and must be the same as npes
in the XML.
2.6 Restart runs
All runs (regression and production) can be restarted.
2.6.1 Regression runs
The process for regression runs has now been automated by my script
xml/add_restart.py. It is used in the following manner:
frerun -u <frerun-options> | ./add_restart.py <restart-dir>
where <frerun-options> are standard frerun options but cannot include
--submit-staged, and <restart-dir> is a directory containing the
relevant restart files. The script has been tested and works
well. Suitable restart directories are stored in ~/restarts on gaea.
The procedure that the script implements is as follows:
- Find the original regression run on /fs/work/.... It should contain a RESTART directory. Note down its path using pwd.
- Run frerun with the --unique flag and without the --submit-staged flag. It will exit with an msub command to run. Do not run this msub command yet!
- Open the file that is the argument to msub (the run script). It is located in ~/siena_201207/.../scripts/run, and will contain something like this:
cd $workDir
set dataFilesNotOK = ( )
if ( $dataFilesNotOK > 0 ) then
A search for "dataFilesNotOK" will find this region.
- Edit this region of the run script:
cd $workDir
set dataFilesNotOK = ( )
# add this line. it copies the restart files to the INPUT folder of
# the new run. <path_to_run> should be an absolute path.
cp <path_to_run>/RESTART/* INPUT/.
###
if ( $dataFilesNotOK > 0 ) then
where <path_to_run> is the full path from step 1.
- Issue the msub command given at the end of step 2.
- Details
The files needed for a correct restart are atmos_model.res, atmosphere.res.nc and spectral_dynamics.res.nc. These just contain the model state and the number of days elapsed. Namelists are not loaded from the restarted run, so we must use restarts appropriate to the namelist being used.
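As a quick sanity check before pointing add_restart.py at a directory, here is a minimal sketch (my own helper, not part of the tools above) that verifies these three files are present:
import os

# files that section 2.6.1 says are needed for a correct restart
REQUIRED = ('atmos_model.res', 'atmosphere.res.nc', 'spectral_dynamics.res.nc')

def check_restart_dir(path):
    """Report any required restart files missing from the given directory."""
    missing = [f for f in REQUIRED if not os.path.exists(os.path.join(path, f))]
    if missing:
        print('missing restart files:', ', '.join(missing))
    else:
        print('all restart files present')

check_restart_dir(os.path.expanduser('~/restarts/some_run'))  # hypothetical path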
3 Model Configuration
3.1 Horizontal resolution
The HSt42 model defined in idealized.xml gives a typical T42
resolution run, but we can modify the xml to give different
resolutions. The key spectral_dynamics_nml namelist parameters are
| namelist parameter | T21 | T42 | T85 | 
|---|---|---|---|
| num_fourier | 21 | 42 | 85 | 
| num_spectral | 22 | 43 | 86 | 
| lat_max | 32 | 64 | 128 | 
| lon_max | 64 | 128 | 256 | 
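These grid sizes follow the standard alias-free ("3n+1") rule for quadratic terms, with lon_max rounded up to a power of two and lat_max = lon_max / 2. A small sketch of my own (not part of FRE) that reproduces the table:
def spectral_grid(num_fourier):
    # minimum number of longitudes for alias-free quadratic terms is
    # 3n + 1; round up to the next power of two to match the table above
    lon = 1
    while lon < 3 * num_fourier + 1:
        lon *= 2
    return {'num_fourier': num_fourier,
            'num_spectral': num_fourier + 1,
            'lon_max': lon,
            'lat_max': lon // 2}

for n in (21, 42, 85):
    print(spectral_grid(n))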
In addition, we must change the regression or production run specification to indicate the number of processors. For regression runs, the specification for T42 is something like
<regression name="...">
  <run days="..." npes="16" runTimePerJob="..."/>
</regression>
The npes field specifies the number of processors requested for the
run (this is also the number used when requesting an interactive
session). For T85, increase npes to "32". When the run is actually
submitted, it typically uses twice the number of cores specified here.
3.2 Vertical coordinate
The spectral_dynamics_nml controls the vertical coordinate. The
vert_coord_option allows various different ways to define vertical
coordinates. The simplest of these are "even_sigma" and
"uneven_sigma".
"even_sigma" simply divides the vertical coordinate evenly in sigma,
from 0 to 1. num_levels gives the number of levels.
"uneven_sigma" allows more complicated sigma coordinates. Again,
num_levels gives the number of levels. The configuration namelist
settings are as follows
| option | meaning | Default | 
|---|---|---|
| surf_res | Confusing – see exact definitions | 0.1 | 
| exponent | Confusing – see exact definitions | 2.5 | 
| scale_heights | Top level expressed in scale heights | 4 | 
| zero_top | Forces top sigma level to be 0 | .true. | 
Reasonable (i.e. approx 1 km) resolution at the TTL/lower stratosphere
can be obtained with 60 levels, and scale_heights = 10 (the remaining
parameters are set to the defaults). Polvani-Kushner use a set of
custom specified levels, but these lie closest to
scale_heights = 11.0, surf_res = 0.5, exponent = 3.
These values are from Martin Jucker (he fitted these parameters to the PK levels).
The exact definitions (from atmos_spectral/init/vert_coordinate.F90):
p = a * p_ref + b * p_surf
where p_surf is the instantaneous surface pressure and p_ref is some
constant. uneven_sigma defines a and b as follows
! a is set to zero
a = 0
! b is defined by:
do k = 1, num_levels
  zeta = 1 - float(k - 1) / float(num_levels)
  z = surf_res * zeta + (1 - surf_res) * zeta**exponent
  b(k) = exp(-z * scale_heights)
enddo
! sort out the model surface level and top level
b(num_levels+1) = 1.0
if (zero_top) b(1) = 0.0
Therefore, when surf_res = 1 we have roughly evenly spaced levels in
height (exactly even in log-pressure coordinates). When surf_res = 0
we have exponentially spaced levels. This means that surf_res doesn't
really have anything to do with the surface – it is more like saying
"how close to linear are we?". It's value affects the resolution
everywhere.
Here is a simple calculator which also gives the approximate height (log-pressure height).
import math

# namelist parameters:
num_levels = 10
scale_heights = 11
surf_res = 0.5
exponent = 3

for k in range(1, num_levels + 1):
    zeta = 1 - (k - 1) / float(num_levels)
    z = surf_res * zeta + (1 - surf_res) * zeta ** exponent
    b = math.exp(-z * scale_heights)
    b_km = 7 * z * scale_heights  # log-pressure height with a 7 km scale height
    print('b(%d) =\t%0.3f (approx %0.1f km)' % (k, b, b_km))
3.3 Sponge
There is no sponge by default in the HSt42 model defined in
idealized.xml. There is only an optional very simple sponge available,
which acts on only the top level (defined in spectral_damping.F90). It
is defined by the following parameters in spectral_dynamics_nml:
| option | meaning | default | 
|---|---|---|
| eddy_sponge_coeff | Eddy damping in top layer | 0 | 
| zmu_sponge_coeff | Zonal mean u damping | 0 | 
| zmv_sponge_coeff | Zonal mean v damping | 0 | 
All coefficients are in units of m^2/s, and are applied as Laplacian damping terms of the form \(\mathrm{coeff} \, \nabla^2\).
The Polvani-Kushner hs_forcing.F90 file (given to me by Martin) has a
more advanced sponge. I have added it to my hs_forcing_zonalacc.F90
file. The following options control the sponge, and are given in the
hs_forcing_nml namelist.
| option | meaning | default | 
|---|---|---|
| sponge_flag | Is sponge used? | .false. | 
| sponge_tau_days | The sponge timescale (days) | 1 | 
| sponge_pbottom | The lowest level to apply sponge (Pascals) | 100 | 
The sponge is a Rayleigh friction with coefficient given by
sponge_coeff * (1 - p / sponge_pbottom) ** 2. sponge_coeff is the
inverse timescale derived from sponge_tau_days.
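To see what this profile looks like, here is a minimal sketch of my reading of the formula above (parameter names mirror hs_forcing_nml; the real implementation is in hs_forcing_zonalacc.F90):
import numpy as np

sponge_tau_days = 1.0
sponge_pbottom = 100.0  # Pa

# inverse timescale in s^-1 derived from sponge_tau_days
sponge_coeff = 1.0 / (sponge_tau_days * 86400.0)

p = np.array([1.0, 10.0, 50.0, 100.0, 500.0])  # pressure levels in Pa
# Rayleigh damping rate: quadratic in (1 - p / sponge_pbottom) above
# sponge_pbottom, zero at and below it
damping = np.where(p < sponge_pbottom,
                   sponge_coeff * (1.0 - p / sponge_pbottom) ** 2,
                   0.0)
print(damping)  # strongest at the model top, zero at and below sponge_pbottom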
3.4 Held-Suarez Forcing
| option | meaning | default |
|---|---|---|
| t_zero | Surface temperature at equator | 315 |
| t_strat | Stratospheric temperature | 200 |
| delh | Equator to pole temperature difference | 60 |
| delv | Delta theta per scale height | 10 |
| eps | Hemispheric asymmetry | 0 |
| sigma_b | Boundary layer sigma | 0.7 |
| ka | Newtonian damping in free atmosphere | -40 |
| ks | Newtonian damping in boundary layer | -4 |
| kf | Rayleigh damping in boundary layer | -1 |
| do_conserve_energy | Conserves energy dissipated via Rayleigh damping (and other momentum forcings) by heating | .true. |
The damping rates (ka, ks, kf) are given as timescales in days if negative. I'm not sure about the units if positive, but probably day^-1. Temperatures are all in Kelvin.
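A small sketch of that sign convention (my reading; the positive branch is an assumption):
def damping_rate(k):
    # negative k: a timescale in days, e.g. ka = -40 is a 40-day relaxation
    if k < 0:
        return 1.0 / (-k * 86400.0)  # rate in s^-1
    # positive k: assumed to be in day^-1 (unconfirmed)
    return k / 86400.0

print(damping_rate(-40))  # ka: free-atmosphere Newtonian damping rate in s^-1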
do_conserve_energy does not really affect the solution of the
local_forcing-type problems; both settings have been tested with
minimal differences (<5%).
3.5 Local Momentum Forcing
This is my custom code documentation. I have added this behaviour to
hs_forcing_zonalacc.F90, which replaces hs_forcing.F90 (using a
symbolic link). The namelist parameters go into the hs_forcing
namelist. All options are prefixed with local_forcing and are given
here:
| local_forcing* | meaning | units | default | 
|---|---|---|---|
| _option | Defines the forcing mode | | '' |
| _xwidth | Longitudinal half-width | degrees | 10 | 
| _ywidth | Latitudinal half-width | degrees | 10 | 
| _ycenter | Central latitude | degrees | 0 | 
| _zwidth | Half-width in height | scale height | 0.5 | 
| _zcenter | Central height. | scale height | 2.2 | 
| _amp | Amplitude of forcing. | m/s/day | 1 | 
| _start | Start time of forcing. | days | -1 | 
| _end | End time of forcing. | days | -1 | 
local_forcing_start and local_forcing_end allow control of when
the forcing is turned on and off. A value of -1 (the default) removes
that time constraint. For example, switching on from 100 days would
give local_forcing_start = 100 and local_forcing_end = -1.
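A minimal sketch of how I understand the start/end logic (the real check lives in hs_forcing_zonalacc.F90):
def forcing_active(t_days, start=-1.0, end=-1.0):
    # -1 (the default) disables that bound
    if start >= 0 and t_days < start:
        return False
    if end >= 0 and t_days > end:
        return False
    return True

print(forcing_active(50, start=100))   # False: forcing not yet switched on
print(forcing_active(150, start=100))  # True: on from 100 days, no end bound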
local_forcing_option is a string and can have several values:
| local_forcing_option | meaning | 
|---|---|
| '' | No forcing. The local_forcing code is not run. |
| 'Cos2Dipole' | A two-signed forcing in the vertical. For positive amplitude, negative forcing is on the top and positive forcing is on the bottom. The forcing has the form cos^2(x) cos^2(y) sin(z). |
| 'CosDipole' | Vertical structure as above, but with cos(x) cos(y) horizontal structure. |
| 'Cos2Single' | Single-signed forcing (cos(z) vertical). |
| 'CosSingle' | – ditto – |
| 'CosDipoleZM' | Zonal mean CosDipole – see Zonal mean caveats. |
| 'Cos2DipoleZM' | Zonal mean Cos2Dipole – see Zonal mean caveats. |
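For concreteness, here is a hedged sketch of the 'Cos2Dipole' shape as I read the table (the box confinement and the exact vertical sign convention are my assumptions; check hs_forcing_zonalacc.F90 for the real thing):
import numpy as np

xwidth, ywidth, ycenter = 10.0, 10.0, 0.0  # degrees
zwidth, zcenter, amp = 0.5, 2.2, 1.0       # scale heights; amp in m/s/day

def cos2dipole(x, y, z):
    # confined to the box defined by the half-widths
    inside = ((np.abs(x) < xwidth)
              & (np.abs(y - ycenter) < ywidth)
              & (np.abs(z - zcenter) < zwidth))
    # cos^2 horizontal structure; sin vertical structure gives the two
    # signs (for amp > 0, positive below zcenter and negative above)
    return amp * inside * (np.cos(0.5 * np.pi * x / xwidth) ** 2
                           * np.cos(0.5 * np.pi * (y - ycenter) / ywidth) ** 2
                           * np.sin(0.5 * np.pi * (zcenter - z) / zwidth))

print(cos2dipole(0.0, 0.0, zcenter - 0.5 * zwidth))  # positive lobe below centre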
3.5.1 Zonal mean caveats
local_forcing_amp in the *ZM cases is the peak zonal mean forcing but
in the other cases, it is the peak local amplitude. Here's a
calculator that works out the equivalent local amplitude equivalent to
the zonal mean case (i.e. with same zonal mean), and to get the
10S-10N equivalent amplitude,
from numpy import *

Lx = 30   # degrees
Ly = 10   # degrees
A = 0.3   # zonal mean amplitude

x = linspace(-180, 180, 5000)
y = linspace(-10, 10, 100)
f = cos(0.5 * pi * x / Lx) * cos(0.5 * pi * y[:, None] / Ly) \
    * ((abs(x) < Lx) & (abs(y[:, None]) < Ly))
f *= A / f.mean(axis=-1).max()
print('zonal mean amplitude, A     ', f.mean(axis=-1).max())
print('10N-10S zonal mean amplitude', f.mean())
print('local amplitude             ', f.max())
print('10N-10S local amplitude     ', f.mean(axis=0).max())
print('Theoretical local amplitude ', A * pi * 360 / (4. * Lx))
3.6 Local Heating
The base version of hs_forcing.F90 includes Isidoro's local
heating. In my version, I've removed this and replaced it with a local
heating routine that mirrors the local forcing routine. All namelist
variables do the same thing here as they do for local forcing and have
the same defaults, but are named local_heating_xxxx in place of
local_forcing_xxxx.
The only difference is that local_heating_amp is in units of K/day.
See above for all local_forcing settings.