POP goes the CAMulator

CAMulator ↔︎ POP2 inside CESM2

Will Chapman | CU Boulder

The Scientific Gap

We know how to run atmosphere-only and ocean-only:

Mode	Atmosphere	Ocean
AMIP	prognostic	prescribed SST
OMIP / GIAF	prescribed winds	prognostic
This work	AI prognostic	prognostic

The problem:

In OMIP, the ocean responds to forcing — but the forcing never changes based on what the ocean does
Warm SST anomalies can’t generate more evaporation, shift the jet stream, or modify precipitation
Interactive air-sea feedback is missing

Why it matters:

ENSO is an air-sea coupled phenomenon — you can’t simulate it correctly without feedback in both directions
MJO propagation depends on air-sea heat exchange
AMOC sensitivity to wind stress changes
Climate drift in long runs is partly an air-sea feedback problem

Note

Standard GIAF runs have been the workhorse of ocean-only CESM science for 20 years. We’re upgrading the atmosphere from “prescribed file reader” to “actual physics.”

CAMulator in 30 Seconds

You know this model — just a quick frame:

Autoregressive 6-hr emulator of CAM6
1° resolution, 130 prognostic channels
Explicitly conserves dry air mass, moisture, total energy
350× speedup over CAM6
Captures ENSO, NAO, PNA, realistic variability

The limitation this coupling addresses:

CAMulator was trained and evaluated with prescribed SST — it responds to the ocean but the ocean never responds back. This is exactly the AMIP problem, now for ML models.

This work: replace the prescribed SST with a live POP2 ocean.

%%{init: {'theme': 'default'}}%%
graph TD
    A["CAM6<br/>(full physics)"] -->|"train on 20 yrs"| B["CAMulator<br/>(AI emulator)"]
    D["Prescribed SST<br/>(AMIP-style)"] -->|"training regime"| B
    B -->|"350× faster"| C["Large ensembles<br/>Centennial runs"]
    E["Live POP2 SST<br/>(this work)"] -.->|"new frontier"| B

Today’s question: Does a coupled CAMulator + POP2 system stay stable and physically meaningful?

What I Built

World’s first coupling of an ML atmosphere to a dynamic ocean + sea ice

CAMulator replaces DATM prescribed forcing entirely
POP2 + CICE receive evolving, AI-generated air-sea fluxes
Ocean SST feeds back into CAMulator every 6 hours
No coupler modifications — piggybacked on existing GIAF compset

First results:

✅ 10-day test: stable, no blow-up
✅ 25 year run completed
✅ 45 SYPD on 4 Derecho CPU nodes + 1 Casper A100

How It Works: The DATM Mode

Core idea: write a new DATM datamode

Instead of reading CORE2/JRA55 forcing files, DATM_MODE = CAMULATOR calls our Python inference server via the filesystem:

Every 6 hours:
  DATM writes  → sst_in.nc  (SST from POP2)
  DATM writes  → go.flag
  [polls every 1s, timeout 1h]
  DATM reads   ← cam_out.nc (10 flux fields)
  DATM reads   ← done.flag

Two files changed in CESM source:

New: datm_datamode_camulator.F90
Wired into: datm_comp_mod.F90 (case('CAMULATOR'))

Activated by DATM_MODE = CAMULATOR in user_nl_datm

Why file-based interface?

No Fortran↔︎Python interop (no CFFI, no sockets)
6-hr coupling interval → ~10 MB I/O overhead is negligible
Fully debuggable: inspect every sst_in.nc and cam_out.nc independently
Model-agnostic: swap camulator_server.py for any CREDIT model — Fortran side never changes

Key physics handled server-side:

SST injected into forcing tensor (not prognostic state)
FSNS → FSDS: albedo inversion so CPL7 doesn’t double-count surface reflection
Over-land SST fill for continental grid points
Dynamic boundary layer height (z_bot ∝ T_bot)

Flux Conversions: Net → Downwelling

CAMulator outputs net fluxes at the surface. CPL7 expects downwelling fluxes and applies surface albedo internally — so we have to invert before handing off.

SW — albedo inversion

\[\text{FSDS} = \frac{\text{FSNS}}{1 - \alpha_\text{sfc}}\]

\[\alpha_\text{sfc} = (1-f_\text{ice})\times 0.06 + f_\text{ice}\times 0.60\]

Passing FSNS directly would double-count the reflection — CPL7 applies albedo again internally.

\(f_\text{ice}\) is taken from CICE at the same timestep so the correction always tracks actual ice cover.

LW — Stefan-Boltzmann reconstruction

\[\text{FLNSD} = 0.99\,\sigma T_S^4 + \frac{\text{FLNS}}{\Delta t}\]

CAMulator gives net upward LW (FLNS). Adding back the upward surface emission (\(0.99\,\sigma T_S^4\)) isolates the downwelling component that CPL7 needs to force the ocean.

Dynamic \(z_\text{bot}\) and Monin-Obukhov Theory

CPL7’s COARE-style bulk scheme needs the reference height \(z_\text{bot}\) — the physical height of the lowest model level — to compute the M-O stability correction and from that the wind stress \(\tau\), sensible heat \(H\), and latent heat \(LE\).

In CAM6 it’s not a fixed number; it varies ~50–67 m with surface temperature. We derive it consistently from the hybrid coordinate:

\[z_\text{bot} \approx \frac{R_d}{g}\left(-\ln\frac{h_{b,N-1}+1}{2}\right) T_\text{bot} \approx 0.219\, T_\text{bot}\]

Z_BOT_SCALE = (287.058/9.80616) \
    * (-np.log(0.5*(hybi[-2]+1)))  # 0.2187 m/K
z_bot = Z_BOT_SCALE * T_bot        # 50–67 m

Getting \(z_\text{bot}\) wrong changes which stability regime CPL7 thinks it’s in.

In stable conditions (cold SST under warm air), a wrong \(z_\text{bot}\) artificially suppresses wind stress
In unstable conditions (warm tropical SST), it sets the strength of the convective enhancement of \(H\) and \(LE\)

These aren’t small corrections and this is the thing I’m most neverous about

For example: the Southern Ocean, Arctic margins, the MJO warm pool, and eastern boundary upwelling zones all spend significant time in strongly stratified regimes.

Speed: What Does 45 SYPD Mean?

[PLACEHOLDER — fill in with measured numbers]

Configuration	Speed (SYPD)	Hardware
Full CESM2 (CAM6 + POP2)	~?	~?? Derecho nodes
GIAF (DATM + POP2, no AI)	~?	4 Derecho nodes
CAMulator + POP2 (this work)	45	4 CPU + 1 A100

Note

Key claim: we get a prognostic, interactive atmosphere at roughly the same wall-clock cost as prescribed forcing — and orders of magnitude cheaper than running full CAM6 + POP2.

Per-step breakdown on A100 (ms):

Stage	Time
Build input tensor	~5
CAMulator inference (JIT)	~190
Inverse transform	~8
Grid remap (T62 ↔︎ 1°)	~3
Write cam_out.nc	~15
Total server time	~220

CAM6 reference: one 6-hr step ~? ms on ?? CPUs

CESM’s ocean/ice compute is the wall-clock bottleneck — not inference.

Results: Global Mean State

[PLACEHOLDER — insert figure here]

Suggested: global maps of annual-mean SST bias (CAMulator-coupled vs OMIP reference), wind stress magnitude, precipitation. Side-by-side or difference maps.

Tip

Things to show:

Is the global mean SST reasonable after 1 year?
Are the major current systems (Gulf Stream, ACC) present?
Is precipitation distribution physically plausible?

Results: Variability & Air-Sea Feedback

[PLACEHOLDER — insert figure here]

Suggested: ENSO index time series from the coupled run vs OMIP baseline. Or: Niño3.4 SST variance. Or: lagged correlation between SST and precipitation anomalies in the tropical Pacific showing the feedback is active.

Tip

This is the money slide for this audience. Even one time series showing that tropical SST variability is elevated vs OMIP-forced would be compelling.

Results: Atmospheric State

[PLACEHOLDER — insert figure here]

Suggested: zonal-mean temperature or zonal wind cross-section from the coupled run. Or: a few snapshots of 500 hPa geopotential height showing realistic synoptic variability. Or: time series of global-mean 2m temperature showing no drift.

What 45 SYPD Enables

With a 350× speedup over CAM6, coupled experiments become feasible that weren’t before:

Large ensembles with interactive ocean — 50–100 member ENSO forecasts at seasonal range
Centennial runs to study climate drift, AMOC stability, heat uptake
Parameter sensitivity sweeps — vary ocean mixing, ice albedo, test parameterization choices at low cost
Rapid prototyping of new coupling strategies before investing in full CESM runs

The flag-file interface is intentionally model-agnostic:

# Swap in any CREDIT model:
camulator_server.py        # CAMulator
→ fuxi_server.py           # FuXi
→ pangu_server.py          # Pangu-Weather
→ aurora_server.py         # Aurora

The Fortran side (datm_datamode_camulator.F90) never changes. Any model that can read sst_in.nc and write cam_out.nc can be plugged in.

Note

Planned open release: CESM patch + server script → community tool for coupling any AI atmosphere to POP2/CICE.

Next Steps

Science questions now open:

ENSO response — does CAMulator + POP2 produce realistic ENSO amplitude and period? Does it improve over OMIP?
MJO air-sea feedback — does the interactive atmosphere change MJO propagation speed or coherence?
AMOC stability — multi-decadal runs with changing wind stress
SST bias evolution — does CAMulator’s known high-latitude cold bias change ocean heat uptake?
Ensemble spread — do coupled members diverge faster than AMIP members?

Technical next steps:

Phase 2: bypass CPL7 bulk formula — pass Faxa_taux/tauy directly instead of Sa_u/Sa_v
MOM6 coupling (CESM3 ocean)
Multi-GPU simultaneous ensemble members

Community:

CESM patch + camulator_server.py → open release
Generalise server interface for any CREDIT model
Share restart-safe launch scripts with the community

Summary

We coupled an AI atmosphere to a dynamic ocean by:

Writing a new DATM datamode (CAMULATOR) — zero coupler modifications, file-based flag I/O, fully debuggable
Getting the physics right: FSNS→FSDS albedo inversion so CPL7 doesn’t double-count surface reflection; dynamic \(z_\text{bot}\) derived from hybrid coordinates for physically consistent M-O stability corrections
Running it: 45 SYPD, 25-year run stable, POP2 + CICE receiving evolving AI-generated fluxes and feeding SST back at every step

(Also had to fix a deep CPL7 pointer bug where xao_ax was never assigned for DATA ATM — see Appendix)

45 SYPD + prognostic ocean = multi-century coupled experiments at a fraction of the cost of full CESM

Code / contact:

# Implementation
/glade/work/wchapman/Roman_Coupling/
  credit_feb182026/climate/
    camulator_server.py
    datm_datamode_camulator.F90
    launch_coupled_run.sh

# Active case
/glade/work/wchapman/cesm/CREDIT/
  g.e21.CAMULATOR_GIAF_v02/

Tip

Full implementation notes in README_Coupling.md

Appendix

Slides below contain implementation detail for follow-up discussion

A: Component Glossary

Component	Role in this project
CESM2/CPL7	Framework + MCT-based Fortran coupler; remaps fields between grids, applies bulk formulae
DATM (Data ATM)	Normally reads CORE2/JRA55 prescribed forcing. Here: runs CAMULATOR mode — a new datamode we wrote
CAMulator	Autoregressive 6-hr AI atmosphere trained on CAM6 output. 192×288 (1°), 130 prognostic channels
POP2	3D ocean on gx1v7 displaced-pole grid (~1°, 320×384). Evolves SST, salinity, currents
CICE	Sea ice on same gx1v7 grid. Exchanges heat, momentum, freshwater with POP + atmosphere
T62 grid	DATM’s native Gaussian grid: 94 lat × 192 lon. All DATM→CPL fields live here
GIAF compset	DATM%IAF + POP2 + CICE — the baseline we piggyback on

B: Full Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│   CESM2 CPL7 coupled run   (Fortran/MPI, CPU ranks on Derecho)          │
│                                                                          │
│   POP2 ──SST──► CPL7 ──x2a_So_t──► DATM (CAMULATOR mode)               │
│                                          │                               │
│                              write camulator_sst_in.nc                   │
│                              write camulator_go.flag   ───► [GPU server] │
│                                          ↑                               │
│                              read  camulator_done.flag ◄─── [GPU server] │
│                              read  camulator_cam_out.nc                  │
│                                          │                               │
│   POP2 ◄──a2x fluxes──── CPL7 ◄──────────┘                              │
└─────────────────────────────────────────────────────────────────────────┘
                   ▲ shared GLADE filesystem
┌──────────────────┴──────────────────────────────────────────────────────┐
│   camulator_server.py  (Python, A100 GPU, pre-launched on Casper)        │
│                                                                          │
│   loop:  watch go.flag → read sst_in.nc → inject SST → run CAMulator    │
│          → write cam_out.nc → write done.flag                            │
└─────────────────────────────────────────────────────────────────────────┘

C: Grid Remapping

Grid	Dims	Used by
T62 Gaussian	94 × 192	DATM, CPL7
CAMulator 1°	192 × 288	AI model

Precomputed BilinearRemap weights at startup → remap ~2.5 ms (vs ~500 ms with RegularGridInterpolator every step)

T62 Gaussian latitudes computed exactly:

from scipy.special import roots_legendre
mu, _ = roots_legendre(94)
t62_lat = np.degrees(np.arcsin(mu[::-1]))

class BilinearRemap:
    """Precomputed bilinear weights for T62 ↔ CAMulator."""
    def __init__(self, src_lat, src_lon,
                       dst_lat, dst_lon):
        # compute weight matrices once at startup
        ...

    def batch(self, fields_dict: dict) -> dict:
        """Remap all fields in one vectorised call."""
        out = {}
        for name, arr in fields_dict.items():
            out[name] = (self.weights @ arr.ravel()
                         ).reshape(self.dst_shape)
        return out

D: SST Injection Detail

# 1. Read SST from sst_in.nc [K]
sst_t62 = ds['sst'].values           # 94×192

# 2. Remap T62 → CAMulator 1° grid
sst_cam = remap_t62_to_cam(sst_t62)  # 192×288

# 3. Over-land fill (POP only provides ocean points)
mask = sst_cam > 270.0               # True = ocean
sst_blended = np.where(mask, sst_from_pop, 283.0)

# 4. Normalize and inject into forcing tensor
sst_norm = (sst_blended - sst_mean) / sst_std
model_input = stepper.state_manager \
    .build_input_with_forcing(state, forcing_t, static)
accessor_input.set_state_var(model_input, 'SST', sst_norm_tensor)

# 5. Run inference
prediction = stepper.model(model_input.float())

SST is a dynamic_forcing_variable (not prognostic state) — it lives in the forcing tensor, not in the 130-channel prognostic state. This is why injection works without retraining the model.

E: Flux Conventions

Variable	CAMulator units	→ cam_out.nc
`FSNS`	J m⁻² (6h total)	÷ 21600 → W m⁻²
`FLNS`	J m⁻² upward	÷ (−21600) → W m⁻² ↓
`PRECT`	m s⁻¹ liq-eq	direct
`TAUX/TAUY`	Pa	direct

SW albedo inversion: \[\text{FSDS} = \frac{\text{FSNS}}{1 - \alpha_\text{sfc}}\] \[\alpha_\text{sfc} = (1-f_\text{ice}) \times 0.06 + f_\text{ice} \times 0.60\]

CPL7 treats Faxa_sw* as downwelling and applies albedo internally — passing FSNS directly double-counts it.

LW reconstruction: \[\text{FLNSD} = 0.99\,\sigma\,T_S^4 + \frac{\text{FLNS}}{\Delta t}\]

Dynamic boundary layer height:

Z_BOT_SCALE = (287.058/9.80616) \
    * (-np.log(0.5*(hybi[-2]+1)))  # = 0.2187 m/K

z_bot = Z_BOT_SCALE * T_bot  # 50–67 m range

F: Restart Handling

ATM restart file (camulator_atm_restart.pth):

{
    'state':     state_tensor,   # 130-ch CPU tensor
    'timestep':  int,
    'last_ymd':  int,
    'last_tod':  int,
    'cam_out':   cam_out_dict,   # re-send on CONTINUE_RUN
}

On CONTINUE_RUN=TRUE, CESM re-sends the last go.flag. Server detects current_ymd == last_ymd, re-sends saved cam_out without re-inference.

Year-boundary repeated dates:

CESM sends the year-boundary date three times during restart init. Fix: date-based forcing index (same date → same forcing slice, always).

Annual archive:

rundir/atm_archive/
  YYYY/
    camulator_atm_YYYY-001.nc
    camulator_atm_YYYY-002.nc  (daily)
    camulator_atm_restart.pth

G: Launch Workflow

sequenceDiagram
    participant U as User (login node)
    participant S as camulator_server.py (Casper A100)
    participant C as CESM (Derecho CPU nodes)
    participant G as GLADE filesystem

    U->>S: qsub server job (Casper)
    S->>S: load model, JIT trace (~60s)
    S->>G: write camulator_server_ready.flag

    U->>U: launch_coupled_run.sh polling...
    U->>C: case.submit (triggered by ready flag)

    loop Every 6 hours
        C->>G: write sst_in.nc + go.flag
        G->>S: server detects go.flag
        S->>S: inject SST, run inference (~200ms)
        S->>G: write cam_out.nc + done.flag
        G->>C: DATM reads cam_out.nc
        C->>C: CPL7 remaps, applies bulk formula
        C->>C: POP2 + CICE advance one step
    end

    C->>G: write restart files (annual)
    S->>G: write camulator_atm_restart.pth

H: The CPL7 Bug

Symptom: Ocean sees SST = 0 K on every coupling step despite POP sending valid temperatures.

Root cause — cime_comp_mod.F90:

The xao_ax pointer (ATM↔︎OCN flux object) is initialised only inside:

if (atm_prognostic) then
   ! ... init phase ...
   xao_ax => prep_aoflux_get_xao_ax()   ! line ~1974
end if

For DATA ATM (atm_prognostic = .false.), this block is never entered → xao_ax stays null().

Later at every coupling step:

if (associated(xao_ax)) then   ! .false. !
   call prep_atm_mrg(...)      ! NEVER RUNS
end if
! → x2c_cx stays zero → SST = 0

No one had ever run a DATA ATM that needed to read back ocean state — this code path had never been exercised in 20 years of CESM.

Three-part fix (all in ATM SETUP-SEND block):

Fix 1 & 2 — widen two guards from atm_prognostic to (atm_prognostic .or. atm_present):

if (atm_prognostic .or. atm_present) then
   ! prep-merge block
end if

if (atm_prognostic .or. atm_present) then
   call component_exch(atm, flow='x2c')
end if

Fix 3 — assign pointer before the associated() check:

! ROOT CAUSE FIX
xao_ax => prep_aoflux_get_xao_ax()
if (associated(xao_ax)) then
   call prep_atm_mrg(...)  ! now runs!
end if

Confirmed: 10-day coupled run at 45 SYPD ✅