Pedagogical Design: xarray Week 5
ATOC 4815 - Built from Scratch with Proven Framework
Design Statistics
- Created: 67 slides, ~1,900 lines
- Active Learning Exercises: 4 hands-on "Try It Yourself"
- Error Examples: 6 explicit error scenarios with fixes
- "Check Your Understanding" moments: 3 formative assessments
- Decision Guides: 3 metacognitive "when to use" sections
Built from scratch using the proven pedagogical framework from Weeks 1-4
Pedagogical Framework Applied
1. Real-World Scenario Motivation
Design Decision: Don't start with "here's xarray syntax"; start with "here's a problem pandas and NumPy can't solve."
Created Research Scenario:
Imagine: You're analyzing the 2023 North American heat wave using ERA5 reanalysis
Your data:
- 4D gridded dataset: temperature(time, level, lat, lon)
- Spatial: North America (15°N-70°N, 130°W-60°W)
- Temporal: 45 years daily (16,425 days)
- Vertical: 37 pressure levels
- Grid: 0.25° × 0.25° (~25 km)
- File size: ~50 GB
Questions:
1. Max temperature in Boulder summer 2023?
2. How does 2023 compare to 1991-2020 climatology?
3. Vertical temperature profile during heat wave?
4. Where was heat wave most intense?
5. Has frequency increased over 45 years?
Impact: Immediately establishes that students need a new tool, because this data can't be handled with pandas or NumPy alone.
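To make the size of the scenario concrete, here is a back-of-envelope estimate of the uncompressed array. It is a rough sketch that assumes 4-byte float32 values and ignores leap days, as the 16,425-day figure does. The on-disk size of a real file depends on packing and compression, which this sketch does not model.

```python
# Back-of-envelope size of the scenario's 4D temperature array.
# Assumption: float32 values held in memory; on-disk size will differ.
n_time = 45 * 365                       # 16,425 daily steps (leap days ignored)
n_level = 37
n_lat = int((70 - 15) / 0.25) + 1       # 15N to 70N inclusive
n_lon = int((130 - 60) / 0.25) + 1      # 130W to 60W inclusive

n_values = n_time * n_level * n_lat * n_lon
size_gb = n_values * 4 / 1e9            # 4 bytes per float32 value

print(f"{n_lat} x {n_lon} grid, {n_values:,} values, about {size_gb:.0f} GB as float32")
```

Under these assumptions the array is larger in memory than a typical laptop can hold, which is the point the scenario is designed to make.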
2. Show Why Previous Tools Fail
Problem: Students might think "Why not just use pandas?"
Solution: Explicit "Why Pandas Falls Short" and "Why NumPy Falls Short" slides
Pandas Limitations (Slide 6):
- 2D only - Can't represent 4D (time × level × lat × lon)
- No dimension concept - Which axis is which?
- No coordinate-based selection - Need index math for "500 hPa at Boulder"
NumPy Limitations (Slide 7):
- No dimension labels - Is axis=1 pressure or latitude?
- No coordinate values - Manual argmin to find nearest point
- Metadata loss - After slicing, what does the data represent?
Code Examples:
# Pandas fail
df = pd.DataFrame(temperature_data)  # ❌ How to structure 4D?
mean_temp = df.mean(axis=2)  # ❌ Was that lat or lon?
# NumPy fail
temp.mean(axis=1)  # ❌ Is axis=1 pressure levels or latitude?
lat_idx = np.argmin(np.abs(lat_array - 40.0))  # ❌ Manual index math!
Impact: Students understand the problem before seeing the solution
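For contrast, the same two operations can be written against a labeled array. This is a minimal sketch on synthetic data: the grid, the three pressure levels, and the random values are assumptions for illustration, not ERA5.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the scenario data (random values, not ERA5).
lat = np.arange(30.0, 50.25, 0.25)
lon = np.arange(-115.0, -94.75, 0.25)
level = np.array([850.0, 500.0, 250.0])
rng = np.random.default_rng(0)
temp = xr.DataArray(
    rng.normal(280.0, 5.0, size=(4, level.size, lat.size, lon.size)),
    dims=("time", "level", "lat", "lon"),
    coords={"level": level, "lat": lat, "lon": lon},
    name="temperature",
)

# NumPy: you have to remember that axis 1 is the pressure level.
mean_np = temp.values.mean(axis=1)
# xarray: the reduction names the dimension it collapses.
mean_xr = temp.mean(dim="level")

# NumPy: manual nearest-point index math.
lat_idx = int(np.argmin(np.abs(lat - 40.0)))
# xarray: select by coordinate value.
point = temp.sel(lat=40.0, lon=-105.0, method="nearest")

print(mean_xr.dims)   # ('time', 'lat', 'lon')
print(point.dims)     # ('time', 'level')
```

Both versions compute the same numbers; the difference is that the xarray calls state which dimension and which location they mean.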
3. Error-Driven Learning
Created 6 explicit error scenarios:
Error 1: Dimension Name Mismatch (Slide 23)
temp.sel(latitude=40)  # ❌ Dimension is 'lat', not 'latitude'
Fix: Always check print(data.dims) first
Error 2: isel vs sel Confusion (Slide 36)
ds.sel(time=0)  # ❌ 0 is an index, not a date!
Fix: Index number → .isel(), Coordinate value → .sel()
Error 3: Wrong Dimension Name in Reduction (Slide 41)
temp.mean(dim='times')  # ❌ Typo: should be 'time'
Fix: Check print(temp.dims) and copy the dimension name exactly; tab completion helps where your editor offers it
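Wrong-name errors in selection and in reductions share one habit as their cure: read the names off the object before using them. The sketch below uses a small made-up array. The exception type raised for an unknown name has varied between xarray versions, so it catches both KeyError and ValueError rather than asserting one.

```python
import numpy as np
import xarray as xr

temp = xr.DataArray(
    np.zeros((3, 2, 2)),
    dims=("time", "lat", "lon"),
    coords={"lat": [40.0, 40.5], "lon": [-105.0, -104.5]},
)

print(temp.dims)  # ('time', 'lat', 'lon')

errors = []
for label, bad_call in [
    ("sel(latitude=40)", lambda: temp.sel(latitude=40)),
    ("mean(dim='times')", lambda: temp.mean(dim="times")),
]:
    try:
        bad_call()
    except (KeyError, ValueError) as err:
        # The exception type differs between xarray versions.
        errors.append((label, type(err).__name__))

# The correct names come straight from temp.dims.
at_40 = temp.sel(lat=40.0)
time_mean = temp.mean(dim="time")
```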
Error 4: Forgetting to Close Files (Slide 32)
ds = xr.open_dataset('ERA5_50GB.nc')
os.remove('ERA5_50GB.nc')  # ❌ File still open!
Fix: Use a context manager: with xr.open_dataset(...) as ds:
Error 5: Using .values Too Early (Slide 52)
temp_array = temp.values  # ❌ Loses all metadata!
subset = temp_array.sel(lat=40)  # ❌ Won't work: just a NumPy array!
Fix: Keep as xarray until the last possible moment
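A small sketch of the "convert last" habit, using a made-up 2×2 array: select with labels while the object is still a DataArray, and reduce to a bare number only at the end.

```python
import numpy as np
import xarray as xr

temp = xr.DataArray(
    np.array([[280.0, 282.0], [284.0, 286.0]]),
    dims=("lat", "lon"),
    coords={"lat": [40.0, 40.5], "lon": [-105.0, -104.5]},
    attrs={"units": "K"},
)

raw = temp.values                  # plain numpy.ndarray: no dims, coords, or attrs
has_sel = hasattr(raw, "sel")      # False, so raw.sel(lat=40) raises AttributeError

# Select with labels first, convert to a bare number only at the end.
boulder = temp.sel(lat=40.0, lon=-105.0)
boulder_k = float(boulder.values)  # 280.0
print(boulder_k, boulder.attrs["units"])
```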
Error 6: Forgetting method='nearest' (Slide 34)
temp.sel(lat=40.015, lon=-105.2705)  # ❌ Exact match may not exist
Fix: Always use method='nearest' for spatial selection
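The sketch below reproduces this error on an assumed 0.25° grid and shows the fix. It also shows the optional tolerance argument to .sel(), which makes the lookup fail rather than silently snap to a distant grid point. The tolerance value of 0.25 is an illustrative choice.

```python
import numpy as np
import xarray as xr

lat = np.arange(39.0, 41.25, 0.25)
lon = np.arange(-106.0, -103.75, 0.25)
temp = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
)

try:
    temp.sel(lat=40.015, lon=-105.2705)   # no grid point at exactly these values
    exact_ok = True
except KeyError:
    exact_ok = False

nearest = temp.sel(lat=40.015, lon=-105.2705, method="nearest")
print(float(nearest.lat), float(nearest.lon))   # 40.0 -105.25

# Optional guard: raise instead of snapping to a point more than 0.25 degrees away.
guarded = temp.sel(lat=40.015, lon=-105.2705, method="nearest", tolerance=0.25)
```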
Format: "Predict the output" → Error message → Explanation → The Fix
Impact: Students see and learn from errors before encountering them in homework
4. Active Learning Exercises
Exercise 1: Tool Selection Quiz (Slide 13)
Which tool for each task?
1. ERA5 temperature(time, level, lat, lon) → xarray
2. Single station hourly time series → Pandas
3. Matrix multiplication → NumPy
4. Climate model temp(time, ensemble, lat, lon) → xarray
5. CSV with station metadata → Pandas
Exercise 2: Create First DataArray (Slide 26)
# With your neighbor (5 min):
# Create DataArray for wind speed
# Dimensions: time (3 days), lat (2), lon (2)
# Boulder coordinates: lat=[40, 40.5], lon=[-105, -104.5]
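One possible solution to Exercise 2 is sketched below. The dates, the wind values, and the units attribute are made up for illustration; only the dimensions and the lat/lon coordinates come from the exercise.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2024-01-01", periods=3, freq="D")   # illustrative dates
wind = xr.DataArray(
    np.array([
        [[5.0, 6.0], [4.5, 5.5]],
        [[7.0, 8.0], [6.5, 7.5]],
        [[3.0, 4.0], [2.5, 3.5]],
    ]),                                                     # made-up values, shape (3, 2, 2)
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": [40.0, 40.5], "lon": [-105.0, -104.5]},
    name="wind_speed",
    attrs={"units": "m/s"},
)
print(wind)
```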
Exercise 3: Practice Selection Methods (Slide 38)
# Tasks:
# 1. Select last time step using .isel()
# 2. Select all data from January 3rd using .sel()
# 3. Select temp at 40°N, 105°W (lat=40, lon=-105; nearest)
# 4. Select time slice: January 2-5
# 5. What happens if you try ds.sel(time=5)? Why?
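A possible walk-through of the five tasks, on a hypothetical practice dataset. The year 2024, the grid, and the variable name temp are assumptions, since the exercise does not fix them. For task 5 the sketch catches both KeyError and TypeError, because the exact exception may depend on the library versions installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical practice dataset: 7 days in January on a small grid.
times = pd.date_range("2024-01-01", periods=7, freq="D")
lat = np.arange(39.0, 41.5, 0.5)
lon = np.arange(-106.0, -103.5, 0.5)
rng = np.random.default_rng(1)
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), rng.normal(270.0, 3.0, (7, lat.size, lon.size)))},
    coords={"time": times, "lat": lat, "lon": lon},
)

last_step = ds.isel(time=-1)                                      # 1. by position
jan3 = ds.sel(time="2024-01-03")                                  # 2. by coordinate value
boulder = ds["temp"].sel(lat=40.0, lon=-105.0, method="nearest")  # 3. nearest grid point
jan2_to_5 = ds.sel(time=slice("2024-01-02", "2024-01-05"))        # 4. label slices include both ends

# 5. time=5 is an integer, not a date, so the label lookup fails.
try:
    ds.sel(time=5)
    lookup_failed = False
except (KeyError, TypeError):
    lookup_failed = True
```

Task 4 is worth pausing on in class: unlike Python integer slices, label-based slices include the end point, so January 2-5 returns four days.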
Exercise 4: Heat Wave Detector (Slide 57)
# Final Challenge: Detect and visualize a heat wave
# 1. Compute time mean for each grid point
# 2. Find location with highest mean temp
# 3. Extract time series at that location
# 4. Compute anomaly from overall mean
# 5. Plot time series with anomaly highlighted
# 6. Save processed data to NetCDF
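Steps 1-4 of the challenge can be sketched on synthetic data as follows. The planted warm spot, the grid, and the variable name t2m are assumptions made so the answer is known in advance. Plotting and saving (steps 5-6) are left out to keep the sketch free of extra dependencies.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic summer temperatures with a warm spot planted at one grid point.
times = pd.date_range("2023-06-01", "2023-08-31", freq="D")
lat = np.array([39.0, 39.5, 40.0, 40.5])
lon = np.array([-106.0, -105.5, -105.0])
rng = np.random.default_rng(42)
data = rng.normal(295.0, 1.0, (times.size, lat.size, lon.size))
data[:, 2, 1] += 10.0                              # planted hot spot at (40.0, -105.5)
temp = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lat, "lon": lon},
    name="t2m",
)

time_mean = temp.mean(dim="time")                  # 1. mean at each grid point
hottest = time_mean.argmax(dim=["lat", "lon"])     # 2. dict of integer positions
series = temp.isel(hottest)                        # 3. time series at that point
anomaly = series - temp.mean()                     # 4. departure from the overall mean

print(float(series.lat), float(series.lon))        # 40.0 -105.5
```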
Impact: Retrieval practice every 10-15 slides; forces active engagement
5. Metacognitive "When to Use Each Tool" Guidance
Decision Guide 1: Mental Model Progression (Slide 9)
NumPy: "Calculator for N-D arrays"
✅ Fast math, any dimensions
❌ No dimension names, no coordinates
Pandas: "Spreadsheet with labels"
✅ Named columns, time indexing
❌ Only 2D
xarray: "Pandas for N-D grids"
✅ Named dimensions
✅ Coordinate-based selection
✅ Metadata preservation
Decision Guide 2: Selection Methods (Slide 29)
| Method | Selection By | Example |
|---|---|---|
| .isel() | Integer position | ds.isel(time=0) → first time |
| .sel() | Coordinate value | ds.sel(lat=40) → data at 40°N |
When to use:
- isel: "First 10 time steps", "every 3rd latitude"
- sel: "Data at 500 hPa", "January 2024"
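The four phrases above translate into code roughly as follows. The dataset is hypothetical (the variable name t, the levels, and the date range are invented for the sketch); the selection calls themselves are standard xarray.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset; the variable name "t" is made up for this sketch.
times = pd.date_range("2023-12-01", "2024-02-29", freq="D")
level = np.array([1000.0, 850.0, 500.0, 250.0])
lat = np.arange(30.0, 45.0, 1.0)
ds = xr.Dataset(
    {"t": (("time", "level", "lat"), np.zeros((times.size, level.size, lat.size)))},
    coords={"time": times, "level": level, "lat": lat},
)

# Position-based: .isel()
first_10 = ds.isel(time=slice(0, 10))            # first 10 time steps
every_3rd = ds.isel(lat=slice(None, None, 3))    # every 3rd latitude

# Label-based: .sel()
at_500 = ds.sel(level=500.0)                     # data at 500 hPa
jan_2024 = ds.sel(time="2024-01")                # partial date string selects the whole month
```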
Decision Guide 3: Tool Selection Table (Slide 62)
| Data Type | Tool | Why |
|---|---|---|
| Single station time series | Pandas | 1D, time indexing |
| CSV with multiple stations | Pandas | Tabular, mixed types |
| Gridded 3D+ NetCDF | xarray | Multi-dimensional |
| Climate model output | xarray | 4D (time, lat, lon, level) |
| Pure numerical computation | NumPy | Matrix ops, FFT |
Impact: Students know WHEN to use each tool, not just HOW
6. Progressive Scaffolding
Carefully designed progression:
| Stage | Slides | Content |
|---|---|---|
| 1. Motivation | 4-13 | Real scenario, tool limitations, mental model |
| 2. Fundamentals | 14-28 | DataArray, Dataset, anatomy, errors |
| 3. I/O | 29-33 | Opening NetCDF, exploring structure, file management |
| 4. Selection | 34-39 | isel vs sel, nearest neighbor, practice |
| 5. Operations | 40-48 | Reductions, groupby, climatologies, anomalies |
| 6. Plotting | 49-52 | Built-in plotting, customization, multi-panel |
| 7. Advanced | 53-57 | Dask/lazy loading, multi-file, workflows |
| 8. Synthesis | 58-67 | Best practices, error checklist, resources |
Impact: Each section builds on previous; no conceptual leaps
7. Visual & Conceptual Scaffolding
Design principle: Multi-dimensional data is abstract, so make it concrete
Slide 15: DataArray Anatomy
- Shows actual printed output with arrows pointing to components
- Dimensions, coordinates, attributes labeled
- "Self-describing data" concept
Slide 16: Dataset Structure
- Visual comparison to dictionary of DataArrays
- Shows how multiple variables share dimensions
Slides 42-44: GroupBy Operations
- Monthly climatology example with actual code output
- Shows 1461 daily values → 12 monthly means
- Automatic alignment visualization
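The 1461-to-12 reduction can be reproduced with four years of synthetic daily data (2020 through 2023, which includes one leap year). The sinusoidal seasonal cycle below is invented for the sketch; the groupby calls are the part that matters.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four years of synthetic daily data: 2020 (leap year) through 2023 = 1461 days.
times = pd.date_range("2020-01-01", "2023-12-31", freq="D")
doy = times.dayofyear.to_numpy()
seasonal = 283.0 + 10.0 * np.sin(2 * np.pi * (doy - 110) / 365.25)   # made-up seasonal cycle
temp = xr.DataArray(seasonal, dims=("time",), coords={"time": times}, name="t2m")

climatology = temp.groupby("time.month").mean()        # 1461 daily values -> 12 monthly means
anomaly = temp.groupby("time.month") - climatology     # each day minus its own month's mean

print(temp.sizes["time"], climatology.sizes["month"])  # 1461 12
```

The subtraction in the last step is where the alignment happens: each daily value is matched to the climatology entry for its own month by label.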
Slide 46: Automatic Alignment Magic
- Side-by-side arrays with different coordinates
- Shows how xarray aligns by labels, not position
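- Labels without a match are dropped in arithmetic (xarray's default inner join) or filled with NaN under an outer join, which prevents silent positional errors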
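A minimal sketch of label-based alignment with two made-up arrays whose latitudes only partly overlap. Whether unmatched labels disappear or become NaN depends on the join: plain arithmetic uses xarray's default inner join, while an explicit outer join via xr.align keeps every label.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims=("lat",), coords={"lat": [39.0, 40.0, 41.0]})
b = xr.DataArray([10.0, 20.0, 30.0], dims=("lat",), coords={"lat": [40.0, 41.0, 42.0]})

# Arithmetic matches values by label, not by position.
# With the default settings only the shared labels (40, 41) survive.
total = a + b
print(total.lat.values, total.values)        # [40. 41.] [12. 23.]

# An outer join keeps every label and fills the gaps with NaN.
a_outer, b_outer = xr.align(a, b, join="outer")
total_outer = a_outer + b_outer
print(total_outer.values)                    # [nan 12. 23. nan]
```

A positional NumPy addition of the same two arrays would return [11, 22, 33] without complaint, which is the silent error the slide warns about.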
8. Realistic Research Workflows
Slide 55: Complete Research Example
Shows full analysis pipeline:
import xarray as xr
# 1. Open multi-year dataset (lazy; open_mfdataset needs dask installed)
ds = xr.open_mfdataset('ERA5_*.nc')
# 2. Subset to region (assumes ascending lat and lon in -180..180;
#    use slice(49, 32) if lat is stored north to south)
ds_west = ds.sel(lat=slice(32, 49), lon=slice(-125, -100))
# 3. Compute climatology (1991-2020)
ds_clim = ds_west.sel(time=slice('1991', '2020'))
climatology = ds_clim['t2m'].groupby('time.dayofyear').mean()
# 4. Select 2023 summer
summer_2023 = ds_west['t2m'].sel(time=slice('2023-06-01', '2023-08-31'))
# 5. Compute anomalies
anomaly = summer_2023.groupby('time.dayofyear') - climatology
# 6. Find peak heat wave
max_anomaly = anomaly.max(dim='time')
# 7. Plot
# (creates 2-panel comparison figure)
# 8. Save results
max_anomaly.to_netcdf('heatwave_2023_anomaly.nc')
Impact: Students see the path from raw data to research results
9. Plotting Integration
Progressive plotting examples:
Slide 49: 1D Time Series
- Automatic axis labeling
- Uses coordinate values, not indices
Slide 50: 2D Spatial Map
- Automatic colorbar
- Proper lat/lon axes
Slide 51: Customized Plot
- Control over colormap, limits, labels
- Professional figure appearance
Slide 52: Multi-Panel Figures
- 2×2 panels showing time evolution
- Single shared colorbar
- Date formatting in titles
Impact: Students can create publication-quality figures immediately
10. Advanced Topics with Context
Lazy Loading with Dask (Slide 53)
- Problem: 50 GB file won't fit in memory
- Solution: Chunked loading with dask
- When to use: File > RAM, need subset only
- When NOT to use: Small files, adds overhead
Multi-File Operations (Slide 54)
- Problem: One file per year (common in climate data)
- Solution: open_mfdataset() with wildcards
- Benefits: Automatic combining, lazy loading, parallel processing
Writing NetCDF (Slide 56)
- Complete workflow: compute → add metadata → save
- Shows how to preserve provenance information
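The compute, add metadata, save sequence might look like the sketch below. The data are synthetic, and the attribute names (cell_methods, history) and output file name are illustrative choices rather than anything the slides prescribe. The save call is left commented out because it needs a NetCDF backend such as netCDF4 or scipy to be installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2023-07-01", periods=5, freq="D")
t2m = xr.DataArray(
    np.array([300.0, 302.0, 305.0, 303.0, 301.0]),
    dims=("time",),
    coords={"time": times},
    name="t2m",
    attrs={"units": "K", "long_name": "2 metre temperature"},
)

# Compute: keep_attrs=True carries units and long_name through the reduction.
peak = t2m.max(dim="time", keep_attrs=True)

# Add metadata: record what was done (attribute names here are illustrative).
peak = peak.assign_attrs(
    cell_methods="time: maximum",
    history="max over 2023-07-01 to 2023-07-05 of synthetic t2m",
)
peak.name = "t2m_max"

# Save: uncomment once a NetCDF backend (e.g. netCDF4 or scipy) is installed.
# peak.to_netcdf("t2m_max.nc")
print(peak.attrs)
```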
Impact: Students ready for real research data workflows
Learning Science Principles Applied
1. Worked Examples Effect
- Every concept has 2-3 complete examples
- Shows process AND result
- Includes common errors and fixes
2. Cognitive Load Management
- One new concept at a time
- Progressive complexity
- Visual aids for abstract concepts (dimensions, coordinates)
3. Retrieval Practice
- Regular "Check Your Understanding" questions
- "Predict the output" before error messages
- Hands-on exercises every 10-15 slides
4. Transfer of Learning
- Every example uses atmospheric science context
- Real research scenario (ERA5, heat waves)
- Complete analysis workflows
5. Error-Driven Learning (Productive Failure)
- 6 common errors shown explicitly
- Students learn debugging patterns
- Errors normalized as learning opportunity
6. Metacognition
- Explicit âwhen to useâ guidance
- Mental model comparisons (NumPy → Pandas → xarray)
- Decision tables for tool selection
Key Design Decisions
Why Start with "What's Wrong with Pandas/NumPy"?
Alternative approach: Jump straight to xarray syntax
Our approach: Show why existing tools fail first
Rationale: Students need to understand the problem before they can appreciate the solution. Seeing familiar tools fail creates cognitive dissonance, which in turn motivates learning.
Evidence: Pedagogical research shows problem-driven learning improves retention and transfer.
Why 6 Error Examples Instead of Just Showing Correct Code?
Alternative approach: Only show working examples
Our approach: "Predict the output" → Error message → Explanation → Fix
Rationale:
- Students will make these errors anyway
- Seeing errors in controlled environment builds debugging skills
- "Productive failure" research shows learning from mistakes improves understanding
Evidence: Kapur (2008) - productive failure in problem-solving
Why "Try It Yourself" Every 10-15 Slides?
Alternative approach: One big exercise at end
Our approach: Frequent small hands-on moments
Rationale:
- Retrieval practice must be spaced throughout
- Catches misconceptions early
- Active learning > passive watching
Evidence: Freeman et al. (2014) - active learning increases STEM performance
Why Explicit "When to Use Each Tool" Tables?
Alternative approach: Students infer when to use xarray
Our approach: Explicit metacognitive guidance
Rationale:
- Novices don't develop expert heuristics automatically
- Need explicit instruction on decision-making
- Metacognitive skills are teachable
Evidence: Flavell (1979) - metacognition in learning
Expected Learning Outcomes
| Outcome | Design Element | Slide(s) |
|---|---|---|
| Understand xarray motivation | Real scenario showing pandas/NumPy limits | 4-8 |
| Know when to use xarray | Decision guides, tool comparison table | 9, 13, 62 |
| Avoid common errors | 6 explicit error examples with fixes | 23, 32, 36, 41, 52, 63 |
| Select data correctly | isel vs sel guidance, practice | 29-38 |
| Compute climatologies | GroupBy examples, anomaly calculations | 42-47 |
| Create quality figures | Progressive plotting examples | 49-52 |
| Handle large datasets | Dask/chunking, multi-file operations | 53-54 |
| Build research workflows | Complete heat wave analysis example | 55, 57 |
Recommended Classroom Use
Before Class:
- Email students: "Bring laptop, ensure xarray installed"
- Post sample NetCDF file on Canvas
- Prepare for live coding demonstrations
During Class:
- Live code the error examples - Show yourself debugging
- Pause at "Try It Yourself" slides - Give full 5 minutes
- Use chalkboard for dimension diagrams - Draw 3D/4D grids
- Show real ERA5 data - Not just toy examples
- Emphasize coordinates - This is the key xarray concept
- Cold call after pair work - Keep everyone engaged
After Class:
- Post slides + sample NetCDF file immediately
- Office hours: "Bring your NetCDF data questions"
- Canvas discussion: "What clicked? What's still confusing?"
- Prepare similar data for homework
Future Enhancements
For Next Iteration:
Add:
- Comparison with CDO/NCO command-line tools
- Integration with cartopy for maps
- Performance tips for large datasets
- Common xarray + matplotlib patterns
Consider:
- Video of live NetCDF exploration workflow
- Student-contributed xarray tips
- Gallery of real research figures made with xarray
- Debugging flowchart poster
Collect feedback on:
- Which error examples were most helpful
- Time needed for "Try It Yourself" exercises
- Which concepts need more explanation
- Whether Dask section is too advanced
Pedagogical References
These design decisions align with:
- Cognitive Load Theory (Sweller, 1988)
  - Progressive complexity from simple to advanced
  - Visual scaffolding for multi-dimensional concepts
  - Worked examples reduce cognitive load
- Retrieval Practice (Roediger & Butler, 2011)
  - Frequent low-stakes "Check Your Understanding"
  - Spaced throughout lesson
  - Immediate feedback
- Productive Failure (Kapur, 2008)
  - Error-driven learning: show mistakes first
  - Debugging as pedagogy
  - Normalizes errors as learning
- Transfer of Learning (Bransford & Schwartz, 1999)
  - Atmospheric science examples throughout
  - Real research scenarios (ERA5, heat waves)
  - Complete workflows from data → results
- Metacognition (Flavell, 1979)
  - Explicit "when to use" guidance
  - Decision tables
  - Tool selection heuristics
- Active Learning (Freeman et al., 2014)
  - Frequent hands-on exercises
  - Pair programming
  - Predict-then-reveal format
Files Created
- atoc4815-week05-xarray.qmd - Complete lesson (67 slides, ~1,900 lines)
- atoc4815-week05-xarray.html - Rendered slides with live Python code
- PEDAGOGICAL_IMPROVEMENTS_XARRAY.md - This design document
Note: Built from scratch using proven framework, NOT converted from PowerPoint
Comparison to Traditional xarray Tutorials
Typical xarray tutorial:
- "Here's a DataArray. Here's how to select data."
- Focus on syntax and API
- Few error examples
- Generic data examples
- Minimal context for when/why to use
Our approach:
- Starts with real research problem that requires xarray
- Shows why pandas/NumPy fail before introducing xarray
- 6 explicit error scenarios students will encounter
- Atmospheric science context throughout
- Metacognitive guidance on tool selection
Result:
Students learn not just HOW to use xarray, but WHEN and WHY, skills that transfer to their research.
Instructor Notes
This lesson is the gateway to real atmospheric data science.
Before this week, students worked with:
- Single station time series (pandas)
- Simple array operations (NumPy)
After this week, they can:
- Handle gridded reanalysis data (ERA5, MERRA-2)
- Compute climatologies and anomalies
- Work with climate model output
- Create publication-quality figures
- Build complete research workflows
This is where atmospheric science and programming converge.
The pedagogical investment here pays dividends throughout their research careers. Students who master xarray can:
- Analyze their thesis data independently
- Contribute to research projects immediately
- Publish figures without manual data manipulation
- Collaborate with the broader climate science community
You're giving them a superpower.