Thursday, July 21, 2016

"old school" hydrological models

During my Master's internship I'm looking at a lot of surface water models, setting up a "scoreboard" to assess which types "perform" best under different conditions (peak flood events, long droughts) and in different latitudes and regions. This is taking place at IRSTEA in Antony (near Paris), a research organization with an extremely long history of excellent numerical models, including these:

This morning a friend sent me a podcast episode (99percentinvisible clearly has roots in public radio) called America's Last Top Model, which I listened to on the way to work and which is perfectly awesome -- it introduces a physical scale model of the Mississippi River basin, an amazing effort for anyone into model trains, Legos, or generally playing with water in dirt...

Note - the Mississippi Basin (besides being a fun spelling test for 3rd graders) is 1 250 000 miles² (yes, 1.25 million square miles) in area -- about 3 240 000 km². Even at a 1:2000 scale, this is a BIG undertaking, which is why I imagine photos don't really do it justice... this one's not bad:
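A quick back-of-the-envelope check (my own arithmetic, using the figures above) shows why even 1:2000 is a big undertaking -- areas shrink by the square of the horizontal scale:

```r
# Model footprint at 1:2000 horizontal scale (areas scale by 2000^2)
basin_km2   <- 3.24e6                # basin area, km^2
basin_m2    <- basin_km2 * 1e6       # convert to m^2
model_m2    <- basin_m2 / 2000^2     # divide by the squared scale factor
model_acres <- model_m2 / 4046.86    # 1 acre = 4046.86 m^2
```

That works out to roughly 810,000 m², or about 200 acres, for the model footprint alone.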

Original caption says it all! (vert scale: 1:100, horiz: 1:2000)

The Army Corps of Engineers project was motivated by the devastating floods of 1927, and construction finally started in the early '40s -- since the Corps was a bit tied up at that time, they used German and Italian POW labor captured during Rommel's North African campaign...

Some more links:
And this student project to open up access to the public:
Now I gotta get back to work.

Thursday, June 30, 2016

Weekly report, end of June (week 17!) (of 25!!)

Progress report - June

Calendar week 26, 2016

Welcome to week 17 of 25. This document updated 30 June 2016.

# internship progress: weeks elapsed vs. weeks remaining, as fractions
thesis.duration <- 25
this.week <- 17
double.yikes <- c(this.week, thesis.duration - this.week) / thesis.duration
# stacked horizontal bar (as.matrix + beside=FALSE stacks the two segments)
barplot(as.matrix(double.yikes), horiz=TRUE, beside=FALSE)

# ggplot2 equivalent (needs the data in a data frame first):
# library(ggplot2)
# ggplot(data.frame(phase = c("elapsed", "remaining"), frac = double.yikes),
#        aes(x = "time", y = frac, fill = phase)) + geom_col()

Note that I have been making a gross over-assumption about my remaining time -- this isn't a 34-week internship, but about 26 weeks (or 25 with vacation). The 34th week of the year (beginning August 22) will be my last, but since my 2nd week was calendar week 10, my plot has been quite wrong.

What’s new:

  • Discussed conditional criteria for the Scoreboard
  • Designated three documents for delivery at the end of the internship:
    1. short user guide (part of the Shiny app?) - English
    2. ultra-clear database HOWTO guide (how to add data, DDL, etc.; so MHR can direct others setting up new DBs)
    3. longer write-up for data submitters
  • Got rid of the summary() field
  • Added a 2nd plot window
  • Using DT better
  • Not complete:
    • renderUI()
    • two plot regimes:
      • stats on daily values (se, ci)
      • direct score plots
    • goal is to view skill scores (comparable) in addition to raw scores (not so comparable)
    • “All Skill Score” plot, which should show red, grey, green “improving or not” scores
    • Review EVS documentation and data file in/out
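On the skill-score point, a minimal sketch of the red/grey/green logic, assuming negatively oriented scores like CRPS (lower is better) compared against a reference forecast -- the function names and tolerance are mine, not the app's:

```r
# skill = 1 - score/score_ref: positive means the forecast beats the
# reference (green), near zero means no change (grey), negative means
# it does worse (red). Assumes lower raw scores are better (e.g. CRPS).
skill_score <- function(score, score_ref) 1 - score / score_ref

skill_color <- function(skill, tol = 0.05) {
  ifelse(skill > tol, "green", ifelse(skill < -tol, "red", "grey"))
}

skill <- skill_score(score = c(0.8, 1.0, 1.3), score_ref = 1.0)
skill_color(skill)  # "green" "grey" "red"
```

This is what makes skill scores comparable across locations, while the raw scores are not.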

Modified database schema:

  • Renaming?
library(dplyr)
library(ggplot2)

# load scores; the path was truncated in the original, and the original
# data-frame name was lost, so "scores" stands in for it below
scores <- read.delim("~/R/shinysb1/")
scores$locationID <- as.factor(scores$locationID)

# quick base-R look (reconstructed from a garbled fragment)
plot(scores$leadtimeValue, scores$scoreValue,
     xlab = "Lead Times", ylab = "Score")

# get fancier
pd <- position_dodge(0.2)
min.LT <- min(scores$leadtimeValue)
max.LT <- max(scores$leadtimeValue)

ggplot(scores, aes(color = locationID, x = leadtimeValue, y = scoreValue)) +
  geom_errorbar(aes(ymin = scoreValue - ci, ymax = scoreValue + ci), position = pd) + # , color="grey"
  geom_line() +
  geom_point(aes(color = locationID), position = pd) +
  geom_hline(aes(yintercept = 0), color = "blue", linetype = "dashed") +
  scale_x_continuous(breaks = min.LT:max.LT) + # lead-time breaks belong on x, not y
  xlab("Lead Times") + ylab("Score")
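For reference, the ci column used in the error bars above has to be aggregated from the daily values first; a sketch of that aggregation with dplyr, standing in for the summarySE() helper (column names assumed to match the plot):

```r
library(dplyr)

set.seed(1)
daily <- data.frame(                     # stand-in for daily score values
  locationID    = rep(c("A", "B"), each = 20),
  leadtimeValue = rep(1:2, 20),
  scoreValue    = rnorm(40)
)

scores <- daily %>%
  group_by(locationID, leadtimeValue) %>%
  summarise(
    n  = n(),
    se = sd(scoreValue) / sqrt(n),       # standard error of the mean
    ci = se * qt(0.975, n - 1),          # 95% confidence half-width
    scoreValue = mean(scoreValue),
    .groups = "drop"
  )
```

Each (location, lead time) group collapses to one mean with its se and ci, which is exactly the shape the error-bar plot expects.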

We also simplified the wording on the display again; I need to follow this with database logic to be sure names are clear and sensible (self-documenting).

I’m also adding a layer of reactivity BEFORE the filter function; now we have a selection box, filled from the database, which lists which packages are available:
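A minimal sketch of that pre-filter layer, assuming a DBI connection and a scores table with a package column (the database file, table, and column names here are all hypothetical):

```r
library(shiny)
library(DBI)

ui <- fluidPage(
  selectInput("pkg", "Available packages:", choices = NULL),
  DT::DTOutput("scores")
)

server <- function(input, output, session) {
  con <- dbConnect(RSQLite::SQLite(), "scoreboard.db")  # hypothetical DB file
  # populate the box from the database BEFORE any filtering reacts
  pkgs <- dbGetQuery(con, "SELECT DISTINCT package FROM scores")$package
  updateSelectInput(session, "pkg", choices = pkgs)
  output$scores <- DT::renderDT({
    req(input$pkg)                                      # wait for a selection
    dbGetQuery(con, "SELECT * FROM scores WHERE package = ?",
               params = list(input$pkg))
  })
  session$onSessionEnded(function() dbDisconnect(con))
}
```

The point is that the choices come from the database itself, so the UI never offers a package that has no data behind it.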

Images of current interface

CRPS for 2 locs, 10 LTs


This week I need to:

  • prep slides for 10 minute “wave peaks” presentation
    • less technical, more “qualitative” study
    • underscore the utility in comparing score types
    • examples of other scoreboards…?
  • Enhance data import definition
    • example file
    • user preview (?) before database import run
  • Change interface:
    • reduce “summary”;
    • reduce emphasis on confidence-interval plot (ci and se aggregation function, summarySE);
    • add ability to post a 2nd Score Type to the same page (or more?)
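On the user-preview idea: a tiny sketch of letting the user inspect the first rows of an import file before the real database import runs (the helper and its defaults are hypothetical):

```r
# Read only the first few rows so the user can sanity-check the
# columns and types before committing the file to the database.
preview_import <- function(path, n = 5, sep = "\t") {
  read.delim(path, sep = sep, nrows = n)
}
```

In the Shiny app this would feed a small DT table next to an "import" button, so a malformed file is caught before it touches the database.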

Planning:

  • ggplot libraries + facets
  • maps (GDAL)

Old Notes (for my reference):

Got access to a 2013 netCDF [development branch for SOS DB], so I’ll look into that this week to better understand our options for accommodating netCDF files (a soft requirement which we won’t implement without motivation!).

Keeps coming back up (particularly for multiple users in the web app!) but not dealt with:

NetCDF links: Discussion(s) of handling time using netCDF:
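The recurring netCDF time issue: time is typically stored as a numeric offset plus a units attribute like "hours since 1900-01-01 00:00:00". A base-R sketch of the conversion (units parsing simplified; minutes etc. omitted):

```r
# Convert netCDF-style time values + units string to POSIXct (UTC).
nc_time_to_posix <- function(values, units) {
  parts  <- strsplit(units, " since ")[[1]]             # e.g. "hours", origin
  mult   <- c(seconds = 1, hours = 3600, days = 86400)[[parts[1]]]
  origin <- as.POSIXct(parts[2], tz = "UTC")
  origin + values * mult                                # POSIXct arithmetic is in seconds
}

nc_time_to_posix(c(0, 24), "hours since 2016-06-30 00:00:00")
```

With a real file, the values would come from ncvar_get() and the units string from the variable's attributes; the conversion step is the same.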

Some NetCDF files from our friends at ECMWF:

O’Reilly always publishes goodness; re-reminding myself to remember this later:

Something to look into on my own time – confidence intervals discussed in a different context: …with a nifty Shiny app to illustrate Figs 1 - 5 from the article:

Tools for LaTeX: