
2018 06 05

Andre Merzky edited this page Jun 5, 2018 · 2 revisions

Agenda / Notes

- use cases -> requirements -> architecture -> implementation -> testing -> release
- Milestones:
  - Use cases, Requirements (early April)
  - Feasibility, Prototype  (early June)
  - Implementation          (end of June)

  • updates:
    • Team 1: Ioannis, Will, Jumana
      • ticket
      • use case document
      • TODO Will: tests for Slurm LRMS (tests for other LRMS still missing)
      • JD: data from BW, Titan, Stampede-2
        • TODO: use an RP script (BoT df /tmp/) to get LSF size out of band
      • TODO: look into documented expected node failure rate (focus on BW)
        • Comet: failure rate 0.5%, node-MTF: 2.8 days
        • Comet: less than 200 GB of storage should not happen; application must clean up
        • TODO: look into other machines
        • TODO: suggest to have this documented
    • Team 2: Vivek, Srinivas, George
      • literature study
      • non-MPI tests pass
      • MPI tests now complete
      • WIP: integration test on RP layer, targeting BW
      • TODO begin to look into tests on remote resources (focus on BW)
    • toward integration
      • TODO IP: sync with devel
      • TODO: check tagging reqs on scheduler
      • Use case is proof, not tests and prototype
        • don't delay EnTK integration
        • how does that relate to tags?
      • TODO: check LSF update approach with use cases
      • TODO: press forward with integration
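
The out-of-band `df /tmp/` check noted under Team 1 could be sketched roughly as below. This is only an illustration, not the RP script itself: the function names `tmp_free_gb` and `below_threshold` are hypothetical, and the 200 GB default is taken from the Comet note above and may not apply to other machines. Each task in the bag of tasks would run this on its node and report the result.

```python
import shutil

GB = 10**9  # decimal gigabytes


def tmp_free_gb(path="/tmp"):
    """Return the free space, in GB, of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / GB


def below_threshold(free_gb, threshold_gb=200.0):
    """Return True if the free space is under the (assumed) threshold."""
    return free_gb < threshold_gb


if __name__ == "__main__":
    free = tmp_free_gb()
    print(f"/tmp free: {free:.1f} GB, below threshold: {below_threshold(free)}")
```

Using `shutil.disk_usage` instead of parsing `df` output avoids depending on the `df` column layout, which can differ between systems.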

