Merged
Conversation
EAMProject was re-projecting all input points through pyproj on every pipeline update, adding ~14 seconds per timestep on the ne1024 grid (25M points). The input geometry doesn't actually change when only the scalar variable or time index changes, so the reprojection was wasted work.

Two related fixes:

- EAMExtract no longer bumps its shared points' MTime on every no-crop pass-through; it only invalidates when transitioning out of a trimmed state. The unconditional Modified() was defeating EAMProject's cache on every pipeline update.
- EAMProject now keys its cache on the identity of the input vtkPoints object (plus the projection/translate parameters) rather than on MTime. This makes the cache immune to spurious upstream Modified() calls on the shared points: a cleaner guarantee than chasing every filter that might bump MTime.

End-to-end pipeline cost on a time-slider tick (ne1024pg2, 25M cells, 2 variables enabled) drops from ~14,800 ms to ~200 ms.
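The identity-keyed cache can be sketched roughly as below. This is an illustrative sketch, not the plugin's actual code: the class and method names (`ProjectCache`, `get`) and the `compute` callback are assumptions; only the key shape `(id(points), project, translate)` comes from the commit.

```python
class ProjectCache:
    """Illustrative sketch of a cache keyed on object identity
    plus projection parameters, rather than on VTK MTime."""

    def __init__(self):
        self._key = None
        self._projected = None

    def get(self, input_points, project, translate, compute):
        # Key on the identity of the shared vtkPoints object plus the
        # projection parameters -- immune to spurious upstream
        # Modified() calls that only bump MTime.
        key = (id(input_points), project, translate)
        if key != self._key:
            self._projected = compute(input_points)
            self._key = key
        return self._projected
```

One caveat with `id()`-based keys: the cached object must be kept alive (e.g. by storing a reference alongside the key), since Python may reuse an id after garbage collection.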
EAMCenterMeridian's cached output path rebuilds each timestep's cell data by fancy-indexing the input scalars through a PedigreeIds permutation produced by vtkTableBasedClipDataSet. The permutation is almost entirely long runs of +1-stepped indices (~99.95% in our case, with a handful of breaks at the seam every ~2048 cells), so the fancy index was doing far more work than necessary.

Replace it with a slice plan: one pass over PedigreeIds to identify the monotonic runs, then on each tick execute `len(runs)` slice copies (`out[s:e] = in[pid[s]:pid[s]+e-s]`). The plan is cached on the pedigree array's (id, MTime), so it only runs once per meridian change.

Also write directly into the output vtkDataArray's buffer instead of going through dsa's __setitem__, which was doing a full numpy → VTK wrap round-trip.

End-to-end: EAMCenterMeridian's cached-path cost drops from ~40 ms to ~22 ms per tick with 2 variables enabled on the ne1024pg2 grid.
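A minimal sketch of the slice-plan idea, assuming `pid` is the PedigreeIds permutation as a numpy array; the helper names (`build_slice_plan`, `apply_slice_plan`) are illustrative, not the plugin's actual API:

```python
import numpy as np

def build_slice_plan(pid):
    """One pass over the permutation: find the monotonic +1 runs.
    Returns a list of (start, end) half-open output ranges."""
    breaks = np.flatnonzero(np.diff(pid) != 1) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(pid)]))
    return list(zip(starts, ends))

def apply_slice_plan(plan, pid, src, out):
    """Per tick: one contiguous slice copy per run, instead of a
    full fancy index out[:] = src[pid]."""
    for s, e in plan:
        out[s:e] = src[pid[s]:pid[s] + (e - s)]
    return out
```

With ~99.95% of the permutation in +1 runs, the plan collapses millions of gather operations into a few thousand contiguous `memcpy`-style slice assignments, and the plan itself amortizes to zero across ticks.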
The Mollweide/Robinson projection ran pyproj.Transformer.transform as a single-threaded call on ~100M points, taking ~14 seconds on the ne1024pg2 grid. That cost dominated the initial pipeline build and, worse, the per-drag cost when the user changed the crop region (every crop change invalidates EAMProject's cache, because EAMExtract compacts points via RemoveGhostCells).

pyproj releases the GIL inside transform(), so chunking the input and farming the chunks out to a ThreadPoolExecutor scales nearly linearly: in a standalone 100M-point bench we saw 1x / 2.0x / 3.9x / 7.4x for 1, 2, 4, 8 threads. Max error between any pair of thread counts is 0; the chunking produces bit-identical output. Capped at 8 threads empirically (the speedup flattens there on a 10-core machine, and leaving a couple of cores free keeps the UI responsive). Falls back to single-threaded for small inputs below 1M points, where the pool setup overhead would dominate.

Measured on ne1024pg2 (~100M points):

- initial full-pipeline update: 17.5 s -> 5.7 s
- crop-drag total cost: 13 s -> 3 s
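The fan-out can be sketched as follows. This is an illustrative sketch, not the plugin's code: the function name, chunking scheme, and default thresholds are assumptions; only the 8-thread cap, the 1M-point fallback, and the fact that pyproj's transform() releases the GIL come from the commit. Because each thread writes a disjoint slice of preallocated output arrays, the result is bit-identical to the single-threaded call.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_transform(transformer, lon, lat, n_threads=8,
                       min_parallel=1_000_000):
    """Chunk lon/lat and run transformer.transform() on each chunk
    in a thread pool. Works because pyproj releases the GIL."""
    n = len(lon)
    if n < min_parallel:
        # Pool setup overhead dominates on small inputs.
        return transformer.transform(lon, lat)

    out_x = np.empty(n)
    out_y = np.empty(n)
    bounds = np.linspace(0, n, n_threads + 1, dtype=int)

    def work(i):
        s, e = bounds[i], bounds[i + 1]
        # Each thread owns a disjoint slice -> no locking needed,
        # and the concatenated result is bit-identical.
        out_x[s:e], out_y[s:e] = transformer.transform(lon[s:e], lat[s:e])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))
    return out_x, out_y
```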
The previous commit hard-capped EAMProject's ThreadPoolExecutor at 8 threads, with a comment arguing that's where the speedup flattens on a 10-core laptop. On a bigger machine that leaves real perf on the table, so take two steps:

- Default: max(1, cpu_count - 1), leaving one core for the UI/IO thread.
- Override: QV_PROJECTION_THREADS env var for HPC nodes, or to pin a specific thread count (e.g. for testing).
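The two-step policy amounts to a few lines; a minimal sketch, where the helper name is illustrative but the env var QV_PROJECTION_THREADS and the cpu_count - 1 default come from the commit:

```python
import os

def projection_thread_count():
    """Thread count for EAMProject's pool: env override first,
    otherwise all cores minus one."""
    override = os.environ.get("QV_PROJECTION_THREADS")
    if override:
        # Explicit pin for HPC nodes or benchmarking.
        return max(1, int(override))
    # Leave one core free for the UI/IO thread.
    return max(1, (os.cpu_count() or 2) - 1)
```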
Four commits, all in src/e3sm_quickview/plugins/eam_projection.py. Fixes a longstanding issue where every time-slider tick re-projected the entire ~100M-point mesh (~14 s) because EAMExtract's unconditional Modified() on its no-crop pass-through was invalidating EAMProject's points cache. The fix is three-part: (1) EAMExtract now only bumps the points MTime on the transition out of a trimmed state; (2) EAMProject re-keys its cache from MTime to (id(input_points), project, translate), making it robust to spurious upstream Modified(); (3) the pedigree-indexed copy in add_cell_arrays replaces the 25M-element fancy numpy index with a cached slice plan that exploits the fact that the clip-induced pedigree permutation is 99.95% monotonic-by-1 in long runs. Finally, the one-time projection itself (pyproj.Transformer.transform) is now fanned out across a ThreadPoolExecutor; pyproj releases the GIL and scales nearly linearly, and the thread count defaults to cpu_count - 1 and can be overridden via QV_PROJECTION_THREADS. Net: steady-state slider tick cost on the ne1024pg2 grid drops from ~14,800 ms to ~283 ms (~52x); the initial pipeline build drops from ~17.5 s to ~5.7 s, and the per-drag crop-change cost from ~14 s to ~3 s.