DrawingCanvas API: Replace imperative extension methods with stateful canvas-based drawing model #377

JimBobSquarePants wants to merge 239 commits into main
Conversation
I agree, and I did not go with that approach, though I also considered your suggestion. I’ve also updated the IntelliSense docs for the renamed extensions/processors and the related markdown/readme content to reflect the new API.
Thanks @antonfirsov for adding the benchmarks, they're VERY useful! I dug into this a bit further and reran the throughput benchmark with the two concurrency dimensions separated:
That changes the picture quite a lot. When I hold the outer request concurrency fixed and vary only the inner parallelism, internal parallel execution clearly comes out ahead.
So I do not think the previous result supports the conclusion that our internal parallel execution is slower than serial. For the small-image case, I also would not expect MP/s to behave like a typical core pixel processor. Tiger rendering has a substantial amount of per-scene work — path building, stroking, and clipping — that scales with scene complexity rather than image area.
That work still exists even when the target is small, so lower MP/s on small images is not by itself evidence of a regression. In absolute terms the small images are still much faster; they just amortize the fixed scene cost less effectively. The earlier throughput result appears to have been confounded by mixing inner parallelism with outer request concurrency. I also tested the service-throughput shape explicitly on my machine:
That suggests the best throughput here comes from a balanced split between outer concurrency and inner parallelism, not from maximizing either one in isolation. Taken together, I think these results support keeping the current defaults for the ordinary non-concurrent case. For concurrent hosts, the optimal split between per-request parallelism and request-level concurrency is workload-dependent, so I think the right answer is exactly what we have now: sensible defaults, with an opt-in knob for hosts that need to tune.
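For illustration, the kind of opt-in tuning a concurrent host could apply looks like this. `Configuration.MaxDegreeOfParallelism` is the existing ImageSharp knob; whether the drawing rasterizer honours it in this exact way is an assumption on my part:

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

// Sketch: cap inner parallelism per request so request-level concurrency
// can use the remaining cores (assumed interaction, not benchmarked here).
Configuration configuration = Configuration.Default.Clone();
configuration.MaxDegreeOfParallelism = 2;

using Image<Rgba32> image = new(configuration, 1024, 768);
```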
Paint sounds awesome!
Single-user "slowness" wasn't the conclusion I wanted to imply. The data shows that under high concurrency, parallelism hurts throughput, which is something I've only seen happen in a few specific workloads. I will try to help by running some profiling.
Is there any more information that could be cached when the same path/stroke is being used?
In the core library, threading overhead is reduced or sometimes eliminated for small images.
We memoise the result of the path/stroke preparation. I just ran some experiments comparing runs with and without that caching; the cached runs were measurably faster.
I agree that high-concurrency service throughput is worth understanding, but I don't think it should automatically define the default optimization target for ImageSharp.Drawing. I don't see this library primarily as a web-server drawing component in the same way some ImageSharp image-processing workloads are. The scenarios I have in mind are things like CAD-style rendering, charts and graphs, UI generation, tooling, and other programmatic rendering workloads where single-render performance and overall rendering capability matter more than maximizing throughput under heavily concurrent request load. That makes the current single-request results directly relevant, and it also means I'm comfortable with the current defaults. I did explore whether the library could self-throttle under concurrent load without user tuning. The honest answer is that any in-library mechanism needs either a shared scheduler across requests or a runtime pool-pressure signal, neither of which exists in a form we can consume without significant upstream changes.
CPU rasterizer parallel performance
Scenarios that involve immediate, on-screen rendering are better addressed by the WebGPU renderer, whatever the current primary use-case of the CPU path turns out to be. That said, I just ran the same benchmarks with the V2 rasterizer, and this PR significantly improves every aspect perf-wise (🚀), regardless of whether parallelism is used, so I'm no longer pursuing this in the PR.
I don't know whether self-throttling is the only possible answer here; I would recommend opening a tracking issue once this is merged. I'll move on to reviewing other aspects now.
antonfirsov
left a comment
Plenty of notes although the review is still incomplete. Most importantly, there seems to be a bug in the GPU renderer, see the comment on the lines demo.
```csharp
texture = flushContext.Api.DeviceCreateTexture(flushContext.Device, in textureDescriptor);
if (texture is null)
{
    error = "Failed to create WebGPU composition texture.";
```
Nit: passing around these error strings is odd; it would be cleaner to emit these messages to some sort of `IErrorLogger`.
```csharp
toolbar.Controls.Add(this.CreateRunButton("1k", 1_000));
toolbar.Controls.Add(this.CreateRunButton("10k", 10_000));
toolbar.Controls.Add(this.CreateRunButton("100k", 100_000));
toolbar.Controls.Add(this.CreateRunButton("200k", 200_000));
```
I actually wanted to make this 1M, but that makes the implementation hit buffer limits:
```
WebGPU (Failed: The staged-scene path tiles buffer requires 141001808 bytes, exceeding the current WebGPU binding limit of 134217728 bytes.)
```
Definitely not a show stopper, but some users may be limited by this, so it's worth noting down.
It might be possible to work around this. There are hard limits to what you can send though.
Workaround is good, but in the long term I would view it as an optimization problem. The buffers are likely heavier than needed.
```csharp
    canvas.Flush();
}

stopwatch.Stop();
```
Yeah, at 200K I'm seeing about 4x. That's annoying. I'll see what I can do.
```csharp
/// Use <see cref="Run(Action{DrawingCanvas{TPixel}})"/> when you want the window to drive rendering for you, or <see cref="TryAcquireFrame(TimeSpan, out WebGPUWindowFrame{TPixel}?)"/> when you need to drive the frame loop yourself.
/// </summary>
/// <typeparam name="TPixel">The canvas pixel format.</typeparam>
public sealed unsafe class WebGPUWindow<TPixel> : IDisposable
```
This type has limited usability since it needs to create a brand-new popup window and doesn't allow rendering to a control hosted by popular UI frameworks. I can hardly see anyone using it for serious things except gamedevs who want to experiment with our API.
In #377 (review) I suggested exploring ways to create a canvas around an existing window handle that could be an entry point for WinForms/WPF/WinUI integration (maybe even Avalonia or MAUI). IMO such a feature would radically help adoption.
I can explore this, but I can't promise to do it fast.
`WebGPUWindow<TPixel>` is a managed presentation surface intended for tools where we own the window. The embedding entry point you're looking for is `WebGPUDeviceContext<TPixel>`:

- `WebGPUDeviceContext(nint deviceHandle, nint queueHandle)` wraps externally-owned device/queue handles without taking ownership.
- `CreateCanvas(nint textureHandle, nint textureViewHandle, WebGPUTextureFormatId format, int width, int height, DrawingOptions options)` wraps an externally-owned texture/view into a `DrawingCanvas<TPixel>` for one frame.

The integration shape is the same on every platform:

1. The host owns the WebGPU surface, swap-chain, device, and queue.
2. The render loop acquires the current texture and view from the surface.
3. The four handles go into `WebGPUDeviceContext.CreateCanvas(...)`; draw, dispose, and present through the host's loop.

So the primitive for "render to an existing window/control" is already there. Per-framework wrappers (WinForms control, WPF host, Avalonia control, etc.) that hide steps 1 and 2 would just need framework-specific swap-chain plumbing utilizing that type.
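A rough sketch of that loop; only `WebGPUDeviceContext<TPixel>`, `CreateCanvas(...)`, `Fill`, and `Flush` are from this PR, and everything prefixed `host.` is hypothetical framework-specific plumbing:

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Drawing;
using SixLabors.ImageSharp.Drawing.Processing;
using SixLabors.ImageSharp.PixelFormats;

// Hedged sketch: the host owns the surface, swap-chain, device, and queue.
using var context = new WebGPUDeviceContext<Rgba32>(deviceHandle, queueHandle);

while (host.IsRunning) // hypothetical host loop
{
    // Acquire the current texture/view from the host-owned surface (hypothetical call).
    (nint texture, nint view) = host.AcquireCurrentTexture();

    // Wrap the externally-owned handles into a canvas for this frame.
    using (DrawingCanvas<Rgba32> canvas = context.CreateCanvas(
        texture, view, surfaceFormat, width, height, new DrawingOptions()))
    {
        canvas.Fill(Brushes.Solid(Color.CornflowerBlue), new EllipsePolygon(200, 150, 80));
        canvas.Flush();
    }

    host.Present(); // hypothetical: present through the host's loop
}
```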
wraps externally-owned device/queue handles
It is quite a ceremony to set those up. I'm suggesting an integration point where all the user has is a window handle.
Even if we don't expose additional APIs, it would still be very important to make a demo validating this works E2E and to provide a how-to for users, since IMO this is the No. 1️⃣ use-case for the WebGPU renderer. Once we have the demo we may see if it's worth moving something into the library.
I originally wanted to look into this, hence the current review over the WebGPU bits. However, I also want to look into a replacement for SixLabors/ImageSharp#3056 (see last comment), which is not an easy job, so no promise I can get it done fast.
So let me know where I should direct my attention.
So let me know where I should direct my attention.
Here, for now: reviewing, please. I really need to get this in. Please do have a look at my replacement for the tracking PR though, as I think it's far better.
I think I might have the hosted window pattern cracked. Will post an update and sample soon.
`WebGPUHostedWindow<TPixel>` is in, and I've added some WinForms demos using it in a custom control. It's dead easy to use.
antonfirsov
left a comment
(Mostly) high-level notes on IDrawingCanvas & DrawingCanvas<TPixel>.
```csharp
Image<TPixel> convertedImage = image.CloneAs<TPixel>();
this.DrawImageCore(convertedImage, sourceRect, destinationRect, sampler, ownsSourceImage: true);
```
Any reason to pass around `ownsSourceImage` instead of `using` the `Image<TPixel>` when it's temporary?
`DrawImageCore` sometimes transfers ownership of the source image to the deferred batch (`this.pendingImageResources`) instead of disposing it inline. The batch's `ImageBrush<TPixel>` needs it to live until flush. A `using` at the caller would dispose it too early.
The three paths inside:
- No scaling/cropping/transforming: source goes directly into the brush; batch owns it until flush.
- An intermediate is created (scale/crop/transform); source is disposed early.
- Intermediate replaces the source and the source was caller-owned; source disposed early, intermediate handed to the batch.
`using` would force an extra clone on path 1 (dispose our temp, re-clone for the batch). The `ownsSourceImage` flag is the compact way to express dynamic ownership transfer.
Happy to rename it (callerOwnsSource / sourceIsTemporary) if the name's the issue, but the shape needs to stay.
```csharp
public int SaveLayer()
    => this.SaveLayer(new GraphicsOptions());

/// <inheritdoc />
public int SaveLayer(GraphicsOptions layerOptions)
    => this.SaveLayer(layerOptions, this.Bounds);
```
Would it make sense to promote all delegating implementations to default implementations on the IDrawingCanvas interface?
Personally, I find that pattern makes discovery more difficult.
Is that because DocFX doesn't generate documentation for inherited methods in the implementing type?
My new comments on DrawingCanvas and factory/extension utilities are related.
I mean just for navigating code. I think all the DocFX inheritance stuff is fixed.
```csharp
/// Use <see cref="MeasureTextBounds"/> for glyph ink bounds or
/// <see cref="MeasureTextRenderableBounds"/> for the union of logical advance and rendered bounds.
/// </remarks>
public RectangleF MeasureTextAdvance(RichTextOptions textOptions, ReadOnlySpan<char> text);
```
I'm a full font-shaping layman, but I noticed that the text measurement APIs differ a lot from their CanvasRenderingContext2D counterpart, which has a single overload returning a TextMetrics instance.
Naively, it seems like they provide different features. Will the overloads be documented and explained via examples? Would it make sense to attempt some consolidation to make the API more understandable and, in some cases, help avoid double work? E.g. maybe a user wants to measure bounds and count lines, and this might be cheaper to do in a single measurement run than in two consecutive calls.
I'll need to do some upstream work in Fonts but I can actually add a TextMetrics type to the TextMeasurer there. The canvas can use that method and people can use individual methods directly via TextMeasurer
Done. See `MeasureText`. This returns a `TextMetrics` result computed in a single optimized loop. I've also optimized individual `TextMeasurer` methods to remove some allocations.
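If the consolidated API lands as discussed, usage might look roughly like this; the `MeasureText`/`TextMetrics` names come from this thread, but the exact shapes are assumptions:

```csharp
// Assumed shapes from this thread; final signatures may differ.
RichTextOptions textOptions = new(font) { Origin = new PointF(10, 10) };
TextMetrics metrics = canvas.MeasureText(textOptions, "Hello, world");
// One measurement pass instead of separate advance/bounds/line-count calls.
```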
```csharp
// Defensive guard: built-in backends should provide either direct readback (CPU/backed surface)
// or shadow fallback. If this fails, it indicates a backend implementation bug or an unsupported custom backend.
if (!this.TryCreateProcessSourceImage(sourceRect, out Image<TPixel>? sourceImage))
```
`Apply` methods are very expensive on GPU backends in comparison to GPU-accelerated drawing operations. Since the `IImageProcessingContext` interface doesn't make it obvious that there is an `Image<T>` backed by CPU memory, this can even be surprising. Is there any reasonable scenario where a user would be willing to pay the high price? If not, I would say it makes sense to always throw a `NotSupportedException` on GPU.
The cost is more bounded than it looks. `Apply` on GPU scissors to the path's AABB, not the full frame:

1. Flush queued GPU work so the read sees the correct backdrop.
2. `backend.TryReadRegion(...)` reads only the path's bounding rectangle.
3. `Image<TPixel>.Mutate(operation)` runs the processor on that sub-image.
4. The result is wrapped as an `ImageBrush` and composited via the normal GPU fill path.
A 100×100 blur on a 1920×1080 canvas is 10k pixels readback, not 2M. The real cost is the processor itself, which is what the user asked for.
This is the bridge to the entire ImageSharp.Processing surface: blur, convolutions, color correction, quantization, dither, pixelate, resize, tone curves, edge detection, user-authored IImageProcessors. None exist as GPU-native primitives and most never would/could.
If we throw on GPU, the user wanting a blurred backdrop has to read the region back manually (usually a larger rect than the AABB), run the processor, build an ImageBrush, draw it. Same four steps, moved into every caller, with worse scissoring.
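For reference, the built-in bridge as I understand it from this thread; the exact `Process` overload shape is an assumption:

```csharp
// Assumed shape: scissor to the path's AABB, read back only that region,
// run the CPU-side processor, composite the result via an ImageBrush fill.
canvas.Process(
    new RectangularPolygon(100, 100, 100, 100), // ~10k pixels read back, not the full frame
    ctx => ctx.GaussianBlur(8f));               // any IImageProcessingContext operation
canvas.Flush();
```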
The cost is more bounded than it looks
Have you benchmarked drawing a scene with Apply involved + same without apply?
A 100×100 blur on a 1920×1080 canvas is 10k pixels readback, not 2M.
I expect the readback to imply a significant synchronization cost regardless of the amount of data. Note that it is implemented via sync-over-async to keep the API synchronous.
the processor itself, which is what the user asked for.
For a newcomer who wants to use the WebGPU pipeline and didn't use ImageSharp before, it might not be obvious from the IImageProcessingContext API shape that processors are running on the CPU.
A couple of random observations and follow-ups to previous discussions. The GH review UI is weird and shows comments added during my review to previous discussions twice: once in those discussions, and once here in the form of "new" comments.
Will move on to WebGPUHostedWindow<T> tomorrow.
```csharp
LineJoin = options.LineJoin switch
{
    LineJoin.MiterRound => PolygonClipper.LineJoin.MiterRound,
    LineJoin.Bevel => PolygonClipper.LineJoin.Bevel,
    LineJoin.Round => PolygonClipper.LineJoin.Round,
    LineJoin.MiterRevert => PolygonClipper.LineJoin.MiterRevert,
    _ => PolygonClipper.LineJoin.Miter,
},

InnerJoin = options.InnerJoin switch
{
    InnerJoin.Round => PolygonClipper.InnerJoin.Round,
    InnerJoin.Miter => PolygonClipper.InnerJoin.Miter,
    InnerJoin.Jag => PolygonClipper.InnerJoin.Jag,
    _ => PolygonClipper.InnerJoin.Bevel,
},
```
Couple of notes:
- `InnerJoin.Bevel` is not propagated here, making it a dead public API. Would it be possible to do an extensive search with an LLM to find all dead APIs and take action on the findings (fix or delete)?
- `InnerJoin` options `Jag` and `Round` seem to be untested.
- Unlike `LineCap`, `InnerJoin` and `LineJoin` are unused in the WebGPU backend. Assuming it has its own outlining logic and doesn't use `PolygonClipper`, this looks like a bug. If we don't want to implement them yet, the feature gap should be documented somewhere.
```csharp
namespace SixLabors.ImageSharp.Drawing.Processing.Backends;

/// <summary>
/// Internal WebGPU texture upload, readback, and release helpers used by tests and benchmarks.
```
Except for trivially simple and light test-only internal APIs, it is better to place test/benchmark code in ImageSharp.Tests/TestUtilities to avoid shipping dead IL (from the user's POV), since it makes the DLLs heavier.
Unless you are 100% sure this is the only instance doing this, this is another aspect to consider for an extensive search + cleanup.
```csharp
/// A drawing canvas over a frame target.
/// </summary>
/// <typeparam name="TPixel">The pixel format.</typeparam>
public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas
```

Suggested change:

```diff
-public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas
+public sealed class DrawingCanvas<TPixel> : IDrawingCanvas
```
```csharp
/// <summary>
/// Convenience extension methods for creating <see cref="DrawingCanvas{TPixel}"/> instances from ImageSharp image types.
/// </summary>
public static class DrawingCanvasExtensions
```
[Naming] These are not extensions on `DrawingCanvas`. Practically, the type only contains factory methods, so I would consider a name communicating this.
```csharp
/// A drawing canvas over a frame target.
/// </summary>
/// <typeparam name="TPixel">The pixel format.</typeparam>
public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas
```
Is there a good reason to publicly expose the `DrawingCanvas<T>` type instead of just having factory methods returning `IDrawingCanvas` instances? (Those are already present in `DrawingCanvasExtensions`.)
```csharp
/// <summary>
/// Convenience shape helpers that build paths and forward to the core <see cref="IDrawingCanvas"/> fill and draw primitives.
/// </summary>
public static class DrawingCanvasShapeExtensions
```
What decides whether a method should go here or on `IDrawingCanvas`? As pointed out in another comment, `DrawingCanvas<T> : IDrawingCanvas` has a couple of delegating methods that don't use its internal state directly.

Prerequisites
Breaking Changes: DrawingCanvas API
Fix #106
Fix #244
Fix #344
Fix #367
This is a major breaking change. The library's public API has been completely redesigned around a canvas-based drawing model, replacing the previous collection of imperative extension methods.
What changed
The old API surface — dozens of `IImageProcessingContext` extension methods like `DrawLine()`, `DrawPolygon()`, `FillPolygon()`, `DrawBeziers()`, `DrawImage()`, `DrawText()`, etc. — has been removed entirely. These methods were individually simple but suffered from several architectural limitations.

The new model: DrawingCanvas

All drawing now goes through `IDrawingCanvas` / `DrawingCanvas<TPixel>`, a stateful canvas that queues draw commands and flushes them as a batch.

Via `Image.Mutate()` (most common)

Standalone usage (without `Image.Mutate`)

`DrawingCanvas<TPixel>` can be created directly from an image or frame using the `CreateCanvas(...)` extensions.

Canvas state management
The canvas supports a save/restore stack (similar to HTML Canvas or SkCanvas):
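As a sketch of the model described so far; the `ProcessWithCanvas` and `Fill` shapes are taken from this PR's description, while the `Save`/`Restore` overloads are assumed to mirror `SaveLayer`'s int-token pattern and may differ:

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Drawing;
using SixLabors.ImageSharp.Drawing.Processing;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

using Image<Rgba32> image = new(400, 300);

// Commands queued inside ProcessWithCanvas are batched and flushed together.
image.Mutate(ctx => ctx.ProcessWithCanvas(canvas =>
{
    canvas.Fill(Brushes.Solid(Color.White), new RectangularPolygon(0, 0, 400, 300));

    int state = canvas.Save(); // assumed: push DrawingOptions + clip state
    canvas.Fill(Brushes.Solid(Color.CornflowerBlue), new EllipsePolygon(200, 150, 80));
    canvas.Restore(state);     // assumed: pop back to the saved state
}));
```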
State includes `DrawingOptions` (graphics options, shape options, transform) and clip paths. `SaveLayer` creates an offscreen layer that composites back on `Restore`.

IDrawingBackend — bring your own renderer

The library's rasterization and composition pipeline is abstracted behind `IDrawingBackend`. This interface has the following methods:

- `FlushCompositions<TPixel>`
- `TryReadRegion<TPixel>` (used by `Process()` and `DrawImage()`)
DefaultDrawingBackend(CPU, tiled fixed-point rasterizer). An experimental WebGPU compute-shader backend (ImageSharp.Drawing.WebGPU) is also available, demonstrating how alternate backends plug in. Users can provide their own implementations — for example, GPU-accelerated backends, SVG emitters, or recording/replay layers.Backends are registered on
Configuration:Migration guide
| Before | After |
| --- | --- |
| `ctx.Fill(color, path)` | `ctx.ProcessWithCanvas(c => c.Fill(Brushes.Solid(color), path))` |
| `ctx.Fill(brush, path)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, path))` |
| `ctx.Draw(pen, path)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, path))` |
| `ctx.DrawLine(pen, points)` | `ctx.ProcessWithCanvas(c => c.DrawLine(pen, points))` |
| `ctx.DrawPolygon(pen, points)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, new Polygon(new LinearLineSegment(points))))` |
| `ctx.FillPolygon(brush, points)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, new Polygon(new LinearLineSegment(points))))` |
| `ctx.DrawText(text, font, color, origin)` | `ctx.ProcessWithCanvas(c => c.DrawText(new RichTextOptions(font) { Origin = origin }, text, Brushes.Solid(color), null))` |
| `ctx.DrawImage(overlay, opacity)` | `ctx.ProcessWithCanvas(c => c.DrawImage(overlay, sourceRect, destRect))` |

Multiple operations can share one `ProcessWithCanvas` block — commands are batched and flushed together.

Other breaking changes in this PR
- `AntialiasSubpixelDepth` removed — The rasterizer now uses a fixed 256-step (8-bit) subpixel depth. The old `AntialiasSubpixelDepth` property (default: 16) controlled how many vertical subpixel steps the rasterizer used per pixel row. The new fixed-point scanline rasterizer integrates area/cover analytically per cell rather than sampling at discrete subpixel rows, so the "depth" is a property of the coordinate precision (24.8 fixed-point), not a tunable sample count. 256 steps gives ~0.4% coverage granularity — more than sufficient for all practical use cases. The old default of 16 (~6.25% granularity) could produce visible banding on gentle slopes.
- `GraphicsOptions.Antialias` — now controls `RasterizationMode` (antialiased vs aliased). When `false`, coverage is snapped to binary using `AntialiasThreshold`.
- `GraphicsOptions.AntialiasThreshold` — new property (0–1, default 0.5) controlling the coverage cutoff in aliased mode. Pixels with coverage at or above this value become fully opaque; pixels below are discarded.
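In aliased mode, the two options interact as described above. A minimal sketch, assuming the property shapes from this PR's description:

```csharp
using SixLabors.ImageSharp.Drawing.Processing;

// Antialias = false switches to aliased rasterization; AntialiasThreshold
// (new in this PR) sets the binary coverage cutoff.
GraphicsOptions options = new()
{
    Antialias = false,
    AntialiasThreshold = 0.5f // coverage >= 0.5 becomes opaque; below is discarded
};
```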
Benchmarks

All benchmarks run under the following environment.
DrawPolygonAll - Renders a 7200x4800px path of the state of Mississippi with a 2px stroke.
FillParis - Renders a 1096x1060px scene containing 50K fill paths.