DrawingCanvas API: Replace imperative extension methods with stateful canvas-based drawing model by JimBobSquarePants · Pull Request #377 · SixLabors/ImageSharp.Drawing

JimBobSquarePants · 2026-03-01T12:40:25Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Breaking Changes: DrawingCanvas API

This is a major breaking change. The library's public API has been completely redesigned around a canvas-based drawing model, replacing the previous collection of imperative extension methods.

What changed

The old API surface — dozens of IImageProcessingContext extension methods like DrawLine(), DrawPolygon(), FillPolygon(), DrawBeziers(), DrawImage(), DrawText(), etc. — has been removed entirely. These methods were individually simple but suffered from several architectural limitations:

Each call was an independent image processor that rasterized and composited in isolation, making it impossible to batch or reorder operations.
State (blending mode, clip paths, transforms) had to be passed to every single call.
There was no way for an alternate rendering backend to intercept or accelerate a sequence of draw calls.

The new model: `DrawingCanvas`

All drawing now goes through IDrawingCanvas / DrawingCanvas<TPixel>, a stateful canvas that queues draw commands and flushes them as a batch.

Via `Image.Mutate()` (most common)

using SixLabors.ImageSharp.Drawing;
using SixLabors.ImageSharp.Drawing.Processing;

image.Mutate(ctx => ctx.Paint(canvas =>
{
    // Fill a path
    canvas.Fill(Brushes.Solid(Color.Red), new EllipsePolygon(200, 200, 100));

    // Stroke a path
    canvas.Draw(Pens.Solid(Color.Blue, 3), new RectangularPolygon(50, 50, 200, 100));

    // Draw a polyline
    canvas.DrawLine(Pens.Solid(Color.Green, 2), new PointF(0, 0), new PointF(100, 100));

    // Draw text
    canvas.DrawText(
        new RichTextOptions(font) { Origin = new PointF(10, 10) },
        "Hello, World!",
        brush: Brushes.Solid(Color.Black),
        pen: null);

    // Draw an image
    canvas.DrawImage(sourceImage, sourceRect, destinationRect);

    // Save/Restore state (options, clip paths)
    canvas.Save(new DrawingOptions
    {
        GraphicsOptions = new GraphicsOptions { BlendPercentage = 0.5f }
    });
    canvas.Fill(brush, path);
    canvas.Restore();

    // Apply arbitrary image processing to a path region
    canvas.Apply(path, inner => inner.Brightness(0.5f));

    // Commands are flushed on Dispose (or call canvas.Flush() explicitly)
}));

Standalone usage (without `Image.Mutate`)

DrawingCanvas<TPixel> can be created directly from an image or frame using the CreateCanvas(...) extensions:

using var canvas = image.CreateCanvas(new DrawingOptions());

canvas.Fill(brush, path);
canvas.Draw(pen, path);
canvas.Flush();

using var canvas = image.CreateCanvas(frameIndex: 0, options: new DrawingOptions());
// ...

using var canvas = frame.CreateCanvas(new DrawingOptions());
// ...

Canvas state management

The canvas supports a save/restore stack (similar to HTML Canvas or SkCanvas):

int saveCount = canvas.Save();             // push current state
canvas.Save(options, clipPath1, clipPath2); // push and replace state

canvas.Restore();              // pop one level
canvas.RestoreTo(saveCount);   // pop to a specific level

State includes DrawingOptions (graphics options, shape options, transform) and clip paths. SaveLayer creates an offscreen layer that composites back on Restore.

`IDrawingBackend` — bring your own renderer

The library's rasterization and composition pipeline is abstracted behind IDrawingBackend. This interface has the following methods:

Method	Purpose
`FlushCompositions<TPixel>`	Flushes queued composition operations for the target.
`TryReadRegion<TPixel>`	Read pixels back from the target (needed for `Process()` and `DrawImage()`).

The library ships with DefaultDrawingBackend (CPU, tiled fixed-point rasterizer). An experimental WebGPU compute-shader backend (ImageSharp.Drawing.WebGPU) is also available, demonstrating how alternate backends plug in. Users can provide their own implementations — for example, GPU-accelerated backends, SVG emitters, or recording/replay layers.

Backends are registered on Configuration:

configuration.SetDrawingBackend(myCustomBackend);

Migration guide

Old API	New API
`ctx.Fill(color, path)`	`ctx.ProcessWithCanvas(c => c.Fill(Brushes.Solid(color), path))`
`ctx.Fill(brush, path)`	`ctx.ProcessWithCanvas(c => c.Fill(brush, path))`
`ctx.Draw(pen, path)`	`ctx.ProcessWithCanvas(c => c.Draw(pen, path))`
`ctx.DrawLine(pen, points)`	`ctx.ProcessWithCanvas(c => c.DrawLine(pen, points))`
`ctx.DrawPolygon(pen, points)`	`ctx.ProcessWithCanvas(c => c.Draw(pen, new Polygon(new LinearLineSegment(points))))`
`ctx.FillPolygon(brush, points)`	`ctx.ProcessWithCanvas(c => c.Fill(brush, new Polygon(new LinearLineSegment(points))))`
`ctx.DrawText(text, font, color, origin)`	`ctx.ProcessWithCanvas(c => c.DrawText(new RichTextOptions(font) { Origin = origin }, text, Brushes.Solid(color), null))`
`ctx.DrawImage(overlay, opacity)`	`ctx.ProcessWithCanvas(c => c.DrawImage(overlay, sourceRect, destRect))`
Multiple independent draw calls	Single `ProcessWithCanvas` block — commands are batched and flushed together

Other breaking changes in this PR

AntialiasSubpixelDepth removed — The rasterizer now uses a fixed 256-step (8-bit) subpixel depth. The old AntialiasSubpixelDepth property (default: 16) controlled how many vertical subpixel steps the rasterizer used per pixel row. The new fixed-point scanline rasterizer integrates area/cover analytically per cell rather than sampling at discrete subpixel rows, so the "depth" is a property of the coordinate precision (24.8 fixed-point), not a tunable sample count. 256 steps gives ~0.4% coverage granularity — more than sufficient for all practical use cases. The old default of 16 (~6.25% granularity) could produce visible banding on gentle slopes.
GraphicsOptions.Antialias — now controls RasterizationMode (antialiased vs aliased). When false, coverage is snapped to binary using AntialiasThreshold.
GraphicsOptions.AntialiasThreshold — new property (0–1, default 0.5) controlling the coverage cutoff in aliased mode. Pixels with coverage at or above this value become fully opaque; pixels below are discarded.

Benchmarks

All benchmarks run under the following environment.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.26200
Unknown processor
.NET SDK=10.0.103
  [Host] : .NET 8.0.24 (8.0.2426.7010), X64 RyuJIT

Toolchain=InProcessEmitToolchain  InvocationCount=1  IterationCount=40
LaunchCount=3  UnrollFactor=1  WarmupCount=40

DrawPolygonAll - Renders a 7200x4800px path of the state of Mississippi with a 2px stroke.

Method	Mean	Error	StdDev	Median	Ratio	RatioSD
SkiaSharp	42.20 ms	2.197 ms	6.976 ms	38.18 ms	1.00	0.00
SystemDrawing	44.10 ms	0.172 ms	0.538 ms	44.05 ms	1.07	0.16
ImageSharp	12.09 ms	0.083 ms	0.269 ms	12.06 ms	0.29	0.05
ImageSharpWebGPU	12.47 ms	0.291 ms	0.940 ms	12.71 ms	0.30	0.05

FillParis - Renders 1096x1060px scene containing 50K fill paths.

Method	Mean	Error	StdDev	Ratio	RatioSD
SkiaSharp	104.46 ms	0.356 ms	1.145 ms	1.00	0.00
SystemDrawing	148.53 ms	0.327 ms	1.033 ms	1.42	0.02
ImageSharp	66.32 ms	0.999 ms	3.083 ms	0.64	0.03
ImageSharpWebGPU	41.95 ms	0.457 ms	1.368 ms	0.40	0.01

B

JimBobSquarePants · 2026-04-14T13:06:27Z

API

I find the ProcessWithCanvas method oddly verbose and hard to discover. Since the user actually wants to draw things here, how about naming the method Draw? Other simple names may also do the job.
image.Mutate(ctx => ctx.Draw(canvas =>
{
    // Fill a path
    canvas.Fill(Brushes.Solid(Color.Red), new EllipsePolygon(200, 200, 100));

    // Stroke a path
    canvas.Draw(Pens.Solid(Color.Blue, 3), new RectangularPolygon(50, 50, 200, 100));

    // Draw a polyline
    canvas.DrawLine(Pens.Solid(Color.Green, 2), new PointF(0, 0), new PointF(100, 100));
}));
Also, the image.Mutate(ctx => ctx.ProcessWithCanvas(canvas => line together with the closing parantheses is somewhat of an ergonomy killer. I think most of the cases, Drawing users want to only draw things in their Mutate calls without chaining in further processing calls. It would be nice to provide helpers to avoid using double-delegates, for example .MutateDraw() would look like this:
image.MutateDraw(canvas =>
{
    // Fill a path
    canvas.Fill(Brushes.Solid(Color.Red), new EllipsePolygon(200, 200, 100));

    // Stroke a path
    canvas.Draw(Pens.Solid(Color.Blue, 3), new RectangularPolygon(50, 50, 200, 100));

    // Draw a polyline
    canvas.DrawLine(Pens.Solid(Color.Green, 2), new PointF(0, 0), new PointF(100, 100));
});

I agree that ProcessWithCanvas was too implementation-shaped for the public API. I renamed the entry point to Paint, which fits better with the rest of the ImageSharp processing surface (Crop, Resize, GaussianBlur, etc.) while still covering the full canvas model rather than only stroke-style drawing.

I did not go with Draw because the callback can also Fill, Clear, Clip, draw text, and perform other canvas operations, and Draw already has a specific meaning on IDrawingCanvas itself.

I also considered your MutateDraw-style convenience API, but I do not think it works well as the primary shape here because it pulls drawing out of the normal IImageProcessingContext pipeline and makes composition with further chained operations more awkward. Keeping this as a context extension preserves the usual processing flow.

I’ve also updated the IntelliSense docs for the renamed extensions/processors and the related markdown/readme content to reflect the new API.

JimBobSquarePants · 2026-04-14T13:50:30Z

Parallel performance

In a6de45e I pushed an equivalent of SixLabors/ImageSharp#3111: extending benchmarks with a single-threaded mode and a ManualBenchmarks project which is hosting a throughput benchmark implemented in DrawingThroughputBenchmark.cs.

I ran these on my AMD Ryzen 7 5800H with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores.

Single vs multi-threaded runs of FillTiger
|                    Method | Dimensions |        Mean |     Error |      StdDev |      Median | Ratio | RatioSD |
|-------------------------- |----------- |------------:|----------:|------------:|------------:|------:|--------:|
|                 SkiaSharp |        100 |    858.7 us |   3.06 us |     9.37 us |    856.0 us |  1.00 |    0.00 |
|             SystemDrawing |        100 |  2,075.6 us |  49.87 us |   157.60 us |  2,036.1 us |  2.43 |    0.18 |
|                ImageSharp |        100 |  4,650.1 us | 115.22 us |   369.13 us |  4,729.9 us |  5.46 |    0.43 |
| ImageSharp_SingleThreaded |        100 |  7,618.4 us | 983.83 us | 3,151.81 us |  6,174.6 us |  9.29 |    3.59 |
|          ImageSharpWebGPU |        100 |  2,425.6 us | 118.34 us |   356.26 us |  2,397.2 us |  2.83 |    0.42 |
|                           |            |             |           |             |             |       |         |
|                 SkiaSharp |       1000 |  7,399.4 us | 101.66 us |   321.24 us |  7,339.4 us |  1.00 |    0.00 |
|             SystemDrawing |       1000 | 17,367.9 us | 215.58 us |   681.25 us | 17,131.4 us |  2.35 |    0.15 |
|                ImageSharp |       1000 |  6,790.0 us | 768.78 us | 2,484.95 us |  6,386.2 us |  0.92 |    0.36 |
| ImageSharp_SingleThreaded |       1000 | 15,816.6 us | 191.37 us |   610.30 us | 15,862.8 us |  2.14 |    0.13 |
|          ImageSharpWebGPU |       1000 |  4,113.7 us | 285.05 us |   900.79 us |  3,796.4 us |  0.56 |    0.13 |
The algorithm does scale with threads, but (according to this benchmark) it performs worse for small images than competitors, unlike in the large image case.

DrawingThroughputBenchmark runs

I ran Tiger rendering for 10 seconds with

large (2000x2000) and small (200x200) images

sequential (-p 1) and parallel (-p 16) execution

I averaged out MegaPixelsPerSec values of 10 runs for each configuration.

Size/Mode Sequential (MP/s) Parallel (MP/s)
Large 540 446
Small 67 55
Parallelization results in 15-20% reduction in throughput. Moreover, small images result in significantly worse MP/s values, which is unexpected. For image processors in the core library MP/s is typically higher on small images because of better memory locality.

I think it is worth to investigate these issues.

Thanks @antonfirsov for adding the benchmarks, they're VERY useful!

I dug into this a bit further and reran the throughput benchmark with the two concurrency dimensions separated:

internal drawing parallelism: -p
outer request concurrency: -c

That changes the picture quite a lot.

When I hold -c 1 and vary only -p, internal parallelism is clearly helping rather than hurting:

Large (2000x2000): p=1 -> 160 MP/s, p=8 -> 557 MP/s, p=16 -> 828 MP/s
Small (200x200): p=1 -> 18.6 MP/s, p=8 -> 56.1 MP/s, p=16 -> 56.5 MP/s

So I do not think the previous result supports the conclusion that our internal parallel execution is slower than serial.

For the small-image case, I also would not expect MP/s to behave like a typical core pixel processor. Tiger rendering has a substantial amount of per-scene work that does not scale with image area in the same way:

iterating the SVG elements
preparing paths and strokes
recording commands
batching and flush setup
brush/pen handling
thread coordination

That work still exists even when the target is small, so lower MP/s on small images is not by itself evidence of a regression. In absolute terms the small images are still much faster; they just amortize the fixed scene cost less effectively.

The earlier throughput result appears to have been confounded by mixing inner parallelism with outer request concurrency. If -p was varied while -c was left at its default, then the "parallel" case was not "one request, parallel" but rather "parallel work inside each request on top of already-high request concurrency", which is a different workload and can absolutely reduce overall MP/s through oversubscription.

I also tested the service-throughput shape explicitly on my machine:

c=8, p=1 -> 795 MP/s
c=4, p=2 -> 854 MP/s
c=2, p=4 -> 657 MP/s
c=1, p=8 -> 543 MP/s

That suggests the best throughput here comes from a balanced split between outer concurrency and inner parallelism, not from maximizing either one in isolation.

On the 8 vs 16 question: I agree that 8 is the sensible first comparison point on an 8-physical-core machine, but in this workload 16 did materially outperform 8 for a single large request, so simultaneous multithreading is helping here rather than hurting. For the small case, 16 buys almost nothing over 8, so the benefit is workload dependent.

Taken together, I think these results support keeping the current defaults for the ordinary non-concurrent case. For concurrent hosts, the optimal split between per-request parallelism and request-level concurrency is workload-dependent, so I think the right answer is exactly what we have now: sensible defaults, with MaxDegreeOfParallelism available for callers that need to tune for their own throughput profile.

drawing-throughput-benchmark-results.md

antonfirsov · 2026-04-15T00:34:57Z

I renamed the entry point to Paint

Paint sounds awesome!

When I hold -c 1 and vary only -p, internal parallelism is clearly helping rather than hurting [...] So I do not think the previous result supports the conclusion that our internal parallel execution is slower than serial.

Single-user "slowness" wasn't the conclusion I wanted to imply. The data proves that given high concurrency, parallelism hurts throughput, which is something I've seen to happen only with Convolution2DProcessor in the core library. The reason this matters is because AFAIK server-side processing is still the primary use case of the CPU renderer, and server-side processing usually needs to scale well with concurrency. IMO good library design means delivering good defaults for the primary use case, i.e. not asking the user in the documentation to tune MaxDegreeOfParallelism for good throughput. Given you don't want to abandon parallelism, my suggestion is to root cause and fix this before the release.

I will try helping by running some profiling.

antonfirsov · 2026-04-15T00:49:03Z

For the small-image case, I also would not expect MP/s to behave like a typical core pixel processor. Tiger rendering has a substantial amount of per-scene work that does not scale with image area in the same way:
[..]

preparing paths and strokes

Is there any more information that could be cached when the same path/stroke is being used?

[..]

thread coordination

In the core library, threading overhead is being reduced or sometimes eliminated for small images via. MinimumPixelsProcessedPerTask. Is there a way to introduce a similar mechanism?

JimBobSquarePants · 2026-04-15T03:57:58Z

For the small-image case, I also would not expect MP/s to behave like a typical core pixel processor. Tiger rendering has a substantial amount of per-scene work that does not scale with image area in the same way:
[..]

preparing paths and strokes

Is there any more information that could be cached when the same path/stroke is being used?

[..]

thread coordination

In the core library, threading overhead is being reduced or sometimes eliminated for small images via. MinimumPixelsProcessedPerTask. Is there a way to introduce a similar mechanism?

We memoise the result of IPath.ToLinearGeometry() when the transform is identity but other than that there's nothing else, we can cache.

I just ran some experiments using MinimumPixelsProcessedPerTask to determine partition counts during scene execution which is the only place it made sense. It led to a noticable degrade in performance.

Using

Scenario	Size	Width	Height	Concurrent Requests	Drawing Parallelism	Total Seconds	MegaPixelsPerSec
SingleImage	Large	2000	2000	1	1	10.03	129.65
SingleImage	Large	2000	2000	1	8	10.00	420.67
SingleImage	Large	2000	2000	1	16	10.00	625.07
SingleImage	Small	200	200	1	1	10.00	15.09
SingleImage	Small	200	200	1	8	10.00	36.57
SingleImage	Small	200	200	1	16	10.00	42.25
ServiceThroughput	Large	2000	2000	8	1	10.03	710.41
ServiceThroughput	Large	2000	2000	4	2	10.01	718.99
ServiceThroughput	Large	2000	2000	2	4	10.01	536.53
ServiceThroughput	Large	2000	2000	1	8	10.01	420.91

Not using

Scenario	Size	Width	Height	Concurrent Requests	Drawing Parallelism	Total Seconds	MegaPixelsPerSec
SingleImage	Large	2000	2000	1	1	10.01	159.47
SingleImage	Large	2000	2000	1	8	10.00	557.43
SingleImage	Large	2000	2000	1	16	10.00	803.04
SingleImage	Small	200	200	1	1	10.00	18.75
SingleImage	Small	200	200	1	8	10.00	65.62
SingleImage	Small	200	200	1	16	10.00	68.08
ServiceThroughput	Large	2000	2000	8	1	10.03	782.69
ServiceThroughput	Large	2000	2000	4	2	10.01	819.33
ServiceThroughput	Large	2000	2000	2	4	10.01	662.20
ServiceThroughput	Large	2000	2000	1	8	10.00	590.34

JimBobSquarePants · 2026-04-15T05:47:50Z

I renamed the entry point to Paint

Paint sounds awesome!

When I hold -c 1 and vary only -p, internal parallelism is clearly helping rather than hurting [...] So I do not think the previous result supports the conclusion that our internal parallel execution is slower than serial.

Single-user "slowness" wasn't the conclusion I wanted to imply. The data proves that given high concurrency, parallelism hurts throughput, which is something I've seen to happen only with Convolution2DProcessor in the core library. The reason this matters is because AFAIK server-side processing is still the primary use case of the CPU renderer, and server-side processing usually needs to scale well with concurrency. IMO good library design means delivering good defaults for the primary use case, i.e. not asking the user in the documentation to tune MaxDegreeOfParallelism for good throughput. Given you don't want to abandon parallelism, my suggestion is to root cause and fix this before the release.

I will try helping by running some profiling.

I agree that high-concurrency service throughput is worth understanding, but I don't think it should automatically define the default optimization target for ImageSharp.Drawing.

I don't see this library primarily as a web-server drawing component in the same way some ImageSharp image-processing workloads are. The scenarios I have in mind are things like CAD-style rendering, charts and graphs, UI generation, tooling, and other programmatic rendering workloads where single-render performance and overall rendering capability matter more than maximizing throughput under heavily concurrent request load.

That makes the current single-request results directly relevant, and it also means I'm comfortable with MaxDegreeOfParallelism remaining configurable for hosts that do need to tune for saturated server throughput. The data you've gathered is exactly the tuning guidance those hosts need, and I'd rather surface that in the docs than bake in a default that trades away single-render performance for a server shape that isn't the primary use case.

I did explore whether the library could self-throttle under concurrent load without user tuning. The honest answer is that any in-library mechanism needs either a shared scheduler across requests or a runtime pool-pressure signal, neither of which exists in a form we can consume without significant upstream changes. The Parallel.For + MaxDegreeOfParallelism model has the same property everywhere it's used in the ecosystem, and I don't think ImageSharp.Drawing is the right place to invent a workaround for it.

antonfirsov · 2026-04-15T17:23:14Z

CPU rasterizer parallel performance

The scenarios I have in mind are things like CAD-style rendering, charts and graphs, UI generation, tooling, and other programmatic rendering workloads where single-render performance and overall rendering capability matter more than maximizing throughput under heavily concurrent request load.

Scenarios that involve immediate, on-screen rendering are better addressed by the WebGPU renderer. Whatever is the current primary use-case of ImageSharp.Drawing V2, will be the primary use case of the CPU renderer going forward (which I assume is server, but I may be wrong).

That said, I just ran the same benchmarks with V2 rasterizer, and this PR is significantly improving every aspect perf-wise (🚀), regardless if parallelism is used or not, so I'm no longer pursuing to figure this out in the PR.

I did explore whether the library could self-throttle under concurrent load without user tuning.

I don't know if self-throttling is the only possible answer here, I would recommend to open a tracking issue once this is merged.

I'll move on reviewing other aspects now.

antonfirsov

Plenty of notes although the review is still incomplete. Most importantly, there seems to be a bug in the GPU renderer, see the comment on the lines demo.

antonfirsov · 2026-04-19T00:55:50Z

+        texture = flushContext.Api.DeviceCreateTexture(flushContext.Device, in textureDescriptor);
+        if (texture is null)
+        {
+            error = "Failed to create WebGPU composition texture.";


Nit: passing around these error strings is odd, it would be cleaner to emit these messages to some sort of IErrorLogger.

antonfirsov · 2026-04-19T00:58:51Z

+        toolbar.Controls.Add(this.CreateRunButton("1k", 1_000));
+        toolbar.Controls.Add(this.CreateRunButton("10k", 10_000));
+        toolbar.Controls.Add(this.CreateRunButton("100k", 100_000));
+        toolbar.Controls.Add(this.CreateRunButton("200k", 200_000));


I actually wanted to make this 1M, but that makes the implementation hit buffer limits:

WebGPU (Failed: The staged-scene path tiles buffer requires 141001808 bytes, exceeding the current WebGPU binding limit of 134217728 bytes.)

Definitely not a show stopper, but some users may be limited by this, so it's worth noting down.

It might be possible to work around this. There are hard limits to what you can send though.

Workaround is good, but in long term I would view it as an optimization problem. The buffers are likely heavier than needed.

antonfirsov · 2026-04-19T01:59:37Z

+            canvas.Flush();
+        }
+
+        stopwatch.Stop();


We are closing down the perf gap to Skia, but it's still 6-7x on my machine. This is acceptable but worth noting down. According to the profile of rendering 200k lines, it's the buffer mapping that seems to take a lot of time:

Yeah, at 200K I'm seeing about 4x. That's annoying. I'll see what I can do.

antonfirsov · 2026-04-19T02:31:54Z

+/// Use <see cref="Run(Action{DrawingCanvas{TPixel}})"/> when you want the window to drive rendering for you, or <see cref="TryAcquireFrame(TimeSpan, out WebGPUWindowFrame{TPixel}?)"/> when you need to drive the frame loop yourself.
+/// </summary>
+/// <typeparam name="TPixel">The canvas pixel format.</typeparam>
+public sealed unsafe class WebGPUWindow<TPixel> : IDisposable


This type has limited usability since it needs to create a brand-new popup window and doesn't allow rendering to a control of hosted by popular UI frameworks. I hardly see anyone using it for serious things except gamedevs who want to experiment with our API.

In #377 (review) I suggested exploring ways to create a canvas around an existing window handle that could be an entry point for WinForms/WPF/WinUI integration (maybe even Avalonia or MAUI). IMO such a feature would radically help adaption.

I can explore this, but I can't promise to do it fast.

WebGPUWindow<TPixel> is a managed presentation surface intended for tools where we own the window. The embedding entry point you're looking for is WebGPUDeviceContext<TPixel>:

WebGPUDeviceContext(nint deviceHandle, nint queueHandle) wraps externally-owned device/queue handles without taking ownership.

CreateCanvas(nint textureHandle, nint textureViewHandle, WebGPUTextureFormatId format, int width, int height, DrawingOptions options) wraps an externally-owned texture/view into a DrawingCanvas<TPixel> for one frame.

The integration shape is the same on every platform:

The host owns the WebGPU surface, swap-chain, device, and queue.

The render loop acquires the current texture and view from the surface.

The four handles go into WebGPUDeviceContext.CreateCanvas(...), draw, dispose, present through the host's loop.

So the primitive for "render to an existing window/control" is already there. Per-framework wrappers (WinForms control, WPF host, Avalonia control, etc.) that hide steps 1 and 2 would just need framework-specific swap-chain plumbing utilizing that type.

wraps externally-owned device/queue handles

It is quite a ceremony to set-up those. I'm suggesting an integration point where all the user has is a window handle.

Even if we don't expose additional API-s, it would be still very important to make a demo to validate this works E2E and to provide a how-to to users, since IMO this is the No. 1️⃣ use-case for the WebGPU renderer. Once we have the demo we may see if it's worth to move something to the library.

I originally wanted to look into this, thus the current review over the WebGPU bits. However, I also want to look into a replacement for SixLabors/ImageSharp#3056 (see last comment) which is not an easy job no promise I can get it done fast.

So let me know where should I direct my attention.

So let me know where should I direct my attention.

Here for now reviewing please. I really need to get this in through please have a look at my replacement for the tracking PR though as I think it's far better.

I think I might have the hosted window pattern cracked. Will post an update and sample soon.

WebGPUHostedWindow<TPixel> is in and I've added some WinForms demos using it in a custom control. It's dead easy to use.

antonfirsov

(Mostly) high-level notes on IDrawingCanvas & DrawingCanvas<TPixel>.

antonfirsov · 2026-04-20T20:56:49Z

+        Image<TPixel> convertedImage = image.CloneAs<TPixel>();
+        this.DrawImageCore(convertedImage, sourceRect, destinationRect, sampler, ownsSourceImage: true);


Any reason to pass around ownsSourceImage instead of using Image<TPixel> when it's temporary?

DrawImageCore sometimes transfers ownership of the source image to the deferred batch (this.pendingImageResources) instead of disposing it inline. The batch's ImageBrush<TPixel> needs it to live until flush. A using at the caller would dispose it too early.

The three paths inside:

No scaling/cropping/transforming: source goes directly into the brush; batch owns it until flush.

An intermediate is created (scale/crop/transform); source is disposed early.

Intermediate replaces the source and the source was caller-owned; source disposed early, intermediate handed to the batch.

using would force an extra clone on path 1 (dispose our temp, re-clone for the batch). The ownsSourceImage flag is the compact way to express dynamic ownership transfer.

Happy to rename it (callerOwnsSource / sourceIsTemporary) if the name's the issue, but the shape needs to stay.

antonfirsov · 2026-04-21T01:40:29Z

+    public int SaveLayer()
+        => this.SaveLayer(new GraphicsOptions());
+
+    /// <inheritdoc />
+    public int SaveLayer(GraphicsOptions layerOptions)
+        => this.SaveLayer(layerOptions, this.Bounds);


Would it make sense to promote all delegating implementations to default implementations on the IDrawingCanvas interface?

Personally, I find that pattern makes discovery more difficult.

Is that because DocFX doesn't generate documentation for inherited methods in the implementing type?

My new comments on DrawingCanvas and factory/extension utilities are related.

I mean just for navigating code. I think all the DoxFX inheritance stuff is fixed.

antonfirsov · 2026-04-21T01:56:13Z

+    /// Use <see cref="MeasureTextBounds"/> for glyph ink bounds or
+    /// <see cref="MeasureTextRenderableBounds"/> for the union of logical advance and rendered bounds.
+    /// </remarks>
+    public RectangleF MeasureTextAdvance(RichTextOptions textOptions, ReadOnlySpan<char> text);


I'm a full font-shaping layman, but I noticed that the text measurement API-s differ a lot from their CanvasRenderingContext2D counterpart which has a single overload returning a TextMetrics instance.

Naiively it seems like they provide different features. Will the overloads be documented and explained via examples? Would it make sense to attempt some consolidation to make the API more understandable, and in - some cases - help avoiding double work? E.g. maybe a user wants to measure wants to measure bounds and count lines, and this might be cheaper to do in a single measurement run than in two consecutive calls.

I'll need to do some upstream work in Fonts but I can actually add a TextMetrics type to the TextMeasurer there. The canvas can use that method and people can use individual methods directly via TextMeasurer

Done. See MeasureText. This returns a method optimized to do a single loop. I've also optimized individual TextMeasurer methods to remove some allocations.

antonfirsov · 2026-04-21T02:24:28Z

+
+        // Defensive guard: built-in backends should provide either direct readback (CPU/backed surface)
+        // or shadow fallback. If this fails, it indicates a backend implementation bug or an unsupported custom backend.
+        if (!this.TryCreateProcessSourceImage(sourceRect, out Image<TPixel>? sourceImage))


Apply methods are very expensive on GPU backends in comparison to GPU accelerated drawing operations. Since the IImageProcessingContext interface doesn't make it obvious there is an Image<T> backed by CPU memory, this can be even surprising. Is there any reasonable scenario where user would be willing to pay the high price? If not, I would say it makes sense to always throw a NotSupportedException on GPU.

The cost is more bounded than it looks. Apply on GPU scissors to the path's AABB, not the full frame:

Flush queued GPU work so the read sees the correct backdrop.

backend.TryReadRegion(...) reads only the path's bounding rectangle.

Image<TPixel>.Mutate(operation) runs the processor on that sub-image.

Result is wrapped as an ImageBrush and composited via the normal GPU fill path.

A 100×100 blur on a 1920×1080 canvas is 10k pixels readback, not 2M. The real cost is the processor itself, which is what the user asked for.

This is the bridge to the entire ImageSharp.Processing surface: blur, convolutions, color correction, quantization, dither, pixelate, resize, tone curves, edge detection, user-authored IImageProcessors. None exist as GPU-native primitives and most never would/could.

If we throw on GPU, the user wanting a blurred backdrop has to read the region back manually (usually a larger rect than the AABB), run the processor, build an ImageBrush, draw it. Same four steps, moved into every caller, with worse scissoring.

The cost is more bounded than it looks

Have you benchmarked drawing a scene with Apply involved + same without apply?

A 100×100 blur on a 1920×1080 canvas is 10k pixels readback, not 2M.

I expect the readback to imply a significant synchronization cost regardless of the amount of data. Note that it is implemented via sync-over-async to keep the API synchronous:

ImageSharp.Drawing/src/ImageSharp.Drawing.WebGPU/WebGPUDrawingBackend.Readback.cs

Lines 182 to 193 in f1ec587

BufferMapAsyncStatus mapStatus = BufferMapAsyncStatus.Unknown;

using ManualResetEventSlim mapReady = new(false);

void Callback(BufferMapAsyncStatus status, void* userData)

{

_ = userData;

mapStatus = status;

mapReady.Set();

}

using PfnBufferMapCallback callback = PfnBufferMapCallback.From(Callback);

api.BufferMapAsync(readbackBuffer, MapMode.Read, 0, (nuint)readbackByteCount, callback, null);

if (!WaitForMapSignal(wgpuExtension, device, mapReady) || mapStatus != BufferMapAsyncStatus.Success)

the processor itself, which is what the user asked for.

For a newcomer who wants to use the WebGPU pipeline and didn't use ImageSharp before, it might not be obvious from the IImageProcessingContext API shape that processors are running on the CPU.

antonfirsov

A couple of random observations and follow-ups to previous discussions. The GH review UI is weird and shows comments added during my review to previous discussions two times: in those discussions and here in form of "new" comments.

Will move on to WebGPUHostedWindow<T> tomorrow.

antonfirsov · 2026-04-23T17:14:13Z

+            LineJoin = options.LineJoin switch
+            {
+                LineJoin.MiterRound => PolygonClipper.LineJoin.MiterRound,
+                LineJoin.Bevel => PolygonClipper.LineJoin.Bevel,
+                LineJoin.Round => PolygonClipper.LineJoin.Round,
+                LineJoin.MiterRevert => PolygonClipper.LineJoin.MiterRevert,
+                _ => PolygonClipper.LineJoin.Miter,
+            },
+
+            InnerJoin = options.InnerJoin switch
+            {
+                InnerJoin.Round => PolygonClipper.InnerJoin.Round,
+                InnerJoin.Miter => PolygonClipper.InnerJoin.Miter,
+                InnerJoin.Jag => PolygonClipper.InnerJoin.Jag,
+                _ => PolygonClipper.InnerJoin.Bevel,
+            },


Couple of notes:

InnerJoin.Bevel is not propagated here, making it a dead public API. Would it be possible to do an extensive search with an LLM to find all dead API-s and take action on the findings (fix or deletion)?

InnerJoin options Jag and Round seem to be untested.

Unlike LineCap, InnerJoin and LineJoin are unused in the WebGPU backend. Assuming it has it's own outlining logic and doesn't use PolygonClipper, this looks like a bug. If we don't want to implement them yet, the feature gap should be documented somewhere.

antonfirsov · 2026-04-23T17:23:45Z

+namespace SixLabors.ImageSharp.Drawing.Processing.Backends;
+
+/// <summary>
+/// Internal WebGPU texture upload, readback, and release helpers used by tests and benchmarks.


Except for trivially simle and light test-only internal APIs, it is better to place test/benchmark code to ImageSharp.Tests/TestUtilities to avoid shipping dead IL code (from users POV) -- it makes dll-s heavier.

Unless you are 100% sure this is the only instance doing this, this is another aspect to consider for an extensive search + cleanup.

antonfirsov · 2026-04-23T21:30:32Z

+
+        // Defensive guard: built-in backends should provide either direct readback (CPU/backed surface)
+        // or shadow fallback. If this fails, it indicates a backend implementation bug or an unsupported custom backend.
+        if (!this.TryCreateProcessSourceImage(sourceRect, out Image<TPixel>? sourceImage))


The cost is more bounded than it looks

Have you benchmarked drawing a scene with Apply involved + same without apply?

A 100×100 blur on a 1920×1080 canvas is 10k pixels readback, not 2M.

I expect the readback to imply a significant synchronization cost regardless of the amount of data. Note that it is implemented via sync-over-async to keep the API synchronous:

ImageSharp.Drawing/src/ImageSharp.Drawing.WebGPU/WebGPUDrawingBackend.Readback.cs

Lines 182 to 193 in f1ec587

BufferMapAsyncStatus mapStatus = BufferMapAsyncStatus.Unknown;

using ManualResetEventSlim mapReady = new(false);

void Callback(BufferMapAsyncStatus status, void* userData)

{

_ = userData;

mapStatus = status;

mapReady.Set();

}

using PfnBufferMapCallback callback = PfnBufferMapCallback.From(Callback);

api.BufferMapAsync(readbackBuffer, MapMode.Read, 0, (nuint)readbackByteCount, callback, null);

if (!WaitForMapSignal(wgpuExtension, device, mapReady) || mapStatus != BufferMapAsyncStatus.Success)

the processor itself, which is what the user asked for.

For a newcomer who wants to use the WebGPU pipeline and didn't use ImageSharp before, it might not be obvious from the IImageProcessingContext API shape that processors are running on the CPU.

antonfirsov · 2026-04-23T21:34:16Z

+/// A drawing canvas over a frame target.
+/// </summary>
+/// <typeparam name="TPixel">The pixel format.</typeparam>
+public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas


Suggested change

public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas

public sealed class DrawingCanvas<TPixel> : IDrawingCanvas

antonfirsov · 2026-04-23T21:41:05Z

+/// <summary>
+/// Convenience extension methods for creating <see cref="DrawingCanvas{TPixel}"/> instances from ImageSharp image types.
+/// </summary>
+public static class DrawingCanvasExtensions


[Naming] These are not extensions on DrawingCanvas. Practically the type only contains factory methods, so I would consider a name communicating this.

antonfirsov · 2026-04-23T21:43:10Z

+/// A drawing canvas over a frame target.
+/// </summary>
+/// <typeparam name="TPixel">The pixel format.</typeparam>
+public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas


Is there a good reason to publicly expose the DrawingCanvas<T> type instead of just having a factory methods returning IDrawingCanvas instances? (Which are already present in DrawingCanvasExtensions)

antonfirsov · 2026-04-23T21:44:27Z

+    public int SaveLayer()
+        => this.SaveLayer(new GraphicsOptions());
+
+    /// <inheritdoc />
+    public int SaveLayer(GraphicsOptions layerOptions)
+        => this.SaveLayer(layerOptions, this.Bounds);


Is that because DocFX doesn't generate documentation for inherited methods in the implementing type?

My new comments on DrawingCanvas and factory/extension utilities are related.

antonfirsov · 2026-04-23T21:46:50Z

+/// <summary>
+/// Convenience shape helpers that build paths and forward to the core <see cref="IDrawingCanvas"/> fill and draw primitives.
+/// </summary>
+public static class DrawingCanvasShapeExtensions


What decides if a method should go here or to IDrawingCanvas? As pointed out in another comment, DrawingCanvas<T> : IDrawingCanvas has a couple of delegating methods that don't use it's internal state directly.

JimBobSquarePants added 30 commits February 20, 2026 12:38

Begin WebGPU backend

11eabed

We have a prototype!

7ea5866

Refactor WebGPU drawing backend and shaders

08fb876

Support multiple pixel fomats

ac16b61

WebGPU: coverage scratch & dynamic uniform buffers

139cbb4

Introduce ICanvasFrame and native surface support

4d5ce89

Refactor and simplify

4414c88

Refactor GPU composition to instance-based batching

f7a8bc4

B

Refactor WebGPU backend to use FlushContext

1b3b692

Improve WebGPU readback, batching and coverage

079488a

Refactor WebGPU composite pipeline & instance data

c07f685

WebGPU: support instance buffer offsets & sessions

00b4523

Remove Configuration from FillPath; improve batching

d4ff9e1

Use List and pre-size entries in DrawTextOperations

4d807df

Cache path definition keys to avoid flattening

3255a1b

Add glyph cache, layer & path handling

4915396

Add WebGPU brush composers and pipeline infra

bdf15aa

Per-brush instance payloads for WebGPU composite

1d94c73

Move WGPUTextureFormat note to code comment

edc9b8c

Switch compositing to compute shaders

7e7ce5d

Tiled composite compute pass and brush refactor

f659a3a

Introduce CompositionScene and planning API

1eb7d56

Document and refactor WebGPU drawing backend

a0eccdd

Add composition bounds and refactor flush context

4bf2016

Add WebGPU coverage pipeline and WGSL shaders

8f35205

Composite: switch to tile-based dispatch

06a8326

Add GPU tile-count/prefix/scatter passes

bd79179

Replace PreparedComposite with fine tiled pipeline

d2d6143

Implement binning-based composite pipeline

1d41bac

Add docs and refactor WebGPU shaders/backend

00f848e

Process -> Apply

1af0996

Add useful scripts.

64edabf

antonfirsov mentioned this pull request Apr 15, 2026

[v2] Port SVG parsing fix and add SVG benchmarks #388

Closed

4 tasks

antonfirsov reviewed Apr 18, 2026

View reviewed changes

Comment thread samples/DrawShapesWithImageSharp/Program.cs Outdated

antonfirsov added 2 commits April 18, 2026 16:37

restructure solutions and fix DrawShapesWithImageSharp sample

f61551d

make the lines benchmark resizable

8ebefc2

antonfirsov reviewed Apr 19, 2026

View reviewed changes

JimBobSquarePants added 8 commits April 19, 2026 14:45

Refactor WebGPU handles and render targets

e48c02c

Use safe handle refs for WebGPU native calls via using

9637899

Use DrawingOptions in composition and stroke commands

84e2c32

Fix tests

185f7d6

Update WebGPU docs.

acca457

Align chunk tile heights to 16-row bins

529615b

Add more efficient path-row shaders and sparse tile handling

0f450fc

Fix benchmark resize handling

ecc069f

antonfirsov reviewed Apr 21, 2026

View reviewed changes

JimBobSquarePants added 5 commits April 22, 2026 17:31

Add hosted window API and samples + fix bugs/perf

d3c3979

Remove overly defensive guards

3fc9b18

Make CloneForClearOperation internal

d444734

Use IEnumerable for DrawGlyphs

b89a0af

Unify text measurement API to TextMetrics

f1ec587

antonfirsov reviewed Apr 23, 2026

View reviewed changes

		Image<TPixel> convertedImage = image.CloneAs<TPixel>();
		this.DrawImageCore(convertedImage, sourceRect, destinationRect, sampler, ownsSourceImage: true);

	BufferMapAsyncStatus mapStatus = BufferMapAsyncStatus.Unknown;
	using ManualResetEventSlim mapReady = new(false);
	void Callback(BufferMapAsyncStatus status, void* userData)
	{
	_ = userData;
	mapStatus = status;
	mapReady.Set();
	}

	using PfnBufferMapCallback callback = PfnBufferMapCallback.From(Callback);
	api.BufferMapAsync(readbackBuffer, MapMode.Read, 0, (nuint)readbackByteCount, callback, null);
	if (!WaitForMapSignal(wgpuExtension, device, mapReady) \|\| mapStatus != BufferMapAsyncStatus.Success)

	public sealed partial class DrawingCanvas<TPixel> : IDrawingCanvas
	public sealed class DrawingCanvas<TPixel> : IDrawingCanvas

Uh oh!

Conversation

JimBobSquarePants commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Prerequisites

Breaking Changes: DrawingCanvas API

What changed

The new model: DrawingCanvas

Via Image.Mutate() (most common)

Standalone usage (without Image.Mutate)

Canvas state management

IDrawingBackend — bring your own renderer

Migration guide

Other breaking changes in this PR

Benchmarks

Uh oh!

JimBobSquarePants commented Apr 14, 2026

API

Uh oh!

JimBobSquarePants commented Apr 14, 2026

Parallel performance

Single vs multi-threaded runs of FillTiger

DrawingThroughputBenchmark runs

Uh oh!

antonfirsov commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

antonfirsov commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JimBobSquarePants commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JimBobSquarePants commented Apr 15, 2026

Uh oh!

antonfirsov commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CPU rasterizer parallel performance

Uh oh!

Uh oh!

antonfirsov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

antonfirsov Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

antonfirsov Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antonfirsov Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

JimBobSquarePants commented Mar 1, 2026 •

edited

Loading

The new model: `DrawingCanvas`

Via `Image.Mutate()` (most common)

Standalone usage (without `Image.Mutate`)

`IDrawingBackend` — bring your own renderer

Single vs multi-threaded runs of `FillTiger`

antonfirsov commented Apr 15, 2026 •

edited

Loading

antonfirsov commented Apr 15, 2026 •

edited

Loading

JimBobSquarePants commented Apr 15, 2026 •

edited

Loading

antonfirsov commented Apr 15, 2026 •

edited

Loading

antonfirsov Apr 19, 2026 •

edited

Loading

antonfirsov Apr 19, 2026 •

edited

Loading

JimBobSquarePants Apr 19, 2026 •

edited

Loading

antonfirsov Apr 19, 2026 •

edited

Loading

antonfirsov Apr 19, 2026 •

edited

Loading

JimBobSquarePants Apr 20, 2026 •

edited

Loading

antonfirsov Apr 21, 2026 •

edited

Loading

JimBobSquarePants Apr 22, 2026 •

edited

Loading

antonfirsov Apr 21, 2026 •

edited

Loading

antonfirsov Apr 23, 2026 •

edited

Loading

antonfirsov left a comment •

edited

Loading

antonfirsov Apr 23, 2026 •

edited

Loading