Conversation
Pull request overview
Adds an initial DeepL Voice API (v3) integration to the DeepL .NET SDK, providing a session-based WebSocket streaming API for real-time transcription/translation, plus the required option types and models.
Changes:
- Introduces `IVoiceManager`/`IVoiceSession` and a `ClientWebSocket`-based `VoiceSession` implementation (send audio, receive transcript/media/error events, manual reconnect).
- Adds Voice API option types and model DTOs (session info, transcript updates/segments, media chunks, stream errors) plus supporting enums/constants.
- Extends `DeepLClient` with `CreateVoiceSessionAsync` and updates the project/test suite to support and validate the new API surface.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| DeepL/DeepLClient.cs | Implements IVoiceManager.CreateVoiceSessionAsync (session POST + WebSocket connect). |
| DeepL/DeepL.csproj | Adds System.Net.WebSockets.Client reference for netstandard2.0. |
| DeepL/IVoiceManager.cs | New interface for creating Voice API sessions. |
| DeepL/IVoiceSession.cs | New streaming session interface (events + send/end/reconnect). |
| DeepL/VoiceSession.cs | WebSocket session implementation with background receive loop and message dispatch. |
| DeepL/VoiceSessionOptions.cs | Session creation options (formats, languages, glossary, formality, beta TTS knobs). |
| DeepL/SourceMediaContentType.cs | Constants for supported source audio content types. |
| DeepL/VoiceMessageFormat.cs | Enum + API-value mapping for JSON/MessagePack. |
| DeepL/SourceLanguageMode.cs | Enum + API-value mapping for auto vs fixed source language. |
| DeepL/TargetMediaVoice.cs | Enum + API-value mapping for target TTS voice selection. |
| DeepL/Model/VoiceSessionInfo.cs | DTO for session creation response (streaming_url, token, session_id). |
| DeepL/Model/TranscriptSegment.cs | DTO for transcript segment (text). |
| DeepL/Model/TranscriptUpdate.cs | DTO for transcript updates (concluded, tentative, optional language). |
| DeepL/Model/TargetMediaChunk.cs | DTO for target media chunks (base64 packets + metadata; closed beta). |
| DeepL/Model/VoiceStreamError.cs | DTO for server-sent stream errors (code, reason, message). |
| DeepLTests/VoiceSessionTest.cs | Unit/client validation tests for new Voice types and DeepLClient input validation. |
```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
```
`using System.Collections.Generic;` appears unused in this test file. With `TreatWarningsAsErrors` enabled in the test project, this will produce CS8019 and fail the build; please remove the unused using (or use it).

```diff
- using System.Collections.Generic;
  using System.Text.Json;
```
```csharp
/// Controls how the <see cref="SourceLanguage" /> value is used.
/// Defaults to <see cref="DeepL.SourceLanguageMode.Auto" /> if not specified.
/// </summary>
public SourceLanguageMode? SourceLanguageMode { get; set; }
```
The docs state SourceLanguageMode defaults to Auto, but the property is nullable and defaults to null (i.e., not sent). Please either set a default value (Auto) or clarify in docs that null means “use API default (auto)”.
```diff
- public SourceLanguageMode? SourceLanguageMode { get; set; }
+ public SourceLanguageMode? SourceLanguageMode { get; set; } = DeepL.SourceLanguageMode.Auto;
```
```csharp
/// Languages specified here will automatically be added to <see cref="TargetLanguages" /> if not already present.
/// Maximum 5 target media languages per session.
```
TargetMediaLanguages docs claim languages will be automatically added to TargetLanguages if missing, but CreateVoiceSessionAsync currently sends target_languages and target_media_languages independently and does not merge/enforce the combined max of 5. Either implement the merge + validation (recommended), or update the docs to match the current behavior.
```diff
- /// Languages specified here will automatically be added to <see cref="TargetLanguages" /> if not already present.
- /// Maximum 5 target media languages per session.
+ /// This list is sent independently of <see cref="TargetLanguages" /> and is not merged automatically.
+ /// Maximum 5 target media languages per session for this property.
```
```csharp
if (options.TargetLanguages == null || options.TargetLanguages.Length == 0) {
  throw new ArgumentException("At least one target language must be specified");
}

if (options.TargetLanguages.Length > 5) {
  throw new ArgumentException("Maximum 5 target languages per session");
}

var requestData = new Dictionary<string, object> {
  ["source_media_content_type"] = options.SourceMediaContentType,
  ["target_languages"] = options.TargetLanguages
};

if (options.MessageFormat != null) {
  requestData["message_format"] = options.MessageFormat.Value.ToApiValue();
}

if (options.SourceLanguage != null) {
  requestData["source_language"] = options.SourceLanguage;
}

if (options.SourceLanguageMode != null) {
  requestData["source_language_mode"] = options.SourceLanguageMode.Value.ToApiValue();
}

if (options.TargetMediaLanguages != null) {
  requestData["target_media_languages"] = options.TargetMediaLanguages;
}
```
CreateVoiceSessionAsync validates TargetLanguages count, but does not validate TargetMediaLanguages (max 5 per docs) nor enforce that target media languages are included in target_languages. This can lead to avoidable API-side errors; consider merging TargetMediaLanguages into TargetLanguages (deduping) and validating the combined count before sending the request.
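The recommended merge-and-validate step could look roughly like this sketch (assuming both properties are `string[]` and that the combined limit is 5, which should be checked against the Voice API docs; `using System.Linq;` is required):

```csharp
// Sketch of the suggested merge + validation, not the PR's actual code.
var targetLanguages = options.TargetLanguages ?? Array.Empty<string>();
if (options.TargetMediaLanguages != null) {
  if (options.TargetMediaLanguages.Length > 5) {
    throw new ArgumentException("Maximum 5 target media languages per session");
  }
  // Add any target media languages missing from target_languages, deduplicating.
  targetLanguages = targetLanguages
    .Concat(options.TargetMediaLanguages)
    .Distinct(StringComparer.OrdinalIgnoreCase)
    .ToArray();
}
if (targetLanguages.Length == 0) {
  throw new ArgumentException("At least one target language must be specified");
}
if (targetLanguages.Length > 5) {
  throw new ArgumentException("Maximum 5 target languages per session (including target media languages)");
}
requestData["target_languages"] = targetLanguages;
```

Failing fast client-side like this keeps the error local instead of surfacing as an opaque API-side 4xx.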
```csharp
/// <inheritdoc />
public async Task ReconnectAsync(CancellationToken cancellationToken = default) {
  // Stop current receive loop
  _receiveCts.Cancel();
  if (_receiveTask != null) {
    try {
      await _receiveTask.ConfigureAwait(false);
    } catch (OperationCanceledException) {
      // Expected
    }
  }

  // Close existing WebSocket if still open
  if (_webSocket.State == WebSocketState.Open || _webSocket.State == WebSocketState.CloseReceived) {
    try {
      await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Reconnecting", CancellationToken.None)
        .ConfigureAwait(false);
    } catch (WebSocketException) {
      // Ignore close errors during reconnection
    }
  }

  _webSocket.Dispose();

  // Request new token via GET v3/voice/realtime?token=<lastToken>
  var queryParams = new[] { ("token", _lastToken) };
  using var responseMessage = await _httpClient.ApiGetAsync("v3/voice/realtime", cancellationToken, queryParams)
    .ConfigureAwait(false);
  await DeepLHttpClient.CheckStatusCodeAsync(responseMessage).ConfigureAwait(false);
  var sessionInfo = await JsonUtils.DeserializeAsync<VoiceSessionInfo>(responseMessage).ConfigureAwait(false);

  _lastToken = sessionInfo.Token;
  SessionId = sessionInfo.SessionId;

  // Establish new WebSocket connection
  var wsUri = new Uri($"{sessionInfo.StreamingUrl}?token={Uri.EscapeDataString(sessionInfo.Token)}");
  _webSocket = new ClientWebSocket();
  await _webSocket.ConnectAsync(wsUri, cancellationToken).ConfigureAwait(false);
```
VoiceSession mutates and disposes _webSocket inside ReconnectAsync while SendAudioAsync/EndAudioAsync and ReceiveLoopAsync use _webSocket without synchronization. This can race (send on a disposed/old socket, or receive loop reading from a replaced socket). Use a lock/SemaphoreSlim to serialize reconnect/send/dispose, or capture the current socket into a local variable under lock and only dispose/swap once no other operations are using it.
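One possible shape for that serialization, sketched with a `SemaphoreSlim` (the method signatures here are illustrative, not necessarily the PR's actual ones):

```csharp
private readonly SemaphoreSlim _socketLock = new SemaphoreSlim(1, 1);

// Illustrative: every operation that touches _webSocket takes the lock first,
// so a send can never observe a socket that reconnect is mid-way through replacing.
public async Task SendAudioAsync(byte[] audio, CancellationToken cancellationToken = default) {
  await _socketLock.WaitAsync(cancellationToken).ConfigureAwait(false);
  try {
    await _webSocket.SendAsync(
        new ArraySegment<byte>(audio), WebSocketMessageType.Binary,
        endOfMessage: true, cancellationToken)
      .ConfigureAwait(false);
  } finally {
    _socketLock.Release();
  }
}

public async Task ReconnectAsync(CancellationToken cancellationToken = default) {
  await _socketLock.WaitAsync(cancellationToken).ConfigureAwait(false);
  try {
    // Stop the receive loop, close/dispose the old socket, fetch a new token,
    // and connect the replacement socket (as in the body above) while holding
    // the lock, so no concurrent send hits the half-torn-down socket.
  } finally {
    _socketLock.Release();
  }
}
```

The receive loop should similarly capture the socket it was started with rather than re-reading the field, so a swap during reconnect cannot redirect an in-flight `ReceiveAsync`.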
```csharp
} catch (OperationCanceledException) {
  // Normal cancellation
} catch (WebSocketException) {
  // Connection lost — consumer should call ReconnectAsync
```
ReceiveLoopAsync only catches OperationCanceledException and WebSocketException. Any other exception (e.g., ObjectDisposedException during shutdown, unexpected parsing failures) will fault the background task; ReconnectAsync awaits _receiveTask and will then throw, preventing reconnection. Catch a broader Exception here and either surface it via an event/callback or ensure it doesn’t fault the task.
```diff
  // Connection lost — consumer should call ReconnectAsync
+ } catch (Exception) {
+   // Unexpected exception — swallow to prevent the background task from faulting
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
DeepL Voice API Integration
Real-time speech transcription and translation support for the DeepL .NET SDK, implementing the DeepL Voice API.
Overview
The Voice API uses a two-step flow:

1. POST to `v3/voice/realtime` to obtain a WebSocket URL and ephemeral token
2. Connect to the returned `streaming_url` over WebSocket, authenticating with the token

This implementation exposes the flow through `DeepLClient.CreateVoiceSessionAsync()`, returning an `IVoiceSession` that wraps the WebSocket lifecycle with an event-driven API.

Usage
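A minimal call pattern, based on the API surface described in this PR (event and method names come from `IVoiceSession`; the option values, handler signatures, and `audioChunk` source are assumptions):

```csharp
using var client = new DeepLClient(authKey);

var session = await client.CreateVoiceSessionAsync(new VoiceSessionOptions {
  SourceMediaContentType = "audio/auto",  // assumed literal; see SourceMediaContentType constants
  TargetLanguages = new[] { "de" },
});

// Subscribe before sending audio so no early events are missed.
session.SourceTranscriptUpdated += (sender, update) => Console.WriteLine($"source: {update}");
session.TargetTranscriptUpdated += (sender, update) => Console.WriteLine($"target: {update}");
session.ErrorReceived += (sender, error) => Console.Error.WriteLine($"stream error: {error}");

await session.SendAudioAsync(audioChunk);  // audioChunk: raw audio bytes
await session.EndAudioAsync();             // signal end of the audio stream
```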
Reconnection
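Reconnection is manual; a consumer might wire it up roughly like this (the handler shape is an assumption):

```csharp
// On connection loss the receive loop stops; the consumer decides when to
// re-establish the stream on the same session object.
session.StreamEnded += async (sender, args) => {
  try {
    await session.ReconnectAsync();
  } catch (Exception ex) {
    Console.Error.WriteLine($"reconnect failed: {ex.Message}");
  }
};
```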
New Files
Enums & Constants
- `DeepL/SourceMediaContentType.cs`: constants for supported source audio content types (`audio/auto`, `audio/ogg;codecs=opus`, PCM variants, etc.)
- `DeepL/VoiceMessageFormat.cs`: `Json`/`MessagePack` for WebSocket message encoding
- `DeepL/SourceLanguageMode.cs`: `Auto`/`Fixed` for source language handling
- `DeepL/TargetMediaVoice.cs`: `Male`/`Female` for synthesized speech voice (closed beta)

Models (`DeepL/Model/`)
- `VoiceSessionInfo.cs`: `StreamingUrl`, `Token`, `SessionId`
- `TranscriptSegment.cs`: `Text` property
- `TranscriptUpdate.cs`: `Concluded[]`, `Tentative[]`, optional `Language`
- `TargetMediaChunk.cs`: `ContentType`, `Headers`, `Data[]`, `Text`, `Language`, `Duration` (closed beta)
- `VoiceStreamError.cs`: `Code`, `Reason`, `Message`

Options & Interfaces
- `DeepL/VoiceSessionOptions.cs`: session creation options (formats, languages, glossary, formality, beta TTS knobs)
- `DeepL/IVoiceSession.cs`: events (`SourceTranscriptUpdated`, `TargetTranscriptUpdated`, `TargetMediaChunkReceived`, `ErrorReceived`, `StreamEnded`) + methods (`SendAudioAsync`, `EndAudioAsync`, `ReconnectAsync`)
- `DeepL/IVoiceManager.cs`: `CreateVoiceSessionAsync(VoiceSessionOptions)`

Core Implementation
- `DeepL/VoiceSession.cs`: `ClientWebSocket`-based session with background receive loop, JSON message dispatch, and reconnection support

Tests
- `DeepLTests/VoiceSessionTest.cs`: unit/client validation tests for the new Voice types and `DeepLClient` input validation

Modified Files
- `DeepL/DeepLClient.cs`: added `IVoiceManager` to the class declaration; implemented `CreateVoiceSessionAsync` (POST JSON to `v3/voice/realtime`, then WebSocket connect)
- `DeepL/DeepL.csproj`: added `System.Net.WebSockets.Client` v4.3.2 conditional reference for `netstandard2.0`

Architecture Decisions
- Event-driven API: `IAsyncEnumerable` can be layered on top in a follow-up.
- `DeepLClient` only — Voice API is v3; not added to the legacy `Translator` class, consistent with other v3 features (multilingual glossaries, style rules).
- Manual `ReconnectAsync()` — automatic reconnection policy deferred to a future iteration.

API Constraints