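Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a text filtering mechanism for Edge TTS speech generation. By letting users configure a regular expression, the system can automatically identify and remove distracting content that commonly appears in LLM replies, such as role-play action descriptions or inner monologue, before the text is sent to the TTS engine. The change is intended to improve the quality of the generated speech so that it sounds more natural and fluent.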
Hey - I've found 1 issue, and left some high level feedback:
- Consider compiling the regular expression once in `__init__` (e.g., `self.filter_pattern = re.compile(...)`) and validating it at config load time so you fail fast on invalid patterns and avoid recompiling on every `get_audio` call.
- Catching a broad `Exception` around `re.sub` makes it harder to distinguish configuration errors (invalid regex) from other unexpected failures; it would be clearer to catch `re.error` explicitly and handle it (e.g., disable filtering or fall back) while letting other exceptions surface.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider compiling the regular expression once in `__init__` (e.g., `self.filter_pattern = re.compile(...)`) and validating it at config load time so you fail fast on invalid patterns and avoid recompiling on every `get_audio` call.
- Catching a broad `Exception` around `re.sub` makes it harder to distinguish configuration errors (invalid regex) from other unexpected failures; it would be clearer to catch `re.error` explicitly and handle it (e.g., disable filtering or fall back) while letting other exceptions surface.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/edge_tts_source.py" line_range="50-56" />
<code_context>
     async def get_audio(self, text: str) -> str:
+        if self.filter_regex:
+            try:
+                # 使用 re.sub 将匹配到的内容替换为空字符串
+                text = re.sub(self.filter_regex, "", text)
+                logger.debug(f"正则过滤后的文本: {text}")
+            except Exception as e:
+                logger.error(f"正则表达式执行错误: {e}")
         if not text.strip():
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catch a more specific exception type when applying the regex filter.
Using a broad `Exception` here can hide unrelated bugs and makes it harder to tell regex issues from other failures. Since only `re.sub` is risky, catch `re.error` instead so that only regex problems are handled and other exceptions propagate.
```python
try:
    text = re.sub(self.filter_regex, "", text)
except re.error as e:
    logger.error(f"正则表达式执行错误: {e} | pattern={self.filter_regex!r}")
```
This keeps the handler scoped to regex failures and avoids masking future errors in this block.
```suggestion
        if self.filter_regex:
            try:
                # 使用 re.sub 将匹配到的内容替换为空字符串
                text = re.sub(self.filter_regex, "", text)
                logger.debug(f"正则过滤后的文本: {text}")
            except re.error as e:
                logger.error(
                    "正则表达式执行错误: %s | pattern=%r",
                    e,
                    self.filter_regex,
                )
```
</issue_to_address>
Code Review
This PR introduces a new feature to add regular expression filtering to Edge TTS, allowing users to remove unwanted text before speech generation. However, the implementation is vulnerable to Regular Expression Denial of Service (ReDoS) because it uses user-provided regular expressions directly in `re.sub` without validation or timeouts, which could allow a malicious regex to hang the application. Additionally, there is a performance overhead due to calling `re.sub` in the `get_audio` method on every invocation, and error handling could be improved. It is recommended to pre-compile the regular expression in `__init__` for better performance and to handle invalid expressions early.
        if self.filter_regex:
            try:
                # 使用 re.sub 将匹配到的内容替换为空字符串
                text = re.sub(self.filter_regex, "", text)
The `filter_regex` configuration option uses user-provided regular expressions directly in `re.sub` without validation or timeout, making it vulnerable to Regular Expression Denial of Service (ReDoS). A malicious regex could cause the application to hang. To mitigate this, consider implementing validation for the user-provided regular expression, using a ReDoS-resistant regex engine, or implementing a timeout for the regex operation. Additionally, after pre-compiling the regular expression in `__init__`, the logic here can be simplified, removing the need to handle potential compilation errors on every call.
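One way the timeout idea could look is sketched below. It is not part of the PR: it assumes the third-party `regex` package is installed, since the standard-library `re` module offers no timeout, and `filter_text` and `timeout_s` are hypothetical names chosen for illustration. When `regex` is missing, the sketch falls back to `re` and provides no ReDoS protection.

```python
import re

try:
    # Third-party package (pip install regex); its matching functions
    # accept a `timeout` argument in seconds. Assumed to be available.
    import regex
except ImportError:
    regex = None  # fall back to the stdlib, which cannot time out


def filter_text(pattern: str, text: str, timeout_s: float = 0.5) -> str:
    """Strip matches of `pattern`; return `text` unchanged on error or timeout."""
    if not pattern:
        return text  # empty pattern: filtering is disabled
    if regex is None:
        try:
            return re.sub(pattern, "", text)  # unbounded running time
        except re.error:
            return text
    try:
        return regex.sub(pattern, "", text, timeout=timeout_s)
    except (regex.error, TimeoutError):
        # Invalid pattern or the time budget was exceeded: skip filtering.
        return text
```

Returning the unfiltered text on failure keeps speech generation working, at the cost of reading out content the user wanted removed; disabling the filter and logging a warning would be an equally reasonable choice.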
        self.pitch = provider_config.get("pitch")
        self.timeout = provider_config.get("timeout", 30)

        self.filter_regex = provider_config.get("filter_regex", "")
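To improve performance and make error handling clearer, consider precompiling the regular expression in the `__init__` method. This avoids recompiling on every `get_audio` call, and an invalid regular expression is caught and logged at service startup rather than discovered at runtime.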
-        self.filter_regex = provider_config.get("filter_regex", "")
+        self.filter_regex_str = provider_config.get("filter_regex", "")
+        self.compiled_filter_regex = None
+        if self.filter_regex_str:
+            try:
+                self.compiled_filter_regex = re.compile(self.filter_regex_str)
+            except re.error as e:
+                logger.error(f"Edge TTS 的正则表达式 '{self.filter_regex_str}' 无效,过滤功能将被禁用: {e}")
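The suggestion above can be fleshed out as a small self-contained sketch that also shows the per-call side. `RegexTextFilter` and its `apply` method are hypothetical names used only for illustration; the PR itself keeps the logic inside `ProviderEdgeTTS`.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)


class RegexTextFilter:
    """Hypothetical helper: compile the pattern once, reuse it on every call."""

    def __init__(self, pattern: str = "") -> None:
        self.pattern_str = pattern
        self.compiled: Optional[re.Pattern] = None
        if pattern:
            try:
                self.compiled = re.compile(pattern)
            except re.error as e:
                # Invalid pattern: report it once at load time and
                # leave filtering disabled instead of failing per call.
                logger.error(
                    "Invalid filter regex %r, filtering disabled: %s", pattern, e
                )

    def apply(self, text: str) -> str:
        """Return `text` with all matches removed, or unchanged if disabled."""
        if self.compiled is None:
            return text
        return self.compiled.sub("", text)
```

With this shape, the call site needs no `try`/`except` at all, because the only step that can raise `re.error` runs once in the constructor.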
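`except re.error` ensures that an invalid user-supplied regex is still caught and reported without affecting the framework. TTS calls are not very frequent, so whether or not the pattern is precompiled makes essentially no difference.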
Adopt the code review suggestion and replace the broad `Exception` with the more specific `re.error`
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
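When generating speech with Edge TTS, the engine by default reads out all of the generated text. If the LLM's reply contains role-play action descriptions or inner monologue (for example `(微笑着说)你好啊`, "(says with a smile) Hello there", or `*摸摸头*`, "*pats head*"), this seriously hurts the naturalness and immersion of the speech. This PR adds a regular-expression filter that lets users define custom filter rules in the panel, so that such content is removed before the text is sent to the TTS engine.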
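A minimal sketch of the filtering step described above, using the two sample inputs from the description. The pattern and the `strip_for_tts` helper are hypothetical and chosen for illustration; in particular, the full-width parentheses and the asterisk alternative are assumptions, not the PR's default (which is an empty string).

```python
import re

# Hypothetical pattern: ASCII parentheses, full-width parentheses,
# and *asterisk-wrapped* actions. Not taken from the PR.
EXAMPLE_PATTERN = r"\(.*?\)|（.*?）|\*.*?\*"


def strip_for_tts(text: str, pattern: str) -> str:
    """Remove every match of `pattern` from `text` and trim the result."""
    if not pattern:
        return text  # an empty pattern means filtering is disabled
    return re.sub(pattern, "", text).strip()


print(strip_for_tts("(微笑着说)你好啊", EXAMPLE_PATTERN))  # 你好啊
print(strip_for_tts("*摸摸头* 你好啊", EXAMPLE_PATTERN))   # 你好啊
```

The non-greedy `.*?` matters here: a greedy `.*` would swallow everything between the first opening and the last closing delimiter, including speech that should be kept.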
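Modifications
- `astrbot/core/config/default.py`: adds the `filter_regex` field (default: empty string `""`).
- `astrbot/core/provider/sources/edge_tts_source.py`:
  - Imports the `re` module.
  - Reads the `filter_regex` configuration item in the `__init__` method of the `ProviderEdgeTTS` class.
  - In `get_audio`, applies `re.sub` to the incoming `text` before the TTS generation logic runs, replacing regex matches with an empty string.
The full flow has been tested locally. After entering a regular expression (for example `\(.*?\)|\(.*?\)`) in the Edge TTS configuration of the Web panel and sending test text containing both Chinese and English parentheses, the generated recording skipped the action descriptions inside the parentheses and no exception was raised.
Checklist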
- I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
Summary by Sourcery
Add configurable regular-expression-based text filtering to Edge TTS before audio generation.
New Features:
- Add a `filter_regex` configuration option for removing matched text fragments before synthesis.