fix(rss,zip): fix UnboundLocalError in RssConverter and surface ZipConverter failures #1786
Open
AKIB473 wants to merge 1 commit into microsoft:main
…nverter failures

RssConverter:
- Initialize md_text to '' before conditional assignment to prevent UnboundLocalError when a channel element has no <title> node
- Pass channel description through _parse_content() for consistent HTML-to-markdown conversion (item descriptions already did this)

ZipConverter:
- Surface FileConversionException as a markdown warning block instead of silently swallowing it, so users know which files were skipped

Add tests for all three fixes in test_rss_zip_fixes.py
Summary
Two bug fixes in existing converters, each with tests.
Fix 1 - `RssConverter`: `UnboundLocalError` when channel has no `<title>`

Bug: If an RSS `<channel>` element has no `<title>` child, `md_text` is never initialised, but `md_text +=` is then attempted for the description, raising `UnboundLocalError: local variable 'md_text' referenced before assignment`.

Fix: Initialise `md_text = ''` unconditionally before the conditional blocks.

Bonus fix in same area: The channel `<description>` was written raw into the output, while item descriptions already go through `_parse_content()` for HTML-to-markdown conversion. The same treatment is now applied to the channel description for consistency.

Fix 2 - `ZipConverter`: `FileConversionException` silently swallowed

Bug: When a file inside a ZIP archive cannot be converted, the `FileConversionException` is caught and silently discarded (`pass`). Users receive partial output with no indication that files were skipped.

Fix: Include a markdown warning block in the output so users know what was skipped and why.
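For reviewers who want to see the idea of Fix 2 in isolation, here is a minimal standalone sketch. It is not the code from this PR: `convert_member`, `zip_to_markdown`, and `build_demo_zip` are hypothetical helpers, the `FileConversionException` class is a local stand-in for the library's exception, and the blockquote wording of the warning is an assumption about what such a block could look like.

```python
import io
import zipfile


class FileConversionException(Exception):
    """Local stand-in for the library's exception so the sketch runs on its own."""


def convert_member(name: str, data: bytes) -> str:
    """Hypothetical per-file converter: only handles UTF-8 .txt files."""
    if not name.endswith(".txt"):
        raise FileConversionException(f"unsupported file type: {name}")
    return data.decode("utf-8")


def zip_to_markdown(zip_bytes: bytes) -> str:
    """Convert each archive member, recording failures instead of hiding them."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            parts.append(f"## File: {name}")
            try:
                parts.append(convert_member(name, zf.read(name)))
            except FileConversionException as exc:
                # A bare `pass` here would drop the file silently;
                # emitting a warning keeps the skip visible in the output.
                parts.append(f"> **Warning:** skipped `{name}`: {exc}")
    return "\n\n".join(parts)


def build_demo_zip() -> bytes:
    """Build an in-memory archive with one convertible and one unconvertible file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("notes.txt", "hello")
        zf.writestr("blob.bin", b"\x00\x01")
    return buf.getvalue()


if __name__ == "__main__":
    print(zip_to_markdown(build_demo_zip()))
```

Running the script prints a section for `notes.txt` with its text and a section for `blob.bin` that contains only the warning line, which is the behaviour the fix aims for: partial output plus an explicit note about what was skipped.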
Tests
Three new tests added in `packages/markitdown/tests/test_rss_zip_fixes.py`:
- `test_rss_no_title_does_not_raise`: RSS feed with no `<title>` must not raise
- `test_rss_channel_description_html_is_cleaned`: HTML entities in channel description converted cleanly
- `test_zip_conversion_failure_surfaced_in_output`: unconvertible ZIP contents appear as warnings

All 3 pass locally.
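The scenario behind the no-title test can be reproduced outside the library with a small sketch. This is an illustration under assumptions, not the converter's actual code: `channel_to_markdown` and `NO_TITLE_FEED` are hypothetical names, the sketch parses with the standard library's `xml.etree.ElementTree` rather than whatever parser the converter uses, and it leaves out the `_parse_content()` HTML-to-markdown step.

```python
import xml.etree.ElementTree as ET


def channel_to_markdown(rss_xml: str) -> str:
    """Render a channel's title and description, tolerating a missing <title>."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")

    # Initialised up front, so the `+=` below is safe even when
    # the title branch is skipped.
    md_text = ""

    title = channel.findtext("title")
    if title:
        md_text = f"# {title}\n"

    description = channel.findtext("description")
    if description:
        md_text += f"{description}\n"

    return md_text


# Hypothetical feed: a channel with a description but no <title>.
NO_TITLE_FEED = """<rss version="2.0">
  <channel>
    <description>Feed without a title</description>
  </channel>
</rss>"""


if __name__ == "__main__":
    print(channel_to_markdown(NO_TITLE_FEED))
```

If the `md_text = ""` line is removed, the same input raises `UnboundLocalError` at the `+=`, which is the failure mode the first test guards against.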