feat(website): add llms.txt support for LLM-friendly content #13932
Add support for generating llms.txt and .llms.md files for Quarto websites, providing LLM-friendly markdown versions of HTML pages.

Features:
- New `llms-txt: true` option in website config
- Generates .llms.md companion files alongside HTML output
- Creates llms.txt index file linking to all markdown pages
- Converts HTML to clean markdown using Pandoc with Lua filter
- Handles callouts (blockquotes with bold type markers)
- Converts images to markdown syntax
- Converts internal links from .html to .llms.md
- Respects draft settings (excludes drafts from output)
- Cleans listing pages (removes empty links, category badges)
- Matches sitemap behavior for incremental builds

New files:
- src/project/types/website/website-llms.ts
- src/resources/filters/llms/llms.lua

Test coverage:
- Basic file generation
- Content conversion (callouts, code, tables, links)
- Draft handling
- Listing page cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
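To illustrate the "internal links from .html to .llms.md" item, here is a minimal TypeScript sketch of the rewrite rule. It is not the code from this PR (the description says the conversion happens in a Pandoc Lua filter); `toLlmsLink` is a hypothetical name, and the choice to skip anything with a URL scheme or a bare fragment is an assumption about what counts as "internal".

```typescript
// Hypothetical helper, for illustration only: the PR does this in llms.lua.
// Rewrites an internal link target from `.html` to `.llms.md`,
// preserving any `?query` or `#fragment` suffix.
function toLlmsLink(href: string): string {
  // Assumption: links with a scheme (https:, mailto:, ...), protocol-relative
  // links, and fragment-only links are not internal pages and stay untouched.
  if (
    /^[a-z][a-z0-9+.-]*:/i.test(href) ||
    href.startsWith("//") ||
    href.startsWith("#")
  ) {
    return href;
  }

  // Split the path from the optional query/fragment suffix.
  const match = href.match(/^([^?#]*)(.*)$/);
  const path = match ? match[1] : href;
  const suffix = match ? match[2] : "";

  // Only pages rendered to HTML have a markdown companion.
  if (!path.endsWith(".html")) {
    return href;
  }
  return path.slice(0, -".html".length) + ".llms.md" + suffix;
}
```

With this sketch, `guide/intro.html#setup` becomes `guide/intro.llms.md#setup`, while external links and image paths pass through unchanged.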
…tests

- Add **/*.llms.md to projectHiddenIgnoreGlob() to prevent cascading renders of llms.txt companion files
- Fix ensureLlmsTxt* test functions to use dirname(htmlFile) instead of treating file path as directory
- Update llms-txt test files to use correct two-element array format for regex matches [matches, no_matches]
- Add render-project: true where needed for llms.txt generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tibility

Use pathWithForwardSlashes() to ensure paths in llms.txt use forward slashes on all platforms. Also adds changelog entry for the llms-txt feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
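As a rough illustration of why the normalization matters: on Windows, relative paths use backslashes, which would produce broken link targets in llms.txt. The sketch below uses a stand-in `toForwardSlashes` (hypothetical, written here so the example is self-contained; the commit itself uses Quarto's pathWithForwardSlashes()), and the `- [title](path)` entry layout is an assumption about the index format.

```typescript
// Stand-in for illustration only; the commit uses pathWithForwardSlashes().
function toForwardSlashes(path: string): string {
  // Replace every backslash so Windows-style paths become URL-style paths.
  return path.replace(/\\/g, "/");
}

// Hypothetical helper: format one llms.txt index entry as a markdown link.
function llmsTxtEntry(title: string, relativePath: string): string {
  return `- [${title}](${toForwardSlashes(relativePath)})`;
}
```

Paths that already use forward slashes are returned unchanged, so the same call is safe on every platform.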
Overall, it looks good! I think this is a great way to do it. I had a look at the pkgdown implementation to compare:
For example, definition lists are converted to bullet lists, probably because GFM does not support them
unless activated, but is this really GFM syntax? An example of their output: https://pkgdown.r-lib.org/llms.txt Among the good ideas, I am thinking of:
Just some ideas. I am sure we'll get more feedback once this is tested.
return {
  name: `File ${llmsFile} exists`,
  verify: (_output: ExecuteOutput[]) => {
    verifyPath(llmsFile);
    return Promise.resolve();
  },
We have fileExists() if we want to refactor and avoid duplication
Lines 228 to 236 in 6c8a9b1
return {
  name: `File ${llmsFile} does not exist`,
  verify: (_output: ExecuteOutput[]) => {
    verifyNoPath(llmsFile);
    return Promise.resolve();
  },
We have pathDoNotExists() if we want to reuse and avoid code duplication
Lines 238 to 246 in 6c8a9b1
// Verify the llms.txt index file in a website output directory.
// Takes the HTML file path and looks for llms.txt in the same directory.
export const ensureLlmsTxtRegexMatches = (
  htmlFile: string,
  matchesUntyped: (string | RegExp)[],
  noMatchesUntyped?: (string | RegExp)[],
): Verify => {
  const llmsTxtPath = join(dirname(htmlFile), "llms.txt");
  return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
};
This verify helper is meant to be used only for index.qmd, or another .qmd test that ends up at the root of the output dir, right?
If we need a verify function that takes the output dir as input, it is just a matter of adding the function as special handling in smoke-all.test.ts, and you could have
export const ensureLlmsTxtRegexMatches = (
  outputDir: string,
  matchesUntyped: (string | RegExp)[],
  noMatchesUntyped?: (string | RegExp)[],
): Verify => {
  const llmsTxtPath = join(outputDir, "llms.txt");
  return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
};
But I guess this is just a matter of making sure ensureLlmsTxtRegexMatches() is only used with compatible source documents.
Just a thought while reviewing the new functions in verify.ts
return {
  name: `File ${llmsTxtPath} exists`,
  verify: (_output: ExecuteOutput[]) => {
    verifyPath(llmsTxtPath);
    return Promise.resolve();
  },
};
Same - could be fileExists()
return {
  name: `File ${llmsTxtPath} does not exist`,
  verify: (_output: ExecuteOutput[]) => {
    verifyNoPath(llmsTxtPath);
    return Promise.resolve();
  },
};
And same could be pathDoNotExists
Sorry, but I think it's pathDoNotExists that needs to go; verifyNoPath has been used for a long time, see e.g. noSupportingFiles.
611098eb4f (Charles Teague 2021-06-04 14:44:01 -0400 1155) verifyNoPath(outputFile.supportPath);
I'll fix the refactorings, but I don't want to keep iterating on features: I want to just ship this.
Yes, agreed. Sorry it wasn't clear. Those were just improvement ideas we can do later, except maybe the absolute URLs. All the llms.txt files I have worked with recently use absolute paths, and even then agents sometimes guess URLs. With only relative paths, it needs to be clear to the agent which base URL to use. I think using absolute paths directly would help.
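A minimal sketch of what absolute links could look like, assuming the site has a configured base URL. The names `absoluteLlmsUrl` and `renderLlmsTxt`, the `Page` shape, and the `# title` / `## Pages` layout are all hypothetical and only illustrate the idea; they are not taken from this PR.

```typescript
// Hypothetical page record: a title plus a path relative to the site root.
interface Page {
  title: string;
  relativePath: string;
}

// Resolve a site-relative path against the site's base URL.
function absoluteLlmsUrl(siteUrl: string, relativePath: string): string {
  // Without a trailing slash, URL resolution would drop the last path
  // segment of the base (e.g. "/docs"), so add one when it is missing.
  const base = siteUrl.endsWith("/") ? siteUrl : siteUrl + "/";
  // Normalize backslashes so Windows-style paths resolve correctly.
  return new URL(relativePath.replace(/\\/g, "/"), base).href;
}

// Render an index where every entry is an absolute link.
function renderLlmsTxt(siteTitle: string, siteUrl: string, pages: Page[]): string {
  const entries = pages.map(
    (page) => `- [${page.title}](${absoluteLlmsUrl(siteUrl, page.relativePath)})`,
  );
  return [`# ${siteTitle}`, "", "## Pages", "", ...entries].join("\n") + "\n";
}
```

With absolute entries, an agent reading llms.txt does not have to infer the base URL, which is the failure mode described above.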