A font obfuscation tool for anti-scraping. It remaps glyphs in a font so that the text in HTML differs from what users actually see, preventing crawlers from directly extracting page content.
Supports obfuscation of English, digits, and most CJK characters. Generates TTF and WOFF2 font files.
Built with Rust, using Google's fontations project for font parsing and building.
Basic obfuscation: Provide plaintext and shadow text (same length). A custom font is generated where shadow characters' Unicode codepoints map to plaintext characters' glyphs. The HTML contains shadow text, but the browser renders plaintext when the custom font is loaded. Crawlers can only read the shadow text.
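Basic mode boils down to a character-to-character table. The sketch below models only that table in Python and does not touch any font file. `build_shadow_map` and `render` are hypothetical helper names, not part of the tool, and rejecting a shadow character that is paired with two different plaintext characters is an assumption (a codepoint can point to only one glyph), not documented behavior.

```python
def build_shadow_map(plaintext: str, shadowtext: str) -> dict[str, str]:
    """Map each shadow character to the plaintext character whose glyph it should show."""
    if len(plaintext) != len(shadowtext):
        raise ValueError("plaintext and shadowtext must have the same length")
    mapping: dict[str, str] = {}
    for plain, shadow in zip(plaintext, shadowtext):
        # Assumption: one codepoint can show only one glyph, so conflicts are rejected.
        if mapping.setdefault(shadow, plain) != plain:
            raise ValueError(f"shadow character {shadow!r} is paired with two glyphs")
    return mapping


def render(shadowtext: str, mapping: dict[str, str]) -> str:
    """Simulate what a reader sees once the custom font is applied."""
    return "".join(mapping[ch] for ch in shadowtext)


# The HTML would carry "6982075431"; the reader would see the digits in order.
mapping = build_shadow_map("0123456789", "6982075431")
print(render("6982075431", mapping))  # prints 0123456789
```

Under this assumption, a shadow string that reuses a character for different plaintext characters (for example shadow `a` paired with both `e` and `o`) would be rejected. How the tool itself treats such repeats is not stated here.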
Enhanced obfuscation: Only plaintext is needed. Random codepoints from Unicode Private Use Area (U+E000-U+F8FF) are used for mapping, and corresponding HTML entity codes are returned. Crawlers cannot reverse-lookup character meanings from Unicode tables.
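The enhanced mode can be pictured as drawing one distinct Private Use Area codepoint per distinct plaintext character. The following Python sketch illustrates that idea only: the helper names are hypothetical, the random source is a guess, and the hexadecimal entity format (`&#x...;`) is an assumption, since the format the tool actually returns is not shown here.

```python
import secrets

PUA_START, PUA_END = 0xE000, 0xF8FF  # Unicode Private Use Area, inclusive


def assign_pua_codepoints(plaintext: str) -> dict[str, int]:
    """Give each distinct plaintext character its own random PUA codepoint."""
    unique_chars = list(dict.fromkeys(plaintext))  # de-duplicate, keep first-seen order
    pool_size = PUA_END - PUA_START + 1
    if len(unique_chars) > pool_size:
        raise ValueError("more distinct characters than PUA codepoints")
    rng = secrets.SystemRandom()
    # sample() draws without replacement, so no two characters share a codepoint.
    codepoints = rng.sample(range(PUA_START, PUA_END + 1), len(unique_chars))
    return dict(zip(unique_chars, codepoints))


def to_html_entities(plaintext: str, assigned: dict[str, int]) -> str:
    """Encode plaintext as the entity string that would be written into the HTML."""
    # Assumption: hexadecimal numeric character references.
    return "".join(f"&#x{assigned[ch]:x};" for ch in plaintext)


assigned = assign_pua_codepoints("price998usd")
print(to_html_entities("price998usd", assigned))
```

Because the codepoints are random, the output differs on every run; repeated characters such as the two `9`s share one codepoint.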
```bash
# Build
cargo build --release

# Run (listens on 127.0.0.1:1323 by default)
./target/release/font_obfuscator

# Custom port
PORT=8080 ./target/release/font_obfuscator
```

| Variable | Default | Description |
|---|---|---|
| `PORT` | `1323` | Server listening port |
| `LISTEN_ADDR` | `127.0.0.1` | Server listen address |
| `BASE_FONT_PATH` | `base-font/KaiGenGothicCN-Regular.ttf` | Base font file path |
Health check: returns `it works`.
`POST /api/encrypt`: basic obfuscation (plaintext + shadow text). By default the generated font contains only the mapped characters.
Set `keep_all: true` to keep every character from the original font and replace only the glyphs of the specified characters (#97).
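For callers who prefer a script to curl, here is a minimal Python client sketch for this endpoint using only the standard library. The function names are hypothetical. The field names follow the request and response samples in this README, while treating any `message` other than `success` as a failure, and skipping empty font entries, are assumptions.

```python
import base64
import json
import urllib.request


def build_encrypt_request(plaintext: str, shadowtext: str,
                          only_ttf: bool = False, keep_all: bool = False) -> bytes:
    """Serialize the JSON body for POST /api/encrypt."""
    if len(plaintext) != len(shadowtext):
        raise ValueError("plaintext and shadowtext must have the same length")
    payload = {"plaintext": plaintext, "shadowtext": shadowtext,
               "only_ttf": only_ttf, "keep_all": keep_all}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def decode_fonts(response_body: bytes) -> dict[str, bytes]:
    """Return the font files from a response as raw bytes, keyed by format."""
    data = json.loads(response_body)
    # Assumption: failures are reported through "message" and explained in "hint".
    if data.get("message") != "success":
        raise RuntimeError(f"request failed: {data.get('hint', '')}")
    encoded = data["response"]["base64ed"]
    # Assumption: formats that were not generated are absent or empty.
    return {fmt: base64.b64decode(blob) for fmt, blob in encoded.items() if blob}


def encrypt(base_url: str, plaintext: str, shadowtext: str, **options) -> dict[str, bytes]:
    """Send the request to a running server and return the decoded fonts."""
    request = urllib.request.Request(
        f"{base_url}/api/encrypt",
        data=build_encrypt_request(plaintext, shadowtext, **options),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return decode_fonts(reply.read())
```

With a server running on the default address, `encrypt("http://127.0.0.1:1323", "0123456789", "6982075431")` would return a dictionary whose `woff2` entry can be written straight to a `.woff2` file.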
```bash
curl -X POST http://127.0.0.1:1323/api/encrypt \
  -H 'Content-Type: application/json' \
  -d '{
    "plaintext": "real0123456789content",
    "shadowtext": "fake6982075431garbage",
    "only_ttf": false,
    "keep_all": false
  }'
```

Response:
```json
{
  "message": "success",
  "hint": "",
  "response": {
    "base64ed": {
      "ttf": "AAEAAAALAIAAAwA...",
      "woff2": "d09GMgABAAAAA..."
    }
  }
}
```

`POST /api/encrypt-plus`: enhanced obfuscation (plaintext only).
```bash
curl -X POST http://127.0.0.1:1323/api/encrypt-plus \
  -H 'Content-Type: application/json' \
  -d '{
    "plaintext": "price998usd",
    "only_ttf": false
  }'
```

Response:
```json
{
  "message": "success",
  "hint": "",
  "response": {
    "base64ed": {
      "ttf": "AAEAAAALAIAAAwA...",
      "woff2": "d09GMgABAAAAA..."
    },
    "html_entities": {
      "p": "",
      "r": "",
      "i": "",
      "c": "",
      "e": "",
      "9": "",
      "8": "",
      "u": "",
      "s": "",
      "d": ""
    }
  }
}
```

```html
<style>
  @font-face {
    font-family: 'ObfuscatedFont';
    src: url(data:font/woff2;base64,d09GMgABAAAAA...) format('woff2');
  }
  .protected {
    font-family: 'ObfuscatedFont';
  }
</style>

<!-- Basic: HTML contains shadow text, user sees plaintext -->
<span class="protected">fake6982075431garbage</span>
<!-- User sees: real0123456789content -->

<!-- Enhanced: HTML contains entity codes -->
<span class="protected"></span>
<!-- User sees: price998usd -->
```

MIT