feat: dict-based hyphenation (#305)

## Summary

* Adds optional dictionary-based hyphenation for English, French, German, and Russian

## Additional Context

* The included hyphenation dictionaries add approximately 280 KB of flash usage (German alone accounts for 200 KB)
* The trie-encoded dictionaries are adopted from the hypher project (https://github.com/typst/hypher)
* Soft hyphens (and other explicit hyphens) take precedence over dictionary-based hyphenation. Overall, the hyphenation rules are fairly aggressive, which I believe makes more sense on our small screens.

---------

Co-authored-by: Dave Allie <dave@daveallie.com>
Arthur Tazhitdinov
2026-01-19 17:56:26 +05:00
committed by GitHub
parent 5fef99c641
commit 8824c87490
40 changed files with 36465 additions and 52 deletions

test/run_hyphenation_eval.sh (executable file, +32 lines)

@@ -0,0 +1,32 @@
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
BUILD_DIR="$ROOT_DIR/build/hyphenation_eval"
BINARY="$BUILD_DIR/HyphenationEvaluationTest"

mkdir -p "$BUILD_DIR"

SOURCES=(
  "$ROOT_DIR/test/hyphenation_eval/HyphenationEvaluationTest.cpp"
  "$ROOT_DIR/lib/Epub/Epub/hyphenation/Hyphenator.cpp"
  "$ROOT_DIR/lib/Epub/Epub/hyphenation/LanguageRegistry.cpp"
  "$ROOT_DIR/lib/Epub/Epub/hyphenation/LiangHyphenation.cpp"
  "$ROOT_DIR/lib/Epub/Epub/hyphenation/HyphenationCommon.cpp"
  "$ROOT_DIR/lib/Utf8/Utf8.cpp"
)

CXXFLAGS=(
  -std=c++20
  -O2
  -Wall
  -Wextra
  -pedantic
  -I"$ROOT_DIR"
  -I"$ROOT_DIR/lib"
  -I"$ROOT_DIR/lib/Utf8"
)

c++ "${CXXFLAGS[@]}" "${SOURCES[@]}" -o "$BINARY"

"$BINARY" "$@"