echo -e "\xe1\x9e\x81\xe1\x9f\x92\xe1\x9e\x93\xe1\x9f\x89\xe1\x9e\xbb\xe1\x9e\x84"
Version: 1.0
Last updated: 2025
License: Free to share and adapt
For full documentation, see: https://github.com/khmer-unicode/linux-khmer-guide
Write-Up: Linux and Khmer Language Support for PDF
For scanned Khmer documents, pdftotext returns no usable text because the pages are images with no text layer. Use Tesseract OCR with the Khmer language pack:
sudo apt install tesseract-ocr tesseract-ocr-khm
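# Convert each PDF page to a PNG (page-1.png, page-2.png, ...; numbers are zero-padded for longer documents)
pdftoppm -r 300 -png scanned.pdf page
# OCR the first page with the Khmer model; the text is written to output.txt
tesseract page-1.png output -l khm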
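The commands above OCR a single page. For a multi-page scan, the same two steps can be wrapped in a loop. The following is only a sketch: the helper name ocr_pdf, the 300 dpi setting, and the file names in the usage line are assumptions for illustration, not part of any package.

```shell
# Sketch: OCR every page of a scanned PDF into one text file.
# ocr_pdf is a hypothetical helper name chosen for this example.
ocr_pdf() {
    local pdf out workdir page
    pdf="$1"
    out="$2"
    workdir="$(mktemp -d)" || return 1

    # Render each page as a PNG; 300 dpi is an assumed, commonly used OCR resolution.
    if ! pdftoppm -r 300 -png "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi

    # Start with an empty output file.
    : > "$out"

    # pdftoppm zero-pads page numbers, so the glob expands in page order.
    for page in "$workdir"/page-*.png; do
        [ -e "$page" ] || continue
        # Using "stdout" as the output base makes tesseract print the text.
        tesseract "$page" stdout -l khm >> "$out"
    done

    rm -rf "$workdir"
}
```

Usage, assuming a file named scanned.pdf in the current directory: `ocr_pdf scanned.pdf scanned.txt`. Pages are rendered into a temporary directory that is removed afterwards, so only the combined text file remains.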
# Rewrite the PDF with Ghostscript; the /prepress preset asks it to embed all fonts
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -o output.pdf input.pdf
Okular generally handles complex text layout (CTL), which Khmer requires, out of the box.
sudo apt install okular # Ubuntu/Debian
Open your Khmer PDF, go to Settings > Configure Backend, and ensure "Use smooth text" is disabled for complex scripts.