About MonOCR

The Mon language (mnw) is my native language. Today, its use is slowly declining and UNESCO classifies it as vulnerable.

One of the biggest challenges is that Mon has very little digital presence. There are almost no public datasets, which makes it difficult to build modern tools like search systems, language models, or AI applications. Most of our written knowledge still lives in scanned books and old documents.

As part of a long-term effort to help bring Mon into the digital world, I built a CRNN-based OCR engine to recognize Mon script. The goal is simple: extract clean, usable text from scanned materials so we can start building datasets and make further development possible.
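
For readers unfamiliar with CRNN recognizers: they are commonly trained with a CTC loss, and the network emits one score vector per image column that must be decoded into text. Whether MonOCR uses CTC is an assumption here, not something stated above. Under that assumption, a minimal sketch of greedy CTC decoding in plain Python could look like this; the function name, the blank index, and the tiny alphabet are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC "blank" label (hypothetical convention)


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: List[str],
    blank: int = BLANK,
) -> str:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label index for every frame (image column).
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse consecutive duplicates (one glyph usually spans several frames).
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks, which only separate genuinely repeated characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical three-label alphabet: index 0 is the blank placeholder.
alphabet = ["", "မ", "န"]
frames = [
    [0.1, 0.8, 0.1],    # မ
    [0.1, 0.7, 0.2],    # မ (repeat of the same glyph, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # မ (a new glyph, because a blank came between)
    [0.1, 0.1, 0.8],    # န
]
print(ctc_greedy_decode(frames, alphabet))  # best path 1,1,0,1,2 -> "မမန"
```

Greedy decoding is the simplest choice; a beam search or a language model on top would likely give cleaner text, but those are beyond what this sketch tries to show.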

This is just one small step, but I hope MonOCR can help turn our printed history into digital text and open the door for future tools, research, and AI integrations.