Stemming transliterated Hindi

I needed a library that could stem Hindi words written in Roman script (transliterated), but could not find one. My search took me to Lucene’s HindiStemmer, which in turn led me to the paper by Ananthakrishnan Ramanathan and Durgesh D Rao: A Lightweight Stemmer for Hindi [PDF]. It was a good introduction to how a few simple rules can stem most Hindi words. The problem was that it targets words written in Devanagari script, not Roman.
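To give a flavour of what "simple rules" can mean for Roman-script input, here is a minimal sketch of longest-suffix stripping, the general idea behind this kind of lightweight stemmer. The suffix list, the minimum stem length and the function name are illustrative assumptions for transliterated words; they are not the rules from the paper or from Lucene's HindiStemmer.

```python
# Illustrative suffixes for romanized Hindi. This list is an assumption made
# for the example; it is NOT the suffix list from the paper or from Lucene.
SUFFIXES = ["iyan", "iyon", "on", "en", "a", "e", "i", "o"]

# Assumption: never strip a word down to fewer than this many characters.
MIN_STEM_LENGTH = 2


def stem(word: str) -> str:
    """Strip the longest matching suffix, if enough of the word remains."""
    word = word.lower()
    # Try longer suffixes first so "iyan" wins over "a".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["ladka", "ladki", "ladkon", "ladkiyan", "ko"]:
        print(w, "->", stem(w))
```

With this toy list, the inflected forms "ladka", "ladki", "ladkon" and "ladkiyan" all reduce to the same stem, while a short word like "ko" is left alone. A real stemmer for transliterated text would also have to cope with spelling variation (for example "ladka" versus "larka"), which this sketch ignores.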