Mahābhārata

Welcome

The Mahābhārata is one of the world’s longest and most influential epics, composed in Sanskrit over two millennia ago. It has been translated many times, but to my knowledge, there exist only three Sanskrit-English translations of the complete Mahābhārata corpus. In chronological order, they are:

I recently came across the Itihāsa Sanskrit-English Translation Corpus while searching for a scripture-centric dataset to fine-tune AI4Bharat's IndicTrans2 on the Sanskrit-English translation task. It immediately piqued my interest, as until then I had not known that any such fine-grained mapping existed. For years, Shreevatsa's Mahabharata site has been my primary reference for reading and looking up verses from the Mahābhārata. It hosts a chapter-wise parallel corpus of the Critical Edition (CE) Sanskrit verses, the corresponding chapters of K. M. Ganguli's translation, and slices of the scanned PDF of M. N. Dutt's translation. However, because the chapters are not split into individual verses aligned with their translations, the site did not quite suit my needs: I had always wanted to read the Mahābhārata translations at the verse level. With the BORI CE data available here and the Itihāsa dataset in hand, I set to work on a triplet mapping of BORI CE Sanskrit verses, Dutt's Sanskrit verses, and Dutt's translations. My intention was to place the CE Sanskrit verses beside the corresponding OCR'd (not proofread) Sanskrit verses from Dutt's edition, so the two versions can be compared, and to provide Dutt's English translation of the relevant section alongside them.

The problem turned out to be more complex than I initially thought: the texts call for different kinds of mappings, whether between two individual verses, between parts of verses (individual pādas), or even between groups of verses. I eventually worked it out to the best of my abilities, and the mappings are provided here for the reader's perusal. If you find any mistakes in the data, please send an email to srvklkrn (at) gmail (dot) com.
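The alignment granularities described above (verse to verse, pāda to verse, group of verses to group of verses) can be captured by a small many-to-many record. The sketch below is a hypothetical illustration only, not the format in which the mappings here are actually stored: the names `VerseRef`, `Alignment`, and `index_by_ce`, the parvan.adhyāya.śloka addressing, and the pāda letters "a", "b", "c", "d" are all assumptions made for the example, and the verse numbers are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical address of a verse, optionally restricted to some of its pādas.
@dataclass(frozen=True)
class VerseRef:
    parvan: int
    adhyaya: int
    shloka: int
    padas: Optional[Tuple[str, ...]] = None  # e.g. ("a", "b"); None = whole verse

# Hypothetical alignment record: lists on both sides allow 1:1, 1:n, n:1 and n:m.
@dataclass
class Alignment:
    ce: List[VerseRef]    # BORI CE verse(s) or parts of verses
    dutt: List[VerseRef]  # Dutt's Sanskrit verse(s) or parts of verses
    translation: str      # Dutt's English rendering of the aligned span

    def kind(self) -> str:
        # Cardinality of the mapping, e.g. "1:1" or "2:1".
        return f"{len(self.ce)}:{len(self.dutt)}"

    def is_partial(self) -> bool:
        # True if any side refers to only some pādas of a verse.
        return any(r.padas is not None for r in self.ce + self.dutt)

def index_by_ce(alignments: List[Alignment]) -> Dict[Tuple[int, int, int], List[Alignment]]:
    # Map each CE verse address to every alignment that touches it.
    index: Dict[Tuple[int, int, int], List[Alignment]] = {}
    for a in alignments:
        # A set avoids adding the same alignment twice when it cites two pādas of one verse.
        for key in {(r.parvan, r.adhyaya, r.shloka) for r in a.ce}:
            index.setdefault(key, []).append(a)
    return index

if __name__ == "__main__":
    # Illustrative values only; not taken from the actual dataset.
    one_to_one = Alignment([VerseRef(1, 1, 1)], [VerseRef(1, 1, 1)], "...")
    split = Alignment(
        [VerseRef(1, 2, 5, ("a", "b")), VerseRef(1, 2, 6)],
        [VerseRef(1, 2, 7)],
        "...",
    )
    print(split.kind(), split.is_partial())  # 2:1 True
    print(sorted(index_by_ce([one_to_one, split])))
```

Keeping both sides as lists is what lets a single record express every relation mentioned above, while the optional pāda tuple marks the cases where only part of a verse is involved.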


TO-DOs