Mahabharata

Welcome

The Mahābhārata is one of the world’s longest and most influential epics, composed in Sanskrit over two millennia ago. It has been translated many times, but to my knowledge, there exist only three Sanskrit-English translations of the complete Mahābhārata corpus. In chronological order, they are:

K. M. Ganguli's Mahabharata (1883-1896): Ganguli was the first to translate the complete Mahabharata into English, with Pratapa Chandra Roy as his publishing partner. The story behind the translation project is worth reading. This translation was digitized (scanned and proofread) by the good guys at ISTA in or before 2006 (link to ISTA/Mahabharata). At that time, it was the only complete translation available in the public domain.
M. N. Dutt's Mahabharata (1895-1905): M. N. Dutt was a prolific translator of the late 19th and early 20th centuries. His works include translations of the Ramayana, the Mahabharata and several Puranas. Like Ganguli's, his translation is complete. Unlike Ganguli's, it is rendered verse by verse rather than chapter by chapter. A scanned copy of his work in its original format is available here. Later, an edition of his book with the Sanskrit verses and the corresponding English text side by side was published, and a scanned copy of it is available here. M. N. Dutt's translation is also in the public domain.
Bibek Debroy's Mahabharata (2010-2015): This is the most recent complete translation of the Mahabharata, based on the Sanskrit text of the BORI Critical Edition (CE). Debroy's translation is split across 10 volumes covering all 18 books of the Mahabharata. It is the only one of these three translations that I've read in detail, and I've found it to be as close to the underlying Sanskrit text as possible. However, the translation is not given at a verse level that could be mapped to the underlying Sanskrit verses. Debroy's work is not in the public domain, but a scanned copy of his Mahabharata can be found here.

I recently came across the Itihāsa Sanskrit-English translation corpus while searching for a scripture-centric dataset. I wanted it to fine-tune AI4Bharat's IndicTrans2 on the Sanskrit-English translation task. It immediately piqued my interest, because until then I had not known that any such fine-grained mapping existed.

For years, Shreevatsa's Mahabharata site (link) has been my primary reference for reading and looking up verses from the Mahabharata. It hosts a chapter-wise parallel corpus of the CE Sanskrit verses, the corresponding chapters of K. M. Ganguli's translation, and slices of the scanned PDF of M. N. Dutt's translation. However, the chapters are not split into verses aligned with their translations, so the site did not fully meet my needs. I've always wanted to see the Mahabharata translations at the verse level. With the BORI CE data available here and the Itihāsa dataset, I set to work creating a triplet mapping of BORI CE Sanskrit verses, Dutt's Sanskrit verses, and Dutt's translations. My aim was to place the CE Sanskrit verses next to the corresponding OCR-ed (not proofread) Sanskrit verses from Dutt, so that the two versions could be compared. I also wanted to add Dutt's English translation of the relevant section alongside them.

The problem turned out to be more complex than I initially thought. The correspondences between the texts can hold between individual verses, between parts of verses (individual pādas), or even between groups of verses. I eventually worked it out to the best of my ability, and the mappings are provided here for readers to peruse. If you find any mistakes in the data, please send an email to srvklkrn (at) gmail (dot) com.
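The kinds of correspondence described above can be modelled with a small data structure. The sketch below is a minimal, hypothetical illustration in Python. It is not the representation actually used for these mappings. It assumes that each verse is identified by a "book.chapter.verse" string and each pāda by its zero-based position within its verse. The names `Alignment`, `padas` and `relation` are invented for this example. The sketch classifies an alignment as one-to-one, one-to-many, many-to-one or many-to-many, at either verse or pāda granularity. It treats a Dutt verse with no BORI CE counterpart as "unmatched".

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# Hypothetical pāda reference: (verse_id, pada_index).
# The "book.chapter.verse" ID format is an illustrative assumption.
PadaRef = Tuple[str, int]


def _units(refs: FrozenSet[PadaRef], level: str) -> Set:
    """Return the distinct alignment units, counted per verse or per pāda."""
    if level == "verse":
        return {verse for verse, _ in refs}
    if level == "pada":
        return set(refs)
    raise ValueError(f"unknown level: {level!r}")


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record linking BORI CE pādas to Dutt pādas."""
    ce: FrozenSet[PadaRef]
    dutt: FrozenSet[PadaRef]
    translation: str = ""  # Dutt's English text for this span, if any

    def relation(self, level: str = "verse") -> str:
        """Classify the alignment by how many units each side spans."""
        n_ce = len(_units(self.ce, level))
        n_dutt = len(_units(self.dutt, level))
        if n_ce == 0 or n_dutt == 0:
            # e.g. a Dutt verse with no corresponding BORI CE verse
            return "unmatched"
        left = "one" if n_ce == 1 else "many"
        right = "one" if n_dutt == 1 else "many"
        return f"{left}-to-{right}"


def padas(verse_id: str, *indices: int) -> FrozenSet[PadaRef]:
    """Hypothetical helper: build pāda references for a single verse."""
    return frozenset((verse_id, i) for i in indices)


# One CE verse whose four pādas are spread over two Dutt verses.
example = Alignment(
    ce=padas("1.1.1", 0, 1, 2, 3),
    dutt=padas("1.1.1", 0, 1) | padas("1.1.2", 0, 1),
)
print(example.relation("verse"))  # one-to-many
print(example.relation("pada"))   # many-to-many
```

Counting at both levels shows why a single verse-to-verse table is not enough. The same alignment is one-to-many between verses but many-to-many between pādas.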

TO-DOs

Highlight corresponding lines/sub-lines in matching BORI CE and DUTT verses. Done.
Fill in verses from DUTT that don't have a corresponding match in BORI. Done.
Provide word decomposition support from DCS (the Digital Corpus of Sanskrit).