This reminds me of a paper put out by a few researchers at Google Brain: `No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World` [0]
The takeaway was that existing open image datasets are biased toward Western contexts (e.g., what a wedding looks like), leading to poor performance when models trained on them are applied in non-Western contexts.
https://tdil-dc.in/index.php?lang=en has some good datasets. The only problem is that you need to agree to the terms and fax the document to Delhi to get access. They will then supposedly send you a DVD. I don't understand why they do it this way. If the Indian government wants to foster research, it should put these datasets in the public domain.
Here are a few more datasets that I am aware of and that could be added to the list:
- Dataset for Indian National Rupee https://cvit.iiit.ac.in/research/projects/cvit-projects/curr...
- City-scale Road Audit using Deep Learning dataset https://cvit.iiit.ac.in/research/projects/cvit-projects/city...
- Multi-domain corpus for sentiment analysis (Telugu dataset; the download link isn't working, so you probably need to email to request the download)