Indigenous datasets: A listing to help AI perform better in India (factordaily.com)
67 points by LogicRiver on March 12, 2019 | 4 comments



Glad to see multiple contributions from my college.

Here are a few more datasets that I am aware of and that could be added to the list:

- Dataset for Indian National Rupee https://cvit.iiit.ac.in/research/projects/cvit-projects/curr...

- City-scale Road Audit using Deep Learning dataset https://cvit.iiit.ac.in/research/projects/cvit-projects/city...

- Multi-domain corpus for sentiment analysis (Telugu dataset; the download link isn't working, so you probably need to email for the download)


There is also an India Driving Dataset: http://idd.insaan.iiit.ac.in/


This reminds me of a paper put out by a few researchers at Google Brain: `No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World` [0]

The takeaway was that existing open image datasets are biased toward Western contexts (e.g., what a wedding looks like), leading to low performance when applied in non-Western contexts.

[0] https://arxiv.org/abs/1711.08536


https://tdil-dc.in/index.php?lang=en has some good datasets. The only problem is that you need to agree to the terms and fax the document to Delhi to get access. They will supposedly send a DVD. I don't understand why they do it this way. If the Indian government wants to foster research, it should put the datasets in the public domain.



