Art Gallery Datasets (artnome.com)
78 points by benleb on Sept 7, 2021 | 12 comments



It's difficult to see how to extract artist information reliably. This is a flat dataset, so to do a proper ETL into a relational database you'll need to normalize as you go. But each piece of information about the artists is stored in a separate array, and the arrays are not all the same size. So even assuming you can rely on the arrays all being ordered the same way (a dubious assumption), there's still no way to map one array onto another reliably.

For example, how do I create Artist records with the correct bios assigned to each artist from this record? (Nationality is easy, if the sorting assumption holds).

  {
  "Title": "Zhar-ptitsa, nos. 1-14",
  "Artist": [
    "Léon Bakst",
    "Ivan Bilibin",
    "Leonid Brailovskii",
    "Sergei Chekhonin",
    "L. E. Chirikov",
    "Natalia Goncharova",
    "Boris Grigor'ev",
    "Boris Kustodiev",
    "Mikhail Larionov",
    "Georgi Shlikht"
  ],
  "ConstituentID": [
    300,
    23753,
    23754,
    14602,
    23755,
    2229,
    14632,
    23756,
    3389,
    23757
  ],
  "ArtistBio": [
    "Russian, 1866–1924",
    "Russian, 1881–1962",
    "Russian, 1881–1964"
  ],
  "Nationality": [
    "Russian",
    "",
    "",
    "",
    "",
    "Russian",
    "",
    "",
    "Russian"
  ],
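
A quick way to flag records like this one before loading them, as a rough sketch: compare the lengths of the per-artist arrays and refuse to zip them together when they differ. The helper names and the trimmed sample record below are made up for illustration; only the four field names come from the record above.

```python
# Per-artist array fields seen in the record above.
ARTIST_FIELDS = ("Artist", "ConstituentID", "ArtistBio", "Nationality")


def parallel_field_lengths(record, fields=ARTIST_FIELDS):
    """Return {field: length} for each listed field that holds a list."""
    return {f: len(record[f]) for f in fields if isinstance(record.get(f), list)}


def is_zippable(record, fields=ARTIST_FIELDS):
    """True only if all per-artist arrays have the same length."""
    lengths = parallel_field_lengths(record, fields)
    return len(set(lengths.values())) <= 1


# Hypothetical, shortened record with the same kind of mismatch.
sample = {
    "Title": "Example work",
    "Artist": ["A", "B", "C"],
    "ConstituentID": [1, 2, 3],
    "ArtistBio": ["bio for one of them"],
    "Nationality": ["X", ""],
}

lengths = parallel_field_lengths(sample)
ok = is_zippable(sample)
```

Here `ok` comes out false, so the record would need another source for bios rather than a positional zip.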


In this case, each ConstituentID in the Artworks.json records corresponds to an artist in the Artists.json records. I would cross-reference these two datasets on ConstituentID and discard "Artist", "ArtistBio", etc. from Artworks.json.
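
A minimal sketch of that cross-reference, assuming Artists.json rows carry a "ConstituentID" key alongside their own bio fields. The function names, the "DisplayName" key, and the sample rows are hypothetical; in practice you would load both files with `json.load` instead of using inline data.

```python
def build_artist_index(artists):
    """Map ConstituentID -> artist row (assumed shape of Artists.json rows)."""
    return {artist["ConstituentID"]: artist for artist in artists}


def link_artworks_to_artists(artworks, artist_index):
    """Build (artwork_id, constituent_id, position) rows for a join table.

    artwork_id is a surrogate key assigned here (1-based order in the input).
    IDs that are missing from the artist index are collected, not dropped silently.
    """
    links = []
    unknown_ids = set()
    for artwork_id, work in enumerate(artworks, start=1):
        for position, cid in enumerate(work.get("ConstituentID", [])):
            if cid in artist_index:
                links.append((artwork_id, cid, position))
            else:
                unknown_ids.add(cid)
    return links, unknown_ids


# Placeholder rows for illustration only; the values are not real data.
artists = [
    {"ConstituentID": 300, "DisplayName": "(name for 300)", "ArtistBio": "(bio for 300)"},
    {"ConstituentID": 2229, "DisplayName": "(name for 2229)", "ArtistBio": "(bio for 2229)"},
]
artworks = [{"Title": "Zhar-ptitsa, nos. 1-14", "ConstituentID": [300, 23753, 2229]}]

artist_index = build_artist_index(artists)
links, unknown_ids = link_artworks_to_artists(artworks, artist_index)
```

The bios then come from the artist rows via the join table, so the mismatched "ArtistBio" and "Nationality" arrays in Artworks.json never have to be aligned.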


Somewhat related: Artsy's Art Genome Project [0], which is basically a classification system for (all?) artworks on the Artsy platform, which I think includes those not for sale. They publish the full list of "genes" on their GitHub [1], and they have a public API where you can query for artworks by gene [2].

Disclaimer - a friend works there.

[0]: https://www.artsy.net/categories

[1]: https://github.com/artsy/the-art-genome-project

[2]: https://developers.artsy.net/v2


I’ve created some art datasets in addition to those presented:

Bob Ross Paintings: https://github.com/jwilber/Bob_Ross_Paintings

USDA Pomological Watercolors (pretty paintings of fruits): https://github.com/jwilber/USDA_Pomological_Watercolors


What kind of «digital humanities» research has been, or could be, done based on one or more of these datasets?


Hmm, they need the GBIF (https://www.gbif.org/) of these.


How weird that they DO NOT include images.


Can anyone recommend art data sets like this, but with images?

I am trying to compile as many as I can. So far I have these:

* https://www.ianvisits.co.uk/blog/2021/01/21/over-700000-pain...

* https://www.artic.edu/articles/902/public-access-to-our-publ...

* https://artvee.com/


I think it's an academic-specific derangement. They lose track of the difference between the symbol and the symbolized. So a wall-o-words is just as good as a gallery fulla paintings.


This is open data/metadata about art, not an attempt to digitize artworks. Also, the descriptions make it pretty clear that several of these institutions have made tens of thousands of digital images of artworks available.


Yes, we get that. Nevertheless, an image or two would really tie it together.


As decoration, sure. I don't think it would really add anything to the information about the datasets on this page.



