Art Gallery Datasets

mdoms · on Sept 7, 2021

It's difficult to see how to extract artist information reliably. This is a flat dataset, so to do a proper ETL into a relational database you'll need to normalize as you go. But each piece of information about the artists is stored in a separate array, and the arrays are not all the same size. So even assuming you can rely on the arrays all being ordered the same way (a dubious assumption), there's still no way to map them reliably.

For example, how do I create Artist records with the correct bio assigned to each artist from this record? (Nationality is easy, if the ordering assumption holds.)

  {
  "Title": "Zhar-ptitsa, nos. 1-14",
  "Artist": [
    "Léon Bakst",
    "Ivan Bilibin",
    "Leonid Brailovskii",
    "Sergei Chekhonin",
    "L. E. Chirikov",
    "Natalia Goncharova",
    "Boris Grigor'ev",
    "Boris Kustodiev",
    "Mikhail Larionov",
    "Georgi Shlikht"
  ],
  "ConstituentID": [
    300,
    23753,
    23754,
    14602,
    23755,
    2229,
    14632,
    23756,
    3389,
    23757
  ],
  "ArtistBio": [
    "Russian, 1866–1924",
    "Russian, 1881–1962",
    "Russian, 1881–1964"
  ],
  "Nationality": [
    "Russian",
    "",
    "",
    "",
    "",
    "Russian",
    "",
    "",
    "Russian"
  ],
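
To make the mismatch in the excerpt above concrete, here is a minimal sketch that compares the lengths of the per-artist arrays in one record. The field names are taken from the excerpt; the helper names (`array_lengths`, `misaligned_fields`) are made up for illustration, and treating "ConstituentID" as the reference array is an assumption.

```python
# Fields that the excerpt stores as parallel per-artist arrays.
PARALLEL_FIELDS = ["Artist", "ConstituentID", "ArtistBio", "Nationality"]


def array_lengths(record, fields=PARALLEL_FIELDS):
    """Return the length of each per-artist array in a record."""
    return {f: len(record.get(f, [])) for f in fields}


def misaligned_fields(record, key="ConstituentID", fields=PARALLEL_FIELDS):
    """List the fields whose array length differs from the reference array."""
    expected = len(record.get(key, []))
    return [f for f in fields if len(record.get(f, [])) != expected]


# The record from the excerpt, copied as shown.
record = {
    "Title": "Zhar-ptitsa, nos. 1-14",
    "Artist": [
        "Léon Bakst", "Ivan Bilibin", "Leonid Brailovskii",
        "Sergei Chekhonin", "L. E. Chirikov", "Natalia Goncharova",
        "Boris Grigor'ev", "Boris Kustodiev", "Mikhail Larionov",
        "Georgi Shlikht",
    ],
    "ConstituentID": [300, 23753, 23754, 14602, 23755,
                      2229, 14632, 23756, 3389, 23757],
    "ArtistBio": [
        "Russian, 1866–1924",
        "Russian, 1881–1962",
        "Russian, 1881–1964",
    ],
    "Nationality": ["Russian", "", "", "", "", "Russian", "", "", "Russian"],
}

lengths = array_lengths(record)
mismatched = misaligned_fields(record)
print(lengths)
print(mismatched)
```

Running a check like this over the whole file would show how many records cannot be mapped by position alone.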

joe5150 · on Sept 8, 2021

In this case, each ConstituentID in the Artworks.json records corresponds to an artist in the Artists.json records. I would cross-reference these two datasets and discard "Artist", "ArtistBio", etc. from Artworks.json.
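
The cross-reference described above could be sketched roughly like this. It assumes that both files are JSON arrays of objects and that each Artists.json record carries a scalar "ConstituentID" plus fields such as "DisplayName" and "ArtistBio"; those Artists.json field names, the helper names, and the sample rows are assumptions for illustration, not taken from the thread.

```python
import json


def load_records(path):
    """Load a JSON file that is assumed to hold a list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_artists(artists):
    """Build a lookup table: ConstituentID -> artist record."""
    return {a["ConstituentID"]: a for a in artists}


def link_artwork(artwork, artists_by_id):
    """Resolve an artwork's ConstituentIDs against the artist lookup.

    Returns (matched artist records, IDs with no matching artist).
    The per-artwork "Artist"/"ArtistBio" arrays are ignored entirely.
    """
    linked, missing = [], []
    for cid in artwork.get("ConstituentID", []):
        artist = artists_by_id.get(cid)
        if artist is None:
            missing.append(cid)
        else:
            linked.append(artist)
    return linked, missing


# Illustrative in-memory rows; real data would come from
# load_records("Artists.json") and load_records("Artworks.json").
artists = [
    {"ConstituentID": 300, "DisplayName": "Léon Bakst",
     "ArtistBio": "Russian, 1866–1924"},
    {"ConstituentID": 2229, "DisplayName": "Natalia Goncharova",
     "ArtistBio": "Russian, 1881–1962"},
]
artwork = {
    "Title": "Zhar-ptitsa, nos. 1-14",
    "ConstituentID": [300, 23753, 2229],
}

linked, missing = link_artwork(artwork, index_artists(artists))
for artist in linked:
    print(artist["ConstituentID"], artist["DisplayName"], artist["ArtistBio"])
print("unmatched IDs:", missing)
```

In a relational schema, each matched pair would become a row in an artwork-to-artist join table, and any unmatched IDs are worth logging rather than silently dropping.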

12ian34 · on Sept 7, 2021

Somewhat related: Artsy's Art Genome Project[0], which is basically a classification system for (all?) artworks on the Artsy platform, which I think includes those not for sale. They publish the full list of "genes" on their GitHub[1], and they have a public API where you can query for artworks by gene[2].

Disclaimer - a friend works there.

[0]: https://www.artsy.net/categories

[1]: https://github.com/artsy/the-art-genome-project

[2]: https://developers.artsy.net/v2

jwilber · on Sept 7, 2021

I’ve created some art datasets in addition to those presented:

Bob Ross Paintings: https://github.com/jwilber/Bob_Ross_Paintings

USDA Pomological Watercolors (pretty paintings of fruits): https://github.com/jwilber/USDA_Pomological_Watercolors

pletnes · on Sept 7, 2021

What kind of «digital humanities» research has been done, or could be done, based on one or more of these datasets?

xipho · on Sept 7, 2021

Hmm, they need the GBIF (https://www.gbif.org/) of these.

fxtentacle · on Sept 7, 2021

How weird that they DO NOT include images.

gourneau · on Sept 7, 2021

Can anyone recommend art data sets like this, but with images?

I am trying to compile as many as I can. So far I have these:

* https://www.ianvisits.co.uk/blog/2021/01/21/over-700000-pain...

* https://www.artic.edu/articles/902/public-access-to-our-publ...

* https://artvee.com/

swayvil · on Sept 7, 2021

I think it's an academic-specific derangement. They lose track of the difference between the symbol and the symbolized. So a wall-o-words is just as good as a gallery fulla paintings.

joe5150 · on Sept 7, 2021

This is open data/metadata about art, not an attempt to digitize artworks. Also, the descriptions make it pretty clear that several of these institutions have made tens of thousands of digital images of artworks available.

swayvil · on Sept 7, 2021

Yes, we get that. Nevertheless an image or 2 would really tie it together.

joe5150 · on Sept 7, 2021

As decoration, sure. I don't think it would really add anything to the information about the datasets on this page.