Open sourcing Grid, the Guardian’s new image management service

jackgavigan · on Aug 12, 2015

This is a bit of a blast from the past for me.

> Unfortunately, the incumbent system was nearing end-of-life, having been around for over 15 years.

Back in 1999, I was a systems/support engineer at the vendor (Picdar) who supplied that incumbent system. I spent an evening onsite in the Guardian newsroom when the system went live. I recall there was an England (international soccer) game on that night, and we timed how long it took a picture of the first goal to land in the system (IIRC, Reuters won, with a picture that arrived something like 10-15 minutes after the goal was scored).

sdoering · on Aug 12, 2015

Anecdotes like this are the reason I keep returning to HN. They always shine a smile over my face.

Greetings from Germany.

jackgavigan · on Aug 12, 2015

Well, there's more where that came from...

I really enjoyed my time at Picdar. I joined the company in 1998 and it was my first exposure to proper enterprise systems. I got to play with Sun Microsystems servers, fibre-channel storage arrays and HP magneto-optical jukeboxes that could store >1TB (that was a lot in the late '90s).

I once went to a customer site to check a server that kept having problems and discovered that it was located at the downstream end of a row of big IBM boxes. The first IBM box sucked in air, spat it out the other side slightly warmer, where it got sucked in by the second IBM box, which warmed it up a little more, and so on until it eventually got blown out onto our poor little Sun server, like it had a fan heater pointed at it. We got them to move our server and the problems abated.

My boss (technical co-founder Andy Heather) was a proper hacker. Picdar had its own SCSI drivers for HP's jukeboxes, its own image format (based on lossless JPEG, if I recall correctly, but wrapped in a file format that included the photo's metadata), and they eschewed SQL in favour of their own in-house database and query language called NACAS (for Named Access Control And Search). I learnt a huge amount from him, largely because he had infinite patience.

For a long time, my answer to the question "What's the biggest mistake you've ever made?" was "I ftp'd a bunch of large files from a production system on top of themselves, resulting in each file getting truncated to 1024 bytes. The client hadn't kept backups." The files in question were tar-style media archives containing preview images for a major newspaper's photo archive. Andy and another developer had to stay in the office until about midnight to fix my fuck-up by re-generating the preview images. I went home at about 7pm because there was nothing I could do. In hindsight, I should have stayed. :-/

Picdar was the first opportunity I had to work with serious software engineers, one of whom introduced me to the Apple Human Interface Guidelines, which has shaped my approach to UI/UX ever since.

There was a framed cheque on the wall for >$1m from a large database company. Said database company had won a contract with one of the big UK newspaper groups to provide a system that indexed their article archive and allowed them to do free text searches. Unfortunately, it didn't work too well (they loaded the article archive in, and ran a test query; after half an hour with no response, they decided to switch the system off), so the newspaper group asked Picdar if they could provide an interim solution. They did, it worked (the same query took just 6 seconds), the "interim" solution became a permanent solution, and the newspaper group forced the large database company to pay Picdar for the solution.

One of the reasons they hired me was because I came from an Internet background. Part of my role early on was to write a strategy white paper outlining the opportunities the Internet offered for the company. I remember doing a lot of research into e-commerce and online payments (in the context of allowing our clients to licence their image archives on the web). I completely failed to recognise the opportunity to offer image hosting to the masses or the potential of online advertising. Picdar could have been Flickr/Picasa/ImageShack/500px. Deeply embarrassing for me, in hindsight.

I once deployed a demo system (running on Linux, with a RAID setup that was a complete bastard to set up because nobody had documented how the Linux RAID tools worked) at a big magazine company. For some reason, the technology guy's office was on the same floor as a bunch of fashion magazines. He was one of two men on the entire floor (there were about a hundred girls). His office was glass-walled and there was a constant stream of models walking past, on their way to castings and photoshoots. That was a memorable couple of days.

In late 1999, I was approached out of the blue and offered the CTO role at a funded startup. I agonised over leaving but it was an offer I couldn't refuse. The right decision in hindsight but it was a real wrench at the time.

sdoering · on Aug 20, 2015

Wow... ... lost for words. Thanks a lot! I haven't been working that long. I started in publishing and self-trained to pivot to data/web analytics. Nowadays I work at a digital agency crunching numbers, advising customers, writing white papers or trying to (objectively) benchmark different sites using the tools at hand.

But these "war" stories remind me of some people who showed me my first HTML, gave me my first *nix OS and got me onto the internet at 14.4...

These were the first proper hackers I encountered. I remember one of them, 15 at the time, working as the only server admin at a local ISP while still going to school. One night all our IRC bots just disappeared, showing him that something was really wrong at his workplace.

We were the only ones left awake, so I offered to fetch him (me being 19 and able to drive) and drive him to work. There I saw for the first time someone working in parallel on a routing server, compiling a kernel on a local machine and, just for the sake of it, bringing a local workstation up to date. I had never seen fingers flying that fast over three different keyboards, while I had my first experience of a 2 Mbit connection (the connection that this provider had overall).

Well - that made me learn some more things in regards to computers and the net - but in hindsight, I could have made my way faster.

shark1 · on Aug 12, 2015

Same idea as Thumbor, from Globo (Brazilian TV conglomerate): https://github.com/thumbor/thumbor

About Grid, it seems the images are stored on FTP. Am I right?

I'm looking for a solution like these, but more flexible, where I could host my images on AWS, Azure, etc.

theefer · on Aug 12, 2015

It has some similarities with Thumbor (incl. cropping and resizing assets, etc), but the two are quite different. Thumbor goes a bit further than Grid with features like face recognition, etc, which we are looking at for the future. Thumbor also supports dynamic resizing, whereas we prefer to generate static assets and use external services (ImgIx in our case) to handle dynamic resizing and optimisations.

Grid is more of a media management service, allowing quick search of indexed metadata, organisation and collaboration using labels, quick upload into the system, rights management, etc. It also has extensive APIs to allow various degrees of integration.

Grid stores images in S3 (could be abstracted to any storage system with a small amount of effort). FTP is only used as a source for ingesting images into the system, and we're looking to scrap that and replace it with S3-based ingestion in the near future.

- Séb, Lead developer on Grid

UserRights · on Aug 12, 2015

Thank you very much for open sourcing this powerful tool, very nice. I checked some open source DAM tools some time ago and from what I can see from the linked page this will easily be in the top 5.

Quick questions:

* are there some video features (like kaltura or youtube simple editing)?

* how does "Indesign integration" exactly work?

* is it possible to use local storage, not Amazon services?

* is there some ansible / salt / chef / whatever installation scripting available?

Also one special consideration, none of the tools I checked, no picture gallery and no web service that hosts pictures offers this simple but very important feature:

* picture approval workflow: please allow users to upload pictures, show input fields for the email(s) of the person(s) visible in the picture, and send them a link to a page where they can approve publication of the picture. This could be a (customizable) form with some specific text or some more details, but the user should be able to simply agree to the picture being published. Save the user agreement status in the datastore.

Every software / web service allowing to publish pictures online should have such a feature - the fact that this is not available anywhere shows that we are still in the very beginnings of the internets. One day, certainly, humans will laugh about the "non-privacy-by-default"-internetz of the early days...

Yes, I know, as a journalist usually you have rights cleared material, however this functionality would still be a great step forward, also for pro material a good rights management workflow would be very important to have. How to implement this, is there some developer documentation / plugin system available?

theefer · on Aug 12, 2015

> are there some video features (like kaltura or youtube simple editing)?

Not at this stage, no. We're currently focusing on images.

> how does "Indesign integration" exactly work?

InDesign integration works by dragging and dropping an image into InDesign. Associated metadata, including the canonical ID, are then read by an InDesign plugin so we can store a reference to the original asset.

Other integrations with other web-based editorial tools in our suite also use drag and drop (+ drag metadata). It's also possible to copy paste URLs.

> is it possible to use local storage, not Amazon services?

Not out of the box, but there is no reason why it couldn't be done with a few changes.

> is there some ansible / salt / chef / whatever installation scripting available?

I'm afraid not. We mostly use CloudFormation to spin up our infrastructure and inject the relevant configuration on each service. There is a lightweight CF script for development purposes in the Grid repo, but our main CF script is currently in a private repository.

> picture approval workflow

We had discussions with our picture desk about the level of permissions desired. Currently, we prefer the approach of keeping publishing open, so as not to introduce tedious barriers and slow down publishing (e.g. in case of breaking news, etc). There is a balance between the cost of publishing the wrong thing (e.g. not rights cleared) vs the missed value of publishing too late.

To balance the openness, we are focusing on providing high degrees of visibility to our picture editors, e.g. a feed of all images about to be published that aren't fully rights cleared. We also work really hard to get the information properly recorded (ideally automated) so the rights information is accurate about whether a picture can be used or not.

At the end of the day, we want to empower desks with the tools to come up with their own workflows and ways to limit risks associated with allowing all (or most) people to publish content, rather than imposing a rigid workflow.

- Séb, lead developer on Grid

ndufresne · on Aug 12, 2015

I was actually just looking for ways to place images from a DAM into InDesign while tracking the original source. I'm curious: is the plugin you use part of the project, 3rd party or closed source? I took a quick look through the GitHub repo, but I wasn't sure if I missed it. Thanks!

UserRights · on Aug 13, 2015

Yes, custom workflows would be the best solution, a very good direction to go!

foolinaround · on Aug 12, 2015

What are Grid's similarities to and differences from a Digital Asset Management (DAM) tool?

theefer · on Aug 12, 2015

As other replies have said, this is functionally a DAM.

It is particularly adapted to the requirements of publishers, in that it supports a large number of images (we have over 3M currently), can scale to ingest many new images quickly and continuously (publishers often receive lots of images from agencies and wires), indexes all the metadata to power a very fast search, allows collaboration between the various roles involved in the use of assets and production of content, etc.

Unlike most commercial DAMs, which can be quite costly to run and acquire, Grid is also Open Source. We didn't find any existing DAM (incl Open Source) that fit our requirements, in particular in terms of ease and speed of use, powerful Web-based interface, rich APIs, etc.

You will have to review alternatives to know which one is the best fit for your use case.

Hope this helps clarify what Grid is wrt other DAMs.

Best,

- Séb, lead developer on Grid

jackgavigan · on Aug 12, 2015

To what extent did the old Picdar solution influence the design of Grid?

theefer · on Aug 13, 2015

As you can probably see yourself from the screenshot and video, not much.

There may be some subconscious influence from the old system and other image systems we have used (Lightroom, Picasa, Google Photos, Flickr, etc), but I can't think of particular features inspired by Picdar.

napoleond · on Aug 12, 2015

It looks to me like it is a DAM, just... a lot better than the alternatives (at least as of ~5 years ago when I evaluated options and ended up with a smaller-scale in-house version of this).

EDIT: Better in the sense of discoverability, at least.

UserRights · on Aug 12, 2015

You will need a lot of time to evaluate DAM solutions, but I recommend putting this near the top of your list of candidates.

Here are some more to spend the winter with:

http://www.opensourcedigitalassetmanagement.org/

masom · on Aug 12, 2015

Thumbor lets you store and retrieve your images anywhere. We use AWS S3 with a Redis storage cache + another Redis cluster to store the result cache.

Daviey · on Aug 13, 2015

My assumption was that photographers upload via FTP, but the images are stored by the provider in S3.

lefthandme · on Aug 13, 2015

I've recently started using GridFS for a proof-of-concept project (mainly because I wanted to learn more about it), and was wondering about your choice of S3 over something like this (or an equivalent).

In my case, were this project to go into production, I'd probably recommend using S3. Given you have 8TB of assets, I was wondering if, at that scale, S3 still represents a lower TCO, or if you had other reasons for using it? Obviously you are using Dynamo for your metadata and SNS/SQS, and hence have a pretty AWS-oriented stack to begin with.

Anyway, just asking out of interest.

telekid · on Aug 13, 2015

Here's a related and similarly interesting look into the New York Times' CMS, which includes some clever image management tools:

http://open.blogs.nytimes.com/2014/06/17/scoop-a-glimpse-int...

pjmlp · on Aug 13, 2015

For a split second I thought about CERN's Grid:

https://en.wikipedia.org/wiki/Worldwide_LHC_Computing_Grid

Don't you love everyone using the same names?

jamesgorrie · on Aug 13, 2015

Strangely, ours had more to do with Tron & the aesthetic than it did with any sort of computing configuration.

One of the picture editors on it claimed, "it looks a bit like the Grid from Tron" - and it stuck.

batoure · on Aug 12, 2015

[fastly]

Oh, that box, that's the global CDN technology that makes this entire stack possible. Let's not mention it or discuss how these types of technologies have made it possible for us to fundamentally change how we do business.

junto · on Aug 13, 2015

To be fair, they are talking about the technology they have developed themselves and are now open sourcing.

They are using Fastly as a paid service, so why should they talk about it? It's just a building block of their delivery solution.

It isn't like they discuss the web server software they use either.

h1fra · on Aug 12, 2015

This kind of article is very interesting. Getting a glimpse of what big players do for their backends can inspire many of us. I'm really enjoying reading these blog posts.

Also, this product looks very cool and well thought out.