Oh, that's a fun story. We use https://github.com/google/mtail to turn log data into metrics exported for Prometheus. We were already doing this for our HAProxy logs; we just added a stanza that captures a secondary indicator that the problem is occurring (termination_state 'SD' together with 0 bytes_read) and exports it as a metric. You can see the relevant MR at https://gitlab.com/gitlab-cookbooks/gitlab-mtail/merge_reque... (and https://gitlab.com/gitlab-cookbooks/gitlab-mtail/merge_reque... to fix some bugs).
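For anyone who hasn't seen the log-line-to-counter idea before, here is a rough Python sketch of it. This is not our mtail program (that lives in the MR above). The regex assumes HAProxy's default TCP log layout, where bytes_read is immediately followed by a two-character termination state, and the metric name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: default HAProxy TCP log layout, in which the fields
# "... Tw/Tc/Tt bytes_read termination_state actconn/..." appear in that order.
LINE_RE = re.compile(r"\s(?P<bytes_read>\d+)\s(?P<term_state>[A-Za-z-]{2})\s")

# Stand-in for an exported counter; the metric name is hypothetical.
metrics = Counter()


def process_line(line):
    """Count the line if it shows termination state 'SD' with 0 bytes read."""
    m = LINE_RE.search(line)
    if m is None:
        return False  # line does not look like the assumed format
    hit = m.group("term_state") == "SD" and int(m.group("bytes_read")) == 0
    if hit:
        metrics["haproxy_sd_zero_bytes_total"] += 1
    return hit


# Made-up sample lines in the assumed format.
healthy = ("haproxy[14385]: 10.0.1.2:33312 [06/Feb/2009:12:12:51.443] "
           "fnt bck/srv1 0/0/5007 212 -- 0/0/0/0/3 0/0")
suspect = ("haproxy[14385]: 10.0.1.2:33313 [06/Feb/2009:12:12:52.001] "
           "fnt bck/srv1 0/0/1 0 SD 0/0/0/0/0 0/0")

process_line(healthy)
process_line(suspect)
```

After those two calls the counter has been bumped once, by the second line only. mtail does the same kind of thing declaratively and takes care of exposing the counter for scraping.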

Then we just hooked it up through the usual sort of Prometheus alerting rules. We made it really twitchy, because this combination of logs Should Not Happen if everything is working right, and we want to alert as soon as it starts occurring, so we can bump the limits again (or re-evaluate in general).
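To make "twitchy" concrete: the idea is that any increase at all in the counter over a short window should fire. Here is a small Python sketch of that check, not our actual rule. The function names and the threshold default are hypothetical, and the reset handling only loosely imitates how a counter increase is usually computed (a drop in value is treated as a restart from zero).

```python
def counter_increase(samples):
    """Total increase across consecutive counter samples.

    A sample lower than its predecessor is treated as a counter reset,
    so the new value itself counts as the increase.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur if cur < prev else cur - prev
    return total


def should_alert(samples, threshold=0):
    """Fire as soon as the counter has grown by more than `threshold`."""
    return counter_increase(samples) > threshold


quiet = [5, 5, 5]       # no new events in the window
one_event = [5, 5, 6]   # a single new event
after_reset = [5, 0, 1] # process restarted, then one event
```

With the threshold left at 0, a single matching log line in the window is enough to alert, which is what you want for something that should never happen in a healthy system.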