1. Let me say that it's great to see more hackers working on issues of civic interest. The data from the government, like most things in life, is messy...but the more eyes we have on it, the better the potential for public good to come from transparency.
2. This seems like a textbook case of mapping things that don't need to be mapped. I understand this is a hackathon and showing off a simple table will cause you to lose...but the map is indecipherable...it's only because it's a commonly-used template that I can make a guess at what the zoomed-out numbers mean. But the major flaw is that the purportedly important number, the "lives-at-risk" score, is completely buried. There's no way to make an easy comparison...I can't even tell how many mines are actually considered dangerous. I think if your aim is to shed light on which are the most dangerous mines, according to your analysis, you should at least put up a table of the top 50 mines, listed in descending order of "lives-at-risk".
3. What alternative methodologies did you try, and how do they compare to your score? I would guess that a simple indexing of days-since-last-inspection and number of major violations (and some factoring in of type of mine) would also be a good indicator of how risky a mine currently is.
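For concreteness, here's a rough sketch of the kind of simple index I have in mind, in Python. The column names, type weights, and sample rows are all invented for illustration; nothing here comes from your actual data or the real MSHA field names.

```python
import pandas as pd

# Hypothetical mine records -- placeholder columns, not the real MSHA schema.
mines = pd.DataFrame({
    "mine_id": [1, 2, 3],
    "days_since_last_inspection": [400, 30, 180],
    "major_violations": [12, 1, 5],
    "mine_type": ["underground_coal", "surface_sand", "underground_metal"],
})

# Rough multipliers by mine type -- invented for illustration only.
TYPE_WEIGHT = {
    "underground_coal": 2.0,
    "underground_metal": 1.5,
    "surface_sand": 1.0,
}

def simple_risk_index(row):
    """Crude risk proxy: inspection staleness x major violations x mine-type factor."""
    return (
        row["days_since_last_inspection"]
        * row["major_violations"]
        * TYPE_WEIGHT.get(row["mine_type"], 1.0)
    )

mines["risk_index"] = mines.apply(simple_risk_index, axis=1)
print(mines.sort_values("risk_index", ascending=False))
```

Even something this crude would give a baseline to compare your score against.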
This was unfortunately truer than it should have been. I saw two cases where truly predictive models on cleaned data sets were beaten by less informative mappings of uncleaned data.
It's the density of mines - plotting all 6,000 at once doesn't provide useful information. Zoom in and the pins will expand. Clicking on one will give you a link to our analysis for that mine.
The Lives-At-Risk score aims to estimate the number of lives that would have been lost since the opening of a mine, in the absence of any inspections.
We decompose this into 2 parts:
1) The number of lives that have been lost, and
2) the number of lives that could have been lost.
We measure 1) using past accidents, and 2) using safety violations. We normalize safety violations according to their likelihood of occurrence, their severity relative to a fatality, and the number of people affected, as assessed by the Mine Safety and Health Administration inspectors.
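To make the decomposition concrete, here is a minimal sketch in Python of how the two parts combine. The sample rows, the normalization table, and the field names are placeholders for illustration, not our actual pipeline or the real MSHA extracts.

```python
import pandas as pd

# Placeholder accident records: part 1, lives that have been lost.
accidents = pd.DataFrame({
    "mine_id": [101, 101, 202],
    "fatalities": [1, 0, 2],
})

# Placeholder violation records: part 2, lives that could have been lost.
violations = pd.DataFrame({
    "mine_id": [101, 202, 202],
    "violation_code": ["A", "B", "A"],
    "persons_affected": [10, 3, 25],
})

# Per-code normalization: likelihood of occurrence and severity relative to a
# fatality. These numbers are invented; ours come from MSHA inspector assessments.
norm = pd.DataFrame({
    "violation_code": ["A", "B"],
    "likelihood": [0.05, 0.01],
    "severity_vs_fatality": [0.8, 0.3],
})

# Part 1: actual fatalities per mine.
lives_lost = accidents.groupby("mine_id")["fatalities"].sum()

# Part 2: expected fatalities implied by normalized violations.
v = violations.merge(norm, on="violation_code")
v["expected_lives"] = v["likelihood"] * v["severity_vs_fatality"] * v["persons_affected"]
lives_at_stake = v.groupby("mine_id")["expected_lives"].sum()

# Lives-At-Risk = actual + potential, per mine, highest first.
lives_at_risk = lives_lost.add(lives_at_stake, fill_value=0).sort_values(ascending=False)
print(lives_at_risk)
```

Sorting that combined series in descending order is also exactly what a top-50 table would be built from.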