A snapshot of one year's participation in that specific period isn't too relevant to what we are discussing, because it doesn't track sustained future contributions by those same users.
In theory they are. In practice, the lack of a test dataset (and the lack of access to their real dataset) means it's virtually impossible for a third party to make any significant contribution to the data processing code.
Such an effort would have to start with them volunteering a test dataset and/or schema.