This is great, thanks! A few weeks ago I couldn't get a passive-verb parser in CoreNLP to run fast enough to be usable. Does spaCy support reduced passives?
You can write rules to find them in the dependency parse, although the parse tree won't necessarily be correct.
I've thought a lot about passive reduced relative clauses over the years --- they were a big part of my PhD thesis. So I happen to know that the first one in the WSJ data is wsj_0003.1. This isn't in the training or development data, but it's in the same data set --- so, this is a fair but optimistic spot-check. The sentence is:
> A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.
There are two reduced passives here --- "used" and "exposed", and a potential (but unlikely) false positive in "reported".
spaCy correctly attached "exposed" to "workers", but didn't attach "used" correctly --- it attached it to "reported" instead of "form". This doesn't really make syntactic sense, but that's what it did --- the system's entirely statistical; there's no grammar licensing certain attachments.
To see the parse, run:
    from __future__ import print_function  # print() behaves the same on Python 2 and 3
    from spacy.en import English

    nlp = English()
    tokens = nlp(u'A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.')
    # One row per token: text, POS tag, dependency label, head word
    for word in tokens:
        print(word.orth_, word.tag_, word.dep_, word.head.orth_)
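As for the rules themselves, here is a rough sketch of the kind of thing I mean. It doesn't call spaCy at all: `Token` is a hypothetical stand-in holding the same four fields printed above, and the label names (`VBN`, `aux`, `auxpass`, `nsubj`, `nsubjpass`) are assumptions about the tag and dependency scheme, so check them against what your parser actually emits. The rule is: a past participle with no auxiliary and no subject of its own, attached to a noun.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed token: text, POS tag,
# dependency label, and index of the head (the root points at itself).
Token = namedtuple("Token", "text tag dep head")

AUX_DEPS = {"aux", "auxpass"}        # assumed auxiliary labels
SUBJ_DEPS = {"nsubj", "nsubjpass"}   # assumed subject labels


def reduced_passives(tokens, require_noun_head=True):
    """Return the text of participles that look like reduced passives."""
    # Collect each token's children from the head indices.
    children = {}
    for i, tok in enumerate(tokens):
        if tok.head != i:
            children.setdefault(tok.head, []).append(tok)

    hits = []
    for i, tok in enumerate(tokens):
        if tok.tag != "VBN":
            continue
        child_deps = {c.dep for c in children.get(i, [])}
        # A full passive or perfect has an auxiliary; a main verb has a subject.
        if child_deps & AUX_DEPS or child_deps & SUBJ_DEPS:
            continue
        # A reduced relative should modify a noun.
        if require_noun_head and not tokens[tok.head].tag.startswith("NN"):
            continue
        hits.append(tok.text)
    return hits


# Hand-written toy parse of "workers exposed to asbestos were studied".
toy = [
    Token("workers", "NNS", "nsubjpass", 5),
    Token("exposed", "VBN", "vmod", 0),
    Token("to", "IN", "prep", 1),
    Token("asbestos", "NN", "pobj", 2),
    Token("were", "VBD", "auxpass", 5),
    Token("studied", "VBN", "ROOT", 5),
]

print(reduced_passives(toy))  # ['exposed']
```

Note that the noun-head check is exactly where a bad attachment hurts: a participle wrongly attached to a verb, like "used" in the parse above, gets filtered out. Passing `require_noun_head=False` trades that miss for more false positives.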