It handles extraction too: it tries to find where on the page the results are, then pulls out the title, summary, URL, and date for each one.
To elaborate on the general approach I used: take each node in the web page, gather stats about it (e.g., position on the page, amount of free text), and plug those stats into a neural net.
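For concreteness, here's a minimal sketch of that per-node featurization in Python. The specific features and the sklearn classifier at the end are illustrative assumptions, not necessarily the exact set I used:

    from bs4 import BeautifulSoup

    def node_features(node, depth):
        # Illustrative stats; a real feature set would be richer
        # (rendered position, class names, sibling counts, etc.).
        text = node.get_text(" ", strip=True)
        return [
            float(depth),                                # depth as a cheap proxy for position
            float(len(text)),                            # amount of free text under the node
            float(len(node.find_all("a"))),              # many links suggest a result list
            float(len(node.find_all(recursive=False))),  # number of direct children
        ]

    def featurize(html):
        soup = BeautifulSoup(html, "html.parser")
        rows = []
        def walk(node, depth):
            for child in node.find_all(recursive=False):
                rows.append((child, node_features(child, depth)))
                walk(child, depth + 1)
        walk(soup, 0)
        return rows

    # With hand-labeled examples, a small net can then score each node, e.g.:
    #   from sklearn.neural_network import MLPClassifier
    #   clf = MLPClassifier(hidden_layer_sizes=(32,)).fit(X_train, y_train)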
I worked on a different project some years ago that took the approach of looking for repeating tag patterns in the page, focusing especially on structural tags (as opposed to ones that are purely for formatting).
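A rough sketch of that idea (the structural-tag whitelist and the repeat threshold here are assumptions for illustration, not what that project actually did): fingerprint each element by the sequence of structural tags it contains, then flag parents whose children repeat the same fingerprint.

    from collections import Counter
    from bs4 import BeautifulSoup

    STRUCTURAL = ["div", "ul", "ol", "li", "table", "tr", "section", "article"]

    def signature(node):
        # Fingerprint a node by its sequence of structural child tags,
        # ignoring purely presentational ones (b, i, span, ...).
        return tuple(c.name for c in node.find_all(recursive=False)
                     if c.name in STRUCTURAL)

    def repeated_blocks(html, min_repeats=3):
        # Parents whose children repeat the same fingerprint are good
        # candidates for the result list.
        soup = BeautifulSoup(html, "html.parser")
        hits = []
        for parent in soup.find_all(STRUCTURAL):
            sigs = Counter(signature(c) for c in parent.find_all(recursive=False)
                           if c.name in STRUCTURAL)
            for sig, n in sigs.items():
                if sig and n >= min_repeats:
                    hits.append((parent, sig, n))
        return hits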
Another possible approach might be to just plug the whole result page into something like Boilerpipe (https://code.google.com/p/boilerpipe/) and look at the set of URLs in the text block it identifies.
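Something along these lines, using boilerpy3 (a Python port of Boilerpipe; the method names below come from that port and are worth double-checking): extract the main text block, then keep only links whose anchor text shows up inside it.

    from bs4 import BeautifulSoup
    from boilerpy3 import extractors

    def urls_in_main_block(html):
        # Boilerpipe-style extraction returns the main text, not HTML,
        # so map anchors back to it by matching their visible text.
        main_text = extractors.ArticleExtractor().get_content(html)
        soup = BeautifulSoup(html, "html.parser")
        return {a["href"]
                for a in soup.find_all("a", href=True)
                if a.get_text(strip=True) and a.get_text(strip=True) in main_text}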