I was looking at several scraping solutions (e.g. iMacros, Selenium) that can handle DHTML for a project, and they all have significant performance issues since they need to render the actual pages before processing them. A couple of thousand rows isn't a problem, but try anything more and you've got a real performance bottleneck.



DHTML is server-side. You mean AJAX. Also, think of the page as an interface to a more lightweight web service. You should probably be parsing that directly.
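
A rough sketch of what I mean, in Python with only the standard library. The endpoint URL, the `page` parameter, and the top-level `"rows"` key are all made up for illustration; you'd find the real ones by watching the browser's network tab while the page loads its data. No rendering step at all, just an HTTP request and a JSON parse:

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint, not a real service. Replace it with whatever URL
# the page's JavaScript actually calls to load its data.
ENDPOINT = "https://example.com/api/rows?page={page}"


def parse_rows(payload: bytes) -> list:
    """Decode one JSON response body into a list of row dicts."""
    data = json.loads(payload.decode("utf-8"))
    # Assumption: the service wraps its rows in a top-level "rows" key.
    return data.get("rows", [])


def fetch_page(page: int, timeout: float = 10.0) -> list:
    """Request one page of data straight from the endpoint, no browser involved."""
    req = Request(
        ENDPOINT.format(page=page),
        headers={"Accept": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return parse_rows(resp.read())
```

Whether this works for your site depends on the data actually coming from a separate request; if the page builds everything with inline script, you're back to rendering.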


He's referring to this: http://en.wikipedia.org/wiki/Dhtml I'm not sure what DHTML you are thinking of that would be server-side.


Fuck, thinking of SHTML for some reason.


Have you tried Watir? I'm not sure if it'll solve your performance issue, but it's been at least twice as fast as Selenium for me.



