PhantomJS: Archiving the project, suspending development

emilsedgh · on March 4, 2018

Chrome and Firefox gaining headless modes is the ultimate goal Phantom could've achieved.

So I consider it a complete success.

Kudos to all contributors.

tyingq · on March 4, 2018

I feel the same. No failure here. It served its purpose very well, while it was needed. It’s retiring while it’s still better than it’s replacement in a few areas...like proxy support.

overdrivetg · on March 4, 2018

This is the main blocker for me right now for sure.

BinaryIdiot · on March 4, 2018

I really hope Microsoft offers a headless IE / Edge at some point. It would be amazing to be able to use all 3 major browsers like this. Heck, get Safari in there too (though I feel like it should be doable with WebKit already).

poizan42 · on March 4, 2018

Huh? They have had the WebBrowser ActiveX control since IE3 I think. That works as headless as you get on Windows.

BinaryIdiot · on March 4, 2018

Not quite. It's my understanding it's not as up to date or equivalent to the latest IE. When I tried looking around I just got a lot of confusing information about the control, how it applies (or doesn't) traditional IE hacks, etc.

Though I could be wrong but it didn't seem like an equivalent to headless chrome or firefox.

poizan42 · on March 4, 2018

That is not correct, it just runs in compatibility mode by default, see https://blogs.msdn.microsoft.com/patricka/2015/01/12/control...

luisrudge · on March 4, 2018

they have webdriver support. Is that enough? I'm not sure what's the difference between headless and webdriver (if any) https://blogs.windows.com/msedgedev/2015/07/23/bringing-auto...

alxlu · on March 4, 2018

Running Edge with webdriver still includes the GUI.

tnolet · on March 4, 2018

Agree 100%. Headless & programmable browsing is a hard nut to crack. PhantomJs paved the way.

kodablah · on March 4, 2018

Sadly, Chrome won't support extensions: https://bugs.chromium.org/p/chromium/issues/detail?id=706008

rnnr · on March 4, 2018

but... i think you can attach it to an existing chrome with remote debugging enabled, so not that big a deal!

kodablah · on March 4, 2018

Sure can, I've done it with puppeteer. Just can't enable headless.

ricardobeat · on March 4, 2018

The main advantage of PhantomJS at this point is the way it's built and distributed as a single binary, unlike Chrome. Much easier to maintain.

ris · on March 4, 2018

Correction: much easier to maintain irresponsibly. That binary you just downloaded and stuck somewhere some time ago - when was the last time it was updated? What about all of its underlying bundled static libraries? How certain were you ever of their up-to-dateness? Did you ever check what version of zlib it was using?

Distributions attempting to package phantomjs properly had one hell of a time trying to reproduce its builds reliably. Most gave up.

Distribution from author as binaries is a whole bundle of fail from the get-go.

fareesh · on March 4, 2018

Does headless chrome do PDF generation? That's the only thing I'm using phantom for at the moment.

madeofpalk · on March 4, 2018

Yes.

In fact, there's a command line switch for it https://developers.google.com/web/updates/2017/04/headless-c...

The Chrome team also make Puppeteer, a node's library for interfacing with headless chrome, and has methods for making PDFs as well https://github.com/GoogleChrome/puppeteer

dewey · on March 4, 2018

First result:

--print-to-pdf

https://developers.google.com/web/updates/2017/04/headless-c...

itslennysfault · on March 4, 2018

Yes.

It's really easy to do using [puppeteer](https://github.com/GoogleChrome/puppeteer). The 2nd or 3rd example is PDF.

jonaswi · on March 4, 2018

I just recently switched from using PhantomJS to pupeteer for PDF generation in a production application. Works like a charm and has a very clean API.

fareesh · on March 4, 2018

Thanks that looks like a very clean library. Will switch to this.

wgjordan · on March 4, 2018

This project has been effectively dead since April 2017, when Vitallium stepped down as maintainer as soon as Headless Chrome was announced [1]:

> Headless Chrome is coming [...] I think people will switch to it, eventually. Chrome is faster and more stable than PhantomJS. And it doesn't eat memory like crazy. [...] I don't see any future in developing PhantomJS. Developing PhantomJS 2 and 2.5 as a single developer is a bloody hell.

One potential path forward could have been to have PhantomJS support Headless Chrome as a runtime [2], which Paul Irish (of Google Chrome team) reached out to PhantomJS about. However, it seems there hasn't been enough interest/resources to ever make this happen.

[1] https://groups.google.com/d/msg/phantomjs/9aI5d-LDuNE/5Z3SMZ...

[2] https://github.com/ariya/phantomjs/issues/14954

micimize · on March 4, 2018

Timeline of what lead to this, from what I could gather:

• phantomjs is 7 years old, @pixiuPL has been contributing for about 2 months

• @ariya didn't respond to his requests for owner level permissions

• @pixiuPL published an open letter to the main page of phantomjs.org https://github.com/ariya/phantomjs/issues/15345

• the stress leads @ariya to close the repo.

• @pixiuPL intends to continue development on a fork

This is a good reminder of why non-technical skills are so important in OS and in general.

rilut · on March 4, 2018

I don't know man, but look at @pixiuPL's commits https://github.com/ariya/phantomjs/commits?author=pixiuPL

Especially his own commits (non-merge commits)

enitihas · on March 4, 2018

Exactly. I mean look at his commits. Look at this file: https://github.com/ariya/phantomjs/blob/master/package.json (He created a commit only to add his how name as a contributor to package.json, which is the only name in package.json).

How did his changes even make it to the repo. There are commits adding and deleting whitespace with the disguised commit message of "Refactoring Code". I have no doubt on why ariya couldn't work with him.

valleyer · on March 4, 2018

Some of his commits surely don't even compile. This one takes the leading # characters off of preprocessor directives: https://github.com/ariya/phantomjs/commit/52e4849a36c68b15c6...

ExactoKnight · on March 5, 2018

This is why every open source project needs a basic CircleCi.

ricardobeat · on March 4, 2018

Changing default indentation from 2 spaces to tabs (without updating existing files?): https://github.com/ariya/phantomjs/commit/6485f5466110fc8d4b...

I couldn't find a single one containing any meaningful code changes. The closest one is a81a38f[1] which seems to introduce bugs - removing open file check, plus a hanging if clause.

Sounds like it's either an elaborate prank, or the guy has no grounding on reality.

[1] https://github.com/ariya/phantomjs/commit/a81a38ffabe2cea715...

bcherny · on March 4, 2018

What about them?

rilut · on March 4, 2018

He committed:

- Removing one whitespace and adding an unnecessary file https://github.com/ariya/phantomjs/commit/98272b9752b2d505f7...

- Conflicted files https://github.com/ariya/phantomjs/commit/63a69d9e2e9c31baab...

- Personal build env https://github.com/ariya/phantomjs/commit/d57ff74f36c5b79d82...

- Deleted the whole project while changing cloud provider https://github.com/ariya/phantomjs/commit/a242fb8d605d9aa4af...

- then re-adding the whole project again https://github.com/ariya/phantomjs/commit/ddaaa09785d453e415...

and other weird/careless commits

enitihas · on March 4, 2018

Now, he shows up at number 4 in the Github contributor list (https://github.com/ariya/phantomjs/graphs/contributors) with 179,603 lines added, not a single one of which seems to be a meaningful contribution by him.

Are such incidences common in other open source software too, or does this one seem a rare case?

micimize · on March 5, 2018

I doubt it, but the line count / commit frequency heuristics really shouldn't be taken too seriously anyways

ollysb · on March 4, 2018

Make of it what you will but they appear to be a bunch of commits that amount to removing white space or adding/removing comments with messages like "Code refactoring". It looks a lot like someone trying to get their git blame count up by any means other than writing actual code.

watwut · on March 4, 2018

Who is lacking non-technical skills in that scenario? I just don't see it.

enitihas · on March 4, 2018

I think looking at @pixiuPL's commits (https://github.com/ariya/phantomjs/commits?author=pixiuPL) and his tall, non sensical claims makes it clear who lacks the basic human communication skills. Just have a look for yourself.

Looking at some issues filed by him (https://github.com/composer/composer/issues/7016) makes the entire thing more clear.

watwut · on March 4, 2018

Gotcha. Through, lack of communications skills and being either delusional or complete dick is not the same. Meaning there are people whose main problem is lack of communication skills and people whose problems run much deeper.

TheAceOfHearts · on March 4, 2018

Some people are mentioning headless Chromium, so I wanna mention another tool I've used to replace some of phantomjs' functionality: jsdom [0].

It's much more lightweight than a real browser, and it doesn't require large extra binaries.

I don't do any complex scrapping, but occasionally I want to pull down and aggregate a site's data. For most pages, it's as simple as making a request and passing the response into a new jsdom instance. You can then query the DOM using the same built-in browser APIs you're already familiar with.

I've previously used jsdom to run a large web app's tests on node, which provided a huge performance boost and drastically lowered our build times. As long as you maintain a good architecture (i.e. isolating browser specific bits from your business logic) you're unlikely to encounter any pitfalls. Our testing strategy was to use node and jsdom during local testing and on each commit. IMO, you should generally only need to run tests on an actual browser before each release (as a safety net), and possibly on a regular schedule (if your release cycle is long).

[0] https://www.npmjs.com/package/jsdom

AlphaWeaver · on March 4, 2018

Cheerio [0] is fantastic for this as well...

[0]: https://www.npmjs.com/package/cheerio

TheAceOfHearts · on March 4, 2018

I've tried Cheerio as well, but I prefer JSDOM since it exposes the DOM APIs. What I'll normally do is interactively test things out in the browser's console, and then transfer em over to my script. Browser dev tools are just super amazing.

madeofpalk · on March 4, 2018

Agreed - I find the Cheerio APIs to be awkward when traversing deep into the DOM. Last time I used Beautiful Soup I found it also had this problem. The DOM API that JSDOM provides is such much more natural to work with.

pitaj · on March 4, 2018

Cheerio is so much faster and more lightweight than jsDOM. If you can do it with Cheerio, you absolutely should.

Rapzid · on March 4, 2018

Using Cheerio with TypeScript types to parse data from tables in downloaded HTML files. Great tool.

madeofpalk · on March 4, 2018

One question I've had recently is how to scrape out a Javascript object out of HTML source. With server-side react + redux, I've wanted to be able to scrap out the serialised var __STATE__ = {...} object to JSON, from nodejs. Best solution I cobbled together was to basically eval() the JS source, which I know is far from ideal.

seeekr · on March 4, 2018

You could use a parser like esprima or its equivalent from the babeljs ecosystem on the JS source instead and just find the global variable with name `__STATE__` and just eval its init expression. Cheaper, more secure, more direct than actually running the JS.

madeofpalk · on March 4, 2018

I actually looked into this (from reading docs, never wrote code) and I wasn't able to find a way to convert the AST for the ObjectExpression into JSON or an actual Javascript object.

seeekr · on March 5, 2018

What you need is a code generation library that will turn the AST back into JS code once you've identified which part of the syntax tree you're interested in. And that's the code you want to then eval(). Esprima has escodegen for that purpose. I'm not sure what the counterparts are in the babel world. Feel free to shoot me an email with any specifics of where you're getting stuck thinking this through (email should be visible from my profile?), and I'll be glad to help.

TheAceOfHearts · on March 4, 2018

You can use the vm module [0] to securely execute the code.

[0] https://nodejs.org/api/vm.html

tonto · on March 4, 2018

Right before release is a bad time to realize there are problems with your build

chrisweekly · on March 4, 2018

Better than right after release.

tokenizerrr · on March 4, 2018

Depends on how often you release and who sets the schedule.

h1d · on March 5, 2018

jsdom is pretty unforgiving and won't load broken HTML. That's where I stopped using it. Could use some tidy tool maybe.

draw_down · on March 4, 2018

jsdom is an impressive achievement, but it may not be what you want depending on what you’re trying to do. It doesn’t mimic the behavior of browsers well in a number of regards, so it will let you do things that real browsers don’t allow. If you’re doing integration-type testing that can lead to tests that pass but functionality that fails in real browsers.

enitihas · on March 4, 2018

For those who haven't looked at some of the commits by @pixiuPL, the list is here : https://github.com/ariya/phantomjs/commits?author=pixiuPL.

To summarize: It does not look like the guy has done a single commit with any meaning. His commits are basically the following:

1. Adding his own name in package.json 2. Adding and deleting whitespace. 3. Deleting the entire project and commiting. 4. Adding the entire project back again and commiting.

Just out of curiosity: How likely is that someone may be able to use a large number of such non functional commits(adding and removing whitespace) to a popular open source repository to boost their career ambitions.(e,g. Claiming that they made 50 commits to a popular project might sound impressive in an interview.)

captain_murdock · on March 4, 2018

Grab some popcorn and give this a read: https://github.com/ariya/phantomjs/issues/15345.

@pixiuPL thinks he's king of the world, but gets rightfully put in his place.

enitihas · on March 4, 2018

I think an interesting project may be to look at popular github repositories and searching for such 'stat builders',i.e, people who make commits of no utility just to boost their github stats.

nolok · on March 4, 2018

Given that he hides those behind fake commit message (unless one counts removing a comment or a whitespace "code refactoring), I would say rather likely.

enitihas · on March 4, 2018

It seems there are no limits to this madness.e,g https://github.com/ariya/phantomjs/commit/970edb9b683175a6b1...

In this commit the guy deletes two spaces from a file, and adds copyright for his name at the top. Going through his commits has made me extremely shocked. I mean how did such low quality commits made it into the master branch of the repo. It is like these commits were invisible to all the visitors and users of the repo.

michaelermer · on March 4, 2018

Also look at stuff like this... https://github.com/SeleniumHQ/selenium/pull/5468

enitihas · on March 4, 2018

The main question which is coming to my mind right now is how on earth did this guy get contributor access to the repository. And here I used to think that being maintainer of a large open source project must take a lot of talent and hard work.

gulperxcx · on March 5, 2018

Sounds like you're getting ideas.

petercooper · on March 4, 2018

Two alternatives:

Headless Chrome with Puppeteer: https://github.com/GoogleChrome/puppeteer

Firefox-based Slimer.js: https://github.com/laurentj/slimerjs (same API as Phantom which is useful if using a higher level library like http://casperjs.org/)

mrskitch · on March 4, 2018

I maintain a puppeteer-as-a-service repo here: https://github.com/joelgriffith/browserless. It’s pretty feature rich at this point, allowing you to specify concurrency, sessions timeouts, and comes with a robust IDE (which you can play with here: https://chrome.browserless.io).

I’m working on building out a serverless model, which is the holy grail of headless workflows, but it’s a bit more challenging to operationalize than one would think.

I’m hoping that these efforts will lower the bar for folks wanting to get started with puppeteer and headless Chrome!

skinnymuch · on March 4, 2018

Browserless seems awesome. Thanks for sharing your project!

lukebennett · on March 4, 2018

As has been said, this point was somewhat inevitable with the advent of Chrome and Firefox's headless modes. However, as the project slips into the mists of history, let's not forget the vital stepping stone it provided in having access to a real headless browser environment vs a simulated one. I for one will remain grateful to Ariya, Vitallium and all the team for their efforts.

tnolet · on March 4, 2018

I’m super biased in this, having spend considerable time programming against PhantomJs, Selenium and now Headless Chrome / Puppeteer for my startup https://checklyhq.com. This whole area of automating browser interactions is an extremely hard thing to get stable. In my experience, the recent Puppeteer library takes the cake but PhantomJs is the spiritual father here. I will not talk about Selenium for blood pressure reasons

iaml · on March 4, 2018

Having dabbled with both selenium and phantom, I can vouch for both being PITA to work with.

mrskitch · on March 4, 2018

Have you seen _my_ startup (https://browserless.io/). The stability part is something I’m trying to solve once and for all with this project.

rumblefrog · on March 4, 2018

Within the issue @pixiuPL created, I listed some of the things that he has shown incompetence on: https://github.com/ariya/phantomjs/issues/15345#issuecomment...

mkarnicki · on March 4, 2018

Nicely put github comment, well done. Thank you. I feel sick in my mouth seeing PL in his username, which clearly indicates my home country. I am beyond baffled.

hrasyid · on March 4, 2018

Ariya wrote a bit about his reasoning here: https://mobile.twitter.com/AriyaHidayat/status/9701730017013... also mentioning an old post in https://github.com/ariya/phantomjs/issues/14541

hartator · on March 4, 2018

I still think it's premature. There is still couple of fields PhantomJS is better than Headless Chrome. Notably proxy support, and API aviability.

ComputerGuru · on March 4, 2018

Yes, but what was in it for Vitallium? Continue working thanklessly on a project to serve others’ needs, who has a whole will leave en masse as soon as headless chrome gets to parity with proxy support?

transreal · on March 4, 2018

That's not really true. You can use proxies with Headless Chrome using the --proxy-server command line parameter. And the API is richer that PhantomJS. See the underlying API documentation here: https://chromedevtools.github.io/debugger-protocol-viewer/to....

hartator · on March 4, 2018

It's only for proxy without auth. So mainly local ones. There is no way to use username and a password right now for proxy with headless chrome.

redka · on March 4, 2018

Well with Chrome going headless there isn't a whole lot of place for PhantomJS anyway. Or is there? What is it still good for?

apocalyptic0n3 · on March 4, 2018

Legacy systems for one. The Cooperative Patent Classification group releases their classifications en masse as HTML (single zip download, which is great). I built a parser for a PHP project that could parse all several hundred thousand records from the HTML in a few minutes. In 2017, they switched to a system that loads in the data from JSON stored in Javascript in the HTML (it is every bit as terrible as you imagine). Obviously loading in the HTML and trying to use regex to match the JSON was a terrible idea (especially since it was encoded to boot...), so I instead used Phantom to load each file, render it, and save it to a temporary file which I then parse using the original pre-2017 parser. Like 10 lines of code in Phantom to do it.

Obviously with my situation, this is not the end of the world. I use the parser twice a year and Phantom will continue to handle that task just fine. But I also know that the switch to using headless Chrome would be an expensive one if necessary; we have to research it, we have to update local dev environments, we have to implement it, we have to write new tests for it, we have to test it, we have to updating our deployment strategy, update our server deployment configuration, and, worst of all, get all of these changes and new software installations approved by the USPTO which is a nightmare. My situation is simple, but would take several weeks to several months to actually deploy to production. As it stands, I will likely have to explain why we have a now-unmaintained piece of software on the server and may be forced to switch regardless.

I can easily imagine how this project sunsetting, even though there is a clear alternative and successor, could be a nightmare to a lot of people. It's not the end of the world, but it's definitely unfortunate

feelin_googley · on March 4, 2018

Is this the data you were trying to parse?

https://www.cooperativepatentclassification.org/Archive.html

apocalyptic0n3 · on March 4, 2018

Yes, but I just realized I was mistaken. The data I was talking about was the International Patent Classification. CPC was XML, IPC is HTML, and the former/now-deprecated US patent classification system was plain text. I have to deal with all three on a regular basis and have built importers for all three, and I forget which one is which.

IPC can be downloaded from the link below. I needed the Valid Symbol List. Looks like they fixed the encoded JSON that was there when they first put out the new format.

http://www.wipo.int/classifications/ipc/en/ITsupport/Version...

redka · on March 4, 2018

Why would you need PhantomJS for that? Can't you just parse the HTML files with Nokogiri and be done with it? That would be orders of magnitude faster anyway

tnolet · on March 4, 2018

Big misunderstanding in browser land. The HTML delivered to you over the wire, the stuff Nokogiri sees, is not the stuff you see on your screen or even when doing a “view source”

nkozyra · on March 4, 2018

OK, obviously the stuff you see on your screen not matching the HTML delivered makes sense, but explain the HTML source not matching what's sent via the HTTP response. DOM can be modified, of course, JS can introduce more dynamic HTML, but view-source should always represent any non-redirected HTTP response. What is Nokogiri getting that the browser isn't (or vice versa)?

joatmon-snoo · on March 4, 2018

> view-source should always represent any non-redirected HTTP response

Not the grandfather, but generally in browsers you have two versions of HTML "source" - the canonical source, the stuff pulled down over HTTP, and the repaired source, the version that actually gets rendered.

I'm unfamiliar with Nokogiri, but I suspect that from context, it doesn't repair HTML in the same way that browsers do.

Kiro · on March 4, 2018

But it should be the same as "view source" right? The post replied to claims otherwise.

dewey · on March 4, 2018

No it's not. https://news.ycombinator.com/item?id=16514517

acdha · on March 4, 2018

It sounds like you are confusing View Source and the live developer tools DOM view.

apocalyptic0n3 · on March 4, 2018

> JS can introduce more dynamic HTML, but view-source should always represent any non-redirected HTTP response

That is both true and false. Because the JS can introduce dynamic content, the source returned by the HTTP response often doesn't match the source that is rendered by the browser itself. In many cases, a site will return a skeleton (just HTML) and then make an Ajax request to populate it. In my case, it was just the skeleton HTML with a few hundred lines of JS plus a long string of JSON

Kiro · on March 4, 2018

But we're not talking about the rendered source here. We're talking about "view source", which afaik always matches what is returned by the server.

The post replied to claims that Nokogiri doesn't see this however so I'm puzzled.

dewey · on March 4, 2018

"view source" shows the source after all the javascript ran. So what a client that doesn't execute javascript (like curl) sees is different from what you see in "view source".

That's also the reason while you had to "pre-render" you javascript web apps for SEO purposes until google bot got the ability to execute javascript.

madeofpalk · on March 4, 2018

I get what you're saying now, but I believe you're mistaken about "View Source".

I've never seen "View Page Source" or "Show Page Source" be the current DOM representation. It's always the HTML what came over the wire, the same you'll get from curl (unless the server is going user agent shenanigans, which I think we can agree is out of scope here).

If you're talking about the page after Javascript is ran, the only way you're seeing that is by opening the dev tools and looking in the 'Elements' or 'Inspector' panel.

I just checked in Safari, Chrome, and Firefox and found this to be true in all of them. The distinction between the View Source and DOM Inspector is very clear.

detaro · on March 4, 2018

In what browser is this case? Chrome and Firefox it isn't. In the dev tools, you see the rendered DOM, but view source shows you the HTML from the server.

apocalyptic0n3 · on March 4, 2018

I had to actually render the HTML and run the Javascript in order to populate the HTML with the data I needed to parse. The HTML does not include the parse-able data by default and is populated at runtime from JSON embedded in the Javascript in the HTML.

As far as I am aware, Nokogiri isn't capable of that and even if it is, I was unaware of that library at the time I wrote the Phantom solution (only discovered it last Summer but have yet to use it for anything)

redka · on March 4, 2018

No, Nokogiri isn't capable of that so you need an actual browser runtime. I didn't think a downloadable site would have javascript populating the page with data. But if it's only from JSON embedded in the JS from the HTML then I guess it's still possible to retrieve that and unless it requires some processing a JSON is as good as you can get.

apocalyptic0n3 · on March 4, 2018

The JSON was encoded (quotes and brackets were both HTML encoded) and couldn't reliably be parsed, or at least not in a way I was satisfied with. Rendering the HTML and actually building out the page as it would normally be rendered and using the parser that I already had built made way more sense. And, at the time, Phantom was the best option I could find for it.

forgotmypw · on March 4, 2018

I think you might have missed this part:

>In 2017, they switched to a system that loads in the data from JSON stored in Javascript in the HTML

minitoar · on March 4, 2018

Maintaining systems already built on top of PhantomJS.

toomuchtodo · on March 4, 2018

A bit concerning, as youtube-dl relies on PhantomJS currently.

netheril96 · on March 4, 2018

youtube-dl will do fine. It is updated once in several days, and with that activity count, I think they will transition to headless chrome in no time.

bklaasen · on March 4, 2018

Amazingly, youtube-dl works very reliably in Termux[1]. I can't see that surviving a transition to headless Chrome.

[1] https://termux.com/

paulie_a · on March 4, 2018

I am curious about this aspect and probably should do some research, but how will highcharts to PDF work?

Phantomjs was generally great for that type of rendering

epx · on March 4, 2018

Not sure whether it is as easy to use as PhantomJS.

nkozyra · on March 4, 2018

I'd say Puppeteer is on-par with Phantom for ease of basic use. It has a richer, deeper API, of course, but at its core it's modern Javascript.

chucksmash · on March 4, 2018

+1 on Puppeteer. Using it for something now. For small projects, the ability to have the JS you want to run within the context of the page itself live side by side with your browser instrumentation code feels magical. Head and shoulders nicer experience than in cases where half of your logic is second class code-as-a-string (e.g. trying to work directly with Gremlin Server from a non-JVM language by POSTing Groovy-as-a-string)

vorg · on March 4, 2018

> half of your logic is second class code-as-a-string (e.g. trying to work directly with Gremlin Server from a non-JVM language by POSTing Groovy-as-a-string

It must be particularly difficult when your Groovy-as-a-string script itself has many strings in its code, which is what a typical Apache Groovy build script for Gradle looks like.

epx · on March 4, 2018

Thanks for the info.

redka · on March 4, 2018

Well that depends if you're stuck with Javascript. There isn't anything simpler (that I'm aware of - bu I do web scraping/automation professionally for about 6 years) than watir[0]. PhantomJS doesn't even come remotely close.

[0] http://watir.com/

Analemma_ · on March 4, 2018

There is one thing about this that saddens me: PhantomJS still starts up much faster than headless Firefox or Chrome, at least for me, which makes some of our integration tests take a long longer than they should.

Has anyone here figured out any tricks to get headless Chrome booted fast?

vaviloff · on March 4, 2018

Also PhantomJS was a single statically linked binary with no dependencies that you could literally drop into a server and run scripts at once.

_pctq · on March 4, 2018

For those who may struggle with using chrome headless on server, here is a dockerfile example to get your started : https://github.com/oelmekki/chromessr/blob/master/Dockerfile

godet is the lib I use for chrome piloting, replace with your favorite one.

cowkingdeluxe · on March 5, 2018

Running it as a pooled web server via generic-pool makes it run a bit more efficiently. Using the pooling method, it can do 512x512 images every 400 ms, add in Optimize, WebP & S3 for a total 1000 ms.

I based the pool off of https://github.com/latesh/puppeteer-pool/blob/master/src/ind... .

pbiggar · on March 4, 2018

I have the same problem, so it's not just you.

gowan · on March 5, 2018

this is one of the reasons i created chromedriver-proxy[0]

[0] https://github.com/ZipRecruiter/chromedriver-proxy

sergiotapia · on March 4, 2018

End of an era! Congratulation to team for all their hard work and excellent contribution to help teams build better software.

All the best to everybody!

pknerd · on March 4, 2018

Somehow I am having issue to use both headless FireFox|Chrome. Unlike PhantomJS where all I had to do is to drop the binary and set the path, both FF and Chrome are not following same route thus I am happy to use PhantomJS for a while

isuckatcoding · on March 4, 2018

I would think PhantomJS is still quite heavily used so having some kind of migrator to puppeteer would be useful. I’m sure people would pay $$$ for it.

skrebbel · on March 4, 2018

Thank you, PhantomJS contributors. You built a life saver.

_ugfj · on March 4, 2018

Drupal dropped PhantomJS too https://www.drupal.org/project/drupal/issues/2775653

kschiller · on March 4, 2018

Does anyone here know if there's a way to set SSL client certs with Headless Chrome? With PhantomJS I could use

  --ssl-client-certificate-file and --ssl-client-key-file

Changu · on March 4, 2018

I do lightweight web automation via Chromiums "Snippets". It is super nice to work that way because you see on screen what happens and can check everything realtime in the console. Only problem is that they dont survive page loads. So when my snippet navigates to a new url I have to trigger it again manually. What would be a good way to progress from here so I can automate across pages?

icebraining · on March 4, 2018

Greasemonkey and its descendants (e.g. Violentmonkey) can run user scripts which work across pages.

Changu · on March 4, 2018

Maybe it is even easier to write a Chrome extension?

moondev · on March 4, 2018

I remember taking full page screenshots with phantom back in the day. Really cool project. Nightmarejs is another alt with a friendly api.

rutierut · on March 4, 2018

One of the guys working on P-JS just linked from a GH issue to his open letter... He isn't very happy with the owner blah blah blah and is going to fork the master branch to make phantom great again, I'll just put this here:

"Will do as advised, as I really think PhantomJS is good project, it just needs good, devoted leader."

enitihas · on March 4, 2018

It does not look like the guy has done a single commit with any meaning. His commits are basically the following: 1. Adding his own name in package.json 2. Adding and deleting whitespace. 3. Deleting the entire project and commiting. 4. Adding the entire project back again and commiting.

paulie_a · on March 4, 2018

That sounds slightly ambiguous, is that person going to be that leader, out are they looking for one?

chirag64 · on March 4, 2018

Shoot, I was just planning to use this for generating PDFs out of a URL on nodejs. Does anyone know of any other library / module out there that is good at this?

randlet · on March 4, 2018

You can generate pdfs with headless Chromium/Chrome pretty easily.

    chromium-browser --headless --disable-gpu --print-to-pdf=output_file_name.pdf file:///path/to/your/html

bluehatbrit · on March 4, 2018

Sadly you get 0 control over headers and footers of the output PDF, meaning you get lovely crappy page numbers around the place with no way to turn them off. This is why, sadly, I have to keep my command line markdown -> pdf converter (https://www.npmjs.com/package/mdpdf) using Phantomjs.

So this does work for very basic pdf printouts, but so far phantom is the only tool that offers full control over the PDF output. Even down to things like margins, paper size, etc.

runarberg · on March 4, 2018

I think you can just use headless firefox[1] or headless chrome[2].

[1]: https://developer.mozilla.org/en-US/Firefox/Headless_mode

[2]: https://developers.google.com/web/updates/2017/04/headless-c...

laktek · on March 4, 2018

Check pdf.cool (hosted API)

wnevets · on March 4, 2018

is headless chrome's API just as easy to work with? Taking a screenshot or saving a page as pdf is stupid simple with phantomjs

andrewguenther · on March 4, 2018

yep, just as easy

wxyyxc1992 · on March 4, 2018

Thanks & Goodbye