One of the biggest wins here is this little tidbit:
> When you install Puppeteer, it downloads a recent version of Chromium (~71Mb Mac, ~90Mb Linux, ~110Mb Win) that is guaranteed to work with the API.
A lot of the Chrome interface libs around at the moment require you to maintain your own instance of Chrome/Chromium and launch the headless server from the command line yourself, or depend on a pre-compiled build that can quickly get out of date (https://github.com/adieuadieu/serverless-chrome/tree/master/...). Having this taken care of is a blessing.
There is complexity here you may not be seeing, namely that if you are on a platform without X, it will not work.
Normal Chromium requires that a bunch of X libraries be present. It doesn't use them, but for things like headless testing it's a massive pain, since the apt-get install (or equivalent) generally pulls in many hundreds of megabytes.
(Hi Tim, I use your GitHub corner :) thanks for making it) - There are still some differences between Chromium and Chrome, for example MP4 video playback, because of the paid codec licensing. But Puppeteer being easily configurable to use Canary or an existing Chrome installation addresses this gap.
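For example, pointing Puppeteer at an installed Chrome or Canary is just the executablePath launch option (a minimal sketch; the macOS Canary path below is an assumption, adjust it to your install):

```js
const puppeteer = require('puppeteer');

(async () => {
  // Use an installed Chrome Canary instead of the bundled Chromium.
  const browser = await puppeteer.launch({
    executablePath: '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary',
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // MP4 playback now behaves like regular Chrome rather than Chromium.
  await browser.close();
})();
```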
I'm using The Verge for testing because it lazy-loads images, and I'm being clumsy about the scrolling, but it mostly works - I can get 90% of the images to show up in the finished PDF.
What I can't seem to get right, though, is creating a single-page PDF where the page height matches the document height perfectly - it always seems to be off by a bit, at least on this site (mine works fine, _sometimes_).
Well, no, not really - it seems to be due to the way Chrome deals with print CSS and PDF generation, rather than any calculations I do prior to invoking PDF generation...
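For what it's worth, here is a rough sketch of the scroll-then-measure approach I'm using (the scroll step, delay, and viewport width are placeholders, and Chrome's print handling can still shift the final height a bit):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.theverge.com/', { waitUntil: 'networkidle2' });

  // Scroll down in steps so lazy-loaded images get a chance to fire.
  await page.evaluate(async () => {
    for (let y = 0; y < document.body.scrollHeight; y += 500) {
      window.scrollTo(0, y);
      await new Promise(resolve => setTimeout(resolve, 200));
    }
  });

  // Measure the rendered document and use it as the PDF page size.
  const height = await page.evaluate(() => document.documentElement.scrollHeight);
  await page.pdf({
    path: 'page.pdf',
    width: '1280px',
    height: `${height}px`,
    printBackground: true,
  });

  await browser.close();
})();
```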
So first there was Selenium's JSON Wire protocol, then came the W3C WebDriver spec and now we're back to browser-specific implementations? As someone who's tried/is trying to automate Firefox/Chrome/Safari/IE in a consistent fashion, my only question is: WHY?
Based on a quick read of the API, my interpretation is that this is not targeting people who are trying to automate every browser, but those who need to automate any browser.
In that context, it's dead-simple to use, and someone with very little experience should be able to get a working prototype in under 5 minutes.
For my use case, it's closer to "wget/curl with JS processing" than "automating a user's browsing experience". I don't particularly care which browser is doing the emulation, with the ease-of-use of the API making the biggest difference.
It seems very similar to PhantomJS, but to be honest, it's more attractive from an ongoing-support standpoint simply because it's an official Chrome project.
Exactly, this looks perfect for taking a screenshot of a page[1], or converting a page to a PDF[2] in just a few lines of code.
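Something like this, for instance (a minimal sketch; the URL and output paths are placeholders):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png', fullPage: true }); // full-page screenshot
  await page.pdf({ path: 'example.pdf', format: 'A4' });          // print-to-PDF
  await browser.close();
})();
```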
If you have an existing web service, this appears suitable for actual production usage to deliver features like PDF invoices and receipts, on-demand exports to multiple file formats (PNG/SVG/PDF) etc., which has quite different requirements compared to an automated testing framework.
The primary use-case for headless-chrome is to support stuff like scraping/crawling JavaScript-dependent sites and services, and emulating user workflows to retrieve data or trigger side effects that couldn't otherwise be achieved with something more low-level (curl, manual HTTP requests w/ Node's HTTP/S API etc).
In other words, headless-chrome would be used to power a server-side microservice rather than for automated testing of UI/UX; there are already more appropriate projects for that.
If you just need wget/curl with js, you can actually do that with the chrome CLI now. Just run your chrome binary with the arguments --headless --disable-gpu --dump-dom <url>
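If you need a bit more control than the CLI gives you (waiting for network activity to settle, custom headers, etc.), the rough Puppeteer equivalent is only a few lines (a sketch, assuming a current Puppeteer release):

```js
// Usage: node dump-dom.js <url>
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(process.argv[2], { waitUntil: 'networkidle0' });
  console.log(await page.content()); // serialized DOM after JS has run
  await browser.close();
})();
```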
Chrome DevTools Protocol (CDP) is a more advanced API, and I've been leading an effort on getting multiple browsers to look at CDP under https://remotedebug.org
Get them to actually version their protocol first, maybe?
As someone with a bunch of tooling around the dev-tools API, it's a huge pain in the ass to not be able to tell what functions the remote browser supports. There's a version number in the protocol description, but it's literally never been incremented as far as I've seen.
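The closest workaround I know of is to ask the remote browser what it is via the DevTools HTTP endpoint and key any feature detection off the browser build instead (a sketch; assumes Chrome was started with --remote-debugging-port=9222):

```js
const http = require('http');

http.get('http://localhost:9222/json/version', res => {
  let body = '';
  res.on('data', chunk => (body += chunk));
  res.on('end', () => {
    const info = JSON.parse(body);
    // e.g. "HeadlessChrome/60.0.3112.113" plus the (rarely bumped) protocol version
    console.log(info['Browser'], info['Protocol-Version']);
  });
});
```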
For an example of an impossible task, try to retrieve request headers using Selenium. Browser automation gets more and more complicated the more cases you're trying to cover, and in my impression WebDriver is definitely not enough. Who knows, perhaps some new version of WebDriver that I haven't heard of will catch up once the functionality gets properly defined.
Yes, I wholeheartedly agree, it was a stupid decision by the Selenium devs not to make request headers etc. accessible. But why throw all the standardisation efforts overboard?
I think this is more "Selenium is actively antagonistic to its major use-case" than trying to throw everything away. There have been multiple attempts to convince the Selenium people to revisit their decision w.r.t. headers, and they're completely unwilling.
Given that the Selenium leadership is apparently uninterested in improvements, and given its many limitations, trying to improve things there is more effort than it's worth.
I didn't follow that development. Can you share why Selenium maintainers chose not to implement headers? Is it that they want to restrict the tool to simulate what a normal user can do with a browser and not hacks such as overriding headers? Thanks in advance!
"Something something something normal users can't do that something something".
Basically, they're still stuck in this idea that they're ONLY for emulating user-available input (and the dev-tools don't exist).
In reality, there is tremendous interest in more complex programmatic interfaces, but apparently they're unwilling to see that, and are instead only interested in their implementation's "user-only" ideological purity.
> For an example of an impossible task, try to retrieve request headers using Selenium.
Selenium can't do that itself, but it can drive a browser that goes through a proxy, and you can retrieve everything about the request from that. It's a lot more challenging if you're testing things that use SSL, but it's not impossible with a decent proxy app (e.g. Charles).
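Roughly, the setup looks like this with the selenium-webdriver Node bindings (a sketch; the proxy port is a placeholder and the actual header inspection happens on the proxy side, e.g. in Charles or mitmproxy):

```js
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

(async () => {
  // Route all browser traffic through a local intercepting proxy.
  const options = new chrome.Options().addArguments('--proxy-server=http://localhost:8888');
  const driver = await new Builder().forBrowser('chrome').setChromeOptions(options).build();

  await driver.get('https://example.com'); // request/response headers are now visible in the proxy
  await driver.quit();
})();
```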
(I don't have the full history, but from my experience...) When I first started using Selenium, it used browser-specific drivers that communicated directly with bespoke extensions/plugins/add-ons/what-have-yous. Then came Selenium Server which allowed for testing on remote machines. Then Grid allowed for simultaneous testing of many remote machines.
Due to the nature of browser plugins, they could only access information that the browser made available. Also, the devs have consistently been of the opinion that they only care to simulate an end-user experience. (End users only care about the webpage's presentation, not which headers were present per transaction.) Although, I suspect that early restrictions influenced that viewpoint.
It wasn't until much later that browsers started implementing their own automation channels; Chrome's Automation Extension and Debugging Protocol, Firefox's Marionette, etc. At the same time, browsers started putting additional security measures around plugins making it even more difficult to have consistent features across Selenium's drivers.
Which is why WebDriver became an open specification instead of various driver implementations. I believe Microsoft was the first to implement their own driver, InternetExplorerDriver, for IE7+. Then came ChromeDriver (powered by the Chrome Automation Extension) and GeckoDriver (a translation layer onto Firefox's Marionette), and SafariDriver is now baked into Safari 10.
Marionette is possibly the most interesting, but I lack experience with it. TMK it allows for automating the entire browser; both the webpage and the 'chrome' interface. Whereas Selenium, at best, could only simulate actions - like the back button - through Javascript. But even with Marionette's feature richness, you still don't get access to request and response information.
I think for most developers ("devs", "QA", "scrapers", etc.) there's very little appeal in moving away from Selenium, because it would require maintaining multiple test suites; it gives consistent results and just works. If you want lower-level information, it's fairly simple to either 1) just use a CLI client (curl, wget, etc.), 2) use a library (libcurl, Requests (Python), Net::HTTP (Ruby), etc.), or 3) set up a proxy server. I do all of the above; each has its own downsides - clients and libs don't do any rendering themselves, and proxies tend to rewrite transactions (e.g. stripping compression and adding/removing/altering headers such as 'Content-Encoding: gzip').
Puppeteer automates headless / visible Chrome today. However, it does so on top of the DevTools Protocol, so I believe it might become browser-agnostic in time. There are various (some successful, some failed) adapters for bridging the DevTools Protocol to IE, Edge, Firefox, Safari etc., so I really do think cross-browser support is not an impossibility.
Your point resonated with me, though: there are just so many codebases / automation assets already written. Exploring new tools usually happens when a new project starts, rather than by entirely recoding an existing project. For existing code bases, little will change unless contributors from the community write a parser to translate those suites to the Puppeteer API.
I do a lot of web scraping that requires a full browser for some websites. This sounds perfect for me. I often use Selenium, but it's a lot of complexity and quite buggy if I just want to run chrome (or any one browser, but not all browsers).
Have you tried TestCafe? I have had a good experience with TestCafe on a recent project. The development team is very responsive about fixing bugs and stability issues.
I think this can be ok. It's pretty obvious that the Webdriver protocol was insufficient. Let browsers "take it home" for remodelling and see if a new standardization effort will happen down the line.
I'm really loving headless chrome so far. I have around 650 tests which are mostly dealing with iframes and popup windows, and they run flawlessly. The first release seemed to have a memory leak which wasn't present in non-headless chrome, but that seems to have been fixed in version 60.
Hi bluepnume, if you don't mind, can you share how you manage iframes and popups? I'm using the DevTools protocol through a websocket directly to communicate with Chrome, and have to do a lot of context handling (frames) and use Target.sendMessageToTarget (popup windows) to deal with them. These two features seem to be the hardest parts of the interaction layer to handle when doing automation with the DevTools Protocol. Thanks in advance!
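For comparison, this is roughly how I understand the same two cases look through Puppeteer's higher-level API (a sketch; the URL matching and selector are placeholders):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Popup windows: new targets are surfaced as browser-level events.
  browser.on('targetcreated', async target => {
    const popup = await target.page();
    if (popup) console.log('popup opened:', popup.url());
  });

  await page.goto('https://example.com');

  // Iframes: every frame in the page is directly addressable.
  const frame = page.frames().find(f => f.url().includes('some-iframe')); // placeholder match
  if (frame) {
    await frame.click('#button-inside-iframe'); // placeholder selector
  }

  await browser.close();
})();
```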
It's funny how open the market for browser automation is. Everyone needs to switch away from PhantomJS in the next few years and is looking for the easiest option.
Sadly, there's probably no money on the line. But you will get your buggy software used by huge corporations for years to come!
There were still some commits in early July, but I haven't seen new updates since. Maintaining a browser project is a mammoth task. Even for an established startup such as Segment, you can see that NightmareJS has lots of issues that go unanswered for months. I guess it's primarily because of a potentially large user base and all kinds of edge-case requirements from different users. And on top of that, it's supposed to work on all OSes. A nightmare.
Currently only PDF printing of the entire page view-port is available. File an issue with your use-case so the team can assess whether or not to look into including that kind of behavior.
The API looks nice and clean, but I'm puzzled by this from the FAQ:
> Puppeteer works only with Chrome. However, many teams only run unit tests with a single browser (e.g. PhantomJS).
Is this true? Do teams write unit tests but only test them in a single browser? With test runners like Karma and Testem, running tests concurrently in multiple browsers is easy. You'd be throwing away huge value if for some reason you only decided to test in one vendor's browser.
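For reference, a minimal karma.conf.js that runs the same suite in two browsers side by side looks roughly like this (assuming the karma-chrome-launcher and karma-firefox-launcher plugins and a test framework of your choice are installed):

```js
// karma.conf.js
module.exports = function (config) {
  config.set({
    frameworks: ['mocha'],                    // placeholder test framework
    files: ['test/**/*.spec.js'],             // placeholder test glob
    browsers: ['ChromeHeadless', 'Firefox'],  // same suite, both browsers, concurrently
    singleRun: true,
  });
};
```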
From my experience most teams only test against a single browser, yes. It's however nice to have the option to switch to another browser when debugging a browser-specific bug.
From what I've seen in practice, for a lot of teams and projects, running tests concurrently across multiple browsers isn't as easy as you claim. Once you have a database involved and not a perfectly clean setup, concurrent testing can become a hassle and is rarely worth the effort of fixing.
It's also quite easy to run Chrome and/or Firefox for free in CircleCI or TravisCI. Safari and Edge are non-trivial to test in those environments. (I think you can get Safari if you tell CircleCI you're an iOS app and hack your way over to Safari from there, but I don't know if you can even test Edge in the popular CI stacks.)
For unit tests, yes. (In fact, we use Jest which uses JSDom and Node). For integration or end-to-end automation, we normally run them in a wider suite of browsers.
I've been working on something similar (Headless Chrome via DevTools protocol) called Webfriend. It is a Python wrapper to the DevTools protocol, as well as a simplified imperative scripting environment which is specifically built for ease of use by people with a technical-but-not-programming background (lovingly called Friendscript).
It's by no means done, but it is functional and I'm hoping to see the project grow beyond a toy if there's interest in the community.
Things to note:
- Documentation is there, but there are gaps (especially w.r.t. the Python API.) I'll eventually get around to wrestling with Sphinx, et. al., but have not as of yet.
- Targets Google Chrome / Chromium 58.x - 60.x. No testing outside of those versions has occurred.
- Is Chrome only for the moment, but may evolve to work with WebDriver and other browser APIs in the future.
Also, PRs and issues are welcome, but my time to work on this is limited at the moment (which does speak to the point made elsewhere about corporate backing vs. individual maintainers, but this was built largely to scratch an itch.)
> Crawl a SPA and generate pre-rendered content (i.e. "SSR").
I've been maintaining (thanks to this team and Headless Chrome) a convenience API based on this feature. Some additional features:
* React checksums for v14 and v15 (v16 no longer uses checksums)
* preboot integration for clean Angular server->client transition
* support for Webpack code splitting
* automatic caching of XHR content
AWS Lambda presents a base image that is ready to be provisioned as multiple instances.
If you are familiar with Docker, you can think of a Lambda image as a small Docker image, while the real work is done in instances (Docker containers) created from this image.
The usual scenario is to provide a ready-to-go image (with all source code, npm packages installed, the headless Chrome binary, etc.). AWS/Azure will then run VM instances based on that image for almost every function request. Most of the time, spinning up such lightweight VMs takes no more than a couple of seconds.
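As a rough sketch of what the handler itself looks like (the hard part in practice is packaging a Chromium build that fits Lambda's size limits, e.g. the serverless-chrome builds linked elsewhere in this thread, and pointing Puppeteer at it - the CHROME_PATH variable and the flags below are assumptions):

```js
const puppeteer = require('puppeteer');

exports.handler = async (event) => {
  const browser = await puppeteer.launch({
    executablePath: process.env.CHROME_PATH,              // path baked into the deployment package
    args: ['--no-sandbox', '--disable-gpu', '--single-process'],
  });
  const page = await browser.newPage();
  await page.goto(event.url);
  const pdf = await page.pdf({ format: 'A4' });           // returns a Buffer when no path is given
  await browser.close();
  return { statusCode: 200, isBase64Encoded: true, body: pdf.toString('base64') };
};
```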
Are there any new python drivers for headless chrome? Selenium seems to still be lacking several features exposed by the DevTools protocol (e.g. print PDF)
Yes, I understand it's just a protocol, but it's a pain to write against it by hand, especially when I just want to test it out on some existing projects vs. Selenium.
Thanks for the binding link. Will check it out this weekend.
Pretty psyched to see this. A lot of my automated testing headaches came from the intersection of using PhantomJS with transpiled code - which makes sense, since the Phantom team was always forced to play catch-up with the browsers being emulated.
Awesome! This will make the Boozang/CI integration much smoother. I was using npm-headless-chromium before, but the Xvfb dependency was error-prone and difficult to set up properly. Thank you very much!
Would be nice if it was possible to change the proxy settings for each request or for each session. Last time I checked it was only possible with the C API.
Awesome! This will simplify the Boozang Jenkins integration significantly (I was using npm-headless-chromium before, but there were many manual steps, and the Xvfb dependency was annoying). Thank you very much!
Very cool. In a previous job we ran our JS unit tests on PhantomJS, using a tool that allowed arbitrary code to be piped to Phantom for execution and redirected Phantom's console to stdout. (We did all this so our tests ran on something resembling a real browser, as opposed to jsdom.) Something similar should be possible with this.
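Something similar with Puppeteer might look roughly like this (a sketch; the test harness file and the completion flag are placeholders):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Forward everything the in-page test runner logs to the terminal.
  page.on('console', msg => console.log(msg.text()));
  page.on('pageerror', err => console.error(err));

  await page.goto(`file://${__dirname}/test-runner.html`);    // placeholder test harness page
  await page.waitForFunction('window.__testsDone === true');  // placeholder completion flag
  await browser.close();
})();
```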
We're fans of LucianoGanga's project (and of Chromeless, Doffy, Chrominator, Chromy, Navalia). I can tell you from personal experience that dealing with the raw DevTools Protocol isn't ideal for a developer writing an automation script, so it's clear there's demand for libraries with this higher-level API.
Would love to know if there's a feature parity concern you have or what you'd like to see from puppeteer (or any of these projects). (My personal feature unicorn: generate not a static screenshot but a proper video of a page session)
Chromeless allows for the setting of cookies. I noticed in an issue for Puppeteer that it wasn't seen as a high priority. Our use-case (session-based cookies) relies on a cookie being set in order for our PDF screenshots to work.
Any chance you could give us some insight on when this may be available?
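For what it's worth, current Puppeteer releases do expose page.setCookie, so the session-cookie-then-PDF flow can look roughly like this (a sketch; cookie values and the URL are placeholders):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set the session cookie before navigating, so the page loads authenticated.
  await page.setCookie({
    name: 'session_id',          // placeholder cookie name
    value: 'abc123',             // placeholder value
    domain: 'app.example.com',   // placeholder domain
  });

  await page.goto('https://app.example.com/invoice/42'); // placeholder URL
  await page.pdf({ path: 'invoice.pdf', format: 'A4' });
  await browser.close();
})();
```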
We're mainly using these things for screenshots of existing corporate intranet tools that are then surfaced to Slack. Headless Chrome is finally giving us screenshots that are authentically what you would actually see on the screen. PhantomJS came close but still has issues.
Actually, the release of Puppeteer is a really exciting development. I've been waiting for something like this to happen for some time. We've seen what happened to PhantomJS (almost 2k open issues and the main maintainer stepping down without a successor), NightmareJS (lots of issues left unanswered for months; the project is probably not a strategic part of Segment) and so on. In theory it is great for an individual or an established startup to drive a web browser automation project. But in reality, the scope of web browser automation simply gets out of hand very quickly. There are just too many edge cases to support in a fast-changing domain.
Being driven by a large commercial entity actually has a chance of making it work out. With the browser automation tool and the browser dev team being one team, there can be synergies not possible otherwise. When I spoke to the CasperJS creator some time ago, I could understand why there would be burnout. As for the popular Chromeless project launched less than a month ago, there are already 150+ new issues with 100+ still open, and they already have enough in the pipeline for a few releases ahead. It can be a nightmare to manage.
There are just too many needs from a large user base for such projects. I'm speaking from the context of test automation and general browser automation.
Yep. Pat Meenan's herculean / ultramarathon support of WebPageTest is a remarkable exception. (Speaking of WPT, after just a cursory glance at this thread on my phone prior to thumbing this comment, it surprisingly hasn't been mentioned yet? shrug)
I just recently got to know WebPageTest. It even has scripting abilities! I'm just surprised the project didn't enter the mainstream (in the sense that an average test automation guy like me would know about it).