Show HN: Chromeless – Headless Chrome Automation on AWS Lambda (github.com/graphcool)
385 points by schickling on July 26, 2017 | 89 comments



I'm really excited to finally open-source Chromeless. We've used NightmareJS and similar tools before to run integration tests but these basically added ~20min to each build. With Chromeless we were able to reduce this time to under a minute!

Here is btw a demo playground to try it out: https://chromeless.netlify.com/

Let me know if you have any questions :)


How did you manage to get Chrome headless compiled and running on AWS?

I spent a while looking at this a few months ago to do a similar project (run chrome in headless mode on AWS Lambda) but couldn't get the binary to work.


We're using serverless-chrome for the headless-chrome-on-aws-lambda part. https://github.com/adieuadieu/serverless-chrome

If you're interested in how serverless-chrome works, I wrote an article on how to get headless chrome running on AWS Lambda from scratch here: https://medium.com/@marco.luethy/running-headless-chrome-on-...


How difficult would it be to also support google cloud functions and the azure offerings? This seems like a really useful standard tool that lots of people might want to use. CI jobs on pull requests that take seconds instead of minutes = big win!


Azure Container Instances might be good for this: https://azure.microsoft.com/en-us/blog/announcing-azure-cont...


This should actually be pretty easy. There is no reason that this is bound to AWS Lambda. We're more than open to accept PRs that enable multi-cloud support.


+1 for Google Cloud Functions.


shameless plug: i've also written a high level api on top of the chrome remote debugger, chrominator [1]

similar idea. chrominator uses promises instead of a fluent api. it also follows the selenium w3c spec where possible. it does cool stuff with evaluate and evaluateAsync where it resolves the remote object to something usable.

to be fair there are a few other projects i know about that wrap chrome remote debugger with a high level api:

* autogcd [2]

* ghostjs [3]

[1] https://github.com/jesg/chrominator

[2] https://github.com/wirepair/autogcd

[3] https://github.com/KevinGrandon/ghostjs


Shameless plug: I've been hacking on headless chrome in AWS Lambda but with selenium webdriver support [1], also using the binaries from the serverless-chrome [2] project.

To echo another comment on this thread, headless chrome seems well-positioned to shake up the automated browser testing market. The price—especially with the AWS Lambda free tier—is very, very compelling for a number of projects.

[1] https://github.com/smithclay/lambdium [2] https://github.com/adieuadieu/serverless-chrome


[1] Chrominator looks interesting. jesg/gowan is the maintainer for PhantomJS GhostDriver, so the API looks extensive. Another project that I heard about recently is Navalia [4], which uses GraphQL-style queries to run headless Chrome. Also worth noting is Chromy [5] by a Japanese developer; it's one of the earliest to market (released in early May). The BackstopJS maintainer also looks to be actively developing with Chromy.

[4] https://github.com/joelgriffith/navalia

[5] https://github.com/OnetapInc/chromy


Wrote a post with the list of new entrants to browser automation using headless / visible Chrome -

https://medium.com/@kensoh/chromeless-chrominator-chromy-nav...


This looks great! As the developer of an automated screenshot solution (https://urlbox.io), one of the major pain points when taking screenshots is font-rendering. I wonder how you could install/configure fonts on lambda?


@schickling - When will the PDF support arrive? https://github.com/graphcool/chromeless/blob/master/docs/api...


I can't give you an exact estimate of when it will arrive, but it seems like a lot of people are already asking for it here: https://github.com/graphcool/chromeless/issues/5#issuecommen...

Please feel more than welcome to take a stab at it yourself and create a PR for this feature! :)


Also interested, this would be an excellent fit for a use case I have archiving certain important government websites.

Btw, does the .viewport() option not work in the demo? I'm seeing a `TypeError: Failed to fetch` when I set one.



Thanks!


Interested in this as well!


This would have been awesome to have back when I was heavily into the UI test automation game. Our best option at that point was a spot instance EC2 fleet and analyzing commits to determine which tests would be the most valuable to run. It's awesome being able to easily run hundreds or thousands of tests in parallel, completely segmented, and pay only on demand. A fantastic use of AWS Lambda! It suddenly becomes reasonable to do full integration tests on every merge request, or even commit, and get feedback to the developer in seconds.


Do the math. If you're suggesting running all the tests, drastically increasing your compute volume, it's probably cheaper to use a dedicated instance running at max CPU utilization.
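As a rough sketch of that math: the per-GB-second price and the hourly instance price below are illustrative assumptions, not figures quoted from AWS, and per-request charges are ignored. The free compute quota is the 400k GB-seconds mentioned elsewhere in this thread.

```javascript
// Illustrative prices only; check the current AWS pricing pages before relying on them.
const PRICE_PER_GB_SECOND = 0.00001667; // assumed Lambda compute price (USD)
const FREE_GB_SECONDS = 400000;         // monthly free compute quota
const INSTANCE_PRICE_PER_HOUR = 0.10;   // hypothetical dedicated instance price (USD)
const HOURS_PER_MONTH = 730;

// Monthly Lambda compute cost for `runs` invocations of a given size and duration.
function lambdaMonthlyCost(runs, memoryMb, secondsPerRun) {
  const gbSeconds = runs * (memoryMb / 1024) * secondsPerRun;
  const billable = Math.max(0, gbSeconds - FREE_GB_SECONDS);
  return billable * PRICE_PER_GB_SECOND;
}

// Monthly cost of keeping one instance running around the clock.
function instanceMonthlyCost() {
  return INSTANCE_PRICE_PER_HOUR * HOURS_PER_MONTH;
}

// Compare a few monthly volumes of 2-minute, 512 MB test runs.
for (const runs of [1000, 50000, 200000]) {
  console.log(runs, lambdaMonthlyCost(runs, 512, 120).toFixed(2), instanceMonthlyCost().toFixed(2));
}
```

Under these assumed prices, low volumes cost nothing on Lambda and high sustained volumes favour the always-on instance; where the crossover sits depends entirely on the real prices and run durations you plug in.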


It absolutely depends on how burst driven your utilization pattern is, but don't forget the cost to actually manage the test cluster, and the engineering cycles that go into optimizing and sharding those tests across the cluster.


For sure, lots of considerations to take into account. If you're already running your own CI pipeline though, tests on top of it are trivial (in my experience running a Jenkins CI pipeline for a python monolith). Tweak your test runner, upsize the EC2 instance, and go grab a mojito.


It's of course based on what's more important for you: Running tests non-stop and perfectly utilizing a compute instance vs having the tests executed when you need the results as soon as possible. I might argue that in most cases the latter is what you'd actually want. Especially given that you're getting billed on a millisecond basis.


I would love to have something like this for my scala API tests. We have 3k tests running on 10 ec2 instances and the entire test-suite takes 10 minutes. If some infrastructure would allow me to pay a little more and run all tests in parallel, that would be a game changer for our team.


This sounds very familiar. I'm amazed every time to see what serverless infrastructure can be used for!


This is a really cool project, but looking closer at the API and issues raised it seems that the features are being over-promised.

- "Do pretty much everything you've used PhantomJS, NightmareJS or Selenium for before".

The main features of those tools plus their ability to handle a large range of edge cases are built up over the years in production use and do not seem to be already in Chromeless. Also, Lambda costs can be a significant point of consideration for professional test automation with large volume.

Nevertheless, there's no turning back as flood gates have been opened and many developers are noticing Chromeless. I believe, with enough dedication from Chromeless maintainers, they may be able to channel the attention and contributions to shape Chromeless to be the main challenger to existing test automation approaches. That will really be a blessing to the open-source community!

The only catch, I believe, is that it may be easier for those existing tools to be made to work in Lambda, or to implement a similar form of parallelism while keeping their mature API, than for Chromeless to catch up to the maturity of those tools. But as they say, growth solves almost every problem, so issues like these may be ironed out through collaborative efforts from contributors/maintainers.


Hi kensoh, thanks a lot for your great comment.

I totally agree with you! It took years for these tools to mature and so it will be the case for Chromeless. There is probably a range of edge cases that have yet to be solved but like you said, I'm very optimistic that together with our great community, we'll be able to handle all of these cases.

The big incentive for us to create Chromeless instead of using Nightmare or similar (which I've done for years) was the fact that you can now use headless Chrome (which provides a way more stable foundation) + the ability to execute the code on AWS Lambda which solves the parallelisation question. I hope this makes sense to you :)


I'm really excited for you guys, it looks like your implementation through AWS Lambda has struck a chord. I really can't wait to see how Chromeless maintainers & contributors give back to the open-source community by challenging existing ways of doing things and introducing new innovations in this space.

Yes perfectly, I believe for any one very serious about test automation or browser automation in general, they will make their own tool and bring it to market if there isn't already one that meets their needs. :)


I've been using serverless-chrome for the past few weeks and this looks like a big improvement in usability!

https://github.com/adieuadieu/serverless-chrome


Chromeless is actually built on top of serverless-chrome and was developed together with the author of serverless-chrome :)


;-)


Really cool collaboration and project! :)


I've got the impression that lots of sites block AWS IP addresses. I wonder if this would hamper the practical use of this on Lambda.

I'm doing something similar, and this concern was one motivation for running in our datacentre vs EC2.

Does anyone have concrete info on rates of bots blocked from AWS IPs?


I assume the number one use of this would be test automation for one's own sites so blocking would not be an issue.

What are sites' motivations for blocking AWS IPs? I bet there are some reasons I would agree with even though the somewhat crude method of blocking ip range would have some unintended consequences (e.g. blocking people running a personal VPN).


>What are sites' motivations for blocking AWS IPs?

I block AWS. So many crawlers up to so much nonsense! I don't block by IP, but by hostname.

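  // Assumption: $rh (not defined in the original snippet) is the visitor's reverse-DNS hostname.
  $rh = gethostbyaddr($_SERVER['REMOTE_ADDR']);
  $block = '.amazonaws.com';
  $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
  $block_visitor = false;

  // Block hosts under amazonaws.com unless the user agent looks like a Silk or Safari browser.
  if (stripos($rh, $block) !== false &&
      stripos($ua, 'Silk') === false &&
      stripos($ua, 'Safari') === false) {

      $block_visitor = true;
      $message = "Blocked Host:Amazon Web Services";
  }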


Just curious what have you seen crawlers do to make you conclude they're up to nonsense?


Well, from amazonaws.com, there are so many requests for wp-login.php!

And then all the off-brand scraping companies use amazonaws.com.


What do you mean by off-brand scraping? You mean search engines that you haven't heard of, or copyright violating orgs?


Some AWS visitors:

Cliqzbot, VidibleScraper/1.0, CheckMarkNetwork, CCBot/2.0 (http://commoncrawl.org/faq/), linkdexbot/2.2; +http://www.linkdex.com/bots/

That last one is a "SEO platform".


Indeed, so many of AWS’s IP ranges are used for DDoS and malicious behaviour that they end up getting blacklisted due to their poor reputation. It’s a bit like using a third-party reseller’s shared IP ranges for your mail relay: you’re asking to end up on reputation-based lists. There’s a lot to be said for your IP reputation on the internet, and when you outsource that, you outsource your freedom to manage your reputation risk.


Fairly simple (if this is like the other programmatic headless browsers) to make a request through a proxy.


And where do you run the proxy?


In my experience this isn't happening.


How do cookies work in Chromeless? Can I specify a cookiejar to use? Can I keep cookies separated?


How is this different from using a container/vm image which has chrome pre-installed and on request launch it in headless mode, accessing the instance via chrome-launcher and manipulating the browser with chrome-remote-interface?

You can then use the vm/container as a function to match AWS lambda.

Is it that the API is more user-friendly or Selenium W3C compliant?

Genuinely curious, don't know much about this project.


I've been super excited about Chrome headless but haven't had a chance to dig into using it yet. The api here looks amazing for getting started without getting lost in the weeds. It'd be fairly trivial to hook this up to a Slackbot and to get on-demand screenshots of various pages on my websites, etc.


A month ago I was trying to migrate some tests from phantomjs to chrome headless. The big blocker I ran into was that chromedriver automation has certain functionality that only works with a special chrome extension that it installs, but headless chrome doesn't support extensions.

There's work being done to fix this, but it's still in progress, last I checked. Until then, resizing windows, taking screenshots and a handful of other things simply don't work.


The last time I tried headless Chrome, file downloads were a PITA. Has anyone tried downloads with Chromeless?


Other than taking a screenshot and evaluating JS code in the context of Chrome and returning JSON, we haven't yet implemented any file-download features. But it might be possible for us to implement something. Would you mind creating an issue here describing your use case so we can discuss it further? https://github.com/graphcool/chromeless/issues/new


By design, headless Chrome disables file downloads. This is being tracked at this issue to offer a way to enable that, and the issue seems to be moving along =) https://bugs.chromium.org/p/chromium/issues/detail?id=696481

EDIT - above is assuming downloading a file by simulating a click event to perform the download. there may be other workarounds by script injection etc to use XMLHttpRequest() for downloading a resource directly.


I'll just add some related projects I've used / tried in the past.

The promise of fast execution time in parallel is tempting with Chromeless. Thanks for sharing.

- https://github.com/webdriverio/webdriverio

- https://github.com/nightwatchjs/nightwatch

- https://github.com/assaf/zombie

- https://github.com/dhamaniasad/HeadlessBrowsers


Thanks a lot for bringing this up. We've tried all of the projects listed above before we began to implement Chromeless.

Ultimately it was the combination of using headless Chrome and the ability to execute code in parallel on Lambda, which made us invest in Chromeless.


How hard do you think it would be to update Nightmare/etc. to use headless Chrome under the hood? Asking as someone with some interest in the space but little experience with the codebases.


Another platform that supports headless chrome: https://devexpress.github.io/testcafe/

The error reports it gives are some of the best.


This is pretty nifty! I've been keeping an eye on Phantomium for a while, I wonder what's come out of it.


Looks like Google releasing headless Chrome is really shaking up the test automation domain. This is a really cool project :) The other day I saw another interesting and refreshing implementation using GraphQL - https://github.com/joelgriffith/navalia (not affiliated with the project, just got to know from a PhantomJS issue).


Looks super promising, the API is really neat and running the tests in parallel is a big plus! Awesome work!


(Shameless promoting): for an API version of the prerendering functionality, with no warm-up latency, we're running a large cluster of Chrome headless instances here: https://www.prerender.cloud/


Sounds like a great service and very useful for frontend developers.

We've built something similar internally and will shortly migrate it to Chromeless. We basically use it to pre-render our websites and docs: https://github.com/graphcool/prep


We've been using chrome-remote-interface for test automation in a project that makes heavy use of Lambda for a distributed event processing infrastructure. I'm looking forward to seeing whether we can implement this for running our test automation suite!


Would love to hear how that goes. Please reach out if you have any problems or questions. The easiest way is to ping me on Slack: https://slack.graph.cool


One minor housekeeping comment:

The first two examples seem to return 404, for me:

https://github.com/graphcool/chromeless#examples


Thanks a lot. This is fixed now!


So to clarify - this is basically a Node.JS wrapper around Chrome headless, right? =_)

Seems pretty awesome.

My use case is to take screenshots of various pages - the docs don't mention the default viewport dimensions, btw.


Yes exactly + this can run (in parallel) on AWS Lambda, so you don't need to worry about provisioning & running servers. That's actually the part I'm most excited about :)


It's parallel because AWS Lambda is inherently parallel? Or are you referring to within JavaScript?


Parallel in that you can invoke many Lambda functions at the same time and have them run independently of each other
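To make that concrete, here is a small local simulation, not Chromeless or Lambda code. `runAutomation` is a hypothetical stand-in that just sleeps; the point is only that when every run starts at once, total wall time tracks the slowest run rather than the sum of all runs.

```javascript
// Hypothetical stand-in for one remote automation (e.g. one Lambda invocation).
// It only sleeps for `ms` milliseconds and then reports its id.
function runAutomation(id, ms) {
  return new Promise((resolve) => setTimeout(() => resolve({ id, ms }), ms));
}

// Start every automation at once and wait for all of them to finish.
async function runAll(durations) {
  const started = Date.now();
  const results = await Promise.all(
    durations.map((ms, id) => runAutomation(id, ms))
  );
  return { results, wallMs: Date.now() - started };
}

// Example: 20 simulated automations taking between 10 ms and 48 ms each.
const durations = Array.from({ length: 20 }, (_, i) => 10 + 2 * i);
const totalMs = durations.reduce((a, b) => a + b, 0);

runAll(durations).then(({ results, wallMs }) => {
  console.log(`${results.length} runs, sequential ~${totalMs} ms, parallel ~${wallMs} ms`);
});
```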


So basically if I had 200 automations I could run them on 200 lambdas and have them finish by the time the slowest one finishes? That's pretty awesome, especially for testing. For many cases this would also fall under the free tier since it's not that many requests/usage...it kinda seems too good to be true. Am I missing something?


> For many cases this would also fall under the free tier since it's not that many requests/usage...it kinda seems too good to be true

(since I ran into the same trap a couple of weeks ago and ended up with USD 650 of unanticipated charges): the free Lambda tier includes

* 1M requests

* 400k GB-seconds

--> the GBsec can be a serious bottleneck. Imagine you're running each Lambda instance with 512MB RAM and each instance takes 2 mins to complete your test. This means that one Lambda instance is ~61.5 GBsec, meaning you can execute ~6,500 of these instances per month to remain in the free tier.

Depending on how extensive your tests are/how often you run them, you might run out of free GBsec well before you'd run out of the requests quota.


Granted that means running ~216 instances per day, or 9 instances per hour (taking 18 minutes per hour to run total). Now you're right, if you're running a screenshot service then this will kill you real fast.

However assuming a 8 hour work day you then get ~27 instances per hour. Each test takes two minutes to run, so for a single user testing, assuming a code - test - code - test routine, you'd be able to do that nearly continuously, for 8 hours a day, every day of the month (no weekends or days off). Seems safe to assume that wouldn't occur.
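The free-tier arithmetic in this subthread can be sketched as follows. This assumes 512 MB is counted as 0.5 GB, which gives 60 GB-seconds per two-minute run rather than the ~61.5 quoted above, so the totals come out slightly higher than the ~6,500 and ~216 figures; the order of magnitude is the same.

```javascript
const FREE_TIER_GB_SECONDS = 400000; // monthly free compute quota cited above

// GB-seconds consumed by one invocation.
function gbSecondsPerRun(memoryMb, seconds) {
  return (memoryMb / 1024) * seconds;
}

// How many such invocations fit into the free quota per month.
function freeRunsPerMonth(memoryMb, seconds) {
  return Math.floor(FREE_TIER_GB_SECONDS / gbSecondsPerRun(memoryMb, seconds));
}

const perRun = gbSecondsPerRun(512, 120);    // one 2-minute run at 512 MB
const perMonth = freeRunsPerMonth(512, 120); // free runs per month
const perDay = perMonth / 30;                // spread over a 30-day month
const perWorkHour = perDay / 8;              // spread over an 8-hour work day

console.log({ perRun, perMonth, perDay: Math.round(perDay), perWorkHour: Math.round(perWorkHour) });
```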


Yep. That's one of the main reasons why we're so excited about this project!


Haven't had time to read the source yet, how are the lambda headless chrome reliability issues dealt with? Is it a different chrome build than serverless-chrome?


For now, we're using the workaround proposed here: https://github.com/adieuadieu/serverless-chrome/issues/41#is...

The Chromeless Proxy service uses the @serverless-chrome/lambda package as is. Same build.


Nope, welcome to the serverless future! :)


You guys should put up an example of how to convert a (small) test suite to use it, like you said you did in your top comment. The examples you have are cool but don't really help visualize that parallelism gain.


Sounds like a great idea. Would you mind creating an issue here so we can track this? https://github.com/graphcool/chromeless/issues


Done.


There's a section in the readme which I'm not sure about: "Running integration Tests for example is much faster." I guess that gain comes from the parallel execution. But for integration testing, there is still work to be done to make a central controller / runner that distributes the tests to different Lambda instances, correct? If that's the case, could load testing be another use case for Chromeless? Just initiate in parallel and blast the app infrastructure. It won't be as cost-efficient as using non-real browsers, but this is probably as realistic as it gets for simulating real user load through real browsers.
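A minimal sketch of what such a central runner might look like, assuming each test is wrapped in an async function. This is not how Chromeless does it; `runWithLimit` and `fakeTest` are hypothetical names, and the concurrency cap is there because an account may only be allowed a limited number of simultaneous invocations.

```javascript
// Run async task functions with at most `limit` of them in flight at once.
// Results are returned in the same order as the tasks.
// If any task rejects, the returned promise rejects with that error.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next unstarted task
      results[index] = await tasks[index]();
    }
  }

  // Start up to `limit` workers; each keeps pulling tasks until none are left.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical test invocation; a real runner would call the remote function here.
const fakeTest = (name) => () =>
  new Promise((resolve) => setTimeout(() => resolve(`${name}: ok`), 5));

runWithLimit(["login", "signup", "checkout"].map(fakeTest), 2).then(console.log);
```

For load testing the same runner could be pointed at many copies of one scenario, with `limit` acting as the number of simulated concurrent users.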


agreed no default is mentioned but you can declare with `await chromeless.viewport(1024, 800)`


That seems to break for some reason, but you can set it like this:

https://chromeless.netlify.com/#src=const%20chromeless%20=%2...


Can I use this somehow in a container? Farm out headless chrome + selenium on a local data center? I would be grateful for any hints.


We haven't done so yet, but it shouldn't be a problem at all to set up Chromeless in a container environment. Would you mind creating an issue here to discuss this further? https://github.com/graphcool/chromeless/issues/new


Really awesome. Will definitely be useful for a project I am working on.

Any plan to support other languages besides JS?


Can any headless browser solution be considered complete without GPU support? A large percentage of sites use GPU enabled and accelerated features and without GPU support, headless options are worthless to many applications.


Wish it had a python wrapper.


There is no reason why you shouldn't be able to build one :) You should even be able to reuse the existing Lambda backend.


naming gore.


Nice tool

Some API documentation says:

pdf() - Not implemented yet

Not implemented yet

How about you deprecate the API for now but reveal the purpose please.


Cool. Now can you use it with webdriver functions instead of re-inventing the API?



