I'm not a "serverless hater", but every company I've ever worked with had backend processes that were not tied to HTTP requests. I still keep actual servers around because the HTTP gateway is not the pain point. It's long-running processes, message systems, stream processing, and reporting.
That said, I look forward to the company (or side project) where "serverless" can save me from also assuming the "devops" role.
At my last gig, we were using Firebase, Google's acquired-and-increasingly-integrated serverless solution. It was straightforward to have custom GCP instances that integrated with and extended our regular serverless workflows. In that scenario, it meant the compute instances tended to be extremely simple, as they were essentially just glorified event handlers.
Interestingly, as Firebase evolved during our use, nearly all of our external-instance use cases were obsoleted by more powerful native serverless support, esp. around functions.
All of which is the best of both worlds for serverless: an easy escape hatch to custom instances, and an ever-decreasing need for that escape hatch.
Hello, I'm building a serverless platform. Could you please expand your "It's long-running processes, message systems, stream processing, and reporting" bit?
Not parent, but I have the same question; I worked in adtech and video analytics before, now with social media. It's usually a mix of some REST APIs, which already are very easy to scale and manage without using serverless, with long-running backend processes, such as:
* video encoding;
* ETL processes;
* other analytical workloads;
* long-running websocket connections with Twitter/Facebook/etc APIs.
From my perspective, serverless solves the "boring" part of making REST APIs easier to manage, which were already very easy to manage.
How would serverless be applied to, say, a Python script that streams Twitter tweets using websockets?
You would probably use something like a queue[0] that takes in data from the websocket and dishes it out to lambda functions. You might also use something like Kinesis[1] or other alternatives.
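A rough sketch of that queue idea, using only the Python standard library. Everything here is a stand-in: `queue.Queue` plays the managed queue, the worker threads play the function invocations, and `handle_tweet` and `run_pipeline` are hypothetical names, not any provider's API.

```python
import queue
import threading


def handle_tweet(event):
    """Stand-in for one function invocation: one event in, one result out."""
    return {"id": event["id"], "length": len(event["text"])}


def run_pipeline(events, worker_count=2):
    """Push events through a queue and let workers call the handler."""
    pending = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            event = pending.get()
            if event is None:  # sentinel: no more work for this worker
                break
            result = handle_tweet(event)
            with results_lock:
                results.append(result)

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for event in events:  # in real life this loop is the websocket reader
        pending.put(event)
    for _ in workers:  # one sentinel per worker
        pending.put(None)
    for w in workers:
        w.join()
    # Workers finish in arbitrary order, so sort for a stable result.
    return sorted(results, key=lambda r: r["id"])


if __name__ == "__main__":
    sample = [{"id": 1, "text": "hello"}, {"id": 2, "text": "serverless"}]
    print(run_pipeline(sample))
```

The handler never sees the websocket. It only sees one event at a time, which is the property that lets a platform scale it independently of the reader.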
Yes, of course, or I could send it into Kafka instead (which makes more sense to me). The point is, what would a serverless process look like that doesn’t have a REST API and does this long-term polling of websockets?
It depends on what you're doing with that stream. Most basically, you would create a nano/micro EC2 instance that just triggers Lambda events on every new tweet. Or you could create a more intricate script that does a lot of pre-processing and then stores the results in RDS or S3, with each new update to either of those stores kicking off a Lambda.
Unless the API can stream directly into one of those sources you'd probably need a long-running process, perhaps running on a CaaS like AWS Fargate.
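For what it's worth, the long-running piece can stay very small. Here is a minimal sketch, assuming the stream is any iterable of dicts and the function trigger is any callable that takes a JSON string. `bridge`, `stream`, `invoke`, and `batch_size` are names made up for illustration; in a real setup `invoke` might wrap an asynchronous call from the provider's SDK, which is not shown here.

```python
import json


def bridge(stream, invoke, batch_size=1):
    """Read events from a long-lived stream and hand them to a function trigger.

    stream: any iterable of dicts (stand-in for a websocket reader).
    invoke: any callable taking a JSON string (stand-in for a function trigger).
    batch_size: 1 means one invocation per event; larger values group events,
        which is an optional extra and not something the thread asked for.
    Returns the number of invocations made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    calls = 0
    for event in stream:
        batch.append(event)
        if len(batch) >= batch_size:
            invoke(json.dumps(batch))
            calls += 1
            batch = []
    if batch:  # flush whatever is left when the stream ends
        invoke(json.dumps(batch))
        calls += 1
    return calls


if __name__ == "__main__":
    received = []
    tweets = ({"id": i} for i in range(7))
    print(bridge(tweets, received.append, batch_size=3))
```

The process holds the connection open and does nothing else, so it can run on the smallest instance or container available.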
I guess you could argue about where to draw the "serverless" line, at functions or containers, but Zeit is calling this container service "serverless", so I think Fargate would fall into the same category. I think it would make sense for Zeit to eventually support long-running containers too (it looks like the current max is 30 minutes; I'm not sure how they chose that number).
Serverless is fantastic for ETL and data analysis, especially for workloads that vary in scale (e.g. cronjobs). Feed data in, get data out, with scaling as needed.
But how do you feed data in? Usually, it's some other service on one of the big 3 cloud providers. I'm using Google for my projects these days, so it's a mix of Google Pub/Sub and Dataflow.
I think this is the issue/risk with serverless. You either get locked into one of the big 3, or you end up doing all of the ops work to run your own stateful systems. As some of the people above you said, managing and scaling the stateless HTTP components is not the hard/expensive part of the job.
Can't you use a queue service that's essentially just managed Kafka/ActiveMQ/other-standard-system? I mean, sure, if you wanted to move off the cloud vendor you'd have to run your queues yourself, but if you're programming to the API of a well-known open-source queue system, then you're never going to be very locked into a particular vendor.
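One way to picture "programming to the API" is to keep application code behind a small interface of your own. This is a sketch under stated assumptions, not the client API of Kafka or ActiveMQ: `MessageQueue`, `InMemoryQueue`, and `forward_all` are all hypothetical, and the in-memory class only stands in for a real broker client.

```python
from collections import deque
from typing import Optional, Protocol


class MessageQueue(Protocol):
    """The only queue surface the application code is allowed to touch."""

    def publish(self, topic: str, message: bytes) -> None: ...

    def poll(self, topic: str) -> Optional[bytes]: ...


class InMemoryQueue:
    """Stand-in backend; a managed or self-hosted broker would replace it."""

    def __init__(self) -> None:
        self._topics = {}

    def publish(self, topic: str, message: bytes) -> None:
        self._topics.setdefault(topic, deque()).append(message)

    def poll(self, topic: str) -> Optional[bytes]:
        pending = self._topics.get(topic)
        return pending.popleft() if pending else None


def forward_all(mq: MessageQueue, topic_in: str, topic_out: str) -> int:
    """Application logic: depends on the interface, not on any vendor."""
    moved = 0
    while (message := mq.poll(topic_in)) is not None:
        mq.publish(topic_out, message.upper())
        moved += 1
    return moved


if __name__ == "__main__":
    mq = InMemoryQueue()
    mq.publish("raw", b"hello")
    print(forward_all(mq, "raw", "clean"), mq.poll("clean"))
```

Swapping vendors then means writing one new class with the same two methods, although operational differences between brokers would still leak through.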
The short answer is yes, you can do that, but it starts to get nuanced rather quickly. The context here is a desire to go “serverless”, and solutions like this only give you serverless for the relatively easy parts of your stack. If your goal is to go “serverless”, I take that to mean the few things listed below.
1) you don’t have to manage infrastructure
2) you don’t have to think about infrastructure (what size cluster do I need to buy?)
3) you pay for what you use at a granular level. (GB stored, queries made, function invocations, etc)
4) scale to zero (when not in use, you don’t pay for much of anything)
Most things don’t hit all of these points, but typical managed services hit very few of these points. Sure, I can use a managed MySQL, but it only satisfies 1 of the 4 points.
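The four points work as a checklist. A minimal sketch of that scoring follows, with the caveat that the thread only says managed MySQL satisfies 1 of the 4; which point it satisfies (assumed here to be point 1) and the `Service` class itself are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    no_infra_management: bool   # point 1: nothing to manage
    no_capacity_planning: bool  # point 2: no cluster sizing decisions
    granular_billing: bool      # point 3: pay per GB, query, invocation
    scales_to_zero: bool        # point 4: idle costs next to nothing

    def score(self) -> int:
        """Count how many of the four points the service satisfies."""
        return sum(
            [
                self.no_infra_management,
                self.no_capacity_planning,
                self.granular_billing,
                self.scales_to_zero,
            ]
        )


# Assumption: the one point managed MySQL satisfies is point 1.
managed_mysql = Service("managed MySQL", True, False, False, False)

if __name__ == "__main__":
    print(managed_mysql.name, managed_mysql.score(), "of 4")
```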
How does one get locked in when it’s a simple function in X language? Seriously, serverless is just an endpoint they provide. You write the code and they handle everything else.
Because the function is the stateless, easy part. To make any nontrivial system in a serverless way, you have to use their proprietary stateful systems. In my case: Google Pub/Sub, Google Dataflow, Google Datastore, Spanner, etc. That’s where the lock-in happens.
Right, because serverless is actually just a cover for "de-commoditizing" the cloud services that companies like AWS built to commoditize datacenters. You hit the nail on the head. It's not completely useless: it helps less technical people solve the problems that folks like you and me consider "the easy part", so people will find a use for it.
But the primary utility of serverless is an attempt at solving Amazon's problem of being commoditized by containers.
I’d say something more nuanced. Serverless is increasing commoditization of one layer of the stack at the cost of de-commoditizing a higher layer of the stack. This is what makes it a hard decision to grapple with. You’re getting very real benefits from it, and potentially paying a very real cost sometime down the road when being locked into the proprietary system bites you.
I think all that is still possible in Serverless. I'm not a serverless architect or anything, but that's typically handled by various serverless queues and related event systems.
Break your operation into a series of discrete tasks. For 99% of use cases, if you have a discrete task that takes 5+ minutes, there's a problem. In most cases, it can be split up.
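As a rough illustration of splitting work up, here is a sketch that cuts one large job into index ranges small enough to run as separate short tasks. `split_into_tasks` and `process_range` are hypothetical helpers, and summing numbers stands in for whatever the real per-task work would be.

```python
def split_into_tasks(total_items, max_items_per_task):
    """Return (start, stop) index ranges that together cover every item."""
    if max_items_per_task <= 0:
        raise ValueError("max_items_per_task must be positive")
    return [
        (start, min(start + max_items_per_task, total_items))
        for start in range(0, total_items, max_items_per_task)
    ]


def process_range(items, start, stop):
    """Stand-in for one short-lived task; here it just sums a slice."""
    return sum(items[start:stop])


if __name__ == "__main__":
    items = list(range(10))
    tasks = split_into_tasks(len(items), 4)
    # Each range could be its own queue message and its own invocation.
    partials = [process_range(items, start, stop) for start, stop in tasks]
    print(tasks, sum(partials))
```

This only works when the partial results can be combined afterwards. Jobs with a strict sequential dependency need a different decomposition.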