(i work at airbyte) this was announced last month at our conf to a very positive response (https://movedata.airbyte.com/, keynote)
The context is that Airbyte is now (after pivoting 3x during YC https://airbyte.com/blog/how-we-pivoted-3-times-in-the-1st-m...) the largest/fastest growing open source community (see our github https://github.com/airbytehq/airbyte) of data pipeline connectors[0], so in a sense they have always been free if you are self hosting. But now using them on Airbyte Cloud is going to be free as well aka "we will do your ELT for free no matter the volume as long as our connectors are not GA yet".
I like the idea but practically I wonder how it works out. I feel like I would have a double disincentive to use an alpha/beta connection as a) it might fail and b) it might hit GA and then suddenly my free workflow breaks?
Maybe I am thinking about it wrong and this is more aimed at people who were previously paying for something and now get the same thing for free.
I don't understand. Why would it break when it goes from beta to GA?
and failures happen, anyone who promises you otherwise is lying. you'd have them yourself if you DIY. what helps is having good monitoring, a large open source community with strong first party support, and a good development/testing framework so most breakages can be fixed the same day they happen.
- a) maybe I don't care about 99% reliability and I am attracted by the fact that it's free. But I have no control over when it will stop being free, so I don't implement it.
- b) I have money to pay but I need reliability.
They pull in different directions but both seem to pull away from a free alpha version. I would imagine if you could offer them as alternatives you might get both personas but with this model you kind of lose both?
Again, I am sure this is a hole in my understanding, not the model. Just curious as to where it is.
Hi! Here's how we think about it.
Given the economic context, I would say pricing becomes a strong argument. It's also one of the reasons we built the program: it's a way for us to give back to the community.
Having more usage on our side will help us tremendously in bringing all those connectors to higher reliability standards, as we're exposed to more and more use cases. We do believe most people will be willing to pay for reliability and will stay with us when those connectors become GA. We will also give them a grace period to switch if they want to (I guess Airbyte Open Source will be the best alternative then :) )
For me right now, Airbyte is that tool I wish we had at my last startup.
We were pulling data from a lot of weird places (servers in the back of mom-and-pop vet clinics). This meant writing a lot of one-off scripts to populate our databases. We learned the hard way about scheduling, retries, resource monitoring, error reporting, etc.
Would've loved to have someone else take care of all that for us.
Anyway love letter over.
I'm currently wondering if I can use this to power some of my web scraping scripts....
Airbyte is a godsend for us. It works really well for most use cases. Unfortunately, we had to write our custom thing for a large table (8 billion rows) and "tricked" Airbyte into thinking it had done the first sync. After that, it continued to sync happily.
Hoping that more users will bring more maturity and better solutions to those edge cases.
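For anyone wondering what "tricking" an incremental sync might look like in principle, here is a minimal sketch. It is not Airbyte's actual state format or API: the `incremental_sync` function, the `{"cursor": ...}` state dict, and the `id` cursor field are all hypothetical, chosen only to illustrate the idea of bulk-loading the big table out of band and then handing the sync a pre-seeded cursor so it skips the historical backfill.

```python
from typing import Iterable, Optional


def incremental_sync(rows: Iterable[dict], state: Optional[dict]) -> tuple[list[dict], dict]:
    """Emit rows whose cursor value is past the saved one and return the new state.

    `state` is a hypothetical {"cursor": <last seen id>} dict. None means no
    sync has happened yet, so every row is read (the full initial sync).
    """
    last_seen = state["cursor"] if state else None
    emitted = [r for r in rows if last_seen is None or r["id"] > last_seen]
    # Keep the old cursor when nothing new was emitted.
    new_cursor = max((r["id"] for r in emitted), default=last_seen)
    return emitted, {"cursor": new_cursor}


# Pretend rows up to id 3 were already bulk-loaded by a custom job,
# then hand the sync a pre-seeded state instead of None.
table = [{"id": i} for i in range(1, 6)]
seeded_state = {"cursor": 3}
new_rows, next_state = incremental_sync(table, seeded_state)
print(new_rows)    # [{'id': 4}, {'id': 5}]
print(next_state)  # {'cursor': 5}
```

The point of the design is that the sync only ever trusts the saved cursor, so whoever writes that cursor first decides where "the beginning" is.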
I have recently inherited the small and unsophisticated data engineering practice at my company and it's been a learning experience. The market for tooling seems to be incredibly frothy right now and I'm almost at a loss to make good selections. Is airbyte a direct competitor to stuff like fivetran or AWS Glue?
Yep. Airbyte is the leading open-source solution (with a Cloud-hosted offering too). Fivetran is the leading closed-source one.
Being open source makes the solution future-proof, in the sense that it can address your future long-tail or custom needs. A closed-source solution won't, which means you would have to build and maintain those connectors in-house again.
Congrats! Shameless plug (original author and founder here): a high-performance alternative written in Go (http://github.com/cloudquery/cloudquery). No backend, no UI, everything stateless and as code, and you can run it anywhere, including ECS.
We're exploring whether or not we can support Airbyte connectors on our upcoming Meltano Cloud offering. As it stands we have some community members running Airbyte connectors in their production environment with Meltano and they seem quite happy with it!
As with anything there are tradeoffs though - gaining the ability to have a connector in a non-Python language comes with the overhead of running (likely) Docker-in-docker. Also, connectors not built on our SDK[0] are missing out on some nice features like batch message[1] support (for bulk loading) and stream maps[2] for inline data transformations.
This is a massive commitment to improve the quality of our connectors, which is also something we have been pushing the industry on: https://airbyte.com/blog/connector-release-stages :
Alpha: new, basic docs, works, passes acceptance tests
Beta: Alpha + at least 25 active users + >90% sync success rate + snapshot tests + all streams + severe issues handled + security + supports checkpointing + SLA on cloud
GA: Beta + >99% sync success rate + more than 50 active users + <24 hours downtime + polished docs + performant
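As a rough illustration of how the numeric thresholds in the stages above could be checked, here is a minimal sketch. It models only a subset of the criteria (acceptance tests, active users, sync success rate, checkpointing, docs); the `ConnectorStats` fields and both function names are made up for this example and are not part of any Airbyte tooling.

```python
from dataclasses import dataclass


@dataclass
class ConnectorStats:
    """Hypothetical per-connector metrics; field names are illustrative only."""
    passes_acceptance_tests: bool
    active_users: int
    sync_success_rate: float  # fraction between 0.0 and 1.0
    supports_checkpointing: bool = False
    polished_docs: bool = False


def release_stage(stats: ConnectorStats) -> str:
    """Return the highest stage whose (simplified) criteria are met."""
    if not stats.passes_acceptance_tests:
        return "unreleased"
    # Beta: at least 25 active users, >90% sync success, checkpointing.
    is_beta = (
        stats.active_users >= 25
        and stats.sync_success_rate > 0.90
        and stats.supports_checkpointing
    )
    if not is_beta:
        return "alpha"
    # GA: more than 50 active users, >99% sync success, polished docs.
    is_ga = (
        stats.active_users > 50
        and stats.sync_success_rate > 0.99
        and stats.polished_docs
    )
    return "ga" if is_ga else "beta"


def is_free_on_cloud(stats: ConnectorStats) -> bool:
    """Per the announcement, connectors are free on Cloud until they reach GA."""
    return release_stage(stats) != "ga"


print(release_stage(ConnectorStats(True, 30, 0.95, supports_checkpointing=True)))  # beta
```

The remaining criteria (snapshot tests, security review, downtime, performance) are harder to reduce to a single number and are left out of this sketch.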
It's been going very well; you can see how many connectors we promote to GA each month in our Slack (https://slack.airbyte.io/) and changelogs, and our new low-code CDK (https://www.youtube.com/watch?v=i7VSL2bDvmw) is helping new connectors get promoted to beta almost immediately.
We hope to set the new standard in data integration and this is still only day 1.
[0]: good explainer on why companies are moving towards ELT in the first place for the uninitiated https://airbyte.com/blog/elt-pipeline