Grouparoo: Open-source app to sync customer data with 3rd party tools (grouparoo.com)
103 points by kine on Sept 14, 2020 | 41 comments



Thanks for checking us out! Co-founder here, happy to answer any questions. There is so much to do in this space, but we’re excited to be getting started.

No engineer wakes up in the morning excited to sync data to Marketo, so we started there: `npm install` Grouparoo and get back to building the core product. We make data self-serve for your non-technical colleagues, and we handle all the exhausting integration work you don’t want to think about (API nuances, rate limiting, retrying, batching, etc.).
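To make those integration chores concrete, here is a minimal sketch of batching plus retrying with exponential backoff against a rate-limited API. It is not Grouparoo's code: `exportAll`, the `SendResult` shape, the 500 ms base delay, and the 30 s cap are all assumptions for illustration.

```typescript
// Hypothetical result of one destination call (not a real Grouparoo type).
type SendResult = { ok: true } | { ok: false; retryable: boolean };

// Split records into batches no larger than the destination's limit.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs (values are assumptions).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Send every batch, retrying retryable failures (e.g. rate limits) with backoff.
// Returns the number of records sent; throws if a batch ultimately fails.
async function exportAll<T>(
  records: T[],
  batchSize: number,
  send: (batch: T[]) => Promise<SendResult>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts = 5,
): Promise<number> {
  let sent = 0;
  for (const batch of chunk(records, batchSize)) {
    for (let attempt = 0; ; attempt++) {
      const result = await send(batch);
      if (result.ok) {
        sent += batch.length;
        break;
      }
      if (!result.retryable || attempt + 1 >= maxAttempts) {
        throw new Error(`batch failed after ${attempt + 1} attempt(s)`);
      }
      await sleep(backoffDelayMs(attempt));
    }
  }
  return sent;
}
```

Passing `sleep` in as a parameter keeps the retry loop testable. A real exporter would also want idempotent writes (for example, upserts keyed on a user id) so that a retried batch does not create duplicates.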


First, thanks for providing the `npm install` way to run it. Too many apps require Docker and that's it.

Question: Could we use Grouparoo to replace mixpanel? Would we need to build the client side to collect events and dump that into Grouparoo?


At the moment, it depends what you are using Mixpanel for. If it's about collecting events and storing them, then we have a JS web library [1] for that and we'll happily store them in our database and let you do things with them. It will handle anonymous users and convert/merge them when they log in.

[1] https://www.npmjs.com/package/@grouparoo/client-web


How does this differ from something like Stitch (other than being open source)? We are in the planning stages of a data warehouse upgrade and Stitch seems to fit our needs, but your product looks great. I'm curious how it compares.


We think open source is important, of course, because it has cost and control ramifications, but I'll stay focused.

Stitch looks to be doing ELT in the Fivetran-ish space. Their "sources" are lots of tools and their "destinations" are warehouses. Grouparoo can have sources like Mailchimp (did they read the mail), but Mailchimp is more likely to be a "destination" for us.

This is because we do something closer to ETL into those tools. In our current case, the "T" is property and segmentation definitions, often done by end users like marketers. That means our users also include non-engineers. There's less burden on engineers after setting up Grouparoo because the person in charge of Mailchimp is doing those definitions.


How does this compare to Piesync?

There are many tools that do one-way integration from source to destination. Very few do real two-way sync. Is Grouparoo designed for that?


Piesync looks cool. I don't see any databases on their list, so it looks to be mainly syncing between SaaS apps. We are open source and running in your own cloud as more of an ETL process.

Grouparoo is set up with sources and destinations. The most common sources are likely to be databases/warehouses and the most common destinations are tools like Marketo/Zendesk/etc. That being said, Zendesk will likely be a source one day to pull user info into Grouparoo (number of tickets created, lastEmailedAt, etc). Databases can already be destinations (write the members of the VIP group back out to Postgres, in addition to sending them to Marketo and Zendesk).

I'm not sure whether all of that counts as two-way sync, but it's certainly round-tripping the data. If you really want a full duplicate of a SaaS tool in your data warehouse, I'd look into Stitch or Fivetran at this point. Then, Grouparoo will happily read that :-)


What is your monetization strategy? How are you going to turn Grouparoo into a profitable business?


We are doing an open core model. At some point in the future, we'll have an enterprise edition.

The philosophy I've heard (maybe from HashiCorp?) is that the core should solve the data problem and the enterprise edition should solve organizational problems. So the sources, destinations, data syncing, and all that will stay in the core. At some point, we can do single sign-on, change management, GDPR support, compliance, etc. in the enterprise version.


So, what's informed your making this open source?


For us, open source is about who has control of the data and the integrations. I don't think the world needs another SaaS marketing tool.

In the past, you needed a large, focused SaaS vendor to be able to store a million users and their properties/events. AWS and friends have caught up and now it's easy. Because of that, we can take that data into your own environment and use it to increase control, customization, privacy, and compliance. The cost is significantly better, too.

Open source is a good way to do that because you know what you are running and can fit it to your needs. Engineers tend to like open source and we've seen interest in extending it. There are thousands of things to connect with, both inside your infra and outside with vendors, and open source makes that possible.


Does grouparoo need to deal with GDPR? If so, how is it handled?


Good question!

GDPR is about giving users control and visibility over their personal data and controlling personally identifiable information.

Grouparoo is an open source app that runs in your own infrastructure (AWS, etc) and it does segmentation of users. The effect of that is that less info leaves your world and goes to third parties. For example, you used to send an address and lifetime value to Braze so that you could make a “high value Bay Area customers” group over there. Now you keep that in house.

On top of that, whatever information is leaving now has a chokepoint, so you can stop sending a user to Braze (and everywhere else) if that is the requirement or return all the information about them easily if that is the ask.
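A minimal sketch of that chokepoint idea, assuming every outbound export goes through one in-house gate object. `ExportGate`, `Profile`, and `Destination` are hypothetical names for illustration, not Grouparoo's API.

```typescript
// Hypothetical types for illustration; not Grouparoo's data model or API.
interface Profile {
  id: string;
  properties: Record<string, string | number>;
}

interface Destination {
  name: string;
  send(profile: Profile): void;
}

// Single chokepoint: every outbound export passes through exportProfile.
class ExportGate {
  private readonly destinations: Destination[];
  private readonly suppressed = new Set<string>();
  private readonly exportLog: { profileId: string; destination: string }[] = [];

  constructor(destinations: Destination[]) {
    this.destinations = destinations;
  }

  // Opt-out or erasure request: stop sending this profile anywhere.
  suppress(profileId: string): void {
    this.suppressed.add(profileId);
  }

  // Returns the names of the destinations the profile was sent to.
  exportProfile(profile: Profile): string[] {
    if (this.suppressed.has(profile.id)) return [];
    for (const d of this.destinations) {
      d.send(profile);
      this.exportLog.push({ profileId: profile.id, destination: d.name });
    }
    return this.destinations.map((d) => d.name);
  }

  // Access request: list every tool that has received this profile.
  destinationsFor(profileId: string): string[] {
    const names = this.exportLog
      .filter((e) => e.profileId === profileId)
      .map((e) => e.destination);
    return Array.from(new Set(names));
  }
}
```

Because suppression is checked in one place, a single `suppress` call covers Braze and every other destination at once.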


Wow, this is exactly what I started looking for at the start of the day.

Unfortunately I can't use it yet as there's no Postgres SSL support - filed a GitHub issue here: https://github.com/grouparoo/grouparoo/issues/734


Postgres connector enhanced! https://github.com/grouparoo/grouparoo/pull/735

Let us know on the issue if it's working for you.


Thanks! That should be an easy one. What was your goal when you started the day? :-)


Syncing our customer user data to Zendesk without having to write our own syncing service.


Sounds perfect for us. We have some questions on that issue. Let's make it happen!


At a glance, this looks cool - I'm in the middle of designing an internal Customer Data Platform/Single Customer View to serve as a single data source for various internal & external marketing tools. Looks like this might be worth looking at.

Does it have an API to access the user profile and group data ad-hoc?

Can you stream data in?

Can it trigger a destination sync when the underlying data changes?

Can it do profile merging (visitor -> known customer stitching)?

How can you do reporting/analytics?


Great questions, thanks!

> Does it have an API to access the user profile and group data ad-hoc?

We’ve seen a few approaches here. 1) Yes, there are APIs. 2) The Postgres database this runs on is in your data center, so you can read it directly. 3) You can write back to your own product database as a “destination”.

> Can you stream data in?

We support events via an API. We’ll store the events and allow you to create profile properties from them. I’m very interested in creating a Kafka or other message bus sort of integration too that brings in data and/or triggers recalculations. No one has needed that yet, though, so it’s just on our eventual list.

> Can it trigger a destination sync when the underlying data changes?

We have schedules, table queries, and events to know profile data has changed. When it changes, it then recalculates groups. When properties or groups that are being sent to a destination change, it automatically exports there. “Hey Mailchimp, the user changed their first name and should now be tagged as VIP.”
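Roughly, the flow described above is: recompute groups, compare what a destination would receive with what it last received, and export only when they differ. The sketch below illustrates that diffing step; `planExport`, `GroupRule`, and comparing snapshots via `JSON.stringify` are assumptions, not Grouparoo's implementation.

```typescript
// Hypothetical shapes for illustration only; not Grouparoo's data model.
type Properties = Record<string, string | number | boolean>;
interface GroupRule {
  name: string;
  matches: (p: Properties) => boolean;
}
interface Snapshot {
  properties: Properties;
  groups: string[];
}

// Recompute group membership from the current properties (sorted for stable comparison).
function computeGroups(props: Properties, rules: GroupRule[]): string[] {
  return rules.filter((r) => r.matches(props)).map((r) => r.name).sort();
}

// Keep only the properties this destination maps, in a fixed key order,
// so unrelated property changes do not trigger an export.
function pick(props: Properties, keys: string[]): Properties {
  const out: Properties = {};
  for (const k of keys) if (k in props) out[k] = props[k];
  return out;
}

// Returns the snapshot to send, or null when nothing the destination sees changed.
function planExport(
  current: Properties,
  rules: GroupRule[],
  mappedKeys: string[],
  lastExported: Snapshot | null,
): Snapshot | null {
  const next: Snapshot = {
    properties: pick(current, mappedKeys),
    groups: computeGroups(current, rules),
  };
  const unchanged =
    lastExported !== null && JSON.stringify(lastExported) === JSON.stringify(next);
  return unchanged ? null : next;
}
```

In the Mailchimp example, a first-name change plus crossing a VIP threshold yields a new snapshot, while a change only to an unmapped property yields `null`.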

> Can it do profile merging (visitor -> known customer stitching)?

We have the concept of anonymous id before login. When we realize two profiles are the same (usually after logging in from another device or something), the profiles are merged and everything recalculated.
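A hypothetical in-memory sketch of anonymous-to-known profile merging. `IdentityStore` and its conflict rule (the known profile's properties win) are assumptions for illustration; Grouparoo's actual merge and recalculation logic may differ.

```typescript
// Hypothetical profile shape; not Grouparoo's data model.
interface Profile {
  ids: Set<string>; // anonymous ids and/or known user ids
  properties: Record<string, string>;
  events: string[];
}

class IdentityStore {
  private byId = new Map<string, Profile>();

  // Look up the profile for an id, creating an anonymous one if needed.
  profileFor(id: string): Profile {
    let p = this.byId.get(id);
    if (!p) {
      p = { ids: new Set([id]), properties: {}, events: [] };
      this.byId.set(id, p);
    }
    return p;
  }

  track(id: string, event: string): void {
    this.profileFor(id).events.push(event);
  }

  // Called when an anonymous visitor logs in: fold the anonymous profile
  // into the known one. Known-profile properties win on conflict.
  identify(anonymousId: string, userId: string): Profile {
    const known = this.profileFor(userId);
    const anon = this.profileFor(anonymousId);
    if (anon === known) return known;
    known.properties = { ...anon.properties, ...known.properties };
    known.events.push(...anon.events);
    anon.ids.forEach((id) => {
      known.ids.add(id);
      this.byId.set(id, known); // every old id now resolves to the merged profile
    });
    return known;
  }
}
```

After a merge, groups and exports would be recalculated from the combined profile, as described above.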

> How can you do reporting/analytics?

This hasn’t been a focus so far outside of our ETL mechanics. You can see who has been imported and exported and with what and when and all of that.

Things get more interesting around properties and their values, but we haven’t gotten there yet. We’ve seen some success at pointing tools like Metabase at the Grouparoo database.


Cool thanks, just installed it now to have a look. Any timeline for when S3 and Redshift will be available as a source & destination?


Let me know how it goes: brian [at] grouparoo.com

Redshift is available now. We've talked about S3, but we're looking for input on what kind of formats to read/write. Email me or make an issue [1] with what you were hoping for.

[1] https://github.com/grouparoo/grouparoo/issues/new?assignees=...


I worked with the Grouparoo guys years ago at TaskRabbit. They're a super impressive team. Excited to watch their progress with Grouparoo!


Oh wow, this reminds me of Hightouch (https://hightouch.io) and Rudder (https://rudderstack.com). It's interesting that all of these are positioned around data warehouses, which are generally very messy to deal with.


Thanks for the links.

Overall, there's an interesting organizational dynamic that we've seen around data enablement. Marketing and other operational teams need it and it's often locked in the product space. It's usually not a priority for the eng team because they are focused on the core product, but the data is there. The important stuff (ETL copy of the product db) is usually not a huge mess.

We are inspired by warehouse tools like Looker that made that accessible to more people, giving them autonomy to be successful. Grouparoo takes that one step further to add on top of the data and make it actionable in all the other places that people want it.


In my experience this scenario is very accurate. Add to it the growing usage of tools like Pipedrive, Pipefy, and Airtable, which the product engineering team is sometimes not even aware of.

As a Rudder and Fivetran user, I can see a very complementary use case for Grouparoo, where the first two are responsible for unifying events and external data in the DW, and Grouparoo syncs user data to other tools.

Two other tools that I saw in this space (not Open Source): https://www.calixa.io/ https://windsor.io/


Overall, we believe events are an overused approach in this space. They clearly have their place for impression data, but product dbs or data warehouses have lots of good stuff that isn't being used enough to drive goals. Then there are all the issues with getting that right in these other tools that no engineer wants to deal with (rate limiting, formatting, schedules, idempotency, etc).

So I hope there's a place for it, because it's the gap we saw and are hoping to solve and share with the community. Email me if you want to chat more: brian [at] grouparoo.com


I see, yeah that makes sense. The integrations on your website kind of clarify your point too.

You are moving up the stack, and providing value at that layer. As a general rule that formula works in most industries.

I'm a big fan of Looker, and I hope to see you guys grow!


Hightouch.io cofounder here! Thanks for the S/O. Our online presence is quite limited so I'll post a summary here...

We've built an e2e marketing automation platform on top of your data warehouse. Marketers can interactively explore their customer base, run targeted campaigns in downstream email/ad/etc tools, and analyze results leveraging all the data they have in their warehouse.

RE: "messy data" -- Totally agree with bleonard's point that overall, the trend is towards data enablement. That said, I don't think any of the solutions in the market (even Looker) suffice. I've attended dozens of calls with Looker users who first say that Looker offers self-service exploration but then fail to retrieve fairly basic information via a Zoom screen-share. The truth is it's really hard to do self-service data exploration generically. I think what's lacking from the "BI market" are more verticalized solutions on top of your warehouse (think UIs like Amplitude's funnel analysis, Intercom or Kustomer's segmentation interface, etc.).

To make our product work, we've built UIs that are super focused on particular tasks as well as a pretty nifty graph-based "modeling layer" that sits above your warehouse (which ideally, you use DBT/Dataform on) to abstract over complex JOINs and such.

This whole space is super fascinating to me. Always happy to exchange notes and talk shop RE: marketing, warehouses, customer data, etc. If you have thoughts, hit me up at tejas [at] hightouch.io.


Founder of RudderStack here. Totally agree that data warehouses are messy, but marketing systems are much messier. In my previous startup, we did a bunch of Marketo implementations, and these are the issues I ran into:

1) Generating a simple report (like how many people came to your website and then opened your emails in the last year) used to take forever. Storing data was costly.

2) You could not generate complicated reports, say combining marketing + product data (e.g. the number of people who came through campaign X and then used the product).

3) You were stuck with their not-so-great UI.

4) Analytics is one use case. Segmentation is another. If you want to create a segment of users (e.g. customers who use the free tier of your product and have become active in the last 7 days) and sync that segment to multiple destinations like email, Salesforce, etc., none of the marketing systems offers a great way to do that.
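The example segment in point 4 can be written as a simple predicate over warehouse data. This sketch assumes a hypothetical `User` shape with `plan` and `lastActiveAt` fields; it is not taken from any of the products discussed here.

```typescript
// Hypothetical warehouse row; field names are assumptions for illustration.
interface User {
  id: string;
  plan: "free" | "paid";
  lastActiveAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// "Free-tier users active in the last `days` days", evaluated relative to `now`.
function activeFreeUsers(users: User[], now: Date, days = 7): User[] {
  const cutoff = now.getTime() - days * DAY_MS;
  return users.filter(
    (u) => u.plan === "free" && u.lastActiveAt.getTime() >= cutoff,
  );
}
```

Defining the segment once, next to the data, means the same member list can be handed to email, Salesforce, and any other destination instead of being rebuilt inside each tool.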

On the other hand, if you can get all the data into your warehouse, you can use best-of-breed tools for storage (Snowflake, etc.), visualization (Looker/ChartIO/Tableau), and so on.

But you would need a product like Segment, RudderStack (and now Grouparoo) to get the data into the warehouse and sync it from warehouse back to different destinations.


Grouparoo made it dead simple to connect a MySQL database and generate information. It's easy to install, and the integrations make it easy to send the right data to each tool you use. No code is necessary to change what data gets sent.


Very cool to see. A lot of folks (Notion, Figma, Loom) use our service (https://getcensus.com) for this purpose if folks are curious to check out a SaaS version of the concept.

We support all the major data warehouses (incl. straight Postgres), connect to a bunch of different applications, and don't store any of your customer data!

Bonus: we recently added native support for dbt :-)


I had a demo of Grouparoo a few weeks ago -- super cool platform that solves so many ETL / integration challenges.


This was such a massive problem at the large company where I used to work. Excited to see the progress!


Congrats guys from the team @ RudderStack. Glad to see more open-source products in this broad customer data space.

Is it fair to say this is more like Segment's Personas product? We see a bunch of use cases for Personas (which we don't have in RudderStack), so we can point them to you guys.

Congrats again on the launch.


Yeah, the group-building part and syncing those (for example to Static Lists in Marketo) is like Personas. It was always funny to me that you had to pay quite a bit for Personas for Segment to actually, you know, segment.

The gap we saw was around understanding the user and segmenting in a way that could be shared across multiple services and the product. And doing so in a way where you controlled the data and the total cost was managed.

It's nice to see there are others open in this space. The trends are certainly in the open direction to provide a lot of value and control in a way that's good for everyone.


Can’t wait to try this out sometime soon. I’ve worked with all the founders and they’re awesome.


What if I need to store all user data (not limited to profile information) in Google Sheets and provide only an interface for a specific usage? I can't tell how well this scenario fits. The use cases are quite simple.


Everything in Grouparoo is somehow tied to a user, so if you have a list of "locations" or something, that's not a good fit at this point. We basically need a "foreign key" to a property of the user.
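A hypothetical sketch of that "foreign key" requirement: rows from a tabular source only become profile data when one column matches a known user property (email here). `mapRowsToProfiles` and the lowercase email matching are illustrative assumptions, not how Grouparoo's Google Sheets source works internally.

```typescript
// Hypothetical row from a tabular source such as a shared sheet.
type Row = Record<string, string>;

interface MappingResult {
  matched: Map<string, Row>; // user email -> row
  unmatched: Row[]; // rows with no user to attach to (e.g. a list of locations)
}

// Attach each row to a known user via `keyColumn`; rows without a match are set aside.
function mapRowsToProfiles(
  rows: Row[],
  keyColumn: string,
  knownEmails: Set<string>,
): MappingResult {
  const matched = new Map<string, Row>();
  const unmatched: Row[] = [];
  for (const row of rows) {
    const key = row[keyColumn]?.trim().toLowerCase();
    if (key && knownEmails.has(key)) matched.set(key, row);
    else unmatched.push(row);
  }
  return { matched, unmatched };
}
```

A sheet of locations with no user column would end up entirely in `unmatched`, which is why it is not a good fit.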

If you do have that, we have support for Google Sheets[1]. You share a sheet with a service account and that allows it to be a source in Grouparoo. From there, you can make groups and send to destinations.

We'd be curious about your use cases. Feel free to make an issue[2] with what you are hoping for and we can discuss there.

[1] https://www.grouparoo.com/blog/google-sheets-source

[2] https://github.com/grouparoo/grouparoo/issues/new?assignees=...


I've worked with the founders in the past and they're legit! I'm excited to see where they'll go with Grouparoo.


Looking forward to tracking this project. I totally get it.



