Hacker News: rusht's comments

One thing to note is the license: AGPL vs Apache.


Curious why you decided to use MongoDB as opposed to HBase or Cassandra? Discord moved away from MongoDB because its sharding is “complicated to use and not known for stability.” Are these issues not relevant anymore?


HBase is asking for operational pain. Brownouts of region servers, Hadoop dependencies and a million knobs to tune. Add Kerberos and that's the road to hell.


Hi, one of the developers here. There was no specific reason why we went with Mongo at the start, but there have been talks about moving to a different database if needed. Currently mongodb is fine for us.


Just want to suggest that if you're interested in doing this in the future, some investment in defining interfaces up front is worth doing. I took a quick look at the codebase and it looks like mongo is "in there pretty good"[0] without any abstraction to make it easily shimmable.

Just a little specification around that interface (trait) will go a long way toward making other backends possible, and should make it much easier to know and manage the API contract a capable database must provide.
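To make the suggestion above concrete, here is a minimal Rust sketch of what such a storage trait could look like (Rust because the linked delta server is written in it). The `Message` type, the method names, and the in-memory backend are all hypothetical illustrations, not Revolt's actual schema or API; a real backend would most likely expose async methods and a richer error type.

```rust
use std::collections::HashMap;

/// Hypothetical message record; the fields are illustrative, not Revolt's schema.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub id: String,
    pub channel: String,
    pub author: String,
    pub content: String,
}

/// The contract every storage backend must satisfy. Application code depends
/// only on this trait, never on a concrete driver, so MongoDB, ScyllaDB,
/// SQLite, etc. could each provide their own implementation.
pub trait MessageStore {
    /// Insert or overwrite a message by id.
    fn save_message(&mut self, msg: Message) -> Result<(), String>;
    fn fetch_message(&self, id: &str) -> Result<Option<Message>, String>;
    /// Returns true if a message was actually removed.
    fn delete_message(&mut self, id: &str) -> Result<bool, String>;
}

/// In-memory backend: a reference implementation and a handy test fixture.
#[derive(Default)]
pub struct MemoryStore {
    messages: HashMap<String, Message>,
}

impl MessageStore for MemoryStore {
    fn save_message(&mut self, msg: Message) -> Result<(), String> {
        self.messages.insert(msg.id.clone(), msg);
        Ok(())
    }

    fn fetch_message(&self, id: &str) -> Result<Option<Message>, String> {
        Ok(self.messages.get(id).cloned())
    }

    fn delete_message(&mut self, id: &str) -> Result<bool, String> {
        Ok(self.messages.remove(id).is_some())
    }
}

/// Example handler that is generic over the backend.
/// Returns Ok(false) if the message does not exist.
pub fn edit_content<S: MessageStore>(
    store: &mut S,
    id: &str,
    new_content: &str,
) -> Result<bool, String> {
    match store.fetch_message(id)? {
        Some(mut msg) => {
            msg.content = new_content.to_string();
            store.save_message(msg)?;
            Ok(true)
        }
        None => Ok(false),
    }
}
```

Handlers such as `edit_content` never touch a driver directly, so adding a new backend means writing one more `impl MessageStore`, and the in-memory store doubles as a fixture for unit tests.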

[0]: https://github.com/revoltchat/delta/blob/master/src/database...


> Currently mongodb is fine for us.

As someone who has administered too many MongoDBs, those are some famous last words.


Cassandra or HBase are not exactly a utopia either, though.


Can confirm.

We (Discord) moved off of MongoDB for various reasons and are quite happy about that decision but managing Cassandra/Scylla clusters is not exactly a walk in the park either.


If you had to do it all over again, would you still start with MongoDB or would you go with Cassandra right off the bat?


I didn't make the original decision but if I were starting something and I had no idea whether or not it'd be successful, I'd do whatever was the absolute fastest way to get to MVP. That'd probably be a cloud database, honestly -- but a modern MongoDB would be technically fine too (licensing stuff notwithstanding.)

Most startups fail not because they picked a suboptimal database for their usage but because they didn't build something that was good or it didn't achieve product market fit. I wouldn't worry about your database over-much in the beginning (unless it's critical to what you're doing and in that case, worry like hell, but you will probably know if that's the case.)

Many of Discord's issues with Mongo were exacerbated by the fact that we were using TokuMX, which was abandoned shortly after we started using it. A few years into Discord, we found ourselves with a rapidly scaling dataset and user base built on top of an abandoned and not super popular third-party version of MongoDB. (Funny story: at one point towards the end we realized that all of the packages had been pulled from every mirror we could find, and literally the only place we could find the package files was some gov.uk mirror... that was a bad day. Thankfully we had the hashes and were able to validate the packages.)

FWIW, we did honestly debate moving our core user model (which was what was left in TokuMX by the end there) into a modern version of MongoDB -- some of the things we did (reverse indexes, secondary indexes, locking, etc) are much more complicated in a database like Scylla. It was tempting to just migrate the data from one "Mongo" to another and call it a day.
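For readers wondering why reverse and secondary indexes get harder in a database like Scylla: one common pattern in wide-column stores is to keep a second, denormalized table keyed by the lookup field and keep it in sync from application code. The sketch below simulates that with two in-memory maps; the `User` fields and table names are hypothetical, and it ignores the distributed-consistency problems a real cluster adds on top.

```rust
use std::collections::HashMap;

/// Hypothetical user record.
#[derive(Clone, Debug, PartialEq)]
pub struct User {
    pub id: u64,
    pub email: String,
    pub username: String,
}

/// Simulates two tables in a store that only supports primary-key lookups.
#[derive(Default)]
pub struct UserTables {
    users_by_id: HashMap<u64, User>,    // primary table
    ids_by_email: HashMap<String, u64>, // hand-maintained reverse index
}

impl UserTables {
    /// Insert or update a user while keeping the reverse index in sync.
    pub fn upsert(&mut self, user: User) -> Result<(), String> {
        // Uniqueness check. In a real distributed store this read-then-write
        // is racy unless the database offers some form of conditional write.
        if let Some(&owner) = self.ids_by_email.get(&user.email) {
            if owner != user.id {
                return Err(format!("email {} already in use", user.email));
            }
        }
        // If the email changed, the stale index entry must be removed by hand.
        if let Some(old) = self.users_by_id.get(&user.id) {
            if old.email != user.email {
                self.ids_by_email.remove(&old.email);
            }
        }
        self.ids_by_email.insert(user.email.clone(), user.id);
        self.users_by_id.insert(user.id, user);
        Ok(())
    }

    /// Lookup by email costs two reads: index table, then primary table.
    pub fn by_email(&self, email: &str) -> Option<&User> {
        self.ids_by_email
            .get(email)
            .and_then(|id| self.users_by_id.get(id))
    }
}
```

In a relational database or MongoDB, the unique index and the stale-entry cleanup above are handled by the engine; here every write path has to remember to do it.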

We didn't, for a variety of reasons, not least of which is keeping things simple by reducing the number of technologies you have in production (like when we chose to embrace Rust, we went back and migrated nearly all of our Go systems).

Anyway, I'm pretty happy with not running MongoDB anymore, but not because MongoDB is inherently bad. It's popular for a reason!


Really appreciate this great, detailed answer! 100% agree with getting to an MVP with PMF as quickly as possible should be the top priority for a startup.


Discord has a good write-up of their migration from Mongo to Cassandra: https://blog.discord.com/how-discord-stores-billions-of-mess...
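As I understand that write-up, Discord partitions messages by channel plus a fixed-size time bucket (10 days), with the bucket derived from the timestamp embedded in each Snowflake message ID, so that no single partition grows without bound. The Rust sketch below assumes Discord-style Snowflakes (milliseconds since 2015-01-01T00:00:00Z stored above the low 22 bits); the helper names are hypothetical and this is illustrative only.

```rust
/// Discord's Snowflake epoch (2015-01-01T00:00:00Z) in Unix milliseconds (assumed).
pub const DISCORD_EPOCH_MS: u64 = 1_420_070_400_000;
/// Ten days in milliseconds.
pub const BUCKET_SIZE_MS: u64 = 1000 * 60 * 60 * 24 * 10;
/// The low 22 bits hold worker/process/sequence data, not time.
const LOW_BITS: u32 = 22;

/// Milliseconds since DISCORD_EPOCH_MS encoded in the upper bits of an ID.
pub fn snowflake_timestamp_ms(snowflake: u64) -> u64 {
    snowflake >> LOW_BITS
}

/// Build a Snowflake-style ID from a Unix timestamp (for testing/illustration).
pub fn snowflake_from_unix_ms(unix_ms: u64, low: u64) -> u64 {
    ((unix_ms - DISCORD_EPOCH_MS) << LOW_BITS) | (low & ((1 << LOW_BITS) - 1))
}

/// Which 10-day bucket a message ID falls into.
pub fn bucket_for(snowflake: u64) -> u64 {
    snowflake_timestamp_ms(snowflake) / BUCKET_SIZE_MS
}

/// Partition key: (channel_id, bucket). Reading recent history only touches
/// the newest bucket(s) of a channel rather than one ever-growing partition.
pub fn partition_key(channel_id: u64, message_id: u64) -> (u64, u64) {
    (channel_id, bucket_for(message_id))
}
```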


If one can afford it, MongoDB Atlas is really just fine. As someone said, at scale, all technologies need significant know-how and time investment.


It is not "just fine". They do not have encryption on the wire by default, and enabling it requires an enterprise plan, which is priced as a percentage of income.


As I said, if one can afford it. Also, I'm not sure what you mean by wire encryption being enterprise-only and income-based. Their plans are priced by size.


I worked with them and had meetings with them in person. SSO and TLS were Enterprise-only features, and enterprise pricing was percentage based. It was insanely expensive.


Interesting. It's probably different now. We use both SSO and TLS, and TLS is on by default now. Maybe they changed their plans since you spoke with them.

https://docs.atlas.mongodb.com/reference/faq/security/


Thanks, seems like they indeed changed it (good on them! this always bugged me.)

> Atlas requires TLS connections for all Atlas clusters. After July 2020, Atlas will enable Transport Layer Security (TLS) protocol version 1.2 by default for all new Atlas clusters regardless of the MongoDB version.

The last time I interacted with them was in 2019.


I would strongly urge you to consider migrating to an alternative database, given MongoDB's SSPL license that is difficult to comply with.


What are the best NoSQL databases? Half of our devs don't like SQL much (I've tried to get them to use ORMs too; it didn't work).


I'd probably recommend ScyllaDB for NoSQL. It's a replacement for Cassandra (which it's mostly compatible with) which is more or less the de facto standard for large NoSQL deployments at big companies, but it's written in C++ rather than Java so it's even faster (and has more consistent latency) and easier to deploy. And it's been around long enough at this point that it's established and not likely to just disappear.

It's a shame your devs don't like SQL. It's probably my most useful developer skill. Saves so much time elsewhere. Having said that, a messaging app that you really want to scale HUGE (like Discord or Facebook Messenger huge) is one place where the NoSQL solutions are justified.


ScyllaDB looks interesting but that AGPL license is a no go for some organizations.


True, but Revolt seems to be using AGPL themselves, so I doubt it would be an issue for them.


> half of our devs dont like sql much

SQL the language or defined schema and relational data?

I'm honestly curious what it means to "not like" SQL, and what/why one would prefer e.g. MongoDB.

Edit: it looks like they're running quite close to the DB, but there doesn't seem to be much tooling?

https://github.com/revoltchat/delta/blob/master/src/database...

https://github.com/revoltchat/delta/blob/master/src/database...


> half of our devs dont like sql much

I'd suggest that you and your team not rule out SQL-like databases, given that a lot of very competent NoSQL databases have SQL-like query languages (e.g. Cassandra or ScyllaDB, which is what Discord went with). And regarding hiding it behind an ORM: if you want or need cream-of-the-crop performance, not only do you need to choose a database that fits your needs, you will also need to occasionally work very close to the database to avoid abstraction inversion.


“Best” depends on your use case. I don’t even think a SQL database is an option for a chat app, since it’s write-heavy. FB Messenger uses HBase and Discord uses Cassandra, and they’ve done their research at scale, so those could be possible options.


Which self-hosted instance would ever grow to the size of Facebook or Discord? Just because Discord took one approach doesn't mean a competing product should too.


I am assuming they want to provide a hosted service just like Discord at some point.


unQLite or SQLite would be a great way to encourage self-hosting.


People opting to use a meme "database" instead of a real DBMS is kind of a large red flag to me. I am hard pressed to imagine data more relational than chat messages and user accounts.


There's a reason all the big chat apps are using NoSQL.


I refuse to believe this is due to technical reasons, as long as we are talking ACID compliant data storage. A well configured PostgreSQL will blow MongoDB out of the water while actually caring about your data.


I’d be interested in OP’s answer here as well.

I know Dapr injects a sidecar, so you can code in any language and then use their SDK to call other microservices. Encore looks to be Go-centric and uses code generation to create actual Go functions, which provides much better IDE support.

I did not see this in the docs, so I’m not sure whether Encore supports retries with or without exponential backoff; Dapr supports both.
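For reference, retries with exponential backoff boil down to doubling the wait after each failed attempt, up to a cap. The minimal Rust sketch below shows the idea; it is not how Dapr or Encore implement it, the function names are hypothetical, and production code would usually add random jitter so that many clients don't retry in lockstep.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul saturate to `max` instead of overflowing.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Call `op` up to `max_attempts` times, sleeping with exponential backoff
/// between failures. Returns the first success or the last error.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 1 ms base and an 8 ms cap, the waits between attempts are 1, 2, 4, 8, 8, ... ms. Without backoff (a fixed delay), a struggling downstream service keeps receiving retries at full rate, which is usually the opposite of what it needs.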


Hey HN,

I am Rush, one of the makers of Onepanel. Onepanel is an open source, production-scale vision AI platform with fully integrated components for model building, automated labeling, data processing and model training pipelines.

We built Onepanel to significantly reduce the complexity of managing infrastructure and disparate tooling, while at the same time allowing teams to easily integrate their own tools into reproducible pipelines.

Under the hood, we integrate our own and other best-of-breed open source components [0] to provide a seamless user experience and abstract away the infrastructure complexities that come with running parallelized data processing and training pipelines on different cloud providers. We leverage Kubernetes and deploy cloud-provider-specific components for networking, network policies, autoscaling, automated TLS certificate provisioning, logging, GPU plugins and more [1].

Our near future goals are to add APIs for inference and VNC enabled workspaces [2] so teams can also run simulation environments inside of Onepanel.

We're excited to share Onepanel with the HN community and look forward to hearing your feedback! And of course we welcome and encourage any contributions [3].

[0] https://github.com/onepanelio/core#acknowledgments

[1] https://github.com/onepanelio/manifests

[2] https://github.com/onepanelio/templates/tree/master/workspac...

[3] https://github.com/onepanelio/core#contributing


Hey Rush,

Very impressive work! Do you know how much it would cost per month to self-host on a cloud provider (i.e. the operating cost of the underlying infrastructure without running any jobs)? I am very interested in trying this out and filled out the form. I'd love to talk to you about how you came up with the product, how much work it was, and so on. Please drop me a line if you have some time :) Cheers


Thanks! You can see the estimated hourly cost and list of required resources for each cloud provider in our "Cluster information" section [0]. Would be happy to talk about the product and the effort, I will reach out to you to discuss.

[0] https://docs.onepanel.ai/docs/deployment/cluster/cluster


Hey HN,

I am Rush, one of the makers of Onepanel. Onepanel is a Kubernetes-native deep learning platform for computer vision with fully integrated components for model building, semi-automated labeling, data processing and model training pipelines.

We built Onepanel to significantly reduce the complexity of infrastructure and disparate tooling, so teams can be productive at every step of their workflow while still having the flexibility to change those tools and bring their own.

Under the hood, we integrate our own and other best-of-breed open source components [0] to provide a seamless user experience. We also try to abstract away some of the complexities of Kubernetes by deploying cloud-provider-specific components for networking, network policies, automated TLS certificates, logging, GPU plugins and more [1].

Our near future goals are to add serverless APIs for inference and VNC enabled workspaces [2] so teams can also run simulation environments inside of Onepanel.

We're excited to share Onepanel with the HN community and would love to hear your feedback! And of course we welcome and encourage any contributions [3].

[0] https://github.com/onepanelio/core#acknowledgments

[1] https://github.com/onepanelio/manifests

[2] https://github.com/onepanelio/templates/tree/master/workspac...

[3] https://github.com/onepanelio/core#contributing


Link to the HuggingFace model contributed by Microsoft: https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-u...



Out of curiosity, what type of project was this? Did you end up rewriting the entire 100K LOC in Rust? I ask because Docker and Kubernetes are large projects and, as far as I know, using Go hasn't been an issue for them.


It actually has. Try asking around among people who used to work on those projects.


Any links?


I assume they mean most of the features that were once in separate apps are now built into the OS.


pgadmin4's Query Tool [0] has a similar analyzer.

[0] https://www.pgadmin.org/docs/pgadmin4/1.x/query_tool.html

