Dependent on Deep Learning Models

gwern · on July 16, 2022

#1, and run at a loss. You have no idea if anyone even wants your startup idea or if you have any product-market fit, so worrying about training your own model is putting the cart before the horse. If no one wants your product even ignoring the cost of ML, then you don't need to worry about any of that!

If you do potentially have a viable idea and it turns out there are >0 people out there willing to pay for it, then you can experiment with cheaper models and optimizations as necessary. ('People want our product we sell at a loss so much we may go bankrupt' is a good problem to have.) Also, consider that as time passes, your problem may be solved for you: lots of people moaned and whined about the OA API and wrung their hands about how no one would ever be able to afford to train their own GPT-3 - but here we are, just over 2 years later, and you have a wealth of alternatives in API or FLOSS models, like Jurassic or GPT-J/NeoX-20B or YaLM or OPT or BLOOM or... Even if none of those work for you or can be finetuned or something, it is also now easier than ever to train your own: countless bugs have been worked out, better training recipes have been documented, newer and better GPUs have come out (A100s are no longer rare, and H100s are coming soon), and older GPUs themselves are enjoying a pricing correction.

chronicler · on July 17, 2022

This was really insightful, thanks for taking the time to respond.

Jugurtha · on July 16, 2022

>a user would never buy it at this price.

You don't know that yet.

>Here are my options as I see them, if you feel there’s another route please don’t hesitate to add:

- Leverage the expensive APIs until I raise enough money to train my own models.

- Start with a subpar model that is trained on a limited dataset (this runs the risk of damaging the perception of how good the product can be).

These are not mutually exclusive if you offer tiers priced proportionally to the model's performance or, better yet, to the value the users get from the model whatever its performance.

A model with subpar absolute performance from a "metrics" standpoint can still be good enough for what the user is trying to do.

I'm not sure how much experience you have with ML/DL, but one important point when you serve a client is to find out what matters most to them. You want to know the impact/cost of a false positive, the actions your model triggers, and how many false positives you can get away with.

For example, one company we interacted with wanted to predict an event which, if it happened, would cause a loss in the nine figures. What's the consequence of an alert? You wake up an engineer if they're asleep and they have one on rotation anyway. The downside is nine figures and possibly catastrophic consequences (lives, environmental, financial, infrastructure, and actually destabilizing the supply of energy for a country and unforeseen ripples). They're willing to accept some false alerts.

What's the consequence of a false positive, and what's the consequence of a false negative?
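To make the false positive vs. false negative point concrete, here is a rough sketch of picking an alert threshold by cost rather than by accuracy. Everything in it is made up for illustration: the scores, the labels, the cost figures, and the helper names `total_cost` and `best_threshold` are assumptions, not anything from the client work described above.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost on a sample if we alert whenever score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        alert = score >= threshold
        if alert and not label:
            cost += cost_fp  # false alert, e.g. waking the on-call engineer
        elif not alert and label:
            cost += cost_fn  # missed event, the expensive failure
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    # Try every observed score, plus "never alert" (infinity).
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Hypothetical validation data: model scores and whether the event happened.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.90]
labels = [0, 0, 1, 0, 0, 1, 1]

# Equal costs: the cheapest threshold tolerates one missed event.
symmetric = best_threshold(scores, labels, cost_fp=1, cost_fn=1)

# A miss costs 1000x a false alert: the threshold drops so nothing is missed.
asymmetric = best_threshold(scores, labels, cost_fp=1, cost_fn=1000)

print(symmetric, asymmetric)
```

With equal costs the sketch settles on a threshold of 0.6 and accepts a miss; once a miss is priced at 1000 false alerts, it drops to 0.2 and accepts two false alerts instead. Same model, same "metrics", very different operating point.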

Also, consider timeliness. The client originally asked for a prediction 48 hours before the event. I asked, "At what point would it be useless to alert you?" They answered that it was never useless and that even if we alerted them 2 minutes before, they have protocols and mitigation measures and they can do things. They'll have to do them really, really fast, but they know what to do and they're trained for these situations. So we brought the required lead time from 48 hours down to 2 minutes.
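A small sketch of why the lead-time requirement matters so much when you evaluate a model: the same alerts can look bad against a 48-hour requirement and fine against a 2-minute one. The event and alert times below are invented, and `caught_in_time` and `timely_recall` are hypothetical helper names, not part of any library.

```python
from datetime import datetime, timedelta


def caught_in_time(event_time, alert_times, min_lead):
    """True if some alert fired at least `min_lead` before the event.

    Simplification: an alert fired arbitrarily early still counts.
    """
    return any(alert <= event_time - min_lead for alert in alert_times)


def timely_recall(events, min_lead):
    """Fraction of events that had at least one sufficiently early alert."""
    hits = sum(
        caught_in_time(event_time, alerts, min_lead)
        for event_time, alerts in events
    )
    return hits / len(events)


t0 = datetime(2022, 1, 1)
# Made-up history: (event time, times at which the model alerted).
events = [
    (t0 + timedelta(days=10), [t0 + timedelta(days=7)]),  # 72 hours ahead
    (t0 + timedelta(days=20), [t0 + timedelta(days=19, hours=18)]),  # 6 hours ahead
    (t0 + timedelta(days=30), [t0 + timedelta(days=29, hours=23, minutes=50)]),  # 10 minutes ahead
    (t0 + timedelta(days=40), []),  # never alerted
]

strict = timely_recall(events, timedelta(hours=48))
relaxed = timely_recall(events, timedelta(minutes=2))

print(strict, relaxed)
```

On this toy history the model catches 1 of 4 events under the 48-hour requirement and 3 of 4 under the 2-minute one, without changing anything about the model itself.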

You want to know what's important and what problem they're trying to solve.