Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: I built an OSS alternative to Azure OpenAI services (github.com/bricks-cloud)
200 points by luyuanxin1995 9 months ago | hide | past | favorite | 61 comments
Hey HN, I am proud to show you guys that I have built an open source alternative to Azure OpenAI services.

Azure OpenAI services was born out of companies needing enhanced security and access control for using different GPT models. I want to build an OSS version of Azure OpenAI services that people could self host in their own infrastructure.

"How can I track LLM spend per API key?"

"Can I create a development OpenAI API key with limited access for Bob?"

"Can I see my LLM spend breakdown by models and endpoints?"

"Can I create 100 OpenAI API keys that my students could use in a classroom setting?"

These are questions that BricksLLM helps you answer.

BricksLLM is an API gateway that let you create API keys with rate limit, cost control and ttl that could be used to access all OpenAI and Anthropic endpoints with out of box analytics.

When I first started building with OpenAI APIs, I was constantly worried about API keys being comprised since vanilla OpenAI API keys would grant you unlimited access to all of their models. There are stories of people losing thousands of dollars and the existence of a black market for stolen OpenAI API keys.

This is why I started building a proxy for ourselves that allows for the creation of API keys with rate limits and cost controls. I built BricksLLM in Go since that was the language I used to build performative ads exchanges that scaled to thousands of requests per second at my previous job. A lot of developer tools in LLM ops are built with Python which I believe might be suboptimal in terms of performance and compute resource efficiency.

One of the challenges building this platform is to get accurate token counts for different OpenAI and Anthropic models. LLM providers are not exactly transparent with the way how they count prompt and completion tokens. In addition to user input, OpenAI and Anthropic pad prompt inputs with additional instructions or phrases that contribute to the final token counts. For example, Anthropic's actual completion token consumption is consistently 4 more than the token count of the completion output.

The latency of the gateway hovers around 50ms. Half of the latency comes from the tokenizer. If I start utilizing Go routines, might be able to lower the latency of the gateway to 30ms.

BricksLLM is not an observability platform, but we do provide integration with Datadog so you can get more insights regarding what is going on inside the proxy. Compared to other tools in the LLMOps space, I believe that BricksLLM has the most comprehensive features when it comes to access control.

Let me know what you guys think.




This looks like a useful value-add, but I'd hesitate to call it a replacement for Azure OpenAI. Governance and observability features of Azure OpenAI are really secondary to the stability and reliability guarantees of Azure owning and operating the model instead of OpenAI...


Azure OpenAI services also makes it possible to choose data centers outside the US.


But aeguably, being OSS, you can integrate additional OSS that performs thebafromentiones tasks of Obserbability. Just more hacking things together, but in theory should be doable.


This reminds me of an idea I had for an OpenAI proxy that transparently handles batching of requests. The use case is that OpenAI has rate limits not only on tokens but also requests per minute. By batching multiple requests together you can avoid hitting the requests limit.

This isn’t really feasible to implement if your app runs on lambda or edge functions, you’d need a persistent server.

Here’s a diagram I drew of a simple approach that came to mind: https://gist.github.com/b0o/a73af0c1b63fccf3669fa4b00ac4be52

It would be awesome to see this functionality built into BricksLLM.


They’ve recently added this functionality to AWS Bedrock thankfully. Doesn’t support OpenAI models, but does support Anthropic.

https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-be...


If you can get Claude approved.


how exactly are you intending to batch different prompts together in the openai api? its not like they accept an array of parallel inputs


OpenAI API doesn't support batching afaik.



Embeddings can be batched.


Looks interesting! We're in the middle of building something similar right now for ourselves. We may look at this as an alternative for ourselves.

By the way, I saw this after a quick glance poking through the code. This isn't encryption, it is hashing. Not sure where or how it is used but it is worth a rename at least: https://github.com/bricks-cloud/BricksLLM/blob/main/internal...


You might want to also look at: https://www.getjavelin.io


No commits to the repo for 3 months, doesn’t invoke confidence..


you are right. will update the language


I ended up building (closed-source) product that not only tracks surprise bills in real-time but minimizes them via semantic caching - https://observeapi.ashishb.net/

Demo: https://gptcache.ashishb.net/


No you didn't.

The whole point of using Azure OpenAI over plain OpenAI is the fact that your data doesn't get donated to OpenAI for training (this solves "compliance" for enterprise customers that need it), which you're not solving (because you obviously can't run the OpenAI models in your own data center).


Many here think that I'm afraid of my companies data being used to train models, but that is only half-correct.

The value proposition of Azure OpenAI to the enterprise is that it's bound by the Enterprise Agreement for not just data-use, but data access writ-large. I don't want my data to be accessible for *any reason* by a vendor that isn't covered explicitly in the EA.

While OpenAI says they delete your data after 30-days I have no contract or agreement in place that ensures that, or that within those 30-days they don't have carte blanche to my data for whatever else they may want to do with it.

The Azure OpenAI agreement *explicitly* states they only store my data for 30-days and can *only* access it if they suspect abuse of the service according to the ToC and our EA, which is no different than for other Azure services on my tenant. IIRC that's only after notifying me as well, and cannot exfiltrate that data even if they access it.


I suspect a lot of folks never worked with Enterprise / Gov customers and don't understand the restrictions and compliance requirements (like data residence, access control, FedRAMP, TISAX, reliability SLAs, etc. which you get with Azure but not with some "move fast and break things" startup like OpenAI).

My comment is not dissing on the author, I'm just pointing out what most folks get from Azure is compliance (and maybe safety), and an OSS cannot solve that (unless they're running some other models in owned infrastructure or on-prem).


Yup, and I agree with you.

Even beyond enterprise thought people seem to think that the only thing their data is good for as far as OpenAI is concerned is for training-datasets, but that's just not the case.

The authors of this have built something great, but it doesn't protect you from any of those non-training use-cases.


Even just doing SOC2 we prefer it

Let alone if you are a SaaS your customers may demand it.


> most folks get from Azure is compliance

To be fair, this doesn't prevent problems. Paperwork doesn't plug any security gaps in any cloud provider.


When people say stuff like this on Hacker News, it makes me think even more they haven't done a lot of work with government, or at least not the parts of the government I'm familiar with. Obviously, there are a lot of governments out there. But the FedRamp private enclaves with IL5-certification for CUI handling offered by the major cloud providers are a hell of a lot more secure than OpenAI's servers, and for workloads that require it, the classified enclaves are probably close to impossible to breach if you're not Mossad. Data centers on military installations, no connection to the Internet, private DX hardware encrypted on the installation with point-to-point tunneling through national fiber backbone only, and if you get anywhere near the cables, men in black SUVs suddenly show up out of of nowhere to bring you in and figure out why. I'm not even just saying that as a hypothetical. I've literally seen it happen when AT&T dug too close to the wrong line they didn't even know about because it was used for a testing facility the Navy doesn't publicly acknowledge. And the data they really cared about didn't even use that. It was hand-carried by armed couriers who kept hard drives in Pelican cases.

They may be tedious as fuck to implement and make what should be simple work take forever, but there are plenty of compliance checklists out there that really do give you security.


> When people say stuff like this on Hacker News, it makes me think even more they haven't

I have done work with government and defence. Ad hominem stuff is really pointless.


It potentially plugs the contractual and liability risks, which might be more important (talk to your legal and compliance folks). None of your data is going to launch nuclear missiles, if it leaks it would be unfortunate, but not as much as the litigation and regulatory costs you could potentially incur.

Everyone gets popped eventually. It's your job to show you operated from a commercially reasonable security posture (and potentially your third party dependency graph, depending on regulatory and cyber insurance requirements).

(i report to a CISO, and we report to a board, thoughts and opinions are my own)


> (i report to a CISO, and we report to a board, thoughts and opinions are my own)

That sounds like an interesting role. How did you get there? Did you start as a security analyst and work your way up?


Word of mouth referral into the org, last ~5 years as a security architect/cybersecurity subject matter expert, before that DevOps/infra engineer. 20+ years in tech. I rely solely on network and reputation.

Be interesting to people who can provide you opportunity, and ask whenever an opportunity presents itself. If you don’t ask, the answer is default no. Being genuinely curious and desiring to help doesn't hurt either.


I'm not disagreeing, but I'm making a separate point. I'm familiar with CYA, and need to use it myself, but that doesn't affect my previous point.


Compliance isn’t about preventing problems. It’s about identifying risks and determining who is responsible for mitigating those risks and who is on the hook for damages if the risks aren’t sufficiently mitigated.


OpenAI also does not use API data to train its models.

Source: https://openai.com/enterprise-privacy


Needs more attention. Anything that replicates ChatGPT using OpenAI APIs is valuable because of this fact for enterprise use cases.

https://mitta.ai uses these API calls, has an Open Source code base for inspection, a strong user privacy policy, and doesn't store data in transit, other than documents that are uploaded. I'm working on TTLs for the files stored, so there is no data left behind (other than what might be stored by the user in the DB).


The Azure OpenAI API to me was more about reliability and minimizing downtime.

The OpenAI API also allows opting out of training the OpenAI for training. If you are referring to something else with data, happy to learn.

https://help.openai.com/en/articles/5722486-how-your-data-is...

"API

OpenAI does not use data submitted to and generated by our API to train OpenAI models or improve OpenAI’s service offering. In order to support the continuous improvement of our models, you can fill out this form to opt-in to share your data with us. "


According to OpenAI, they don't use data sent to the API for training and delete it after 30 days. So your concern is misplaced unless you think OpenAI is outright lying about their policies.


Whoa there. That’s only half the story.

https://techcrunch.com/2023/03/01/addressing-criticism-opena...

They said only if they opt in not that they wouldn’t do it.

And they used to and could change those terms again after the market matures and developers have deeply integrated application investments.

Plenty of companies have followed this strategy in the past.

Let’s not oversimplify the situation as it was a recent change as well…


That's only for ChatGPT. The APIs are governed by [0]

> We will only use Customer Content as necessary to provide you with the Services, comply with applicable law, and enforce OpenAI Policies. We will not use Customer Content to develop or improve the Services.

[0]: https://openai.com/policies/business-terms


That language seems to be doing a lot of heavy lifting. A more straightforward phrasing would've been "we will never process or otherwise consume customer content once it has been delivered to you". As written, they could use it to train GPT-5 or what-have-you and refuse to share it with you, making that exempt as it is apart from "services". Or all manner of other shenanigans, if they have competent lawyers.


You can't really read the language at face value. 'Services' is defined term in the contract:

> “Services” means any services for businesses and developers we make available for purchase or use, along with any of our associated software, tools, developer services, documentation, and websites, but excluding any Third Party Offering.

Of course it's still language, and one can quibble with any language, but it's reasonably restrictive.


That’s my point though, they can clearly use your data to train as long as they don’t sell/share those models themselves, as per this contract.

At the end of the day you can’t blindly trust your data is safe, even with a solid contract (bad actors exist, after all). So whatever you do, do it at your own risk.


Microsoft can also change their terms so I don’t see much difference there.

I can understand preferring the Azure endpoints, but let’s be accurate about what the policies are.


+1 i dont think Azure provides that much more in terms of data privacy unless youre saying we should believe Azure's policies but not OpenAI's...


+1


This is a few basic tools put into an api, not a replacement for Azure OpenAI services. I know because I built similar tools to help me run the ChatGPT apis locally and it was a day or two of coding at max, even including calculating accurate token counts and cost.


If I were to self host an open source model like mistral or llama , are there options similar to this as an api gateway to proxy and authenticate , create api keys , monitor spends by api etc .,? How are people running open source LLM”s in production ? Thanks


We do support self-hosted models, as long as they're exposed as an API. Would this endpoint work for you? https://github.com/bricks-cloud/BricksLLM?tab=readme-ov-file...


Congratulations on shipping! We are currently evaluating replacing our homegrown version of an LLM proxy with this project:

https://github.com/BerriAI/litellm

Any comparison or contrast you would point out?


Litellm proxy is a pretty good project on its own. I am obviously biased because we are competitors. Here are my thoughts.

* Litellm is declarative and it let you define everything in yaml * Bricks is not declarative and you control everything via API

* Litellm does not have an UI * Bricks has a non open source UI

* Litellm is written in python * Bricks is written in Golang

* Litellm does not persist rate limits. Therefore can't accurately rate limit across distributed instances * Bricksllm let you create API keys with accurate rate limits and spend limits that work across distributed instances

* Litellm provides high level spend metrics on API keys * Bricks provides granular spend, request and latency metrics breakdown by model and custom id

* Litellm is not compatible with OpenAI SDK. You have to adopt Litellm python client * Bricks is designed to be compatible with OpenAI SDK

* Litellm only supports OpenAI completion and embedding * Bricks supports almost all OpenAI endpoints except image and audio

* Litellm has exact request caching * Bricks does not have caching as for now

* Litellm has OpenTelemetry integration * Bricks has statsd integration

* Litellm supports orchestration of API calls. When this API call fails, use this model or call this API endpoint instead * Bricks does not support orchestration of API calls since I believe that it is something that the client should handle


LiteLLM proxy (100+ LLMs in OpenAI format) is exactly compatible with the OpenAI endpoint. Here's how to call it with the openai sdk:

``` import openai

client = openai.OpenAI( api_key="anything", # proxy key - if set base_url="http://0.0.0.0:8000" # proxy url )

# request sent to model set on litellm proxy,

response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [ { "role": "user", "content": "this is a test request, write a short poem" } ])

print(response)

``` Docs - https://docs.litellm.ai/docs/proxy/quick_start


hi i'm the maintainer of litellm - we persist rate limits, they're written to a DB: https://docs.litellm.ai/docs/proxy/virtual_keys

- LiteLLM Proxy IS Exactly Compatible with the OpenAI SDK


Update: Litellm is compatible with OpenAI SDK


Thank you, I appreciate the earnest and thoughtful reply!


Have you taken a look at: https://www.getjavelin.io


I haven’t, thanks! Signed up. Not wild about another layer of SaaS, but perhaps they’ll have a self hosted option.


have you had a look at https://www.pulze.ai/ ? LLM API that provides the highest-quality LLM responses through intelligent routing, beating GPT-4 and any other single LLM across all prompt categories.


great idea for simple use cases, but not sure if it can load handle as well as Azure i like that its open sourced too since most solutions out there are black box and we can only rely on the "good will" of the company that they will do as they say


How do you calculate usage limits if you don't tokenize inputs and outputs?


we do tokenize inputs and outputs. sorry if that is not clear in the post


Tried this out last night and kept getting a key is not authorized error even though the key absolutely exists. (I had just created it via the steps on the Github page)


Congradulations (from the readme) :)


fixed. embarrassing on my part


You put in a lot of effort and built something cool - a spelling error is absolutely nothing to be embarrassed about.


Think your website is down? Would love to contribute to this.


Worth checking out Helicone and others




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: