What I've started experimenting with, and will continue to explore, is project-specific MCP tools.
I add MCP tools to tighten the feedback loop. I want my Agent to be able to act autonomously but with a tight set of capabilities that don't often align with off-the-shelf tools. I don't want to YOLO but I also don't want to babysit it for non-value-added, risk-free prompts.
So, when I'm developing in Go, I create `cmd/mcp` and configure a `go run ./cmd/mcp` MCP server for the Agent.
It helps that I'm quite invested in MCP and built github.com/ggoodman/mcp-server-go, which is one of the few (only?) MCP SDKs that let you scale horizontally over HTTPS while still supporting advanced features like elicitation and sampling. But for local tools, I can use the familiar and ergonomic stdio driver and have my Agent pump out the tools for me.
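To give a flavor of what that `cmd/mcp` entrypoint looks like, here's a stripped-down sketch of the stdio transport itself, deliberately hand-rolled on the standard library rather than showing any particular SDK's API; the `run_tests` tool and its wiring are made up for illustration:

```go
// cmd/mcp/main.go - a minimal, hand-rolled stdio JSON-RPC loop.
// This sketches the transport only, not mcp-server-go's API; a real
// server would also implement initialize, tools/call, and friends.
package main

import (
	"bufio"
	"encoding/json"
	"os"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	out := json.NewEncoder(os.Stdout)
	for in.Scan() { // MCP's stdio transport is newline-delimited JSON-RPC
		var req request
		if err := json.Unmarshal(in.Bytes(), &req); err != nil || req.ID == nil {
			continue // ignore notifications and garbage in this sketch
		}
		switch req.Method {
		case "tools/list":
			out.Encode(response{JSONRPC: "2.0", ID: req.ID, Result: map[string]any{
				"tools": []map[string]any{{
					"name":        "run_tests", // hypothetical project-specific tool
					"description": "Run the project's test suite and return the output.",
					"inputSchema": map[string]any{"type": "object"},
				}},
			}})
		default:
			out.Encode(response{JSONRPC: "2.0", ID: req.ID, Result: map[string]any{}})
		}
	}
}
```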
Horizontal scaling of remote MCP servers is something the spec sadly offers no recognition of. If you've done work in this space, bravo. I've been using a message bus to decouple the HTTP servers from the MCP request handlers. I'm still evolving the solution, but it's been interesting so far.
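Roughly, the decoupling has the shape below. The `Bus` interface and in-memory implementation are illustrative stand-ins for whatever broker actually sits between the tiers; the point is that the HTTP servers stay stateless and any handler instance can pick up the work:

```go
// A sketch of decoupling HTTP frontends from MCP request handlers via a
// message bus. The Bus interface and memBus are stand-ins for a real
// broker (NATS, Redis streams, etc.).
package main

import (
	"fmt"
	"sync"
)

// Bus is the minimal surface each tier needs: publish a message,
// subscribe to a subject.
type Bus interface {
	Publish(subject string, msg []byte)
	Subscribe(subject string, fn func(msg []byte))
}

type memBus struct {
	mu   sync.Mutex
	subs map[string][]func([]byte)
}

func newMemBus() *memBus { return &memBus{subs: map[string][]func([]byte){}} }

func (b *memBus) Subscribe(subject string, fn func([]byte)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[subject] = append(b.subs[subject], fn)
}

func (b *memBus) Publish(subject string, msg []byte) {
	b.mu.Lock()
	fns := append([]func([]byte){}, b.subs[subject]...)
	b.mu.Unlock()
	for _, fn := range fns {
		go fn(msg)
	}
}

func main() {
	bus := newMemBus()

	// MCP handler tier: consumes requests, replies on a per-request subject.
	bus.Subscribe("mcp.requests", func(msg []byte) {
		bus.Publish("mcp.replies.req-1", []byte(fmt.Sprintf("handled: %s", msg)))
	})

	// HTTP tier: publishes the request, awaits the reply for this request ID.
	done := make(chan struct{})
	bus.Subscribe("mcp.replies.req-1", func(msg []byte) {
		fmt.Println(string(msg))
		close(done)
	})
	bus.Publish("mcp.requests", []byte(`{"method":"tools/call","id":"req-1"}`))
	<-done
}
```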
In a previous professional life, I did financial modelling for a Big 4 accounting firm. We had tooling that let us visualize contiguous ranges of identical formulas (if you convert formulas to R1C1 addressing, similar formulas have the same representation: `=A1*2` in A2 and `=B1*2` in B2 both become `=R[-1]C*2`). This made overrides stick out like a sore thumb.
I suspect similar tools could be built for Claude and other LLMs, except that an LLM wouldn't be plagued by the mind-numbing tedium of doing this sort of audit.
An idea might be to require a financially meaningful deposit to pursue an account recovery like this. The deposit would be forfeit if the identity verification failed.
Though now that I write this, it creates a perverse incentive for a company to collect deposits and deny account recovery.
Both trick a privileged actor into doing something the user didn't intend, using inputs the system trusts.
In this case, a malicious PDF uses prompt injection to get a Notion agent (which already has access to your workspace) to call an external web tool and exfiltrate page content. This is similar to CSRF's core idea - an attacker causes an authenticated principal to make a request - except here the "principal" is an autonomous agent with tool access rather than the browser carrying cookies.
Thus, it's the same abuse-of-privilege pattern, just with a different technical surface (prompt injection + tool chaining vs. forged browser HTTP requests).
I'm fairly convinced that, with the right training, an LLM's ability to be "skeptical" and resilient to these kinds of attacks will be pretty robust.
The current problem is that making the models resistant to "persona" injection is in opposition to much of how the models are also used conversationally. I think this is why you'll end up with hardened "agent" models and then more open conversational models.
I suppose it is also possible that the models can have an additional non-prompt context applied that sets expectations, but that requires new architecture for those inputs.
Yeah, ultimately the LLM is `guess_what_could_come_next(document)` in a loop, with some I/O either doing something with the latest guess or else appending more content to the document from elsewhere.
Any distinctions inside the document live in the land of statistical patterns and weights, rather than hard, auditable logic.
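Spelled out as a sketch (all names here are hypothetical stand-ins), the whole agent loop is something like:

```go
// A deliberately crude sketch of the loop described above: the "model"
// is a pure guess-the-next-chunk function, and everything agent-like
// lives in the plumbing around it.
package main

import (
	"fmt"
	"strings"
)

func guessWhatCouldComeNext(document string) string {
	// Stand-in for the model: statistical pattern-matching, not logic.
	switch {
	case strings.Contains(document, "ASSISTANT:"):
		return "DONE"
	case strings.Contains(document, "TOOL_RESULT:"):
		return "ASSISTANT: the repo contains main.go and go.mod"
	default:
		return "CALL_TOOL(list_files)"
	}
}

func runTool(call string) string {
	// Stand-in for I/O: the result is just more text for the document.
	return "TOOL_RESULT: main.go, go.mod"
}

func main() {
	document := "USER: what files are in this repo?"
	for {
		next := guessWhatCouldComeNext(document)
		if next == "DONE" {
			break
		}
		if strings.HasPrefix(next, "CALL_TOOL") {
			// "Do something with the latest guess": run the tool...
			document += "\n" + runTool(next)
		} else {
			// ...or just append more content to the document.
			document += "\n" + next
		}
	}
	fmt.Println(document)
}
```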
What does "pretty robust" mean, and how do you even assess it? How often are you okay with your most sensitive information getting stolen? Is everyone else going to be okay with their information being compromised once or twice a year, every time someone finds a reproducible jailbreak?
If you were willing to bring in additional zod tooling or move to something like TypeBox (https://github.com/sinclairzx81/typebox), the JSON schema would be a direct derivation of the tools' input schemas in code.
The `json-schema-to-ts` npm package has a `FromSchema` type operator that converts the type of a JSON schema directly into the type of the values it describes (e.g. `type Args = FromSchema<typeof inputSchema>` for a schema declared `as const`). Zod and TypeBox are good options for users, but for the reference implementation I think a pure type-level solution would be better.
Ultimately though, I don't believe that channels are an abstraction that makes sense in JavaScript's concurrency model. Go's contexts, on the other hand, would be a huge improvement over AbortController and AbortSignal.
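To make the comparison concrete, here's a minimal sketch of what contexts buy you: deadlines and cancellation that propagate through every call that accepts a `ctx`, with no manual signal-threading:

```go
// Deadlines and cancellation propagate down the call tree via ctx,
// compared to threading an AbortSignal through each call by hand.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

func fetchSomething(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second): // simulated slow work
		return nil
	case <-ctx.Done(): // cancellation arrives without any extra wiring
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	if err := fetchSomething(ctx); errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("timed out:", err)
	}
}
```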
Software engineers don't want to manage physical hardware, and they often need to run highly available services. When a team lacks the skill, geographic presence, or bandwidth to manage physical servers but needs to deliver a highly available service, I think the cloud offers legitimate operational improvements, with downsides such as increased cost and decreased performance per unit of cost.