Definitely curious what circuits light up from a Neuralese perspective. We want reasoning traces that are both faithful to the thought process and interpretable. If the other language segments are lighting up meanings much different from their translations, that would raise questions for me.
Yes absolutely this! We're working on these problems at FlyShirley for our pilot training tool. My go-to is: I'm facing 160 degrees and want to face north. What's the quickest way to turn and by how much?
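For reference, the arithmetic behind that question is tiny, which is what makes it a nice probe. A minimal sketch, assuming compass headings in degrees where clockwise means a right turn (the function name `shortest_turn` is just illustrative, not anything from our tool):

```python
def shortest_turn(current_heading, target_heading):
    """Return (direction, degrees) for the smallest turn between two compass headings."""
    # Wrap the signed difference into [-180, 180):
    # positive means clockwise (turn right), negative means counter-clockwise (turn left).
    diff = (target_heading - current_heading + 180) % 360 - 180
    if diff == 0:
        return ("none", 0.0)
    direction = "right" if diff > 0 else "left"
    return (direction, abs(diff))


# Facing 160, want to face north (0/360).
print(shortest_turn(160, 0))  # ('left', 160)
```

Turning right instead would take 200 degrees, so left by 160 is the quicker way. When the headings are exactly 180 apart, both directions tie and this sketch arbitrarily reports "left".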
For small models, and when attention is "taken up", these sorts of questions really throw a model for a loop. Agreed - especially noticeable with small reasoning models.
I just tried this with a smaller "thinking" model (a DeepSeek distill, running locally) and boy are you right. It keeps flipping between which direction it should turn, second-guessing its thought process and then getting sidetracked with a different approach.
Hello! On this topic: Could you please look into making the path ‘next/image’ uses for caching images user-definable? Currently I can’t use Google App Engine Standard because the directory it uses is write-protected. The only real solution seems to be custom image providers, which is a drag, so I’m on App Engine Flex spending way more than I probably should :-)
I did try, but in the wrong way: I applied SVD to the quantization error to recover quality (i.e. SVD(W - Q(W))). The lightbulb moment in this paper is to do SVD on W first and then quantize the remainder.
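To make the difference in ordering concrete, here is a toy NumPy sketch of the two variants. Everything in it is an assumption for illustration: the 4-bit round-to-nearest `quantize`, the rank, and the random test matrix with one inflated column are all hypothetical, and the paper's actual method almost certainly involves more than this.

```python
import numpy as np


def quantize(x, bits=4):
    """Symmetric uniform round-to-nearest quantizer (a stand-in for any Q)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    if scale == 0:
        return x.copy()
    return np.round(x / scale) * scale


def low_rank(x, rank):
    """Best rank-`rank` approximation of x via truncated SVD."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def svd_of_error(w, rank, bits=4):
    # What I tried: quantize W first, then low-rank-correct the error W - Q(W).
    q = quantize(w, bits)
    return q + low_rank(w - q, rank)


def svd_then_quantize(w, rank, bits=4):
    # The other order: keep a low-rank part of W unquantized, quantize the remainder.
    lr = low_rank(w, rank)
    return lr + quantize(w - lr, bits)


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[:, 0] *= 20  # hypothetical outlier column

for name, fn in [("SVD(W - Q(W))", svd_of_error), ("SVD(W), then Q", svd_then_quantize)]:
    err = np.linalg.norm(w - fn(w, rank=4))
    print(f"{name}: reconstruction error {err:.3f}")
```

The only difference between the two functions is whether the quantizer sees W itself or what is left of W after the low-rank part has been removed.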
Could you reference any youtube videos, blog posts, etc of people you would personally consider to be _really good_ at prompting? Curious what this looks like.
While I can compare good journalists to extremely great and intuitive journalists, I don't really have any references for this in the prompting realm (except for when the DALL-E Cookbook was circulating around).
Sorry for the late response - but I can't. I don't really follow content creators at a level where I can recall names or even what they are working on. If you browse AI-dominated spaces you'll eventually find people who include AI as part of their workflows and have gotten quite proficient at prompting them to get the results they desire very consistently. Most AI stuff enters into my realm of knowledge via AI Twitter, /r/singularity, /r/stablediffusion, and GitHub's trending tab. I don't go particularly out of my way to find it otherwise.
/r/stablediffusion used to (less so now) have a lot of workflow posts where people would share how they prompt and adjust the knobs/dials of certain settings and models to make what they make. It's not so different from knowing which knobs/dials to adjust in Apophysis to create interesting fractals and renders. They know what the knobs/dials adjust for their AI tools and so are quite proficient at creating amazing things using them.
People who write "jailbreak" prompts are a kind of example. There is some effort put into preventing people from prompting the models and removing the safeguards - and yet there are always people capable of prompting the model into removing its safeguards. It can be surprisingly difficult to do yourself for recent models and the jailbreak prompts themselves are becoming more complex each time.
For art in particular - knowing a wide range of artist names, names of various styles, and how certain mediums will look, as well as mixing and matching with various weights for the tokens, can get you very interesting results. A site like https://generrated.com/ can be good for that as it gives you a quick baseline of how including certain names will change the style of what you generate. If you're trying to hit a certain aesthetic style it can really help. But even that is a tiny drop in a tiny bucket of what is possible. Sometimes it is less about writing an overly detailed prompt and more about knowing the exact keywords to get the style you're aiming for. Being knowledgeable about art history and famous artists throughout the years will help tremendously compared to having little knowledge. If you can't tell a Picasso from a Monet painting you're going to find generating paintings in a specific style much harder than an art buff would.
> You may output only up to 500 words, if the best summary is less than 500 words, that's totally fine. If details are unclear, do not fill-in gaps, do leave them out of the summary instead.
This is super great, glad to see you-all upping the ante for safety. As you say, a good percentage of accidents are pilot-error-based loss-of-control events (https://www.ntsb.gov/safety/data/Pages/GeneralAviationDashbo...). There's a lot to be said for getting multiple safety technologies in one package, as you seem to be doing.
I'm interested in your approach to certification. I've heard the LSA limits are increasing dramatically, but how sure are you that MOSAIC is going to turn out as you hope for fly-by-wire control? Are you prepared for the regulatory environment as it is today by going with an experimental platform like the Sling? Usually there's a mandate for a home-builder to build "51%" of the aircraft, so I'm also wondering how that works for a characteristics augmentation system such as yours. What percent of the control laws fit into the 51%?
On the certified side, Piper is shipping the Pilot 100i trainer aircraft with Electronic Stability and Protection (ESP), preventing students from doing some wild stuff while flying solo, using the Garmin G3X certified avionics. Garmin has also been working on auto-land. With continued development of these certified platforms, combined with a ballistic parachute, how much room is there for you with an experimental aircraft?
I looked at the prescribed spot at Oshkosh and sadly couldn't find your booth, as I was excited to meet you-all. Previously I enjoyed flying Joby's sim, which is a great example of Simplified Vehicle Operations (SVO). While they're in the powered-lift space, I'm curious how much overlap you two have in the control-law certification path of your SVO aircraft.
Finally, as an aviation startup founder myself (FlyShirley.com - Your AI Copilot from Sim to Sky), I'm approaching this from a different angle for a lot of the same reasons, including how task saturation, fatigue/distraction are contributing factors in many accidents.
Super excited to see more aviation startups on hn. Hope to chat at some point. Cheers!
I can't speak for what will or won't happen with MOSAIC, but the current proposal bakes in SVO so we (and a lot of other companies) all hope that stays.
Yes, our first product is experimental and has a 51% requirement, but Sling has a great factory build assist program that will help with that. Typically, electrical systems aren't included in a 51% build and Sling builders don't usually touch the avionics (when doing a factory assist), so we hope that stays true.
The things that exist today are good steps in the right direction, but we are pushing the boundary further with fly-by-wire, where there is very little innovation.
I assume a lack of manpower and lack of concerted attention to specific scams.
Another option is a Kafkaesque reporting mechanism in which a scammer took down their own videos before they could be reviewed by moderators. This is rarer but possible.
As a person who got back into web dev and tried Next for the first time, I was pretty upset once I learned that I was using this half-baked App Router rather than the more straightforward Pages Router.
Oh, and not to mention that because of Vercel's ownership, Next/Image will never support Google App Engine or specifying the image cache directory via an environment variable or at compile time.