euclaise's comments

Simpler than, but somewhat reminiscent of, Plan 9's windowing system https://man.cat-v.org/plan_9/4/rio


Between the official nvidia drivers and Linuxulator, FreeBSD can run CUDA applications, but it's a bit hacky

No other BSDs can


This one does have attention; it's just chunked into segments of 4096 tokens
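
Roughly the idea, as a toy PyTorch sketch of segment-local attention (not the model's actual code, just an illustration):

    import torch
    import torch.nn.functional as F

    def chunked_attention(q, k, v, chunk=4096):
        # Attend only within non-overlapping segments of `chunk` tokens.
        # q, k, v: (batch, seq_len, dim); seq_len is assumed to be a
        # multiple of `chunk` to keep the sketch short.
        b, n, d = q.shape
        q = q.view(b, n // chunk, chunk, d)
        k = k.view(b, n // chunk, chunk, d)
        v = v.view(b, n // chunk, chunk, d)
        # Each segment only attends to itself, so cost grows linearly with
        # the number of segments rather than quadratically with total length.
        out = F.scaled_dot_product_attention(q, k, v)
        return out.reshape(b, n, d)

    x = torch.randn(1, 8192, 64)
    print(chunked_attention(x, x, x).shape)  # torch.Size([1, 8192, 64])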


Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.


A lot of embedding models are built on top of T5's encoder; this offers a new option

The modularity of the enc-dec approach is useful - you can insert additional models in between (e.g. a diffusion model), you can use different encoders for different modalities, etc.
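
As a rough illustration of the first point, here's a minimal sketch of pooling a T5 encoder's outputs into sentence embeddings with Hugging Face transformers (t5-base is just an example checkpoint; real embedding models add contrastive fine-tuning on top):

    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("t5-base")
    enc = T5EncoderModel.from_pretrained("t5-base")

    texts = ["a small example sentence", "another sentence to embed"]
    batch = tok(texts, padding=True, return_tensors="pt")

    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state   # (batch, seq, dim)

    # Mean-pool over non-padding tokens, then L2-normalize.
    mask = batch["attention_mask"].unsqueeze(-1)
    emb = (hidden * mask).sum(1) / mask.sum(1)
    emb = torch.nn.functional.normalize(emb, dim=-1)
    print(emb.shape)  # torch.Size([2, 768]) for t5-base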


LM Studio is closed source, so no


Neat. I've worked on some similar projects in the past

I have previously ported w2c2 to Plan 9 here: https://github.com/euclaise/w2c9

It ran basic Rust code fine.

I later managed to run C++ code without wasm, by (partially) porting musl and doing some linker hacking here: https://sr.ht/~euclaise/cross9/


There's a new 7B version that was trained on more tokens, with longer context, and there's now a 14B version that competes with Llama 34B in some benchmarks.



Why is this flagged? Honest question.


To be fair, it's a stupid distraction from discussing the model. If every thread just turns into politics, it doesn't make for good discussion. People can start threads about specific ideologies that language models have, and they can be discussed there (and have been). Bringing that up every time a model is discussed feels off topic and legit to flag (I didn't flag it)

Edit: but now I see the thread has basically been ruined and is going to be about politics instead of anything new and interesting about the model. Congrats, everyone.


Is it a distraction? AI alignment is a huge topic, especially if the model is from an authoritarian country.


Many people just don't get that these models are going to be integrated into all sorts of everyday stuff.

Whether all this tech innovation should be shared with authoritarian regimes is a damn valid question to ask, and as often as possible.


Alignment is also a distraction; it's OpenAI marketing and something people who don't understand ML talk about, not a serious topic.

Like I said, discussing model politics has a place, but bringing it up every time a model is mentioned is distracting and prevents adult discussion. It would be like if every time a company came up, the thread got spammed with discussion of the worst thing that company has ever done instead of discussing it in context.


The condescension is unfounded and unnecessary. Discussion of the usefulness of a model or its interface also includes these topics. If the refusal to discuss the topic came from anything other than it simply being absent from training data, that’s highly interesting from multiple perspectives.

For example, ChatGPT’s practical usability is regularly hobbled by alignment concerns, notably more so in 3.5 than 4. It’s a worthy topic, not a distraction, and characterizing it as something other than ‘adult discussion’ is nothing more than an egocentric encoding of your specific interests into a ranking of importance you impose on others. A little humility goes a long way.

We’re here to be curious and that includes addressing misconceptions, oversights and incorrect assumptions. That all still counts as adult discussion.


Artificial ways the model is restricted are an absolutely relevant and important thing to discuss.


Thought crimes implemented in code basically. Orwell would have had a field day with LLMs.


Imagine if a Chinese company releases a model that kicks the state of the art's ass and everyone starts using it because it works so well. Now the censorship has leaked into all the systems that use it.


It is not a stupid distraction per se.

Any model produced by a North American or European company, even X, may be trained in a way that is politically correct by that company's taste, some leaning left and some leaning right, but the topics censored by such a model will still be far fewer than in a model created by a Chinese or Russian company. This is because, for a company to survive under a totalitarian government, it must bend to and satisfy every filtering request from that government.


It’s one of the interesting features and engineering challenges that’s unique to Chinese AI.


I get that someone would do it anyway, but why would the poster want to be the one helping an authoritarian entity fix these loopholes? smh


They actually have a performance edge, but they aren't well suited to chat models because you can't cache past states the way you can with decoder-only models
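
For reference, this is the kind of incremental decoding that decoder-only models get almost for free via the KV cache (gpt2 here is just a stand-in example):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    out = model(input_ids=ids, use_cache=True)
    past = out.past_key_values   # cached keys/values for the whole prefix

    # Feed only the newly chosen token; the prefix isn't re-encoded.
    next_id = out.logits[:, -1:].argmax(-1)
    out = model(input_ids=next_id, past_key_values=past, use_cache=True)
    print(out.logits.shape)      # torch.Size([1, 1, 50257])

With an enc-dec chat model, by contrast, each new user turn changes the encoder input, so those states generally have to be recomputed.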


That tweet had it backwards: more tokens in the tokenizer means the 16k-token context window typically fits even longer passages than LLaMA would with the same 16k window
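
Easy to sanity-check yourself with something along these lines (the checkpoint names are only examples; pick whichever tokenizers you want to compare):

    from transformers import AutoTokenizer

    text = "A longish passage to tokenize for the comparison. " * 200

    for name in ["meta-llama/Llama-2-7b-hf", "Qwen/Qwen-7B"]:
        tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
        n = len(tok(text).input_ids)
        print(f"{name}: {n} tokens, {len(text) / n:.2f} chars/token")

A bigger vocabulary generally means fewer tokens per passage, so the same 16k-token window covers more raw text.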

