Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.
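To make that concern concrete, here's a toy sketch (my own illustration, not the paper's actual mechanism): with segment-wise attention, a query can only score keys inside its own segment, so recall of anything earlier depends entirely on whatever compressed state gets carried between segments.

```python
import torch

def full_attention(q, k, v):
    # every query can attend to every key in the full context
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def segment_attention(q, k, v, seg_len):
    # each query only sees keys in its own segment; anything it needs from
    # earlier segments must survive in whatever summary state is carried over
    outs = []
    for s in range(0, q.shape[1], seg_len):
        sl = slice(s, s + seg_len)
        outs.append(full_attention(q[:, sl], k[:, sl], v[:, sl]))
    return torch.cat(outs, dim=1)

q = k = v = torch.randn(1, 16, 8)  # (batch, tokens, head_dim)
print(full_attention(q, k, v).shape)                # torch.Size([1, 16, 8])
print(segment_attention(q, k, v, seg_len=4).shape)  # torch.Size([1, 16, 8])
```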
A lot of embedding models are built on top of T5's encoder, so this offers a new option
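For anyone who hasn't tried it, pulling embeddings out of just the encoder is a few lines with HF transformers. A minimal sketch, mean-pooling the encoder states (one common recipe; t5-base is just a stand-in checkpoint):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tok = AutoTokenizer.from_pretrained("t5-base")
enc = T5EncoderModel.from_pretrained("t5-base")  # encoder only, no decoder weights

def embed(text):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)  # zero out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens

print(embed("encoder-decoder models are back").shape)  # torch.Size([1, 768])
```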
The modularity of the enc-dec approach is useful: you can insert additional models in between (e.g. a diffusion model), use different encoders for different modalities, etc.
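A toy sketch of that plug-and-play property (all module choices here are arbitrary stand-ins): the decoder only consumes a sequence of hidden states, so anything that emits states of the right shape can be swapped in or inserted in front of it.

```python
import torch
import torch.nn as nn

layer = dict(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(**layer), num_layers=2)
bridge = nn.Linear(64, 64)  # stand-in for an inserted model (e.g. a diffusion prior)
decoder = nn.TransformerDecoder(nn.TransformerDecoderLayer(**layer), num_layers=2)

src = torch.randn(1, 10, 64)   # could come from a text, audio, or image encoder
tgt = torch.randn(1, 5, 64)
memory = bridge(encoder(src))  # whatever sits in between just keeps the interface
print(decoder(tgt, memory).shape)  # torch.Size([1, 5, 64])
```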
There's a new 7B version that was trained on more tokens, with longer context, and there's now a 14B version that competes with Llama 34B in some benchmarks.
To be fair, it's a stupid distraction from discussing the model. If every thread just turned into politics, it would not make for good discussion. People can start threads about the specific ideologies that language models have, and they can be discussed there (and have been). Bringing that up every time a model is discussed feels off topic and legit to flag (I didn't flag it).
Edit: but now I see the thread has basically been ruined and is going to be about politics instead of anything new and interesting about the model. Congrats, everyone.
Alignment is also a distraction; it's OpenAI marketing and something people who don't understand ML talk about, not a serious topic.
Like I said, discussing model politics has a place, but bringing it up every time a model is mentioned is distracting and prevents adult discussion. It would be like if, every time a company came up, the thread got spammed with discussion of the worst thing that company has ever done instead of discussing it in context.
The condescension is unfounded and unnecessary. Discussion of the usefulness of a model or its interface also includes these topics. If the refusal to discuss the topic came from anything other than it simply being absent from training data, that’s highly interesting from multiple perspectives.
For example, ChatGPT’s practical usability is regularly hobbled by alignment concerns, notably more so in 3.5 than 4. It’s a worthy topic, not a distraction and characterizing it as something other than ‘adult discussion’ is nothing more than an egocentric encoding of your specific interests into a ranking of importance you impose on others. A little humility goes a long way.
We’re here to be curious and that includes addressing misconceptions, oversights and incorrect assumptions. That all still counts as adult discussion.
Imagine if a Chinese company releases a model that kicks the state of the art's ass and everyone starts using it because it works so well. Now the censorship has leaked into all the systems that use it.
Any model produced by a North American or European company, even X, may be trained to be politically correct to the taste of that company, some leaning left and some leaning right, but the topics censored by the model will still be far fewer than in a model created by a Chinese or Russian company. This is because for a company to survive under a totalitarian government, it must bend to and satisfy every filtering request from that government.
They actually have a performance edge, but they aren't well suited to chat models because you can't cache past states across turns the way you can with decoder-only models.
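A minimal sketch of the asymmetry, using GPT-2 as a stand-in decoder-only model: causal attention means the prefix's key/value states never change when you append tokens, so they can be cached across turns. An encoder is bidirectional, so appending a new user turn changes every encoder state and forces a full re-encode of the conversation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = tok("User: hello\nAssistant:", return_tensors="pt")
with torch.no_grad():
    # encode the conversation prefix once; keep its key/value cache around
    past = model(**prefix, use_cache=True).past_key_values

next_ids = tok(" Hi!", return_tensors="pt").input_ids
with torch.no_grad():
    # on the next turn, only the new tokens are processed; the cached
    # prefix states are reused unchanged
    out = model(input_ids=next_ids, past_key_values=past, use_cache=True)
print(out.logits.shape)  # logits only for the newly appended tokens
```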
That tweet had it backwards: more tokens in the tokenizer means the 16k-token context window typically allows for even longer passages than LLaMA's would at 16k.
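Back-of-envelope version of the point (the words-per-token rates below are made-up illustrative numbers): a larger vocabulary usually yields fewer tokens per word, so a fixed 16k-token window holds more raw text, not less.

```python
# Hypothetical words-per-token rates, for illustration only.
window = 16_000
for name, words_per_token in [("~32k-vocab tokenizer (LLaMA-like)", 0.75),
                              ("~150k-vocab tokenizer", 0.90)]:
    print(f"{name}: ~{int(window * words_per_token):,} words per window")
# ~32k-vocab tokenizer (LLaMA-like): ~12,000 words per window
# ~150k-vocab tokenizer: ~14,400 words per window
```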