More

FieryTransition · 2025-02-07T13:18:52 1738934332

But if you have a large set of problems to which you already know the answer, then using that in reinforcement learning, then wouldn't the expertise transfer later to problems with no known answers, that is a feasable strategy, right?

Another issue is, how much data can you synthesize in such a way, so that you can construct both the problem and solution, so that you know the answer before using it as a sample.

Ie, some problems are easier to make knowing you can construct the problem yourself, but if you were to solve said problems, with no prior knowledge, they would be hard to solve, and could be used as a scoring signal?

Ie, you are the Oracle and whatever model is being trained doesn't know the answer, only if it is right or wrong. But I don't know if the reward function must be binary or on a scale.

Does that make sense or is it wrong?

godelski · 2025-02-08T01:19:33 1738977573

I don't think this makes sense and I'm not quite sure why you went to ML, but that's okay. I am a machine learning researcher, but also frustrated with the state of machine learning, in part because, well... you can probably see how "proof by empirical evidence" is dialed up to 11.

Sorry, long answer incoming. It is far from complete too but I think it will help build strong intuition around your questions.

Will knowledge transfer? That entirely depends on the new problem. It also entirely depends on how related the problem is. But also, what information was used to solve the pre-transfer state. Take LLMs for example. There's lots of works that have shown them being difficult to train for solving calculations. Where they will do well on problems with the same number of digits but this will degrade rapidly as number of digits increase. It can be weird to read some of these papers as there will sometimes be periodic relationships with the number of digits but that should give us information about how they're encoding the problems. But that lack of transferability indicates that despite the problem solving and what we'd believe is actually just the same problem, doesn't mean it is. So you have to be really careful here, because us humans are really fucking good at generalization (yeah, we also suck, but a big part is our proficiency makes us recognize where we lack. But also, this is more a "humans can" more than "humans do" type of thing. So be careful when comparing). This generalization is really because we're focused around building causal relationships, while on the other hand the ML algorithms are build around compression (i.e. fitting data). Which, if you notice, is the same issue I was pointing to above.

  > Ie, you are the Oracle and whatever model is being trained doesn't know the answer, only if it is right or wrong. But I don't know if the reward function must be binary or on a scale.

This entirely depends on the problem. We can construct simple problems that both illustrate success as well as failure. What you really need to think about here is the information gain from the answer. If you check how to calculate that, you will see the dependence (we could get into Bayesian Learning or experiment design but this is long enough). But let's think of a simple example in the negative direction. If I ask you to guess where I'm from, you're going to have a very hard time pinning down the exact location. Definitely in this example there is a efficient method, but our ML learning algorithms don't start with prior knowledge about strategies and so they aren't going to know to binary search. If you gave that to the model, you baked in that information. This is a tricky form of information leakage. It can be totally fine to bake in knowledge, but we should be aware of how that changes how we evaluate things (we always bake in knowledge btw. There is no escaping this). But most models would not have a hard time if instead we played "hot/cold", because the information gain is much higher. We've provided a gradient to the solution space. We might call this hard and soft labels, respectively.

I picked this because there's a rather famous paper about emergent abilities (I fucking hate this term[0]) in ML models[1], and a far less famous counter to it[2]. There's a lot of problems with [1] that require a different discussion but [2] shows how a big part of the issue is how many of the loss landscapes are fairly flat and so when feedback is discrete the smaller models just wonder around that flat landscape needing to get lucky to find the optima (btw, this also shows that technically this can be done too! But that would require different training methods and optimizers). But when giving them continuous feedback (i.e. you're wrong, but closer than your last guess), they are able to actually optimize. A big criticism of the work is that it is an unfair comparison because there are "right and wrong" answers here, but it'd be naive to not recognize that some answers are more wrong than others. Plus, their work shows a clear testable way we can confirm or deny if this works or not. We schedule learning rates, there's no reason you cannot schedule labels. In fact, this does work.

But also look at the ways they tackled these problems. They are entirely different. [1] tries to do proof by evidence while [2] uses proof by contradiction. Granted, [2] has an easier problem since they only need to counter the claims of [1], but that's a discussion about how you formulate proofs.

So I'd be very careful when using the recent advancements in ML as a framework for modeling reasoning. The space is noisy. It is undeniable that we've made a lot of advancements but there is some issues with what work gets noticed and what doesn't. A lot does come down to this proof by evidence fallacy. Evidence can only bound confidence, it can unfortunately not prove things. But this is helpful and well, we can bound our confidence to limit the search space before we change strategies, right? I picked [1] and [2] for a reason ;) And to be clear, I'm not saying [1] shouldn't exist as a paper or that the researchers were dumb for doing it. Read back on this paragraph, because we've got multiple meta layers here. It's good to place a flag in the ground, even if it is wrong, because you gotta start somewhere, and science is much much better at ruling things out than ruling things in. We more focus on proving things don't work until there's not much left and then accept those things (limits here too, but this is too long already).

I'll leave with this, because now there should be a lot of context that makes this much more meaningful: https://www.youtube.com/watch?v=hV41QEKiMlM

[0] It significantly diverges from the terminology used in fields such as physics. ML models are de facto weakly emergent by nature of composition. But the ML definition can entirely be satisfied by "Information was passed to the model but I wasn't aware of it" (again, same problem: exhaustive testing)

[1] (2742 citations) https://arxiv.org/abs/2206.07682

[2] (447 citations) https://arxiv.org/abs/2304.15004

FieryTransition · 2024-05-21T07:19:29 1716275969

I was wondering about this too, it's not viable to have a coating which necessitates changing equipment periodically. It will add logistical problems and waste. I think you are right in that using solid metals instead, makes more sense, given antibacterial properties have been known to exist for these for a while. The question is if all types of pathogens can be removed without damaging the equipment, or if material research could create an alloy which just like stainless steel, could make the metal form an oxidation layer which would protect it in case of contact with corrosive liquids.

FieryTransition · 2024-04-28T23:15:45 1714346145

I use a mix of using llamacpp directly via my own python bindings and using it via llamacpp-python for function calling and full control over parameters and loading, but otherwise ollama is just great for ease of use. There's really not a reason not to use it, if just want to load gguf models and don't have any intricate requirements.

FieryTransition · 2024-04-28T23:12:17 1714345937

Because the way they are quantized takes time to get bug-free when new architectures are released. If a model was quantized with a known bug in the quantizer, then it effectively makes those quantized versions buggy and they need to be requantized with a new version of llamacpp which has this fixed.

FieryTransition · 2024-04-16T00:25:16 1713227116

I'll never figured out whether Einstein thought that the god as described by Spinoza is a real thing? Since there's so many misattributed quotes of him which frames him as talking about the judeo-Christian god as most people know by modern standards.

Also, there have been other early religions like Gnosticism in the west, which also falls into this category. Point is, even western religion has way more nuances.

shagie · 2024-04-16T00:47:21 1713228441

I would suggest that the answer to that is "yes" in that the Spinozan God is... for lack of a better word the Universe.

(Kaizō in 1923 https://books.google.com/books?id=vLm4oojTPnkC&pg=PA262#v=on... )

> Scientific research can reduce superstition by encouraging people to think and view things in terms of cause and effect. Certain it is that a conviction, akin to religious feeling, of the rationality and intelligibility of the world lies behind all scientific work of a higher order.

> This firm belief, a belief bound up with a deep feeling, in a superior mind that reveals itself in the world of experience, represents my conception of God. In common parlance this may be described as "pantheistic" (Spinoza).

https://www.gutenberg.org/files/3800/3800-h/3800-h.htm

(Ethics was one of the hardest reads I had back in modern philosophy class)

And while it is a gross simplification of Ethics, from news://rec.humor.funny ( https://everything2.com/title/Existence+of+the+System+Admini... )

    1. The System Administrator is defined as the most perfect user possible.
    2. The property of necessary existence means that anything which possesses it must necessarily exist.
    3. If existence is better than non-existence (see the ontological proof), then necessary existence is better still.
    4. Any perfect user must possess the property of necessary existence.
    5. Therefore the System Administrator must necessarily exist.

    However:

    6. Being perfect, the System Administrator cannot make mistakes, delete the wrong account, trash the root directory, mess up a tape load, etc.
    7. Being perfect, the System Administrator can not be capable of goal-directed action, because such action would imply that the network is somehow less than perfect in its current state.
    8. Therefore, the System Administrator is really more of a force of nature within the system.
    9. Arguably, then the System Administrator *is* the system itself.

    Counter-argument:

    1. None, since the System Administrator has been defined to the point where it is a totally useless concept, there's no point in arguing.

    At least this resolves one of the major issues: the Spinozist argument proves that *if* the System Administrator does exist, it cannot be intelligent.

---

The God of Spinoza and Einstein is the magnificence of the universe as it reveals itself to us. The universe is real as is its majesty.

TomK32 · 2024-04-16T04:59:22 1713243562

Re 6. Listing so many mistakes the System Administrator surely never does is Blasphemy!

FieryTransition · 2024-04-15T20:48:47 1713214127

If you meet him, and he thought discussing god was interesting, would you then do it? If he wanted to discuss your faith in god, would you do it then?

cassepipe · 2024-04-15T20:54:39 1713214479

Discussing is always interesting but I would probably interpret it as him not being able to explain his own abilities like some people just "see" good chess moves or just "know" that there are actually 48 matches on that floor

mistermann · 2024-04-15T21:41:07 1713217267

I would wonder what the truth of the matter is, but each to their own, the unknown is not popular/pleasant.

FieryTransition · 2024-04-02T14:40:08 1712068808

But isn't vulkan made to run cross platform? And why can't they write it in dx12 as well? Aren't those made to be more portable while offering more low level access than previous apis?

What is stopping you from implementing fast math using compute shaders or just hacking with those interfaces? Or are they just too slow when they go through the api layer? Or is that just a myth that can be worked around if you know you are writing high performance code? Pardon my ignorance!

remram · 2024-04-02T15:35:02 1712072102

They would work and would be fast but not the fastest the algorithm can be implemented on each platform.

FieryTransition · on Nov 12, 2023

Because it was made to run on Linux from the start. I remember being excited to work with .net core on Linux, then discovering that most of the system libraries for network programming and related were either not implemented or badly working, but you would only find out if dug deep into low level stuff. After that, I swore of using it seeing how much the hype for cross platform didn't match reality. Hopefully they fixed it, but seeing as there are other choices of languages, it wasn't so bad.

FieryTransition · on Oct 28, 2023

Maybe someone can explain this, because I never understood it. When a company sits on so much cash, I guess it doesn't mean cash which is liquid, but rather a variety of assets, right?

So when they just have to pull x billions out, it's not just liquid assets, but they will have stage and sell assets representing those funds.

So since it's not just cash, how does a company of this size, then determine what assets to sell? And if those assets are actually invested in something, or representing an entity of some sorts, how do they assess whether or not it will cause any damage or loss of profitability? The crux is, is the risk two fold? First let go of whatever the assets were invested in (one), and then buy a new company and hope it has ROI (two).

Or is it actually possible to have 1.5 billion dollars laying around in cash somehow? I know money is a made up idea, but that is still a big number for a bank/banks/asset holding company to just say good for and expect some kind of real monetary tangible value behind the symbolic currency.

dmoy · on Oct 28, 2023

> When a company sits on so much cash, I guess it doesn't mean cash which is liquid, but rather a variety of assets, right?

Cash is a very specific thing on a balance sheet. It has to be cash or very close to cash. "Equivalent", something like a <90d treasury that has virtually zero interest rate risk.

So when someone says "Google has 100B cash" it would mean literally cash or close enough to cash that it doesn't matter.

You'll note, if you read the 10Q, it's also wrong. Google has 30B in cash and cash equivalents, and an additional 90B in marketable securities - stocks, and bonds with >90d maturity.

That said, "marketable securities" are extremely liquid.

> Or is it actually possible to have 1.5 billion dollars laying around in cash somehow?

Yes? Depending on what you mean by "laying around in cash". It's not literal physical dollar bills, it's numbers in a computer.

1.5B is not much for a company that size. I'd image that is payroll and accounts payable for like a week or two?

> I know money is a made up idea, but that is still a big number for a bank/banks/asset holding company to just say good for and expect some kind of real monetary tangible value behind the symbolic currency.

Bank of America alone has like 2 trillion in US deposits.

jungturk · on Oct 28, 2023

Most larger companies will use treasury management software within their finance organization to handle these issues (asset mix, risk, prediction, transfer, etc...).

Scanning the players will show you how some of them solve some of the problems you mention.

https://en.wikipedia.org/wiki/Treasury_management_system

KptMarchewa · on Oct 28, 2023

>Or is it actually possible to have 1.5 billion dollars laying around in cash somehow?

Yes. 1.5 billion is less than Google's weekly operating expense.

>So since it's not just cash, how does a company of this size, then determine what assets to sell? And if those assets are actually invested in something, or representing an entity of some sorts, how do they assess whether or not it will cause any damage or loss of profitability?

There's a lot of smart people under CFO that determine that.

matthewdgreen · on Oct 28, 2023

It’s described in various articles as $121bn of cash, cash equivalents (debt instruments and marketable securities with maturities of < 90 days) and other short-term investments (which I think includes government bonds.) Short term investments make up more than 80% of it.

bakuninsbart · on Oct 28, 2023

Alphabet has 190k employees. Let's assume the average salary is 5k/month, that's already close to a billion just going to employees accounts each month.

FieryTransition · on Sept 28, 2023

Does that mean you could make any model at least always produce syntactically correct code as output?

brucethemoose2 · on Sept 28, 2023

Yes. The model has no choice, as the syntax is applied as the model is "picking" probable output tokens, not parsing complete output like some other methods.

Though its possible it would output syntactically correct nonsense.

FieryTransition · on Sept 28, 2023

That's very awesome, I feel like it would be fun to make a feedback loop with the LLM outputting syntactically valid programs which also log the status of various invariants and pre/post conditions as the program runs to validate correctness and maybe even train a LLM this way.

Would be interesting to see how far it would get writing programs, and wrt. the problem of stopping too early on a token, it could resume with some extra context from where it stopped.

Maybe construct program such that it is made by building blocks which can fit into context, such that each block/function has a pre/post condition and invariants are inside it, which the LLM will try to test against each time it runs. I think I just found my next side-project, I know there are many similar tools, but I haven't seen anything trying to couple the compiler, running the program, creating pre/post and invariant checking against emitted code yet.

Would be interesting to test this hypothesis to see if the llm can actually build a program this way. I think about it like humans have a short-term memory as well, and for each sub-structure of a program, the llm would have to work in a similar way. Then reserve a bit of context to control the long-term memory for general goals, or create something like an AST of thoughts which it would recurse through, while reasoning about it.

amilios · on Sept 28, 2023

How does this actually work though, since the model could e.g. abruptly end the generation giving something syntactically invalid? Doesn't it need to look at the whole output at some stage?

brucethemoose2 · on Sept 28, 2023

All generated output depends on all the previous output. The model "looks" at (mostly) everything every time.

Generation never actually stops, the model just emits a special stop token when stopping is most likely next token. Hence the grammar implementation can prevent this stop token from being emitted prematurely.

There was some discussion of models getting "stuck" where there is no syntactically correct token to emit. Some proposals included a "backspace token" IIRC, but I dunno what they actually did. You can look through the discussion in the PR.

amilios · on Sept 29, 2023

Oh yeah that's true! Just block the stop token. But yes, my thought is that there are scenarios where it can get "stuck" as you said. I'll look at the PR, thanks!

uoaei · on Sept 28, 2023

At every generation step you mask the tokens so that only syntactically valid outputs are possible. This includes the <end> token.