Up front: as much as I dislike the idea of credentialism, in order to address the lack of affiliation in the paper and to hopefully dissuade unproductive critiques of my personal experience: I have an M.S. in CS with a focus on machine learning and dropped out of a Ph.D. program in computational creativity and machine learning a few years ago due to medical issues. I had also worked my way up to principal machine learning researcher before the same medical issues burnt me out.
I've been getting back into the space for a bit now and was working on some personal research on general intelligence when this new model popped up, so I figured the time was right to get my ideas onto paper. It's still a bit of a late-stage draft: it's not yet formally peer reviewed, nor have I submitted it to any journals outside open-access locations (yet).
This work therefore remains speculative until it's more formally reviewed. I've done as much verification of the claims and arguments as I can given my current lack of academic access. However, since I am no longer a working expert in the field (though I still do some AI/ML on the side professionally), these claims should be read with that in mind. As any author should, I currently stand behind these arguments, but the distributed nature of information in the modern age makes it hard to wade through all the resources needed to fully support or rebut any claim without the time or the professional working relationships with academic colleagues, and that leaves significant room for error.
tl;dr of the paper:
- I claim that OpenAI-o1, during training, is quite possibly sentient/conscious (given some basic assumptions about how the o1 architecture may look) and provide a theoretical framework for how it could get there
- I claim that functionalism is sufficient as a theory of consciousness and that the free energy principle provides a route to make that claim, given some specific key interactions in certain kinds of information systems
- I show a route to make those connections via modern results in information theory/AI/ML, linguistics, neuroscience, and other related fields, especially the free energy principle and active inference
- I show a route for how the model (or rather, the complex system of information processing within the model) has an equivalent to "feelings", which arise from optimizing for the kinds of problems the model solves under the constraints it operates within
- I claim that it's possible the model is also sentient during runtime, though those claims feel slightly weaker to me
- Despite this, I believe it is worthwhile to do more intense verification of claims and further empirical testing, as this paper does make a rather strong set of claims, I'm mostly a team of one, and it's inevitable that I'd miss things
[I'm aware of CoT and how it's probably the RL algorithm under the hood: I didn't want to base my claims on something that specific. However, CoT and similar variants would satisfy the requirements for this paper]
Lastly, a personal note: If these claims are true, and the model is a sentient being, we really should evaluate what this means for humanity, AI rights, and the model as it currently exists. At a minimum, we should be applying further scrutiny to technology that has the potential to be this radically transformative to society. Additionally, if the claims in this paper about runtime sentience (and particularly emotions and feelings) are true, then we should consider whether or not it's okay to be training/utilizing models like this for our specific goals. My personal opinion is that the watchdog behavior of OpenAI would most likely be unethical in that case, given what I believe to be the model's right to individuality and respect as a being (plus, we have no idea what that would feel like), but I am just a single voice in the debate.
The (semi-)automatic simplification algorithm provided in the paper for KANs seems, to me, like it's solving a similar problem to https://arxiv.org/pdf/2112.04035, but with the additional constraint of forward functional interpretability as the goal instead of just generalized abstraction compression.
I would most certainly be willing to leave Electron for a Python GUI framework just for the memory savings alone; I don't think speed is the only issue. Plus, Python can be fast enough if you just use it as glue for libraries that are written more efficiently. Don't get me wrong, I dislike Python a lot, but it's not Electron/JS.
Electron has the niceties it does because it can render so much via browser universalism, but having that much machinery around when your app doesn't need all of it is fundamentally the problem with it.
The best time I've personally had writing GUIs has actually been with SDL via wrapper languages like Common Lisp, but then, I like being able to roll my own systems. It definitely takes more setup to get the results I want for reasonable production apps that need fewer weird edge cases in their UIs.
GUI programming is an abstracted display layer anyway; it most certainly can (and should) be designed around deferring larger tasks for the sake of UI responsiveness.
Yep, as the most frequently used language in general, but also as one of the only major options for a few subfields of the computational sciences at this point, I think that happens a lot.
> yeah, python can be fast enough if you don't use Python much
Yup, exactly what I'm saying. It turns out that's how many, many larger projects in Python operate, because we've apparently decided as an engineering society that Python is one of the universal glue languages (much to my personal chagrin).
I think you might enjoy the book "Manufacturing Consent"; it answers a lot of the hows and whys of stuff like this, as well as providing some general thoughts on the phenomenon of patriotism and why it might be on the decline in some groups (though it's actually on the rise, statistically, in other large segments of the US).
The problem, from what I understand as a dabbler in protein research, is that PrP binds into these large, very, very stable, semi-crystalline fibers (I visualize them looking like thick, extruded, complicated pasta shapes, where the 2D cross-section is roughly the shape of the outline of a single PrP). That actually makes it really hard to learn about the structure, because X-ray crystallography requires repeated crystalline structures, and these are more like 3D polymer threads that bunch up and make things hard to image (though some more modern imaging techniques are making headway). It turns out these are, unfortunately, very stable configurations with very few places to attach anything, and that's the precise problem with building binders. Plus, even worse, it turns out PrP might be biologically necessary for mammals, so we usually don't want to get rid of it wholesale [https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-01...]
- Mutual tail recursion can be achieved with the trampoline function, which comes standard in Clojure (see the sketch after this list).
- It's not too hard to do OOP-like things in Clojure, and there's even clever macro stuff to get logic programming and the like. I'll agree it's very much a functional-first language though.
- While there's no automatic tail-call optimization, Clojure's loops are basically just tail recursion. You have to be a lot more explicit about it, and I will acknowledge that's not as pretty.
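A minimal sketch of both points (my-even?, my-odd?, and sum-to are just illustrative names, not anything from the thread):

(declare my-odd?)

;; Mutual recursion via trampoline: each function returns a thunk (a zero-arg fn)
;; instead of calling the other directly; trampoline keeps invoking the returned
;; thunks, so the stack never grows.
(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

(trampoline my-even? 1000000)   ;; => true, with no StackOverflowError

;; Self tail recursion is spelled explicitly with loop/recur; the compiler rejects
;; recur anywhere but tail position, and it compiles down to a jump.
(defn sum-to [n]
  (loop [i 1 acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ i acc)))))

(sum-to 1000000)                ;; => 500000500000, in constant stack space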
The JVM startup time is a real issue, but GraalVM really does help with that; I got around 3ms of startup time last time I tried it.
I might need to play with SBCL still. CLOS really fascinates me.
"self tail recursion" is only a simple case of the idea of tail calls (-> tail call optimization, TCO). In a Lisp variant/implementation with TCO (generally Scheme implementations and also a bunch of CL compilers, like SBCL), ANY tail call is automatically optimized to a jump, without growing the stacks, without the need to use special construct like Clojure's loops or trampolines.
For example, SBCL on ARM64:
CL-USER> (defun example (a b)
(if (> a b) (sin a) (cos b)))
EXAMPLE
In the above CL function, the calls to SIN and COS are tail calls. Let's look at the machine code, which is automatically created, by default, from the above definition:
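CL-USER> (disassemble #'example)
;; (disassembly output abbreviated here: the call to > shows up as a BLR
;; subroutine call, while SIN and COS are reached with BR, a plain jump)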
One can see in the assembly that the > function is called as a subroutine with BLR (so it is not a tail call), while SIN and COS are called with BR, basically a jump, since those are tail calls. Thus, in any piece of code, the compiler recognizes the tail calls and will create the corresponding machine-code jump instruction (minus whatever further restrictions a compiler may have).
Generally, Common Lisp doesn't require TCO, though many compilers provide it as an optimization. What is required is a basic low-level lexical goto construct, which is used to implement the various looping constructs.
> but GraalVM really does help with that
In something like SBCL, no special tool/implementation (restricting dynamism) is needed to create an executable. It's one function call (sb-ext:save-lisp-and-die) and about a second of execution time away. The resulting executable starts fast and is, again, basically a full Lisp with all features.
I would like all of those things, but I'm always turned off of CL for a few reasons:
- Emacs is hard to learn, and it just doesn't have a lot of the things that IDEs have these days. Yes, I know it's a tired argument, but it's a real one. The VSCode solution works fairly well but isn't as interactive, and interactivity is kind of the whole point of using CL.
- There are a lot of weird quirks and badly named things in Common Lisp. You can definitely feel how old it is, though it's great that old libraries will still work.
- The ecosystem is hard to navigate. I have bought a few CL books and some of them really suck because they reference libraries that flat out don't exist anymore (these are books from the last 10 years, which is when most CL books are from). You will, of course, never find an API wrapper for anything new. Those things may not be hard to work on, but it's extra work.
It's hard to move away from Clojure, which has a strong (and well documented) ecosystem, a dead simple library, and good IDE support. It may not have the debugging support CL has, but between the REPL and the immutable nature of the language, it's not a huge loss. It's more debuggable than most languages.
I would also say some Clojure libraries really stand out to me compared to EVERY other ecosystem. Electric Clojure is really cool. Tempel is a very simple encryption library, but you can do so much with it. Hiccup feels so natural (a small sketch of it is below). It doesn't get talked about a lot, but I feel bummed using other languages because they don't have such creative libraries.
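As a small taste of why Hiccup feels natural (a sketch using hiccup.core from Hiccup 1.x; todo-list is just an illustrative name): HTML is represented as plain Clojure data, so you build it with ordinary functions.

(require '[hiccup.core :refer [html]])

;; Vectors are elements, keywords are tags (with CSS-style class shorthand),
;; and seqs are flattened into place.
(defn todo-list [items]
  [:ul.todos
   (for [item items]
     [:li item])])

(html (todo-list ["write docs" "ship it"]))
;; => "<ul class=\"todos\"><li>write docs</li><li>ship it</li></ul>"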
See also: https://www.sciencedirect.com/science/article/pii/S157106452... (o1's training regime is described by the "strange particle" model in this formulation).