The basic problem is that GPT generates easy poetry, and the authors were comparing to difficult human poets like Walt Whitman and Emily Dickinson, and rating using a bunch of people who don't particularly like poetry. If you actually do like poetry, comparing PlathGPT to Sylvia Plath is so offensive as to be misanthropic: the human Sylvia Plath is objectively better by any possible honest measure. I don't think the authors have any interest in being honest: ignorance is no excuse for something like this.
Given the audience, it would be much more honest if the authors compared with popular poets, like Robert Frost, whose work is sophisticated but approachable.
Huh, that sounds a little like claiming that AI can draw pictures just as well as humans because they look realistic at first glance. But not if you check whether the text, repetitive elements, and partially-occluded objects in the background look correct.
The more basic problem is that their methodology would conclude Harry Potter is better than Ulysses, AC/DC is better than Carla Frey, etc etc. It is completely fine to enjoy "dumb" art - I like Marvel comics and a lot of the Disney-era Star Wars novels have been pretty fun. But using easiness and fun as a metric of quality is simply celebrating ignorance and laziness.
Why is AC/DC “dumber” than, I assume, a fantastic classical musician? I used to think that these artists were riff-raff, but they all turned out to be masters of their art and, ignoring the pop/rock/etc flavor and type, they may actually surpass the genius of a violin virtuoso. Quite a claim you’re making here, assuming that AC/DC's popularity was due to “dumbness” overall and that it’s “fine”. It’s not fine, it’s the same step on the ladder. I find this deep-shallow distinction completely synthetic and (imo) coming from a limited technical-ish view of music. Vivaldi, AC/DC, Blackpink: it’s all art at a genius level, just differently flavored.
I'm a bit confused about this AC/DC vs "Carla Frey" comparison. For starters, I can't find any musicians by the name of Carla Frey. There is however a free jazz pianist named Carla Bley. It feels like OP has maybe selected a niche artist that they're personally fond of. I had a listen and it's the sort of stuff you'd get in hotel lobbies. It's nice, and it probably means a lot to people who are really into jazz, but I suspect that few people will ever hear of this artist. Conversely, Angus Young of AC/DC is extremely musically gifted and I suspect that songs like Thunderstruck will have a large social impact for years to come. There's nothing dumb about their songs. Though I don't put much stock in Spotify listens, it has 1.6 billion vs Bley's most popular "Lawns" having around 2 million.
Sure, there are cases where great artists are not well known in their own time - from Mozart to Nick Drake - but recognition generally follows in the subsequent decades. If Carla Bley is who they intended to refer to, they've had over 50 years to become recognised.
Apples are better than oranges. "An orange a day keeps the doctor away?" I don't think so. Newton's orange? Nope. Orange may be the surprisingly successful remains of a former telecom monopoly, but Apple is the most valuable company on the planet.
You are assuming that it is objectively true that "Harry Potter is better than Ulysses, AC/DC is better than Carla Frey". What is your basis for that? And what is your definition of "better"?
I get the point you're trying to make, and in some philosophical sense you're right. In reality though, people into a field are able to rank quality extremely well and it doesn't end up being the puzzle you're implying it is
Poetry is for the enjoyment and enlightenment of the reader. Who reads poems? The question is a little vague. Who reads poems on a birthday card? Everyone. In a children's book? Parents and children.
But this study focuses on literary poems specifically, using works by literary heavyweights like Chaucer, Shakespeare, Dickinson, Whitman, Byron, Ginsberg, and so on. So the question is who reads literary poems, to which the answer is: academics, literary writers, and a vanishingly small population of readers who enjoy such pursuits.
So the test of whether ersatz AI poetry is as good as "real" poetry should be whether the target audience (i.e. academics) finds the poems enjoyable or enlightening. But this study does not test that hypothesis.
This study tests a different hypothesis: can lay people who are not generally interested in literary poetry distinguish between real and AI literary poetry?
The hypothesis and paper feel kind of like a potshot at literary poetry to me, or at least I don't understand why this particular question is interesting or worthy of scientific inquiry.
> Poetry is for the enjoyment and enlightenment of the reader.
Is it?
If I read a book and it makes my whole body so cold no fire can warm me, I know that is poetry. If I feel physically as if the top of my head were taken off, I know that is poetry. These are the only ways I know it. Is there any other way?
- Emily Dickinson
Clearly Dickinson believes it is far more than what’s at the surface. It is more than the rhyme and rhythm. They add to the effect, but they are not necessary conditions. It’s generally agreed upon that art is defined by making you feel.
Lay people are often uninterested in poetry because they didn’t study it. There’s nothing wrong with that. Many are put off by elitists, garbage, and the difficulty of parsing it. But this is true for any subject matter; we see similarities in programming and in science. Even in those fields you see many people point to the surface and dismiss the subject, even though what matters lies underneath. But it also doesn’t have to be for everyone. It’s not about being smart or dumb either, as most people happily get into the depths of the subjects they enjoy. Be it a child’s obsession with dinosaurs, a teen’s obsession with videogames, or an elite academic’s with their field. We all have that capacity but there are always hurdles to entry and sometimes the point is to stumble. When it is, removing the hurdles harms the domain, even though it’s almost always done with the best of intentions. Eventually you gotta touch the stove top to learn it’s hot.
To me, it sounds more like claiming that AI writes better code than Linus Torvalds because a group of random non-programmers preferred reading simple AI-generated Python over Linux kernel C code.
Not sure that I agree with this metaphor, as the utility of code is both subjective (readability) and objective (performance). There are objective ways in which C code simply can’t be matched by a Python implementation. If they were equivalent in terms of performance, I think use of C would crater.
With poetry, the “utility” is entirely the subjective experience of the reader. There is no objective component.
The vast majority of people seem to prefer Avengers 17 over any cinematic masterpiece, the latest Drake song would be better rated than a Tchaikovsky... We should let them play and worship ChatGPT if that's what they want to waste their time on.
I don't understand the logic of calling superhero movies lesser/unserious like this; it's very snobby. Movies and music are made to be entertaining, and the Avengers is more entertaining than your "arthouse cinematic masterpiece that nobody likes but it's just because they aren't smart enough to understand it". It's also lazy and ignorant to ignore the sheer manpower that goes into making a movie like that.
I don’t fully agree with putting down “fun” movies like the Avengers, but at the same time “serious” art is not primarily for plain entertainment.
People might find “serious” art meaningful and it might spark feelings in them, but that’s not the same as getting an adrenaline rush from exploding cars in an action scene.
Of course there are also cases where the boundary between “fun” and “serious art” is not so clear, there are always exceptions to any attempt to define what makes something “serious art”. Art can also be subversive and run counter to traditional expectations of what art “should” be. But I don’t think the Avengers is an example of that.
Movies, music, writing, all human arts, are made to make their audience feel something. "Entertaining" is only a small and honestly ill-defined subset of this, no more valid than any other approach.
I think analyzing the Black Square or Ulysses, or Arnold Schönberg's works (just random examples, I could go on and on) in terms of whether and what they make you feel is an ill-fated course of action. It's also not what people actually do.
On the other hand a lot of other stuff can be broadly analyzed in terms of making people feel something. Painkillers, excuses, hugs, titles.
So it seems your generalization is neither necessary nor sufficient for "human arts".
> your "arthouse cinematic masterpiece that nobody likes
You're reading way too much into my comment. Any blockbuster from the 80/90s absolutely shits on 90% of blockbusters released today. I'm not talking about obscure 1950s Czechoslovak cinema here...
> ignore the sheer manpower that goes into making a movie like that.
A lot of work doesn't make something good, especially when CGI quality actually gets worse year after year. FYI the entire LOTR trilogy had 30% less budget and 4x the runtime of the last Avengers movie... And they actually filmed things outside of a Hollywood studio.
The only lazy ones here are the screenwriters and the directors shitting out the blandest movies ever.
But then again if all we care about is raw entertainment then sure, it's perfect, very easy to digest, lots of colors and not too much to think about, the cinematic equivalent of fast food. You can even buy Avengers-branded toilet paper and bottled water, that really shows how much they care about movies!
Well said. There's tons of blockbusters and other popular movies from the 80s/90s that were absolutely made for the "masses", but were genuinely great films, and far better than almost any blockbuster from the last 5-10 years, especially all the comic-book stuff. Alien(s), Back to the Future trilogy, Terminator 1/2, Ghostbusters, Beetlejuice, I could go on and on. And of course the LotR trilogy if you look at the early 2000s. Movies just aren't as innovative or risky these days; something as quirky as Ghostbusters wouldn't be made now (but Hollywood is happy to make remakes and sequels of that franchise now, 40 years later).
Film is such a nascent art form. The 90s as “peak blockbuster action” is a valid stance on taste but hard to defend as superior to all that came after. Christopher Nolan’s Dark Knight is leagues away from the 90s Batman, as an auteur-friendly and obvious comparison. Pixar is another on the animation front.
There have been great films made in every era, but the trend towards tighter writing, more legible and compelling action, and emotionally impactful storytelling is strongly upward overall.
And nothing will ever top the merchandising mania of the 80s!
I hope you're referring to Joel Schumacher's kitschy drivel, and not to Tim Burton's masterpieces (both of which are IMO vastly superior to Nolan's take on the subject).
It would be nice if people actually stated why x is better than y rather than expecting everyone to hold the same opinion as them. Makes for better conversations.
I don't get why people have this narrow-minded view of literature/movies; you don't see it that much in culinary conversations.
In a culinary conversation nobody is trying to make the case that the chicken mcnugget is objectively superior to fresh pasta in a handmade pesto sauce. So you don't need to tell people "please just go away with this mcnugget nonsense". You wouldn't be expected to explain why one is better than the other. Most people that taste food understand immediately what you mean.
This reminded me so much of Spaceballs! And the yogurt merchandise towards the end! Such a great movie that has so many obvious "flaws" like the mirror under the speeder on the desert planet when they comb the desert. And yet I've actually watched that movie more often than even the actual real Star Wars movies (meaning the first three made - all of which are timeless awesomeness)
> Any blockbuster from the 80/90s absolutely shits on 90% of blockbusters released today
Are you sure it's not survivorship bias? As in, you are only remembering the good ones from a two-decade period and comparing them against the movies that came out this year, when in reality there might be tons of blockbusters from those eras that were just as bad as your average one today.
What a ridiculous comparison. Of course a superhero movie is more entertaining than a film that is explicitly designed to avoid mass appeal.
The nuance is that movies today are not where the most creative talent is directed anymore. The shift started with prestige TV taking off in the 2000s, continued with episodic content on streaming services surpassing film as a mass-market art form in the 2010s, and the pandemic drove the final nail into the coffin.
I loved the late 2000s / early 2010s superhero movies. Spiderman, The Dark Knight, Iron Man, etc. These were great films. Today, the MCU is just eating its own tail with the most bland, repetitive crap. It's all designed to incentivize the same die-hard fans to keep forking over their hard-earned cash with all the cross-film teasers and the need to watch every film to understand all the references and moving parts. I understand the business model, which is actually the same as comic books now: because people don't casually go in to see random movies anymore (they do that at home on Netflix), studios have to target the repeat viewers. It's visually impressive, and the acting is good enough to keep a relatively large subset of the population coming back, but for someone like me who wants at least a little bit of novelty or creativity in the plot or characters, it's just become so mind-bogglingly boring.
Sheer manpower doesn't make it good. You should have just made a point about entertainment, which it definitely does provide. A case can be made that this was the only thing they were going for. You would have had a good point. Implying that just because a lot of people did a thing together that means it has merit is kind of a strange thing to say. It's definitely not self-evident and you make no attempt to elaborate on it.
In your opinion, perhaps. Other films are made to be provocative-- to make you think or reflect. Certainly, a lot of the "arthouse cinematic masterpieces" aim for that as a goal rather than purely entertainment.
You're arguing against a strawman here... nobody is saying making an avengers movie is low effort. Certain aspects of an avengers movie though require less effort.
There is more to art than entertainment. For example Oedipus Rex [1] - distinctly not entertaining; but art, and powerful in an incomparable way, anyway.
_____________
[1] Don't look it up if you don't know what that is.
Oedipus Rex, also known by its Greek title, Oedipus Tyrannus (Ancient Greek: Οἰδίπους Τύραννος, pronounced [oidípuːs týrannos]), or Oedipus the King, is an Athenian tragedy by Sophocles.
And I'm perfectly aware that you're trying, hard, to make some sarcastic point, but that's what everybody in the English-speaking world calls it. Or did you want to talk about it in Greek? I'm fine with that. After all, it was in Greek that I've watched it, as a child, here:
Did you think that I'm just name-dropping a Greek tragedy to appear erudite and culturrred? Again: I'm Greek. I grew up with that stuff. They even teach us some of it in school (Antigone, for one).
That's clearly your opinion; I can't speak to the motivations of whom you refer to as "the other guy" but I have zero interest in either attempting to troll you or in looking cool.
Uhm. Sorry to bother you with something totally off-topic.
I have an almost lifelong itch, which I couldn't successfully scratch so far.
It's about the meaning of this surname: Κούβελας
Usually it is transcribed in English as Kouvelas, in German as Kouwelas, and in French it can be Couvelas, and AFAIK it is pronounced something like Koo-well-as(s) in Greek.
Does it have any 'speaking'/describing meaning, like Miller, Carpenter, Fisher, Baker and so on, or is it something like 'from a place called this', maybe distorted over generations?
I get nothing from sites like this https://forebears.io/surnames/kouvelas , and the few people with Greek heritage I've known couldn't tell me either, so far.
Movies and music are usually made to be entertaining, but sometimes they're made as an artistic outlet for the creator.
I was listening to Schoenberg's "Suite for Piano" the other day. Did he make it to be entertaining? I don't know, interesting maybe. I wouldn't put it on at a party.
It's true that snobbery is off-putting, but if you're looking for artistic merit, then some works last longer than others. If you're looking for something to enjoy with your popcorn, then there's that too.
> The basic problem is that GPT generates easy poetry
I was going to come in here and say this. I'll even make the claim that GPT and LLMs __cannot write poetry__.
Of course, this depends on what we mean by "poetry." Art is so hard to define, we might as well call it ineffable. Maybe Justice Potter Stewart said it best: "I know it when I see it." And I think most artists would agree with this, because the point is to evoke emotion. It is why you might take a nice picture and put it up on a wall in your house but no one would ever put it in a museum. But good art should make you stop, take some time to think, figure out what's important to you.
The art that is notable is not something you simply hang on a wall and get good feelings from when you glance at it. They are deep. They require processing. This is purposeful. A feature, not a bug. They are filled with cultural rhetoric and commentary. Did you ever ask why you are no Dorothea Lange? Why your photos aren't as meaningful as Alfred Eisenstaedt's? Clearly There's something happening here, but what it is ain't exactly clear.
Let me give a very recent example. Here[0] is an amicus brief that The Onion (yes, that Onion, the one that bought InfoWars) wrote to the Supreme Court. It is full of satire while arguing that satire cannot be outlawed. It is __not__ intended to be read at a glance. In fact, they even specifically say so:
> (“[T]he very nature of parody . . . is to catch the reader off guard at first glance, after which the ‘victim’ recognizes that the joke is on him to the extent that it caught him unaware.”).
That parody only works if one is able to be fooled. You can find the author explaining it more here[1].
But we're coders, not lawyers. So maybe a better analogy is what makes "beautiful code." It sure as fuck is not aesthetically pleasing. Tell me what about this code is aesthetically pleasing and easy to understand?
float InvSqrt(float x){
    float xhalf = 0.5f * x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
It requires people writing explanations![2] Yet, I'd call this code BEAUTIFUL. A work of art. I'd call you a liar or a wizard if you truly could understand this code at a glance.
I specifically bring this up because there's a lot of sentiment around here that "you don't need to write pretty code, just working code." When in fact, the reality is that the two are one and the same. The code is pretty __because__ it works. The code is a masterpiece because it solves issues you probably didn't even know existed! There's this talk as if there's this bifurcation between "those who __like__ to write code and those who use it to get things done." Or those who think "code should be pretty vs those who think code should just work." I promise you, everyone in the former group is deeply concerned with making things work. And I'll tell you now, you've been sold a lie. Code is not supposed to be a Lovecraftian creature made of spaghetti and duct tape. You should kill it. It doesn't want to live. You are the Frankenstein of the story.
To see the beauty in the code, you have to sit and stare at it. Parse it. Contemplate it. Ask yourself why each decision is being made. There is so much depth to this and its writing is a literal demonstration of how well Carmack understands every part of the computer: the language, how the memory is handled, how the CPU operations function at a low level, etc.
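For anyone who wants to actually run it: below is a minimal sketch of the same trick, assuming a 32-bit IEEE-754 `float`. The pointer casts in the snippet above are undefined behavior under standard C's aliasing rules, so this version copies the bits with `memcpy` instead. The name `inv_sqrt_safe` is made up for this example; the magic constant and the single Newton step are unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float is a 32-bit IEEE-754 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

/* Approximates 1/sqrt(x) for positive, normal x. Hypothetical name. */
float inv_sqrt_safe(float x) {
    float xhalf = 0.5f * x;

    /* Reinterpret the float's bits as an integer without pointer casts. */
    uint32_t i;
    memcpy(&i, &x, sizeof i);

    /* Halving the bit pattern roughly halves the exponent (a square root);
       subtracting from the constant negates it (a reciprocal). */
    i = 0x5f3759dfu - (i >> 1);

    /* Back to float: this is the rough first guess. */
    memcpy(&x, &i, sizeof x);

    /* One Newton-Raphson step to tighten the guess. */
    x = x * (1.5f - xhalf * x * x);
    return x;
}
```

With the one Newton step, `inv_sqrt_safe(4.0f)` comes out at about 0.499 rather than exactly 0.5, which is the accuracy-for-speed trade the original was making.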
I truly feel that we are under attack. I don't know about you, but I do not want to go gentle into that good night. Slow down, you move too fast, you got to make the morning last. It's easy to say not today, I got a lot to do, but then you'll grow up to be just like your dad.
I would say Davis's definition of "objectively better" here is "nobody who reads these poems carefully could possibly conclude that this AI crap is better than Walt Whitman, the only explanation is Walt Whitman is so difficult that the raters didn't read it carefully."
The Nature paper is making a bold and anti-humanist claim right in the headline, laundering bullshit with bad data, without considering how poorly-defined the problem is. This data really is awful because the subjects aren't interested in reading difficult poetry. It is entirely appropriate for Davis, as someone who actually is interested in good poetry, to make a qualitative stand as to what is or isn't good poetry and try to define the problem accordingly.
The data would still be awful, and people would pay less attention to the study because it’s not a priori surprising that ChatGPT would write worse poetry than the most celebrated poets in history.
If I use bad data to conclude that “Java is faster than C++ in most cases” you can be sure it will receive a lot more attention than if I reached the opposite conclusion based on similarly bad data.
> The data would still be awful, and people would pay less attention to the study because it’s not a priori surprising that ChatGPT would write worse poetry than the most celebrated poets in history.
[emphasis mine] You've inadvertently made the poster's point for them. You have written Spivak's "I knew it" reaction, just phrased more glibly.
I don't see how the authors are dishonest about it, or that the rebuttal refutes any of the actual claims made. They make it clear in the paper that they're specifically evaluating people who aren't especially interested in poetry, and talk at length about how and why this is different from other approaches. I suppose the clickbait title gives a bad first impression.
To summarize the facts: when people were asked to tell if a given poem was written by human or AI, people thought the AI poems were human more often than human poems. People also tended to rate poems lower when they thought they were created by AI than when they thought they were human-made. It's speculated in the paper that this is because the AI poems tended to be more accessible and direct than the human poems selected, and the preference for this style from non-experts combined with the perception that AI poetry is poor led to the results.
The selection of human poets is cooked to give the result they wanted. I will grant the authors may have lied to themselves. But I don't think honest scientists would have ever constructed a study like this. It is comparing human avant garde jazz to AI dance music and concluding that "AI music" is more danceable than "human music", without including human dance music! It's just infuriating.
They expressly state the result is likely because the AI poetry was more simple and direct than the poetry selected, which is more accessible for the average person not interested in poetry. They compare and contrast this with other studies where this was not the case.
Yes, it's comparing apples and oranges; that's the whole point. It doesn't make the experiment itself flawed.
It seems to me that the whole study was intended to manufacture a result to grab headlines. Scientific clickbait. It doesn't matter how transparent they are, because that is mostly there to cover their asses.
Hum, but it should have compared against human poems that go for a similar style no? Otherwise, it doesn't tell us much, except that AI was not able to make more complex poems? And maybe that people who don't like poetry when asked prefer simpler poems?
> I don't see how the authors are dishonest about it, or that the rebuttal refutes any of the actual claims made.
Suppose I prepared a glass of lemonade and purchased a bottle of vintage wine—one held in high regard by connoisseurs. Then I found a set of random bystanders and surveyed them on which drink they preferred, and they generally preferred the lemonade. Do you see how it would be dishonest of me to treat this like a serious evaluation of my skill, either as a lemonade maker or a wine maker?
Sure - but it could still be pretty relevant if we want to ask about the future of beverage making and consumption, especially if new technology enables everybody to mass-produce lemonade (and similar sugary beverages) at home at minimal cost.
But much like the "debate" between linguistic prescriptivism ("'beg the question' doesn't mean 'raise the question'") and descriptivism ("language is how it is used"), both perspectives have relevance, and neither are really responses to the other.
I certainly hope people keep writing great, human, poetry. But generative ML is a systemic change to creative output in general. Poetry just happens to be in some ways simplest for the LLMs, but other art is tokens and patterns as well.
Personally, I think this would be a sin. To call something art which has no depth. We have too many things that are shallow. I think this has been detrimental to us as a society. That we're so caught up with the next thing that our leisure is anything but. What is the point of this all if not to make life more enjoyable? How can we enjoy life if we cannot have a moment to appreciate it? If we treat time off as if it is a chore that we try to get done as fast as possible? If we cannot have time to contemplate it? A world without friction is dull. It's as if we envy the machines. Perhaps we should make the world less tiring, so we have the energy to be human.
Human art most definitely can be reduced to tokens, since that's also essentially how we compress and transmit it.
Now, whether a statistical token generator makes "real art" is subjective (as human art already is). And again, I'm actually quite sympathetic to the "humans are special" perspective.
But the point of my comment is that this philosophical stance is not a practical reply to what will actually happen in terms of social dynamics and content creation/consumption. Whether we call it "real art" or not, generative tools exist and will be used. So, it makes sense to understand them, even if your goal for doing so is to mitigate their incursions into "real art."
In other words, art must adapt. Which, it always does.
> Suppose I prepared a glass of lemonade and purchased a bottle of vintage wine—one held in high regard by connoisseurs.
I would tell you that there have been results about the blind testing of wines held in high regard by connoisseurs that might make you not want to choose that for a comparison.
The blind tasting studies prove that connoisseurs can't discern the price of wine by taste. They can tell whether or not they like it perfectly well. A good bottle, not an expensive one.
Does this not undermine the premise of the original metaphor (i.e. a glass of lemonade is somehow inferior to a fine wine)? Seems like a lot of goal post shifting.
A glass of lemonade is not inferior. It's a different thing entirely. You can compare good lemonade to bad lemonade, or good wine to bad wine, but asking a group of people who prefer lemonade to compare a glass of lemonade with a glass of wine tells you nothing about the quality of the lemonade or wine in question.
The human poets in the test wrote high-brow poetry, the AI generated low-brow poetry, and the audience of laypeople who were surveyed preferred the low-brow poetry. There's nothing wrong with a straightforward rhyme scheme or anything—it's not bad to be low-brow—but it's not a useful comparison.
I believe you missed the OP's point. Poetry is to be processed. That's a feature, not a bug. Now that we're in an analytical conversation you need to process both papers and OP's words. Like poetry, there is context, things between the lines. Because to write everything explicitly would require a book's worth of text. LLMs are amazing compression machines, but they pale in comparison to what we do. If you take a step back, you'll even see this comment is littered with that compression itself.
You're making a load of generous claims for yourself without giving your thought process:
> The basic problem is that GPT generates easy poetry
> were comparing to difficult human poets
What's your qualitative process for measuring "easy" vs. "difficult" poetry?
> rating using a bunch of people who don't particularly like poetry
How do you know these people don't like poetry? Maybe they don't seek it out, but certainly poetry is not just for poetry lovers. Good poetry speaks to anyone.
> the human Sylvia Plath is objectively better by any possible honest measure
Agree. Poetry is the compact written reflection of an expanded or conflicted soul ... it requires lived experience, self awareness, and ability to compress out superfluous details in language.
The question of whether the human soul can be tricked by an AI illusionist and to what degree is a non sequitur.
Humans eat the meal. At best, and charitably considered, AI eats the menu. Not the same.
In other news I asked AI to convince me it understands American capitalism, and to explain it to me as if I lived in Los Angeles - you know so I could really see/feel it.
It did decently well concluding it will, for example, balance the demand for "ample parking" with supply. Now, I leave you to assess whether that's an awesome and intelligent example, or an AI love song that like Marshall Tucker reminds "can't be wrong."
> the human Sylvia Plath is objectively better by any possible honest measure.
Except for arguably the most important one: creating something that people enjoy. Just because you don't like it doesn't make it worthless. I guess the actual question is whether the raters actually get any enjoyment out of the AI poem or just intensely dislike both.
Poetry, like humor, involves the use of the reader's expectations, but is typically most effective when subverting those expectations.
There's a lot of bad poetry in the world that just follows readers' expectations. I should know, I've written some of it. Unfortunately, I'd suspect that most readers' understanding of poetry lacks that crucial element of subversion, and so an LLM – which mostly just spits out the most-probable, most-expected next token – looks like what people think poetry is.
An LLM would not have created the curtal sonnet form, because it would've been too busy following the rules of a Shakespearean or Petrarchan sonnet. Similarly, an LLM wouldn't create something that intentionally breaks the pattern of a form in order to convey a sense of brokenness or out-of-place-ness, because it's too locked in on the most-common-denominator from previous inputs. And yet, many of the most powerful works I've read are the ones that can convey a disjointed feeling on purpose – something an LLM is specifically geared not to do.
Poetry aims for the heart, and catches the mind along the way. An LLM does not have the requisite complex emotional intelligence, and is indeed a pretty poor simulation of emotional intelligence.
Consider Auden's Epitaph on a Tyrant, which is powerful because it is so suddenly shocking, as it describes something that sounds perhaps like an artist or author, until it very suddenly doesn't, on the last line:
Perfection, of a kind, was what he was after,
and the poetry he invented was easy to understand;
he knew human folly like the back of his hand,
and was greatly interested in armies and fleets;
when he laughed, respectable senators burst with laughter,
and when he cried the little children died in the streets.
One could literally take Claude 3.5 Sonnet New or o1-preview and disprove this in an hour or two just by prompting the AI to try to exhibit the type of poetry you want and then maybe asking it to do a little bit of automated critique and refinement.
You can also experiment with a higher temperature (maybe just for the first draft).
You claim that LLMs can't make poetry like that. I bet they can if you just ask them to.
They could, but they probably won't. Poems like GP are basically using the power of emotional manipulation for good, and companies like Anthropic try very hard to prevent Claude from having that capability.
What he said was basically that it just couldn't create unexpected verses or break form. Since supposedly it can only do the most probable token -- but that's not how sampling works unless you use temperature 0. And it can easily be instructed to break from a strict form (which would create a new variation of the form) for effect if it made sense.
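To make the sampling point concrete, here is a minimal sketch of temperature sampling. The vocabulary and scores are made up for illustration, and `sample_token` is a hypothetical helper, not any vendor's actual decoding code. At temperature 0 it always returns the top-scoring token; at higher temperatures the less likely tokens show up more often.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw scores (logits).

    temperature == 0 is treated as greedy decoding (always the top token);
    higher temperatures flatten the distribution so unlikely tokens
    are chosen more often.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy vocabulary and invented scores, purely illustrative.
vocab = ["moon", "rain", "static", "lettuce"]
logits = [3.0, 2.0, 0.5, 0.1]

rng = random.Random(0)
for t in (0, 0.7, 1.5):
    picks = [vocab[sample_token(logits, t, rng)] for _ in range(1000)]
    print(t, {w: picks.count(w) for w in vocab})
```

Running it prints a count per word for each temperature, so you can see the tail words appear as the temperature goes up.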
You could also ask it to create a new form and it could. I don't work for you so I don't have to create examples, but anyone who has used the latest SOTA models for any amount of time knows this capability is expected, and if you were really interested then you would try it. If you feel the result isn't very good, ask it to improve it.
I could program even a markov chain to generate a lot of odd unusual potentially interesting stuff, but no one would call any of it a new form of poetry, because establishing something like that requires social status, which robots don't have.
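For anyone who hasn't seen one, a word-level Markov chain really is only a few lines. This is a rough sketch with an invented one-line corpus; `build_chain` and `generate` are names made up for the example.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain from `start`, choosing each next word at random."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no word ever followed this one
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Invented corpus, just enough to give the chain some branching.
corpus = "the rain remembers the cloud and the cloud forgets the rain and the sea"
chain = build_chain(corpus)
print(generate(chain, "the", 8, random.Random(0)))
```

Every adjacent word pair in the output already occurs in the corpus, which is why the results can be odd but never step outside what the chain was fed.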
It can be obvious and go to the heart. I'm not sure Wilfred Owen's Dulce et decorum est is anything other than straight down the line, but it made me cry when I first read it.
That said, maybe the subversion is in how the reality is contrasted with the marketing.
I see 'subversion' as more broad. In good poetry, subversion is constantly happening at a micro level, through playing with meaning, meter, word choice. I think it's very easy to identify AI-generated poetry because it lacks any of that -- but on the flip side, if you don't understand the rules, you don't understand how to subvert them.
Even in Dulce et decorum est -- though the meaning's straightforward, there are plenty of small, subversive (and pretty) ideas in the meter. For example, the line "He plunges at me, guttering, choking, drowning" is an unexpected, disruptive staccato that really sounds like guttering, choking, drowning. It's a beautiful poem and is overflowing with such examples.
(I think this applies to art as a whole, not just poetry.)
Even the last line of the poem is a twist on expectations and plays on the irony of nationalism and patriotism, the beautiful latin phrase that inspires young men and the harsh reality of trench warfare that it leads them into. No LLM is going to do that.
Funnily enough, the phrase doesn't even shy away from the fact that you are dying for your country. The poem really gets at the fact that the way you die is horrible and that in itself is enough to counter the romantic notion. The reality of dying for your fatherland is not cleanly and painlessly dropping from your mount in a heroic and successful cavalry charge, even if that is good.
>There's a lot of bad poetry in the world that just follows readers' expectations. I should know, I've written some of it.
Apparently, in 'the good old days (of the internet)', your poetry would be published by yourself, on your webpage - complete with a starry-twinkling background, a 'number of visitors' counter in the bottom right, and in a pink font.
> Notably, participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems
There is clearly a significant difference between AI generated poems and human generated poems.
A random group of people probably do not read poetry. It would be interesting to see how people who do read poetry regularly do on this. Also, which poems they rate as good, rather than just "human authored".
I find with both the little AI generated poetry, and the AI generated paintings that show up in my FB feed, both look a bit rubbish. FB is pretty good for experiencing this because the work (usually an image) shows before the cues that it is AI generated in the text and comments.
As someone who reads poetry regularly and has played around a bit with AI-generated poems, AI poems can be quite impressive, but have a certain blandness to them. I can see them conforming very well to the average person's concept of what a poem is, whereas the human written poems might be less pleasingly perfect and average, more stylistically and structurally experimental etc. The LLM version is less likely to play with the rules and expectations for the medium, and more likely to produce something average and "correct", which makes some intuitive sense given the way it works.
I think when people are asked to rate poetry as human- or AI-authored, human poetry does look more like a random smattering of semi-related words, which, I assume, is what folks think machine-generated poetry would look like.
Perfect grammar and perfect meter don't read as AI to most folks yet.
LLMs also have no life experience, so they can write poems, but those poems aren’t communicating anything real. Poetry in the vein of Whitman and Dickinson and Plath is very much about a person expressing their very personal experiences and emotions.
I'm reminded of how people are bad at generating and recognizing truly random patterns. I imagine the famous poets have something in their writing that's an outlier. I wonder if the human-authored poetry looks odd enough to cause problems with our fake detectors, while the mediocre grey goo that AI creates better fits expectations.
Anecdote incoming - I read poetry, weekly if you will, over about 15 years now.
I also play with LLMs often, for creative side projects, and work commercially with them (prompt engineering stuff).
I don't find it far-fetched that individual poems can at times be indistinguishable when AI generated. I was asking it to write in iambic pentameter (sonnets) and it consistently got the structure right; its approach to the provided themes was as complicated or glib as I wanted. But that's all subjective, right, which leads me to my main point.
My view of poetry over the years has always been centred around the poet, the poet living in a time and place. As a generalisation, most people buy into the artist's life because it may represent some part of themselves.
If someone managed to write an intriguing corpus of texts using LLMs that was extolled, I think that would almost be beside the point. What is important is the narrator's life, ups and downs, joys and woes. Their ability to convey a memorable story even while heavily relying on AI would still be impressive. Anyway, sounding a bit wanky, I will stop lol
(I do think LLMs write a little too perfectly, which makes it easy to tell the writing is not human, but you can kinda prompt them to throw in errors too, so who knows)
The program Racter (which, from what I understand, was a basic MDP) was generating poetry in the 1980s that was published, read, criticized, and presented in museums: https://www.101bananas.com/poems/racter.html
I remember this as one of its poems was used on the t-shirts of the computing social club that I was part of as a postgrad student:
More than iron
More than lead
More than gold I need electricity
I need it more than I need lamb or pork or lettuce or cucumber
I need it for my dreams
If it was like other poetry generation programs of the 80’s and 90’s, it was generating a lot more crap than gold. People definitely were picking out the most cohesive examples, and probably even improving on them themselves.
> Despite this success, evidence about non-experts’ ability to distinguish AI-generated poetry has been mixed. Non-experts in poetry may use different cues, and be less familiar with the structural requirements of rhyme and meter, than experts in poetry or poetry generation. Gunser and colleagues [14] and Rahmeh [15] find that human-written poems are evaluated more positively than AI-generated poems. Köbis and Mossink [16] finds that when a human chooses the best AI-generated poem (“human-in-the-loop”) participants cannot distinguish AI-generated poems from human-written poems, but when an AI-generated poem is chosen at random (“human-out-of-the-loop”), participants are able to distinguish AI-generated from human-written poems.
This is a huge difference. Writing is a two-step process: idea generation, and selection. The first part is similar to what a randomized algorithm or an LLM might do, in smaller chunks (and indeed, the history of aleatoric processes in creative endeavors is long -- see Oulipo for one example in literature.)
The second step -- selection -- is the heart of creativity. It's about taste. Knowing what is and isn't "good."
When you consider the creative processes of Dada, Duchamp, Brian Eno -- I think it becomes clear that this automation of creative generation is a continuation of existing trends rather than a sudden disruption.
If an LLM were able to, on its own, generate and select poems, independently developing a new style of poetry that resonated with readers -- that would be something else entirely.
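The generate-then-select split above, and the in-the-loop versus out-of-the-loop distinction from the quoted study, can be sketched in a few lines. Everything here is hypothetical: the drafts are invented, and `toy_taste` is a crude stand-in for a human judge, which is exactly the part that is hard to automate.

```python
import random

def pick_out_of_the_loop(candidates, rng):
    """Human-out-of-the-loop: take whatever draft comes up at random."""
    return rng.choice(candidates)

def pick_in_the_loop(candidates, taste):
    """Human-in-the-loop: a judge scores every draft and keeps the best one."""
    return max(candidates, key=taste)

# Hypothetical stand-in for taste: reward drafts with more distinct words.
def toy_taste(draft):
    words = draft.lower().split()
    return len(set(words)) / len(words)

# Invented drafts standing in for a batch of generated lines.
drafts = [
    "the rain the rain the rain again",
    "a gull folds the harbor into its wing",
    "roses are red and roses are red",
]
print(pick_in_the_loop(drafts, toy_taste))
print(pick_out_of_the_loop(drafts, random.Random(0)))
```

Swapping `toy_taste` for a real person's judgment is the whole difference between the two conditions; the generation step is identical in both.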
Maybe my logic is choppy but I'm unconvinced that it is useful to use (inexperienced) popular sentiment as a marker of some kind of reference for cultural quality here. The average person has read at most 100 poems; people who really like poetry will be reading quite a bit more, and that experience has relevance in a context such as this. I'm more curious what happens when we look at specifically those humans who engage in substantial, regular engagement with a given artform.
The study is interesting but that the general public enjoy the most statistically calculated, 50% grey inoffensive, unchallenging & formulaic poetry isn't too novel -- I don't much read poetry and I'm sure I'd be voting for the AI also.
Just for fun, I had ChatGPT rewrite your comment as a poem. Here’s the result:
————
My logic may be choppy, but I’m not yet sold,
That popular sentiment should be the gold,
For judging culture's worth or depth or art,
When most have seen so little, played no part.
The average person, read just a hundred lines,
While those who love the craft cross many signs,
With pages turned, they gain a richer view,
That shapes what art they’ll seek, and how they choose.
The study’s neat, but here’s what I believe:
The crowd enjoys the safe, the bland, the brief—
The 50% grey, inoffensive way,
The formulaic poems that softly sway.
I don't read much myself, I must admit,
But even I might find that AI's wit
Could speak in ways that sound like common sense,
But lacks the depth of art's true recompense.
I would argue that 99% of popular songs have fairly poor quality lyrics (though it doesn't really matter with regard to artistic merit of the finished product). Singer-songwriters in folk-adjacent genres such as Joni Mitchell, Sufjan Stevens, and Don McLean come closest to actual poetry IMHO.
Yeah, the 1st, 2nd, and 4th, lines are very clunky and sound horrible read aloud with oddly placed syllables. The "view" and "choose" rhyme is horrible (the 3rd line's first rhyme is barely a rhyme at all). The line breaks are just as bad with poor examples of enjambment.
Hmm, I’m not convinced. That’s just the comment in the guise of poetry. There are a bunch of dangling implications in this “poem” where a real master would weave the implications together.
I guess I’d also say it is not only doing the right thing that counts, one must also be doing it for the right reason.
AI “art” is mimicry, burdened by the inevitable mundanity of the majority of its training corpus. The avant garde remains exclusively the domain of a comparative handful of forward thinking humans, in my humble opinion.
Or said another way: AI art is kitschy, and I don’t think it can escape it.
Just for fun, I also rewrote the comment as a poem:
Perhaps my logic flounders, but I'm unconvinced
That popular opinion can be the test
Of how we should judge our culture's best.
Can we who think that poetry's a bore,
And haven't even read five score,
Place ourselves above those who adore?
I laugh at memes and share the swill.
Of course I like the muck and fill
At the bottom of the hill.
It's completely ideal to get an average person's opinion on something as primal as poetry.
Poetry is for everyone, not just poetry connoisseurs. It's a simplified primal expression of language, taking the form of pretty soundbites & snippets, pristine, virginal, uncorrupted by prose and overthinking. Poetry is not the domain of middlebrow academics.
I used to think that, but I think this is only true if you want to measure broad market appeal. Very few things are broadly marketable, and many of them have niches. "Middlebrow academics" are the ones who go to the poetry shelf of their local bookstore and pick up anthologies and they are the ones who go to poetry slams, and so on.
I honestly think that there might be some truth to that.
If you look at Boston Dynamics, these are some of the very best roboticists on the planet, and it's taken decades to get robots that can walk almost as well as humans. I don't think it's incompetence on Boston Dynamics' end, I think it turns out that a lot of the stuff that's trivial and instinctual for us is actually ridiculously expensive to try and do on a computer.
Washing dishes might not be the best example because dishwashers do already exist, and they work pretty well, but having a robot with anywhere near the same level of flexibility and performance as a human hand? I'm not going to say it's impossible obviously, but it seems ridiculously complex.
Probably not, but the difference is that we can generate art at the "pixel" level instead of the "hand" level. Not really a way to do that for most other stuff.
I mean a literal PAINTing is made with paint, obviously.
If you were to try and create a robot hand that painted as well as humans, that would probably be comparably difficult to any other task involving a human hand. I was saying that we solve the AI art problem by skipping straight to an end state (pixels) instead of using the same mechanism a human might.
Why? Plotters do this all the time. It's not hard to imagine something like LLaVA hooked up to a plotter generating whatever the LLM imagines on a paper medium at a stroke level.
Paintings have texture and are three-dimensional, which I'd imagine you'd have realized having seen them in real life.
>You’re being kind of an ass, which is fine but I just felt it should be acknowledged.
Very generous coming from the poster that's saying "we can make a thing like the other thing, assuming it's not actually like the other thing in any way that isn't superficial." It's like you're shooting for a gold medal in the obtuse olympics.
The obtuse thing is you were talking about washing dishes and boston dynamics robots and then when I asked about painting, your response was that doesn't matter because of pixel art.
>Sorry, just a bit confused, isn’t focusing on an artifact of the painting, namely the three dimensionality of it, a superficial detail?
Artifact of the painting? Do you appreciate painting at all? It's fine if you don't, but that's kinda the whole issue in this thread. People who don't appreciate things making value judgments about the things they don't fundamentally care about.
I'm not sure how the texture or three-dimensionality of a painting is 'superficial.' Besides the fact that it literally isn't, it's an actual facet of the painting that is objectively there, it just seems to reflect a lack of understanding on your part. If painting is all superficial anyway, what does it matter that a certain element of it is as well?
I said we have a workaround with art because we can skip to an end state and then you acted like a douchebag and said it’s not the same and that if I had seen true paintings I would understand.
>I said we have a workaround with art because we can skip to an end state
I guess you still don't see the irony of this?
> douchebag and said it’s not the same and that if I had seen true paintings I would understand
I didn't say anything about "true" paintings, and your own insecurity is your own issue you need not project as accusations coming from others.
It's readily apparent if you view paintings that they are not "flat."
If you want to be happy with your pixel art approximation, don't let me stop you. It was you who suggested it was an end run around an actual problem and all I did was point out how that's apparently not true.
> Do you appreciate painting at all? It's fine if you don't, but that's kinda the whole issue in this thread. People who don't appreciate things making value judgments about the things they don't fundamentally care about.
I don't really think you're "asking" a question here, just kind of asserting that you don't think I appreciate paintings. That's fine, albeit a pretty dishonest way to speak, and perhaps my feeble brain isn't smart enough to appreciate paintings in the same way that your brain does. I have been to the Met a bunch of times and looked at the classic paintings and enjoy them, I guess not to your satisfaction though.
I don't think I know what the word "appreciate" means, and I suspect what it really means in this conversation is going to be the details that you think are important.
> I'm not sure how the texture or three-dimensionality of a painting is 'superficial.' Besides the fact that it literally isn't, it's an actual facet of the painting that is objectively there
I think we're using slightly different definitions here.
If someone told me that he only liked his girlfriend because she has DDD breasts, I might say he's really focusing on a superficial detail. The woman having DDD breasts might still be objectively true, but I would still call that focusing on the superficial.
Modern art generation programs can actually generate things like brush strokes and whatnot, but obviously it will be flat when represented on a monitor, so sure if you feel like you can see the depth associated with that then more power to you.
> I didn't say anything about "true" paintings, and your own insecurity is your own issue you need not project as accusations coming from others.
Again, you're speaking dishonestly. You didn't say the word "true", but you did suggest that if I had seen a painting in a museum then I'd appreciate them better. Specifically, you said "I'm curious, have you been to a museum that has good old paintings?"
Now obviously I don't have the same giant brain that you do, so maybe I can't "appreciate" your writing style correctly because I "have a gold medal in being obtuse", but that sure seems like passive aggressive dismissal to a dumbass like me.
>I don't think I know what the word "appreciate" means, and I suspect what it really means in this conversation is going to be the details that you think are important.
Like I said before, your own insecurities are your own issues and not something you need to repeatedly insist I'm poking at. When I say appreciate, my point is that it's a basic element of art and art theory. It's like in wine tasting, there are basic elements to it and one would expect that anyone experienced with wine tasting would have some understanding of those basic elements. Here in painting, the texture of a painting is a basic element of what a painting is.
>If someone told me that he only liked his girlfriend because she has DDD breasts, I might say he's really focusing on a superficial detail. The woman having DDD breasts might still be objectively true, but I would still call that focusing on the superficial.
This isn't an issue of the definition of "superficiality" but of what it means to focus. You're saying it'd be superficial to emphasize body parts a certain way. Not that body parts are inherently superficial.
>Modern art generation programs can actually generate things like brush strokes and whatnot, but obviously it will be flat when represented on a monitor, so sure if you feel like you can see the depth associated with that then more power to you.
Yeah, and they'll be flat when printed too. But if you go to a museum and look at some old or new paintings, which is why I asked if you did that, you will immediately see that paintings aren't even remotely flat.
>Again, you're speaking dishonestly. You didn't say the word "true", but you did suggest that if I had seen a painting in a museum then I'd appreciate them better. Specifically, you said "I'm curious, have you been to a museum that has good old paintings?"
No, I suggested that if you had been to a museum you'd have observed basic facets of paintings. You keep putting this value judgment into it (ironic considering your accusations against me) that I'm not. Sure, I might be acting like a bit of a jerk, but that's because I found your "Oh yeah, sure I was talking about robots but that doesn't matter with painting because of pixel art" fundamentally disingenuous, as we were discussing the capabilities of robots' "hands".
> "I'm curious, have you been to a museum that has good old paintings?"
Good old paintings are dense with layers of texture and paint that come off the canvas, and it'd be readily observable to anyone seeing them in person, regardless of whether they had taste or a "giant brain."
> Now obviously I don't have the same giant brain that you do,
> Like I said before, your own insecurities are your own issues and not something you need to repeatedly insist I'm poking at. When I say appreciate, my point is that it's a basic element of art and art theory. It's like in wine tasting, there are basic elements to it and one would expect that anyone experienced with wine tasting would have some understanding of those basic elements. Here in painting, the texture of a painting is a basic element of what a painting is.
This doesn't actually dispute what I said. You're making an assertion that I'm not appreciating a "basic" element of art. You keep saying I'm exposing some insecurity, but this is exactly what some holier-than-thou sanctimonious asswipe who thinks that they truly "get it" would say.
> You're saying it'd be superficial to emphasize body parts a certain way. Not that body parts are inherently superficial.
I'm arguing that emphasizing the 3-dimensionality of this is a superficial decision. I'd argue less-superficial stuff would be something like "what is the intended meaning of this painting".
> "Oh yeah, sure I was talking about robots but that doesn't matter with painting because of pixel art"
I didn't really say that. I said, and this is I believe the third or fourth time I've repeated myself, with AI art I was suggesting we have a workaround to avoid a lot of robotics by fast-forwarding to an end state. You call it "pixel art" to dismiss it and that's fine but it's also pretty stupid.
As I said in my first response (which you very dismissively responded to because fundamentally you don't have much to go on outside of being self-righteous about your superior "appreciation" of art apparently) if you want to be literal of course a PAINTing would require paint. And if the texture of it is really important to you then fan-fucking-tastic, Stable Diffusion and DALL-E aren't going to replace that any time soon. You win I guess? I never disputed this fact.
I used the term "painting" in a less literal sense, and it's actually not in any way weird for me to do so. There are digital art programs like "Corel Painter" and "Paint Shop Pro" and "PaintStorm", which as far as I am aware do not actually involve any real paint. They use the computer display as a metaphor for the canvas, and these things give some facsimile of something like "painting".
You're then of course free to say "well real artist appreciators like me don't consider that painting", and that's fine, but I'd like to point out that you are on Hacker News, so a person using the term "Painting" to mean something like CorelPainter isn't really weird at all, and getting annoyed by that really comes off as pedantic.
> Good old paintings are dense with layers of texture and paint that come off the canvas, and it'd be readily observable to anyone seeing them in person,
I've seen plenty of paintings in person. Clearly it's not as important to me as it is to you. That's fine.
ETA:
Wait, how exactly is it "fundamentally disingenuous" to say we have a workaround for AI art that doesn't involve hands? It's not "disingenuous", I didn't really know what specific aspect of painting you were fucking talking about, so I mentioned we can work at a pixel level. I didn't know that you were going to be a fucking pedantic douche and just say LOL THAT'S NOT PAINTING AND YOU'RE INSECURE AHAHAHHA FUCK OFF!
I think this line of reasoning is really bizarre, as if there's this straight-line path of progress, and then we stop the second it starts doing shit that we consider "fun".
Who is to say that "washing dishes" (to use your example) is a less complicated problem than art, at least in regards to robotics and the like?
it's not a matter of what's complicated, it's a matter of what it replaces. the quote isn't reflecting on what's easiest to solve, it's reflecting on the impact that it has on culture as a whole.
a tangible impact of the current generation of AI tools is they displace and drown out human creations in a flood of throwaway, meaningless garbage. this is amplifying the ongoing conversion of art into "content" that's interacted with in extremely superficial and thoughtless ways.
just because something _can_ be automated doesn't mean it _should_ be. we actively lose something when human creativity is replaced with algorithmically generated content because human creativity is as much a reflection of the state of the art as it is a reflection of the inner life of the person who engages in it. it's a way to learn about one another.
in the context of the broader discussion of "does greater efficiency everywhere actually have any benefit beyond increasing profits," the type of thing being made efficient matters. we don't need more efficient poetry, and the promise of automation and AI should be that it allows us to shrug off things that aren't fulfilling - washing dishes, cleaning the house, so on - and focus on things that are fulfilling and meaningful.
the net impact of these technologies has largely been to devalue or eliminate human beings working in creative roles, people whose work has already largely been devalued and minimized.
it's totally akin to "where's my flying car?" nobody actually cares about the flying car, the point is that as technology marches on, things seem to universally get worse and it's often unclear who the new development is benefitting.
I'll agree that AI has flooded the internet with low-effort slop. I feel like I can make a pretty strong argument that this isn't new, low-effort SEO spam has been a thing for almost as long as search engines have, but it does seem like ChatGPT (and its ilk) has brought that to 11.
> just because something _can_ be automated doesn't mean it _should_ be.
I guess agree to disagree on that. If a machine can do something better than a human, then the machine should do it so that the human can focus on stuff that machines can't do as easily.
> I guess agree to disagree on that. If a machine can do something better than a human, then the machine should do it so that the human can focus on stuff that machines can't do as easily
Machines exist for the pleasure of humans, not the other way around
This isn't some kind of "division of labour, we both have strengths and weaknesses and we should leverage them to fill our roles best" situation
Machines are tools for humans to use. Humans should not care about "doing the things the machines aren't good at". All that matters is can machines do something that humans do not want to do. If they can't, they aren't a useful machine
Replacing humans in areas that humans are passionate about, forcing humans to compete with machines, is frankly inhuman
> Replacing humans in areas that humans are passionate about, forcing humans to compete with machines, is frankly inhuman
I don't think it's going to "force" anyone out. We didn't suddenly fire all the artists the second that the camera was invented. We didn't stop paying for live concerts the moment that recorded music was available to purchase.
> Machines exist for the pleasure of humans, not the other way around
I am not good at art. I find it pleasurable to be able to generate a picture in a few seconds that I can use for stuff. It allows me to focus on other things that I find fun instead of opening up CorelPainter and spending hours on something that won't look as good as the AI stuff.
I could of course hire someone to do the art for me, but that costs money that I don't really have. The anti-AI people who just parrot "JUST PAY AN ARTIST LOLOLOL!" are dumb if they think that most people just have cash lying around to spend on random bits of custom art.
Last time I checked, I am human. The AI art manages to allow me to enjoy things I wouldn't have been able to easily achieve before.
> as if there's this straight-line path of progress
I think your rebuttal is really bizarre. OP is simply saying what they want AI to do.
> Who is to say that "washing dishes" (to use your example) is a less complicated problem than art
I think dish washing is a bad example, because we have dishwashers. But until the market brings AI and robotic solutions to market at an affordable cost that actually fulfill most people's needs, it will continue to be a net drain on the average person.
You don't get to tell people what they want or need.
I guess what I was getting at (and I'll acknowledge that I didn't word it as well as I should), is something along the lines of: "what if automating art is a necessary step if we want to automate the boring stuff?"
I think you are probably right. But what is really frustrating about this is the lack of alignment on what people want vs what industries need.
We talk so much about how capitalism is built around people's needs, but that betrays another reality, which is that people only get what capitalism produces.
If we were a planned economy we could skip right to an android in everyone's homes. But we wouldn't even have the tech for the android with a planned economy. So instead, we have to feed capitalism what it needs so it can innovate. Which sometimes is just a net loss for everyone in the meantime.
Like any other art, the painful truth is that it is all subjective.
It kills me that despite how elitist I am with the music I listen to, which I have spent decades now carefully curating, there is no such thing as "good music". What we music snobs call "good music" is really just what makes us personally feel good coupled with the ego stroking of self-described sophistication.
Could someone link me to a poem that an LLM did that they personally find in some way remarkable or beautiful or moving? Something with a bit of truth and/or beauty in it?
Not something that "rhymes" or has a rough poetic structure. I have only seen complete and utter garbage from LLMs in the poetry realm, not just a bit bad, but jarring and unfeeling. Which is fine, I don't hold it against them personally, ya know. Just it really has been pisspoor.
Which wouldn't be a bother at all, except along with the "poem" there often is someone saying "wow, look what it did! such a good poem!", which has made me suspicious that the person doesn't know how LLMs work nor how to read poetry - only one of which is a really serious loss for them, I suppose.
Anyway, that sounds like I want someone to send me something so I can sh*t on it, but I'd sincerely and happily read anything with an open mind! Is there some excellent poetry hiding out there which I have missed?
Just for kicks, I asked 3.5 Sonnet to write some sentences that didn’t exist in its training data. I googled the output and sure enough they appear to be unique. Most were semi meaningless strings of words, but I thought this one was quite poetic:
“Vintage umbrellas hosted philosophical debates about whether raindrops dream of becoming oceans or if puddles remember their cloudborn youth.”
Oooh that is very surprisingly nice! "cloudborn" is a great word there, how lovely, it slows you down at the right moment for a little explosion then at "youth". I wasn't expecting a genuinely excellent answer here :)
So it turns out then that the trick may be to find a way to get them to avoid aping the oceans of human mediocrity they've been spoonfed! Funny, it's the very same reason some poets go off and live in the woods.
> Could someone link me to a poem that an LLM did that they personally find in some way remarkable or beautiful or moving? Something with a bit of truth and/or beauty in it?
A lot of people couldn't link you to _any_ poem, human-written or ai generated, in response to that question.
I have no idea if you will care for it, but my family and I appreciated what ClosedAI's CustomGPT RAG (and my LLMpal) generated. This is slow loading (the vector database was built from this one big html file), and you can scroll down to see it: https://h0p3.nekoweb.org/#2024.11.20%20-%20Carpe%20Tempus%20...
I definitely do care for it, it's very nice! Thanks for sharing.
I am not sure if I understand exactly how you got it to pump that out, as your blog is a bit hard to locate oneself in and read around in. Like, it's fun and trippy but a tad disorienting. I went off and had a nice re-read about tiddlywiki though - I had learned of it at one stage and thought it looked very interesting, and your blog certainly is tantalising!
I briefly outline the procedure (also in case anyone else wanted to do so) in the page. I export the entire document into a json (~19k entries) and break that up into 20 different json files (so that my work will fit into the space ClosedAI provides for RAGs). The exact prompt sequence is provided on the page (I wrote two one-liners). Almost all of my work in achieving that collaborative output with my LLMpal is in the actual construction of the underlying content of the corpus that was haphazardly fed into its vector database. It did all the rest.
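The splitting step described above can be sketched roughly as follows. This is only an illustration, assuming the export is a single JSON array of entries; the function names, the `corpus-NN.json` file names, and the output directory are hypothetical and not the ones actually used.

```python
import json
from pathlib import Path


def split_entries(entries, parts=20):
    """Split a list into exactly `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(entries), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional entry each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(entries[start:end])
        start = end
    return chunks


def write_chunks(chunks, out_dir, stem="corpus"):
    """Write each chunk to its own JSON file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = out / f"{stem}-{n:02d}.json"
        path.write_text(json.dumps(chunk, ensure_ascii=False), encoding="utf-8")
        paths.append(path)
    return paths
```

With roughly 19k entries and `parts=20`, each file ends up holding about 950 entries, in the original order.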
I do appreciate the vertigo of it, `/nod`. The size alone (at ~60MB of text) is already a problem, let alone the topics I handle. There are very few humans who have read even half of it, and, presumably, AI specimens will comprise most of the thorough interpreters of my work. I also anticipate the vast majority of the few humans who more directly interact with my work will increasingly do so mediated through AI.
If you ever write your own, lemme know. I'll read. The proof that I do listen carefully is in the text itself.
I was meant to write a line or two of "test test test", you know, but then the feeling of banging away on the keys was enjoyable, so I did an "elaborate" test post, as a sort of joke. It's a bit "adult", and a bit ridiculous, but it's a thing.
I will read more of your stuff! It looked ripe for investigation, I did click around and throw my eyes here and there. I'll go again. And I have seriously noted that if I did make a tiddlywiki, I'd have at least one reader. That's pretty cool, I must say.
My university English professor in teaching Wallace Stevens to us said "It doesn't have to make sense because it makes sense to the author." I guess the magic of poetry is when you decipher what it is the author is trying to avoid saying plainly. It's not just a man making ice cream, it's about life and death.
A lot of good art requires a good palate. Taking random people off the street and asking them about AI art is going to reveal really stupid results every time.
If we keep doing this, the clickbait articles and research saying AI can do everything better than humans will never cease.
I wonder how well experts in poetry would do? It seems like a strange area to use non-expert humans for evaluation because these days most people reading any poetry are at least enthusiasts.
The missing clause in the headline is: AI poetry is indistinguishable from human poetry and is rated more favorably ... by people who don't read much poetry.
I would make the claim this paper shows the weakness of AI generated text. People are comfortable with familiar things. We know AI is sampling probability distributions in a latent space and by definition don't sample outside of them. Human poetry is creating net new things. Net new ideas which aren't just rehashes of past ideas. And this can definitely be uncomfortable for people, especially the layman.
In other words, the general public is afraid of Jazz; they like pop music. But Jazz is where the new stuff is. AI is creating pop music, not Jazz.
Basically an uninteresting conclusion. Of course a "non-expert" reader isn't going to be able to distinguish between AI and Walt Whitman--a "non-expert" reader likely won't even know who Walt Whitman is. "Expertise" is needed to even make the question meaningful.
I consider myself a “sophisticated” reader of poetry. I’m especially fond of Eliot - who is an acquired taste.
I’d never read the real Eliot poem before, and I don’t like it much.
I searched for it. It was written before Eliot was known, and before he moved from America to Britain.
That’s important, as I’d assumed it was taking place in Britain, and Eliot’s poetry is extremely location sensitive.
It also seems to be part of a “triplet” - three poems that go together. Eliot was very inconsistent about which poems went with which. The best practice is to include any possible “other parts.”
Reading the other poems in the triplet provides important context, but they’re still not great literature.
Maybe people don’t like it because it was just one of Eliot’s crappier poems?
Edit: the Eliot poem was published shortly after he moved to the UK. But that makes the American context provided by the other poems more important.
Important correction: "English poetry". It is massively different from most Indo-European poetry, which adds very strict metrical foot rules (https://en.wikipedia.org/wiki/Metrical_foot). In Russian poetry, for example, even a single metrical foot violation is considered a severe mistake and would be noticed by every reader. Also, all good poets avoid common rhymes (and verb rhymes) as clichés, which are considered mistakes there, while AI tends to use common rhymes by design. There are exceptions (like Russian futurist poetry), but other than that AI fails massively.
I think the concept you are describing is called accentual-syllabic verse. And it definitely exists in English too, it just became much less popular for some historical reason.
I am much more familiar with Russian poetry compared to the English (and I still like it more), but I wouldn’t call violating metrical foot format a mistake in general there. It’s an instrument like any other, which can be used well. Moreover, due to generally having longer words compared to English, you can often see unstressed syllables where a stressed one would be expected by the strict interpretation of the metrical foot, otherwise you couldn’t ever use words with four or more syllables in poetry. I am not an expert but I’ve heard arguments that you shouldn’t use the concept of metrical feet for Russian poetry in general because of that.
You don’t have to look at futurism and other avant-garde movements to see experimentation with the strictness of verses. There were plenty of interesting variations in the Silver Age of Russian poetry, and even earlier: you can find quite a few examples of different metrical foot variations and violations in Pushkin’s poems, and even Lomonosov, while preferring highly regular structure, was not above varying it when appropriate.
I do agree with your overall point of AI mostly failing to produce interesting poetry, though admittedly I haven’t experimented much with that. But Russian poetry is much less regular than you describe it.
As a software guy I’ve been dabbling with poetry for the last 2 years [1] as an extension to my code writing. I think poetry and code go together rather well; there are similarities when it comes to elegance, construction and structure.
Liking or not liking poetry is irrelevant. Some interesting things happen when one actually writes poetry. You are much closer to the source of ideas that can at times feel unique and inspiring.
Generative LLMs can easily mimic style, any _existing_ style. Can they capture an original higher thought form or is generated poetry an extremely smooth word salad?
Something which seems to be under-appreciated is that poetry is intentionally imprecise. There is something about precision which reduces emotional response. Searching for the double meaning, or metaphorical meaning of a word seems to enhance its ability to produce emotion. If you want a working example of this, try comparing technical documentation (among the most precise writing out there) to any sort of popular poem. The poem is necessarily imprecise, and this imprecision seems to be a fundamental facet of the expression of emotion.
This actually seems to give LLMs an edge: their imprecision potentially matters less, and leaves more room for the reader to fill in the gaps.
One of my sessions basically starts by discussing with the model what we will write and where - some of my past models are Japanese, so they could sometimes suggest specific poems or concepts, but I expect that these will always be outliers and most of the future collaborations will be with people that are not fluent in Japanese.
In those cases, I try to at least agree on some specific image/concept/word and then I go looking for an appropriate haiku (I also rely on help from my calligraphy teacher, who is Japanese too).
But just a couple of days ago I started asking ChatGPT to write haikus for me, based on a short description, and the results seem pretty good to me.
An example:
There is an instagram model who goes by the nick "Sensual snake".
So I asked ChatGPT "please compose for me a haiku, in Japanese, about a sensual snake."
Result:
Sensual Snake
妖蛇の
密やかに這う
月の影
Translation:
The mysterious snake,
silently it slithers,
in the moon's shadow.
I do not speak Japanese, so I asked if it could be possible to just replace Mysterious with Sensual or Sensuous - while still respecting the syllable count, and ChatGPT replied:
The key word 妖蛇 (yōja) already has a layered meaning that can imply "bewitching," "sensual," or "mysterious.”
I also showed this to my teacher and it looked legit to her too (I informed her about how I got this only post-facto).
So, even if it is definitely too early to tell, it looks like AI can do a decent job in the admittedly "rarified" context of Japanese Haiku.
"Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?" - Choruses from "The Rock"
The LLMs statistically average everything they've trained on.
I would expect them to do well at any kind of fuzzy emo sort of task, as well as (at almost the other end of the spectrum) in identifying patterns (such as in radiology images, or images of any kind along with any other data set analysis).
But both of these sorts of tasks are estimations, they're not expected to produce precise factually correct results.
The use of non-experts as judges renders this meaningless.
Imagine an algorithm that drew random shapes. Then ask people who've never engaged with art theory or history to compare its output with what they see in a modern art museum. You'd get results similar to this.
Many forms of art are actually quite inaccessible to novices. What poets see in poems or composers hear in compositions is nothing like what someone from the outside sees or hears. It's a language of its own and if you don't speak the language, you simply won't get it.
This is a statement of fact; I'm not judging if it's good or bad or what the layman's opinion is actually worth. But it is empirically how communities of artists operate.
The thing is that, for most of society, art exists to serve a utilitarian purpose (make us feel good/bad/inspired/destroyed/whatever other influence). Essentially just to induce a "vibe". Anything other than what most people intuitively feel after consuming it is secondary and, frankly, unimportant in the grand scheme of things.
Except that the direction of art is usually influenced by the artists, without too much calibration to what the uninformed public wants.
It’s actually a really interesting question if this affects the market, at all. Because if AI can generate “art” which satisfies the public, but not artists, will the public just go for that, instead?
Or is it the case that the public never really cared for or consumes that art form, and that the entire market for art is to people who (to at least some degree) specialize in that form?
The main problem with AI is not that it can or can't write poetry as well as humans. It's that it's the next step in a long process to divorce human experience from art so that we lose one more beautiful facet of human existence. And the reason why that's happening is because technology needs to take away the essence of human experience from production so that we can be more irrelevant and anonymous, which is essential to being cogs in the technological production machine.
> We propose that people rate AI poems more highly across all metrics in part because they find AI poems more straightforward. AI-generated poems in our study are generally more accessible than the human-authored poems in our study. In our discrimination study, participants use variations of the phrase “doesn’t make sense” for human-authored poems more often than they do for AI-generated poems
I think this says more about the state of (English based) poetry over the past 150 years than it does about the ability of AI to generate competent poems. With the advent of Modernism, the poetry-industrial complex's ideas about what constituted a Good (English) Poem diverged significantly from what the general (English speaking) population expected a poem to be, how it was constructed, presented, etc. Teaching these new forms of poetry in schools left generations of people confused and uninterested. Yet people need poetry in their lives; they found that poetry in popular music and Hallmark card verses.
Of course, popular poetry isn't dead. Poetry book sales in recent years (in the UK) have "boomed"[1][2]. I'll not argue the merits of the poetry being generated by the latest crop of Poet Superstars; at the end of the day their work sells because people find comfort and joy in it - and that is a positive outcome in my view!
> Given people’s difficulties identifying machine-written texts, and their apparent trust that AI will not generate imitations of human experience, it may be worthwhile for governments to pursue regulations regarding transparency in the use of AI systems.
I think AI does threaten the careers of current Poetry Superstars. Building a website to pair an AI generated outline image with some cozy AI generated verses about a given situation/emotion/discovery should be an easy project to build. Allowing users to personalise the output so they can use the results as a gift for loved ones etc might be a viable product?
But I don't see much value in forcing anyone using AI to produce creative assets to label the output as such. For one thing, there's no guarantee that anyone using those assets will maintain the labelling. A much better approach would be for poets who don't use AI to help them craft poems to label their work as "100% Human" - in the years to come it might even become a positive selling point!
I believe this will happen in many human domains, but it doesn't really matter. Nobody is going to stop writing poetry because of this and I doubt there's much of an audience for AI generated poetry.
There are forms of poetry/art/etc. that we've never thought of, that have never been conceived before. An LLM being what it is won't conceive these. Humans will continue to generate language the pattern/structure/meaning of which has never been generated by LLMs before.
Thought experiment to prove this: if you trained an LLM on every utterance of human language before the 5th century BC would you get any idea we would recognize as modern?
I think that's the wrong perspective on it. People want to compare how an AI does at one thing to how the best people in the world do at that thing.
What you really want to do is compare how good the AI is at that thing compared to the _average person_ at that thing, and I would guess that generative AI outclasses the average human at almost every task that it's even moderately competent at.
People like to point out how it can't pass the bar exam or answer simple math questions or whatever, and how that _proves_ that it's not intelligent or can't use reasoning when _most people_ would also fail at the same tasks.
Almost all the Gen AI models already have superhuman competency if you judge them across _everything they can do_.
We're deluding ourselves by thinking it's happening to poetry! This study is ignorant and dishonest, it should have never been published in the first place: https://cs.nyu.edu/~davise/papers/GPT-Poetry.pdf
AI research is worse than all the social sciences combined when it comes to provocative titles/abstracts that are not supported by the actual data.
The basic problem is that GPT generates easy poetry, and the authors were comparing to difficult human poets like Walt Whitman and Emily Dickinson, and rating using a bunch of people who don't particularly like poetry. If you actually do like poetry, comparing PlathGPT to Sylvia Plath is so offensive as to be misanthropic: the human Sylvia Plath is objectively better by any possible honest measure. I don't think the authors have any interest in being honest: ignorance is no excuse for something like this.
Given the audience, it would be much more honest if the authors compared with popular poets, like Robert Frost, whose work is sophisticated but approachable.