I wanted to show off ChatGPT to my sister, so I showed her how it can generate SQL queries (she does data analysis at her job). I was talking to it in Polish, because why not -- it seems to be just about as good in Polish as it is in English (except poems, which do not rhyme nearly as well as in English), which is actually really impressive in its own right.
My prompt was "consider a table 'hospitals', with columns (...), and table 'procedures', with columns (...). Write an SQL query that shows which hospital had highest number of procedures in June".
I wrote the above in Polish, and one thing that impressed me was that while June is "czerwiec" in Polish, "in June" is "w czerwcu". Nevertheless, the produced SQL correctly said "WHERE miesiac = 'czerwiec'".
Anyway, the SQL was something like "SELECT ... as procedures_amount, (...)", and when when I showed this to my sister, her, being the grammar pedant she is, immediately complained "procedures_count, not amount!". So, I ask the ChatGPT "Why was my sister irate and exclaimed "procedures_count, not amount!"?". And the bot answers, correctly, that "amount" is incorrect according to the rules of Polish grammar (remember, this all happens in Polish)".
And now, the part that blew my mind. The bot starts explaining what the rules are: that you use "count" for measurable objects, but "amount" for uncountable ones. However, it did not use the correct word for "uncountable" ("niepoliczalny"). Instead, it used a completely made up word, "niemiarytmiczny". This word does not exist, as you can confirm by googling. However, a Polish speaker is actually likely to completely miss that, because this word actually sounds quite legible and fits the intended meaning.
Again, to drive this point home: the bot was at that moment lacking a word for a concept it understood, so it made up a word that seemed to it to convey its meaning, and it actually got it right. It blew my mind.
The reason why it can "make up words" is because it does not use "words", but "tokens" (which can be smaller or larger than a single word).
In this specific case, it probably understands that the token "nie" can be prepended to (almost) any polish word (like "un" in english) to generate a negation of that word.
Cool story, though.
EDIT: Note that (for example) Google Translate has no problem tackling the word "niemiarytmiczny" and "correctly" translating it into english.
It’s not about “nie” (as indeed, appending it to adjectives does form negations). The word “miarytmiczny” does not exist either. However, it will likely be understood by native speakers anyway, as the adjective made from the noun “miara”, meaning “measure”, even though the correct derivative adjective is “mierzalny” (measurable).
In that case, Google Translate's attempt at parsing that word completely failed: it seems to interpret it as "niemi-arytmiczny" or "niemia-arytmiczny", rather than as "nie-miarytmiczny". Funny.
My prompt was "consider a table 'hospitals', with columns (...), and table 'procedures', with columns (...). Write an SQL query that shows which hospital had highest number of procedures in June".
I wrote the above in Polish, and one thing that impressed me was that while June is "czerwiec" in Polish, "in June" is "w czerwcu". Nevertheless, the produced SQL correctly said "WHERE miesiac = 'czerwiec'".
Anyway, the SQL was something like "SELECT ... as procedures_amount, (...)", and when when I showed this to my sister, her, being the grammar pedant she is, immediately complained "procedures_count, not amount!". So, I ask the ChatGPT "Why was my sister irate and exclaimed "procedures_count, not amount!"?". And the bot answers, correctly, that "amount" is incorrect according to the rules of Polish grammar (remember, this all happens in Polish)".
And now, the part that blew my mind. The bot starts explaining what the rules are: that you use "count" for measurable objects, but "amount" for uncountable ones. However, it did not use the correct word for "uncountable" ("niepoliczalny"). Instead, it used a completely made up word, "niemiarytmiczny". This word does not exist, as you can confirm by googling. However, a Polish speaker is actually likely to completely miss that, because this word actually sounds quite legible and fits the intended meaning.
Again, to drive this point home: the bot was at that moment lacking a word for a concept it understood, so it made up a word that seemed to it to convey its meaning, and it actually got it right. It blew my mind.
https://drive.google.com/file/d/1jRXiQc1g6M64S0rmWKPX6RlFDm6...