Hacker News
How babies and young children learn to understand language (lithub.com)
233 points by Hooke 4 months ago | 190 comments



Word beginnings and endings are learned too because people raising young kids are spending a great deal of time decomposing and telling them simple words. Repeating "cat", "cat", "cat" with pauses.

Statistical learning --and there are studies about this-- is also obvious when multilingual kids make up words that do not exist.

They'll use words from one of the languages they know to come up with words (or word beginnings / endings) in another language. These words, statistically, could make sense. And they'll pronounce them "properly". Yet they don't exist.

So it's not just the words: it's the pronunciation too.

As the father of a fully bilingual kid (French / English) that was fascinating to watch.


This really is a thing. My son seems to have learnt the grammar, but not so much vocabulary of English, so he'll borrow vocabulary from Spanish, and force English grammar over it to invent entirely new words.

Example: he doesn't know how to say 'to lie' in English ('mentir' in Spanish). So he says: "You are _menting_" to say that somebody is lying.


> Word beginnings and endings are learned too because people raising young kids are spending a great deal of time decomposing and telling them simple words. Repeating "cat", "cat", "cat" with pauses.

I'm pretty sure that years ago I read in a textbook somewhere that the effect size for this kind of thing is very small relative to other "parent talk" behaviors like exaggerated intonation?

(But I also wouldn't be surprised if that was just the author's take and not at all a settled matter.)


Thank you for this. My kids are bilingual in Dutch / English and often make up such words.

I like that they'll use words from different languages in the same sentence. School disagrees though. I've once heard someone say that bilingually raised people have a higher chance of becoming schizophrenic, although I really doubt that has any major influence.


I have one good reason for not doing that: the main reason for language is communication. If you mix two languages, the other person needs to know two languages to understand you instead of one, so we try to stick to one, chosen based on context.


Same here. One thing that was funny was watching my daughter make the same mistakes Italian students make when learning English, even though she was born and raised in an English-speaking country. Learning a language that's phonetic makes reading so much easier; when dealing with English, the lack of that feature really throws you off.


Sometimes I feel a little bad we're foisting 4 languages on our kids, plus two more within a 20 minute drive so I expect some basics there will be picked up too.

Our eldest likes language battles: she corrects me in another language but the game is that I must continue in the 'wrong' one.


It's also contextual maps for statistics. Spoon and fridge are always mentioned in a kitchen context.

(Actor + wordsequence + location) * repetition_as_filter makes for a significant data point in language. Thus words are first uttered in context and feedback is expected.

"pot" hits pot with spoon in kitchen.


> Think about listening to a language unknown to you, one with different words, grammar and prosody. You will be at an utter loss to identify its words, let alone their meaning.

My experience learning French basically. I'd say understanding where one word ends and another starts was much easier for English and German. On paper I was able to grasp the rough meaning very quickly thanks to vocabulary shared with English and Latin, but listening took a year: I was facing a solid wall of sound, no cracks.


This may come off as a little inappropriate but one thing I think about when learning another language is the usual mistakes or quirks that native speakers of that language display when speaking English. It turns out that quite often those are reflective of the correct form in their language.

It's helped me sound more natural in Japanese and Hebrew.


I speak a bunch of languages, and this is especially helpful when learning languages that have their own specific idiomatic sentence structure. You get a pretty good insight into how people form sentences in their native language based on how they garble English. You can often derive where people are from based on those typical mistakes, even in a few sentences on an online forum.


I was facing the same issue when I learned English (I'm French), and I found that watching TV Shows with the subtitles helped a lot.

So you get to hear that wall of sounds, but you get the words spelled out under the picture.


YouTube has a lot of informal conversational content in all sorts of languages, too. Podcasts etc.


Helps to understand the language easier


My strategy is to keep listening to the same recording on repeat. 1000 times if needed. It is good if it is a high-quality recording. After a while, my brain starts picking out the words.


I can speak German, English and Dutch (which are very close Germanic languages) and also understand a bit of French and Italian. I guess I could learn their basics relatively quickly if I tried hard enough.

However, I actually need to learn Croatian, a Slavic language. This is extremely difficult for me and I make hardly any progress. The vocabulary is so different from everything else I know, I just can't remember the words. The grammar is quite challenging as well.


I did give some thought whether your native language makes it easier or harder to learn others, but all I have is anecdotal cases from my life. I wonder if there are any scientific studies on the topic.

E.g. a curious thing I noticed is the frequent complaint about Latin being very hard to learn because of the seven grammatical cases, three genders and morphology -- words just don't stay the same. As a native Russian speaker I find these things absolutely normal and easy to understand, but I can imagine this must be a nightmare after English.

The Slavic languages you mention are very curious in how different yet how similar they all are. One of the most impressive things I saw with respect to learning languages is the Interslavic language [1]. Apparently if you speak any of the Slavic languages you can understand it quite well, even though you usually are completely lost with most of the other Slavic languages. You still need to learn to speak it, but the fact that you understand foreigners with zero training just blows my mind. Feels like you awakened the memory of your ancestors or something.

[1] https://www.youtube.com/watch?v=NztgXMLwv4A&t=122s


I'm Polish, I've never learnt any foreign Slavic languages, but I can mostly understand Ukrainian, Belarusian and Slovak from passive exposure on a few trips.

The experience is weird - when I first went to Ukraine I couldn't understand almost anything. After a few days it suddenly "clicked" and I realized how the most common sounds and word endings relate between Polish and Ukrainian - and since that moment I basically got all the words with the same roots for free (which is like half the language). It also gave me Belarusian as a side effect :)

Of course it's not actual language speaking, I'm just understanding every other word and connecting the dots.

With Slovak it was even quicker, but somehow it hasn't given me Czech - their pronunciation is just too weird, despite the fact it's almost the same in writing :).

I'm not sure Interslavic provides much value. For me it's basically Slovak, so if you come from an East Slavic language you'd probably get the same benefit spending that time learning some Slovak, and you'd then know some actual language instead of an artificial one.


There’s a surprising amount of vocabulary difference across Czech and Slovak, in addition to the pronunciation differences you mention. Pre-split everyone on both sides grew up hearing bilingual broadcasts, so they picked up the differences ambiently. I’ve heard that it’s a lot less of a given whether the younger generations presumptively understand each other these days. But I’ve also seen evidence that many still choose to engage with people / content / opportunities on the other side enough to get to solid working familiarity anyway.

This comes from limited first-hand experience and more extensive second-hand cross-generational experience. Take it as you will.


Slovak children usually grow up with Czech-narrated cartoons, so they are able to understand Czech more easily. I heard that Czech children do not receive this language training for Slovak, so they have a harder time understanding the Slovak language. I never "learned" Czech in school but I watched a lot of cartoons as a child (born '93) and read books in Czech so I have no problem understanding the Czech language as a Slovak. I have a hard time understanding Polish though, it never clicked for me.


I can confirm this is true. The Czech Republic has roughly 2x the population of Slovakia and was historically the more developed part, so during the shared state a lot of media was in Czech and it became our second language without us thinking about it. Czechs also got decent exposure to the Slovak language.

But if there is no exposure, it becomes visible how the grammar is very similar, but most words are just a bit different (very few are completely different), and pronunciation varies so much across the whole region (even within a given country) that it's not easy or even possible to understand each other out of the blue, without prior exposure.

I got some exposure to Polish TV during the 80s, since the commies couldn't put together more than 2-3 channels on TV and those were pretty bland anyway. I can roughly understand it, but can't say a single sentence well enough. If I read Polish text, I have to read it aloud in my head and then I grok it easily; otherwise there's too much 'cz', 'w', the words are too long etc. and I lose the meaning very quickly.

But in general Polish is a bit further away from either Slovak or Czech languages. We were and still are literal brothers (CZ and SK), extremely similar in so many regards, still see no good rational reason why we split up (of course I know real reasons, but those are nasty as are the people responsible for the split).


The US state department has some estimates of time to become proficient in target languages for English speakers. Germanic and Latin based languages take the least amount of effort:

https://www.state.gov/foreign-language-training/


Interesting link. Some of those estimates are very, very optimistic though...


It helps that staff are participating in these programs full-time, no other duties required.

A story I've heard first hand is that after the program, department staff can discuss diplomacy in the target language, but struggle to order a coffee!


As a Dutch person I once spent 2 years in Ukraine. Hardly anyone spoke decent English there, so I was forced to learn the language (Russian). Within a year I was able to speak it moderately. Key is to focus on words only. Don't care about grammar. It's not important and will be fine later on. First learn 2000 to 3000 words and you will be able to say a lot of things.


This has been an approach I've seen used to great effect by people learning your language! My wife strings together Dutch words with English grammar (we're both American) and it's a common "mistake" that I experience when visiting a taalcafe.

Our brains are all extremely capable of moving a few words so we can understand things in context. When speaking Dutch, I'm spending a lot of time thinking about a verb, then attempting to string it together into something comprehensible.

What I also found to be almost universally true is that if you are learning a less "popular" language (like Dutch or Ukrainian), just actually giving enough of a shit to learn more than "yes/no/please/thank you/my name is" will garner you a lot of good will from native speakers. Some countries get so many tourists that hearing the same 10 words butchered over and over and over and over and over again eventually starts to wear down on locals, making them jaded.

When you can walk into a room and somewhat confidently hold your ground, you become interesting and a novelty that people want to interact with. I went to a block party last weekend and spoke to most of the people there throughout the evening. Sure, everyone knew I was "foreign" and occasionally I had to ask someone to repeat something or re-word it, but that's a minor complaint for them considering the other option is to speak back to me in a language they don't have as good of control over. Despite popular belief, even in many "English friendly" countries, the normal citizens aren't actually that comfortable speaking unless they work in an environment that demands they speak English every day. This goes doubly so for older people. My next door neighbor is about 75 years old and speaks English pretty well (worked at Philips for 20+ years), but the _quality_ of our conversations went up 3x when I could understand enough Dutch for them to speak naturally to me.

All that to say, I agree with you! Vocabulary and basic grammar to get you started and then after that it's all about learning words and practice practice practice.


(As someone who has lived in different countries) Making an effort to learn "enough" of a language of the country you expect to be living in for a while is a gesture of decency. I feel that good will is justified, especially in cultures that see a lot of immigration and tourism. People try to live in a coherent society, which becomes hard if a large group of people is incapable of communicating naturally and reasonably fluently. Most Western European and Northern European people speak excellent English, but it's not nice to force a large group of people to switch to English and hamper natural communication style because somebody is slightly disrespectful and lazy.

You can get away with a lot in Dutch and especially Flemish Dutch though, because the local dialects are so strong (for being a relatively small area). Unless you look exotic, people don't always immediately pick you out as a non-native speaker.


Do you have any observations about Frisian ? It is said to be the language closest to English.


> Key is to focus on words only. Don't care about grammar.

Absolutely! The way they teach foreign languages in school is insane. "OK, you know 20 words and can't say a thing, now it's time to learn past tense".


And before learning words, learning and practicing phonology. Then, add vocabulary while still practicing phonology. It's insane people and institutions assume one could be understood when speaking a language when its most basic building block is not acquired.


Can I ask what language you're thinking of here? When I was taking classes, the A0-A1 level class was all in the present tense for the full length of the course. We might have touched on past tense in the last lesson and most of the books I've seen for this language (Dutch), structure things as present/future tense first, and then past tense after.


These are my memories of learning English in school as a foreign language. French courses that my wife took recently also had past and future tenses in A1.

Even if it's not the tenses, schools still typically put a lot of focus on grammar and word order but not nearly enough on speaking. I have a suspicion that these courses are modelled after native-language programs, which rightfully focus on grammar because everyone knows how to speak already. But starting from it is madness.


If you can say “subject verb noun” that’s good enough in many cases to at least get your point across.

Anecdotally, I know that I can understand people speaking English with what a school teacher would consider atrocious grammar, as long as the words are pronounced close enough to be recognizable.


the language of Ukraine is Ukrainian


As a foreigner a lot more useful to learn Russian. Everybody speaks it in Ukraine. And other countries as well. Even at our university in Ukraine, Russian was the main language amongst foreign students. It's a sensitive topic right now, true.


> As a foreigner a lot more useful to learn Russian.

Given that the state language is Ukrainian, and its overall dominance in media and culture -- there's no way this statement could possibly make sense.

> Everybody speaks it in Ukraine.

This gets repeated a lot, but it's just not true. It's true that virtually everyone has some working comprehension of Russian because of earlier Soviet influences, and because the two languages are so similar (and in many parts of the country, the "Ukrainian" that is spoken is highly Surzhyk-influenced).

But realistically only about 70-80 percent of the population speak Russian fluently and comfortably. Given a choice, the vast majority would clearly prefer to speak Ukrainian (and many people have been switching voluntarily as a matter of preference since 2014; the government's mildly coercive efforts having nothing to do with this, really).

> Even at our university in Ukraine, Russian was the main language amongst foreign students.

Probably because it's the only one among the two that they were able to study before coming there (and because they saw Russian as being more useful in other countries, as you say).

And even so, this applies only to certain universities in certain cities.


I spent many years in Ukraine, until a couple of months before the invasion. I have never met someone who doesn't speak Russian. Only the elderly people have sometimes a mixed slang between Russian and Ukrainian. But other than that everyone speaks it. There are some hardcore nationalists connected to Bandera (pro nazi group) that refuse to speak Russian, but the remaining people don't care and speak both.


> I have never met someone who doesn't speak Russian.

Then you haven't traveled broadly in Ukraine. And more importantly you're missing the point. The vast majority do speak and understand a reasonable amount of Russian (hence they will almost never object when you use it with them; they get that you're a foreigner and are doing the best you can) -- but they don't speak it fluently and comfortably, and it's not their preferred language in everyday use.

> Only the elderly people have sometimes a mixed slang

It's more prevalent among the older set of course, but still this is just not true across the board. Surzhyk (or less pejoratively: Russian borrowings/breakings) are everywhere, though they are often subtle and it may take some training to detect them.

Part of the problem is that there are no well-defined boundaries (and there's only a barely defined notion of what constitutes "standard Ukrainian"). They're literally still in the process of cleaning up the nation's preeminent (and clearly Soviet-, if not exactly Surzhyk-influenced) dictionary.

> There are some hardcore nationalists connected to Bandera (pro nazi group) that refuse to speak Russian

Now you're getting into pure BS territory.

This is obviously something you've read or something you've heard said a lot, but not something you know from direct observation.


Not a dog in the fight, at least this particular fight, but this might sound to many as "just learn the language of the oppressor". This gets thorny real quickly.


I had the same problem coming from the same Germanic/Romance languages as you, trying to learn Finnish. It's not just the vocabulary; the whole construction and modality is often different, and it's hard to map things one to one. Then I learned Swedish, and it was just a funny dialect between Dutch and German.


Yes Finnish is way out there. We're raising our kid bilingual EN/FI, and I am really curious to know how he will align these two languages in his head, and what kinds of insights he will get. I suspect he will be a sponge for additional languages.


A big problem here is that slowed-down "teacher talk" does French learners an incredible disservice.

The classroom version of the language is at least a little bit different from natural, connected speech in every language I've studied. But classroom French is effectively a completely different dialect from the everyday spoken language.


Shameless plug: try Latudio - https://www.latudio.com/ - we have a listening-only approach, but you can pause a sentence anytime and tap on words to see the translation. I'd say give it a try and let me know if that works for you, I'd be happy to hear.


fyi one of the App Store screenshots misleadingly shows "German" language in "Preview" when in reality it doesn't seem to be available at all


Oh, thanks for letting me know, we'll fix it with our next update. German is in preparation but not ready for preview yet, unfortunately.


I think French is notoriously difficult to understand from the spoken language because they tie together so many words and sounds. Contrast that with a language such as Finnish, which is hard to learn, but relatively easy to understand and write because both pronunciation and spelling are just what you'd expect.


This may be a hot take, but I'd argue that there is no such thing as a language that's easier or harder to understand. Just languages that are more or less different from the one you grew up speaking. If they're more similar, then the learner can repurpose skills they already had from their native language. But that's not ease, per se, it's getting a head start.

To take French phonology as an example: objectively speaking, enchainement is an aid to comprehension, not an impediment. Now that I'm used to it, I find non-native speakers who don't do it to be harder to understand because they've effectively dropped an entire information channel from the language. Which isn't to say I didn't have a hard time getting used to it when I was learning. I absolutely did. But that's not because enchainement is inherently difficult; it's because I first had to un-learn some assumptions about the structure of language and how spoken word morphology works that I was bringing with me from my native language. And because I was being hindered by pedagogical methods that, in effect, try to hide the problem instead of solving it.


Native English-speaker here, moved to a German-speaking nation 20 years ago and have become a fluent German speaker as a result. Just want to add my 2c about that wall of sound ..

At the beginning I found it very difficult to parse German as I heard it on the street and in general life circumstances. It wasn't until I rigorously started looking up words I 'thought' I heard, with a dictionary, and gained about 100 of these, before I could accurately parse a spoken sentence.

I think it's very important to use a dictionary when learning another language. It wasn't until I got a massive English/German dictionary that I felt I stood a chance.

Another thing that really helped was using sub-titles and watching TV, even if it was a show I wasn't interested in - it taught me so many words that I had heard, but not recognized.


They seem to teach it pretty fast to any idiot in the French Foreign Legion from what little I know. I have a feeling people learn fast with the 'right' incentives.


90% of recruits fail, so I wouldn't say they've got some awesome fool proof method.


What % fail due to lack of language skills? Or did you really mean 90%? Source?


They also actively teach it to them. They are immersed around the clock, using other languages is forbidden (recalcitrant recruits used to get hazed in the past), and they put in efforts to teach it in a practical and conversational way.


This has always been my struggle with French as well. I have never quite been able to crack the aural/listening component of the language.

How did you overcome the “wall”?


Unfortunately there’s no trick. You get better at listening by listening a lot.

Most people just severely underestimate the amount of work it is. You probably need 1,000 hours of listening to be decent, and 2,000 to be strong. If you practice 30 minutes a day, reaching 2,000 hours will take about 11 years.

So the only “trick” is that you need to find things that you genuinely enjoy doing in French. So that you can practice for multiple hours each day without burning out.

I’m a native English speaker who has a very high level in Spanish. It takes a long time for your brain to figure out how to decode the type of mumbling people use in casual speech.

Yes, practice vocabulary, practice grammar, read. But you’re probably already doing this. You just need to listen to a ton of stuff right at the edge of your abilities and you’ll notice improvements every few months.

My one final tip would be to not get in your head so much. Literally everyone who has learned a language will tell you that if you review vocab, do grammar exercises, and listen to the language for 1-3 hours every day, you’ll learn it.

But it happens so slowly that it doesn’t feel like it will work while you’re on the journey. It’s sort of like the gym: if you lift every day and eat healthy, you will have muscles in a year or two. But it’s really demotivating because you won’t notice a drastic difference even after 3 months.
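A quick sanity check on the arithmetic above, as a minimal Python sketch. The 11-year figure matches the 2,000-hour target at 30 minutes a day; 1,000 hours at that pace comes out closer to five and a half years. The function name `years_to_reach` is made up for illustration, and it assumes a 365-day year with no skipped days.

```python
def years_to_reach(target_hours: float, minutes_per_day: float) -> float:
    """Years of daily practice needed to accumulate target_hours of listening.

    Assumes practice happens every single day of a 365-day year.
    """
    hours_per_day = minutes_per_day / 60
    days_needed = target_hours / hours_per_day
    return days_needed / 365


# Compare the two hour targets from the comment at 30 minutes and 2 hours a day.
for target in (1_000, 2_000):
    for minutes in (30, 120):
        years = years_to_reach(target, minutes)
        print(f"{target} h at {minutes} min/day: {years:.1f} years")

# 1000 h at 30 min/day: 5.5 years
# 1000 h at 120 min/day: 1.4 years
# 2000 h at 30 min/day: 11.0 years
# 2000 h at 120 min/day: 2.7 years
```

At two hours a day the same targets shrink to roughly one and a half to three years, which is why finding something you can do for multiple hours a day matters so much.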


I'm not the person you asked, but I'm in the process of learning French. I'm a native English speaker who had a much easier time with listening comprehension for Spanish and German, my other languages.

What I find helps me to make progress is two things:

1) prioritize vocabulary (you need to know a word to have any hope of recognizing it in speech)

2) listen to "comprehensible input" at your level. I like this guy: https://www.youtube.com/@FrenchComprehensibleInput who has levels labeled for his vids, I also like https://www.youtube.com/@wanderingfrench because I'm interested in Canadian French and I find her especially clear and easy to understand, as well as charismatic.


I respectfully disagree wrt the prioritization of vocabulary. Yes, you do need the basics covered (a few thousand words). But what I realized is that if I am listening or reading a sentence with an unknown word in it, in most cases I just figure out from the context what that unknown word means (I check it against a dictionary and 9 times out of 10 I'm close enough).

Obviously this does not work if too many words in the sentence are unknown. And I'm not saying not to learn new words. But it is far more important in my opinion to read / listen so much that you get faster and faster. Especially if you are listening to speech, where you can't pause / rewind, and if you spend too much time on one thing you just get left behind entirely. Don't know a word - just skip it / ignore it and concentrate on the whole stream.


Listen a ton and then some more. For the first couple hundred hours or so preferably with subtitles; there will come a point where you realize you rely on the subtitles less and less. That’s the point where you can start to turn them off.

Audiobooks also work.

And work on your grammar and vocabulary. Listening gets easier if you intuitively know what you should hear (tenses, conjugations, plural or singular words etc).


Aside from my native language I speak three foreign languages, and I just started learning the fourth one. Besides that, I wasted a lot of time on another language that did not stick, so I kinda know what works and what doesn't.

1. Consistency. Make sure you practice a little every day. For example, I use an app that tells me every day what vocabulary I should practice that day. It takes ten minutes of my day, and does wonders.

2. Communication. Start talking to someone as soon as possible. If you're learning a Germanic or Romance language, after 6 months you should be able to find someone online to chat with. Of course it's going to take you 15 minutes with a dictionary to understand the other person's message and another 15 minutes to write the reply, but it truly does wonders because it allows your brain to see the language as a language.

3. Fun exercise: try vocalizing your internal monologue in the foreign language. Don't focus on correctness at all, it's about familiarizing yourself with the language.

4. Find some interesting media in your target language. It's going to be difficult because most important stuff is available in English, but for example you might try movies or YouTube channels. Especially the latter is great because YouTube videos rarely ever have English translation, so you know that either you watch it in your target language or not at all. There are apps that allow you to listen to foreign radio stations. Why not doomscroll in your target language.

5. Don't give up until you reach level B2 and you can talk without consciously thinking about it.

6. Understanding native speakers talking naturally to each other is literally the hardest part, so don't treat it as a benchmark until you reach fluency.

7. A common mistake is to treat the whole thing as a "sequence to sequence" task and think about the target language in the context of your native one. Your actual goal of learning the word X is to have your brain understand the abstract concept behind it.

8. It's going to suck and you'll hate it. There's no way around it. Keep practicing and adjusting your techniques to your liking. One day something will "click" and you'll actually "feel" the language.


Personally as an English/Japanese bilingual speaker and a programmer (programming languages!), I think the most important thing is incentive.

Why do you want or need to learn that language? Is it important? Is it valuable? Is it fulfilling?

As I've found, the biggest "wall" to learning languages is convincing yourself that learning the language is worth the immense hassle and effort.


Yes! This for sure plays a huge role as well. How many times have we as programmers picked up a new language, done the 'Hello World', and then set it down and went back to one we already know? Spoken languages are much the same. They all include various concepts that are shared (verbs, nouns, adjectives, grammar concepts, loops, variables, functions, etc) but the actual process of putting them together in the idiomatic order is the hard part.

But without the carrot at the end of the stick, it's unlikely you'll be disciplined enough to stick to it. A new job is a natural place for many people to pick up a new programming language because everything else is also new. For languages it is no different. Can you learn a language outside of the country where it's commonly spoken? Sure, but it's going to be 100x more difficult and you have to generate all of the discipline yourself. If you can crack that problem and build the discipline, then you've basically won the war if you can hold in there "long enough for your enemy to starve." :D


Here is my trick:

I take a recording of a conversation in the language, one that's made for language learners.

Then I listen to it until I know it BY HEART.

Usually that takes about 100 listens.

Every time I listen, I understand a bit more. Sometimes I look at the transcript to understand a section that evades me.

I don't move on until I can recite the whole thing BY HEART.

Then, I take the next dialogue.


This is next level! Thinking back to my language textbooks, this would be brutal for me on a mental level.

Do you think it could work as well with a short story or novella? Perhaps one of the books by Olly Richards? I find the dialogues in textbooks absolutely mind numbing most of the time and while easy books for learners aren't that interesting anyway, at least it's a full story and not people ordering lunch or a trip to the doctor for the 99th time!


The thing is, the dialogue has to be short to really stick to your brain!

Like, less than a minute.

Repetition is key, and if the text is too long, it’s not repeated enough.

To me, it’s not mind-numbing until I can “sing along” because I know it by heart. And then I move on to the next, slightly harder, text.

You don’t have to always listen to it actively, you can put it in while doing other work.

Some language self-learner books are better than others.


I'd really like to give this method a go. I'm trying to learn French and struggling with the listening component. Do you have any sources for dialogues that you used?


I used the Langenscheidt and PONS courses for Polish in the past. But I think they are only available in German.


Ah okay. I think most of the dialogues from the books I'm thinking of were between 3-5 minutes. Something shorter might work better, indeed.


I've never had a French class or any language courses in English so I just don't know; do foreign language classes in general not start with phonetics? Vowels, consonants, syllables, then words. From there it's a long road ahead, but that should let you (at least it seemed to let my small brain) break down the pure stream of audio in TV and media content into spaced syllables that occasionally form words.


I find it's sometimes quite difficult to tell where the word boundaries are when a person with a strong French accent is speaking English. I think the difficulty is because French encourages the emphasis to be in completely different places to English.


It clicked after a year of practice, the change was quick and noticeable. I started learning after moving to France and it helps a great deal even though at work it's English. Not sure how long it would take if I was learning it elsewhere.


It definitely takes longer if you don't live in an environment where it's spoken. Before I moved to NL from the US, I did Duolingo every day for 3 months (from the time I accepted the offer to moving) and by the time I got on the plane, I had moved from the beginner module to the intermediate one. After landing and getting settled, I ventured out to really test my skills at the local supermarket and I was completely and utterly useless. Perhaps the only thing I could recognize was the total at the end that I needed to pay.

We take for granted regional dialects when speaking our own language and in some languages, dialects can be a huge component of the language. Being able to just go outside and sit on a bench and listen to people speak in the language you want to learn is a huge advantage. Every day, you will passively pick up some words from the context of passers-by. You'll also start to subconsciously figure out what words are "common" in a different way than the lists of "1000 most common words in X language" can convey. You also learn how to put them together in context and how some words travel together within a certain context.

If I was tasked with learning a language abroad, I would spend 2-4 hours a day consuming native content and make it a priority to speak with a coach online 1-2 times a week, trying to work my way up to holding a conversation for an hour straight.

After living here for almost 2 years now, I can just barely get to 1 hour when speaking to my boss (super grateful btw!) but I'm really drained mentally by the end of it. With each meeting it gets easier though and now I can somewhat even make out the local dialect. Learning a language is a numbers game and the reason we often attribute superior language learning skills to kids is simply due to the fact that they have so much free time to listen, absorb, speak, and make mistakes without people judging them because they're kids. I'm fully convinced that adults can learn just as fast or faster, but our own egos often get in the way of putting in the work to learn.


The good news is that now that you’ve cracked French, you’ll find Spanish, Italian, and Portuguese easy. They’re all basically the same language with divergent dialects, but practically identical grammars and structure.


Spanish, Catalan, Galician, Portuguese, Italian, are all in a cluster together and a bit apart from French. I can read French, but I think it's equidistant from the Iberian/Italic language cluster and English.

So the first language in that cluster after French won't exactly be "easy". Easier than without French yeah but the subsequent ones will be much easier and on the same level of difficulty from one to another as then you mostly just change the ending of the words and the most used connectives.


I’m a native English speaker, and French (and some very rusty Latin) was my bridge into that cluster - they’re absolutely heaving with common cognates and structures from my experience.

What I will say is that my French has become much harder to access since - I will find myself lapsing into a creole for the first few days when I switch from one to the other.


That makes sense given that a huge chunk of modern English is derived from Norman French.


I can't imagine what was happening to my brain when I, an English speaker, started learning Korean. It was tough.


Yeah, no kidding that’s rough.

“est-ce qu'il y a”…. Three syllables, representing six words.

We taught both kids French by dropping them unprepared into the local village school at age 6. Whatever part of their brain taught them English as infants kicked back in and did the same thing for a second time.

Amazing to watch. I wish I could do that.


Yep, https://en.wikipedia.org/wiki/Language_immersion. There are schools and kindergartens that specialize in this, though the degree of immersion varies.


It's good that they mentioned babies hearing in the womb. I've known many mothers that read to or play music for their babies. They say they feel them respond to some things, too, where they seem to sense their surroundings. I'm just reporting what they told me since I haven't studied the literature on this stuff.

One thing I didn't like was the paragraph on how they differentiate words with no formal training. I feel that gives a false impression. Parents usually teach their children a lot about language. They give them visual cues, speak at varying rates, change their own tone in some situations, and so on.

The babies are soaking up the world on their own using one set of mechanisms. They also often receive highly-supervised training from a trusted source. Later, they get formal training on top of that. Even when not training, much of the content they see and hear is presented in a structured way that helps connect ideas. For instance, listening to the radio or TV with their parents would let them hear a lot of structured speech.

Babies are highly trained. They might also do the statistical learning. They're a mix of the two.


> They also often receive highly-supervised training from a trusted source. Later, they get formal training on top of that.

Languages as they are spoken have many quirks and patterns that are different than how most parents believe the language works, and those quirks are readily adopted by infants even though few parents if any would consciously teach them. A great phonetic example is "choo choo chrain": we often pronounce words very differently from how we think we're pronouncing them. There are also plenty of grammatical facets that are similar: cases where there's a stark difference between the real rule and the rule as formally taught, or where a rule is picked up that the parent doesn't even know exists (such as the order of adjectives).

As for the formal training, schools don't generally teach any natural dialect of spoken English (at least in the US), they teach a formal written dialect of English that differs in many respects from the language that we pick up as infants. This dialect is the source of many of the misunderstandings mentioned above, and yet children will learn natural language as spoken regardless of the attempts of well-meaning teachers to correct them.

To me this all suggests that supervised learning for the natural spoken language is unnecessary. Children assimilate their native language from the way it is spoken around them. The main purpose of the formal instruction is to teach children to speak a special dialect in formal settings that will mark them as educated precisely because it is distinct from the one that they'd have picked up naturally.


I'd say it's more a matter of setting standards so that drift is reduced and ESL learners don't go bonkers with a dialect that doesn't generalize well, than as a mark of education. For example the differences are already bad enough between American and British English.


ESL here, difference between American and British is nonexistent to me. Sure, some of the word choices differ slightly, like torch vs. flashlight, or rucksack vs. backpack, and even then most of the time, both languages have these words in dictionary, it's just the default that's different. And ESL people are usually in the exact situation you describe, because they're being taught British English during lessons, while learning American English from everywhere else - TV, videogames, Internet. We manage just fine.

Still, given that everyone gets taught grammar for their native language at some point, too, I agree this is in part to reduce language drift. Makes sense - a modern nation with millions or tens of millions of citizens needs institutional means to maintain social coherence. Formal education here (at primary/secondary school level) is less about marking someone as educated, and more about ensuring that people from opposite ends of the country can communicate just fine, because without it, their dialects would drift apart within a few generations.


> Parents usually teach their children a lot about language. They give them visual cues, speak at varying rates, change their own tone in some situations, and so on.

They do, but this training is not necessary for a child to acquire language. Idioglossia (private languages between two people) is an extreme example, but also the process of a pidgin becoming a creole, where the younger generation of a pidgin speaking population fills in the missing details to create a more complete language, even without ever being exposed to such a language.


That’s interesting. Thanks for the examples. Are there any places where people post such real-world observations of unsupervised language learning?


> One thing I didn't like was the paragraph on how they differentiate words with no formal training.

You don't like it when you're wrong, that's understandable. Of course your reaction, to just deny that you're wrong and learn nothing isn't actually going to help you be less wrong in the future.


In the Indian epic Mahabharata, there is the story of Abhimanyu, who learns how to defeat a military strategy in the womb while his father is telling it to his mother. He later uses it in the epic war.


> In the Indian epic Mahabharata, there is the story of Abhimanyu, who learns how to defeat a military strategy in the womb while his father is telling it to his mother. He later uses it in the epic war.

I don't see what fiction has to do with this.


Play this, by your stomach, let my words massage it and rub it - Nas


I can't emphasize how happy it makes me to see a Nas lyric on HN.

Here's the track if anyone's interested: https://genius.com/Nas-queens-get-the-money-lyrics


The article starts getting good in the last 2 paragraphs, explaining the actual science of observed transitional probabilities, then it suddenly stops.

It's almost as if the writer ran out of coffee, or his scientific mind went on strike. What was that?


So this bit is just plain wrong

> For this the infants were presented with words and non-words and found to have longer listening time for the nonwords. This indicated they had already become familiar with the words by listening to the continuous sequence of syllables within which they had been embedded. The only way that could have happened was by monitoring the TPs between syllables—the infants were capable of statistical learning.

First, "longer listening time" doesn't mean that they identified syllables. Second, it's a group effect, not individual, and the differences are small. It's NHST, and thus their p < 0.04 and p < 0.03 means very little. Third, two minutes of speech shouldn't be able to affect the underlying speech recognition. It's too fast. If someone talks some unknown language to you for two minutes, you still don't have a clue. Fourth, the task was extremely artificial.

This is a common pattern in cognitive psychology. Small effects, artificial tasks, ignoring other explanations, and then making a grand claim. I'm not denying that the babies, as a group, may have reacted differently to strings of phonemes not in the training data, but I do deny that this is sufficient evidence for anything.
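
For anyone unsure what "TPs between syllables" means in the quoted passage: a transitional probability is just how often syllable Y follows syllable X, divided by how often X is followed by anything. Here is a rough Python sketch of that idea. The syllables, the word order, and the 0.9 threshold are all made up for illustration; this is not the study's procedure and says nothing about what infants actually do.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(y | x) = count of the pair (x, y) / count of x in non-final position."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tps, threshold=0.9):
    """Split the stream wherever the TP drops below the (assumed) threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical toy "language": three invented words, played back to back with no pauses.
toy_words = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
order = "ABCACBABCBAC"
stream = [syllable for w in order for syllable in toy_words[w]]

tps = transitional_probabilities(stream)
recovered = segment(stream, tps)
```

In this toy stream the TPs inside a word are 1.0 and the TPs across word boundaries are lower, so cutting at the dips gives back the three invented words. Real speech is far messier, which is part of the point being argued above.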


Yeh, but it came as a surprise so it must be true.

Besides, I get this strong vibe that there is a substantial faction in linguistics that is hell-bent on proving that infants learn language statistically, which I think has everything to do with "proving Chomsky wrong" (note the bit in the article about dismissing arguments for universal grammar in chapter 3 of the advertised book). I reckon that has to do with Chomsky's style of arguing that can rub people the very wrong way and seems to have. Depending on whom you ask, either Chomsky has been comprehensively debunked and nobody in linguistics listens to him any more (that's the statistical learning people) or, well, he hasn't and his theories are still valid (his friends). In any case, all this partisanship can only hurt anyone who really wants to get to the bottom of things and understand how language works.

And the recent grandiose claims from AI researchers are certainly not helping, either.


The Chomskyan idea that has been "debunked" or at least has little credibility as a theory of language, is his syntax models. Syntactic Structures was a formidable influence on linguistics, but it (and the models that follow from it) do not connect well to the way "the brain" processes language. One of the underlying ideas behind Universal Grammar, namely that parts of language acquisition and processing have a biological base, is not debunked.

That both Chomsky and AI proponents can rub people the wrong way, doesn't help the debate, indeed.


>> One of the underlying ideas behind Universal Grammar, namely that parts of language acquisition and processing have a biological base, is not debunked.

That's my understanding also but in discussions I've followed that was the main sticking point rather than the specific form of this "language endowment" as I've heard Chomsky call it. E.g. see Alex Clark's "Linguistic Nativism and the Poverty of the Stimulus":

https://onlinelibrary.wiley.com/doi/book/10.1002/97814443905...

This was recommended to me by my MSc thesis advisor who I believe felt I was a little too eager to accept Chomsky's arguments for linguistic nativism. I should read it again, it went over my head at the time.


It's a book ad.


It seems like this is an excerpt from a book? But it also stops after saying “article continues below” (at least on mobile)


Tip for parents - teach your babies Sign Language. They can communicate with you much earlier than they can learn to talk clearly. So much easier when a baby can tell you they're all done or want more than to have to figure that out without communication.


As a parent of three kids, total waste of time. For one, ASL isn't the only way to communicate with signs, and it's not the most efficient way to communicate with signs with kids -- they'll develop their own ways of signaling what they want. It's more of a parlor trick than anything else -- "Look what my kid can do!". I've got 3 kids and they were all very good at communicating what they want. We taught a few signs to our oldest kid, but it wasn't any easier to figure out what he wanted, and by the time he was using them effectively, he was basically talking already, we didn't bother with the other two.

(Going to clarify a bit: If the purpose is to communicate better with your kids before they can talk, it's a waste of time. Learning ASL is a valuable thing to do for many other reasons.)


This goes against fairly general advice that I’ve received from early intervention specialists, including for my own child (not diagnosed as neurodivergent, just speech delayed).

In our case the kid picked up signs we were teaching him (not ASL, just intentional signs) weeks before corresponding words, and there are still words he signs and understands but can’t pronounce. The point was to introduce expressive language for functional concepts (hunger, wanting help, wanting to be picked up, etc.), and beyond its role as a precursor to spoken words, it absolutely made our lives easier.

It’s a well known phenomenon that the ability to pronounce words lags behind the ability to express “language”. Your kids clearly didn’t need it, but it’s still a useful tool.


Consider for a moment that maybe your experience isn't universal. I took both of mine to baby sign classes and both were able to communicate things such as hungry, tired, milk, nappy etc long before they were able to talk. Along with other parents at the same classes it was absolutely not a waste of time in terms of being able to communicate better with the children before they could talk.


It doesn't need to be a formal sign language, but intentionally developing systematic hand based communication with infants and other caretakers is pretty useful.

Using ASL signs as the base means you may have a better chance of others understanding, which is great (unless you're using signs to communicate base running strategy).

Lots of kids can communicate a little earlier with hands than speech, and there's lots of situations where hands are visible but speech is inaudible. More tools for communication is usually better.

My kiddo is 13 now, and speech is usually more convenient, but we still use a couple systematic signs (shaken t for toilet, all done), and the experience of looking for signs is helpful in a crowded place even when using ad-hoc signs.

That said. Like all parenting advice, try things out, if it sticks, keep using it. If not, no worries, nothing fits all.


We’ve had great experiences with teaching the babies we’ve had sign language.


You just need 3 signs, More, All done and something for feed me.


Babies absolutely will still communicate by utterance, gaze, reach, and body language what they want. My son has a particular cry that means either that he wants food, wants to comfort nurse, or wants water. When we hear that my wife and I can just present those three things one by one and he’ll select what he wants. Don’t feel like you have to follow this advice.


> Babies absolutely will still communicate by utterance, gaze, reach, and body language what they want.

Oh, I totally get that. But SL makes it that much easier.

> Don’t feel like you have to follow this advice.

I wasn't commanding anyone to do something. Good Grief.


Certainly no one should feel obligated to follow anyone's advice! There's too much contradictory parenting advice going around to be able to follow it all.

That said, there's a marked difference between the kinds of asymmetrical communication that you're describing and baby sign language. We've done sign with each of our three children and it's remarkable to watch them pick it up: they clearly understand that these hand gestures are words that carry specific meaning and get a lot of delight out of learning and applying them correctly and being mimicked and understood. The language center of the brain clearly develops much faster than the parts responsible for controlling the tongue, and sign gives them a great outlet for that language.

Also, you don't have to do anything complicated to give them just a little bit of sign, so unlike a lot of parenting advice it's not difficult to try out. Just say a word while making a sign (ASL if you want but we'll sometimes just make up one on the spot) and then give them the thing or do the thing. "More", "water", "milk", "diaper", "good night", "up"—just pick one and try it! Once you start and see their delight it's hard to stop.


> Certainly no one should feel obligated to follow anyone's advice! There's too much contradictory parenting advice going around to be able to follow it all.

As one of many, many parents, my observation has been that there is so much contradictory parenting advice because there is so much variation in the kids, in terms of behaviors, personalities, the order in which they bootstrap their motor vs verbal skills, body awareness, etc., etc. As a result, consuming parenting advice is like going down a checklist of things to try, and the kid won’t respond to most of it.


Of course you don’t have to, but I don’t think this was intended as an instruction, rather as helpful advice. I would second it, my son still uses a few signs (especially “sorry”) even now he’s two.


Does anyone here have direct experience raising multilingual kids? Specifically the situation where one partner speaks English natively and the other speaks two languages natively? For sake of example, let's say they're German and Spanish. (And you want the child to learn all three natively.)

My plan is to divide the languages by person and place:

- always talk to Parent 1 in English, no matter the location

- talk to Parent 2 in Spanish at home and German when outside the home, adhering strictly to this location-based method. The extended family mostly speaks Spanish, which makes the "home" association stronger.

This seems easier to me than dividing the languages by time (only speak Spanish on M/W/F, German on Tu/Th/Sat) or other divisions, but I'm open to any suggestions.


I'm Dutch, my girlfriend is Croatian, we speak English to each other, we live in Costa Rica where my daughter goes to school as well.

I speak Dutch to my daughter. My girlfriend speaks Croatian to her. My daughter goes to school in Spanish. Setup with grandparents is the same as with me and my girlfriend.

We're mostly consistent (though occasionally my daughter asks me to read something to her in Spanish or Croatian, and similar for my girlfriend). It generally just works, my daughter speaks Dutch, Croatian and Spanish well - she plays in Spanish, and sometimes answers in Spanish as well even if we speak our own language to her, but mostly she switches language based on context. Surprisingly (to us) she has absorbed a lot of English as well and generally understands what we're talking about, though we've made no particular effort to teach her that (she gets basic English in school, but it's not much, she's 3). Occasionally she uses a word from one language while speaking another, which can be funny because she does adjust the conjugation, so she'll apply Croatian cases to Dutch words and things like that.


> ... Occasionally she uses a word from one language while speaking another, which can be funny because she does adjust the conjugation

Yup, all parents of multi-lingual kids experience that. It's funny and yet, to me, it's precisely what the statistical learning mentioned in TFA is about.

Our kid used to do the same and still does (but very rarely now that she's 9 y/o): they borrow a word from another language and adjust it in a way that, statistically, makes sense.

At times they can, by "chance" (probabilities really I guess), create a word that is actually the correct word even if they didn't know it.

She could take "labyrinthe" (in french) and then transform it to "labyrinth" (in english) and pronounce it properly. Now of course people would use "maze" instead but, still, "labyrinth" is "accidentally" proper.

BTW there are studies about just that: how language is acquired by analyzing the made up words that multi-lingual kids do create.

Kid is totally fluent English/French and can definitely play with Spanish kids, speaking Spanish.

P.S: we spent quite some time in Spain (in Zagreb too FWIW but we don't speak Croatian): wife speaks Spanish fluently, I understand it and so does the kid... We contemplated moving to Costa Rica. We may still do it... How's the pura vida there?


It has pros and cons, like anywhere :) If there's specific things you want to know, feel free to email me at the address in my profile.


My wife and her parents live with us and speak primarily Spanish. My wife and I speak primarily English to our son. He is bilingual with some funny quirks:

* He speaks based on his listener. So Spanish to abuelita, English to papa, and dealer's choice to my wife.

* In the early days, he would mix-translate, e.g. saying "another more" since presumably that was how he translated "una mas" in his head.

* He picked up my non-regional accent that I thought I never used anymore.

* His understanding varies based on his vocabulary in a given topic. He understands science in English, but colors generally tend toward Spanish.

* I consciously spoke a lot of Spanish to him when he was young. He didn't fall for it and only speaks English back to me.

So, I wouldn't necessarily overthink it. The goal is to get her exposure to hearing the language and building the vocabulary, and secondly, to compel her to use it to express what she needs.


So, this describes both my and my wife’s situations as kids, and now my kid’s situation.

Let me tell you, three languages is a stretch. In both my wife’s and my cases, the languages which got used stuck - and I don’t mean our parents didn’t speak them all to us - just that in our respective cases the utility value of the third language, which was in both cases an ancestral tongue not related to where we lived or spoken by both parents, was not adequate to make it stick. We each acquired the family language and the local language.

Our kid, we speak English with at home, Portuguese in the public sphere. She’s acquiring both and not muddling them to any great degree.

There’s a notion we share that other languages we each speak fluently would be good for her to learn, but based on our own experiences we intend to take a different approach. Once the two languages which will actually present utility to her day to day are in place pretty firmly, it’s time to ship her off to her grandparents for the summer, where she will be immersed in a tongue that will in that circumstance provide her utility. It was how I ultimately acquired my third language - two months with cousins and their family, aged 7.

So in short, your mileage may vary, but each of us found learning a third language that only one parent and nobody else spoke a chore, and it did not stick via primary language acquisition mechanisms, but had to come later.


You don't want to try things with your kids.

One language per parent works. The other approaches? It's already quite challenging not to mix language in simple bilingual families.

Your second rule sounds super weird. Are you going to change language as you walk through your door? Additionally, at some point the kid will choose a language. To me it does not sound solid.

Maybe you can make a teddy bear use a third language. Not that weird (in the first 5 years, I guess, lol). And that would introduce the kid to it.

EDIT: also if you talk only at home you'll be limited to talking about sofas, tidying your room and stuff like that.

And if you get more than one kid, all this house of cards will fall. The kids will speak between them whatever they want and you can do very little to change that.


The thing is that Parent 2 already uses both languages on a daily basis, for work and for life. So in a worst case scenario the two just get mixed for the kid.

Avoiding one or the other isn’t really doable.


I think it's enough to use one to address the kid. If the other is present in the environment the kid will learn it to some extent. But I think there is value in simplifying the first steps with language. It's a hell of a struggle. And scary if it does not go well/quick.

I am not an expert, just what I've observed.


You won't have as much control here as you think. Try your best to be consistent as early as possible and hope for the best.

We are not as complex but we're trying this:

Mom (Russian/English): 100% Russian direct to baby, English to others.

Dad (English only): 100% English all the time.

Environment: English


My SO and I were raised respectively in the UK and France. We both speak fluent French and English.

Our plan has been: we each speak our mother-tongue (French for me, English for her), and see what happens.

My first child understands English perfectly, but replies in French, and my second is too young yet to say any word, but she understands both languages.

Now we've moved to France and found ourselves speaking more and more in French to each other, and -unfortunately- our household is now speaking mostly French rather than keeping English in the mix. But this makes me think maybe we should all speak English at home now that French is acquired by my eldest... not sure!


I think a lot also depends on whether the kids are in school or daycare already. Those will take care of advancing one of the languages.

I'd say in your situation it doesn't really matter as long as the skills in English are maintained - should be easy enough nowadays.


I remember reading that for young children if you always speak the non-local language at home, they'll pick up the second language very quickly at daycare/school.


I can speak to my experience as the child raised multilingually, though not quite the same situation you are proposing. I was raised in the US to an American dad and a French mom, and they both speak both languages. My parents decided they would only speak to me and my younger brother in French, assuming (luckily correctly!) that I would learn English by simply being in an English-speaking world. In my experience, and I think my parents and their multi-lingual friends would agree, the key is consistency. My parents even went so far as to pretend not to understand us if we spoke to them in English! Not even to ask for a word translation! If you speak English, you always speak English. If your spouse speaks Spanish, they always speak Spanish. My feeling as a recipient of this education is that it's probably better if you both speak the same language to your children, but I know that splitting them up can also work. Again, the key is consistency, and the simpler you can make the rules about who speaks what language the better.


That actually sounds like a fun game for multilingual people.

Partner A picks language A, Partner B picks language B.

Now, have a conversation with each other where you only speak in your chosen language. The listener must listen and translate in their head and then respond in their chosen language.

First person to slip up and change languages loses.


I think this should work well though I might try and flip the Home language every few years since you want them to be tri-lingual. I don't have any kids yet, but my plan is to speak the less common language where we live at home and then the other language out in the world. That seems to be the approach many friends have taken as well to good effect.

Assuming these languages are actually English, Spanish and German, I would find a local club in the minority language and make some local friends where you can speak that language and your kid can come along for the ride.


I can chime in.

My wife is from Germany, and I'm from Portugal, we both speak English with each other which is easier (DE is very hard for me to learn), but she talks DE with the little one and I talk PT with him.

He can speak perfectly DE, PT, and English depending on which language he listens to, which is quite impressive for me as a parent.

He can even maintain a proper conversation and do translations between languages.

Sometimes he adds a word from another language in a phrase but he seems to know what that word is in the correct language.


I learnt German at the age of 25.

I distinctly remember the first time I was exposed to it (before learning it), it sounded like water gently flowing down a creek. Then I learnt the basics, and my brain started to catch on to patterns.

However, I had to go through the written form to learn properly. I found it hard to parse and remember words when I was only hearing them. Unlike young children, obviously.


I think this is a common misconception. Children also find it difficult, but they have a decade to figure it out while adults usually only give themselves a few years. Even at the end of that decade, many children still have problems reading and writing well into adulthood.

As adults, we can learn so much faster than kids precisely because we already know how to read and write conceptually. If you can read effortlessly in your native language, you will not be satisfied with your progress in another language until you can read at the same level. What often holds adults back is embarrassment and how that affects our ego. If we could let go and just engage without overthinking it, adults absolutely could pick up languages much faster than kids.


Children also have no choice but to be immersed around the clock. And they have very patient teachers who are elated about every new word they master and who will stubbornly correct them if they get something wrong.

Most adult learners can't do full immersion or underestimate its importance and don't commit even if they get the opportunity.


My impression is mostly the same, but to add to the point about embarrassment, native speakers will often not correct adults making mistakes in their language (and avoid showing that a mistake was made) while the same person would correct a mistake for a child.

Children also get exposed to much simpler language and get to learn a language more bottom up.

Also somehow my impression is that pronunciation/accent really is harder to adopt as an adult than as a child.


If we truly listen to the pronunciation of small children they often substitute easier sounds for difficult ones for many years. We just expect that from children so it seems normal.


Yes, this is also true. One strategy I use when speaking with learners is to repeat their answer back to them with the correct grammar/vocabulary and then answer their question. I'm pretty sure I picked this up from volunteers at a language meetup who did it to me. It not only provides feedback to the learner, it also helps to confirm that you're both on the same page with the information that was spoken.

Generally, this works well for kids and adults without being demeaning but it's also important that everyone understands their role in the situation beforehand if possible.

Regarding accent, I think it really is a matter of practice and will. I agree that it's harder as an adult, but we shouldn't allow people to use that as an excuse if they want their accent to improve. By no means is that a requirement, but it is something you should ask your adult friends if they are learning your native language. Though, I would argue this is probably not even worth it unless they are at a B1-B2 level. If they come from a similar language family (germanic, latin, etc) as you, then perhaps working on their accent at the beginning will prove fruitful, but otherwise, it's probably a lot of new sounds that they need to learn how to even say "correctly" before they can start adjusting how their mouth moves. For me, I've found that mouth training is a little easier when I'm not spending 10 seconds formulating a reply in my head.


Chomsky describes why the word identification method described in the article can’t work in English (though a more elaborate method is one of the few successes he knows in applying statistical methods to language): https://youtu.be/92GOS1VIGxY
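
For readers who haven't met it, here is a minimal sketch of the kind of statistical word identification being debated, assuming it is the transitional-probability idea: a word boundary is guessed wherever the next syllable becomes hard to predict from the current one. The toy lexicon, the `threshold` value and the function names are made up for illustration, and the input is assumed to be already split into syllables. This is the simple version of the method, not the more elaborate one mentioned above.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last syllable has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Place a word boundary wherever the transitional probability is below threshold."""
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical three-word "language"; the stream has no pauses between words.
lexicon = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
order = "ABCACBABCBACCAB"
stream = [syl for w in order for syl in lexicon[w]]

print(segment(stream, threshold=0.9))
```

In this toy stream every within-word transition has probability 1 and every cross-word transition is well below it, so the boundaries fall out cleanly. Real speech is nowhere near this tidy, which is where the objection comes in.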

The arguments against universal grammar are no good either: For example even though children may hear lots of examples it isn’t nearly enough to derive a hierarchical grammar. It also doesn’t explain why language is hierarchical (just like it doesn’t explain why we can’t speak and hear like a modem)


Honestly, the first thing I did was ctrl+F and chomsky, and as there was no reference I didn't bother reading the article. Thanks for inserting some credit where credit is due. This is coming from a background of serious interest in computational linguistics in the late 1990s... I'm aware Chomsky's since dethroned, but the contribution was massive. I am greatly disinterested in these days of popular generative ML, where LLMs ~= hall of mirrors(internet-scale abject kerfuffery) and then people guffaw in awe of the remix.


Once you go beyond Markov chains and into vector models (Transformers or even LSTMs), next --token-- syllable prediction can capture grammar.
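
A rough sketch of the Markov chain side of that contrast, assuming a first-order (bigram) chain over whole words. The corpus and function names are invented for illustration. It only shows that such a chain cannot track subject-verb agreement across intervening words; it says nothing either way about what Transformers or LSTMs learn.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, context):
    """A first-order chain conditions only on the last word of the context."""
    last = context.split()[-1]
    total = sum(counts[last].values())
    return {tok: n / total for tok, n in counts[last].items()}

# Toy corpus: the verb agrees with the first noun, not the nearest one.
corpus = [
    "the dog near the cats barks",
    "the dogs near the cat bark",
    "the dog near the cat barks",
    "the dogs near the cats bark",
]
model = train_bigram(corpus)

singular = next_token_distribution(model, "the dog near the cats")
plural = next_token_distribution(model, "the dogs near the cats")
print(singular)
print(singular == plural)  # the chain has already forgotten the subject
```

Both contexts end in "cats", so the chain assigns the same probabilities to "bark" and "barks" regardless of whether the subject was singular or plural.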


That some aspects of grammar can be captured by statistical analysis was never really in dispute. The OP is slightly confused about the hierarchy part. Chomsky never said that you couldn't discover by statistical analysis that languages have a hierarchical structure. He rather said that a baby, based on the data it has available, would not be able to determine statistically that certain rules of grammar are defined over this structure rather than over the linear sequence of words.


I mean, contrary to Chomsky, that advanced statistical models are certainly able to capture all grammar - at least, not worse than ordinary humans.

It feels that Chomsky’s understanding of ML stopped at the level of Markov chains.


Don't waste your time arguing with Chomsky supporters, it's a cult. They keep wringing non-falsifiable theories out of lengthy hallucinations, but it all reveals itself to be just Trek physics later on.

He's not just problematic in politics, he is in linguistics too, just the BS is harder to notice. I suppose his compiler theories were legit, and his pioneering spirit that led to the establishment of multiple fields of research might be too, but the leading theories he created are just as dubious as the first man-made shed built on a newly discovered island would be.

And the problem is not just that early speculations in a novel field are often wrong, it's that his supporters don't care. They'd regurgitate those scientific theories(tm) ad infinitum and waste resources for the whole humanity. So don't bother trying to fix them and making them motivated.


First, Chomsky (and I) were talking about language acquisition in children where there aren't billions of examples, so it's completely irrelevant if some other system can do something, the question is how do humans do it.

Second, there isn't any evidence that LLMs have captured grammar rules in any meaningful sense, just as they can't do addition or any other recursive computation.


Is there any work demonstrating this? For example, how do statistical models capture adjunct/argument asymmetries in extraction?


Attention is one of the core parts of transformer architecture, so I would be surprised if they have any trouble understanding this asymmetry.

Could you provide a testable hypothesis? I would be happy to test it on GPT4.


Sure, here are a couple of examples of ECP violations removing ambiguities.

1a. How often did you tell John that he should take out the trash?

b. How often did you tell John why he should take out the trash?

(1a) can either be a question about frequency of telling or frequency of trash disposal, whereas (1b) can only be a question about frequency of telling. I asked GPT-4 to explain how each sentence was ambiguous and it seemed to entirely miss the embedded readings (the ones about frequency of trash disposal) for both sentences, while finding some other ambiguities that were spurious (such as suggesting erroneously that (1b) could be a question about how many different reasons you gave John in a single instance).

Similarly, (2a) has both a de re and a de dicto reading, whereas (2b) has only a de re reading:

2a. How many books did Bill say that Mary should read?

b. How many books did Bill explain why Mary should read?

That is, (2a) can be asked either in a scenario where Bill has said "read 10 books!" or in a scenario where Bill has said "read Book A, Book B and Book C!" without necessarily counting the books himself. (2b), on the other hand, only has the second kind of interpretation. I've had mixed results with GPT-4 in this case (depending on exact choices of vocabulary, etc.), but it certainly makes some mistakes. For example, it says that (2b) can mean "John explained the reason for a certain number of books that Mary bought".

As the sibling comment points out, it would not show very much if GPT-4 did correctly determine these ambiguities as it has had access to much more data than a child. You would also need to show that the same statistical techniques would work when applied to a realistic dataset.


Thank you for providing these examples.

I asked GPT4o, and it has no trouble with understanding 1a: https://chatgpt.com/share/daea469f-d823-45e2-9d6b-f6bea82a26...

As a side note, my instinctive reading is on the telling frequency. Sure, one can make a garden path sentence, but (for my own ESL eyes and ears) it would be more straightforward to say, "Did you tell John how often he should take out the trash?"

2b does not feel right on its own (and I am not an AI). I can understand it, but it feels like reverse engineering rather than reading a normal sentence.


The whole issue is the difference between (1a) and (1b), not whether the AI can understand one of the sentences under some of its available interpretations. Indeed, with GPT-4o, I get the same result as you for (1a), but also a description of a spurious parallel ambiguity in (1b). Part of the trouble here is the inconsistency of results depending on the exact phrasing of the question and random variation in GPT-4's responses. I wouldn't be surprised if it sometimes gives the right answers, but I don't think it does so reliably.

(2b) is what's known in classical terms as a 'subjacency violation', so yes, it sounds imperfect. Nonetheless, native speakers agree on which interpretations it can and cannot have. GPT-4 does not have the same capacity, as far as I've been able to determine. You sometimes have to be a little creative with scenario construction for sentences like (2b) to click.

"Ok, So Bill explained why Mary should read War and Peace, then he explained why she should read The 39 Steps, and then he explained why she should read some other titles that I can't remember. I wonder just how many books Bill explained why Mary should read."

But try constructing a scenario for the other interpretation and you'll find that it's still just as bad.


Well, prompting is not a nice addition to LLM; it is a necessary thing to do.

> Nonetheless, native speakers agree on which interpretations it can and cannot have. GPT-4 does not have the same capacity, as far as I've been able to determine.

This one is an expectation, not even a factual statement. A factual statement would be "95% of English native speakers with a college degree" or so. Among less educated, the numbers could be depressingly low, even for much more straightforward tasks.

Then, the question is how a given ML model fares against real data, not against some platonic ideal.


We can see similar results with much simpler examples where the judgments are incontrovertible. Here's a classic pair from Chomsky:

a. John expects to present himself.

b. Who does John expect to present himself?

In (a) 'himself' refers to 'John' whereas in (b) it refers to whichever person is the answer to the question. How does GPT-4o fare?

> Who does "himself" refer to in the sentence "John expects to present himself"?

> In the sentence "John expects to present himself," "himself" refers to John. It means that John is the one who expects to present himself.

> Who does "himself" refer to in the sentence "Who does John expect to present himself?"

> "Himself" refers to John. In this sentence, John is the one who is expected to present himself.

The statistical model is confused by the superficial similarity between (a) and (b), just as Chomsky predicted decades ago.


Well, again, I'm not sure what your prompts are, but here we go:

https://chatgpt.com/c/7ead1985-83be-4dce-9fd7-29aafb248f01

> The statistical model is confused by the superficial similarity between (a) and (b), just as Chomsky predicted decades ago.

Well, WE are statistical models as well. So, any too-broad claims on the inability to understand natural language by ANY statistical models are falsified the moment they are spoken (or written); unless you go into Penrose-style Neoplatonism.

---

Vide your doubt if I find a (counter)convincing example. Sure, any benchmark of human vs. model performance is an empirical verification. And yes, some artificial models may struggle with some tasks. For example, for the Winograd schema, there is a leap between GPT3 and GPT4: https://d-kz.medium.com/evaluating-gpt-3-and-gpt-4-on-the-wi....

What it is (to me) is an ex-cathedra definition of what the English language is. The actual natural language people use is full of utterances that are not correct (yet are easily understandable) and "correct forms" that are rarely used and, if so, misunderstood by anyone save for those who put conscious effort (linguists, lawyers, etc).

For any discussion, it is essential to know if we are working with a real (and highly probabilistic) natural language or one of its concrete models (i.e., an abstraction).


My prompts are in my comment. Your ChatGPT link sends me to a login screen.

Humans (or English speakers at least) aren’t confused at all by the pair of sentences in my last comment. If you’re just going to try to deny the plain facts about how these sentences are (and aren’t) interpreted by English speakers then that’s really just a kind of grammatical flat-Earthism. The judgments at issue aren’t remotely subtle.

> Well, WE are statistical models as well.

This is begging the question. Chomsky would deny this.


So, I think that we differ at a fundamental level.

While you prefer to work in a Neoplatonic world of ideas, I prefer empirical facts and convictions that all models are approximations.

English grammar is not fixed per se; it evolves with region (have you ever been to Singapore?) and time. Your judgment (or Chomsky's, or anyone), however founded, is not a fact. It is an opinion up for experimental scrutiny.

I don't say that your examples are incorrect. Still, measure the percentage of correct (or consistent) answers for humans against particular models. Otherwise, it might be maths, might be philosophy, might be arts, but it is not (empirical) science.


As far as I can see, your issue with Chomsky has nothing to do with the performance of modern LLMs. You just reject all the data that generative linguists take to be crucially informative as to the grammatical structure of natural languages. You would hold the same view if LLMs had never been invented. So it is really a common case of AI and cognitive science talking entirely at cross purposes.

> English grammar is not fixed per se; it evolves with region (have you ever been to Singapore?) and time.

Sure, but this is not the case for the examples I gave. There aren’t dialects of English where (b) has the interpretation that GPT-4o thinks it can have. It’s no use trying to muddy the empirical waters in the case of completely clear judgments about what English sentences can and can’t mean.


There is no example or standard that would satisfy you. Any failing example can be added to the training set in the next version and even if it couldn't it wouldn't matter because you could find a person somewhere that would also fail it.


Anyone have a link to the work he cites?


I haven't watched the video, but probably a reference to this and subsequent work along the same lines:

https://aclanthology.org/W04-1307.pdf

https://www.ling.upenn.edu/~ycharles/papers/quick.pdf

https://sites.socsci.uci.edu/~lpearl/courses/readings/Yang20...


Meaning what, at a few kilobaud?


> When we write, we leave spaces between words. Readingwordswithoutsuchspacesisdifficult.

immediately losing credibility because quite a number of languages, such as Chinese and Japanese (and I think Korean too), are written without spaces between words or characters. In fact, until quite recently (last 100 years or so), written Chinese had no punctuation.


Don't even have to look that far. Scriptio Continua, writing without spaces, was all the rage in Greek and Latin, and the idea of using all those fancy separators like spaces and full stops only caught on in Europe from 800-1400. [0]

[0] Wiki Article: https://en.wikipedia.org/wiki/Scriptio_continua


The characters aren't connected so the brush is lifted and a space left behind. Of course this also goes for the space between signs of a compound word, but this also holds for "compound word", which is itself a compound word. Also they had \n = EoL alright.

Negative spaces are so important in *-graphy


Agree with your main sentiment. For accuracy:

> quite a number of languages, such as Chinese and Japanese (and I think Korean too), are written without space between words or characters.

Correct with Japanese and Chinese. Modern Korean (few or no hanzi/chinese characters) does use spaces.

> In fact, until quite recently (last 100 years or so), written Chinese had no punctuations.

Eh… ok, sure, I guess.

Punctuation was formalized in the 20th century, yes.

There are also lexical markers that can be (and often were) used to mark the beginning or end of certain ideas. While not punctuation, these characters serve a similar purpose in delineating linguistic chunks.

The following page gives some simple examples:

https://www.theworldofchinese.com/2021/09/how-china-adopted-...


I believe it doesn't start with an explicit attempt to identify words, but from the attempt to compress the sound into fewest meaningful dimensions, and the words naturally result from it. It's a low level process that creates these words, as well as the apparent boundaries.


I've always been curious whether learning a second language from childhood has a positive or negative impact on a child's development.


Related, but I'm currently reading "Thirty Million Words: Building a Child's Brain", it's fascinating!


Makes sense, perhaps this is also why nursery rhymes chop up words into syllables following the melody.


> Just think how you pronounce the syllable ham when referring to a piece of meat and when talking about a furry animal—a long ham and a short ham-ster.

Hang on. Y'all pronounce these differently? I've lived in four U.S. regions and have a pretty generic middle-American accent and I'm having trouble even thinking what the distinction might be.


Canadian (Toronto) English speaker, although I’ve lived in the US (NYC and Seattle) for a while and it probably rubbed off on me.

To me they sound pretty much same if you’re very consciously saying the words as units, in isolation, “trying to pronounce them”.

If I say them a little more casually, though, the sounds that come out of my mouth are a little bit closer to “haam” and “hmster” - the vowel sound definitely gets emphasized in the former case and clipped almost entirely in the latter case.

It’s really easy to trick yourself into thinking you’re saying a word “properly” unless you’re very very conscious of the sounds you’re producing. As a Canadian, I know this - the stereotype Americans have of the Canadian accent is exaggerated, and it’s much stronger in rural areas, to the extent that I didn’t really think I sounded any different to the Americans in TV or movies. But Americans could still pretty quickly tell I was Canadian when I moved down there. The word “about” is one of the canonical tells, and although I don’t say “aboot” or anything, I learned that I do really say it in an accented way. It wouldn’t be apparent if you just asked me to pronounce the word, but in a sentence my mouth would gloss over the vowels, saying something more like “abut”.


I tested them out loud in the context of sentences: "Can I get some ham on that sandwich?" and "Did you get your kid a hamster?" and the "ham" part still sounds exactly the same to me.

It could definitely be an accent or something that I just happen not to have heard much. It just surprised me that I couldn't think of what difference the author was thinking there might be.


I pronounce the vowels in Mary, marry, and merry differently— like mail, marathon, and merit– but apparently that's not true in many places in the US. My regional accent is pretty mild, but I did grow up in a famously heavily accented area.


Boston by chance?


Yes MA, but closer to RI, which is fairly different in some ways. The closest famous example is Emeril Lagasse, who most people assume is from New York. One big factor is the short O pronunciation being a lot rounder. Coffee is cawfee, like it is closer to NYC.


We're talking about allophonic differences here, which are notoriously hard for native speakers to hear.

How well do you hear the difference between the /p/ sounds in words like pill, pat, punk vs. spill, spat and spunk?


Not very well at all. I barely even think of P as having its own sound at all. It has an air puffing sound, but the vowel is the only difference I actually hear with those words.


Ok, that almost certainly means you can't hear the difference between the aspirated and unaspirated allophones. That's absolutely typical of native speakers: they can't hear the difference between allophones.

Now try standing in front of a mirror, and hold a Kleenex (tissue...) in front of your mouth while pronouncing those words. You'll probably see a tiny flutter when pronouncing 'spill', 'spy' etc., but a large movement when pronouncing 'pill', 'pie' etc. The tissue is showing a difference that your ears have a hard time with. Not because the difference is hard to hear--a Thai speaker would have no problem--but because the sounds are allophonic in English (but phonemic in Thai).


It's hard to really evaluate in isolation, but if I put them in sentences I slur them about the same.


Hm, is it saying "hamster" is /æ/ whereas "ham" is /æː/? I can't actually find a source that uses IPA yet bothers with vowel length right now ...

Even in dialects where it is not traditionally transcribed (such as American - partly because long vowels are most common in place of a disappearing /r/), usually there is in fact an audible difference when you aren't thinking about it. This is part of why computer speech always used to sound so terrible. The most blatant example is when comparing vowels before /t/ vs before /d/, which means that "matter" (short) and "madder" (long) are distinguishable even in accents where /t/ is pronounced like /d/ in this context.

Note that this is orthogonal to stress (since both syllables are stressed in this example), and also orthogonal to the badly-named "long vowel, short vowel" taught in school (which is actually for completely different vowel sounds and which omits several other vowel sounds).


Now you're making me wonder if I just don't make enough distinctions in my speech or something, at least when it comes to the short A sound.

If I speak these sentences out loud, the words sound exactly the same to me:

"That doesn't matter at all." "He was madder than hell."

Incidentally I'd say both A's here also sound just like my A's in ham and hamster! It's a very pinched sound, almost nasally. My choir director has had to train it out of us by exaggerating it and making it sound even more ridiculous because it has no place in singing.


Sign of the quintessential American. For me, a Jamaican, "madder" and "matter" are very distinct. And I don't think I'll ever get how a "t" becomes a "d".


[t] is just the voiceless version of [d] and vice versa. The more lazily you pronounce the [t] between two vowels, the closer to [d] it becomes because it's less work to just keep the vocal cords vibrating.


I mean, everything is dialect-specific, and there is a lot of variation between dialects around /æ/ in particular (trap-bath, bad-lad, Mary-marry-merry, and the whole mess of /æ/-raising/tensing - is ban-back an example that's always split?).

But I'd still say it'd be interesting to record your voice and check the actual timings even if you can't consciously hear the length. Assuming you can actually practice running speech while thinking about it, of course.


It’s not about vowel length. It’s about the time spent saying the actual word


For dialects that have this (not all English dialects do), the difference in time for the two words is precisely the difference in time for the two vowels.


Perhaps. And perhaps not. Many would elongate or voice ("voice" in the linguistic sense meaning activate the vocal cords) the m in ham - not just lengthen the a - in a way that they wouldn't when saying hamster, in which they might pronounce the "m" as a glottal stop (the reason why hamster is often misspelled as "hampster") or as a voiceless bilabial consonant.

Regardless, it's a red herring, because vowel length can also refer to "long" vs "short" vowels as in Bake vs. Back. That's a different, and in my view more common, meaning of vowel length.


Elongating the voice is exactly what I mean by an allophonic difference.

As for 'hamster', no one (that I know of) pronounces the /m/ as a glottal stop, although people sometimes epenthesize a voiceless consonant [p] (by devoicing the end of the /m/). Where some English dialects get a glottal stop is for an intervocalic or word-final /t/ (in addition, of course, to the glottal stop in the middle of 'oh-oh' and 'uh-uh'). I've also heard American English speakers glottalize word-final /k/ and sometimes /t/, but a glottalized stop is not the same as a glottal stop.


> The most blatant example is when comparing vowels before /t/ vs before /d/, which means that "matter" (short) and "madder" (long) are distinguishable

This doesn't resonate with me - the `ma` in both those words are pronounced exactly the same and are of the same duration when speaking.

I checked a few youtube videos now before responding (some science video with 'matter' in the title and some video with 'madder' in the title) and there isn't a distinguishable difference.

Do you have a few links to videos that show a difference?


I'm an ESL, mind you but

I think the vowel gets pretty cramped with similar short vowels (hem-ham-hum) so the tendency is to make it more different. It drifted away in general in American English, but the vowel in "air" stands in the way in non-rhotic accents, so the change is much more random and word specific.


Australian English is supposed to have phonemic vowel length, such that "can" (the container) and "can" (the auxiliary verb) are pronounced differently.


Australian English indeed has phonemic vowel length. For the average speaker, the only distinction between 'but' and 'Bart' is length.


Author is in England.


I'm in England and I don't get what they're saying.

Edit - ah I see reading the context now, it's not the pronunciation but the speed. Hamster has a very fast ham sound


Speed (or more accurately, length) is a part of pronunciation. English speakers don't often think about vowel length because it has no semantic significance like in some other languages, but there's still some phonemic variation, depending on one's dialect.


> Y'all … thinning …

Kind of gave it away there.


I edited the 2nd one because it was an actual typo, but ironically the American south is not one of the regions where I've lived. I just find "y'all" clearer than any of its alternatives!


I gave up trying to read this after closing the 3rd popup


What a silly question to ask. They learn by training on the entire web, hiding one word at a time.

/s



