Monday, June 6, 2011

Is Human Speech Worse than Computer "Languages"?

I promise. This is my last post on Kluge by Gary Marcus. I just couldn't let his misconceptions about language and how it works go unchallenged, especially since he apparently validates his opinions by claiming familiarity with Chomsky and uses linguistic terms like phoneme as if he knew what they meant. He doesn't. Nor does he understand the difference between written and spoken language, which are not the same and are not wholly governed by the same sites in the brain.

Language evolved as rapid vocal communication. Writing was invented after hundreds of thousands of years of speaking. Writing systems are only a reflection of the spoken language. There are sentences and entire discourses that exist only in written English (or any other written language), and there are sentences and entire discourses that exist only in spoken English.

Sometimes writers do attempt to reproduce spoken language in writing, as in short stories or novels. However, even the best of these attempts only remind readers of the spoken language. For instance, if I wrote "Jaeatchalunch?", would you read it as the spoken sentence "Did you eat your lunch?"

In speech, there are not necessarily any spaces between words, and the palatalization of "did you" into "dija" or "ja" is normal American English. Similarly, "eatcha" for "eat your" is normal, casual spoken English (although you may say "jaeacherrlunch" if you are a strong r-pronouncer). No matter how correctly you think you speak, even in formal situations you will palatalize to some degree, so that you'd say "Didju" and "eachyerrlunch."

I know many of you reading this are saying, "I would never say anything that way." Well, just listen to yourself and others. To say "Did you eat your lunch?" without turning d+y into j and t+y into ch would make you sound very threatening. Marcus doesn't comment on this phenomenon, but from what he has written, I know he'd think it another black mark against language. Actually, it's one of the ways we can communicate so rapidly.
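To make the d+y and t+y rule concrete, here is a minimal Python sketch. It assumes a hypothetical informal respelling in which each word is already written roughly as it sounds ("yu" for you, "yer" for your); the function simply fuses a word-final d or t with a following word-initial y into j or ch and runs the words together, as happens in "did you" and "eat your." It illustrates the rule only and is not a model of real American pronunciation.

```python
# Hypothetical informal respelling: one string per word, with "y" standing
# for the glide at the start of "you"/"your".
PALATALIZE = {"d": "j", "t": "ch"}  # d+y -> j, t+y -> ch


def palatalize(words: list[str]) -> str:
    """Run words together, fusing a final d/t with a following initial y."""
    out = words[0]
    for word in words[1:]:
        last = out[-1]
        if last in PALATALIZE and word.startswith("y"):
            # Replace the d/t and the y with the single palatal sound.
            out = out[:-1] + PALATALIZE[last] + word[1:]
        else:
            # No palatalization: just join without a space, as in speech.
            out += word
    return out


print(palatalize(["did", "yu", "eat", "yer", "lunch"]))  # dijueacherlunch
```

The output lands close to the casual "jaeacherrlunch" discussed above; a real description would also need the vowel reductions that turn "diju" into "ja."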

He says that language is inferior to computer code because language is often ambiguous. He "proves" this by using sentences devoid of context. However, the essence of speech is that it is in context, which includes the physical and social environment of an utterance, as well as the topic of the conversation in which it is uttered. Some linguists say that there is no context-free utterance. When context is considered, few utterances are ambiguous, and if they are, the hearer can always say "Huh?" and the speaker will rephrase.

This takes very little time. A native speaker of a language can utter 30 or more words a minute. If the words he or she uses average 5 sounds each, the speaker has uttered 150 sounds in a minute, and that's figuring on only 30 words. Try counting the number of words a co-conversationalist utters in a minute while you time them with a discreetly hidden stopwatch.

In order to understand sounds that come at you as rapidly as speech sounds, all languages have phonemes and allophones. Any other sounds that come at you as quickly as speech sounds are heard as buzzing or other indistinguishable noises. Allophones of phonemes let hearers know where in the word a sound is being made, but they are perceived as being one sound. For instance, say "little." The first l is made with the tongue tip touching the ridge just behind the front teeth and the blade of the tongue down in front. The second l is made with the back of the tongue humped up toward the soft palate, creating a throaty sound. When native English speakers hear the fronted l, their brains recognize that a new word has started, but when they hear the back l, the brain marks it as ending a word.

There is nothing inevitable about this. Speakers of English consciously hear both l's as one sound. Speakers of Russian, however, hear each as a separate sound, and Japanese speakers hear l and r as one sound. That is, the two l's in English are one phoneme; in Russian, they are two phonemes. In Japanese, l and r are one phoneme, but in English they are two. (Note, though, that Molly was originally a nickname for Mary, and Sally a nickname for Sarah.)
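One way to picture the difference between a sound and a phoneme is as a lookup table that each language applies to the sounds it hears: two sounds are allophones in a language if its table files them under the same phoneme. The Python sketch below is a deliberately toy illustration. The sound labels and the three tables are hypothetical simplifications of the l and r cases just described, not real phonemic inventories, and the phoneme names are arbitrary tags.

```python
# Toy, hypothetical tables mapping sounds to phoneme tags. They cover only
# the l/r cases discussed in the text; the tag names are arbitrary.
PHONEMES = {
    "English": {"clear l": "L", "dark l": "L", "r": "R"},
    "Russian": {"clear l": "L-front", "dark l": "L-back", "r": "R"},
    "Japanese": {"clear l": "R", "r": "R"},
}


def same_phoneme(language: str, sound_a: str, sound_b: str) -> bool:
    """True if both sounds map to one phoneme (are allophones) in `language`."""
    table = PHONEMES[language]
    if sound_a not in table or sound_b not in table:
        return False  # sound not covered by this toy table
    return table[sound_a] == table[sound_b]


print(same_phoneme("English", "clear l", "dark l"))  # True: one phoneme
print(same_phoneme("Russian", "clear l", "dark l"))  # False: two phonemes
print(same_phoneme("Japanese", "clear l", "r"))      # True: one phoneme
print(same_phoneme("English", "clear l", "r"))       # False: two phonemes
```

The point of the table design is that the physical sounds are identical across the three entries; only the language-specific grouping differs.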

The variant sounds that are heard as a single sound are called allophones. Allophones always give the hearer information about what kind of sound is coming or has come, or where in the word the sound is made. That is, they build redundancy into the acoustic message. All languages are 50% redundant, both so they can be spoken extremely rapidly and so that a hearer who is distracted, or in a noisy place, can still decode what was said. Have you ever asked, "What did you say?" and then, before the answer came, had the words pop into your head, so that you said, "Oh, never mind. I got it"? Your brain was a little slow in decoding the allophones into phonemes.

What has this to do with Marcus? Well, he says the different pronunciations of a phoneme are messy and make language harder for children to learn. Since children as young as two, and even 18 months, are already beginning to sort out which sounds are allophones of one phoneme and which are separate phonemes, this isn't much of an argument. Moreover, it shows that Marcus hasn't a clue about how and why phonemes work.

He complains that when we make an s in see our lips are spread, but when we make one in sue, they are rounded. He sees this as a complication. It isn't. It's easier to have the lips rounded before an oo because that vowel is made with rounded lips. Similarly, the ee is made with the lips spread. By getting the articulators in position for the next sound before you actually make it, you can speak a lot faster, and because of allophones, hearers can still understand you.
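The lip behavior in see and sue is anticipatory coarticulation: a consonant borrows the lip posture of the vowel that follows it. A minimal Python sketch, assuming a hypothetical segment list written in the text's informal respelling and covering only the two vowels mentioned (ee spread, oo rounded), could look like this.

```python
# Lip postures for the two vowels discussed in the text, in its respelling.
LIP_POSTURE = {"ee": "spread", "oo": "rounded"}


def lips_for_segments(segments: list[str]) -> list[tuple[str, str]]:
    """Give each consonant the lip posture of the next vowel (anticipation)."""
    result = []
    for i, seg in enumerate(segments):
        if seg in LIP_POSTURE:
            result.append((seg, LIP_POSTURE[seg]))
        else:
            # Look ahead to the next vowel; default to neutral if none follows.
            nxt = next((s for s in segments[i + 1:] if s in LIP_POSTURE), None)
            result.append((seg, LIP_POSTURE[nxt] if nxt else "neutral"))
    return result


print(lips_for_segments(["s", "ee"]))  # [('s', 'spread'), ('ee', 'spread')]
print(lips_for_segments(["s", "oo"]))  # [('s', 'rounded'), ('oo', 'rounded')]
```

The s is the same phoneme in both words; the sketch only shows why its lip posture varies with what comes next.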

To show the depth of Marcus's misunderstanding of how language works: he says, correctly as it happens, that "the way in which we produce a particular linguistic element depends on the sounds that come before and after it" (p. 96). However, he parrots this truism as an explanation of the variation in the pronunciation of ough in The Tough Coughs As He Ploughs the Dough (by Dr. Seuss). I can add through and thought. There are at least 6 ways to pronounce the spelling ough in English. Dr. Marcus, pray tell me: what sounds before or after this spelling account for the 6 pronunciations? There are none.
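The claim that neighboring sounds cannot explain ough is easy to check mechanically. If they did, words with the same letters after ough (or the same letters before it) would share a pronunciation. The Python sketch below tabulates approximate broad-IPA values for the ough part of each word; these transcriptions are assumed here for illustration, roughly General American, and British values for cough and thought differ.

```python
# Approximate broad-IPA values for the "ough" part of each word
# (assumed, roughly General American; British cough/thought differ).
OUGH = {
    "tough": "ʌf",
    "cough": "ɔf",
    "plough": "aʊ",
    "dough": "oʊ",
    "though": "oʊ",
    "through": "uː",
    "thought": "ɔ",
}


def context(word: str) -> tuple[str, str]:
    """Split a word into the letters before and after 'ough'."""
    i = word.index("ough")
    return word[:i], word[i + 4:]


distinct = set(OUGH.values())
# Same right-hand context (ough ends the word), yet several pronunciations.
word_final = {OUGH[w] for w in OUGH if context(w)[1] == ""}
# Same left-hand context ("th" right before ough), yet different pronunciations.
after_th = {OUGH[w] for w in OUGH if context(w)[0] == "th"}

print(len(distinct))    # 6
print(len(word_final))  # 5
print(len(after_th))    # 2
```

Identical surroundings, different sounds: the variation comes from the history of the spelling, not from the neighboring sounds.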

The reason for the chaos, which lies in the spelling and not in spoken English, is that English spelling is a mess. There has never been a spelling reform, so we are plagued with spellings that reflect older pronunciations, such as the f in of; spellings that include letters from Latin counterparts that were never pronounced in English, like the b in debt; and, as in the ough spelling, different dialectal pronunciations combined with now-dead sounds.

The gh sound in Old and Middle English was pronounced like the German ch in nicht (English night). It disappeared entirely in some dialects; hence through, though, plough, and the "silent letters" in right, light, sight, and taught. In other dialects, the old gh became an f sound, as seen in laugh, draught, and cough.

As the middle class rose in England, people from all parts of the country moved to London, bringing their dialects with them. The dialects mixed, so that people who had an f in thorough dropped it because the pronunciation without the f became standard for that word, while the pronunciation with the f became standard in cough, for instance. Why certain pronunciations become favored and others do not is a sociolinguistic question too big to discuss here (see Language: The Social Mirror, 4th ed., for chapters explaining this).

You may recall from other posts that dialects differ more in vowels than in consonants; hence the variation in vowel sounds in the ough spellings. Dialect mixing in London accounts for this variation. The messiness doesn't lie in English, however. It lies in our stupid spelling system, which is in desperate need of a thorough reform. Sound production in speech is maximally efficient for rapid spoken language. Nothing like it exists in the biosphere of Earth. It is marvelous, in the original sense of that word.

Marcus, like so many non-linguists, supposes that Chomsky is the expert in linguistics. In fact, British linguists like Michael Halliday, Sidney Greenbaum, Geoffrey Leech, and John Lyons have long surpassed him, as have sociolinguists, anthropological linguists, and many, if not most, linguists who study actual speech production. (I showed Chomsky's limitations in my 1972 doctoral dissertation, in which I collected actual data that transformational grammar could not explain.) Chomsky made a big splash when he said that if you want to know what the human mind is capable of, you have to figure out what language is capable of.

However, Chomsky ignored two important properties of language: it is always produced in context, and it is dependent on human physiology. Chomsky was right that all languages are capable of an infinite number of sentences, but wrong when he said any sentence can be made infinitely long. Sentences are limited by the amount of breath a speaker can muster for one sentence and also by the attention span of hearers.

If you want to show what language does, you have to collect naturally produced utterances in a given context. You can't just make up sentences in your own head and then say, "See, that's how language works." We know that people can make up sentences that nobody would ever say and use them to prove points. That's what Chomsky did, and Marcus copies him. Marcus offers sentences that are exceedingly unlikely to be said by anyone and then uses them to "prove" that human language is a kluge and that computer codes are superior. For instance, he gives:

           People people left left.

Marcus claims this is "perfectly grammatical." To whom? Certainly not to me or to anybody I tried it on. He also says that this "perfectly grammatical" sentence boggles most people's minds. Well, it didn't boggle mine. I understood it immediately, just as I understood why it isn't grammatical. It should be "The people that people left also left." The rules for using the and a are very strict and have to be adhered to. You can't just omit them. The same is true of relative pronouns like that or who. They can't be omitted willy-nilly. There are definite rules for when they must be said and when they can be omitted.

I added also to his sentence because, especially in writing, having left left is awkward, but in speech, inflection can substitute. (Also is not a grammatical marker.) Oh, Marcus thinks that such inflection, and even body motion, show the awkwardness of language. To him, a "pure" language would need no such auxiliary systems. Well, intonation (the rise and fall of the voice) and even body language are part of the communication system of humans and allow flexibility in conveying messages orally. Considering how many tens of thousands of years people had speech but no writing, we can understand why the three systems co-evolved.

Marcus's biggest complaint about language is its potential for ambiguity. It is easy to create inherently ambiguous phrases and sentences if you don't provide context. Marcus uses the example of "John's picture," which can mean 'a picture of John,' 'a picture made by John,' or 'a picture owned by John.' Yes, but in context, both the linguistic context of what is being talked about and the social context of what we know about John, such ambiguity is unlikely. Context disambiguates.

What Marcus also fails to recognize is that the potential ambiguity of language allows speakers to use their old language to express new things. The fact that all utterances are context-bound also helps speakers use their old language to express new things. If all words had just one meaning, as Marcus claims they should (and as computer languages do), then our mental lexicons would have to be far larger than they are. Moreover, every time speakers wanted to express something new, they would probably have to make up a new word. Using an old word in a new context so that it takes on a new meaning is the most efficient way to have a communication system in which anybody can impart new ideas or information.

Marcus is correct when he says that computer code is unambiguous. Each "sentence" means only one thing. But you can't write poetry in such a code, nor can you create an infinite number of utterances in it. Marcus gives as an example of the "clumsiness" of human language (p. 103) "It was the banker that praised the barber that alienated his wife that climbed up the mountain," and then challenges the reader to say who climbed up the mountain. I agree with Marcus that this is a clumsy sentence, but I would add that it's not a sentence that occurs in actual speech. Moreover, it is a lousy written sentence. For all of Chomsky's nonsense about recursion going on ad infinitum, in actuality there is a limit, for both speakers and hearers, on how many sentences can be folded into each other. There are also limits on how many pronouns can be used and when. For instance, this sentence has to be rephrased so that his can refer to only one of the two men, the banker or the barber.

Marcus says that the proof that language is riddled with ambiguity is that computers have such a hard time translating or understanding human languages. I think the correct conclusion to draw from that truism is that computers are stupid and have limited ability to use context.

I had to write programs to do computer translations of English into Russian and vice versa in order to get my doctorate, so I know how frustratingly dumb computers can be. They've improved since then, but they still can't match the human brain. Or the marvel of evolution that language is.