1. I’m on Lexicon Valley talking about why linguists tend to be good at pronouncing words in other languages (and no, it’s not because we speak all of them). 

    Technically, this is a cross-post from myself, but since it’s an expanded version of a post that I wrote in my first month of blogging ever, you probably haven’t seen it before. It was shockingly popular at the time though, considering I only had a handful of followers. 

    If you’re inspired by tip #7 to take up the IPA, you may find this post on vowels and this post on sonority/manner of articulation helpful. If you’re more into learning languages for reals than faking them (not that these tips are inconsistent with real learning, actually), I’ve got quite a lot in my language learning tag.

     
  2. image

    Writing Skills: XKCD is on point about language again.

    Here’s a study from this year on kids who use abbreviations while texting, and here’s a summary of previous studies: 

    The first study, published in 2008, showed that 11 and 12-year-olds in Britain who used more textisms — whether misspelled words (“ppl,” instead of “people”), grammatically incorrect substitutions (“2” for “to” or “too”), wrong verb forms (“he do” instead of “he does”), or missing punctuation — compared to properly written words tended to have slightly better scores on standardized grammar and writing tests and had better spelling, after controlling for test scores in other subjects and other factors. A 2009 study, conducted by some of the same researchers on 88 kids between 10 and 12 years old, found similar associations between high textism use and slightly better reading ability.

    Hovertext from the xkcd comic: I’d like to find a corpus of writing from children in a non-self-selected sample (e.g. handwritten letters to the president from everyone in the same teacher’s 7th grade class every year)—and score the kids today versus the kids 20 years ago on various objective measures of writing quality. I’ve heard the idea that exposure to all this amateur peer practice is hurting us, but I’d bet on the generation that conducts the bulk of their social lives via the written word over the generation that occasionally wrote book reports and letters to grandma once a year, any day.

     
  3. image

    superlinguo:

    Fun times are on the up.

    I’m not a corpus linguist, but I love playing with different corpora when they’re presented in accessible and fun ways, so I was thrilled when Claire Hardaker tweeted about the NYT Chronicle, a way to visualise the language used across the newspaper’s history. 

    Like Google’s n-gram corpus, it presents a nice clear chart. It has some advantages over n-gram: for example, the NYT corpus is completely up to date, while Google’s gets sketchy for contemporary references. Compare NYT drone to n-gram drone and you see the NYT data kicks up swiftly just where the Google data ends. 

    There are obviously biases in this data too. For one, there’s a bias towards American spelling that isn’t as pronounced in the Google Books corpus. The genre represented is also fairly narrow.

    I found a nice use for it the other day while listening to a This American Life podcast that talked about “the meat question”: a period in the late 19th and early 20th century when the USA was unsure it would have enough viable agriculture to feed its population and looked at alternative sources of meat (including, most famously, hippopotamus). The NYT Chronicle has a nice couple of spikes in usages of this phrase when the issue was most pressing (and therefore made it into the news), while the Google Books usage is more diffuse: people wrote books in the aftermath, and books are a less immediate medium than newspapers.

    This may not become my default go-to tool, but it’s nice and simple and makes a great point of comparison to n-gram. Thanks Claire for sharing!

    I can imagine that, in the long term, if one compared a future Twitter corpus that could make graphs like this with the NYT Chronicle and Google Ngrams, the Twitter one would be even spikier, because it’s not subject to editing or any time delay at all. I really hope someone eventually makes a Twitter graphing feature like this!

    It’s interesting to see the same terms, such as boomer, baby boomer, millennial, generation x, generation y, give us quite different graphs in NYT Chronicle versus Google Ngrams. Both show increases, but there are spikes for “boomer” in NYT in 1994 (relating to a sports story) and for “millennial” in 1999 (relating to New Year’s) that are entirely absent from Ngrams. 

     
  4. An interesting long post about Greek diglossia, from Pseudoerasmus. Here’s Wikipedia on diglossia, for context: 

    In linguistics, diglossia refers to a situation in which two dialects or usually closely related languages are used by a single language community. In addition to the community’s everyday or vernacular language variety (labelled “L” or “low” variety), a second, highly codified variety (labelled “H” or “high”) is used in certain situations such as literature, formal education, or other specific settings, but not used for ordinary conversation.

    The high variety may be an older stage of the same language (e.g. Latin in the early Middle Ages), or a distinct yet closely related present day dialect (e.g. Norwegian with Bokmål and Nynorsk, or Chinese with Mandarin as the official, literary standard and colloquial topolects/dialects used in everyday communication).

    And another excerpt from the post: 

    Imagine a Greek member of Parliament in 1900. He could choose from amongst three words for “fish” — not three words with slightly different meanings, but three words expressing exactly the same thing.  In ordinary conversation, he would have just said ψαρι /psari/ (Demotic), but during a Parliamentary debate he might speak about οψαριον /opsarion/ (Katharevousa). But if he were writing a report on the Ottoman harassment of Greek fishermen, he might write, perhaps to just show off, ιχθυς /ichthys/ (Attic). (Read the rest)

     
  5. feitclub:

    It’s a katakana font (named “ゴウラ”) designed to look like Olde English fancy print

    This must be the Japanese equivalent of that “asian” font you see on Chinese takeout boxes

    (via a friend-of-a-friend on Facebook. hat-tip to artofemilyo)

    The comments on the Language Log post about Gothic katakana are also interesting, including a link to The Structures of Letters and Symbols throughout Human History Are Selected to Match Those Found in Objects in Natural Scenes.

     
  6. image

    xkcd: Wikipedia article titles with the right syllable stress pattern to be sung to the tune of the original Teenage Mutant Ninja Turtles theme song. (Here’s the song, for reference.)

    All of these titles are examples of trochaic tetrameter, which is one of the most common English meters (a trochee is a foot consisting of STRONG-weak and tetrameter is four feet per line). Another example is Twinkle Twinkle Little Star. That one has a deficient last foot (the final weak syllable is missing), but you can still sing any of these titles to that tune if you just double the last note.

    Trochaic tetrameter creates a strong feeling of sing-song “poem-ness” in English. Most Shakespearean characters, for example, speak in iambic pentameter (weak-STRONG, five feet per line), which sounds more natural, but a few speak in trochaic tetrameter for dramatic effect. For example, Macbeth and Lady Macbeth speak in iambic pentameter, which gives the effect of talking normally: 

    Methought I heard a voice cry “Sleep no more!
    Macbeth does murder sleep,” the innocent sleep,
    Sleep that knits up the ravell’d sleave of care,

    Out, damned spot! out, I say!—One: two: why,
    then, ‘tis time to do’t.—Hell is murky!—Fie, my
    lord, fie! a soldier, and afeard? What need we
    fear who knows it, when none can call our power
    to account?—Yet who would have thought the old
    man to have had so much blood in him?

    But the witches speak in trochaic tetrameter, which makes them seem like they’re delivering an incantation: 

    Double, double toil and trouble;
    Fire burn, and cauldron bubble.

    Fair is foul, and foul is fair

    Previous xkcd on poetry: metrical foot fetish, ballad meter, trochaic fixation. Language Log also has a long, interesting post on meter.
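
If you want to play with the meter yourself, here is a minimal sketch in Python. It assumes you mark the stress pattern of a title by hand as a string of `S` (strong) and `w` (weak) syllables; that notation and the function name are made up for this illustration and are not taken from the comic.

```python
def is_trochaic_tetrameter(stresses: str, allow_short_last_foot: bool = True) -> bool:
    """Check a hand-annotated stress string such as "SwSwSwSw".

    "S" marks a strong syllable and "w" a weak one (an assumed notation).
    """
    full_line = "Sw" * 4          # four trochees: STRONG-weak, four times
    short_line = full_line[:-1]   # same line with the final weak syllable missing
    if stresses == full_line:
        return True
    return allow_short_last_foot and stresses == short_line


# "Teenage Mutant Ninja Turtles": TEEN-age MU-tant NIN-ja TUR-tles
print(is_trochaic_tetrameter("SwSwSwSw"))   # True
# "Twinkle Twinkle Little Star": the last foot lacks its weak syllable
print(is_trochaic_tetrameter("SwSwSwS"))    # True
# An iambic pentameter line (weak-STRONG, five times) does not match
print(is_trochaic_tetrameter("wS" * 5))     # False
```

The `allow_short_last_foot` switch is there only to model the Twinkle Twinkle case, where the line stops on a strong syllable.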

     
  7. Optimality Theory 101: Constraints > Rules

    aproposofamelate:

    Okay, so here’s a crash course in Optimality Theory (or OT) for any confused linguists or curious parties out there.

    At its base, OT is essentially just an alternative way to view phonology. Instead of using rules to figure out what is and is not ‘allowed’ in a language, OT uses constraints and structures grammars as systems that map from the input to the output. The input is referred to as the underlying form, whereas the output is the surface realization.  

    Read More

    Another entry in Crowdsourced Linguistics! Yay! 

    Sometimes people also use non-linguistics decision-making analogies to explain Optimality Theory: here’s a coffee-buying analogy via linguisticky, for example. I just realized that this analogy doesn’t link into the more formal layout of OT, with the tableaux and such, so I’m going to do that below. 

    Read More
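
For anyone who thinks better in code, here is a rough Python sketch of how picking a winner from a tableau can work. Everything specific in it is an assumption for illustration: the toy input /pat/, the three candidates, and their violation counts are invented, and the constraint names (NoCoda, Max, Dep) are just familiar textbook labels, not something taken from the posts above.

```python
from typing import Dict, List

# Violation counts for each output candidate of a toy input /pat/.
# The candidates and counts are illustrative assumptions.
TABLEAU: Dict[str, Dict[str, int]] = {
    "pat":   {"NoCoda": 1, "Max": 0, "Dep": 0},  # faithful, but keeps a coda
    "pa":    {"NoCoda": 0, "Max": 1, "Dep": 0},  # deletes the final consonant
    "pa.ta": {"NoCoda": 0, "Max": 0, "Dep": 1},  # inserts a vowel
}


def optimal_candidate(tableau: Dict[str, Dict[str, int]], ranking: List[str]) -> str:
    """Return the candidate with the best violation profile under the ranking.

    Constraints are compared from highest-ranked to lowest, so one violation
    of a higher constraint outweighs any number of lower-ranked violations.
    """
    def profile(candidate: str) -> tuple:
        return tuple(tableau[candidate][constraint] for constraint in ranking)

    return min(tableau, key=profile)


print(optimal_candidate(TABLEAU, ["NoCoda", "Dep", "Max"]))  # pa
print(optimal_candidate(TABLEAU, ["Max", "Dep", "NoCoda"]))  # pat
print(optimal_candidate(TABLEAU, ["NoCoda", "Max", "Dep"]))  # pa.ta
```

The same candidates and the same constraints give three different winners depending only on the ranking. Python's tuple comparison happens to do the left-to-right comparison that a ranked tableau calls for.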

     
  8. Before we get to ergativity, unaccusativity and other kinds of morphosyntactic funtimes…

    superlinguo:

    Thanks so much to All Things Linguistic for setting up the Crowdsourced Linguistics project. We tend to prattle on about things we know, or find interesting, so it’s great to get an idea of what some people find bamboozling or tricky about language!

    I offered to help explain the collected jargon of ergative, accusative, unaccusative and unergative. I still remember sitting in undergraduate classes and trying to get my head around ergativity, so for anyone trying to puzzle it out, I feel your pain.

    Each Wikipedia page (linked above) explains the relevant phenomenon with as much detail as you’d find in an undergrad linguistics textbook, but to make sense of it you have to start thinking about sentences like a linguist. For example, this is really a very elegant summary:

    image

    But only if you understand what the A, S and O stand for, and what that actually means for real language. I’ve given a short intro before (in this post), but I thought I’d write a post that goes right, right back to basics. Hopefully by the time you’ve read this, the information on the various Wikipedia pages will be more accessible. Strap yourselves in, it’s going to be a long post by Superlinguo standards!

    Read More

    I’m so excited to see the explanations start trickling in over this week! This is a great complement to the existing resources on Wikipedia, and thanks to Lauren/superlinguo for also adding to the existing entry.

    Another helpful post explaining ergativity is this one by Literal Minded on what English would look like if it were ergative (and the follow-up post on antipassives, which are like passives for ergative languages). 
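
As a very small companion to those posts, here is a Python sketch of the A/S/O idea. The role labels are the usual typological ones (S for the single argument of an intransitive verb, A and O for the agent-like and patient-like arguments of a transitive verb), but the dictionary and function names are invented for this example, and real languages are messier than a lookup table.

```python
# Standard typological labels for core arguments:
#   S = the single argument of an intransitive verb ("the dog" in "the dog slept")
#   A = the agent-like argument of a transitive verb ("the cat" in "the cat chased the dog")
#   O = the patient-like argument of a transitive verb ("the dog" in "the cat chased the dog")
ALIGNMENTS = {
    "nominative-accusative": {"S": "nominative", "A": "nominative", "O": "accusative"},
    "ergative-absolutive": {"S": "absolutive", "A": "ergative", "O": "absolutive"},
}


def case_for(role: str, alignment: str) -> str:
    """Look up the case a core argument gets under a given alignment."""
    return ALIGNMENTS[alignment][role]


def patterns_with_s(alignment: str) -> str:
    """Return the transitive role (A or O) that is marked the same way as S."""
    cases = ALIGNMENTS[alignment]
    return next(role for role in ("A", "O") if cases[role] == cases["S"])


for name in ALIGNMENTS:
    print(name, "-> S is treated like", patterns_with_s(name))
```

The whole difference between the two systems comes down to which transitive argument S groups with: A in an accusative system, O in an ergative one.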

     
  9. 28 tips for doing better in your Intro Linguistics course

    allthingslinguistic:

    Just in time for back to school, here are some tips for doing better on your linguistics assignments from someone who’s marked a few hundred of them over the years. 

    General:

    1. Read the question. The easiest mistake to fix: if the question says circle the error and fix it, make sure you do both, or if the question asks for three examples, make sure you give three and not two or four. If the question asks for a transcription, don’t give a translation, and so on. Before you pass something in, read it over to make sure the question and the answer match. 

    2. Use only the necessary words. In grade school, you may have been asked to answer in complete sentences. That doesn’t really matter anymore: what matters is that you show that you understand the material. Linguistics problem sets aren’t essay questions, so a short phrase may be totally sufficient. 

    3. Use the technical words that you’ve been learning (but don’t use the other ones you found on Wikipedia). Part of what you’re being tested on is your ability to use technical vocabulary, so you should say “transitive verb” instead of “an action word that has both a person who did the thing and a person who the thing is done to”. 

    Read More

    Bringing this post back for another year. If you think you’re already pretty good at linguistics, perhaps you’d like the satirical followup: Tips for doing worse in your intro linguistics course.

    Also a reminder that I tag by subfield, so you can find potentially useful resource posts in the phonetics, phonology, morphology, syntax, semantics, etc. tags. IPA, protolinguist, and intro linguistics are also good tags for resources. Be warned though that different introductory courses use different simplifications, so when in doubt, go with your professor, TA, and/or textbook. 

     
  10. The Lambda Calculus is often used in semantics as a way of representing meaning in a manner more independent of the specific words used in a particular language. For example, “the cat chased the dog”, “the dog was chased by the cat”, and “le chat a chassé le chien” would all have the same representation because they have the same literal meaning, despite a few pragmatic differences, such as putting focus on the dog or being comprehensible only to speakers of French. 

    This accessible introduction to the Lambda Calculus is aimed at philosophers, but since semantics and philosophy end up having certain areas of intersection, it’s also very useful for linguists. Excerpt: 

    It might look frighteningly mathematical from a distance (it has a greek letter in it, after all!), so nobody outside of academic computer science tends to look at it, but it is unbelievably easy to understand. And if you understood it, you might end up with a much better intuition of computation. […]

    Don’t be intimidated by the word “calculus”! It does not have any complicated formulae or operations. All it ever does is taking a line of letters (or symbols), and performing a little cut and paste operation on it. As you will see, the Lambda Calculus can compute everything that can be computed, just with a very simple cut and paste.

    To follow that, here are some notes on the Lambda Calculus as it relates to linguistics.
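
To make the cat-and-dog example a bit more concrete, here is a toy sketch using Python's own lambdas. It is a simplification under stated assumptions: the tuple format for meanings, the names `chase` and `passive`, and the way the passive is modelled are all invented for illustration and are not how any particular semantics textbook does it.

```python
# A transitive verb as a curried function: it takes its object first,
# then its subject, and returns a language-neutral
# (predicate, agent, patient) tuple.
chase = lambda obj: lambda subj: ("chase", subj, obj)

# A toy passive: "was chased by" takes the agent from the by-phrase first,
# then the surface subject, and hands them to the verb in swapped order.
passive = lambda verb: lambda agent: lambda patient: verb(patient)(agent)

cat, dog = "cat", "dog"

active_meaning = chase(dog)(cat)            # "the cat chased the dog"
passive_meaning = passive(chase)(cat)(dog)  # "the dog was chased by the cat"

print(active_meaning)
print(passive_meaning)
print(active_meaning == passive_meaning)
```

Each function call here is the "cut and paste" step from the excerpt: the argument gets pasted in wherever the variable appeared. Both word orders end up as the same tuple, which is the sense in which the sentences share a representation.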