1. How Linguistics is like Coding

    As a linguist, I’ve had a lot of interesting conversations with programmers about the similarities between linguistics and coding.  

    Here’s an example of a simple linguistic tree structure for the sentence “The internet loves cats” (which of course is also true!) 


    But you can actually represent the information in this structure in several different ways, and many of them look a lot like code.

    Sidenote: the abbreviations I’m using are: 

    • S = sentence
    • NP = noun phrase
    • N = noun
    • D = determiner (article, demonstrative, possessive, etc.)
    • VP = verb phrase
    • V = verb

    This tree structure is useful for linguists because it shows several things about the sentence. For example, both “the internet” and “cats” are NPs, but one of them is the subject and branches from S, while the other is the object and branches from VP. If we swap the position of the NPs (“Cats love the internet”) we get a sentence that still sounds reasonable but no longer means the same thing.

    On the other hand, if we swap the position of, say, the NP “the internet” and the VP “loves cats” (“Loves cats the internet”), then we no longer have a normal-sounding English sentence, although it might sound okay for Yoda.
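For programmers, the same tree can be written down as nested data. The following is only a minimal Python sketch: the tuple layout, the `TREE` name and the helper functions are assumptions made for illustration, and the internal shape of the object NP (a bare N for “cats”) is a guess based on the abbreviations listed above.

```python
# Each node is (label, child, child, ...); each leaf is (label, word).
# This layout is an illustrative assumption, not a standard format.
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def is_leaf(node):
    """A leaf pairs a category label with a single word."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Read the words off the tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def parents_of(node, label, parent=None):
    """List the parent label of every node that carries `label`."""
    found = [parent] if node[0] == label else []
    if not is_leaf(node):
        for child in node[1:]:
            found += parents_of(child, label, node[0])
    return found


print(" ".join(words(TREE)))   # the internet loves cats
print(parents_of(TREE, "NP"))  # ['S', 'VP']

# Swapping the subject NP and the VP keeps the pieces but changes the order.
root, subject, predicate = TREE
swapped = (root, predicate, subject)
print(" ".join(words(swapped)))  # loves cats the internet
```

The two NPs carry the same label, but `parents_of` shows that one hangs from S and the other from VP, which is the subject/object difference described above.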

    At any rate, this is a pretty normal linguistic structure, and if you take a linguistics course you’ll draw lots of similar and much more complicated diagrams. So what does this have to do with coding? Let’s take a look at another way of representing this information:


    Now we’ve replaced the branches in the tree with labelled square brackets (here in corresponding colours so you can see them easily). This labelled bracket notation is also commonly used by linguists to generate trees like the one above, using tools like phpSyntaxTree.
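Since the figure is not reproduced here, the following Python sketch shows roughly what labelled bracket notation looks like for this sentence and how it maps to and from nested data. The tuple layout and the names `to_brackets` and `from_brackets` are hypothetical helpers for illustration, and the exact bracketing in the original figure may have differed.

```python
import re

# Each node is (label, child, ...); each leaf is (label, word).
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_brackets(node):
    """Write a node as [LABEL ...] with its children nested inside."""
    label, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        return f"[{label} {rest[0]}]"
    return f"[{label} " + " ".join(to_brackets(c) for c in rest) + "]"


def from_brackets(text):
    """Parse labelled brackets back into nested tuples."""
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected an opening bracket")
        label = tokens[pos + 1]
        pos += 2
        parts = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                parts.append(parse())
            else:
                parts.append(tokens[pos])
                pos += 1
        pos += 1  # step over the closing bracket
        return (label, *parts)

    return parse()


text = to_brackets(TREE)
print(text)
# [S [NP [D the] [N internet]] [VP [V loves] [NP [N cats]]]]
print(from_brackets(text) == TREE)  # True
```

The round trip at the end is the point: the bracketed string and the nested tuples carry the same information.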

    If you’re familiar with the programming language LISP, you may already notice some similarities, but let’s convert all the labelled brackets into XML tags to make it really clear. 


    And with conventional indentation: 
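The figures for the XML versions are not reproduced here, so as a stand-in, this Python sketch generates indented XML from the same tree. It assumes the category labels are used directly as tag names and a two-space indent; the `to_xml` helper and the tuple layout are hypothetical, not the author's own tooling.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Each node is (label, child, ...); each leaf is (label, word).
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_xml(node, depth=0):
    """Turn every labelled bracket into a matching pair of XML tags."""
    label, *rest = node
    pad = "  " * depth
    if len(rest) == 1 and isinstance(rest[0], str):
        return f"{pad}<{label}>{escape(rest[0])}</{label}>"
    inner = "\n".join(to_xml(child, depth + 1) for child in rest)
    return f"{pad}<{label}>\n{inner}\n{pad}</{label}>"


xml_text = to_xml(TREE)
print(xml_text)
# <S>
#   <NP>
#     <D>the</D>
#     <N>internet</N>
#   </NP>
#   <VP>
#     <V>loves</V>
#     <NP>
#       <N>cats</N>
#     </NP>
#   </VP>
# </S>

# A standard XML parser reads it back as the same hierarchy.
root = ET.fromstring(xml_text)
print(root.tag, [child.tag for child in root])  # S ['NP', 'VP']
print(" ".join("".join(root.itertext()).split()))  # the internet loves cats
```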


    All of these formats represent a different way of looking at the exact same structure. Structured representations like this are common in all of the subfields of linguistics: the example here uses syntax to show the relationships between words, but hierarchical structures are also used in phonology to show syllable structure and feature geometry, and in semantics and morphology to show the relationships between smaller pieces of meaning. Perhaps this is why markup tools like LaTeX are so popular among linguists.

    Similar structures are also used in computational linguistics and Natural Language Processing (NLP); see, for example, how dependency structures are used in Google Ngrams.

    Realizing the underlying similarities between their two fields can give both linguists and programmers a head start in learning about each other. 

    Edit: I should also point out that it’s not a coincidence that syntax trees and XML can represent the same structure. They’re both based on the mathematical concept of a partially ordered set, another example of which is a family tree. 
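To make the partial-order idea concrete, here is a small Python sketch that checks it on the example sentence. It assumes the relevant ordering is "dominates" (a node comes before everything beneath it, itself included) and identifies each node by its path of child positions from the root; the names `paths` and `dominates` are illustrative only.

```python
from itertools import product

# Each node is (label, child, ...); each leaf is (label, word).
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def paths(node, here=()):
    """Yield the address of every node as a tuple of child positions."""
    yield here
    for i, child in enumerate(node[1:]):
        if isinstance(child, tuple):
            yield from paths(child, here + (i,))


def dominates(a, b):
    """a dominates b when a's address is a prefix of b's address."""
    return b[:len(a)] == a


nodes = list(paths(TREE))

# The three defining properties of a partial order.
reflexive = all(dominates(a, a) for a in nodes)
antisymmetric = all(
    a == b
    for a, b in product(nodes, repeat=2)
    if dominates(a, b) and dominates(b, a)
)
transitive = all(
    dominates(a, c)
    for a, b, c in product(nodes, repeat=3)
    if dominates(a, b) and dominates(b, c)
)
print(reflexive, antisymmetric, transitive)  # True True True

# "Partial" because some pairs are not ordered at all: the subject NP
# and the VP are sisters, so neither dominates the other.
subject_np, vp = (0,), (1,)
print(dominates(subject_np, vp), dominates(vp, subject_np))  # False False
```

The same check would pass for the nesting of XML elements, or for the ancestor relation in a family tree.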
