1. How Linguistics is like Coding

    As a linguist, I’ve had a lot of interesting conversations with programmers about the similarities between linguistics and coding.  

    Here’s an example of a simple linguistic tree structure for the sentence “The internet loves cats” (which of course is also true!) 


    But you can actually represent the information in this structure in several different ways, and many of them look a lot like code.

    Sidenote: the abbreviations I’m using are: 

    • S = sentence
    • NP = noun phrase
    • N = noun
    • D = determiner (article, demonstrative, possessive, etc.)
    • VP = verb phrase
    • V = verb

    This tree structure is useful for linguists because it shows several things about the sentence. For example, both “the internet” and “cats” are NPs, but one of them is the subject and branches from S, while the other is the object and branches from VP. If we swap the position of the NPs (“Cats love the internet”) we get a sentence that still sounds reasonable but no longer means the same thing.

    On the other hand, if we swap the position of, say, the NP “the internet” and the VP “loves cats” (“Loves cats the internet”), then we no longer have a normal-sounding English sentence, although it might sound okay for Yoda.

    At any rate, this is a pretty normal linguistic structure, and if you take a linguistics course you’ll draw lots of similar and much more complicated diagrams. So what does this have to do with coding? Let’s take a look at another way of representing this information:


    Now we’ve replaced the branches in the tree with labelled square brackets (here in corresponding colours so you can see them easily). This labelled bracket notation is also commonly used by linguists to generate trees like the one above, using tools like phpSyntaxTree.
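
    For programmers who want to poke at this, here is a minimal Python sketch that reads labelled bracket notation into nested tuples. The bracketed string is an assumed analysis built from the abbreviations listed above (in particular, the bare N under the object NP is a guess at the shape of the tree), and parse_brackets is a hypothetical helper written for this post, not part of phpSyntaxTree.

```python
def parse_brackets(text):
    """Parse labelled bracket notation into nested (label, children) tuples."""
    # Pad the brackets with spaces so that split() yields "[", "]" and words.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        pos += 1
        label = tokens[pos]  # category label such as S, NP or VP
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())  # nested phrase
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    return parse_node()


# Assumed bracketing for "The internet loves cats".
SENTENCE = "[S [NP [D The] [N internet]] [VP [V loves] [NP [N cats]]]]"
tree = parse_brackets(SENTENCE)

label, children = tree
subject, verb_phrase = children
print(label)        # category of the root node
print(subject)      # the NP that branches from S
print(verb_phrase)  # the VP, which contains the object NP
```

    The subject NP is a direct child of S, while the object NP only shows up inside the VP, which is the same distinction the tree diagram makes.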

    If you’re familiar with the programming language LISP, you may already notice some similarities, but let’s convert all the labelled brackets into XML tags to make it really clear. 


    And with conventional indentation: 


    All of these formats represent a different way of looking at the exact same structure. Structured representations like this are common in all of the subfields of linguistics: the example here uses syntax to show the relationships between words, but hierarchical structures are also used in phonology to show syllable structure and feature geometry, and in semantics and morphology to show the relationships between smaller pieces of meaning. Perhaps this is why markup tools like LaTeX are so popular among linguists. 
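
    To make the "same structure, different notation" point concrete, here is a rough Python sketch that prints the tree as indented XML. The nested-tuple TREE is an assumed analysis of the sentence, using the category abbreviations as tag names, and to_xml is a hypothetical helper rather than any standard tool.

```python
from xml.sax.saxutils import escape

# Assumed analysis of "The internet loves cats" as (label, children) tuples.
TREE = ("S", [
    ("NP", [("D", ["The"]), ("N", ["internet"])]),
    ("VP", [("V", ["loves"]), ("NP", [("N", ["cats"])])]),
])


def to_xml(node, depth=0):
    """Render a (label, children) tuple as XML with two-space indentation."""
    label, children = node
    pad = "  " * depth
    # A node that contains only words is printed on a single line.
    if all(isinstance(child, str) for child in children):
        words = escape(" ".join(children))
        return f"{pad}<{label}>{words}</{label}>"
    lines = [f"{pad}<{label}>"]
    for child in children:
        if isinstance(child, str):
            lines.append(f"{pad}  {escape(child)}")  # stray word among phrases
        else:
            lines.append(to_xml(child, depth + 1))
    lines.append(f"{pad}</{label}>")
    return "\n".join(lines)


xml_text = to_xml(TREE)
print(xml_text)
```

    Each labelled bracket becomes an opening tag and each closing bracket becomes the matching closing tag, so nothing about the structure changes, only the spelling.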

    Similar structures are also used in computational linguistics and Natural Language Processing (NLP); see, for example, how dependency structures are used in Google ngrams.

    Realizing the underlying similarities between their two fields can give both linguists and programmers a head start in learning about each other. 

    Edit: I should also point out that it’s not a coincidence that syntax trees and XML can represent the same structure. They’re both based on the mathematical concept of a partially ordered set, another example of which is a family tree. 
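
    As a small illustration of the partial order idea, the Python sketch below treats "dominates" (is an ancestor of, or is the same node as) as the ordering relation and checks the three defining properties of a partial order on the example tree. The tuple encoding, the use of paths from the root as node ids, and the helper names are all assumptions made for this sketch.

```python
from itertools import product

# Assumed analysis of "The internet loves cats" as (label, children) tuples.
TREE = ("S", [
    ("NP", [("D", ["The"]), ("N", ["internet"])]),
    ("VP", [("V", ["loves"]), ("NP", [("N", ["cats"])])]),
])


def collect_nodes(node, path=(), out=None):
    """Give each node a unique id: the tuple of child indices from the root."""
    if out is None:
        out = []
    out.append(path)
    if not isinstance(node, str):
        _, children = node
        for i, child in enumerate(children):
            collect_nodes(child, path + (i,), out)
    return out


def dominates(a, b):
    """a dominates b (or is b) when a's path is a prefix of b's path."""
    return b[:len(a)] == a


nodes = collect_nodes(TREE)

reflexive = all(dominates(a, a) for a in nodes)
antisymmetric = all(
    a == b or not (dominates(a, b) and dominates(b, a))
    for a, b in product(nodes, repeat=2)
)
transitive = all(
    dominates(a, c)
    for a, b, c in product(nodes, repeat=3)
    if dominates(a, b) and dominates(b, c)
)

# The order is only partial: the subject NP and the VP are sisters,
# so neither one dominates the other.
subject_np, vp = (0,), (1,)
comparable = dominates(subject_np, vp) or dominates(vp, subject_np)

print(reflexive, antisymmetric, transitive, comparable)
```

    The same check would work on a simple family tree if you read "dominates" as "is an ancestor of, or is the same person as".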
