1. How Linguistics is like Coding

    As a linguist, I’ve had a lot of interesting conversations with programmers about the similarities between linguistics and coding.  

    Here’s an example of a simple linguistic tree structure for the sentence “The internet loves cats” (which of course is also true!) 


    But you can actually represent the information in this structure in several different ways, and many of them look a lot like code.

    Sidenote: the abbreviations I’m using are: 

    • S = sentence
    • NP = noun phrase
    • N = noun
    • D = determiner (article, demonstrative, possessive, etc)
    • VP = verb phrase
    • V = verb

    This tree structure is useful for linguists because it shows several things about the sentence. For example, both “the internet” and “cats” are NPs, but one of them is the subject and branches from S, while the other is the object and branches from VP. If we swap the position of the NPs (“Cats love the internet”) we get a sentence that still sounds reasonable but no longer means the same thing.

    On the other hand, if we swap the position of, say, the NP “the internet” and the VP “loves cats” (“Loves cats the internet”), then we no longer have a normal-sounding English sentence, although it might sound okay for Yoda.
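
    For programmers, the two swaps described above can be sketched as operations on a nested data structure. This is only an illustration: the tuple encoding, the function names, and the exact shape of the tree (read off the abbreviations listed earlier) are my assumptions, and the words are lower-cased so that reordering doesn't leave a stray capital.

```python
# Assumed encoding: (label, child, ...) for phrases, (label, "word") for words.
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def words(node):
    """Read the words off the tree from left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    result = []
    for child in children:
        result.extend(words(child))
    return result


def swap_noun_phrases(tree):
    """Exchange the subject NP (under S) with the object NP (under VP)."""
    s_label, subject, (vp_label, verb, obj) = tree
    return (s_label, obj, (vp_label, verb, subject))


def swap_subject_and_predicate(tree):
    """Exchange the two branches of S: the subject NP and the VP."""
    s_label, subject, predicate = tree
    return (s_label, predicate, subject)


original = " ".join(words(TREE))
nps_swapped = " ".join(words(swap_noun_phrases(TREE)))
yoda = " ".join(words(swap_subject_and_predicate(TREE)))

print(original)     # the internet loves cats
print(nps_swapped)  # cats loves the internet
print(yoda)         # loves cats the internet
```

    Notice that the code happily produces “cats loves the internet”: moving subtrees around says nothing about verb agreement, which is one of the things a real grammar has to handle on top of the tree shape.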

    At any rate, this is a pretty normal linguistic structure, and if you take a linguistics course you’ll draw lots of similar and much more complicated diagrams. So what does this have to do with coding? Let’s take a look at another way of representing this information:


    Now we’ve replaced the branches in the tree with labelled square brackets (here in corresponding colours so you can see them easily). This labelled bracket notation is also commonly used by linguists to generate trees like the one above, using tools like phpSyntaxTree.
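
    As a rough sketch of how a tree turns into labelled brackets, here is a small recursive function. The nested-tuple encoding and the name `to_brackets` are hypothetical, and the tree shape is my reading of the abbreviations above rather than a copy of the original figure.

```python
# Assumed encoding: (label, child, ...) for phrases, (label, "word") for words.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_brackets(node, left="[", right="]"):
    """Render a node as a labelled bracket: the label first, then its children."""
    label, *children = node
    parts = [
        child if isinstance(child, str) else to_brackets(child, left, right)
        for child in children
    ]
    return left + label + " " + " ".join(parts) + right


bracketed = to_brackets(TREE)
print(bracketed)
# [S [NP [D The] [N internet]] [VP [V loves] [NP [N cats]]]]
```

    Calling `to_brackets(TREE, "(", ")")` gives the same thing with round brackets, which is where the resemblance to LISP comes from.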

    If you’re familiar with the programming language LISP, you may already notice some similarities, but let’s convert all the labelled brackets into XML tags to make it really clear. 


    And with conventional indentation: 


    Each of these formats is a different way of looking at the exact same structure. Structured representations like this are common in all of the subfields of linguistics: the example here uses syntax to show the relationships between words, but hierarchical structures are also used in phonology to show syllable structure and feature geometry, and in semantics and morphology to show the relationships between smaller pieces of meaning. Perhaps this is why markup tools like LaTeX are so popular among linguists.
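
    One way to check that the XML version really carries the same structure is to generate it from the tree and then parse it back. The sketch below does that with Python's standard `xml.etree.ElementTree` parser; the tuple encoding and the helper names `to_xml` and `from_xml` are my own, and it assumes the words contain no characters that would need XML escaping.

```python
import xml.etree.ElementTree as ET

# Assumed encoding: (label, child, ...) for phrases, (label, "word") for words.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_xml(node, depth=0):
    """Render a node as XML, one tag per label, indented two spaces per level."""
    label, *children = node
    pad = "  " * depth
    if len(children) == 1 and isinstance(children[0], str):
        return f"{pad}<{label}>{children[0]}</{label}>"
    inner = "\n".join(to_xml(child, depth + 1) for child in children)
    return f"{pad}<{label}>\n{inner}\n{pad}</{label}>"


def from_xml(element):
    """Rebuild the nested-tuple form from a parsed XML element."""
    if len(element) == 0:  # no child elements, so this tag wraps a word
        return (element.tag, element.text.strip())
    return (element.tag, *(from_xml(child) for child in element))


xml_text = to_xml(TREE)
round_trip = from_xml(ET.fromstring(xml_text))

print(xml_text)
print(round_trip == TREE)  # True
```

    The round trip comes back equal to the tree we started with, so nothing about the structure was lost or added by switching notation.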

    Similar structures are also used in computational linguistics and Natural Language Processing (NLP); see, for example, how dependency structures are used in Google ngrams.

    Realizing the underlying similarities between their two fields can give both linguists and programmers a head start in learning about each other. 

    Edit: I should also point out that it’s not a coincidence that syntax trees and XML can represent the same structure. They’re both based on the mathematical concept of a partially ordered set, another example of which is a family tree. 
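
    To make the partially-ordered-set idea a little more concrete, here is a small sketch that treats “dominates” (is an ancestor of, or is the same node as) as the ordering and checks the three defining properties on the example tree. The tuple encoding, the use of child-index paths as node addresses, and the function names are all assumptions of mine for illustration.

```python
# Assumed encoding: (label, child, ...) for phrases, (label, "word") for words.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def node_paths(node, path=()):
    """Yield an address for every labelled node: the child indices from the root."""
    yield path
    label, *children = node
    for index, child in enumerate(children):
        if not isinstance(child, str):
            yield from node_paths(child, path + (index,))


def dominates(a, b):
    """Node a dominates node b when a's address is a prefix of b's."""
    return b[:len(a)] == a


nodes = list(node_paths(TREE))

reflexive = all(dominates(a, a) for a in nodes)
antisymmetric = all(
    a == b for a in nodes for b in nodes if dominates(a, b) and dominates(b, a)
)
transitive = all(
    dominates(a, c)
    for a in nodes for b in nodes for c in nodes
    if dominates(a, b) and dominates(b, c)
)

subject_np, vp = (0,), (1,)
comparable = dominates(subject_np, vp) or dominates(vp, subject_np)

print(reflexive, antisymmetric, transitive)  # True True True
print(comparable)                            # False
```

    The last line is the “partially” part: the subject NP and the VP are sisters, so neither dominates the other, just as siblings in a family tree are not each other's ancestors.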
