As a linguist, I’ve had a lot of interesting conversations with programmers about the similarities between linguistics and coding.
Here’s an example of a simple linguistic tree structure for the sentence “The internet loves cats” (which of course is also true!)
But you can actually represent the information in this structure in several different ways, and many of them look a lot like code.
Sidenote: the abbreviations I’m using are:
- S = sentence
- NP = noun phrase
- N = noun
- D = determiner (article, demonstrative, possessive, etc.)
- VP = verb phrase
- V = verb
This tree structure is useful for linguists because it shows several things about the sentence. For example, both “the internet” and “cats” are NPs, but one of them is the subject and branches from S, while the other is the object and branches from VP. If we swap the position of the NPs (“Cats love the internet”), we get a sentence that still sounds reasonable but no longer means the same thing.
On the other hand, if we swap the position of, say, the NP “the internet” and the VP “loves cats” (“Loves cats the internet”), then we no longer have a normal-sounding English sentence, although it might sound okay for Yoda.
At any rate, this is a pretty normal linguistic structure, and if you take a linguistics course you’ll draw lots of similar and much more complicated diagrams. So what does this have to do with coding? Let’s take a look at another way of representing this information:
Now we’ve replaced the branches in the tree with labelled square brackets (here in corresponding colours so you can see them easily). This labelled bracket notation is also commonly used by linguists to generate trees like the one above, using tools like phpSyntaxTree.
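If you'd like to try this yourself, here is a minimal Python sketch that stores the tree as nested tuples and prints it in labelled bracket notation. The names `TREE` and `to_brackets` are my own, and the exact shape of the tree is an assumption based on the description above (an NP containing D and N, and a VP containing V and a second NP).

```python
# The tree for "The internet loves cats" as nested tuples: (label, child, child, ...).
# A leaf is (label, word). The shape is assumed from the prose description.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_brackets(node):
    """Render a (label, children...) tuple in labelled bracket notation."""
    label, *children = node
    parts = [
        child if isinstance(child, str) else to_brackets(child)
        for child in children
    ]
    return "[" + " ".join([label, *parts]) + "]"


print(to_brackets(TREE))
# [S [NP [D The] [N internet]] [VP [V loves] [NP [N cats]]]]
```

Swap the square brackets for round ones and the result would not look out of place in a Lisp program.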
If you’re familiar with the programming language LISP, you may already notice some similarities, but let’s convert all the labelled brackets into XML tags to make it really clear.
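As a rough sketch of that conversion, the same nested structure can be turned into XML with Python's standard `xml.etree.ElementTree` module. Using the category labels as tag names is my own choice for illustration, and `TREE` and `to_element` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Nested tuples: (label, child, child, ...); a leaf is (label, word).
# The shape of the tree is assumed from the description in the text.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def to_element(node):
    """Convert a (label, children...) tuple into an XML element."""
    label, *children = node
    element = ET.Element(label)
    for child in children:
        if isinstance(child, str):
            element.text = child  # a word becomes the text of its tag
        else:
            element.append(to_element(child))
    return element


xml_line = ET.tostring(to_element(TREE), encoding="unicode")
print(xml_line)
# <S><NP><D>The</D><N>internet</N></NP><VP><V>loves</V><NP><N>cats</N></NP></VP></S>
```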
And with conventional indentation:
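One way to produce the indented version is sketched below, assuming Python 3.9 or later (where `ET.indent` was added) and using the category labels as tag names. The XML string itself is my reconstruction of the sentence's structure from the description above.

```python
import xml.etree.ElementTree as ET

# The sentence as one line of XML (tag names are the category labels).
xml_line = (
    "<S><NP><D>The</D><N>internet</N></NP>"
    "<VP><V>loves</V><NP><N>cats</N></NP></VP></S>"
)

root = ET.fromstring(xml_line)
ET.indent(root, space="  ")  # adds line breaks and indentation in place
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
# <S>
#   <NP>
#     <D>The</D>
#     <N>internet</N>
#   </NP>
#   <VP>
#     <V>loves</V>
#     <NP>
#       <N>cats</N>
#     </NP>
#   </VP>
# </S>
```

The indentation carries no extra information; it just makes the nesting visible, much as the branches do in a drawn tree.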
All of these formats are different ways of looking at the exact same structure. Structured representations like this are common in all of the subfields of linguistics: the example here uses syntax to show the relationships between words, but hierarchical structures are also used in phonology to show syllable structure and feature geometry, and in semantics and morphology to show the relationships between smaller pieces of meaning. Perhaps this is why markup tools like LaTeX are so popular among linguists.
Similar structures are also used in computational linguistics and Natural Language Processing (NLP); see, for example, how dependency structures are used in the Google Books Ngram Viewer.
Realizing the underlying similarities between their two fields can give both linguists and programmers a head start in learning about each other.
Edit: I should also point out that it’s not a coincidence that syntax trees and XML can represent the same structure. They’re both based on the mathematical concept of a partially ordered set, another example of which is a family tree.
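For the mathematically curious, here is a small sketch that checks the partial-order properties for the "dominates" relation on the example tree, where a node dominates itself and everything below it. The tuple encoding, the use of index paths to identify nodes, and the function names are all my own assumptions for illustration.

```python
from itertools import product

# Nested tuples: (label, child, child, ...); a leaf is (label, word).
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "internet")),
    ("VP", ("V", "loves"), ("NP", ("N", "cats"))),
)


def paths(node, path=()):
    """Yield a unique path (tuple of child indices) for every labelled node."""
    yield path
    for i, child in enumerate(node[1:]):
        if not isinstance(child, str):
            yield from paths(child, path + (i,))


def dominates(a, b):
    """a dominates b if a's path is a prefix of b's (so every node dominates itself)."""
    return b[:len(a)] == a


nodes = list(paths(TREE))

reflexive = all(dominates(a, a) for a in nodes)
antisymmetric = all(
    a == b
    for a, b in product(nodes, repeat=2)
    if dominates(a, b) and dominates(b, a)
)
transitive = all(
    dominates(a, c)
    for a, b, c in product(nodes, repeat=3)
    if dominates(a, b) and dominates(b, c)
)
# "Partially" ordered: the subject NP (0,) and the VP (1,) are not comparable.
incomparable = not dominates((0,), (1,)) and not dominates((1,), (0,))

print(reflexive, antisymmetric, transitive, incomparable)
```

All four checks come out true for this tree: the relation is an ordering, but only a partial one, because sister nodes like the subject NP and the VP are not ranked relative to each other.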