1. Protolinguist resources: Teaching yourself corpus linguistics

    Many domains of linguistics use collections of language data (written or recorded) to look for patterns about languages or dialects. A corpus can be huge or tiny, and can be created as part of research or already exist as a public or private source. In this resource post, I provide a ton of links to learn more about various types of linguistics that use corpuses/corpora (pick your favourite plural): sociolinguistics, historical linguistics, typology, and language documentation/revitalization/fieldwork.* 


    Sociolinguistics

    Sociolinguistics is about how people who speak the same language speak differently depending on various social and demographic factors (age, gender, social class, ethnicity, etc.). Dialectology tends to focus more on how dialects differ from place to place.

    Historical linguistics

    Historical linguistics is about how language changes over time, for example from Beowulf to Modern English, and about attempting to reconstruct the original source language that several related languages are derived from. Etymology and philology are subfields of historical linguistics.


    Typology

    Typology is about comparing various related and unrelated languages in an attempt to figure out what characteristics are common to all languages, what characteristics are different, and how they vary. Together with historical linguistics, typology is part of the larger field of comparative linguistics.

    Language documentation

    Language documentation is about recording and describing languages, generally those that have not historically been studied in as much detail and which may be endangered. Documentation sometimes overlaps with language revitalization, when the community wants to speak the language more but it’s no longer commonly spoken at home.

    In an academic setting, documentation is generally introduced through field methods courses, where a class works with a speaker of a language that none of them has any previous knowledge of, and learns how to ask the speaker questions in order to figure out the structure of the language.

    Documentation/fieldwork is normally learned by doing, so it’s not really possible to learn everything about it just by reading things online. That said, you could try reading a descriptive grammar (list of grammars) of an interesting-looking language or, if you have a friend who speaks a language you don’t know and they’re interested, asking them questions about it.

    Content note: I’m not trying to assert any theoretical differences by putting something in the “corpus” or “experimental” posts, just trying to split up areas so one post doesn’t get too long. Corpus methods and experimental methods often overlap in each sub-field.

    *Notes: Some of the links overlap in content, especially chapters and slides. This is deliberate, so if you don’t like how something is explained in one place, try somewhere else. Content is taken from a variety of sources, which may use slightly different theories or simplifications: don’t panic. Introductory linguistics courses vary in how much they cover corpus-related topics: some may not talk about them at all, while others may go into considerable detail in one or more areas. Reading everything would be closer to a full undergrad course in each of these sub-disciplines, so don’t feel like you have to work your way through absolutely everything. If you have questions about what you’re reading, you will probably get a faster response posting in #linguistics where multiple people can see you and reply than messaging me directly.

    This post is part of a series on resources for teaching yourself linguistics. Previously: semantics, syntax, morphology, phonetics/phonology, why “protolinguist”, and my original protolinguist post. Next: experimental, descriptive grammar, philosophy of language/linguistic anthropology. Any comments/feedback very much appreciated, especially if you are trying to learn more about linguistics or if you have more (fun or serious) corpus links to add. Posts will be tagged #linguistics and #protolinguist, and I’ll be checking both.
