Computational Linguistics

Computational linguistics (CL) is a fairly specialized field of study, and when people ask me about it, I usually have to explain it in more detail, which is why I included this post here. It is an interdisciplinary field that combines computer science, linguistics and artificial intelligence. In computational linguistics, we study natural human language with the help of computers in order to build models of various kinds of linguistic phenomena. The field also involves programming, since it develops algorithms and systems that can process, analyse and generate language data. Hence, it covers both linguistic theory and the development of applications for text or speech, such as machine translation, speech recognition and other natural language processing (NLP) tasks.

In essence, turning text into a numerical representation that a computer can work with involves several processing steps. Imagine any kind of text you would like a computer to categorize automatically: at first, to the computer, it is just a sequence of characters (a string), with no understanding of what the text means. The first, fundamental step is to split the text into sentences, which are then split into words (or tokens). Sometimes the words also need to be cleaned or reduced to their base form (for example, "running" to "run"). Depending on the goal of the task, one might enrich the text by identifying word categories (nouns, verbs, etc.), sentence structure, proper names and the entities they refer to, or the meanings of words in context. Finally, before the text can be fed into algorithms that detect patterns and make decisions based on those patterns, it has to be transformed into a numerical representation (e.g. vectors).
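
To make some of these steps more concrete, here is a minimal sketch in plain Python (standard library only) covering sentence splitting, tokenization with a very crude normalization (lowercasing), and turning the tokens into a bag-of-words vector of word counts. The function names and regular-expression rules are my own hypothetical choices for illustration, not a standard API; real toolkits handle abbreviations, punctuation and base forms far more carefully, and enrichment steps such as part-of-speech tagging are left out here.

```python
import re
from collections import Counter

# Illustrative sketch only: these helper names are hypothetical, not a library API.


def split_sentences(text: str) -> list[str]:
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    # Real sentence splitters also deal with abbreviations such as "e.g.".
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    # Keep runs of letters, digits and apostrophes as tokens; drop punctuation.
    return re.findall(r"[A-Za-z0-9']+", sentence)


def normalize(token: str) -> str:
    # Very crude normalization: lowercase only, no stemming or lemmatization.
    return token.lower()


def build_vocabulary(docs_tokens: list[list[str]]) -> list[str]:
    # Sorted distinct tokens; a token's position is its index in every vector.
    return sorted({tok for tokens in docs_tokens for tok in tokens})


def vectorize(tokens: list[str], vocabulary: list[str]) -> list[int]:
    # Bag-of-words: how often each vocabulary word occurs in the token list.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    text = "The cat sat. The cat slept!"
    sentences = split_sentences(text)
    tokens = [normalize(t) for s in sentences for t in tokenize(s)]
    vocabulary = build_vocabulary([tokens])
    print(vocabulary)                     # ['cat', 'sat', 'slept', 'the']
    print(vectorize(tokens, vocabulary))  # [2, 1, 1, 2]
```

The bag-of-words vector throws away word order and only keeps counts, which is crude but often enough for simple categorization tasks; more elaborate representations (e.g. learned word embeddings) follow the same idea of mapping text to numbers.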

The sketch below, which I painted, shows some of the processing stages in an NLP pipeline with very simple illustrations.

NLP Pipeline

Example applications of automatic text categorization include assigning news articles to topics and detecting whether a post or product review expresses positive, negative or neutral sentiment.
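
As a sketch of how such a categorization could work once text is reduced to word counts, here is a tiny multinomial Naive Bayes classifier in plain Python. Naive Bayes is just one classic technique I picked for illustration, not the only way to do this. The four training "reviews" are made-up toy examples, far too few for real use, and only two labels are used; a neutral class would simply need its own labelled examples, and the same code works unchanged for topic labels such as news categories.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative sketch only: class and function names are hypothetical.


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        self.word_counts: defaultdict = defaultdict(Counter)  # label -> word counts
        self.vocabulary: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Made-up toy training data, for illustration only.
    texts = [
        "great product, works perfectly",
        "I love it, excellent quality",
        "terrible, broke after one day",
        "awful quality, I hate it",
    ]
    labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayesClassifier().fit(texts, labels)
    print(model.predict("excellent, works great"))  # positive
    print(model.predict("it broke, awful"))         # negative
```

Working in log space avoids multiplying many small probabilities, and the add-one smoothing keeps a word that never appeared with a label from forcing that label's probability to zero.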
