If you have been reading about text analytics, you have probably noticed that it is not one thing.
Like our solar system, it is made up of many smaller parts that together form the whole. The primary purpose of text analytics is to derive insights from text.
Text analytics systems analyze text for sentiment, patterns, and tonal changes. To deduce the right insight from text, various types of text analytics are used.
Curious to know what those types of text analytics are?
Word frequency
As the term suggests, word frequency is a form of text analytics that scans text for how often a word recurs. The word could be a brand name, a noun, or a term that denotes the user’s positive or negative sentiment.
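A minimal Python sketch of word frequency, using only the standard library: it lowercases the text, splits it into words with a simple regular expression (an assumption; real tokenizers handle punctuation and contractions more carefully), and counts the words with `collections.Counter`. The function name `word_frequencies` and the sample feedback are illustrative.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    # Treat runs of letters, digits and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(top_n)


feedback = "Great service. The service was fast, and the staff were great."
print(word_frequencies(feedback, top_n=3))
# [('great', 2), ('service', 2), ('the', 2)]
```

To track a single term, such as a brand name, look it up in the full result, e.g. `dict(word_frequencies(text)).get("acme", 0)`.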
Collocation
Collocation identifies words that usually appear together in a series, or co-occur. It helps predict which words will follow a primary word more frequently than chance alone would explain.
For example, "strong tea" and "heavy rain".
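The text does not name a scoring method, so the sketch below assumes pointwise mutual information (PMI), one common way to test whether two adjacent words co-occur more often than chance: PMI(x, y) = log2(P(x y) / (P(x) P(y))), where a positive score means the pair appears together more often than two independent words would. The function `collocations`, the `min_count` cutoff, and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def collocations(text, min_count=2):
    """Score adjacent word pairs by PMI, highest first."""
    words = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_pairs = len(words) - 1
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI scores
        p_pair = count / n_pairs
        p_first = unigrams[first] / n_words
        p_second = unigrams[second] / n_words
        # PMI > 0: the pair co-occurs more often than chance predicts.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sample = "the strong tea and the heavy rain. the rain and the tea. strong tea in heavy rain."
for pair, score in collocations(sample):
    print(pair, round(score, 2))
```

In this toy text the function-word pair ("and", "the") also scores above zero, which is why practical collocation tools usually filter stopwords or require much higher counts.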
Concordance
A concordance is an alphabetical list of the important words or phrases used in a work of literature or other publication, usually shown with the context in which each occurs. Concordances are typically compiled for well-known texts, for example, the Bible, the Vedas, the Quran, and major works of literature.
Many of these texts predate computers, so they may exist mostly in analog form.
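Once a text is digitized, a basic concordance can be generated automatically. The sketch below builds a keyword-in-context (KWIC) listing, assuming a fixed window of `width` words on each side; it indexes every word, whereas a real concordance would typically skip common words such as "the". The function name and sample sentence are illustrative.

```python
import re
from collections import defaultdict


def concordance(text, width=2):
    """Map each word, in alphabetical order, to the contexts it appears in."""
    words = re.findall(r"[a-z']+", text.lower())
    entries = defaultdict(list)
    for i, word in enumerate(words):
        # Keep up to `width` words on each side as the surrounding context.
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        entries[word].append(f"{left} [{word}] {right}".strip())
    return dict(sorted(entries.items()))


verse = "In the beginning was the Word, and the Word was with God."
index = concordance(verse)
print(index["word"])
# ['was the [word] and the', 'and the [word] was with']
```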
N-grams
An n-gram is a contiguous sequence of n items in a given piece of text or speech. N-grams belong to the fields of computational linguistics and probability. The items can be letters, words, or other units; two consecutive words, for example, form a bigram.
N-grams can be used to analyze speech transcripts or scripts that require in-depth analysis.
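A minimal sketch of n-gram extraction in Python. The same function works on a list of words or on a string of characters. To illustrate the probability side, `next_word_probability` (a hypothetical helper) estimates how likely one word is to follow another from bigram counts, using a simple maximum-likelihood estimate without smoothing.

```python
from collections import Counter


def ngrams(items, n):
    """Return every contiguous run of n items (words, characters, ...) as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def next_word_probability(words, previous, following):
    """Estimate P(following | previous) from bigram counts."""
    bigram_counts = Counter(ngrams(words, 2))
    # How often `previous` is followed by any word at all.
    previous_count = sum(c for (first, _), c in bigram_counts.items() if first == previous)
    if previous_count == 0:
        return 0.0
    return bigram_counts[(previous, following)] / previous_count


print(ngrams("to be or not to be".split(), 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
print(ngrams("text", 3))  # [('t', 'e', 'x'), ('e', 'x', 't')]
print(next_word_probability("the cat sat on the mat".split(), "the", "cat"))  # 0.5
```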
Named entity recognition
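Imagine a heap of text that contains many proper nouns, such as the names of people, organizations, places, and so on. These terms are fixed in nature and refer to only one thing in any context. Named entity recognition locates these entities in the text and classifies them into categories such as person or organization.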
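Production NER systems usually rely on trained statistical models (the spaCy library is one example), which can recognize names they have never seen. As a much simpler, self-contained illustration, the sketch below assumes a hand-made gazetteer, a lookup table of known entities, and reports each match with its label and position. The entries in `GAZETTEER` and the sample sentence are hypothetical, and overlapping matches are not resolved.

```python
import re

# Hypothetical gazetteer: known entity names mapped to their categories.
GAZETTEER = {
    "acme corp": "ORGANIZATION",
    "jane doe": "PERSON",
    "new york": "LOCATION",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (entity text, label, start offset) for each gazetteer match."""
    found = []
    for name, label in gazetteer.items():
        # Word boundaries stop a name from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.group(0), label, match.start()))
    return sorted(found, key=lambda item: item[2])


print(find_entities("Jane Doe joined Acme Corp in New York."))
# [('Jane Doe', 'PERSON', 0), ('Acme Corp', 'ORGANIZATION', 16), ('New York', 'LOCATION', 29)]
```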
Document classification
Document classification is the task of assigning a document to one or more classes or categories.
For example, in a business setting, pre-trial and post-trial checklists could be classified as sales enablement material.
Document classification is particularly useful when many documents are involved and they have to be sorted in a specific way.
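One common way to automate document classification is a multinomial naive Bayes classifier trained on labeled examples. The sketch below is a minimal standard-library version with add-one (Laplace) smoothing; it is not the only approach, and the class name, training documents, and labels (which echo the sales enablement example) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)  # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            # Log prior of the category plus the smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(document):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training_docs = [
    "pre-trial checklist for the sales demo",
    "post-trial checklist and sales follow-up",
    "invoice payment due date",
    "refund and invoice dispute",
]
training_labels = ["sales enablement", "sales enablement", "billing", "billing"]

model = NaiveBayesClassifier().fit(training_docs, training_labels)
print(model.predict("trial checklist for a new customer"))  # sales enablement
print(model.predict("invoice dispute refund"))  # billing
```

Working in log space avoids multiplying many small probabilities, which would otherwise underflow on long documents.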
Corpora comparison
Corpora is the plural of corpus, which means a collection of written texts. In corpora comparison, text analytics is used to compare two corpora and find their 5,000 most frequently used words.
A keyword score is also computed for each corpus separately.
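A minimal sketch of corpora comparison in Python. `top_words` lists the most frequent words (defaulting to 5,000 to mirror the text). The text does not define the keyword score, so `keyword_scores` assumes a simple smoothed ratio of per-million frequencies, (focus + k) / (reference + k); log-likelihood is another common keyness measure. All names and the two tiny sample corpora are hypothetical.

```python
import re
from collections import Counter


def word_counts(corpus):
    return Counter(re.findall(r"[a-z']+", corpus.lower()))


def top_words(counts, n=5000):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]


def keyword_scores(focus, reference, smoothing=1.0):
    """Rank words in `focus` by how much more frequent they are than in `reference`."""
    focus_total = sum(focus.values())
    reference_total = sum(reference.values())
    if focus_total == 0 or reference_total == 0:
        raise ValueError("both corpora must contain words")
    scores = {}
    for word, count in focus.items():
        # Per-million frequencies make corpora of different sizes comparable.
        focus_pm = count / focus_total * 1_000_000
        reference_pm = reference[word] / reference_total * 1_000_000
        scores[word] = (focus_pm + smoothing) / (reference_pm + smoothing)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


reviews_a = word_counts("the battery life is great and the battery charges fast")
reviews_b = word_counts("the screen is great and the screen is bright")

print(top_words(reviews_a, n=2))  # ['the', 'battery']
# Score each corpus separately, using the other as the reference.
print(keyword_scores(reviews_a, reviews_b)[0][0])  # battery
print(keyword_scores(reviews_b, reviews_a)[0][0])  # screen
```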
Language use over time
Imagine a tool that analyzes comma-delimited search strings and derives how frequently each one occurs over time. That’s what language use over time does. A typical example is the Google Ngram Viewer, which charts how often each phrase appears in a large corpus of books, year by year.
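The idea can be sketched in Python, assuming a small corpus keyed by year (the `corpus_by_year` data below is hypothetical). The query is split on commas, each phrase is matched as a sequence of words, and the hit count is divided by the number of same-length word sequences in that year's text so that years with more text remain comparable. This is a simplified illustration, not how the Google Ngram Viewer is implemented.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def usage_over_time(corpus_by_year, query):
    """Return {phrase: {year: relative frequency}} for a comma-delimited query."""
    # Split the query on commas and drop empty phrases.
    phrases = [p.strip() for p in query.split(",") if tokenize(p)]
    results = {phrase: {} for phrase in phrases}
    for year, text in sorted(corpus_by_year.items()):
        tokens = tokenize(text)
        for phrase in phrases:
            target = tokenize(phrase)
            n = len(target)
            windows = len(tokens) - n + 1  # number of n-word sequences in the text
            hits = sum(1 for i in range(max(windows, 0)) if tokens[i:i + n] == target)
            # Normalize by the window count so years of different sizes compare fairly.
            results[phrase][year] = hits / windows if windows > 0 else 0.0
    return results


corpus_by_year = {
    1990: "we wrote a letter and mailed the letter",
    2020: "we sent an email and then another email",
}
print(usage_over_time(corpus_by_year, "letter, email"))
# {'letter': {1990: 0.25, 2020: 0.0}, 'email': {1990: 0.0, 2020: 0.25}}
```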
Cluster detection
Most text passages contain a cluster of words that indicate one common meaning or intention. For example, in a Net Promoter Score (NPS) survey response, the cluster of words could represent the customer’s positive or negative impressions of a product or service.
Cluster detection helps create an overall picture of the text being analyzed.
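The text does not say how clusters are found. As one simple, hypothetical approach, the sketch below groups survey responses greedily by the Jaccard overlap of their content words: each response joins the first cluster whose vocabulary overlaps it by at least `threshold`, or starts a new cluster. Production systems more often convert text to vectors and use algorithms such as k-means. The stopword list, threshold, and sample NPS responses are all assumptions.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "to", "of", "i", "my"}


def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_responses(responses, threshold=0.2):
    """Greedily assign each response to the first cluster it overlaps enough with."""
    clusters = []  # each cluster keeps its combined vocabulary and its members
    for text in responses:
        words = content_words(text)
        for cluster in clusters:
            if jaccard(words, cluster["words"]) >= threshold:
                cluster["members"].append(text)
                cluster["words"] |= words  # grow the cluster's vocabulary
                break
        else:
            clusters.append({"words": set(words), "members": [text]})
    return [cluster["members"] for cluster in clusters]


responses = [
    "Support was slow and unhelpful",
    "Great price and great features",
    "Slow support, very unhelpful staff",
    "The features are great for the price",
]
for group in cluster_responses(responses):
    print(group)
```

Because the assignment is greedy, the result can depend on the order of the responses; that is a known trade-off of this single-pass design.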
Data visualization
Although not a typical form of text analytics, data visualization aids text analytics by depicting the data pictorially.
For example, a long series of numbers can be shown as a bar graph or pie chart so that it is easier to understand.
Data visualization makes information easier to consume, which in turn facilitates quicker decision-making.
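In practice a plotting library such as matplotlib (for example, `matplotlib.pyplot.bar`) would draw the chart. To stay dependency-free, the sketch below renders counts, such as the number of positive, neutral, and negative responses, as a plain-text horizontal bar chart. The function name and the sample counts are hypothetical.

```python
def text_bar_chart(counts, width=20, symbol="#"):
    """Render label -> count pairs as horizontal bars scaled to `width` characters."""
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    # Draw the largest values first.
    for label, value in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        bar_length = round(value / largest * width) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * bar_length} {value}")
    return "\n".join(lines)


print(text_bar_chart({"positive": 12, "neutral": 3, "negative": 6}, width=12))
# positive | ############ 12
# negative | ###### 6
# neutral  | ### 3
```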
Choosing the right type of text analytics
These are the 10 most popular types of text analytics that you can use to derive insights from textual data. Each type has a specific purpose, and choosing the right one will help you unearth information quickly and with minimal effort.
For example, if you are using a Net Promoter Score to measure your customer loyalty, word frequency, named entity recognition, or cluster detection can be used to analyze the tone of customer feedback.