Ganesan Senthilvel: Text Analytics

Text Analytics is a Microsoft AI library that supports the following tasks:

Tokenization: Breaking text into individual words or phrases (tokens).
Stop Word Removal: Removing common words (e.g., "the," "a," "is") that don't carry much meaning.
Stemming/Lemmatization: Reducing words to their root form (e.g., "running" to "run"). Stemming is a simpler, rule-based approach, while lemmatization uses dictionaries and is more accurate.
Part-of-Speech Tagging: Identifying the grammatical role of each word (e.g., noun, verb, adjective).
Named Entity Recognition (NER): Identifying named entities like people, organizations, and locations.
Sentiment Analysis: Determining the emotional tone of text (positive, negative, neutral).
Topic Modeling: Discovering underlying topics in a collection of documents.
Text Classification: Assigning categories or labels to text.
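
To make the first few tasks concrete, here is a minimal sketch in plain Python that covers tokenization, stop word removal, rule-based stemming, and a very simple word-list sentiment score. It does not call the Microsoft library. The function names, the stop word list, the suffix rules, and the sentiment word lists are all assumptions made up for illustration, and a real system would use a proper NLP toolkit instead.

```python
import re

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "and", "to", "in"}

# Toy sentiment word lists (hypothetical, for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude rule-based stemmer: strip one common suffix if enough remains."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def lexicon_sentiment(tokens):
    """Count positive minus negative words and map the score to a label."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tokens = tokenize("The dog is running in the park")
    content = remove_stop_words(tokens)
    print(content)
    print([naive_stem(t) for t in content])
    print(lexicon_sentiment(tokenize("The food was great")))
```

The stemmer here only strips suffixes by rule, so it will produce non-words for many inputs. That is the trade-off described above: a lemmatizer backed by a dictionary is slower to build but returns real root forms.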

Monday, February 17, 2025