Technologies Used

Python
Viterbi Algorithm
Bayesian Inference
Hidden Markov Models

Introduction

The focus of this project is to predict the categories of words in a sentence: the Part of Speech (POS). To do so, we use a hidden markov model, which contains the probabilities that each word belongs to each of the categories. I then use the Viterbi algorithm to classify the words, as it allows me to compute the sequence of POS tags that has the highest joint probability efficiently - that is the most likely.

Personal Contributions

Implementation of a Data Preprocessing script to extract the words and their POS tags
Training of the HMM model using dataset
Implementation of the Viterbi algorithm to classify the words
Implemented all functions with Numpy arrays to improve performance
Implementation of a script to evaluate the accuracy of the model