Technologies Used

  • Python
  • Viterbi Algorithm
  • Bayesian Inference
  • Hidden Markov Models

Introduction

The goal of this project is to predict the Part of Speech (POS) of each word in a sentence. To do so, I use a Hidden Markov Model (HMM) in which the POS tags are the hidden states and the words are the observations. The model stores the probability of each word given each tag, along with the probability of moving from one tag to the next. I then use the Viterbi algorithm to tag the words, because it efficiently computes the sequence of POS tags with the highest joint probability, that is, the most likely tag sequence for the sentence.
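To make the decoding step concrete, here is a minimal sketch of Viterbi decoding with NumPy. It is not the project's actual code: the function name `viterbi`, the parameter names, the integer encoding of words and tags, and the toy probabilities are all assumptions chosen for illustration.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for a sentence (hypothetical helper).

    obs:     word indices for the sentence, length T
    start_p: shape (S,),   start_p[i]    = P(first tag is i)
    trans_p: shape (S, S), trans_p[i, j] = P(tag j | previous tag i)
    emit_p:  shape (S, V), emit_p[i, k]  = P(word k | tag i)
    """
    T = len(obs)
    if T == 0:
        return []
    # Work in log space so long sentences do not underflow to zero.
    with np.errstate(divide="ignore"):
        log_start = np.log(np.asarray(start_p, dtype=float))
        log_trans = np.log(np.asarray(trans_p, dtype=float))
        log_emit = np.log(np.asarray(emit_p, dtype=float))

    S = log_start.shape[0]
    score = np.empty((T, S))           # best log-probability ending in each tag
    back = np.zeros((T, S), dtype=int)  # best previous tag for each (t, tag)

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        # cand[i, j]: best path ending in tag i at t-1, then moving to tag j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Follow the back-pointers from the best final tag.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy model (made-up numbers): tags 0=NOUN, 1=VERB; words 0="dogs", 1="bark"
toy_start = np.array([0.8, 0.2])
toy_trans = np.array([[0.3, 0.7],
                      [0.6, 0.4]])
toy_emit = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
toy_tags = viterbi([0, 1], toy_start, toy_trans, toy_emit)
```

Each step keeps only the best-scoring path into every tag, so the cost grows as T × S² rather than with the Sᵀ possible tag sequences.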

Personal Contributions

  • Implementation of a data preprocessing script to extract the words and their POS tags
  • Training of the HMM on the dataset
  • Implementation of the Viterbi algorithm to classify the words
  • Implementation of all functions with NumPy arrays to improve performance
  • Implementation of a script to evaluate the accuracy of the model
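The training and evaluation steps listed above can be sketched as follows. This is an illustrative outline rather than the project's code: the names `train_hmm` and `accuracy`, the representation of a sentence as a list of `(word_id, tag_id)` pairs, and the add-alpha smoothing are assumptions.

```python
import numpy as np

def train_hmm(tagged_sentences, n_tags, n_words, alpha=1.0):
    """Estimate HMM parameters by counting (hypothetical helper).

    tagged_sentences: list of sentences, each a list of (word_id, tag_id) pairs
    alpha: add-alpha smoothing constant (an assumption, not from the project)
    """
    start = np.full(n_tags, alpha, dtype=float)
    trans = np.full((n_tags, n_tags), alpha, dtype=float)
    emit = np.full((n_tags, n_words), alpha, dtype=float)

    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            emit[tag, word] += 1
            if prev is None:
                start[tag] += 1        # first tag of the sentence
            else:
                trans[prev, tag] += 1  # tag-to-tag transition
            prev = tag

    # Normalise counts into probability distributions.
    start /= start.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    emit /= emit.sum(axis=1, keepdims=True)
    return start, trans, emit

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pred = np.concatenate([np.asarray(p) for p in predicted])
    ref = np.concatenate([np.asarray(g) for g in gold])
    return float((pred == ref).mean())

# Toy corpus (made up): two identical sentences, "dogs bark" tagged NOUN VERB
toy_corpus = [[(0, 0), (1, 1)], [(0, 0), (1, 1)]]
toy_start, toy_trans, toy_emit = train_hmm(toy_corpus, n_tags=2, n_words=2)
toy_acc = accuracy([[0, 1], [0, 0]], [[0, 1], [0, 1]])
```

Smoothing keeps unseen word/tag pairs from receiving zero probability, which would otherwise rule out entire tag sequences during decoding.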