Hidden Markov Models and Natural Language Processing

Problem Definition

In this assignment, we are using Hidden Markov Models (HMM) for Language Processing. The goal of this assignment is to have a computer recognize and parse the sentences with our select vocabulary. We used Forward and Backward Procedure to recognize the pattern. We also implement Viterbi algorithm to determine the optimal state path for each observation. We then use Baum-Welch algorithm to optimize the the HMM. This program is useful since the HMM can be used as a learning tool and optimization of real world situations like in a strategy such as chess moves. The program will learn from its database and calculate the optimization move.

Method and Implementation

We create a file input as the which will read line by line from the its calibrating set. The first input will tell the number of states, number of observation symbols, and the length of the observation sequence. The second line would read the basic English syntactic structures which is correlated to each of the states. The third line will contain its vocabulary. The lines after will state the matrix a, matrix b, and vector pi which will utilize for Viterbi algorthim and the Baum-Welch algorithm. We will then run with some examples to observe the probability in reaching that sentence using Viterbi algorithm. We split each line and store their values. In the rocognize.java file, we have forward method which will calculate the alpha values. There's also a backward function which will calculate the beta value. There are stage function which will read the word and go to the various stages. In statepath.java, we use Viterbi algorithm to to determine the optimal state path for each observation set and report its probability. In optimize.java, the Baum method will calculate the new optimization which will print the before and after.


Question 1:  For the current application, why does this probability seem lower than we expect? What does this probability tell you? Does the current HMM always give a reasonable answer? For instance, what is the output probability for the below sentence?
"robots do kids play chess"

"chess eat play kids"

You may want to use different observation data to justify your answer.


"robots do kids play chess"

Probability: 0.0015119999999999997

"chess eat play kids"

Probability: 0.0

With the 8 given words,we can generate a lot of possible sentences with different length, different sequence, and the probability of each sentence only occupies a small rate of the total probability 1. As long as the probability is bigger than 0, it shows that this sentence is valid according to this HMM. It does not always give a reasonable answer, for example:


"robots do kids play chess"

Probability: 0.0015119999999999997

The HMM decides that "robots do" is valid and "do kids play chess" is valid, it doesn't know it's actually a combination of two valid sentences.To prove it,I combined other valid sentences and experiment as follows:

"kids can kids play chess"

Probability: 0.0018899999999999998

It's not a valid sentence but the probability is bigger than 0.

Question 2:  What can we tell from the reported optimal path for syntax analysis purpose? Can the HMM always correctly distinguish "statement" from "question" sentence? Why?


Answer: We can tell the syntax estimation from the output result, which gives a meaningful explanation of the "question sentence".

Below is the connection of syntax-words relationship from the b matrix. We can tell that the HMM may not always correctly distinguish "statement" from "question" because of the red probability below, which might cause confusion between OBJECT and SUBJECT.

SUBJECT---  0.5 0.4 0.0 0.0 0.0 0.0 0.05 0.05         

                   kids robots

AUXILIARY---0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0                   

                               do can

PREDICATE---0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0                           

                                           play eat

OBJECT  --- 0.1 0.2 0.0 0.0 0.0 0.0 0.3 0.4                                  

                                                 chess food

Question 3: Why should you not try to optimize an HMM with zero observation probability?

When it's zero, we can not calculate sai(i,j)t because observation probability is the divendence. This is one of the limitations of this method.

in the Baum method, for newA and newB calculation part, I added the if condition which stated: when the dividend(observation probability) is zero(I wrote < 0.00001 to represent it in code), newA and newB should just take the original a and b, or the result will be infinity. 

Model Enhancement

Now supposed you want this HMM to model new syntax structures, like "PRESENT TENSE" or "ADVERB," so that the following sentences can be parsed: 
"robots can play chess well"
"kids do eat food fast"

Question:  What kinds of changes will you need to make in the above HMM? Please describe your solution with an example of the modified matrices a, b and pi in the submitted web page.


5 10 5


kids robots do can play eat chess food well fast


0.0 0.4 0.6 0.0 0.0

0.7 0.0 0.3 0.0 0.0

0.0 0.0 0.0 0.5 0.5

0.0 0.0 0.0 0.5 0.5 

0.0 0.0 0.0 0.0 0.0


0.5 0.4 0.0 0.0 0.0 0.0 0.05 0.05 0.0 0.0

0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0

0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0

0.1 0.2 0.0 0.0 0.0 0.0 0.3 0.4 0.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5


0.6 0.3 0.1 0.0 0.0



This project was interesting. I was able to see how the sentence structure and the use of Viterbi and Baum-Welch algorithm in the Hidden Markov Model can be used to teach a computer correct grammar. If this was done in the larger scale, the computer will be able to successfully learn English grammar. In that sense, this can help with automation of factory machines and even have better human-like interactions among automated calls.



Haoqing Geng(Panda)

Xuli Song

Stanley Cheng