My Understanding of Topic Modeling via Latent Dirichlet Allocation (LDA)

Edward Mendoza
Feb 24, 2021

So far in my Data Science Bootcamp, I’ve been thrown a lot of subject matter, much of which I hope to retain, but the one subject that I will know by heart is topic modeling. To sum it up briefly, topic modeling is a powerful unsupervised learning tool. The machine takes in unstructured text (usually represented as a bag of words) and sorts it into topics, which the data scientist must then interpret. The technique that I will discuss further in this blog post is Latent Dirichlet Allocation. If we break the term down word by word, “latent” means “hidden”, whereas “Dirichlet” is a type of probability distribution. So if we bring it all together, Latent Dirichlet Allocation treats each document as a mixture of hidden topics. Each topic is itself a probability distribution over words, and both the documents’ topic mixtures and the topics’ word distributions are drawn from Dirichlet distributions.
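To make the “hidden topics” idea a bit more concrete, here is a minimal sketch of the story LDA assumes about how documents get written. It uses Python with NumPy, which is my choice for illustration. First, each topic’s word distribution is drawn from a Dirichlet. Then each document’s topic mix is drawn from a Dirichlet. Finally, for every word slot, a hidden topic is picked and then a word from that topic. This only *generates* fake documents; fitting LDA runs the story in reverse. The function name `generate_corpus`, the toy vocabulary, and the `alpha`/`beta` values are hypothetical.

```python
import numpy as np

def generate_corpus(n_docs, doc_length, n_topics, vocab, alpha=0.5, beta=0.1, seed=0):
    """Sample toy documents from LDA's generative story (hypothetical helper, not a fitting routine)."""
    rng = np.random.default_rng(seed)
    vocab_size = len(vocab)
    # Each topic is a probability distribution over the whole vocabulary, drawn from a Dirichlet.
    topic_word = rng.dirichlet([beta] * vocab_size, size=n_topics)
    docs, doc_topic = [], []
    for _ in range(n_docs):
        # Each document gets its own mix of topics, also drawn from a Dirichlet.
        theta = rng.dirichlet([alpha] * n_topics)
        words = []
        for _ in range(doc_length):
            z = rng.choice(n_topics, p=theta)            # pick a hidden (latent) topic
            w = rng.choice(vocab_size, p=topic_word[z])  # pick a word from that topic
            words.append(vocab[w])
        docs.append(words)
        doc_topic.append(theta)
    return docs, np.array(doc_topic), topic_word

# Hypothetical toy vocabulary mixing "travel" and "flower" words.
vocab = ["paris", "passport", "visa", "roses", "tulips", "garden"]
docs, doc_topic, topic_word = generate_corpus(n_docs=3, doc_length=6, n_topics=2, vocab=vocab)
for words, theta in zip(docs, doc_topic):
    print(np.round(theta, 2), words)
```

Smaller `alpha` values push each document toward a single topic, and smaller `beta` values push each topic toward a handful of words.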

Let’s say you have a handful of documents: some of them are about visiting foreign countries, and the rest are about flowers.
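LDA’s input is a document-term matrix: one row per document and one column per word, holding word counts. Here is a small sketch of building one in plain Python. The example documents from the original post aren’t reproduced here, so the sentences below are made up in the same spirit. The helper names `tokenize` and `document_term_matrix` are also hypothetical.

```python
from collections import Counter

# Hypothetical toy documents: two about travel, two about flowers.
documents = [
    "I loved visiting Paris and Rome on my trip abroad",
    "Pack your passport before visiting a foreign country",
    "Roses and tulips are my favorite flowers",
    "Water the tulips and the roses every morning",
]

def tokenize(text):
    """Lowercase and split on whitespace (real pipelines also strip punctuation and stop words)."""
    return text.lower().split()

def document_term_matrix(docs):
    """Return (sorted vocabulary, matrix of word counts with one row per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    counts = [Counter(doc) for doc in tokenized]
    # Counter returns 0 for words that don't appear in a document.
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

vocab, matrix = document_term_matrix(documents)
for row in matrix:
    print(row)
```

In practice, filler words like “the” and “and” are usually removed first so they don’t dominate every topic.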

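If you applied LDA to these documents, you would get, for each document, a breakdown of how much of it belongs to each topic (for example, mostly the travel topic or mostly the flower topic).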

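If we were to take a look into the topics themselves, we would see each one broken down into the words most likely to appear in it, with travel-related words concentrated in one topic and flower-related words in the other.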
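Once a model is fitted, those two breakdowns are just two matrices. One has a row of topic proportions per document, and the other has a row of word probabilities per topic. The sketch below uses NumPy to pull out the top words for each topic and the dominant topic for each document. The numbers are invented purely for illustration and are not real LDA output. `top_words` and `dominant_topics` are hypothetical helper names.

```python
import numpy as np

# Invented values for illustration only; a real model would estimate these.
vocab = ["flowers", "passport", "roses", "travel", "tulips", "visa"]
topic_word = np.array([
    [0.02, 0.28, 0.01, 0.35, 0.03, 0.31],  # topic 0: word probabilities (row sums to 1)
    [0.36, 0.02, 0.29, 0.01, 0.31, 0.01],  # topic 1
])
doc_topic = np.array([
    [0.90, 0.10],  # document 0 is mostly topic 0
    [0.15, 0.85],  # document 1 is mostly topic 1
])

def top_words(topic_word, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in topic_word]

def dominant_topics(doc_topic):
    """Return the index of the largest topic proportion for each document."""
    return doc_topic.argmax(axis=1).tolist()

print(top_words(topic_word, vocab))  # [['travel', 'visa', 'passport'], ['flowers', 'tulips', 'roses']]
print(dominant_topics(doc_topic))    # [0, 1]
```

Calling topic 0 “travel” and topic 1 “flowers” is still up to you. LDA only hands back numbered topics.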

So overall, here is how LDA works:

  1. Input the Document-Term Matrix, number of topics, and number of iterations
  2. Randomly assign each word in each document to one of the topics (2 in our example)
  3. Go through every word and its topic assignment in each document. Look at how often the topic occurs in the document and how often the word occurs in the topic overall. Based on this info, reassign the word to a new topic, favoring topics that are already common in that document and that already use that word often.
  4. Repeat step 3 for many iterations. Eventually the topic assignments settle down and the topics start to make sense; this is where you will have to interpret them.
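The steps above describe collapsed Gibbs sampling, which is one common way of fitting LDA. Here is a minimal from-scratch sketch in Python with NumPy. It assumes each document has already been turned into a list of integer word ids, for example the column indices of the document-term matrix. The function name `lda_gibbs` and the default `alpha`, `beta`, and `n_iters` values are my own illustrative choices. The small `alpha`/`beta` terms are the smoothing that the plain-English steps leave out, so that a topic with a zero count still has some chance of being picked.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) probability matrices.
    """
    rng = np.random.default_rng(seed)
    doc_topic_counts = np.zeros((len(docs), n_topics))    # how often each topic occurs in each document
    topic_word_counts = np.zeros((n_topics, vocab_size))  # how often each word is assigned to each topic
    topic_totals = np.zeros(n_topics)                     # total words assigned to each topic

    # Step 2: randomly assign every word to a topic and record the counts.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = rng.integers(n_topics, size=len(doc))
        for w, z in zip(doc, z_doc):
            doc_topic_counts[d, z] += 1
            topic_word_counts[z, w] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    # Steps 3-4: repeatedly revisit every word and resample its topic.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Take the word's current assignment out of the counts.
                doc_topic_counts[d, z] -= 1
                topic_word_counts[z, w] -= 1
                topic_totals[z] -= 1
                # (topic share in this document) x (this word's share in each topic), smoothed.
                p = (doc_topic_counts[d] + alpha) * (topic_word_counts[:, w] + beta) / (
                    topic_totals + vocab_size * beta
                )
                z = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic_counts[d, z] += 1
                topic_word_counts[z, w] += 1
                topic_totals[z] += 1

    # Turn the final counts into smoothed probability estimates.
    doc_topic = (doc_topic_counts + alpha) / (doc_topic_counts + alpha).sum(axis=1, keepdims=True)
    topic_word = (topic_word_counts + beta) / (topic_word_counts + beta).sum(axis=1, keepdims=True)
    return doc_topic, topic_word
```

Consider a tiny corpus where the first few documents only use “travel” word ids and the rest only use “flower” word ids. There, the travel documents should end up sharing one dominant topic and the flower documents the other. Which topic number each group gets is arbitrary.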

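Python libraries such as gensim will do the manual labor of steps 2 through 4 for you. gensim’s LdaModel actually uses a variational inference algorithm rather than reassigning words one at a time, but the goal is the same. That leaves you to worry about preparing the document-term matrix, choosing the number of topics, and choosing enough iterations to make the topics interpretable.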

For my Capstone project, I will uncover the topics found in Reddit posts from subreddits dealing with men’s issues. I would like to use these topics to help mental health professionals understand what men are facing in today’s society. My next post will display the results of my Capstone project and share any key insights that I have found on this sobering topic. Stay tuned!
