
Posts

Showing posts from August, 2020

Build an autocorrect model to correct misspelled words

Autocorrect is an application that changes misspelled words into correct ones. You have it on your phone and on your computer, inside document editors and email applications. For example, it takes a sentence like “happy birthday deah friend.” and corrects the misspelled word deah to dear. We can use a simple yet powerful model to solve this kind of problem. Note that the model looks only for spelling errors, not contextual errors, so a word like deer will pass through this filter because it is spelled correctly, regardless of the context. The model works in 4 key steps:

1. Identify a misspelled word
2. Find strings n edit distance away
3. Filter candidates
4. Calculate word probabilities

Now, let’s look at the details of how each step is implemented.

Step 1: Identify a misspelled word

How does the machine know if a word is misspelled or not? Simply put, if a word is not given in a dictionary, th...
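The four steps listed in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration, not the post's own implementation: the function names `edits_one` and `correct`, the toy corpus, and the choice of n = 1 are all assumptions, and word probability is approximated by raw word counts.

```python
from collections import Counter


def edits_one(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return every string exactly one edit away from `word` (hypothetical helper)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + swaps + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word one edit away, or `word` itself."""
    # Step 1: a word found in the vocabulary is treated as correctly spelled.
    if word in counts:
        return word
    # Steps 2 and 3: generate strings one edit away, keep only known words.
    candidates = [w for w in edits_one(word) if w in counts]
    if not candidates:
        return word
    # Step 4: the probability of a word is proportional to its corpus count.
    return max(candidates, key=lambda w: counts[w])


# Toy corpus, assumed purely for illustration.
corpus = "happy birthday dear friend my dear friend saw a deer".split()
counts = Counter(corpus)

print(correct("deah", counts))
print(correct("deer", counts))
```

Because deer is in the vocabulary, it is returned unchanged, which matches the point above that the model ignores contextual errors.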

Comparison between Logistic Regression and Support Vector Machine

Logistic regression and support vector machine (SVM) are both popular models for classification tasks. This article introduces the two methods and summarizes the differences between them.

What is Logistic Regression?

Logistic regression is a generalized linear model for binary classification. In logistic regression, we take the output of the linear function and pass it to the sigmoid function. The sigmoid function is S-shaped, bounded, and differentiable. We use it in logistic regression because it maps any real-valued number to a value between 0 and 1. Since the probability of any event also lies between 0 and 1, the sigmoid function is an intuitive choice for logistic regression. After we get the probabilities, we set a threshold to make decisions: if the probability is greater than the threshold, we assign it a label 1, else we a...
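The sigmoid-then-threshold decision described in this excerpt can be sketched as follows. The function names, the example weights, and the default threshold of 0.5 are assumptions made for illustration; the excerpt does not state which threshold the post uses.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict(weights, bias, features, threshold=0.5):
    """Return (probability, label) for one example; threshold=0.5 is an assumed default."""
    z = sum(w * x for w, x in zip(weights, features)) + bias  # linear function
    p = sigmoid(z)
    return p, int(p > threshold)  # label 1 only if p is greater than the threshold


# Hypothetical weights and features, chosen only to exercise the code.
probability, label = predict([1.0, -2.0], 0.5, [2.0, 0.5])
print(probability, label)
```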

How to use Naive Bayes to perform sentiment analysis

Naive Bayes has a wide range of applications, including sentiment analysis, author identification, information retrieval and word disambiguation. It's a very popular method because it's relatively simple to train and easy to use and interpret. In this article, I will summarize how to use Naive Bayes to perform sentiment analysis on text data.

General Idea

Bayes Rule for determining if a word is positive or negative:

How about determining whether a piece of writing is positive or negative? When you use Naive Bayes to predict the sentiment of a piece of writing, what you're actually doing is estimating the probability of each sentiment from the joint probability of the words given that sentiment. The Naive Bayes formula is just the ratio between these two probabilities, each the product of the prior and the likelihoods.

The Naive Bayes inference formula for this kind of binary classification:

However, the formula above is a product of ratios, which brings a risk of underflow. To solve this problem, you can t...
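One standard way to avoid the underflow mentioned in this excerpt is to work in log space, so the product of ratios becomes a sum of log ratios. The sketch below assumes that approach, together with Laplace (add-one) smoothing and a tiny made-up training set; the names `train` and `score` are hypothetical, and none of this is confirmed to be the post's exact method.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (tokens, label) pairs, label 1 = positive, 0 = negative.

    Assumes at least one document of each class.
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in docs:
        if label == 1:
            pos_counts.update(tokens)
            n_pos += 1
        else:
            neg_counts.update(tokens)
            n_neg += 1
    vocab = set(pos_counts) | set(neg_counts)
    total_pos, total_neg, v = sum(pos_counts.values()), sum(neg_counts.values()), len(vocab)
    log_prior = math.log(n_pos / n_neg)  # log of the ratio of priors
    # Log ratio of smoothed likelihoods (add-one smoothing is an assumption).
    log_ratio = {
        w: math.log((pos_counts[w] + 1) / (total_pos + v))
        - math.log((neg_counts[w] + 1) / (total_neg + v))
        for w in vocab
    }
    return log_prior, log_ratio


def score(tokens, log_prior, log_ratio):
    """Sum of logs instead of product of ratios; a score above 0 means positive."""
    return log_prior + sum(log_ratio.get(w, 0.0) for w in tokens)


# Made-up training data, for illustration only.
docs = [(["happy", "great"], 1), (["great", "fun"], 1),
        (["sad", "bad"], 0), (["bad", "boring"], 0)]
log_prior, log_ratio = train(docs)
print(score(["great", "fun"], log_prior, log_ratio) > 0)
```

Words that never appeared in training contribute 0 to the score, so they do not push the decision either way.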

How to use logistic regression to perform sentiment analysis

Introduction

Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. It can be used to better understand users and improve products. For example, you can gauge customers' experience with products on Amazon from their online feedback, or identify the sentiment of Twitter users from the tweets they post. Using logistic regression to perform sentiment analysis is straightforward and takes only 3 steps:

1. Text data preprocessing
2. Feature extraction
3. Building a logistic regression model

Now, let’s take a look at the details of each step.

Step 1: Text Data Preprocessing

There are several steps in text data preprocessing:

- Tokenizing the string: split the string into individual words without blanks or tabs
- Lowercasing: ‘HAPPY’, ‘Happy’ -> ‘happy’
- Removing stop words and punctuation
  - Stop words: 'is', 'and', 'a', ‘are’, …
  - Punctuation: ',', '.', ':', '!', ...
- Remove han...
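The preprocessing steps that are fully visible in this excerpt (tokenizing, lowercasing, removing stop words and punctuation) can be sketched with the standard library alone. The stop-word set below is a small illustrative sample rather than a complete list, and the function name `preprocess` is an assumption; the step that is cut off in the excerpt is not covered.

```python
import string

# Small illustrative stop-word set; a real list would be much longer.
STOP_WORDS = {"is", "and", "a", "are", "the"}


def preprocess(text):
    """Tokenize, lowercase, and drop stop words and punctuation."""
    cleaned = []
    for token in text.split():  # split on blanks and tabs
        token = token.lower()  # 'HAPPY', 'Happy' -> 'happy'
        token = token.strip(string.punctuation)  # trim leading/trailing punctuation
        if token and token not in STOP_WORDS:
            cleaned.append(token)
    return cleaned


print(preprocess("HAPPY days are here!"))
```

Stripping punctuation only from the ends of each token keeps internal characters such as apostrophes intact, which is a simplification chosen for this sketch.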