In our last machine learning deep dive, we discussed the different problems that can occur with text classification in machine learning, and how to tackle them. Today, we would like to apply these ideas to a concrete use case: we will investigate how to handle text classification problems in email triage.
Let’s start with a comparison between an email and a question in a chatbot. Typically, a question in a chatbot is very short:
"What's the weather tomorrow?"
In contrast, the main intention of an email might be quite hidden, for example when you forward an email and just write "fyi" in the body. In this case, the recipient scrolls down in the email and first finds the header of the previous email, which in the best case is followed by a body containing the main information somewhere. The body is usually followed by a long footer, and if it is a thread of emails, there will even be several bodies. In short, in order to get to the main message of an email, you may have to analyse it thoroughly. In machine learning, this preparation step is referred to as pre-processing.
The pre-processing of emails should at least address the following points:
- Email-specific patterns
- Named entity recognition
For a good ratio of model performance improvement to invested pre-processing time, you can focus on just these two points: email-specific patterns and named entity recognition.
"An extremely helpful tool is the regular expression, regex for short."
Let’s take the Swiss social security number as an example of a named entity. This number is always 13 digits long and starts with 756. If you train a decent named entity recognition model with enough data, it could of course learn that “756.1234.5678.90” is a Swiss social security number. As an easier alternative to building a machine learning model for named entity recognition, you can use a pattern called a regular expression (regex). Using regex, an email can be substantially reduced, as the example shown in Figures 4 and 5 demonstrates.
Figure 4: Original email
Figure 5: Pre-processed email
The detection of postal addresses is a bit trickier than just using regex. However, all the other named entities and patterns in Figure 4 can easily be handled by regular expressions.
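As a minimal sketch of this idea, the following snippet masks Swiss social security numbers with a regex, assuming the standard 756.XXXX.XXXX.XX format described above (the function name and placeholder token are illustrative choices, not from the original article):

```python
import re

# Swiss social security numbers are 13 digits and always start with 756,
# conventionally written as 756.XXXX.XXXX.XX.
SSN_PATTERN = re.compile(r"\b756\.\d{4}\.\d{4}\.\d{2}\b")

def mask_ssn(text: str) -> str:
    """Replace every Swiss social security number with a placeholder token."""
    return SSN_PATTERN.sub("<SSN>", text)

print(mask_ssn("My AHV number is 756.1234.5678.90, please update my file."))
# → My AHV number is <SSN>, please update my file.
```

The same approach works for other email-specific patterns such as phone numbers, URLs, or signature blocks: each gets its own pattern and placeholder token.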
Build and train a model
Once the text is pre-processed, you have to choose a model to train the email classifier. Tree-based methods such as random forests and gradient boosting decision trees usually yield very good results (in Python, see for example RandomForestClassifier from scikit-learn, XGBoost, or LightGBM). In order to use them, we have to transform the input text into numerical vectors. There exist many methods, like doc2vec, to perform this transformation, but one of the simplest and best-performing ones is the so-called term frequency–inverse document frequency method, TF-IDF (although when using tree-based methods, the usefulness of the “IDF” part is questionable). See Figure 6 for a short explanation of TF-IDF, which we used on a corpus of chatbot questions.
Figure 6: How to transform text into a vector with TF-IDF.
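The TF-IDF vectorisation and tree-based classifier described above can be sketched in a few lines of scikit-learn; the example emails and labels below are made-up placeholders, not data from the article:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (placeholder data).
emails = [
    "Please update my postal address to Example Street 1.",
    "I cannot log in to my account, the password is rejected.",
    "There is an error on my last invoice, the amount is too high.",
    "My new address is Sample Road 2, please change it.",
]
labels = ["change of address", "login problem", "question to invoice", "change of address"]

# TF-IDF turns each email into a numerical vector; the random forest is
# trained on those vectors together with the labels.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(emails, labels)

print(model.predict(["Could you correct the amount on my invoice?"]))
```

In practice, the vectoriser would be fitted on the full pre-processed corpus, and its parameters (n-grams, minimum document frequency, etc.) tuned alongside the model.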
After this step, each email in the training set is represented as a numerical vector. Together with the emails’ labels, representing the categories we want to classify each email into, we can train the model. In the case of email triage for a customer service desk, labels could be “change of address”, “login problem”, “product question”, “question to invoice”, “fidelity card”, and so on. The email in Figure 4 above would have two labels: “change of address” and “question to invoice”.
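Since an email like the one in Figure 4 can carry two labels at once, the targets are multi-label rather than single-label. One common way to encode such targets (a hypothetical sketch, not the article’s own code) is scikit-learn’s MultiLabelBinarizer, which turns each set of labels into a binary indicator vector:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each email maps to a *set* of labels (placeholder data).
y = [
    ["change of address", "question to invoice"],
    ["login problem"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)

print(mlb.classes_)
print(Y)  # → [[1 0 1], [0 1 0]]: one column per label, 1 where the label applies
```

The resulting binary matrix can then be fed to a multi-label-capable classifier, for example by wrapping the base model in a one-vs-rest scheme.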
After building a first model for text classification, you can boost the final result by trying other text pre-processing techniques.
Evaluation of the model and prediction
Once you’ve built the model, you have to evaluate it. You will then probably see room for improvement, go back to step 1 about pre-processing, and start all over again.
"What is the best way to evaluate a model?"
The answer to this question depends a lot on the context and the requirements of the use case. The simplest evaluation metric is the accuracy of the model; it is the most straightforward one to explain, for example to business stakeholders. However, when the labels are not equally distributed, it is a rather bad choice. Consider an extreme example such as fraud detection, where perhaps 1 sample out of 1000 is fraudulent. If we predict that nothing is a fraud, then we get 99.9% accuracy, which in this case means nothing. Other widely used evaluation metrics include:
- Log loss or cross-entropy
- F1 score
- Area under the ROC curve (AUC)
- Precision and recall (also called sensitivity), and specificity
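The fraud example above is easy to verify numerically: a model that always predicts “no fraud” scores 99.9% accuracy, while its F1 score for the fraud class is zero. A small sketch with scikit-learn’s metrics:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 999 + [1]  # 1 fraud case (label 1) out of 1000 samples
y_pred = [0] * 1000       # a "model" that always predicts "no fraud"

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(acc)  # → 0.999
print(f1)   # → 0.0
```

This is why, on imbalanced data, metrics like F1, AUC, or precision and recall tell you far more than accuracy alone.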
For email triage in particular, it may be a good idea to look at the confusion matrix of the model. Figure 7 shows an example of the confusion matrix for our company-internal chatbot.
Figure 7: Confusion matrix of our company-internal chatbot.
Looking at this confusion matrix, you can see that there are, for example, some confusions between the intents “how are you?”, “goodbye”, and “greetings”. However, when asking what the most important use cases for the employees of the company are, it is probably not these entertainment-oriented intents but rather intents like “who is Alex”, “what is my staffing”, and “who is present/available”. So even if the overall performance of the model on one of the metrics above might not look good, the model may be very well optimized for the actual business case. Of course, you can also introduce weights in the metrics above to address the business needs. However, a visualisation like the confusion matrix is a very intuitive tool that everyone understands.
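Such a confusion matrix is straightforward to compute with scikit-learn. The intents and predictions below are made up for illustration; the real data behind Figure 7 is not shown here:

```python
from sklearn.metrics import confusion_matrix

intents = ["greetings", "goodbye", "how are you?"]

# Placeholder true intents and model predictions.
y_true = ["greetings", "goodbye", "how are you?", "greetings", "goodbye"]
y_pred = ["greetings", "greetings", "how are you?", "greetings", "goodbye"]

# Rows are true intents, columns are predicted intents, in the order of `intents`.
cm = confusion_matrix(y_true, y_pred, labels=intents)
print(cm)
```

Off-diagonal entries show exactly which intents the model mixes up, which is what makes the matrix so useful for prioritising improvements by business impact.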
In this insight, we went through the main steps of building a text classification machine learning model for the email triage use case: text pre-processing, model building, model training, and finally model evaluation. For more use cases on artificial intelligence, you can visit our dedicated page. Stay tuned for the next insight on Keras!
You can read more about this machine learning use case here: Keras for text classification