SIMPLE TEXT MINING TECHNIQUES IN CUSTOMERS’ MEMOS

Nebile Kodaz
Jan 31, 2020

Hello Medium people!

Almost every day, we read texts online, send messages on our phones, or email our colleagues. Text is an unavoidable part of our lives and a means of transferring data and messages. This mass of text data whets data scientists’ appetites! In this post, I will tell you how I used text mining techniques in R on the customer memos of the company where I work.

At Konusarak Ogren, customers can talk to the customer relations department by phone, or they can send memos. The memos and the responses to them are recorded in the database. A memo can be longer than 200 characters. As text data, the memos are stored in the database without any categorization or structure.

The problem was that no one in the company, apart from the people responsible for answering memos, had any idea what customers mostly requested through them. The company used memos simply as a means of communication, although they could offer many insights into the customer experience. Any evaluation of the memos for the company’s benefit was colored by the personal perceptions of the responsible people. Answering memos also has a cost, so decreasing the number of memos would save the company money. The insights from memos might be useful for the product development department as well, because memos state customers’ requests directly and clearly. Understanding customers is key to developing the service we sell.

With that, we knew the purpose of analyzing the memos. R has several libraries for text mining; in this analysis, the “tm” package helped us the most. As the full-stack data person in the company, I retrieved the memo text data from the company database. In later steps, we could focus on a specific group of customer memos; this time, to get a general idea about all memos, we did not use any constraints in the SQL query. We saved the data as a CSV file to import into R.
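
As a rough sketch of the import step, the snippet below writes a tiny stand-in CSV and reads it back. The column names memo_id and memo_text and the sample memos are assumptions made up for illustration; the real export will look different.

```r
# Hypothetical sample standing in for the exported memo table;
# the column names memo_id and memo_text are assumptions.
csv_path <- tempfile(fileext = ".csv")
sample_memos <- data.frame(
  memo_id = 1:3,
  memo_text = c("Merhaba, dersimi iptal etmek istiyorum.",
                "Ödeme sayfası açılmıyor.",
                "Öğretmenimi değiştirmek istiyorum."),
  stringsAsFactors = FALSE
)
write.csv(sample_memos, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the export back; keep the text as character and declare UTF-8
# so that Turkish letters survive the import.
memos <- read.csv(csv_path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
str(memos)
```

Declaring the encoding explicitly is a cautious choice here, since Turkish characters are easy to lose when a file moves between the database, the CSV export and R.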

The next steps were preprocessing the text data. We did not forget to set the language parameter so that the “tm” text mining package would treat the text as Turkish.
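
One way to set the language while building the corpus is sketched below. The memo texts are hypothetical, and the original analysis may have built its corpus differently; this only shows where a language code can be passed in “tm”.

```r
library(tm)

# Hypothetical memo texts; in the real analysis these came from the CSV export.
memo_text <- c("Merhaba, dersimi iptal etmek istiyorum.",
               "Odeme sayfasi acilmiyor.")

# VectorSource() treats each vector element as one document.
# readerControl$language tags every document with a language code ("tr" = Turkish).
docs <- VCorpus(VectorSource(memo_text),
                readerControl = list(language = "tr"))

meta(docs[[1]], "language")
length(docs)
```

The language code is stored in each document's metadata, where language-aware steps can look it up later.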

docs <- tm_map(docs, removePunctuation) #We removed the punctuation in the text.

docs <- tm_map(docs, removeNumbers) #We removed the numbers in the text.

docs <- tm_map(docs, content_transformer(tolower)) #We converted capitals to lowercase; content_transformer() keeps each document a valid tm text document, so no separate PlainTextDocument conversion is needed.

docs <- tm_map(docs, removeWords, c("responsible person", "bey", "br", "istiyorum", "merhaba", "iyi günler")) #We ignored some specific words in the text.

docs <- tm_map(docs, stripWhitespace) #We erased the extra spaces left between words after removing those specific words.

dtm <- DocumentTermMatrix(docs) #We built a document-term matrix (documents as rows, terms as columns) to do further operations on the text data.

Throughout these preprocessing steps, we refer to the text data as a “corpus”, a well-known text mining term for a body of text data. Once preprocessed, the corpus was ready for further analysis. One of these analyses was the term frequency (TFi), that is, how often each term occurs in the text. We presented the table of term frequencies to the data stakeholders by visualizing it with the ggplot R package (see Figure-1 and Figure-2). Additionally, we created a word cloud for a better presentation of the analysis results. This gave us a glimpse of the popular terms in the memos. These popular terms point to what customers want us to fix or provide in our service.

Figure-1

Figure-2
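
The term-frequency table and bar chart described above could be produced roughly as follows. This is a minimal sketch on four made-up, already-cleaned memos, assuming the ggplot2 package is installed; it is not the code behind the figures.

```r
library(tm)
library(ggplot2)

# Hypothetical, already-cleaned memos (lowercase, no punctuation).
memo_text <- c("ders iptal",
               "ders saati",
               "fatura iade",
               "ders fatura")
docs <- VCorpus(VectorSource(memo_text))
dtm <- DocumentTermMatrix(docs)

# Column sums of the document-term matrix give each term's total count.
term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
freq_df <- data.frame(term = names(term_freq),
                      freq = as.numeric(term_freq),
                      stringsAsFactors = FALSE)

# Horizontal bar chart of the most frequent terms.
p <- ggplot(head(freq_df, 10), aes(x = reorder(term, freq), y = freq)) +
  geom_col() +
  coord_flip() +
  labs(x = "Term", y = "Frequency")
print(p)
```

The same freq_df table could also feed a word cloud, since a word cloud only needs the terms and their counts.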

We can also talk about “sparsity” and “stopwords” in the corpus. Sparsity measures how rarely a term occurs across the documents of the corpus: a sparse term appears in only a few documents, so most of its entries in the document-term matrix are zero. We can remove sparse terms with the “removeSparseTerms()” command. Stopwords are words that carry little meaning on their own, such as conjunctions. The set of stopwords changes from one language to another, which is why we use a language parameter while importing the text data into R.
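
Here is a small sketch of both ideas on made-up memos. The stopword list is hand-made for illustration and is an assumption, not the list used in the analysis; the 0.6 threshold is likewise arbitrary.

```r
library(tm)

# Hypothetical, already-cleaned memos.
memo_text <- c("ders iptal",
               "ders saati",
               "ders fatura",
               "ders video ile fatura")
docs <- VCorpus(VectorSource(memo_text))

# Hand-made stopword list (assumption, for illustration only).
my_stopwords <- c("ve", "ile", "ama")
docs <- tm_map(docs, removeWords, my_stopwords)
docs <- tm_map(docs, stripWhitespace)

dtm <- DocumentTermMatrix(docs)

# sparse = 0.6 keeps only the terms that are missing
# from fewer than 60% of the documents.
dtm_dense <- removeSparseTerms(dtm, sparse = 0.6)
Terms(dtm_dense)
```

With four documents, only terms that occur in at least two of them pass this threshold, so one-off words drop out of the matrix.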

Other insights came from the correlation of terms, that is, which words tend to appear together with which other words.

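findAssocs(dtm, "konuşaraköğren", corlimit = 0.85) #findAssocs() works on the document-term matrix; the term is written in lowercase because the text was lowercased during preprocessing.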

The command above lists all terms whose correlation with the brand name “KonuşarakÖğren” is at least 0.85 (85%). By looking at these accompanying words, we could see whether words with positive or negative meanings accompany the brand name. We could also comment on our teachers’ performance in the eyes of our customers, and we gained other insights for product development.
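
To make the correlation more concrete, here is a small sketch on made-up memos, where the word "marka" (Turkish for "brand") stands in for the brand name. The memos are hypothetical and only illustrate how the number behind findAssocs() can be read.

```r
library(tm)

# Hypothetical, already-cleaned memos; "marka" stands in for the brand name.
memo_text <- c("marka ders iyi",
               "marka ders iyi",
               "fatura sorun",
               "marka ders iptal")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(memo_text)))

# Terms whose per-document counts correlate with "marka" at 0.85 or higher.
assoc <- findAssocs(dtm, "marka", corlimit = 0.85)
assoc

# The same kind of number computed by hand:
# Pearson correlation between two term columns of the matrix.
m <- as.matrix(dtm)
cor(m[, "marka"], m[, "ders"])
```

In this toy data, "ders" appears in exactly the same memos as "marka", so it passes the limit, while "iyi" appears in only some of them and falls below it.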

In conclusion, after this text mining analysis:

We made some product improvements that decreased the number of memos by 20%.

We created a summary and a review of the memos that does not depend solely on the people responsible for them. This gave all stakeholders a better understanding of the system.

We understood our customers better and improved customer satisfaction.

We changed some UI designs and planned a video guide for the system. Customers learned better how the system works, and they now send fewer memos. Besides that, help became more available to customers than before.
