	{"id":60588,"date":"2021-03-03T09:46:08","date_gmt":"2021-03-03T09:46:08","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=60588"},"modified":"2024-09-20T17:45:41","modified_gmt":"2024-09-20T16:45:41","slug":"using-nlp-to-extract-quick-and-valuable-insights-from-your-customers-reviews","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/es\/blog\/using-nlp-to-extract-quick-and-valuable-insights-from-your-customers-reviews\/","title":{"rendered":"Using NLP to extract quick and valuable insights from your customers\u2019 reviews"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 
class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/Louise-Morin-300x300.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Louise Morin<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Senior Data Scientist<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-2 description\"><p><strong>TL;DR<\/strong><br \/>\nUnderstanding customer feedback and knowing what your strengths and weaknesses are is key to any business. Nowadays, companies have access to a lot of information that could give them those insights: website reviews, chat interactions, conversation transcripts, social media comments\u2026<br \/>\nThis article explains how you can quickly extract insights from textual data, using consumer reviews as an example. 
We will present 3 different approaches:<\/p>\n<li>unsupervised data exploration\n<li>sentiment analysis with features importance\n<li>analyzing correlation between ratings and<br \/>\npredefined business themes<\/p>\n<p>(topic modeling could be a fourth option to go further)<\/p>\n<p><em>Please note the data behind this article was artificially generated to ensure confidentiality of our initial project.<\/em><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"74\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-300x74.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, 
https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Customer Reviews Analysis<\/h2><\/div><div class=\"fusion-text fusion-text-3\"><p>We are trying to find insights from our products reviews in order to understand what are their main issues \/ main strengths. 
Products are camera devices and accessories, rated from 1 (bad) to 5 (excellent).<\/p>\n<\/div><div class=\"fusion-text fusion-text-4\"><p>We will be using three different approaches here, to gather insights from our data.<\/p>\n<\/div><div class=\"fusion-text fusion-text-5\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60591 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1.png\" alt=\"\" width=\"695\" height=\"382\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27695%27%20height%3D%27382%27%20viewBox%3D%270%200%20695%20382%27%3E%3Crect%20width%3D%27695%27%20height%3D%27382%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1-200x110.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1-300x165.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1-400x220.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1-600x330.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-1.png 695w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 695px) 100vw, 695px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-6\"><p>The point is to have complementary views:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Data mining or sentiment analysis is more exploratory: it will find out what matters the most, what could be the main reasons 
driving a review to be positive or negative.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Theme impact analysis links the distribution of scores to predefined business concepts (zoom, battery, \u2026).<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Get a global look at the data you have collected<\/h2><\/div><div class=\"fusion-text fusion-text-7\"><p>Whenever you\u2019re starting a new data project, the first step is always to get a global picture of the data you have (is it imbalanced? is there enough data? are there a lot of missing values?).<\/p>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">How many reviews do I have for each product category?<\/h2><\/div><div class=\"fusion-text fusion-text-8\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60601 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2.png\" alt=\"\" width=\"491\" height=\"209\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27491%27%20height%3D%27209%27%20viewBox%3D%270%200%20491%20209%27%3E%3Crect%20width%3D%27491%27%20height%3D%27209%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" 
data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2-200x85.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2-300x128.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2-400x170.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-2.png 491w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 491px) 100vw, 491px\" \/><\/p>\n<p style=\"text-align: center;\">Number of reviews per product category<\/p>\n<\/div><div class=\"fusion-text fusion-text-9\"><p>\u2192 The fact that there are not as many Tripod reviews should be kept in mind if we analyze reviews for this specific category of product. The more data we have, the better, in order to have unbiased and relevant conclusions.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">How many reviews do I have for each rating?<\/h2><\/div><div class=\"fusion-text fusion-text-10\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60600 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3.png\" alt=\"\" width=\"700\" height=\"235\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27235%27%20viewBox%3D%270%200%20700%20235%27%3E%3Crect%20width%3D%27700%27%20height%3D%27235%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3-200x67.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3-300x101.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3-400x134.png 
400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3-600x201.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-3.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p style=\"text-align: center;\">Number of reviews per score<\/p>\n<\/div><div class=\"fusion-text fusion-text-11\"><p>\u2192 This is important. Our dataset is quite imbalanced: we have far more positive reviews than negative ones. This kind of information needs to be taken into account when training dedicated models (e.g. a classification model for sentiment analysis).<\/p>\n<\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">What\u2019s the rating distribution of each category?<\/h2><\/div><div class=\"fusion-text fusion-text-12\"><p><img decoding=\"async\" class=\"lazyload wp-image-60599 size-full aligncenter\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4.png\" alt=\"\" width=\"537\" height=\"408\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27537%27%20height%3D%27408%27%20viewBox%3D%270%200%20537%20408%27%3E%3Crect%20width%3D%27537%27%20height%3D%27408%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4-200x152.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4-300x228.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4-400x304.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-4.png 537w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 
537px) 100vw, 537px\" \/><\/p>\n<p style=\"text-align: center;\">Average rating &amp; distribution of each product category<\/p>\n<\/div><div class=\"fusion-text fusion-text-13\"><p>We can see here that Lenses have the highest average rating, while there are a lot of negative reviews (especially with a score of 1) for Drones and Aerial Imaging.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Using NLP to understand your customers\u2019 concerns<\/h2><\/div><div class=\"fusion-text fusion-text-14\"><p>Now, to understand what the reviews are about, we will implement the different NLP approaches mentioned previously.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Data cleaning<\/h2><\/div><div class=\"fusion-text fusion-text-15\"><p>Before doing anything else, we need to clean the text data, to make it usable by the different NLP methods (this step is not always required, depending on the algorithms you want to use).<\/p>\n<\/div><div class=\"fusion-text fusion-text-16\"><p>We applied standard pre-processing functions that were relevant to our data (removing HTML, punctuation, phone numbers, \u2026), and we implemented a custom list of stop words that we remove from reviews (for instance the word \u201ccamera\u201d does not bring that much information to our analysis).<\/p>\n<\/div><div class=\"fusion-text fusion-text-17\"><p>You can find a lot of these functions in our<a href=\"https:\/\/github.com\/artefactory\/NLPretext\" target=\"_blank\" rel=\"noopener 
noreferrer\"> NLPretext<\/a> Github repository.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Mining insights in a few lines of code<\/h2><\/div><div class=\"fusion-text fusion-text-18\"><p>Now that we have for each review:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>A product category<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The review original text<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The review cleaned text<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The review cleaned text split into tokens<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The product rating<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-19\"><p>We can 
start by simply looking at our most frequent words (single words, bi-grams, tri-grams\u2026). It\u2019s a simple analysis, but it gives you an immediate view of the main topics for each score and category.<\/p>\n<\/div><div class=\"fusion-text fusion-text-20\"><div class=\"code\">\n<p>from collections import Counter<br \/>\nimport matplotlib.pyplot as plt<br \/>\nimport wordcloud<\/p>\n<p>plt.rcParams[&quot;figure.figsize&quot;] = [16, 9]<\/p>\n<p>def create_ngrams(token_list, nb_elements):<br \/>\n    &quot;&quot;&quot;<br \/>\n    Create n-grams for list of tokens<\/p>\n<p>    Parameters<br \/>\n    ----------<br \/>\n    token_list : list<br \/>\n        list of strings<br \/>\n    nb_elements : int<br \/>\n        number of elements in the n-gram<\/p>\n<p>    Returns<br \/>\n    -------<br \/>\n    Generator<br \/>\n        generator of all n-grams<br \/>\n    &quot;&quot;&quot;<br \/>\n    ngrams = zip(*[token_list[index_token:] for index_token in range(nb_elements)])<br \/>\n    return (&quot; &quot;.join(ngram) for ngram in ngrams)<\/p>\n<p>def frequent_words(list_words, ngrams_number=1, number_top_words=10):<br \/>\n    &quot;&quot;&quot;<br \/>\n    Return the most frequent n-grams in a list of tokens<\/p>\n<p>    Parameters<br \/>\n    ----------<br \/>\n    list_words : list<br \/>\n        list of tokens<br \/>\n    ngrams_number : int<br \/>\n        number of tokens in each n-gram<br \/>\n    number_top_words : int<br \/>\n        number of n-grams to return<\/p>\n<p>    Returns<br \/>\n    -------<br \/>\n    list<br \/>\n        (n-gram, count) tuples, most frequent first<br \/>\n    &quot;&quot;&quot;<br \/>\n    frequent = []<br \/>\n    if ngrams_number == 1:<br \/>\n        pass<br \/>\n    elif ngrams_number &gt;= 2:<br \/>\n        list_words = create_ngrams(list_words, ngrams_number)<br \/>\n    else:<br \/>\n        raise ValueError(&quot;number of n-grams should be &gt;= 1&quot;)<br \/>\n    counter = Counter(list_words)<br \/>\n    frequent = counter.most_common(number_top_words)<br \/>\n    return frequent<\/p>\n<p>def make_word_cloud(text_or_counter, stop_words=None):<br \/>\n    if isinstance(text_or_counter, str):<br \/>\n        word_cloud = 
wordcloud.WordCloud(stopwords=stop_words).generate(text_or_counter)<br \/>\n    else:<br \/>\n        if stop_words is not None:<br \/>\n            # keep the counts while dropping stop words<br \/>\n            text_or_counter = {word: count for word, count in text_or_counter.items() if word not in stop_words}<br \/>\n        word_cloud = wordcloud.WordCloud(stopwords=stop_words).generate_from_frequencies(text_or_counter)<br \/>\n    plt.imshow(word_cloud)<br \/>\n    plt.axis(&quot;off&quot;)<br \/>\n    plt.show()<\/p>\n<\/div>\n<\/div><div class=\"fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">WordCloud<\/h3><\/div><div class=\"fusion-text fusion-text-21\"><p>Leveraging these functions, we can easily display a Word Cloud of the most frequent words, using reviews for Cameras with a score between 1 and 2:<\/p>\n<\/div><div class=\"fusion-text fusion-text-22\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60598 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5.png\" alt=\"\" width=\"700\" height=\"352\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27352%27%20viewBox%3D%270%200%20700%20352%27%3E%3Crect%20width%3D%27700%27%20height%3D%27352%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5-200x101.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5-300x151.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5-400x201.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5-600x302.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-5.png 700w\" data-sizes=\"auto\" 
data-orig-sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-23\"><p>Then display a similar Word Cloud using reviews for Cameras with a score between 4 and 5 :<\/p>\n<\/div><div class=\"fusion-text fusion-text-24\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60597 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6.png\" alt=\"\" width=\"700\" height=\"351\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27351%27%20viewBox%3D%270%200%20700%20351%27%3E%3Crect%20width%3D%27700%27%20height%3D%27351%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6-200x100.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6-300x150.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6-400x201.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6-600x301.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-6.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-25\"><p>We can easily identify the main points brought up in both cases.<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-3 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>For reviews with low scores, we have a lot of mentions about the battery, the device screen, its price or even mentions 
of a real bug encountered.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>For reviews with high scores, we see that the photo quality, and the functionalities or design are being brought up often<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-26\"><p>We could do this exercise for each product our company has, in order to see the specificity of each and be able to draw conclusions at a more granular level.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">N-grams Count<\/h3><\/div><div class=\"fusion-text fusion-text-27\"><p>We can also use the <em>frequent_words<\/em> function to display the most frequent words, bi-grams or tri-grams:<\/p>\n<\/div><div class=\"fusion-text fusion-text-28\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60596 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7.png\" alt=\"\" width=\"700\" height=\"471\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27471%27%20viewBox%3D%270%200%20700%20471%27%3E%3Crect%20width%3D%27700%27%20height%3D%27471%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7-200x135.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7-300x202.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7-400x269.png 400w, 
https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7-600x404.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-7.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-29\"><p>To go further, you could then put in place a function displaying the reviews associated with a keyword, in order to zoom in on n-grams you find interesting. You could also look at n-grams with the highest \/ lowest <a href=\"https:\/\/monkeylearn.com\/blog\/what-is-tf-idf\/\" target=\"_blank\" rel=\"noopener\">TF-IDF<\/a> (easy to compute with the <em>sklearn<\/em> library), since it allows you to see important words based on a different metric than a simple frequency counter.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Sentiment Analysis<\/h2><\/div><div class=\"fusion-text fusion-text-30\"><p>Next, we move on to a sentiment analysis approach. Usually, it is used to predict if a text is positive or negative. In our case, we already have this information (the score between 1 and 5 gives us the sentiment behind the review). 
But training a model to predict this rating will help us find which words (features) are key for customers.<\/p>\n<\/div><div class=\"fusion-text fusion-text-31\"><p>What we can do is to <strong>train a sentiment analysis classifier on this data<\/strong>, and then use libraries like SHAP or LIME to understand <strong>which features<\/strong> (= words) <strong>have the most impact<\/strong> on a review being classified as positive or negative.<\/p>\n<\/div><div class=\"fusion-text fusion-text-32\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60595 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8.png\" alt=\"\" width=\"637\" height=\"295\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27637%27%20height%3D%27295%27%20viewBox%3D%270%200%20637%20295%27%3E%3Crect%20width%3D%27637%27%20height%3D%27295%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8-200x93.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8-300x139.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8-400x185.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8-600x278.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-8.png 637w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 637px) 100vw, 637px\" \/><\/p>\n<\/div><div class=\"fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Classifier<\/h3><\/div><div class=\"fusion-text fusion-text-33\"><p>To train a classifier, you have a lot of possible algorithms you 
can use, ranging from the classic sklearn LogisticRegression to ULMFiT models (<em>see <a href=\"https:\/\/github.com\/piegu\/language-models\/blob\/master\/lm3-french-classifier-amazon.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">this notebook<\/a> to train a French ULMFiT model, and <a href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/how-to-train-a-language-model-from-scratch-without-any-linguistic-knowledge-11acaa933e84\" target=\"_blank\" rel=\"noopener noreferrer\">this article<\/a> to understand more about ULMFiT<\/em>) or the Ludwig classifier developed by Uber.<\/p>\n<\/div><div class=\"fusion-text fusion-text-34\"><p><em>You might want to start with a simple one to see whether it already meets your needs before putting more complex algorithms in place.<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-35\"><p id=\"34d0\" class=\"hz ia fp ib b ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv iw fh bx\" data-selectable-paragraph=\"\">Keep in mind that your dataset is probably imbalanced (more positive than negative reviews, in our case).<\/p>\n<\/div><div class=\"fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Feature importance<\/h3><\/div><div class=\"fusion-text fusion-text-36\"><p>Once your classifier is implemented, you can move on to the most important step: getting insights from feature importance.<\/p>\n<\/div><div class=\"fusion-text fusion-text-37\"><p>In the following example, we apply SHAP to our model (here, a simple sklearn LogisticRegression):<\/p>\n<\/div><div class=\"fusion-text fusion-text-38\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60594 size-full\" 
src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9.png\" alt=\"\" width=\"690\" height=\"124\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27690%27%20height%3D%27124%27%20viewBox%3D%270%200%20690%20124%27%3E%3Crect%20width%3D%27690%27%20height%3D%27124%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9-200x36.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9-300x54.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9-400x72.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9-600x108.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-9.png 690w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 690px) 100vw, 690px\" \/><\/p>\n<p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60593 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10.png\" alt=\"\" width=\"631\" height=\"550\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27631%27%20height%3D%27550%27%20viewBox%3D%270%200%20631%20550%27%3E%3Crect%20width%3D%27631%27%20height%3D%27550%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10-200x174.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10-300x261.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10-400x349.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10-600x523.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-10.png 631w\" data-sizes=\"auto\" 
data-orig-sizes=\"(max-width: 631px) 100vw, 631px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-39\"><p>We can see here that the functionalities, photo quality, and zoom features have a strongly positive impact on our clients\u2019 satisfaction, while the flash, memory card or batteries tend to have a strongly negative impact when mentioned in a review.<\/p>\n<\/div><div class=\"fusion-text fusion-text-40\"><p><em>Words like \u201cexcellent\u201d, \u201cperfect\u201d or \u201cbad\u201d were removed from this analysis (before training the classifier) because they would otherwise dominate as the most important features, whereas our goal here is to find insights about our products, not to improve classifier performance.<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-41\"><p>See <a href=\"https:\/\/slundberg.github.io\/shap\/notebooks\/linear_explainer\/Sentiment%20Analysis%20with%20Logistic%20Regression.html\" target=\"_blank\" rel=\"noopener noreferrer\">this notebook<\/a> for an example of how to use SHAP with a public dataset.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Business themes impact<\/h2><\/div><div class=\"fusion-text fusion-text-42\"><p>Our third approach differs from the previous ones, as it starts from business-related themes chosen by someone who knows the products well.<\/p>\n<\/div><div class=\"fusion-text fusion-text-43\"><p>The point is to analyse how predefined business themes impact product ratings, to understand whether they are a source of strength or an issue to solve.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-three\" 
style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Determining themes<\/h3><\/div><div class=\"fusion-text fusion-text-44\"><p>The first step is to classify the reviews into the thematic categories, either by labelling your dataset manually (you could then train a classifier if you want to automatically classify new reviews into themes) or with a rule-based model.<\/p>\n<\/div><div class=\"fusion-text fusion-text-45\"><p>In our case, we used a rule-based model because it can already yield good results at low cost (e.g. if you\u2019re curious about your lens quality or your after-sales service, it can be simple to establish rules that determine whether a review mentions them).<\/p>\n<\/div><div class=\"fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Theme impact<\/h3><\/div><div class=\"fusion-text fusion-text-46\"><p>In a second step, you can compute your global average score, then the average score of reviews mentioning a specific theme.<\/p>\n<\/div><div class=\"fusion-text fusion-text-47\"><p>By taking the difference between these two scores, you can estimate the impact the theme has on your global score.<\/p>\n<\/div><div class=\"fusion-text fusion-text-48\"><p><img decoding=\"async\" class=\"lazyload aligncenter wp-image-60592 size-full\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11.png\" alt=\"\" width=\"700\" height=\"337\" 
srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27337%27%20viewBox%3D%270%200%20700%20337%27%3E%3Crect%20width%3D%27700%27%20height%3D%27337%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11-200x96.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11-300x144.png 300w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11-400x193.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11-600x289.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/05\/NLP-11.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<\/div><div class=\"fusion-text fusion-text-49\"><p>Here, we should worry about our after-sales service because it is often mentioned in a negative way (though this could also be because people who contact the after-sales service often had an issue in the first place, 
which is why you should then look in detail at the reviews mentioning this theme to really understand why it was brought up).<\/p>\n<\/div><div class=\"fusion-text fusion-text-50\"><p>\u2192 Here again, business knowledge is essential to make sense of your results.<\/p>\n<\/div><div class=\"fusion-text fusion-text-51\"><p>On the other hand, when our designs or lenses are mentioned, it\u2019s often in a review with a high score, which could mean they are among our strengths.<\/p>\n<\/div><div class=\"fusion-text fusion-text-52\"><p>See <a href=\"https:\/\/getthematic.com\/insights\/visualizing-customer-feedback-word-clouds\/\" target=\"_blank\" rel=\"noopener noreferrer\">this article<\/a> for alternative visualisations to word clouds.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">To go further<\/h2><\/div><div class=\"fusion-text fusion-text-53\"><p>To go further, you could try to detect topics in your reviews: for example, use the Top2Vec library to extract topics and see the correlation between topics and scores (any topic modeling library will work, but <a href=\"https:\/\/github.com\/ddangelov\/Top2Vec\" target=\"_blank\" rel=\"noopener noreferrer\">Top2Vec<\/a> has the advantage of giving good results without requiring preprocessing or a predefined number of topics).<\/p>\n<\/div><div class=\"fusion-text fusion-text-54\"><p>This article showed how to gain customer insights from your textual data by using a pragmatic and simple analysis. Thanks for reading this far, and don\u2019t hesitate to reach out if you have any comments on the topic! 
You can visit our blog <a href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a> to learn more about our machine learning projects.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-55\"><p>You can find more about us and our projects on our Medium blog<\/p>\n<\/div><div ><a class=\"fusion-button button-flat fusion-button-default-size button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type button-primary-medium\" target=\"_self\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/customer-reviews-use-nlp-to-gain-insights-from-your-data-4629519b518e\" rel=\"noopener\"><span class=\"fusion-button-text awb-button__text 
awb-button__text--default\">View Articles<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>3 March 2021<br \/>\nEveryone is talking about BERT, GPT-3, XLNet\u2026 but did you know that with a few simple, basic NLP preprocessing steps you can already extract valuable insights from your data?<\/p>","protected":false},"featured_media":60590,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[22035],"blog-language":[2991],"class_list":["post-60588","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-data-ai-consulting","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog\/60588","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/media\/60590"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/media?parent=60588"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog-category?post=60588"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog-language?post=60588"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}