	{"id":66199,"date":"2022-03-14T13:56:17","date_gmt":"2022-03-14T13:56:17","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=66199"},"modified":"2024-09-20T17:45:48","modified_gmt":"2024-09-20T16:45:48","slug":"string-filters-in-pandas-youre-doing-it-wrong","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/fr\/blog\/string-filters-in-pandas-youre-doing-it-wrong\/","title":{"rendered":"String filters in pandas: you\u2019re doing it wrong"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left 
fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/mae\u0308l-deschamps.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Mael Deschamps<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\" style=\"--awb-text-transform:none;\"><p>Senior Machine Learning Engineer at Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column 
fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/string-filters-in-pandas-youre-doing-it-wrong-f9ce575c13e4\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Read our article on<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" 
fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 
fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\" style=\"--awb-text-transform:none;\"><p>Filtering data with ID == \u2018string\u2019 in pandas is something you should avoid, as the scalar_compare operator leads to performance bottlenecks. There are many ways to bypass it, for example by partitioning your DataFrame into a dictionary using the ID in question as a key.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Introduction | String filters in pandas<\/h2><\/div><div class=\"fusion-text fusion-text-5\" style=\"--awb-text-transform:none;\"><p>The simplification of hardware management, brought by cloud solutions, pushes us more and more to turn away from the problems of code optimisation.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"561\" alt=\"The simplification of hardware management, brought by cloud solutions, pushes us more and more to turn away from the problems of code optimisation.\" title=\"article-mae\u0308l1\" 
src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1.png\" class=\"lazyload img-responsive wp-image-66201\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27561%27%20viewBox%3D%270%200%201400%20561%27%3E%3Crect%20width%3D%271400%27%20height%3D%27561%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1-200x80.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1-400x160.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1-600x240.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1-800x321.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1-1200x481.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l1.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-6\" style=\"--awb-text-transform:none;\"><p>But scaling up and\u00a0<strong>increasing computing power is not always the solution <\/strong>as it drives costs up, and computing power is not infinite.<\/p>\n<p>By challenging myself to fit all my data preparation in a simple container, I quickly realised that, with a few tricks,\u00a0<strong>optimising your code can sometimes be just as simple as provisioning a bigger instance.<\/strong><\/p>\n<p>In this article I\u2019d like to come back to a single code change that allowed me to drastically reduce the time spent on feature computation during the development phase of a propensity model. 
This change is common enough to be applied to many other situations.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"443\" title=\"article-mae\u0308l2\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2.png\" alt class=\"lazyload img-responsive wp-image-66202\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27443%27%20viewBox%3D%270%200%201400%20443%27%3E%3Crect%20width%3D%271400%27%20height%3D%27443%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2-200x63.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2-400x127.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2-600x190.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2-800x253.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2-1200x380.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l2.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-7\" 
style=\"--awb-text-transform:none;\"><p>It doesn\u2019t aim to be the optimal solution, but seeks to be a quick way to cut computation time efficiently, in the spirit of the\u00a0<strong>Pareto principle.<\/strong><\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Context<\/h2><\/div><div class=\"fusion-text fusion-text-8\" style=\"--awb-text-transform:none;\"><p>During this mission, I was in charge of automating the data preparation process and the predictions of the models developed by our Data Science team. Put simply, transactional data was uploaded every day and had to go through a first\u00a0<strong>Preprocessing<\/strong>\u00a0step, followed by a second step of\u00a0<strong>Feature<\/strong> <strong>Computation<\/strong>\u00a0before reaching the last step of\u00a0<strong>Model Prediction<\/strong>\u00a0using the trained model.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"243\" alt=\"The Feature computation phase is the one that took the longest time to execute: indeed many features were computed at the customer level\" title=\"article-mae\u0308l3\" 
src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3.png\" class=\"lazyload img-responsive wp-image-66203\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27243%27%20viewBox%3D%270%200%201400%20243%27%3E%3Crect%20width%3D%271400%27%20height%3D%27243%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3-200x35.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3-400x69.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3-600x104.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3-800x139.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3-1200x208.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l3.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-9\" style=\"--awb-text-transform:none;\"><p>The <strong>Feature computation<\/strong>\u00a0phase is the one that took the longest time to execute: indeed many features were computed at the customer level, which resulted in the recurrent execution of a surprisingly time-consuming line in the code :<\/p>\n<\/div><div class=\"fusion-image-element\" 
style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-5 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"189\" alt=\"which resulted in the recurrent execution of a surprisingly time-consuming line in the code :\" title=\"article-mae\u0308l4\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4.png\" class=\"lazyload img-responsive wp-image-66204\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27189%27%20viewBox%3D%270%200%201400%20189%27%3E%3Crect%20width%3D%271400%27%20height%3D%27189%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4-200x27.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4-400x54.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4-600x81.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4-800x108.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4-1200x162.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l4.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-10\" 
style=\"--awb-text-transform:none;\"><p>This single line was timed at 18 ms, which meant that with 33,717 customers to evaluate daily, I was spending around 10 min of raw computation time per customer-level feature, reduced to 1 min 16 s per feature by parallelising the operation across my 8 available CPUs.<\/p>\n<p>As we were working with B2B data, it was necessary to compute features at the customer level, since each customer was in fact a business, sometimes with multiple orders per day.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Experimentation | String filters in pandas<\/h2><\/div><div class=\"fusion-text fusion-text-11\" style=\"--awb-text-transform:none;\"><p>After investigating a little with the\u00a0<strong>%%prun<\/strong>\u00a0magic command, I was able to identify the source of this processing bottleneck: the\u00a0<strong>pandas._libs.ops.scalar_compare<\/strong>\u00a0operator, which was under-optimised in my version of pandas (1.3.1).<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-6 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"183\" title=\"article-mae\u0308l5\" 
src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5.png\" alt class=\"lazyload img-responsive wp-image-66205\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27183%27%20viewBox%3D%270%200%201400%20183%27%3E%3Crect%20width%3D%271400%27%20height%3D%27183%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5-200x26.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5-400x52.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5-600x78.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5-800x105.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5-1200x157.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l5.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-12\" style=\"--awb-text-transform:none;\"><p>By simply replacing this \u201c<strong>==<\/strong>\u201d operator with \u201c<strong>isin<\/strong>\u201d, which is not very intuitive as I was comparing against a single string, I already cut the computing time by a factor of about 2.3, going from 18 ms per operation to 7.95 ms.<\/p>\n<\/div><div class=\"fusion-image-element\" 
style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-7 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"189\" title=\"article-mae\u0308l6\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6.png\" alt class=\"lazyload img-responsive wp-image-66206\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27189%27%20viewBox%3D%270%200%201400%20189%27%3E%3Crect%20width%3D%271400%27%20height%3D%27189%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6-200x27.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6-400x54.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6-600x81.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6-800x108.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6-1200x162.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l6.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-13\" style=\"--awb-text-transform:none;\"><p>Still looking for optimisation, I came across a\u00a0<a class=\"au ja\" 
href=\"https:\/\/stackoverflow.com\/questions\/14737566\/pandas-performance-issue-need-help-to-optimize\" target=\"_blank\" rel=\"noopener ugc nofollow\">Stackoverflow<\/a>\u00a0post pushing the use of\u00a0<a class=\"au ja\" href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/user_guide\/categorical.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">Categorical type<\/a>\u00a0to improve further the operation.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-8 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"259\" title=\"article-mae\u0308l7\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7.png\" alt class=\"lazyload img-responsive wp-image-66207\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27259%27%20viewBox%3D%270%200%201400%20259%27%3E%3Crect%20width%3D%271400%27%20height%3D%27259%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7-200x37.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7-400x74.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7-600x111.png 600w, 
https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7-800x148.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7-1200x222.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l7.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-14\" style=\"--awb-text-transform:none;\"><p>This last implementation made the operation more than\u00a0<strong>36 times<\/strong> faster. There is a caveat to this trick, however:\u00a0<strong>the category type does not behave like a classic str in all operations<\/strong>\u00a0(see the example below using .groupby() in pandas), so I had to convert it back to str at one point.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-9 hover-type-none\"><img decoding=\"async\" width=\"1332\" height=\"1082\" title=\"article-mae\u0308l8\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8.png\" alt class=\"lazyload img-responsive wp-image-66208\" 
srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271332%27%20height%3D%271082%27%20viewBox%3D%270%200%201332%201082%27%3E%3Crect%20width%3D%271332%27%20height%3D%271082%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8-200x162.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8-400x325.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8-600x487.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8-800x650.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8-1200x975.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l8.png 1332w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1332px\" \/><\/span><\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Hypothesis<\/h2><\/div><div class=\"fusion-text fusion-text-15\" style=\"--awb-text-transform:none;\"><p>But why is that ? 
How can a simple\u00a0<strong class=\"kb hp\">==<\/strong>\u00a0operation between two strings take more time than an\u00a0<strong>isin()<\/strong>\u00a0operation comparing lists, or a\u00a0<strong>categorical<\/strong>\u00a0one?<br \/>\nWell, to answer the first question we would definitely need to dig into the code behind\u00a0<a class=\"au ja\" href=\"https:\/\/github.com\/pandas-dev\/pandas\/blob\/1.3.x\/pandas\/_libs\/ops.pyx\" target=\"_blank\" rel=\"noopener ugc nofollow\">scalar_compare<\/a>, which is mostly written in Cython, and compare it to the code behind the\u00a0<a class=\"au ja\" href=\"https:\/\/github.com\/numpy\/numpy\/blob\/v1.22.0\/numpy\/lib\/arraysetops.py#L640-L736\" target=\"_blank\" rel=\"noopener ugc nofollow\">isin()<\/a>\u00a0method.<\/p>\n<\/div><div class=\"fusion-text fusion-text-16\" style=\"--awb-text-transform:none;\"><p>Fortunately, the answer to the second part seems to be more intuitive: when comparing two string values, we are choosing among an\u00a0<strong>infinite number<\/strong>\u00a0of possibilities, whereas when comparing two categories, the\u00a0<strong>number of options is set<\/strong>\u00a0by the unique categories that exist (pandas stores each value as an integer code pointing into that fixed set). It seems much\u00a0<strong>easier to compare two entities when our number of options is fixed<\/strong>.<\/p>\n<\/div><div class=\"fusion-text fusion-text-17\" style=\"--awb-text-transform:none;\"><p>My thirst for optimisation still not quenched, I decided to take a step back from the current method. 
I came up with a new approach:\u00a0<strong>partitioning my DataFrame into a dictionary<\/strong>\u00a0keyed by customer, which I would then use to retrieve each customer's rows when computing my features.<br \/>\nIn terms of code, this simply translated into the following lines:<\/p>\n<\/div>customers_list = list(df.ID_Customer.unique())\ndf_dict = {elem: df[df.ID_Customer == elem] for elem in customers_list}<div class=\"fusion-text fusion-text-18\" style=\"--awb-text-transform:none;\"><p>Building that dictionary cost me 32 seconds of computing time, but with this partitioned DataFrame, retrieving a customer's rows became a plain dictionary lookup that takes a few tens of nanoseconds.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-10 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"179\" title=\"article-mae\u0308l9\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9.png\" alt class=\"lazyload img-responsive wp-image-66209\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27179%27%20viewBox%3D%270%200%201400%20179%27%3E%3Crect%20width%3D%271400%27%20height%3D%27179%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9-200x26.png 200w, 
https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9-400x51.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9-600x77.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9-800x102.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9-1200x153.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l9.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conclusion | String filters in pandas<\/h2><\/div><div class=\"fusion-text fusion-text-19\" style=\"--awb-text-transform:none;\"><p>After spending a couple of hours in the experimentation phase, I was happy with the result :<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-11 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"257\" alt=\"After spending a couple of hours in the experimentation phase, I was happy with the result :\" title=\"article-mae\u0308l10\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10.png\" 
data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10.png\" class=\"lazyload img-responsive wp-image-66210\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27257%27%20viewBox%3D%270%200%201400%20257%27%3E%3Crect%20width%3D%271400%27%20height%3D%27257%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10-200x37.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10-400x73.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10-600x110.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10-800x147.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10-1200x220.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l10.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-20\" style=\"--awb-text-transform:none;\"><p>The computing time per customer filter was now divided by <strong>348,000<\/strong>, going from\u00a0<strong>18\u00a0ms to 51.7\u00a0ns<\/strong>, or from\u00a0<strong>10\u00a0min to 2.65\u00a0ms<\/strong>\u00a0per feature computed in my case, taking into account the time spent on the partitioning.<\/p>\n<\/div><div class=\"fusion-image-element\" 
style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-12 hover-type-none\"><img decoding=\"async\" width=\"1400\" height=\"309\" alt=\"Immediately, the impact of this small change allowed me to reduce the calculation time of my complete Feature computation phase by 90%, from 40&#039;49&quot; to 7&#039;27&quot;.\" title=\"article-mae\u0308l11\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11.png\" class=\"lazyload img-responsive wp-image-66211\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271400%27%20height%3D%27309%27%20viewBox%3D%270%200%201400%20309%27%3E%3Crect%20width%3D%271400%27%20height%3D%27309%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11-200x44.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11-400x88.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11-600x132.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11-800x177.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11-1200x265.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/03\/article-mae\u0308l11.png 1400w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 
1400px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-21\" style=\"--awb-text-transform:none;\"><p>This small change immediately allowed me to\u00a0<strong>reduce the calculation time of my complete feature computation phase by 90%<\/strong>, from 40&#8217;49&#8221; to 7&#8217;27&#8221;. Using a CO2eq estimation method that I will detail in my next article, I estimate that this modification\u00a0<strong>saved at least $170\/year and 22\u00a0kgCO2\/year<\/strong>, and potentially far more as the customer list grows and the project rolls out to other countries.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-bg-position:left center;--awb-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 
4px;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper lazyload fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-13 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"medium\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" 
style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Medium Blog by <a href=\"https:\/\/www.artefact.com\/\">Artefact<\/a>.<\/h3><\/div><div class=\"fusion-text fusion-text-22\" style=\"--awb-content-alignment:center;\"><p>This article was originally published on <strong>Medium.com<\/strong>.<br \/>\nFollow us on our Medium blog!<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/string-filters-in-pandas-youre-doing-it-wrong-f9ce575c13e4\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Read Our Article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>String filters in pandas are best avoided, because the scalar_compare operation creates performance bottlenecks. 
<\/p>","protected":false},"featured_media":68684,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-66199","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog\/66199","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/media\/68684"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/media?parent=66199"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog-category?post=66199"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog-language?post=66199"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}