	{"id":65787,"date":"2022-01-18T09:50:59","date_gmt":"2022-01-18T09:50:59","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=65787"},"modified":"2024-09-20T17:45:48","modified_gmt":"2024-09-20T16:45:48","slug":"how-to-quickly-compare-two-datasets-using-a-generic-and-powerful-sql-query","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/fr\/blog\/how-to-quickly-compare-two-datasets-using-a-generic-and-powerful-sql-query\/","title":{"rendered":"Comment comparer rapidement deux ensembles data \u00e0 l'aide d'une requ\u00eate SQL g\u00e9n\u00e9rique et puissante ?"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Auteur<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/1560114886665.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">K\u00e9vin Lef\u00e8vre<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Data D\u00e9veloppeur \u00e0 Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/scoring-customer-propensity-using-machine-learning-models-on-google-analytics-data-ba1126469c1f\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Lisez notre article sur<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"Moyen Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\"><p>Dans le cadre de leurs activit\u00e9s quotidiennes, les analystes Data Engineer et Data sont amen\u00e9s \u00e0 am\u00e9liorer les processus d'ingestion data. Au-del\u00e0 des tests unitaires habituels, il peut \u00eatre int\u00e9ressant de comparer facilement et rapidement deux ensembles data (c'est-\u00e0-dire des tables, des vues, des requ\u00eates, etc.) \u00e0 des fins diff\u00e9rentes, telles que l'analyse d'impact ou les tests de non-r\u00e9gression. De m\u00eame, l'identification des divergences entre deux points de vue d'une table instantan\u00e9e est tr\u00e8s utile pour l'analyse adhoc ou le d\u00e9bogage.<\/p>\n<p>Dans cette optique et pour les praticiens de SQL, un script pr\u00e9construit et r\u00e9utilisable prend tout son sens, d'o\u00f9 l'objet de cet article.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Explication du mod\u00e8le SQL pr\u00eat \u00e0 l'emploi<\/h2><\/div><div class=\"fusion-text fusion-text-5\"><p>Entrons dans le vif du sujet avec la question attendue :<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27619%27%20height%3D%270%27%20viewBox%3D%270%200%20619%200%27%3E%3Crect%20width%3D%27619%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-00-45-1024x789.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 619px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"619\" height=\"auto\" \/><div class=\"fusion-text fusion-text-6\"><p>Expliquons-le maintenant :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Tout d'abord, nous utilisons CTE - Common Table Expressions (WITH ... ASstatements) pour identifier les deux ensembles data \u00e0 comparer. Ainsi, les lignes de requ\u00eate suivantes peuvent rester inchang\u00e9es.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Ensuite, nous construisons deux requ\u00eates similaires appel\u00e9es dataset_1_only et dataset_2_only afin de conserver les lignes qui ne sont pr\u00e9sentes que dans dataset 1 (par rapport \u00e0 dataset 2) et vice-versa gr\u00e2ce \u00e0 l'op\u00e9rateur EXCEPT set.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Enfin, un champ compl\u00e9mentaire \"flag\" est d\u00e9fini afin d'identifier la source des lignes non concordantes provenant des deux requ\u00eates. Celles-ci sont ensuite rassembl\u00e9es par l'op\u00e9rateur UNION ALL.<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-7\"><p><em>Remarque : il est \u00e9vident qu'il faut disposer de deux ensembles data avec des champs comparables pour pouvoir utiliser la requ\u00eate ci-dessus.<\/em><\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Exemple illustr\u00e9<\/h2><\/div><div class=\"fusion-text fusion-text-8\"><p>Finie la th\u00e9orie, place \u00e0 la pratique ! Supposons qu'un supermarch\u00e9 veuille <strong>r\u00e9cup\u00e9rer tous les produits<\/strong> \u00e9ligibles \u00e0 une promotion dans un seul tableau en fonction de la semaine. <strong>conditions de promotion<\/strong>.<\/p>\n<p>Dans un premier temps, nous consid\u00e9rons que les produits ci-dessous sont en stock :<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27587%27%20height%3D%270%27%20viewBox%3D%270%200%20587%200%27%3E%3Crect%20width%3D%27587%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-12-30-1024x326.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 587px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"587\" height=\"auto\" \/><div class=\"fusion-text fusion-text-9\"><p>Selon les conditions de promotion de la semaine 1 :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>10% r\u00e9duction sur les \u00e9tag\u00e8res de fruits, l\u00e9gumes et herbes aromatiques<\/em><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-10\"><p>le dataset_1 est :<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27598%27%20height%3D%270%27%20viewBox%3D%270%200%20598%200%27%3E%3Crect%20width%3D%27598%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-15-18-1024x304.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 598px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"598\" height=\"auto\" \/><div class=\"fusion-text fusion-text-11\"><p>Au cours de la deuxi\u00e8me semaine, les conditions de promotion \u00e9voluent :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-3 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>10% r\u00e9duction sur les rayons fruits et glaces<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>20% r\u00e9duction sur le rayon des l\u00e9gumes<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>pas de remise sur le rayon des herbes<\/em><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-12\"><p>L'\u00e9volution du pipeline data g\u00e9n\u00e8re le dataset_2 :<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27598%27%20height%3D%270%27%20viewBox%3D%270%200%20598%200%27%3E%3Crect%20width%3D%27598%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-19-04-1024x302.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 598px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"598\" height=\"auto\" \/><div class=\"fusion-text fusion-text-13\"><p>Mais... sommes-nous vraiment s\u00fbrs des r\u00e9sultats ? Les questions les plus fr\u00e9quentes sont les suivantes :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-4 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Y a-t-il des lignes manquantes ou suppl\u00e9mentaires ?<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Les \u00e9volutions mises en \u0153uvre sont-elles correctes sur les terrains impact\u00e9s ?<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Y a-t-il une r\u00e9gression dans les champs non touch\u00e9s ?<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-14\"><p><em>Note : m\u00eame s'il semble facile de r\u00e9pondre intuitivement aux questions ci-dessus gr\u00e2ce \u00e0 un exemple trivial sur un petit ensemble data, dans les cas d'utilisation r\u00e9els, nous sommes g\u00e9n\u00e9ralement confront\u00e9s \u00e0 de nombreuses lignes et colonnes provenant de requ\u00eates complexes (transformations, jointures, agr\u00e9gats, fonctions de fen\u00eatres, ...), ce qui donne tout son sens \u00e0 cette requ\u00eate.<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-15\"><p>Essayons de r\u00e9pondre aux 3 questions en interpr\u00e9tant les r\u00e9sultats de la requ\u00eate de comparaison :<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27619%27%20height%3D%270%27%20viewBox%3D%270%200%20619%200%27%3E%3Crect%20width%3D%27619%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-24-02-1024x336.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 619px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"619\" height=\"auto\" \/><div class=\"fusion-text fusion-text-16\"><p>Observations :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-5 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Pas de rang\u00e9e avec des produits de l'\u00e9tag\u00e8re des fruits : <strong>amende<\/strong> en tant que \u201c<em>10% r\u00e9duction sur les fruits<\/em>\u201dest rest\u00e9e inchang\u00e9e de la semaine 1 \u00e0 la semaine 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Le Product_ID #50 (\u00e9tag\u00e8re \u00e0 herbes) est absent de dataset_2 (seulement dans dataset_1) : <strong>amende<\/strong> en tant que \u201c<em>r\u00e9duction sur les herbes<\/em>\u201da \u00e9t\u00e9 arr\u00eat\u00e9e la semaine 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Le Product_ID #60 (\u00e9tag\u00e8re \u00e0 glaces) est manquant dans dataset_1 (seulement dans dataset_2) : <strong>amende<\/strong> en tant que \u201c<em>r\u00e9duction sur les glaces<\/em>\u201da \u00e9t\u00e9 introduite la semaine 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Les lignes concernant les Product_ID #30 et #40 (rayon l\u00e9gumes) divergent sur le champ prix_r\u00e9duit : <strong>amende<\/strong> en tant que \u201c<em>r\u00e9duction sur les l\u00e9gumes<\/em>\u201dLe nombre d'heures de travail est pass\u00e9 de 10% la semaine 1 \u00e0 20% la semaine 2.<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-17\"><p>Finalement, tout se pr\u00e9sente bien :<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-6 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Y a-t-il des lignes manquantes ou suppl\u00e9mentaires ?<strong> Non !<\/strong><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Les \u00e9volutions mises en \u0153uvre sont-elles correctes sur les terrains impact\u00e9s ? <strong>Oui !<\/strong><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Y a-t-il une r\u00e9gression dans les champs non touch\u00e9s ? <strong>Non !<\/strong><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conseils et astuces<\/h2><\/div><div class=\"fusion-text fusion-text-18\"><p>Dans le CTE, m\u00eame si vous pouvez copier-coller votre requ\u00eate SQL d'ingestion avant\/apr\u00e8s l'\u00e9volution, il peut \u00eatre judicieux de <strong>stocker les r\u00e9sultats dans des tables temporaires<\/strong> pour simplifier la comparaison, am\u00e9liorer les performances de la requ\u00eate et b\u00e9n\u00e9ficier de la mise en cache si vous l'ex\u00e9cutez plusieurs fois.<\/p>\n<p>Dans le monde r\u00e9el, les divergences entre deux ensembles data peuvent \u00eatre d\u00e9sordonn\u00e9es. Dans ce cas, <strong>tri des r\u00e9sultats<\/strong> par cl\u00e9 (la granularit\u00e9 des datasets) et par drapeau peut grandement aider \u00e0 interpr\u00e9ter et \u00e0 comparer des lignes \u00e9quivalentes provenant des deux datasets.<\/p>\n<p>Afin de faciliter les enqu\u00eates et d'identifier l'origine de la lacune, les <strong>suspicieux datasets<\/strong> <strong>champs communs<\/strong> peuvent \u00eatre supprim\u00e9s (c'est-\u00e0-dire comment\u00e9s dans l'ETC) de la comparaison : s'il n'y a pas de r\u00e9sultat, cela signifie que tous les champs compar\u00e9s sont \u00e9gaux. Vous pouvez alors vous concentrer sur les champs suspects restants pour la prochaine comparaison et l'effectuer \u00e9tape par \u00e9tape.<\/p>\n<\/div><div class=\"fusion-text fusion-text-19\"><p><em>Note : une analyse \u00e9quivalente aurait pu \u00eatre effectu\u00e9e en utilisant la strat\u00e9gie LEFT JOIN mais elle aurait \u00e9t\u00e9 beaucoup plus difficile \u00e0 maintenir (gestion des NULLs et des comparaisons de champs) et moins efficace car les op\u00e9rateurs de set sont plus puissants que les jointures.<\/em><\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">R\u00e9sum\u00e9<\/h2><\/div><div class=\"fusion-text fusion-text-20\"><p>Cette requ\u00eate mod\u00e9lis\u00e9e permet donc aux d\u00e9veloppeurs de valider rapidement les changements apport\u00e9s aux pipelines data complexes. Elle est utile en compl\u00e9ment des tests unitaires plus traditionnels, et peut m\u00eame \u00eatre consid\u00e9r\u00e9e de mani\u00e8re plus g\u00e9n\u00e9rale lors de la comparaison de deux ensembles data ayant des structures similaires. Enfin, la logique de cette requ\u00eate est m\u00eame facile \u00e0 apprendre par c\u0153ur !<\/p>\n<\/div><div class=\"fusion-text fusion-text-21\"><p>Merci d'avoir lu, j'esp\u00e8re que c'\u00e9tait clair et je serais heureux d'entendre vos commentaires :)<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-bg-position:left center;--awb-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper lazyload fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"moyen\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Moyen Blog par Artefact.<\/h3><\/div><div class=\"fusion-text fusion-text-22\" style=\"--awb-content-alignment:center;\"><p>Cet article a \u00e9t\u00e9 initialement publi\u00e9 sur <strong>Medium.com<\/strong>.<br \/>\nSuivez-nous sur notre Medium Blog !<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/how-to-quickly-compare-two-datasets-using-a-generic-powerful-sql-query-7efe752332b7\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Lire notre article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>","protected":false},"excerpt":{"rendered":"<p>Un guide \u00e9tape par \u00e9tape pour faciliter la comparaison des ensembles datasets gr\u00e2ce \u00e0 un mod\u00e8le de langage de requ\u00eate structur\u00e9 pr\u00eat \u00e0 l'emploi.<\/p>","protected":false},"featured_media":68686,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-65787","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog\/65787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/media\/68686"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/media?parent=65787"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog-category?post=65787"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/fr\/wp-json\/wp\/v2\/blog-language?post=65787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}