	{"id":65787,"date":"2022-01-18T09:50:59","date_gmt":"2022-01-18T09:50:59","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=65787"},"modified":"2024-09-20T17:45:48","modified_gmt":"2024-09-20T16:45:48","slug":"how-to-quickly-compare-two-datasets-using-a-generic-and-powerful-sql-query","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/br\/blog\/how-to-quickly-compare-two-datasets-using-a-generic-and-powerful-sql-query\/","title":{"rendered":"Como comparar rapidamente dois datasets usando uma consulta SQL gen\u00e9rica e avan\u00e7ada"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Autor<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/1560114886665.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">K\u00e9vin Lef\u00e8vre<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Data Desenvolvedor no Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/scoring-customer-propensity-using-machine-learning-models-on-google-analytics-data-ba1126469c1f\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Leia nosso artigo sobre<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"M\u00e9dio Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\"><p>Em suas atividades di\u00e1rias, os analistas do Data Engineer e do Data precisam aprimorar os processos de ingest\u00e3o do data. Al\u00e9m dos testes unit\u00e1rios usuais, pode ser interessante comparar de forma f\u00e1cil e r\u00e1pida dois conjuntos de data (ou seja, tabelas, visualiza\u00e7\u00f5es, consultas etc.) para diferentes finalidades, como an\u00e1lise de impacto ou teste de n\u00e3o regress\u00e3o. Al\u00e9m disso, a identifica\u00e7\u00e3o de discrep\u00e2ncias entre dois pontos de vista de uma tabela de instant\u00e2neos \u00e9 bastante \u00fatil para an\u00e1lise adhoc ou depura\u00e7\u00e3o.<\/p>\n<p>Com isso em mente e para os profissionais de SQL, um script pr\u00e9-constru\u00eddo e reutiliz\u00e1vel faz sentido, da\u00ed o objetivo deste artigo.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Explica\u00e7\u00e3o do modelo SQL pronto para uso<\/h2><\/div><div class=\"fusion-text fusion-text-5\"><p>Vamos ao cerne da quest\u00e3o com a consulta esperada:<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27619%27%20height%3D%270%27%20viewBox%3D%270%200%20619%200%27%3E%3Crect%20width%3D%27619%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-00-45-1024x789.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 619px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"619\" height=\"auto\" \/><div class=\"fusion-text fusion-text-6\"><p>Agora, vamos explicar isso:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Primeiro, usamos o CTE - Common Table Expressions (WITH ... ASstatements) para identificar os dois datasets a serem comparados. Dessa forma, as linhas de consulta a seguir podem permanecer inalteradas.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Em seguida, criamos duas consultas semelhantes com os apelidos dataset_1_only e dataset_2_only para manter as linhas que est\u00e3o presentes apenas no dataset 1 (em compara\u00e7\u00e3o com o dataset 2) e vice-versa, gra\u00e7as ao operador de conjunto EXCEPT.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Por fim, um campo complementar de sinaliza\u00e7\u00e3o \u00e9 definido para identificar a origem das linhas incompat\u00edveis de ambas as consultas. Em seguida, elas s\u00e3o reunidas por meio do operador de conjunto UNION ALL.<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-7\"><p><em>Observa\u00e7\u00e3o: obviamente, ter dois datasets com campos compar\u00e1veis \u00e9 um pr\u00e9-requisito para usar a consulta acima<\/em><\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Exemplo ilustrado<\/h2><\/div><div class=\"fusion-text fusion-text-8\"><p>Chega de teoria, agora vamos \u00e0 pr\u00e1tica! Suponha que um supermercado queira <strong>recuperar todos os produtos<\/strong> eleg\u00edveis para uma promo\u00e7\u00e3o em uma \u00fanica tabela, de acordo com a semana <strong>condi\u00e7\u00f5es de promo\u00e7\u00e3o<\/strong>.<\/p>\n<p>Como declara\u00e7\u00e3o inicial, consideramos os produtos abaixo em estoque:<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27587%27%20height%3D%270%27%20viewBox%3D%270%200%20587%200%27%3E%3Crect%20width%3D%27587%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-12-30-1024x326.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 587px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"587\" height=\"auto\" \/><div class=\"fusion-text fusion-text-9\"><p>De acordo com as condi\u00e7\u00f5es da promo\u00e7\u00e3o na semana 1:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>10% desconto em prateleiras de frutas, legumes e ervas<\/em><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-10\"><p>o dataset_1 \u00e9:<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27598%27%20height%3D%270%27%20viewBox%3D%270%200%20598%200%27%3E%3Crect%20width%3D%27598%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-15-18-1024x304.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 598px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"598\" height=\"auto\" \/><div class=\"fusion-text fusion-text-11\"><p>Na semana 2, as condi\u00e7\u00f5es de promo\u00e7\u00e3o evoluem para:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-3 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>10% desconto nas prateleiras de frutas e sorvetes<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>20% desconto na prateleira de legumes<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>sem desconto na prateleira de ervas<\/em><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-12\"><p>A evolu\u00e7\u00e3o do pipeline data gera o dataset_2:<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27598%27%20height%3D%270%27%20viewBox%3D%270%200%20598%200%27%3E%3Crect%20width%3D%27598%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-19-04-1024x302.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 598px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"598\" height=\"auto\" \/><div class=\"fusion-text fusion-text-13\"><p>Mas... ser\u00e1 que estamos realmente confiantes com os resultados? As perguntas t\u00edpicas s\u00e3o:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-4 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>H\u00e1 alguma linha faltando ou extra?<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>As evolu\u00e7\u00f5es implementadas est\u00e3o corretas nos campos impactados?<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>H\u00e1 alguma regress\u00e3o em campos n\u00e3o impactados?<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-14\"><p><em>Observa\u00e7\u00e3o: mesmo que pare\u00e7a f\u00e1cil responder \u00e0s perguntas acima intuitivamente devido a um exemplo trivial em um pequeno conjunto de data, em casos de uso real, geralmente enfrentamos muitas linhas e colunas de consultas complexas (transforma\u00e7\u00f5es, uni\u00f5es, agregados, fun\u00e7\u00f5es de janelas, etc.), o que faz com que essa consulta assuma todo o seu significado<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-15\"><p>Vamos tentar responder \u00e0s tr\u00eas perguntas interpretando os resultados da consulta de compara\u00e7\u00e3o:<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27619%27%20height%3D%270%27%20viewBox%3D%270%200%20619%200%27%3E%3Crect%20width%3D%27619%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/01\/How-to-quickly-compare-two-datasets-using-a-generic-powerful-SQL-query-by-Ke\u0301vin-LEFEVRE-Artefact-Engineering-and-Data-Science-Jan-2022-Medium-18-01-2022-10-24-02-1024x336.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 619px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"619\" height=\"auto\" \/><div class=\"fusion-text fusion-text-16\"><p>Observa\u00e7\u00f5es:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-5 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Nenhuma fila com produtos da prateleira de frutas: <strong>bom<\/strong> como o \u201c<em>10% desconto em frutas<\/em>\u201d permaneceu inalterado da semana 1 para a semana 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>O Product_ID #50 (prateleira de ervas) est\u00e1 faltando no dataset_2 (somente no dataset_1): <strong>bom<\/strong> como o \u201c<em>desconto em ervas<\/em>\u201d foi interrompido na semana 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>O Product_ID #60 (prateleira de sorvetes) est\u00e1 faltando no dataset_1 (somente no dataset_2): <strong>bom<\/strong> como o \u201c<em>desconto em sorvetes<\/em>\u201d foi introduzido na semana 2<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>As linhas referentes ao Product_ID #30 e #40 (prateleira de legumes) est\u00e3o divergindo no campo reduced_price: <strong>bom<\/strong> como o \u201c<em>desconto em vegetais<\/em>\u201d aumentou de 10% na semana 1 para 20% na semana 2<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-17\"><p>Finalmente, tudo parece bem:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-6 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>H\u00e1 alguma linha faltando ou extra?<strong> N\u00e3o!<\/strong><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>As evolu\u00e7\u00f5es implementadas est\u00e3o corretas nos campos impactados? <strong>Sim!<\/strong><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>H\u00e1 alguma regress\u00e3o em campos n\u00e3o impactados? <strong>N\u00e3o!<\/strong><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Dicas e truques<\/h2><\/div><div class=\"fusion-text fusion-text-18\"><p>No CTE, mesmo que o senhor possa copiar e colar sua consulta SQL de ingest\u00e3o antes\/depois da evolu\u00e7\u00e3o, pode ser uma boa ideia <strong>armazenar os resultados em tabelas tempor\u00e1rias<\/strong> para simplificar a compara\u00e7\u00e3o, melhorar o desempenho da consulta e se beneficiar do armazenamento em cache se o senhor a executar v\u00e1rias vezes.<\/p>\n<p>No mundo real, as diverg\u00eancias entre dois conjuntos data podem ser confusas. Ent\u00e3o, <strong>Classifica\u00e7\u00e3o de resultados<\/strong> por chave (a granularidade dos datasets) e sinalizador pode ajudar enormemente a interpretar e comparar linhas equivalentes provenientes de ambos os datasets.<\/p>\n<p>Para facilitar as investiga\u00e7\u00f5es e identificar a origem da lacuna, o <strong>suspeitos datasets<\/strong> <strong>campos comuns<\/strong> podem ser removidos (ou seja, comentados no CTE) da compara\u00e7\u00e3o: se n\u00e3o houver resultado, isso significa que todos os campos comparados s\u00e3o iguais. O senhor pode ent\u00e3o se concentrar apenas nos campos suspeitos restantes para a pr\u00f3xima compara\u00e7\u00e3o e faz\u00ea-la passo a passo.<\/p>\n<\/div><div class=\"fusion-text fusion-text-19\"><p><em>Observa\u00e7\u00e3o: uma an\u00e1lise equivalente poderia ter sido realizada usando a estrat\u00e9gia LEFT JOIN, mas seria muito mais dif\u00edcil de manter (gerenciamento de NULLs e compara\u00e7\u00e3o de campos) e menos eficiente, pois os operadores de conjunto s\u00e3o mais poderosos do que as jun\u00e7\u00f5es.<\/em><\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Resumo<\/h2><\/div><div class=\"fusion-text fusion-text-20\"><p>Portanto, essa consulta de modelo facilita para os desenvolvedores a valida\u00e7\u00e3o r\u00e1pida de altera\u00e7\u00f5es em pipelines complexos do data. Ela \u00e9 \u00fatil como complemento para testes de unidade mais tradicionais e pode at\u00e9 ser considerada de forma mais geral ao comparar dois conjuntos de data com estruturas semelhantes. Por \u00faltimo, mas n\u00e3o menos importante, a l\u00f3gica dessa consulta \u00e9 at\u00e9 f\u00e1cil de aprender de cor!<\/p>\n<\/div><div class=\"fusion-text fusion-text-21\"><p>Obrigado por ler, espero que tenha sido claro e ficaria feliz em ouvir seus coment\u00e1rios :)<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-bg-position:left center;--awb-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper lazyload fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"m\u00e9dio\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">M\u00e9dia Blog por Artefact.<\/h3><\/div><div class=\"fusion-text fusion-text-22\" style=\"--awb-content-alignment:center;\"><p>Este artigo foi publicado inicialmente no <strong>Medium.com<\/strong>.<br \/>\nSiga-nos em nosso Medium Blog !<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/how-to-quickly-compare-two-datasets-using-a-generic-powerful-sql-query-7efe752332b7\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Leia nosso artigo<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>","protected":false},"excerpt":{"rendered":"<p>Um guia passo a passo para facilitar a compara\u00e7\u00e3o de datasets por meio de um modelo pronto para uso de Structured Query Language (Linguagem de Consulta Estruturada)<\/p>","protected":false},"featured_media":68686,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-65787","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog\/65787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media\/68686"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media?parent=65787"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-category?post=65787"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-language?post=65787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}