	{"id":67163,"date":"2022-05-24T13:58:03","date_gmt":"2022-05-24T12:58:03","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=blog&#038;p=67163"},"modified":"2024-09-20T17:45:49","modified_gmt":"2024-09-20T16:45:49","slug":"a-manifesto-to-include-ml-engineer-in-your-data-science-projects-from-day-1","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/br\/blog\/a-manifesto-to-include-ml-engineer-in-your-data-science-projects-from-day-1\/","title":{"rendered":"A manifesto to include ML engineers in your data science projects from day 1"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 
class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/jeff-kayne.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Jeff Kayne<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\" style=\"--awb-text-transform:none;\"><p>Senior Data Scientist at Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 
1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/a-manifesto-to-include-ml-engineers-in-your-data-science-projects-from-day-1-2964fdd25036\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Read our article on<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" 
style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 
fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\" style=\"--awb-text-transform:none;\"><p>In this opinion piece, Jeffery explains that data scientists should focus on the accuracy of their models, while ML engineers should prioritize making sure those models can be used across the whole company. 
<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-5\" style=\"--awb-text-transform:none;\"><p>Many of us developers know the feeling of struggling to balance the time spent with users trying to understand their needs against the time spent developing software. This is even more evident in data science, because building an effective system requires a great deal of domain knowledge. 
Over the past two years, working as an ML engineer with different <a href=\"https:\/\/www.artefact.com\/br\/ai-technology\/data-science\/\">data science<\/a> teams, I have often wondered how to separate the responsibility of optimizing model accuracy from that of building all the software needed to make the model operational. My humble opinion is that data scientists should prioritize the accuracy of their models, while ML engineers should prioritize making sure those models can be used by the business as a whole.<\/p>\n<\/div><div class=\"fusion-text fusion-text-6\" style=\"--awb-text-transform:none;\"><p>As a general rule in data science projects, the more iterations you complete, the better. So let's look at why including an ML engineer from day one will help you iterate and therefore increase your chances of building a successful system. To cover every angle, we will break each reason down into the three main topics of data science projects:\u00a0<strong class=\"ko jb\">data, models\u00a0<\/strong>and<strong>\u00a0infrastructure<\/strong>.<\/p>\n<p>Before getting into it, let me define what I mean by\u00a0<strong>iteration<\/strong>. In this article, I am referring to end-to-end iterations of the complete product, often including data ingestion, preprocessing, model training and evaluation, infrastructure provisioning, etc. What I\u00a0<em>don't mean\u00a0<\/em>is a quick model iteration in a notebook where you tweak a single hyperparameter. 
If you are used to working in an agile framework, you can also think of these iterations as project sprints.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Reason 1: Accelerate the initial POC delivery<\/h2><\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"500\" height=\"750\" alt=\"Reason 1 ML Engineers : Accelerate the initial POC delivery\" title=\"article1\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article1.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article1.png\" class=\"lazyload img-responsive wp-image-67164\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27500%27%20height%3D%27750%27%20viewBox%3D%270%200%20500%20750%27%3E%3Crect%20width%3D%27500%27%20height%3D%27750%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article1-200x300.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article1-400x600.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article1.png 500w\" data-sizes=\"auto\" 
data-orig-sizes=\"(max-width: 640px) 100vw, 500px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-7\" style=\"--awb-text-transform:none;\"><p>Building a skeleton you can iterate on is the first priority, and it can be a time-consuming process. This skeleton is usually a POC that contains your initial baseline model and a demo of an application, or some other way to explore the model's output.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">An ML engineer will help with:<\/h3><\/div><div class=\"fusion-text fusion-text-8\" style=\"--awb-text-transform:none;\"><p><strong>Infrastructure:\u00a0<\/strong>Selecting compatible cloud resources (VMs, connections to various data sources) and designing the cloud architecture are among the ML engineer's first considerations.<\/p>\n<p><strong>Data:\u00a0<\/strong>Getting the data needed to start building a model, and ensuring data is available from multiple sources, with the option of developing new data flows if needed.<\/p>\n<p><strong>Models:<\/strong>\u00a0Making sure the models being tested are actually compatible with the proposed cloud architecture for model deployment and with the technical requirements (e.g., latency, required compute, production environment constraints).<\/p>\n<p>The ML engineer can also help in this phase by setting up software engineering best practices: version control, linting, code architecture, testing, etc.<\/p>\n<\/div><div class=\"fusion-title title 
fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Reason 2: Accelerate each iteration<\/h2><\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"659\" height=\"500\" alt=\"Reason 2 ML Engineers : Accelerate each iteration\" title=\"article2\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2.png\" class=\"lazyload img-responsive wp-image-67166\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27659%27%20height%3D%27500%27%20viewBox%3D%270%200%20659%20500%27%3E%3Crect%20width%3D%27659%27%20height%3D%27500%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2-200x152.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2-400x303.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2-600x455.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article2.png 659w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 659px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-9\" 
style=\"--awb-text-transform:none;\"><p>Once you have this initial build, the first iterations tend to be difficult and slow. Speeding up iterations allows smaller iterations with a single feature change, which is a much more effective way to develop than changing many things in a model before getting feedback.<\/p>\n<p><strong>Infrastructure:<\/strong>\u00a0Time can be saved by optimizing the storage and compute infrastructure. During these iterations, an ML engineer can look to version the infrastructure itself with Infrastructure as Code (IaC) tools such as\u00a0<a class=\"au kl\" href=\"https:\/\/www.terraform.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Terraform<\/a>. Using IaC makes it possible to automate infrastructure deployment directly from CI\/CD pipelines, speeding up both the integration of any changes needed to the existing infrastructure and the creation of separate cloud environments (development, staging, production). In addition, cloud-specific components can speed up your workflow, for example building images remotely with GCP's\u00a0<a class=\"au kl\" href=\"https:\/\/cloud.google.com\/build\" target=\"_blank\" rel=\"noopener ugc nofollow\">Cloud Build<\/a>.<\/p>\n<\/div><div class=\"fusion-text fusion-text-10\" style=\"--awb-text-transform:none;\"><p><strong>Data:\u00a0<\/strong>Preprocessing pipelines are often built quickly by data science teams so that modeling can start fast. An ML engineer can help at this stage by streamlining your processing queries, whether they are written in SQL, pandas, PySpark, etc. 
Doing this early on can save a lot of time across iterations in the long run, because this code runs\u00a0<em class=\"lk\">a lot<\/em>.<\/p>\n<p><strong>Models:\u00a0<\/strong>Complex model architectures can make training time-consuming. Moreover, when a data scientist talks about a \u201cmodel\u201d, they may actually mean a group of 100 models trained on different slices of the data, each with a SHAP explainer to derive feature importance. An ML engineer can focus on how to parallelize the training pipeline, either on a single VM with Python multiprocessing or by distributing the workload across several nodes in the cloud. This can take a few iterations, but big gains are possible here with surprisingly little effort. Automating your model deployment with a\u00a0<a class=\"au kl\" href=\"https:\/\/cloud.google.com\/architecture\/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning\" target=\"_blank\" rel=\"noopener ugc nofollow\">CI\/CD\/CT<\/a>\u00a0pipeline also greatly speeds up iterations and ensures repeatability.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Reason 3: Reduce the cost of each iteration<\/h2><\/div><div class=\"fusion-image-element\" 
style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><img decoding=\"async\" width=\"576\" height=\"433\" alt=\"Reason 3 ML Engineers : Reduce the cost of each iteration\" title=\"article3\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article3.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article3.png\" class=\"lazyload img-responsive wp-image-67167\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27576%27%20height%3D%27433%27%20viewBox%3D%270%200%20576%20433%27%3E%3Crect%20width%3D%27576%27%20height%3D%27433%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article3-200x150.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article3-400x301.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article3.png 576w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 576px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-11\" style=\"--awb-text-transform:none;\"><p>Having an engineer keep an eye on your cloud project's budget is essential, especially for data-intensive applications.<\/p>\n<p><strong>Infrastructure:\u00a0<\/strong>Cost is an important variable when choosing infrastructure. 
Once the infrastructure has been chosen, budget alerts can be set up so that expensive components are closely monitored.<\/p>\n<p><strong>Data:\u00a0<\/strong>Smart querying and data storage can also significantly reduce the cost of each iteration. For example, data aggregations should be performed sparingly during model iterations.<\/p>\n<p><strong>Models:\u00a0<\/strong>Parallelizing the training pipeline can also reduce the uptime of expensive machines or the execution time of serverless components.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-one\" style=\"--awb-margin-bottom-small:8px;\"><h1 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:70;line-height:1;\">Reason 4: Ensure the repeatability and interpretability of each iteration<\/h1><\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-5 hover-type-none\"><img decoding=\"async\" width=\"557\" height=\"499\" alt=\"Reason 4 ML Engineers : Ensure repeatability &amp; interpretability of each iteration\" title=\"article4\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article4.jpeg\" 
data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article4.jpeg\" class=\"lazyload img-responsive wp-image-67168\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27557%27%20height%3D%27499%27%20viewBox%3D%270%200%20557%20499%27%3E%3Crect%20width%3D%27557%27%20height%3D%27499%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article4-200x179.jpeg 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article4-400x358.jpeg 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article4.jpeg 557w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 557px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-12\" style=\"--awb-text-transform:none;\"><p>Achieving fast iterations with a quality feedback loop in your project is great, but if you cannot replay each of these scenarios, they will be of little use. Having a repeatable pipeline implicitly means you need some way of tracking pipeline runs, so that you can identify runs by specific parameters or performance metrics and roll back to them if necessary. Setting this up robustly during development lets data scientists experiment freely (no more infamous Untitled12.ipynb) and prepares the pipeline for production monitoring.<\/p>\n<\/div><div class=\"fusion-text fusion-text-13\" style=\"--awb-text-transform:none;\"><p><strong>Infrastructure:\u00a0<\/strong>Linking the version of the training code to the version of the infrastructure code is the \u201cextra mile\u201d here, but it is required to provide full rollback to a previous run. 
Ensuring repeatability and monitoring at the level of each pipeline run is, to me, the first essential level of ML Ops that teams should aim for. Cloud platforms offer services for this (GCP's\u00a0<a class=\"au kl\" href=\"https:\/\/cloud.google.com\/vertex-ai\/?utm_source=google&amp;utm_medium=cpc&amp;utm_campaign=emea-fr-all-en-dr-skws-all-all-trial-e-gcp-1011340&amp;utm_content=text-ad-none-any-DEV_c-CRE_574683660431-ADGP_Hybrid%20%7C%20SKWS%20-%20EXA%20%7C%20Txt%20~%20%20AI%20%26%20ML%20~%20Vertex%20AI-KWID_43700066526085663-kwd-1445093214164-userloc_9054940&amp;utm_term=KW_vertex%20ai%20platform-NET_g-PLAC_&amp;gclid=Cj0KCQjwspKUBhCvARIsAB2IYutwLFuAu35tUPpdvJPb0xE-lKIgAK07C6YgpP_QQW6kF_2vV9VIky8aApfBEALw_wcB&amp;gclsrc=aw.ds\" target=\"_blank\" rel=\"noopener ugc nofollow\">Vertex AI<\/a>\u00a0for example) that can be quick to set up, but you may also consider a \u201cbest-of-breed\u201d approach using open-source tools. The trade-off here is balancing the greater functionality of specific open-source tools against the added complexity of the overall system infrastructure.<\/p>\n<p><strong>Data:\u00a0<\/strong>Save all data objects at each stage of the pipeline. Depending on volumes, the priority is to save the training\/test sets of each run.<\/p>\n<p><strong>Models:\u00a0<\/strong>As above, save every model for each run along with all the necessary parameters and metrics. 
Another tip is to log a comment with each run describing what changed in that specific run, so that every experiment during development is recorded, just as you would with a\u00a0<em>git commit<\/em>\u00a0message.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-one\" style=\"--awb-margin-bottom-small:8px;\"><h1 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:70;line-height:1;\">Reason 5: Avoid at all costs the hectic \u201cindustrialisation\u201d iterations<\/h1><\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-6 hover-type-none\"><img decoding=\"async\" width=\"700\" height=\"445\" alt=\"Reason 5 ML Engineers : Avoid at all costs the hectic \u201cindustrialisation\u201d iterations\" title=\"article5\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5.png\" class=\"lazyload img-responsive wp-image-67169\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27445%27%20viewBox%3D%270%200%20700%20445%27%3E%3Crect%20width%3D%27700%27%20height%3D%27445%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" 
data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5-200x127.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5-400x254.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5-600x381.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2022\/05\/article5.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 700px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-14\" style=\"--awb-text-transform:none;\"><p>When data science first emerged, it was highly exploratory. Once a model had shown good performance on historical data, deploying it required a major effort from a team of software engineers. This \u201cindustrialisation\u201d phase is very painful because the development environment (flat files and Python notebooks) is very different from the production environment (automated data flows running on production data, inside a production coding environment). The most successful projects I have worked on were those where we replicated the production environment as closely as possible in development from the very start. This shortens the time to production and lets you iterate safely in development, deploying to production whenever you are satisfied with an iteration.<\/p>\n<p><strong>Infrastructure:\u00a0<\/strong>Emulating the required production infrastructure in development is not always easy and can be expensive. 
This is where infrastructure as code comes in handy: it lets you switch easily between environments.<\/p>\n<\/div><div class=\"fusion-text fusion-text-15\" style=\"--awb-text-transform:none;\"><p><strong>Data: <\/strong>One thing that sets data science development apart from traditional software engineering, and even from data engineering, is that data scientists need production data during development. For regular data engineering, working on sandbox data (excluding some data or adding synthetic test data) is good practice during development. For data science, however, it can be a big waste of time and can significantly affect the whole data science pipeline, because models and features tuned on incomplete or synthetic data may behave differently on real data. Read-only access to production tables is therefore something you should negotiate with your data team from day one.<\/p>\n<p><strong>Models:\u00a0<\/strong>From the start of the project, only one model (or modelling approach) should be present in your production code. All experiments should live in notebooks or in separate temporary scripts in another folder. 
This keeps dead code out of your production codebase and makes maintenance and the onboarding of other developers easier.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conclusion<\/h2><\/div><div class=\"fusion-text fusion-text-16\" style=\"--awb-text-transform:none;\"><p>In summary, building models and building the software that wraps those models should both be priorities from the start of every project. Having separate workstreams with different responsibilities can help teams focus on both in parallel. The role of the ML engineer evolves every day, and I would love to hear your thoughts on anything I may have missed!<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center\" 
style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-bg-position:left center;--awb-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper lazyload fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-7 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"m\u00e9dio\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" 
data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Medium Blog by Artefact.<\/h3><\/div><div class=\"fusion-text fusion-text-17\" style=\"--awb-content-alignment:center;\"><p>This article was originally published on <strong>Medium.com<\/strong>.<br \/>\nFollow us on our Medium Blog!<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/a-manifesto-to-include-ml-engineers-in-your-data-science-projects-from-day-1-2964fdd25036\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Read our article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>","protected":false},"excerpt":{"rendered":"<p>Jeffrey Kane, Senior Data Scientist, explains why ML engineers should be part of your data science projects from day one.<\/p>","protected":false},
"featured_media":68674,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-67163","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog\/67163","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media\/68674"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media?parent=67163"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-category?post=67163"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-language?post=67163"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}