	{"id":64449,"date":"2021-10-26T16:06:03","date_gmt":"2021-10-26T15:06:03","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=64449"},"modified":"2024-09-20T17:45:46","modified_gmt":"2024-09-20T16:45:46","slug":"serving-ml-models-at-scale-using-mlflow-on-kubernetes-part-2-2","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/br\/blog\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-part-2-2\/","title":{"rendered":"Serving ML models at scale using Mlflow on Kubernetes - Part 3"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left 
fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/kais-laribi.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Kais Laribi<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Senior Data Scientist at Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center 
fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Read our article on<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe 
imageframe-none imageframe-1 hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 
fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\"><p>This article is the third part of a series in which we walk through the process of logging models with Mlflow, serving them on Kubernetes Engine, and finally scaling them to match our application's needs. Although this article can be used on its own to test the responses of any API, we recommend reading our two previous articles (part 1 and part 2) on how to deploy a tracking instance and serve a model as an API with Mlflow. 
In what follows, we turn to the question of scaling and address it through a few experiments, to understand how the k8s cluster behaves and to give recommendations for handling high loads.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-one\" style=\"--awb-margin-bottom-small:8px;\"><h1 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:70;line-height:1;\">Part 3 - How to handle high loads and make our application scalable?<\/h1><\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" 
style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Introduction<\/h2><\/div><div class=\"fusion-text fusion-text-5\"><p>In a classic scenario where a machine learning model is deployed behind an application or product, several users may interact with it simultaneously to generate predictions. It is therefore essential to analyze our infrastructure resources and size them accordingly. This is particularly relevant with Kubernetes, because it can affect decisions such as whether to use autoscaling and the maximum number of nodes to allow.<\/p>\n<\/div><div class=\"fusion-text fusion-text-6\"><p>In this context, load tests make it possible to simulate varying numbers of simultaneous or incremental requests and to monitor how the infrastructure behaves (response time, CPU usage, memory usage...) in order to size resources correctly and avoid bottlenecks. 
Here, these tests are run with a tool called Locust.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Preparing the environment<\/h2><\/div><div class=\"fusion-text fusion-text-7\"><p>The requirements for this hands-on are detailed in the first article of this series. As a recap, here are the main elements we need specifically for this part, assuming our model is already deployed as an API on a Kubernetes cluster (mlflow-k8s).<\/p>\n<p>For this part of the hands-on, we will need:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">A GKE cluster to deploy Locust (here we will call it\u00a0<em class=\"jz\">teste_de_carga<\/em>)<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">A configured local workstation (gcloud, kubectl)<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">The following environment variable, exported<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"c4c0\" class=\"ej lc ii 
dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">export GCR_REPO=eu.gcr.io\/mlflow-on-k8s\/repo<\/span><\/pre>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\"><a class=\"bv ig\" href=\"https:\/\/github.com\/artefactory-global\/mlflow-serving-example\" target=\"_blank\" rel=\"noopener ugc nofollow\">The repository<\/a>\u00a0containing the hands-on code<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Deployment<\/h2><\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">1. 
Build the Locust docker image and push it to GCR<\/h3><\/div><div class=\"fusion-text fusion-text-8\"><pre class=\"hp hq hr hs ht lb gv be\"><span id=\"df7b\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">cd mlflow-serving-example<\/span><span id=\"f519\" class=\"ej lc ii dm ld b le lk ll lm ln lo lg s lh\" data-selectable-paragraph=\"\">docker build --tag ${GCR_REPO}\/locust-tasks:v1 --file dockerfile_locust .<\/span><span id=\"dcc8\" class=\"ej lc ii dm ld b le lk ll lm ln lo lg s lh\" data-selectable-paragraph=\"\">docker push ${GCR_REPO}\/locust-tasks:v1<\/span><\/pre>\n<\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">2. Prepare the test task<\/h3><\/div><div class=\"fusion-text fusion-text-9\"><p>Tasks are Python functions that Locust runs on its workers as part of the load test. In the example code provided in\u00a0<em>locust-tasks\/tasks.py<\/em>, we only need to send a POST request to the API with one row of data to get predictions.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27667%27%20height%3D%270%27%20viewBox%3D%270%200%20667%200%27%3E%3Crect%20width%3D%27667%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/Capture-de\u0301cran-2021-10-26-a\u0300-18.11.10.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 667px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"667\" height=\"auto\" \/><div class=\"fusion-text fusion-text-10\"><p>In this 
code snippet:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em><strong>on_start<\/strong><strong>:<\/strong><\/em><em class=\"jz\"> is\u00a0<\/em>executed only once, when the thread starts, to download the dataset.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\"><em><strong>post_metrics<\/strong><\/em>: a function that sends one row to the \/invocations endpoint.<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-11\"><p>We can create as many functions as there are tests we want to run. For example, we could add one that sends batches of data. We can also use the\u00a0<strong>@task()<\/strong>\u00a0decorator to weight the different tasks, so that some run more often than others.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">3. Deploy on Kubernetes<\/h3><\/div><div class=\"fusion-text fusion-text-12\"><p>It is now time to deploy the image and run Locust on its dedicated cluster. 
First, make sure the context is set to\u00a0<em>teste_de_carga<\/em>\u00a0by running<\/p>\n<\/div><div class=\"fusion-text fusion-text-13\"><p>kubectl config get-contexts<br \/>\nkubectl config use-context NAME<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27667%27%20height%3D%270%27%20viewBox%3D%270%200%20667%200%27%3E%3Crect%20width%3D%27667%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-2.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 667px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"667\" height=\"auto\" \/><div class=\"fusion-text fusion-text-14\"><p>Next, we can update our deployment file\u00a0<em>deployments\/locust_load_test.yaml\u00a0<\/em>by specifying\u00a0<strong>the image path in GCR<\/strong>\u00a0and<strong class=\"jh lj\">\u00a0<\/strong>pointing\u00a0<strong>TARGET_HOST<\/strong>\u00a0to the API address.<\/p>\n<\/div><div class=\"fusion-text fusion-text-15\"><div class=\"code\"><pre>kind: ReplicationController\napiVersion: v1\nmetadata:\n  name: locust-master\n  labels:\n    name: locust\n    role: master\nspec:\n  replicas: 1\n  selector:\n    name: locust\n    role: master\n  template:\n    metadata:\n      labels:\n        name: locust\n        role: master\n    spec:\n      containers:\n        - name: locust\n          image: GCR_REPO\/locust-tasks:v1 # Change here\n          env:\n            - name: LOCUST_MODE\n              value: master\n            - name: TARGET_HOST\n              value: 'http:\/\/SERVING_IP:SERVING_PORT' # Change here\n          ports:\n            - name: loc-master-web\n              containerPort: 8089\n              protocol: TCP\n            - name: loc-master-p1\n              containerPort: 5557\n              protocol: TCP\n            - name: loc-master-p2\n              containerPort: 5558\n              protocol: TCP\n---\nkind: ReplicationController\napiVersion: v1\nmetadata:\n  name: locust-worker\n  labels:\n    name: locust\n    role: worker\nspec:\n  replicas: 30\n  selector:\n    name: locust\n    role: worker\n  template:\n    metadata:\n      labels:\n        name: locust\n        role: worker\n    spec:\n      containers:\n        - name: locust\n          image: GCR_REPO\/locust-tasks:v1 # Change here\n          env:\n            - name: LOCUST_MODE\n              value: worker\n            - name: LOCUST_MASTER\n              value: locust-master\n            - name: TARGET_HOST\n              value: 'http:\/\/SERVING_IP:SERVING_PORT' # Change here\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: locust-master\n  labels:\n    name: locust\n    role: master\nspec:\n  ports:\n    - port: 8089\n      targetPort: loc-master-web\n      protocol: TCP\n      name: loc-master-web\n    - port: 5557\n      targetPort: loc-master-p1\n      protocol: TCP\n      name: loc-master-p1\n    - port: 5558\n      targetPort: loc-master-p2\n      protocol: TCP\n      name: loc-master-p2\n  selector:\n    name: locust\n    role: master\n  type: LoadBalancer<\/pre><\/div>\n<\/div>\n<div class=\"fusion-text fusion-text-16\"><p>Finally, let's deploy it using the following command.<\/p>\n<pre class=\"hp hq hr 
hs ht lb gv be\"><span id=\"30de\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">kubectl create -f deployments\/locust_load_test.yaml<\/span><\/pre>\n<p>The Locust instance should now be running, and a new load balancer should have been created. We can find its IP by running <em>kubectl get services\u00a0<\/em>and access the interface at LoadbalancerIP:8089.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27701%27%20height%3D%270%27%20viewBox%3D%270%200%20701%200%27%3E%3Crect%20width%3D%27701%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-3.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 701px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"701\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Experimentation<\/h2><\/div><div class=\"fusion-text fusion-text-17\"><p>The idea is to use Locust to simulate parallel requests against our serving API and to analyze the cluster's behavior and the response time (median in green and 95th percentile in orange). 
This is done for educational purposes, to highlight two features that Kubernetes offers: horizontal and vertical (auto)scaling.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">1. Manual scaling<\/h3><\/div><div class=\"fusion-text fusion-text-18\"><p>In the first experiment, we try to understand the effect of\u00a0<strong>having more pods <\/strong>serving our models. We start with one pod and try increasing the number of requests. In the chart below, we can distinguish 4 phases with different configurations and loads.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27707%27%20height%3D%270%27%20viewBox%3D%270%200%20707%200%27%3E%3Crect%20width%3D%27707%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-4.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 707px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"707\" height=\"auto\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27686%27%20height%3D%270%27%20viewBox%3D%270%200%20686%200%27%3E%3Crect%20width%3D%27686%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-5.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 686px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 
71%; overflow: hidden;\" width=\"686\" height=\"auto\" \/><div class=\"fusion-text fusion-text-19\"><p>As a general conclusion, we can see that it is important to always monitor resource metrics (CPU, RAM...) to identify bottlenecks and configuration problems. In our case, having only one pod did not let us take advantage of the available processing power. When deploying an application, it is therefore essential to set an adequate number of pods and enough resources per pod to maximize machine usage, while leaving room for the system services running in the background. For that reason, we recommend not pushing node CPU usage above 80-90%.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">2. Horizontal autoscaling<\/h3><\/div><div class=\"fusion-text fusion-text-20\"><p>Fortunately, Kubernetes has a\u00a0<strong>horizontal autoscaling feature<\/strong>\u00a0that automatically monitors CPU usage and creates new pods when needed to distribute the load. 
It can be enabled simply with the following command.<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"a7cd\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">kubectl autoscale deployment mlflow-serving --cpu-percent=80 --min=1 --max=12<\/span><\/pre>\n<p>We can then monitor the number and state of the pods using\u00a0<em>kubectl get hpa mlflow-serving<\/em>, and analyze the cluster's response time and resource consumption.<br \/>\nThe goal of the following experiment is to observe how Kubernetes can add pods automatically to optimize resource usage and improve response time. We can split this experiment into three phases, as shown in the chart below.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27686%27%20height%3D%270%27%20viewBox%3D%270%200%20686%200%27%3E%3Crect%20width%3D%27686%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-6.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 686px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"686\" height=\"auto\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27671%27%20height%3D%270%27%20viewBox%3D%270%200%20671%200%27%3E%3Crect%20width%3D%27671%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-7.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 671px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"671\" height=\"auto\" \/><div class=\"fusion-text 
fusion-text-21\"><p>In this second experiment, horizontal autoscaling reduced the response time by creating new pods and allocating more of the cluster's resources. However, once the cluster reaches its capacity (phase 3), new pods remain in a Pending state and the response time increases again.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">3. Vertical autoscaling<\/h3><\/div><div class=\"fusion-text fusion-text-22\"><p>In that situation, we can use another Kubernetes feature, referred to here as\u00a0<strong>vertical autoscaling\u00a0<\/strong>(the cluster autoscaler in GKE terminology), which allocates more nodes whenever they are needed. It can be enabled with the following command, which specifies the minimum and maximum number of nodes that Kubernetes can allocate.<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"7f33\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">gcloud container clusters update mlflow-k8s \\\n--enable-autoscaling --min-nodes 3 --max-nodes 5 --node-pool POOL_NAME<\/span><\/pre>\n<p>Finally, in this last experiment, summarized in the graph below, enabling vertical autoscaling allowed Kubernetes to automatically add two new nodes and create new pods to spread the load and keep the response time low. Kubernetes took about 1 minute to detect the need and create the resources (phase 2). 
In addition, under lower load (phase 3), Kubernetes released the two extra nodes by removing pods and scaled the cluster back down to its minimum of three nodes in about 15 minutes.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27661%27%20height%3D%270%27%20viewBox%3D%270%200%20661%200%27%3E%3Crect%20width%3D%27661%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-8.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 661px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"661\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">4. Cluster size estimation<\/h3><\/div><div class=\"fusion-text fusion-text-23\"><p>Now that we understand how Kubernetes behaves under different load levels with horizontal and vertical autoscaling, the final step is to run performance tests with different amounts of resources, taking into account our application's requirements and the estimated number of users. Suppose that, to meet our SLA, the 95th percentile response time must be below 1 second. 
In that case, we can plot the graph below, which shows the API response time for different numbers of cores, to get an idea of how our application performs under different conditions.<\/p>\n<p>In particular, for our ML model served with Mlflow, a 12-core Kubernetes cluster can handle about 120 concurrent users while keeping the response time below 1 second.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27625%27%20height%3D%270%27%20viewBox%3D%270%200%20625%200%27%3E%3Crect%20width%3D%27625%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-9.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 625px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"625\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conclusion<\/h2><\/div><div class=\"fusion-text fusion-text-24\"><p>In this series of articles, we went through the whole process of deploying an Mlflow tracking instance and serving a model as an API on Kubernetes Engine, taking advantage of its ability to scale easily and handle high loads. We also experimented with two useful Kubernetes features, horizontal and vertical autoscaling, and showed why it is worth monitoring resources to make sure they are used efficiently. 
Finally, we showed how to test our application and make infrastructure decisions based on its response to different test scenarios.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-inner-bg-position:left center;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" target=\"_blank\" 
rel=\"noopener\"><span class=\"fusion-column-inner-bg-image lazyload\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"Medium\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-16 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Medium Blog by Artefact.<\/h3><\/div><div 
class=\"fusion-text fusion-text-25\" style=\"--awb-content-alignment:center;\"><p>This article was initially published on <strong>Medium.com<\/strong>.<br \/>\nFollow us on our Medium blog!<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" title=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" aria-label=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-bf27258775e7\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Read our article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>","protected":false},"excerpt":{"rendered":"<p>This article is the third part of a series in which we go through the process of logging models with Mlflow, serving them on Kubernetes Engine, and finally scaling them according to our application's needs. Although this article can be used on its own to test any API response, we recommend reading our two previous articles (part 1 and part 2) on how to deploy a tracking instance and serve a model as an API with Mlflow. 
In what follows, we focus on the question of scaling and address it with a few experiments to understand the behavior of the k8s cluster and give recommendations on how to handle high loads.<\/p>","protected":false},"featured_media":68688,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-64449","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog\/64449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media\/68688"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media?parent=64449"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-category?post=64449"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-language?post=64449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}