{"id":64449,"date":"2021-10-26T16:06:03","date_gmt":"2021-10-26T15:06:03","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=64449"},"modified":"2024-09-20T17:45:46","modified_gmt":"2024-09-20T16:45:46","slug":"serving-ml-models-at-scale-using-mlflow-on-kubernetes-part-2-2","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/nl\/blog\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-part-2-2\/","title":{"rendered":"Serving ML models at scale using MLflow on Kubernetes - Part 3"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left 
fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/kais-laribi.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Kais Laribi<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Senior Data Scientist at Artefact<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" 
style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Read our article on<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-1 
hover-type-none\"><img decoding=\"async\" width=\"4000\" height=\"992\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description\"><p>This article is the third part of a series in which we go through the process of logging models using MLflow, serving them on Kubernetes Engine, and finally scaling them up according to our application's needs. Although this article can be used on its own to test the response of any API, we recommend reading our two previous articles (part 1 and part 2) on how to deploy a tracking instance and serve a model as an API with MLflow. 
In what follows, we look at the scalability problem and tackle it with a few experiments, in order to understand the behavior of the k8s cluster and to give recommendations on how to handle high loads.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-one\" style=\"--awb-margin-bottom-small:8px;\"><h1 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:70;line-height:1;\">Part 3 - How to handle high loads and make our application scalable?<\/h1><\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" 
style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Introduction<\/h2><\/div><div class=\"fusion-text fusion-text-5\"><p>In a classic scenario where a machine learning model is deployed behind an application or a product, multiple users can interact with it simultaneously to generate predictions. It is therefore essential to analyze the capabilities of our infrastructure and size it accordingly. This is particularly relevant for Kubernetes, because it can influence decisions such as whether or not to use autoscaling and the maximum number of nodes to plan for.<\/p>\n<\/div><div class=\"fusion-text fusion-text-6\"><p>In this context, load testing makes it possible to simulate multiple concurrent or ramping numbers of requests and to monitor the behavior of the infrastructure (response time, CPU usage, memory usage...) in order to size resources correctly and avoid bottlenecks. 
These tests will be run here with a tool called Locust.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Environment preparation<\/h2><\/div><div class=\"fusion-text fusion-text-7\"><p>The requirements for this hands-on are detailed in the first article of this series. As a summary, here are the main elements we need specifically for this part, assuming our model is already deployed as an API on a Kubernetes cluster (mlflow-k8s).<\/p>\n<p>For this part of the hands-on, we need:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">A GKE cluster to deploy Locust (here we will call it\u00a0<em class=\"jz\">load_testing<\/em>)<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">A configured local workstation (gcloud, kubectl)<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">The following environment variable exported<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"c4c0\" class=\"ej lc ii dm ld b le lf lg s lh\" 
data-selectable-paragraph=\"\">export GCR_REPO=eu.gcr.io\/mlflow-on-k8s\/repo<\/span><\/pre>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\"><a class=\"bv ig\" href=\"https:\/\/github.com\/artefactory-global\/mlflow-serving-example\" target=\"_blank\" rel=\"noopener ugc nofollow\">The repository<\/a>\u00a0where the hands-on code lives<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Deployment<\/h2><\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">1. 
Build the Locust Docker image and push it to GCR<\/h3><\/div><div class=\"fusion-text fusion-text-8\"><pre class=\"hp hq hr hs ht lb gv be\"><span id=\"df7b\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">cd mlflow-serving-example<\/span><span id=\"f519\" class=\"ej lc ii dm ld b le lk ll lm ln lo lg s lh\" data-selectable-paragraph=\"\">docker build --tag ${GCR_REPO}\/locust-tasks:v1 --file dockerfile_locust .<\/span><span id=\"dcc8\" class=\"ej lc ii dm ld b le lk ll lm ln lo lg s lh\" data-selectable-paragraph=\"\">docker push ${GCR_REPO}\/locust-tasks:v1<\/span><\/pre>\n<\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">2. Prepare the test task<\/h3><\/div><div class=\"fusion-text fusion-text-9\"><p>Tasks are Python functions that Locust runs on its workers as part of the load test. In the example code under\u00a0<em>locust-tasks\/tasks.py<\/em>, we only need to send a POST request to the API with one data row to get predictions.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27667%27%20height%3D%270%27%20viewBox%3D%270%200%20667%200%27%3E%3Crect%20width%3D%27667%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/Capture-de\u0301cran-2021-10-26-a\u0300-18.11.10.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 667px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"667\" height=\"auto\" \/><div class=\"fusion-text fusion-text-10\"><p>In this code snippet:<\/p>\n<\/div><ul 
style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em><strong>on_start<\/strong><strong>:<\/strong><\/em><em class=\"jz\"> is\u00a0<\/em>executed only once, when the thread starts, in order to download the dataset.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\"><em><strong>post_metrics<\/strong><\/em>: is the core of our test task. Here we only have one function, which sends one row to the \/invocations endpoint.<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-11\"><p>We can create as many functions as there are tests we want to run. For example, we can add one that sends batches of data. We can also pass a weight to the\u00a0<strong>@task()<\/strong>\u00a0decorator to prioritize the different tasks.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">3. Deploy to Kubernetes<\/h3><\/div><div class=\"fusion-text fusion-text-12\"><p>Now it is time to deploy the image and run Locust on its own cluster. 
First, make sure the context is set to the\u00a0<em>load_testing<\/em>\u00a0cluster using<\/p>\n<\/div><div class=\"fusion-text fusion-text-13\"><p>kubectl config get-contexts<br \/>\nkubectl config use-context NAME<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27667%27%20height%3D%270%27%20viewBox%3D%270%200%20667%200%27%3E%3Crect%20width%3D%27667%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-2.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 667px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"667\" height=\"auto\" \/><div class=\"fusion-text fusion-text-14\"><p>Next, we can update our deployment file\u00a0<em>deployments\/locust_load_test.yaml\u00a0<\/em>by specifying\u00a0<strong>the image path on GCR<\/strong> and<strong class=\"jh lj\">\u00a0<\/strong>pointing\u00a0<strong>TARGET_HOST<\/strong>\u00a0to the API address.<\/p>\n<\/div><div class=\"fusion-text fusion-text-15\"><div class=\"code\">kind: ReplicationController<br \/>\napiVersion: v1<br \/>\nmetadata:<br \/>\n  name: locust-master<br \/>\n  labels:<br \/>\n    name: locust<br \/>\n    role: master<br \/>\nspec:<br \/>\n  replicas: 1<br \/>\n  selector:<br \/>\n    name: locust<br \/>\n    role: master<br \/>\n  template:<br \/>\n    metadata:<br \/>\n      labels:<br \/>\n        name: locust<br \/>\n        role: master<br \/>\n    spec:<br \/>\n      containers:<br \/>\n        - name: locust<br \/>\n          image: GCR_REPO\/locust-tasks:v1 # Change here<br \/>\n          env:<br \/>\n            - name: LOCUST_MODE<br \/>\n              value: master<br \/>\n            - name: TARGET_HOST<br \/>\n              value: 'http:\/\/SERVING_IP:SERVING_PORT' # Change here<br \/>\n          ports:<br \/>\n            - name: loc-master-web<br \/>\n              containerPort: 8089<br \/>\n              protocol: TCP<br 
\/>\n            - name: loc-master-p1<br \/>\n              containerPort: 5557<br \/>\n              protocol: TCP<br \/>\n            - name: loc-master-p2<br \/>\n              containerPort: 5558<br \/>\n              protocol: TCP<br \/>\n---<br \/>\nkind: ReplicationController<br \/>\napiVersion: v1<br \/>\nmetadata:<br \/>\n  name: locust-worker<br \/>\n  labels:<br \/>\n    name: locust<br \/>\n    role: worker<br \/>\nspec:<br \/>\n  replicas: 30<br \/>\n  selector:<br \/>\n    name: locust<br \/>\n    role: worker<br \/>\n  template:<br \/>\n    metadata:<br \/>\n      labels:<br \/>\n        name: locust<br \/>\n        role: worker<br \/>\n    spec:<br \/>\n      containers:<br \/>\n        - name: locust<br \/>\n          image: GCR_REPO\/locust-tasks:v1 # Change here<br \/>\n          env:<br \/>\n            - name: LOCUST_MODE<br \/>\n              value: worker<br \/>\n            - name: LOCUST_MASTER<br \/>\n              value: locust-master<br \/>\n            - name: TARGET_HOST<br \/>\n              value: 'http:\/\/SERVING_IP:SERVING_PORT' # Change here<br \/>\n---<br \/>\nkind: Service<br \/>\napiVersion: v1<br \/>\nmetadata:<br \/>\n  name: locust-master<br \/>\n  labels:<br \/>\n    name: locust<br \/>\n    role: master<br \/>\nspec:<br \/>\n  ports:<br \/>\n    - port: 8089<br \/>\n      targetPort: loc-master-web<br \/>\n      protocol: TCP<br \/>\n      name: loc-master-web<br \/>\n    - port: 5557<br \/>\n      targetPort: loc-master-p1<br \/>\n      protocol: TCP<br \/>\n      name: loc-master-p1<br \/>\n    - port: 5558<br \/>\n      targetPort: loc-master-p2<br \/>\n      protocol: TCP<br \/>\n      name: loc-master-p2<br \/>\n  selector:<br \/>\n    name: locust<br \/>\n    role: master<br \/>\n  type: LoadBalancer<\/div>\n<\/div><div class=\"fusion-text fusion-text-16\"><p>Finally, let's deploy it with the following command.<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"30de\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">kubectl create -f deployments\/locust_load_test.yaml<\/span><\/pre>\n<p>The Locust instance should now be up, and a new load balancer should have been created. 
We can find its IP by running <em>kubectl get services\u00a0<\/em>and then access the interface at LoadBalancerIP:8089<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27701%27%20height%3D%270%27%20viewBox%3D%270%200%20701%200%27%3E%3Crect%20width%3D%27701%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-3.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 701px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"701\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Experiment<\/h2><\/div><div class=\"fusion-text fusion-text-17\"><p>The idea is to use Locust to simulate parallel queries on our API and to analyze the behavior and response time of the cluster (median in green and 95th percentile in orange). This is done for educational purposes, to highlight two features that Kubernetes offers: horizontal and vertical (auto)scaling.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">1. Manual scaling<\/h3><\/div><div class=\"fusion-text fusion-text-18\"><p>In the first experiment, we try to observe the effect of\u00a0<strong>having more pods <\/strong>serving our models. 
We start with one pod and try to increase the number of requests. In the graph below, we can distinguish 4 phases with different configurations and costs.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27707%27%20height%3D%270%27%20viewBox%3D%270%200%20707%200%27%3E%3Crect%20width%3D%27707%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-4.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 707px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"707\" height=\"auto\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27686%27%20height%3D%270%27%20viewBox%3D%270%200%20686%200%27%3E%3Crect%20width%3D%27686%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-5.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 686px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"686\" height=\"auto\" \/><div class=\"fusion-text fusion-text-19\"><p>In general, we can see that it is important to always monitor resource metrics (CPU, RAM...) in order to detect bottlenecks and configuration problems. In our case, with only one pod we could not take advantage of the available compute power. When deploying an application, it is therefore essential to set an appropriate number of pods and enough resources per pod to maximize machine utilization, while leaving room for the system services running in the backend. 
We therefore recommend not pushing the CPU usage of the nodes above 80-90%.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">2. Horizontal auto-scaling<\/h3><\/div><div class=\"fusion-text fusion-text-20\"><p>Fortunately, Kubernetes has a\u00a0<strong>horizontal auto-scaling feature<\/strong>\u00a0that automatically monitors CPU usage and creates new pods when needed to spread the load. It can be enabled with the following command.<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"a7cd\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">kubectl autoscale deployment mlflow-serving --cpu-percent=80 --min=1 --max=12<\/span><\/pre>\n<p>We can then check the number and status of the pods with\u00a0<em>kubectl get hpa mlflow-serving<\/em>, and analyze the response time of the cluster and its resource consumption.<br \/>\nThe goal of the following experiment is to observe how Kubernetes can automatically add pods to optimize resource usage and improve the response time. 
We can split this experiment into three phases, as shown in the graph below.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27686%27%20height%3D%270%27%20viewBox%3D%270%200%20686%200%27%3E%3Crect%20width%3D%27686%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-6.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 686px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"686\" height=\"auto\" \/><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27671%27%20height%3D%270%27%20viewBox%3D%270%200%20671%200%27%3E%3Crect%20width%3D%27671%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-7.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 671px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"671\" height=\"auto\" \/><div class=\"fusion-text fusion-text-21\"><p>In this second experiment, we saw that horizontal auto-scaling allowed us to lower the response time by creating new pods and allocating more cluster resources. However, once the cluster capacity is reached (phase 3), new pods remain in the Pending state and our response time increases again.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">3. 
Vertical autoscaling<\/h3><\/div><div class=\"fusion-text fusion-text-22\"><p>In such a situation, we can explore another Kubernetes feature, referred to here as\u00a0<strong>vertical autoscaling<\/strong>, which consists of allocating more nodes when needed (on GKE, this is the node pool's cluster autoscaler). It can be enabled with the following command, which specifies the minimum and maximum number of nodes that Kubernetes can allocate.<\/p>\n<pre class=\"hp hq hr hs ht lb gv be\"><span id=\"7f33\" class=\"ej lc ii dm ld b le lf lg s lh\" data-selectable-paragraph=\"\">gcloud container clusters update mlflow-k8s --enable-autoscaling --min-nodes 3 --max-nodes 5 --node-pool POOL_NAME<\/span><\/pre>\n<p>Finally, in this last experiment, summarized in the chart below, enabling vertical autoscaling allowed Kubernetes to automatically add two new nodes and create new pods to spread the load and keep the response time low. Kubernetes needed about 1 minute to detect the need and create the resources (phase 2). 
Under a lower load (phase 3), Kubernetes also managed to release the two new nodes by terminating pods, scaling the cluster back down to its minimum of three nodes in about 15 minutes.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27661%27%20height%3D%270%27%20viewBox%3D%270%200%20661%200%27%3E%3Crect%20width%3D%27661%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-8.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 661px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"661\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">4. Cluster size estimation<\/h3><\/div><div class=\"fusion-text fusion-text-23\"><p>Now that we understand how Kubernetes behaves under different load levels with vertical and horizontal autoscaling, the last step is to run performance tests with different resources, taking into account our application's requirements and the estimated number of users. Let's imagine that, to meet our SLA requirements, our 95th percentile response time must be below 1 second. 
In this case, we can plot the API response time for different numbers of cores, as in the chart below, and get an idea of our application's performance under different conditions.<\/p>\n<p>For our ML model served with Mlflow, a Kubernetes cluster with 12 cores can handle about 120 concurrent users while keeping the response time under 1 second.<\/p>\n<\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27625%27%20height%3D%270%27%20viewBox%3D%270%200%20625%200%27%3E%3Crect%20width%3D%27625%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/10\/article-kais-part3-9.png\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left hover-enable\" style=\"width: 625px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"625\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conclusion<\/h2><\/div><div class=\"fusion-text fusion-text-24\"><p>In this series of articles, we went through the whole process of deploying an Mlflow tracking instance and serving a model as an API on Kubernetes, taking advantage of its ability to scale easily and handle high loads. We also experimented with two interesting features that Kubernetes offers, horizontal and vertical autoscaling, and showed that it is always worthwhile to monitor our resources to make sure we use them efficiently. 
Finally, we showed how to test our application and make infrastructure decisions based on its response to different test scenarios.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-inner-bg-position:left center;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" target=\"_blank\" 
rel=\"noopener\"><span class=\"fusion-column-inner-bg-image lazyload\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\"fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"medium\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-16 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Medium Blog by Artefact.<\/h3><\/div><div class=\"fusion-text 
fusion-text-25\" style=\"--awb-content-alignment:center;\"><p>This article was originally published on <strong>Medium.com<\/strong>.<br \/>\nFollow us on our Medium blog!<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" title=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" aria-label=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-a83390718a92\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/serving-ml-models-at-scale-using-mlflow-on-kubernetes-bf27258775e7\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Read our article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>","protected":false},"excerpt":{"rendered":"<p>This article is the third part of a series in which we go through the process of logging models with Mlflow, serving them on Kubernetes Engine, and finally scaling them according to our application's needs. Although this article can be used on its own to test any API's response, we recommend reading our two previous articles (part 1 and part 2) on how to deploy a tracking instance and serve a model as an API with Mlflow. 
In what follows, we focus on the scalability problem and tackle it with a few experiments to understand the behavior of the k8s cluster and give recommendations on how to handle high loads.<\/p>","protected":false},"featured_media":68688,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-64449","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/blog\/64449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/media\/68688"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/media?parent=64449"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/blog-category?post=64449"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/nl\/wp-json\/wp\/v2\/blog-language?post=64449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}