{"id":59250,"date":"2021-03-25T15:39:39","date_gmt":"2021-03-25T15:39:39","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=news&#038;p=59250"},"modified":"2024-09-20T17:45:41","modified_gmt":"2024-09-20T16:45:41","slug":"automating-the-training-of-ml-models-with-google-cloud-ai-platform","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/es\/blog\/automating-the-training-of-ml-models-with-google-cloud-ai-platform\/","title":{"rendered":"Automatizaci\u00f3n del entrenamiento de modelos ML con Google Cloud AI Platform"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading 
title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Sacha-Lasry.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 59% 41% 41% 59% \/ 29% 48% 52% 71%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Sacha Lasry<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\"><p>Data Scientist at Artefact France<\/p>\n<\/div><div class=\"fusion-social-links fusion-social-links-1\" style=\"--awb-margin-top:0px;--awb-margin-right:0px;--awb-margin-bottom:0px;--awb-margin-left:0px;--awb-box-border-top:0px;--awb-box-border-right:0px;--awb-box-border-bottom:0px;--awb-box-border-left:0px;--awb-icon-colors-hover:var(--awb-color4);--awb-box-colors-hover:var(--awb-color6);--awb-box-border-color:var(--awb-color6);--awb-box-border-color-hover:var(--awb-color6);\"><div class=\"fusion-social-networks color-type-brand\"><div class=\"fusion-social-networks-wrapper\"><a class=\"fusion-social-network-icon fusion-tooltip fusion-linkedin awb-icon-linkedin\" style=\"color:#0077b5;font-size:16px;\" data-placement=\"top\" data-title=\"LinkedIn\" data-toggle=\"tooltip\" title=\"LinkedIn\" aria-label=\"linkedin\" target=\"_blank\" rel=\"noopener noreferrer 
nofollow\" href=\"https:\/\/www.linkedin.com\/in\/sacha-lasry-1484a3141\/\"><\/a><\/div><\/div><\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-2 description\"><p><strong>TL;DR<\/strong><\/p>\n<p>Training an ML model can be complicated to set up and reproduce:<\/p>\n<p>\u2022 The training code might live in a notebook on a VM that you have to launch manually and turn off when the training is finished<br \/>\n\u2022 You may have to upload a training dataset each time you want to retrain the model<br \/>\n\u2022 You need to dig into your code whenever you want to change a single parameter<br \/>\n\u2022 etc.<\/p>\n<p>In this article, we\u2019ll see how we automated the training process of FastAI\u2019s text classifiers, using Google Cloud AI Platform.<\/p>\n<p>In a 
second article, we\u2019ll see how we managed to deploy such models with AI Platform and TorchServe.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-3\"><p>You can find more about us and our projects on our Medium blog<\/p>\n<\/div><div ><a class=\"fusion-button button-flat fusion-button-default-size button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type button-primary-medium\" target=\"_self\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\" rel=\"noopener\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">View Articles<\/span><\/a><\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container 
nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><a class=\"fusion-no-lightbox\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\" target=\"_blank\" aria-label=\"1*s986xIGqhfsN8U&#8211;09_AdA\" rel=\"noopener noreferrer\"><img decoding=\"async\" width=\"400\" height=\"99\" alt=\"Blog Medium by Artefact\" 
src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-400x99.png\" class=\"lazyload img-responsive wp-image-59273\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/1s986xIGqhfsN8U-09_AdA-1200x298.png 1200w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 400px\" \/><\/a><\/span><\/div><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Who is this for?<\/h2><\/div><div class=\"fusion-text fusion-text-4\"><p>If you\u2019re working on a project that requires training ML models multiple times, and you\u2019re tired of having to run your trainings manually, you\u2019ve come to the right place.<\/p>\n<\/div><div class=\"fusion-text fusion-text-5\"><p>If you\u2019re tired of managing VMs for your training and just want to spend your time on something more interesting, like reading Medium articles, you\u2019ve also come to the right place!<\/p>\n<\/div><div class=\"fusion-text fusion-text-6\"><p>This 
article is for those who want to save time and resources by using AI Platform to train their ML models. We\u2019ll see how we applied this to a project we worked on, using\u00a0<a href=\"https:\/\/github.com\/fastai\/fastai\" target=\"_blank\" rel=\"noopener\">FastAI<\/a>.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Pre-requisites if you want to reproduce what we did<\/h2><\/div><div class=\"fusion-text fusion-text-8\"><p>AI Platform is part of the Google Cloud Platform suite, as are the other services we used to automate our training pipeline. 
Here are the GCP services we used:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>AI Platform, to host the training of the model<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Cloud Storage, to host the files needed for the training, along with the model file exported after the training<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Container Registry, to host the Docker image containing the training code<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>(Optional) Compute Engine, to build and run the Docker image on a Virtual Machine equipped with a GPU<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-9\"><p>Google Cloud SDK, Docker and Nvidia-docker need to be installed and set up on the machine where the Docker image is built. 
The point of installing Nvidia-docker is to be able to run the built Docker image directly on the machine\u2019s GPU (if there is one), to check that there are no errors in the code and that the training will run as expected on AI Platform.<\/p>\n<\/div><div class=\"fusion-text fusion-text-10\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">As we\u2019ll see later in the article, the Docker image is built from an Nvidia CUDA base image, so the CUDA libraries needed for GPU training are already included in the image.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Context<\/h2><\/div><div class=\"fusion-text fusion-text-11\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">We\u2019ll see in this article how we automated the training of a text classifier made with FastAI, a library that lets users build powerful models using the ULMFiT method.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"700\" height=\"309\" alt=\"Example of 
classification and ULM FiT Workflow\" title=\"Language-Model\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model.png\" class=\"lazyload img-responsive wp-image-59254\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27309%27%20viewBox%3D%270%200%20700%20309%27%3E%3Crect%20width%3D%27700%27%20height%3D%27309%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model-200x88.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model-400x177.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model-600x265.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Language-Model.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 700px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-12\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">We already presented it in <a href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/how-to-train-a-language-model-from-scratch-without-any-linguistic-knowledge-11acaa933e84\" target=\"_blank\" rel=\"noopener\">another Medium article<\/a>, so I invite you to check it out if you want to know more about it.<\/p>\n<\/div><div class=\"fusion-text fusion-text-13\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">Since what we\u2019ll see in this article will be applicable to any framework, you don\u2019t need to be familiar with FastAI to continue reading. 
All you need to know is that we trained our text classifier from a pre-trained model and a labelled training dataset.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">How we set up the training with AI Platform<\/h2><\/div><div class=\"fusion-text fusion-text-14\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">To automate model training with AI Platform, you need to specify which code should be run in which environment when the training command is called. The best way to do so is to create a Docker image that contains all the training code and its environment, so AI Platform just has to create a container from this image each time you ask it to train a model. This section shows how we did it.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Store all necessary files in a GCS bucket<\/h2><\/div><div class=\"fusion-text fusion-text-15\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">Before creating the Docker image that contains our training code, we had to think about the files that are used during the training of a FastAI text classifier model. 
We therefore decided to store all the files that were necessary for the training in a GCS bucket, separated in folders for each language, with a specific name given to each file.<\/p>\n<\/div><div class=\"fusion-text fusion-text-16\"><p id=\"a91d\" class=\"kj kk fo kl b gm ld jq kn gp le ju kp kq lf ks kt ku lg kw kx ky lh la lb lc fg bx\" data-selectable-paragraph=\"\">We then implemented in our training code (as we\u2019ll see below) a method to retrieve those required files from GCS, by only specifying the target language as argument.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"498\" height=\"249\" alt=\"Architecture of our GCS bucket\" title=\"Architecture-GCS-bucket\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Architecture-GCS-bucket.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Architecture-GCS-bucket.png\" class=\"lazyload img-responsive wp-image-59257\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27498%27%20height%3D%27249%27%20viewBox%3D%270%200%20498%20249%27%3E%3Crect%20width%3D%27498%27%20height%3D%27249%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Architecture-GCS-bucket-200x100.png 200w, 
https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Architecture-GCS-bucket-400x200.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Architecture-GCS-bucket.png 498w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 498px\" \/><\/span><\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Write the training code<\/h3><\/div><div class=\"fusion-text fusion-text-17\"><p>After uploading the necessary files in GCS, we created a <a href=\"https:\/\/github.com\/artefactory\/train-fastai-custom-aiplatform\" target=\"_blank\" rel=\"noopener\">repo<\/a> containing the code for our models training, meant to be stored in a Docker Image later on.<\/p>\n<\/div><div class=\"fusion-text fusion-text-18\"><p>As you can see in the linked repo, we divided the code for the training into separate files to handle properly all the training pipeline.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"122\" alt=\"Execution of the training workflow\" title=\"training-workflow\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow.png\" 
data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow-300x122.png\" class=\"lazyload img-responsive wp-image-59258\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27284%27%20viewBox%3D%270%200%20700%20284%27%3E%3Crect%20width%3D%27700%27%20height%3D%27284%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow-200x81.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow-400x162.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow-600x243.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/training-workflow.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-19\"><p>We defined a file that executes all the training workflow as follows:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Retrieve the arguments specified by the user by calling the get_args() method from args_getter.py<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Call the FastAI training function and retrieve the trained model that has been saved locally<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i 
class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Upload the trained model to GCS along with its performance metrics stored in a .json file<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-20\"><p><em>fastai_train.py<\/em> is the only file directly using methods from FastAI, so if someone wanted to use another framework for their training, they\u2019d just have to modify this file (and the content of the config file of course).<\/p>\n<\/div><div class=\"fusion-text fusion-text-21\"><p>The next step was to create a Docker image containing everything needed for the training to run correctly.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Create the Dockerfile<\/h3><\/div><div class=\"fusion-text fusion-text-22\"><p>After preparing the training code, we needed to write a Dockerfile, build an image from it and push that image to Google Container Registry, so that AI Platform could retrieve it and run the training in the right environment.<\/p>\n<\/div><div class=\"fusion-text fusion-text-23\"><p>Since the training of our model needed to run on GPU, we based our image on the Nvidia CUDA Docker image, so the required CUDA libraries were already installed.<\/p>\n<\/div><div class=\"fusion-text fusion-text-24\"><div class=\"code\">\n<pre># Dockerfile\nFROM nvidia\/cuda:10.2-devel\n\nRUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \\\n    wget \\\n    build-essential\n\nRUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \\\n    python3-dev \\\n    python3-setuptools \\\n    python3-pip\n\nRUN pip3 install pip==20.3.1\n\nWORKDIR \/root\n\n# Create directories to contain code and downloaded model from GCS\nRUN mkdir 
\/root\/trainer\n\nRUN mkdir \/root\/models\n\n# Copy requirements\nCOPY requirements.txt \/root\/requirements.txt\n\n# Install pytorch\nRUN pip3 install torch==1.8.0\n\n# Install requirements\nRUN pip3 install -r requirements.txt\n\n# Installs google cloud sdk, this is mostly for using gsutil to export model.\nRUN wget -nv \\\n    https:\/\/dl.google.com\/dl\/cloudsdk\/release\/google-cloud-sdk.tar.gz &amp;&amp; \\\n    mkdir \/root\/tools &amp;&amp; \\\n    tar xvzf google-cloud-sdk.tar.gz -C \/root\/tools &amp;&amp; \\\n    rm google-cloud-sdk.tar.gz &amp;&amp; \\\n    \/root\/tools\/google-cloud-sdk\/install.sh --usage-reporting=false \\\n        --path-update=false --bash-completion=false \\\n        --disable-installation-options &amp;&amp; \\\n    rm -rf \/root\/.config\/* &amp;&amp; \\\n    ln -s \/root\/.config \/config &amp;&amp; \\\n    # Remove the backup directory that gcloud creates\n    rm -rf \/root\/tools\/google-cloud-sdk\/.install\/.backup\n\n# Copy files\nCOPY trainer\/fastai_train.py \/root\/trainer\/fastai_train.py\n\nCOPY trainer\/fastai_config.py \/root\/trainer\/fastai_config.py\n\nCOPY trainer\/args_getter.py \/root\/trainer\/args_getter.py\n\nCOPY trainer\/gcs_utils.py \/root\/trainer\/gcs_utils.py\n\nCOPY trainer\/training_workflow.py \/root\/trainer\/training_workflow.py\n\n# Path configuration\nENV PATH $PATH:\/root\/tools\/google-cloud-sdk\/bin\n\n# Make sure gsutil will use the default service account\nRUN echo '[GoogleCompute]\\nservice_account = default' &gt; \/etc\/boto.cfg\n\n# Authenticate to GCP\nCMD gcloud auth login\n\n# Sets up the entry point to invoke the trainer.\nENTRYPOINT [\"python3\",  \"trainer\/training_workflow.py\"]<\/pre>\n<\/div>\n<\/div><div class=\"fusion-text fusion-text-25\"><p>As you can see above, the Dockerfile executes the following steps to create the image:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" 
class=\"fusion-checklist fusion-checklist-3 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>First, the Nvidia CUDA image is used as the base of the training image<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Then, the system packages needed to set up the environment are installed, along with Python and pip<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The requirements file is copied<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Before installing the requirements, PyTorch needs to be installed (to avoid errors due to FastAI installing the wrong version of torch if it\u2019s not detected in the environment). It\u2019s really important to choose the PyTorch version that matches the CUDA version of the base image. 
To know which one to use, check pytorch.org<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>After these steps, the requirements and the Google Cloud SDK are installed, and the other python files are copied in the image.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The service account to be used by default to authenticate to GCP is specified.<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Finally, the training of the model is run by calling training_workflow.py.<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Build the image and push it to GCR<\/h3><\/div><div class=\"fusion-text fusion-text-26\"><p>After creating the Dockerfile, it was necessary to build the image to push it to GCR. 
As specified in the repo, various local variables needed to be defined, such as the <strong>IMAGE_URI<\/strong> in GCR, the <strong>REGION<\/strong> we operate in, etc.<\/p>\n<\/div><div class=\"fusion-text fusion-text-27\"><p>The image was built by running this command:<\/p>\n<\/div><div class=\"fusion-text fusion-text-28\"><div class=\"code\">\n<pre>docker build -f Dockerfile -t $IMAGE_URI .\/<\/pre>\n<\/div>\n<\/div><div class=\"fusion-text fusion-text-29\"><p>Before pushing it to GCR, we wanted to make sure the training would run correctly. Since our VM had a GPU available, we ran the image locally before pushing it:<\/p>\n<\/div><div class=\"fusion-text fusion-text-30\"><div class=\"code\">\n<pre>docker run --runtime=nvidia $IMAGE_URI --epochs 1 --bucket-name $BUCKET_NAME<\/pre>\n<\/div>\n<\/div><div class=\"fusion-text fusion-text-31\"><p>This step is optional, but it can save you a lot of time because errors in your code show up immediately.<\/p>\n<\/div><div class=\"fusion-text fusion-text-32\"><p>We finally pushed the image to GCR by running the following command, where <strong>$IMAGE_URI<\/strong> is the URI under which the image is stored in GCR:<\/p>\n<\/div><div class=\"fusion-text fusion-text-33\"><div class=\"code\">\n<pre>docker push $IMAGE_URI<\/pre>\n<\/div>\n<\/div><div class=\"fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Run and follow the job<\/h3><\/div><div class=\"fusion-text fusion-text-34\"><p>After following the previous steps, the training of the model was ready to be launched with a single command in the terminal. 
The Google Cloud SDK just needed to be set up and the local variables defined:<\/p>\n<\/div><div class=\"fusion-text fusion-text-35\"><div class=\"code\">\n<pre>gcloud ai-platform jobs submit training $JOB_NAME \\\n--scale-tier BASIC_GPU \\\n--region $REGION \\\n--master-image-uri $IMAGE_URI \\\n-- \\\n--lang=fr \\\n--epochs=10 \\\n--bucket-name=$BUCKET_NAME \\\n--model-dir=$MODEL_DIR<\/pre>\n<\/div>\n<\/div><div class=\"fusion-text fusion-text-36\"><p>This command asks AI Platform to pull the image from GCR using its IMAGE_URI, and then to run the training on the GPU of a machine hosted in the region REGION.<\/p>\n<\/div><div class=\"fusion-text fusion-text-37\"><p>We specified various arguments here, such as the training language, the number of epochs, the name of the GCS bucket used to upload and download files, and the directory in which to store the trained model. These are the arguments retrieved by the args_getter.py file.<\/p>\n<\/div><div class=\"fusion-text fusion-text-38\"><p>After running the command, the training started and a job was created in the AI Platform console, allowing us to follow the progress of the training and check the logs of the machine running it.<\/p>\n<\/div><div class=\"fusion-text fusion-text-39\"><p>A lot of information is available when looking at the job in the AI Platform console:<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-5 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"126\" alt=\"View of 
AI Platform Jobs interface\" title=\"AI-Platforms-Jobs-Interface\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface-300x126.png\" class=\"lazyload img-responsive wp-image-59259\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27294%27%20viewBox%3D%270%200%20700%20294%27%3E%3Crect%20width%3D%27700%27%20height%3D%27294%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface-200x84.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface-400x168.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface-600x252.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platforms-Jobs-Interface.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-40\"><p>When the job is complete, the trained model is saved on a GCS bucket as a .pth file along with a .json file containing its performance.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-6 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"128\" alt=\"Folder where the files are 
stored after the training\" title=\"AI-Platform\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform-300x128.png\" class=\"lazyload img-responsive wp-image-59260\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27299%27%20viewBox%3D%270%200%20700%20299%27%3E%3Crect%20width%3D%27700%27%20height%3D%27299%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform-200x85.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform-400x171.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform-600x256.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/AI-Platform.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-7 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"201\" title=\"Performance-training\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Performance-training.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Performance-training-300x201.png\" alt class=\"lazyload img-responsive wp-image-59261\" 
srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27413%27%20height%3D%27277%27%20viewBox%3D%270%200%20413%20277%27%3E%3Crect%20width%3D%27413%27%20height%3D%27277%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Performance-training-200x134.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Performance-training-400x268.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Performance-training.png 413w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-41\"><p>We decided to consider the results on labels separately, as if we had a binary classifier for each possible label.<\/p>\n<\/div><div class=\"fusion-text fusion-text-42\"><p>Since we used FastAI, the model file could directly be loaded in any environment by calling FastAI\u2019s load_learner() method.<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-8 hover-type-none\"><img decoding=\"async\" width=\"300\" height=\"61\" alt=\"Reusability of trained model\" title=\"Reusability-trained-model\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model-300x61.png\" class=\"lazyload img-responsive 
wp-image-59262\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27700%27%20height%3D%27142%27%20viewBox%3D%270%200%20700%20142%27%3E%3Crect%20width%3D%27700%27%20height%3D%27142%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model-200x41.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model-400x81.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model-600x122.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/Reusability-trained-model.png 700w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 300px\" \/><\/span><\/div><div class=\"fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Key take-aways<\/h2><\/div><div class=\"fusion-text fusion-text-43\"><p>Automating the training of our models using AI Platform allowed us to save a lot of time, and made us consider various aspects of our model training to efficiently put it into production.<\/p>\n<\/div><div class=\"fusion-text fusion-text-44\"><p>Here are some take-aways we gathered from this:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-4 fusion-checklist-default type-icons\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Building and running the Docker image directly on your machine before pushing it to GCR can save a 
lot of debugging time, but make sure your machine is powerful enough to do it (such as a Google Cloud VM with a GPU)<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>The arguments and parameters, along with their default values, must be well defined to simplify the training process while still letting users customize it if they want to<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon fa-angle-double-right fas\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>FastAI is a great ML library, and AI Platform Training is well suited to automating the training of its models. Even better, the same approach can be applied to any framework, just by modifying the training code itself in the repo<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">What\u2019s next?<\/h2><\/div><div class=\"fusion-text fusion-text-45\"><p>Now that we\u2019ve explained how we automated the training of our model, we can show you how we made it easy to call in order to classify files on demand.<\/p>\n<\/div><div class=\"fusion-text fusion-text-46\"><p>Stay tuned for the second part of this article, which will explain everything you need to know to deploy your trained classifiers using AI Platform and TorchServe!<\/p>\n<\/div><div class=\"fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading 
title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Thanks for reading!<\/h2><\/div><div class=\"fusion-text fusion-text-47\"><p>We hope you\u2019ve learned something today, and that it will be useful for your future ML projects. Feel free to reach out to us if you have any questions or comments regarding this topic.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-48\"><p>You can find out more about us and our projects on our Medium blog<\/p>\n<\/div><div ><a class=\"fusion-button button-flat fusion-button-default-size button-default fusion-button-default button-2 fusion-button-default-span fusion-button-default-type button-primary-medium\" target=\"_self\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\" 
rel=\"noopener\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">View Articles<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>25 de marzo de 2021<br \/>\nC\u00f3mo gestionamos el entrenamiento y la implementaci\u00f3n de nuestros modelos de FastAI mediante AI Platform \u2014 Parte I<\/p>","protected":false},"featured_media":59251,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[2995,22035],"blog-language":[2991],"class_list":["post-59250","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-ai-technology","blog-category-data-ai-consulting","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog\/59250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/media\/59251"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/media?parent=59250"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog-category?post=59250"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/es\/wp-json\/wp\/v2\/blog-language?post=59250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}