{"id":68762,"date":"2023-02-01T15:28:53","date_gmt":"2023-02-01T15:28:53","guid":{"rendered":"https:\/\/www.artefact.com\/?post_type=blog&#038;p=68762"},"modified":"2024-09-20T17:45:55","modified_gmt":"2024-09-20T16:45:55","slug":"deploying-stable-diffusion-on-vertex-ai","status":"publish","type":"blog","link":"https:\/\/www.artefact.com\/br\/blog\/deploying-stable-diffusion-on-vertex-ai\/","title":{"rendered":"Deploying Stable Diffusion on Vertex AI"},"content":{"rendered":"<p><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling article-author\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_2 1_2 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:50%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:50%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" 
style=\"margin:0;--fontSize:50;line-height:1.2;\">Author<\/h2><\/div><img decoding=\"async\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27150%27%20height%3D%270%27%20viewBox%3D%270%200%20150%200%27%3E%3Crect%20width%3D%27150%27%20height%3D%270%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/Tom-Darmon.jpeg\" alt=\"Image\" class=\"lazyload artefact-elegant-image align-left article-author-image\" style=\"width: 150px; border-radius: 54% 46% 77% 23% \/ 74% 40% 60% 26%; overflow: hidden;\" width=\"150\" height=\"auto\" \/><div class=\"fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-three article-author-name-title\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Tom Darmon<\/h3><\/div><div class=\"fusion-text fusion-text-1 article-author-description\" style=\"--awb-text-transform:none;\"><p>Senior Data Scientist, Artefact France<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center fusion-column-inner-bg-wrapper\" 
style=\"--awb-padding-top:20px;--awb-padding-right:20px;--awb-padding-bottom:20px;--awb-padding-left:20px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><span class=\"fusion-column-inner-bg hover-type-none\"><a class=\"fusion-column-anchor\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/deploying-stable-diffusion-on-vertex-ai-6c05ea53c68f\" rel=\"noopener noreferrer\" target=\"_blank\"><span class=\"fusion-column-inner-bg-image\"><\/span><\/a><\/span><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-row fusion-flex-align-items-center\"><div class=\"fusion-text fusion-text-2\"><p><u>Read our article on<\/u><\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img 
decoding=\"async\" width=\"4000\" height=\"992\" title=\"Medium Blog\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png\" alt class=\"lazyload img-responsive wp-image-60582\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%274000%27%20height%3D%27992%27%20viewBox%3D%270%200%204000%20992%27%3E%3Crect%20width%3D%274000%27%20height%3D%27992%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-200x50.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-400x99.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-600x149.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-800x198.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog-1200x298.png 1200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/04\/Medium-Blog.png 4000w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 4000px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-3\"><p>.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-4 description \"><p>This article is a guide to deploying Stable Diffusion, a popular image generation model, on Google Cloud using Vertex AI. It covers setup and weights download, serving the model with TorchServe, deploying the TorchServe server on a Vertex endpoint, and automatically saving the image history to GCS.<\/p>\n<\/div><\/div><\/div><\/div><\/div><article class=\"fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Introduction<\/h2><\/div><div class=\"fusion-text fusion-text-5\"><p>Stable Diffusion is an image generation model. It was open sourced in 2022 and has gained popularity due to its ability to generate high quality images from text descriptions. Like other image generation models, such as Dall-E, Stable Diffusion uses machine learning techniques to generate images based on a given input.<\/p>\n<p>Mo\u00ebt Hennessy, the wine and spirits division of luxury conglomerate LVMH, manages a portfolio of more than 26 iconic brands such as Mo\u00ebt &amp; Chandon, Hennessy, and Veuve Clicquot. Mo\u00ebt Hennessy has collaborated with <a href=\"https:\/\/www.artefact.com\/blog\/\">Artefact<\/a> to investigate the potential uses of cutting-edge technology in marketing content generation. 
With a focus on privacy and security, the team decided to explore the deployment of Stable Diffusion on Google Cloud Platform (GCP) to allow Mo\u00ebt Hennessy to fine-tune and run the model within their own infrastructure, providing a seamless experience from model fine-tuning to API exposure.<\/p>\n<p>Before you begin, note that this article assumes prior knowledge of Google Cloud Platform (GCP) and specifically Vertex AI, including concepts such as the model registry and Vertex endpoints. You also need prior experience with Docker to follow some of the steps. If you are not familiar with these concepts, it is recommended that you familiarize yourself with them before proceeding.<br \/>\nAdditionally, to download the Stable Diffusion weights you need a Hugging Face account. If you don\u2019t have one already, you can create one easily on <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noopener\">Hugging Face\u2019s website<\/a>.<br \/>\nWith that being said, let\u2019s begin!<\/p>\n<\/div><div class=\"fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Setup and weights download<\/h2><\/div><div class=\"fusion-text fusion-text-6\"><p>I recommend cloning the <a href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion#dotenv-file\" target=\"_blank\" rel=\"noopener\">GitHub repository<\/a>\u00a0I prepared in order to follow the steps of the article.<\/p>\n<p>It is always important to create a virtual environment to install the packages. 
Personally, I will use Anaconda and install every dependency from requirements.txt:<\/p>\n<\/div><div class=\"fusion-text fusion-text-7\"><div class=\"code\">conda create -n stable_diffusion --no-default-packages python=3.8 -y<br \/>\nconda activate stable_diffusion<br \/>\npip install -r src\/requirements.txt<\/div>\n<\/div><div class=\"fusion-text fusion-text-8\"><p>You are now ready to download the weights of Stable Diffusion. In this article we will use\u00a0<a href=\"https:\/\/huggingface.co\/runwayml\/stable-diffusion-v1-5\" target=\"_blank\" rel=\"noopener\">Stable Diffusion 1.5<\/a>. You need to accept the license on the model page, otherwise you will get errors when downloading the model weights.<br \/>\nThen create an access token: go to your account \u2192 settings \u2192 access token \u2192 new token (read access).<\/p>\n<\/div><div class=\"fusion-image-element\" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img decoding=\"async\" width=\"1164\" height=\"276\" alt=\"Go to your account \u2192 settings \u2192 access token \u2192 new token (read access)\" title=\"image-vertex1\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1.png\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1.png\" class=\"lazyload img-responsive wp-image-68764\" 
srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271164%27%20height%3D%27276%27%20viewBox%3D%270%200%201164%20276%27%3E%3Crect%20width%3D%271164%27%20height%3D%27276%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1-200x47.png 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1-400x95.png 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1-600x142.png 600w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1-800x190.png 800w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/image-vertex1.png 1164w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 1164px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-9\"><p>You can add the token as an environment variable. Personally, I recommend using a .env file and loading the environment variable using the\u00a0<a href=\"https:\/\/pypi.org\/project\/python-dotenv\/\" target=\"_blank\" rel=\"noopener\">python-dotenv library<\/a>.<br \/>\nYou have to navigate to `src\/stable_diffusion` and run:<\/p>\n<\/div><div class=\"fusion-text fusion-text-10\"><div class=\"code\">python download_model.py<\/div>\n<\/div><div class=\"fusion-text fusion-text-11\"><p>This will download the weights inside<\/p>\n<div class=\"code\">`src\/stable_diffusion\/external_files\/model_weights`.<\/div>\n<\/div><div class=\"fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Torchserve framework<\/h2><\/div><div class=\"fusion-text fusion-text-12\"><p><a class=\"ae kg\" href=\"https:\/\/github.com\/pytorch\/serve\" target=\"_blank\" rel=\"noopener\">Torchserve<\/a>\u00a0is a framework 
for serving PyTorch models. It allows you to deploy PyTorch models in a production environment and provides features such as model versioning and multi-model serving. It is designed to be easy to use and allows you to focus on building and deploying your models, rather than worrying about infrastructure.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">1. Creating a handler respecting TorchServe format<\/h3><\/div><div class=\"fusion-text fusion-text-13\"><p>A custom handler is a Python class that defines how to pre-process input data, how to run the model, and how to post-process the output. To create a custom handler for your model, you will need to create a Python class that follows the TorchServe format.<br \/>\nA custom handler for Stable Diffusion is already given inside the TorchServe repository. But Vertex endpoint expects a specific format for the requests, so we need to adapt the preprocess() method of the handler to account for Vertex format. You can use the modified version of the handler named `<a class=\"ae kg\" href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion\/blob\/master\/src\/stable_diffusion\/stable_diffusion_handler.py\" target=\"_blank\" rel=\"noopener\">stable_diffusion_handler.py<\/a>` given inside the github repository of this article.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">2. 
Creating the .mar file<\/h3><\/div><div class=\"fusion-text fusion-text-14\"><p>Once you have created your custom handler, you will need to package it, along with any dependencies and the model itself, into a .mar file using torch-model-archiver, a command-line tool that bundles your model, handler, and dependencies into a single file.<\/p>\n<p>The command below creates a .mar file called stable-diffusion.mar (named after the --model-name argument) that contains your model, handler, and dependencies.<\/p>\n<p>You might need to edit the paths based on where the files are on your machine:<\/p>\n<\/div><div class=\"fusion-text fusion-text-15\"><div class=\"code\">torch-model-archiver \\<br \/>\n--model-name stable-diffusion \\<br \/>\n--version 1.0 \\<br \/>\n--handler stable_diffusion\/stable_diffusion_handler.py \\<br \/>\n--export-path stable_diffusion\/model-store \\<br \/>\n--extra-files stable_diffusion\/external_files<\/div>\n<\/div><div class=\"fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-three\" style=\"--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">3. Running your TorchServe server<\/h3><\/div><div class=\"fusion-text fusion-text-16\"><p>Once you have created your .mar file, you can start the TorchServe server using the torchserve command. 
To do this, you will need to run the following command:<\/p>\n<\/div><div class=\"fusion-text fusion-text-17\"><div class=\"code\">torchserve \\<br \/>\n--start \\<br \/>\n--ts-config=config.properties \\<br \/>\n--models=stable-diffusion.mar \\<br \/>\n--model-store=stable_diffusion\/model-store<\/div>\n<\/div><div class=\"fusion-text fusion-text-18\"><p>The config.properties file lets you specify the configuration of your TorchServe server, such as the inference port, health checks, number of workers, etc.<br \/>\n<em>WARNING: Each script needs to be run from the directory where it is located to avoid path errors.<\/em><\/p>\n<\/div><div class=\"fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Run the TorchServe server locally<\/h2><\/div><div class=\"fusion-text fusion-text-19\"><p>It is important to test the code locally before starting to dockerize our deployment. I have already prepared a bash script that creates the .mar file and starts the TorchServe server.<\/p>\n<p>You can run it with:<\/p>\n<\/div><div class=\"fusion-text fusion-text-20\"><div class=\"code\">bash serve_locally.sh<\/div>\n<\/div><div class=\"fusion-text fusion-text-21\"><p>You need to wait a few minutes for the server to initialize and the worker to load the model. You can start running inference once you see the following log:<\/p>\n<\/div><div class=\"fusion-text fusion-text-22\"><div class=\"code\">2023-01-05T15:34:52,842 [DEBUG] W-9000-stable-diffusion_1.0<br \/>\norg.pytorch.serve.wlm.WorkerThread - W-9000-stable-diffusion_1.0<br \/>\nState change WORKER_STARTED -&gt; WORKER_MODEL_LOADED<\/div>\n<\/div><div class=\"fusion-text fusion-text-23\"><p>You can then use the following code to make 
inference requests to your model:<\/p>\n<\/div><div class=\"fusion-text fusion-text-24\"><div class=\"code\">import requests<br \/>\nprompt = \"a photo of an astronaut riding a horse on mars\"<br \/>\nURL = \"http:\/\/localhost:7080\/predictions\/stable-diffusion\"<br \/>\nresponse = requests.post(URL, data=prompt)<\/div>\n<\/div><div class=\"fusion-text fusion-text-25\"><p>You can check that the server received your request by looking at the server logs:<\/p>\n<\/div><div class=\"fusion-text fusion-text-26\"><div class=\"code\">2023-01-05T15:35:43,765 [INFO ] W-9000-stable-diffusion_1.0-stdout<br \/>\nMODEL_LOG - Backend received inference at: 1672929343<\/div>\n<\/div><div class=\"fusion-text fusion-text-27\"><p>Stable Diffusion requires a GPU to run smoothly, so without one you won\u2019t get any output for the moment. You can stop the TorchServe server with `torchserve --stop`.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Dockerize TorchServe<\/h2><\/div><div class=\"fusion-text fusion-text-28\"><p>Your TorchServe server is now working locally. To deploy Stable Diffusion on Vertex AI, you will need to dockerize it. This means creating a Docker image that contains the model, the custom handler, and all the necessary dependencies. The Dockerfile simply reproduces all the steps we have done above.<br \/>\nLuckily, it is already prepared and ready to use here:\u00a0<a class=\"ae kg\" href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion\/blob\/master\/src\/Dockerfile\" target=\"_blank\" rel=\"noopener\">Dockerfile<\/a>.<\/p>\n<p>It\u2019s important to run the container locally to check that it is working properly. 
I\u2019m going to build it locally, but you can also build it with Cloud Build and pull it onto your machine.<\/p>\n<p>Build the image locally (you need the Docker daemon running):<\/p>\n<\/div><div class=\"fusion-text fusion-text-29\"><div class=\"code\">docker build -t serve_sd .<\/div>\n<\/div><div class=\"fusion-text fusion-text-30\"><p>The image build takes 20 to 30 minutes. The build phase is long because the model weights need to be copied inside the image before being packaged by the model archiver.<\/p>\n<p>You can run a Docker container and listen on port 7080 with:<\/p>\n<\/div><div class=\"fusion-text fusion-text-31\"><div class=\"code\">docker run -p 7080:7080 serve_sd<\/div>\n<\/div><div class=\"fusion-text fusion-text-32\"><p>To check that everything is working properly, wait until the worker has loaded the model, then run the same inference code as before, since we are using the same port 7080.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Deployment on Vertex AI<\/h2><\/div><div class=\"fusion-text fusion-text-33\"><p>Now that the Dockerfile is ready and working, we need to:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-1 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Build the image with Cloud Build and store it in Google Container Registry (GCR)<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i 
class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Upload the image of our custom model inside Vertex AI model registry<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Create a Vertex AI endpoint<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Attach the model to the endpoint<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-34\"><p>This is exactly what the\u00a0<a class=\"ae kg\" href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion\/blob\/master\/src\/deploy.sh\" target=\"_blank\" rel=\"noopener\">deploy.sh<\/a>\u00a0bash script is going to do, you can run it with:<\/p>\n<\/div><div class=\"fusion-text fusion-text-35\"><div class=\"code\">bash deploy.sh<\/div>\n<\/div><div class=\"fusion-text fusion-text-36\"><p>The deployment takes around 1 hour with good internet speed:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-2 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Sending the 8 GB of model weights to Cloud Build can take from a few minutes to hours depending on your internet speed<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Building the 
image takes around 20 minutes<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Uploading the model takes around 5 minutes<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Creating the endpoint takes around 5 minutes<\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p>Attaching the model to the endpoint takes from 30 to 40 minutes<\/p>\n<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-37\"><p>After the model is successfully attached to the endpoint, you can query the endpoint with the following code:<\/p>\n<\/div><div class=\"fusion-text fusion-text-38\"><div class=\"code\">from google.cloud import aiplatform as aip<br \/>\n<br \/>\nPROJECT_NAME = \"\"<br \/>\nREGION = \"\"<br \/>\nENDPOINT_ID = \"\"<br \/>\naip.init(project=PROJECT_NAME, location=REGION)<br \/>\nendpoint = aip.Endpoint(endpoint_name=ENDPOINT_ID)<br \/>\ntext_input = \"\"\"A bottle of aged and exclusive cognac<br \/>\nstands on a reflective surface, in front of a vibrant bar,<br \/>\nhyper detailed, 4K, bokeh\"\"\"<br \/>\n<br \/>\ndef query_endpoint(endpoint, text_input):<br \/>\n    payload = <br \/>\n    response = endpoint.predict(instances=[payload])<br \/>\n    return response<br \/>\n<br \/>\nimage = query_endpoint(endpoint, text_input)<\/div>\n<\/div><div class=\"fusion-image-element\" 
style=\"text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img decoding=\"async\" width=\"512\" height=\"512\" title=\"1_uS37ust4GEzk7-0OBYNReg\" src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/1_uS37ust4GEzk7-0OBYNReg.webp\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/1_uS37ust4GEzk7-0OBYNReg.webp\" alt class=\"lazyload img-responsive wp-image-68765\" srcset=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27512%27%20height%3D%27512%27%20viewBox%3D%270%200%20512%20512%27%3E%3Crect%20width%3D%27512%27%20height%3D%27512%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-srcset=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/1_uS37ust4GEzk7-0OBYNReg-200x200.webp 200w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/1_uS37ust4GEzk7-0OBYNReg-400x400.webp 400w, https:\/\/www.artefact.com\/\/wp-content\/uploads\/2023\/02\/1_uS37ust4GEzk7-0OBYNReg.webp 512w\" data-sizes=\"auto\" data-orig-sizes=\"(max-width: 640px) 100vw, 512px\" \/><\/span><\/div><div class=\"fusion-text fusion-text-39\"><p style=\"text-align: center;\"><em>A bottle of aged and exclusive cognac stands on a reflective surface, in front of a vibrant bar, hyper detailed, 4K, bokeh \u2014 Stable Diffusion 1.5<\/em><\/p>\n<\/div><div class=\"fusion-text fusion-text-40\"><p>The inference should take between 10 and 15 seconds on T4 GPU. 
You can improve the speed by choosing a more powerful GPU: change the ACCELERATOR_TYPE variable inside <a class=\"ae kg\" href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion\/blob\/master\/src\/deploy.sh\" target=\"_blank\" rel=\"noopener\">deploy.sh<\/a>.<\/p>\n<\/div><div class=\"fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Optional: if the image is too big for the endpoint response, or you want to store the image history<\/h2><\/div><div class=\"fusion-text fusion-text-41\"><p>You might want to keep track of the image history, or you might be hitting errors due to the 1.5 MB size limit on the endpoint response. In this case, I recommend using the post-process method of the handler to save the image inside GCS and return only the GCS path of the image.<\/p>\n<p>Luckily, I have already prepared a handler that does just this for you: switching from stable_diffusion_handler.py to\u00a0<a href=\"https:\/\/github.com\/artefactory\/deploy_stable_difusion\/blob\/master\/src\/stable_diffusion\/stable_diffusion_handler_gcs.py\" target=\"_blank\" rel=\"noopener ugc nofollow\">stable_diffusion_handler_gcs.py<\/a>\u00a0should do the trick.<\/p>\n<p><em>WARNING: Before running the deployment with the new handler, you need to:<\/em><\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-3 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>Create a GCS bucket that will store the 
images<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>Set the name of the new GCS bucket and folder in src\/stable_diffusion\/external_files\/config.py<\/em><\/p>\n<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">\n<p><em>Change the service account used by the endpoint in the deploy.sh file. You need a service account with GCS OWNER permissions.<\/em><\/p>\n<\/div><\/li><\/ul><div class=\"fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-two\" style=\"--awb-margin-bottom-small:8px;\"><h2 class=\"fusion-title-heading title-heading-left fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:50;line-height:1.2;\">Conclusion<\/h2><\/div><div class=\"fusion-text fusion-text-42\"><p>This article provided a comprehensive guide to deploying the Stable Diffusion model on Google Cloud Platform using Vertex AI.<\/p>\n<p>The guide covered essential steps such as:<\/p>\n<\/div><ul style=\"--awb-line-height:27.2px;--awb-icon-width:27.2px;--awb-icon-height:27.2px;--awb-icon-margin:11.2px;--awb-content-margin:38.4px;\" class=\"fusion-checklist fusion-checklist-4 fusion-checklist-default type-icons paddingList dark-text\"><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Using TorchServe for deployment<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Creating and modifying a 
custom handler for Stable Diffusion<\/div><\/li><li class=\"fusion-li-item\" style=\"\"><span class=\"icon-wrapper circle-no\"><i class=\"fusion-li-icon awb-icon-check\" aria-hidden=\"true\"><\/i><\/span><div class=\"fusion-li-item-content\">Deploying the model using Vertex Model Registry and Vertex Endpoints<\/div><\/li><\/ul><div class=\"fusion-text fusion-text-43\"><p>It\u2019s important to remember that while a Vertex endpoint can be an effective solution, it does not support scaling down to 0 instances, which can increase costs because the GPU stays allocated even when the endpoint is idle.<\/p>\n<p>With the Stable Diffusion model deployed, we are now exploring next steps such as fine-tuning it on specific Moet Hennessy products to extend its capabilities.<\/p>\n<\/div><\/div><\/div><\/div><\/article><div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-margin-top:40px;--awb-margin-bottom:40px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-justify-content-center fusion-flex-content-wrap\" style=\"max-width:calc( 1440px + 20px );margin-left: calc(-20px \/ 2 );margin-right: calc(-20px \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-flex-align-self-center\" style=\"--awb-padding-top:40px;--awb-padding-right:40px;--awb-padding-bottom:40px;--awb-padding-left:40px;--awb-overflow:hidden;--awb-bg-position:left center;--awb-bg-size:cover;--awb-border-color:rgba(10,17,40,0.1);--awb-border-style:solid;--awb-border-radius:4px 4px 4px 
4px;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:10px;--awb-margin-bottom-large:0px;--awb-spacing-left-large:10px;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:10px;--awb-spacing-left-medium:10px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:10px;--awb-spacing-left-small:10px;\"><div class=\"fusion-column-wrapper lazyload fusion-column-has-shadow fusion-flex-justify-content-center fusion-content-layout-column fusion-column-has-bg-image\" data-bg-url=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\" data-bg=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/background.jpg\"><div class=\"fusion-image-element\" style=\"text-align:center;--awb-margin-right:20px;--awb-margin-left:20px;--awb-max-width:150px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-none\"><img decoding=\"async\" width=\"72\" height=\"41\" title=\"medium\" src=\"data:image\/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%2772%27%20height%3D%2741%27%20viewBox%3D%270%200%2072%2041%27%3E%3Crect%20width%3D%2772%27%20height%3D%2741%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E\" data-orig-src=\"https:\/\/www.artefact.com\/\/wp-content\/uploads\/2021\/03\/medium.png\" alt class=\"lazyload img-responsive wp-image-60927\"\/><\/span><\/div><div class=\"fusion-title title fusion-title-14 fusion-sep-none fusion-title-center fusion-title-text fusion-title-size-three\" 
style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;--awb-margin-bottom-small:8px;\"><h3 class=\"fusion-title-heading title-heading-center fusion-responsive-typography-calculated\" style=\"margin:0;--fontSize:20;line-height:1.2;\">Medium Blog by Artefact.<\/h3><\/div><div class=\"fusion-text fusion-text-44\" style=\"--awb-content-alignment:center;\"><p>This article was originally published on <strong>Medium.com<\/strong>.<br \/>\nFollow us on our Medium blog!<\/p>\n<\/div><div style=\"text-align:center;\"><a class=\"fusion-button button-flat button-medium button-default fusion-button-default button-1 fusion-button-default-span fusion-button-default-type\" target=\"_blank\" rel=\"noopener noreferrer\" href=\"https:\/\/medium.com\/artefact-engineering-and-data-science\/deploying-stable-diffusion-on-vertex-ai-6c05ea53c68f\"><span class=\"fusion-button-text awb-button__text awb-button__text--default\">Read Our Article<\/span><\/a><\/div><\/div><\/div><\/div><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article provides a guide to deploying the Stable Diffusion model, a popular image generation model, on Google Cloud using Vertex 
AI.<\/p>","protected":false},"featured_media":68766,"parent":0,"template":"","meta":{"_acf_changed":false,"ep_exclude_from_search":false},"blog-category":[21939],"blog-language":[2991],"class_list":["post-68762","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog-category-medium","blog-language-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog\/68762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media\/68766"}],"wp:attachment":[{"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/media?parent=68762"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-category?post=68762"},{"taxonomy":"blog-language","embeddable":true,"href":"https:\/\/www.artefact.com\/br\/wp-json\/wp\/v2\/blog-language?post=68762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}