In this end-to-end tutorial, you will learn how to deploy and speed up Llama 2 inference using AWS Inferentia2 and optimum-neuron on Amazon SageMaker. Optimum Neuron is the interface between the Hugging Face Transformers & Diffusers libraries and AWS Accelerators, including AWS Trainium and AWS Inferentia2.

You will learn how to:

1. Convert Llama 2 to AWS Neuron (Inferentia2) with optimum-neuron
2. Create a custom inference.py script for Llama 2
3. Upload the neuron model and inference script to Amazon S3
4. Deploy a Real-time Inference Endpoint on Amazon SageMaker

AWS Inferentia2 (Inf2) instances are purpose-built EC2 instances for deep learning (DL) inference workloads. Inferentia2 is the successor of AWS Inferentia and promises to deliver up to 4x higher throughput and up to 10x lower latency. It is available in several instance sizes, from inf2.xlarge up to inf2.48xlarge. Additionally, Inferentia2 supports the writing of custom operators in C++ and new datatypes, including FP8 (cFP8).

If you are going to use SageMaker in a local environment (not SageMaker Studio or Notebook Instances), you need access to an IAM role with the required permissions for SageMaker.

## Convert Llama 2 to AWS Neuron (Inferentia2) with optimum-neuron

We are going to use optimum-neuron to compile/convert our model to Neuron. Optimum Neuron provides a set of tools enabling easy model loading, training, and inference on single- and multi-accelerator settings for different downstream tasks.

As a first step, we need to install optimum-neuron and the other required packages.

Tip: If you are using Amazon SageMaker Notebook Instances or Studio, you can use the conda_python3 conda kernel.

```bash
# Install the required packages
%pip install "optimum-neuron==0.0.13" --upgrade
%pip install "sagemaker>=2.197.0" --upgrade
```

After we have installed optimum-neuron, we can load and convert our model. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. We are going to use the meta-llama/Llama-2-7b-chat-hf model.
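The load-and-convert step could be sketched roughly as follows. This is a minimal sketch, not the tutorial's verbatim code: it assumes the optimum-neuron Python API (`NeuronModelForCausalLM` with `export=True`), and the shape/compiler values, the helper name, and the save directory are illustrative assumptions. The compilation itself must run on an AWS Inferentia2 (inf2) instance.

```python
# Sketch of converting Llama 2 to Neuron with optimum-neuron (assumed API).
# Compilation requires an inf2 instance; values below are illustrative.

# Static input shapes and compiler arguments baked into the compiled model.
input_shapes = {"batch_size": 1, "sequence_length": 2048}
compiler_args = {"num_cores": 2, "auto_cast_type": "f16"}


def export_llama_to_neuron(model_id="meta-llama/Llama-2-7b-chat-hf",
                           save_dir="llama-2-7b-chat-neuron"):
    """Compile the model to Neuron and save the artifacts locally."""
    # Heavy, hardware-dependent import kept local to the function.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        export=True,      # trigger neuronx compilation on load
        **input_shapes,
        **compiler_args,
    )
    model.save_pretrained(save_dir)  # artifacts to upload to S3 later
    return save_dir
```

Note that Neuron compilation fixes the batch size and sequence length at export time, so these values should match what the endpoint will serve.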
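Because meta-llama/Llama-2-7b-chat-hf is a chat-tuned model, the custom inference.py mentioned in the steps above will need to wrap user input in Llama 2's chat prompt template before generation. A self-contained sketch of that formatting (the helper name is an assumption; the `[INST]`/`<<SYS>>` tags follow Llama 2's published template):

```python
# Sketch of Llama 2 chat prompt formatting, as a custom inference script
# for the chat model would need it. Helper name is an assumption.

def build_llama2_chat_prompt(user_message, system_prompt=None):
    """Wrap a single-turn request in the Llama 2 chat template."""
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"<s>[INST] {user_message} [/INST]"


prompt = build_llama2_chat_prompt(
    "What is AWS Inferentia2?",
    system_prompt="You are a helpful assistant.",
)
```

Sending raw text without this template to a chat-tuned checkpoint typically degrades response quality, which is one reason a custom inference.py is used instead of the default handler.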