Decoding ML Notes
This week's topics:
The ultimate guide on installing PyTorch with CUDA support in all possible ways
Generate a synthetic domain-specific Q&A dataset in <30 minutes
The power of serverless in the world of ML
Exciting news 🔥 I was invited by Maven to speak in their Lightning Lesson series about how to Architect Your LLM Twin.
This 30-min session is for ML & MLOps engineers who want to learn:
LLM System design of your LLM Twin
↳ Using the 3-pipeline architecture & MLOps good practices
Design a data collection pipeline
↳ data crawling, ETLs, CDC, AWS
Design a feature pipeline
↳ streaming engine in Python, data ingestion for fine-tuning & RAG, vector DBs
Design a training pipeline
↳ create a custom dataset, fine-tuning, model registries, experiment trackers, LLM evaluation
Design an inference pipeline
↳ real-time deployment, REST API, RAG, LLM monitoring
↓↓↓
Join LIVE on Fri, May 3!
The ultimate guide on installing PyTorch with CUDA support in all possible ways
Ever wanted to quit ML while wrestling with CUDA errors? I know I did. ↳ Discover how to install CUDA & PyTorch painlessly in all possible ways.
Here is the story of most ML people:
1. You just got excited about a new model that came out.
2. You want to try it out.
3. You install everything.
4. You run the model.
5. Bam... CUDA error.
6. You fix the error.
7. Bam... Another CUDA error.
8. You fix the error.
9. ...Yet another CUDA error.
You get the idea.
↳ Now it is 3:00 am, and you finally solved all your CUDA errors and ran your model.
Now, it's time to do your actual work.
Do you relate?
If so...
I wrote a Medium article documenting good practices and step-by-step instructions on how to install CUDA & PyTorch with:
- Pip
- Conda (or Mamba)
- Poetry
- Docker
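As a quick taste of the guide, here is what a CUDA-enabled install can look like with pip and conda (a sketch assuming CUDA 12.1 and a recent PyTorch; check the official PyTorch "Get Started" selector for the exact command matching your driver and CUDA version):

```shell
# pip: install PyTorch built against CUDA 12.1 from the official wheel index
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# conda (or mamba): let the solver pull a matching CUDA runtime for you
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia

# sanity check: confirm you actually got a GPU build
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If `torch.cuda.is_available()` prints `False`, the usual suspects are a CPU-only wheel or a driver that is older than the CUDA version the wheel was built against.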
Check it out ↓
🔗 The ultimate guide on installing PyTorch with CUDA support in all possible ways
Note: Feel free to comment with any improvements on how to install CUDA + PyTorch. Let's make the ultimate tutorial on installing these 2 beasts 🔥
Generate a synthetic domain-specific Q&A dataset in <30 minutes
How do you generate a synthetic domain-specific Q&A dataset in <30 minutes to fine-tune your open-source LLM?
This method is also known as fine-tuning with distillation. Here are its 3 main steps ↓
For example, let's generate a Q&A fine-tuning dataset used to fine-tune a financial advisor LLM.
Step 1: Manually generate a few input examples
Generate a few input samples (~3) that have the following structure:
- user_context: describe the type of investor (e.g., "I am a 28-year-old marketing professional")
- question: describe the user's intention (e.g., "Is Bitcoin a good investment option?")
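In code, the handful of seed samples from Step 1 is just a small list of records (a minimal sketch; the field names mirror the structure above, and the extra profiles are made up for illustration):

```python
# Step 1: a few (~3) manually written seed examples.
# Each record pairs an investor profile with a domain-specific question.
seed_examples = [
    {
        "user_context": "I am a 28-year-old marketing professional.",
        "question": "Is Bitcoin a good investment option?",
    },
    {
        "user_context": "I am a 45-year-old teacher saving for retirement.",
        "question": "Should I prioritize index funds or bonds?",
    },
    {
        "user_context": "I am a freelance developer with irregular income.",
        "question": "How large should my emergency fund be?",
    },
]

# every sample must carry both fields before we use it for few-shot prompting
for sample in seed_examples:
    assert {"user_context", "question"} <= sample.keys()
```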
Step 2: Expand the input examples with the help of a teacher LLM
Use a powerful LLM as a teacher (e.g., GPT-4, Falcon 180B) to generate up to N similar input examples.
We generated 100 input examples in our use case, but you can generate more.
You will use the manually filled input examples to do few-shot prompting.
This will guide the LLM to give you domain-specific samples.
The prompt will look like this:
"""
...
Generate 100 more examples with the following pattern:
# USER CONTEXT 1
...
# QUESTION 1
...
# USER CONTEXT 2
...
"""
Step 3: Use the teacher LLM to generate outputs for all the input examples
Now, you will have the same powerful LLM as a teacher, but this time, it will answer all your N input examples.
But first, to introduce more variance, we will use RAG to enrich the input examples with news context.
Afterward, we will use the teacher LLM to answer all N input examples.
...and bam! You generated a domain-specific Q&A dataset with almost 0 manual work.
Now, you will use this data to train a smaller LLM (e.g., Falcon 7B) on a niche task, such as financial advising.
This technique is known as fine-tuning with distillation because you use a powerful LLM as the teacher (e.g., GPT-4, Falcon 180B) to generate the data, which is then used to fine-tune a smaller LLM (e.g., Falcon 7B) that acts as the student.
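Step 3 glues everything together: enrich each input with retrieved news context, then have the teacher answer it (a sketch with the retriever and the teacher LLM stubbed out; swap in your real RAG pipeline and LLM client):

```python
def retrieve_news_context(question: str) -> str:
    """Stub for the RAG step: fetch relevant news to add variance.
    Replace with a real vector-DB lookup."""
    return f"[news relevant to: {question}]"


def teacher_answer(user_context: str, question: str, news: str) -> str:
    """Stub for the teacher LLM (e.g., GPT-4) answering one sample."""
    return f"Answer for '{question}' given '{user_context}' and {news}"


def build_qa_dataset(inputs: list[dict]) -> list[dict]:
    """Turn the N expanded inputs into (input, context, answer) records."""
    dataset = []
    for ex in inputs:
        news = retrieve_news_context(ex["question"])
        answer = teacher_answer(ex["user_context"], ex["question"], news)
        dataset.append({**ex, "news_context": news, "answer": answer})
    return dataset
```

The resulting records are exactly what a supervised fine-tuning loop for the student model expects: an instruction-style input plus the teacher's answer as the label.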
⚠️ Note: To ensure that the generated data is of high quality, you can hire a domain expert to check & refine it.
The power of serverless in the world of ML
Deploying & managing ML models is hard, especially when running your models on GPUs.
But serverless makes things easy.
Using Beam as your serverless provider, deploying & managing ML models can be as easy as ↓
Define your infrastructure & dependencies
In a few lines of code, you define the application that contains:
- the requirements of your infrastructure, such as the CPU, RAM, and GPU
- the dependencies of your application
- the volumes from where you can load your data and store your artifacts
Deploy your jobs
Using the Beam application, you can quickly decorate your Python functions to:
- run them once on the given serverless application
- put your task/job in a queue to be processed or even schedule it using a CRON-based syntax
- even deploy it as a RESTful API endpoint
As you can see in the image below, you can have one central function for training or inference, and with minimal effort, you can switch between all these deployment methods.
Also, you don't have to bother at all with managing the infrastructure on which your jobs run. You specify what you need, and Beam takes care of the rest.
By doing so, you can focus directly on your application and stop worrying about the infrastructure.
This is the power of serverless!
↳🔗 Check out Beam to learn more
Images
If not otherwise stated, all images are created by the author.