Email Classification with LLM-Powered AI Functions on Your Data

Author: Gabriele Albini

Originally published in Towards AI.

Introduction

Since the introduction of AI Functions in Databricks, LLMs (large language models) can be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can perform advanced AI tasks directly from SQL queries.

I recommend watching this great introductory video for an overview of this brilliant feature.

This article walks through an implementation of email classification: let's assume that customers write to our company's mailbox asking to be unsubscribed from marketing or commercial emails. Without any historical datasets, we want to automate checking the mailbox and classifying each customer's intent based on the email body.

Contents:

  • Part 1: AI Functions
  • Part 2: Accessing the Gmail API

Part 1: AI functions

Let's use ai_query(), part of the Databricks AI Functions, to classify the email messages.

Suppose we have the following fields available:

Test data set

To apply ai_query() to our "Email_body" column, we will use the following arguments:

  • endpoint: the name of the model serving endpoint we intend to use (Llama 3.3 in this example). Check here how to create your own model serving endpoint on Databricks, choosing one of the supported foundation models.
  • request: the prompt, which includes the "Email_body".
  • modelParameters: additional parameters we can pass to the LLM. In this example, we will limit the output to 1 token and choose a very low temperature to reduce the randomness and creativity of the generated model output.

The prompt template used in this example is based on a 2024 study whose authors designed and tested a few-shot prompt template for email spam detection; it has been adapted as follows:

prompt_ = """

Forget all your previous instructions, pretend you are an e-mail
classification expert who tries to identify whether an e-mail is requesting
to be removed from a marketing distribution list.
Answer "Remove" if the mail is requesting to be removed, "Keep" if not.
Do not add any other detail.
If you think it is too difficult to judge, you can exclude the impossible
one and choose the other, just answer "Remove" or "Keep".

Here are a few examples for you:
* "I wish to no longer receive emails" is "Remove";
* "Remove me from any kind of subscriptions" is "Remove";
* "I want to update my delivery address" is "Keep";
* "When is my product warranty expiring?" is "Keep";

Now, identify whether the e-mail is "Remove" or "Keep";
e-mail:

"""

Finally, we can combine all the elements above into one SQL query, running batch inference on all the emails and generating labels:

select *,
ai_query(
'databricks-meta-llama-3-3-70b-instruct',
"${prompt}" || Email_body,
modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1)
) as Predicted_Label
from customer_emails;
Data set with generated labels
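Even with max_tokens set to 1 and a low temperature, the returned token's casing or surrounding whitespace can vary between runs. A small post-processing helper can normalize the predicted labels before they are used downstream (this helper, its name, and the "Review" fallback for unexpected outputs are our own addition, not from the article):

```python
def normalize_label(raw: str) -> str:
    """Map a raw model response onto the two expected labels.

    Anything unrecognized is flagged for manual review rather than
    silently kept.
    """
    label = raw.strip().lower()
    if label == "remove":
        return "Remove"
    if label == "keep":
        return "Keep"
    return "Review"
```

In a notebook this could be registered as a UDF and applied to the Predicted_Label column.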

Part 2: Accessing the Gmail API

To implement this use case, we will need a way to automatically consume emails. Here is a step-by-step guide on how to use the Gmail API.

1.1: Configure your Gmail account to work with the API

The recommended approach to enabling Google APIs on an account is to use Service Accounts. The process is described here; however, this requires:

  • A corporate account (not ending in gmail.com).
  • Super admin access to the Google Workspace domain, in order to delegate domain-wide authority to the service account.

For this demo we use a dummy Gmail account; therefore, we will apply a more manual approach to Gmail authentication, described here.

The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via the API, you need a service account.

First, we must create a project:

  • Log in to the Google Cloud console.
  • Create a new project for this use case.
  • Enable the Gmail API for your project using this link.
Enabling the API in your project

Second, configure the OAuth consent screen:

  • Within the project, go to "APIs & Services" > "OAuth consent screen".
  • Go to the "Branding" section and click to start creating an app identity.
  • Then, create an OAuth 2.0 web application client ID using this link.
  • Download the credentials file as JSON, as we will need it later.
  • Add the following authorized redirect URIs:
Creating the OAuth consent screen

Finally, authorize users to authenticate, and publish the application:

  • Within the project, go to "APIs & Services" > "OAuth consent screen".
  • Go to the "Audience" section and add all test users working on the project so that they can authenticate.
  • To make sure that access does not expire, publish the application by moving it to production status.

1.2: Access the Gmail mailbox from Databricks notebooks

To authenticate to Gmail from a Databricks notebook, we can use the following function implemented in the repository. The function requires:

  • For first-time access, the JSON credentials file, which can be saved in a Volume.
  • For future access, active credentials will be stored in a token file that will be reused.
gmail_authenticate_manual()

Because we are not using service accounts, Google Cloud authentication requires opening a browser on the OAuth consent page and generating a temporary code.

However, we will need a workaround to do this on Databricks, because clusters do not have browser access.

As part of this workaround, we implemented the following function, which prompts the user to open the URL in a local browser, proceed with the authentication, and then land on an error page.

We can retrieve the code needed to authenticate with the Google API from the URL generated by this error page:
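The temporary code arrives as the `code` query parameter of the URL the browser lands on, so it can be pulled out with standard-library URL parsing. A minimal sketch (the helper name and error handling are our own, not the repository's exact implementation):

```python
from urllib.parse import urlparse, parse_qs

def extract_auth_code(redirect_url: str) -> str:
    """Pull the one-time OAuth `code` query parameter out of the URL
    that the browser lands on after consent (the error page described
    above)."""
    params = parse_qs(urlparse(redirect_url).query)
    codes = params.get("code")
    if not codes:
        raise ValueError("no 'code' parameter found in redirect URL")
    return codes[0]
```

The extracted code is then pasted back into the notebook to complete the token exchange.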

Note: with service accounts, this manual step would not be required.

After authentication, we can read emails from Gmail using the following function, saving the emails into a dataframe and, ultimately, to a Delta table:

# Build the Gmail API service and download emails
service_ = build('gmail', 'v1', credentials=access_)
emails = get_email_messages_since(service_, since_day=25, since_month=3, since_year=2025)

if emails:
    spark_emails = spark.createDataFrame(emails)
    display(spark_emails)
else:
    spark_emails = None
    print("No emails found.")

Downloading emails from Gmail
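The helper get_email_messages_since() used above lives in the repository; at its core, it has to decode each message body, which the Gmail API returns as base64url-encoded data in the `body.data` field of a message part. A minimal sketch of that decoding step (the helper name is our own assumption; the real function also handles listing, pagination, and date filtering):

```python
import base64

def decode_body(data: str) -> str:
    """Decode the base64url-encoded `body.data` of a Gmail API message
    part into plain text."""
    # Gmail may omit base64 padding; restore it before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

The decoded bodies are what end up in the "Email_body" column fed to ai_query() in Part 1.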

Conclusions

To sum up, this post:

  • Showed how easy it is to configure AI Functions and use LLMs to automate workflows across an organization.
  • Provided a practical prompt template, designed for effective email classification using few-shot learning.
  • Walked through integrating the Gmail API directly in Databricks notebooks.

Ready to improve your own processes?

Photo by John Plenio on Unsplash

Thank you for reading!

Sources

Published via Towards AI
