AI Mail Tester

AI is a hot topic at the moment and I wanted an excuse to play with it and learn how to use OpenAI’s API. It is likely that email companies will be using AI to determine email legitimacy, and I wondered how effective it would be at this.

Setup

To start setting this up, I created the DNS records (A & MX) and a new virtual machine (Debian 11) on my ESXi box. Routing was then set up in the pfSense router to allow web and SMTP traffic to the VM.

Install LAMP stack

$> apt-get install apache2 mariadb-server php libapache2-mod-php php-cli php-mysql php-zip php-curl php-xml composer

Test the webserver

Setup catchall mail server

$> apt-get install postfix mailutils 

Append to /etc/postfix/main.cf

virtual_alias_maps = hash:/etc/postfix/virtual
transport_maps = hash:/etc/postfix/transport

nano /etc/postfix/virtual

@aimailtest.swin.es mailbot

nano /etc/postfix/transport

mailbot@aimailtest.swin.es  mailbot:

Rebuild the lookup tables after editing them (hash: maps are read from the compiled .db files) and reload Postfix:

$> postmap /etc/postfix/virtual
$> postmap /etc/postfix/transport
$> systemctl reload postfix

Append to /etc/postfix/master.cf

mailbot unix - n n - 50 pipe
 flags=R user=www-data argv=/usr/bin/php /var/www/html/pipe_email_input.php -o SENDER=${sender} -m USER=${user} EXTENSION=${extension}

Create the PHP script that emails will be sent to (/var/www/html/pipe_email_input.php):

#!/usr/bin/php
<?php
// Database configuration
$databaseHost = "localhost";  // Replace with your database host
$databaseName = "aimailtest";
$databaseUser = "aimailtest";
$databasePassword = "aimailtest";

// Log the received parameters for debugging
$logFile = '/tmp/debug.log';
$debugMessage = date('Y-m-d H:i:s') . " - Received Parameters:\n";
foreach ($argv as $arg) {
    $debugMessage .= $arg . "\n";
}
file_put_contents($logFile, $debugMessage, FILE_APPEND);

// Read the email content from standard input
// (compare against false so a line containing only "0" does not end the loop early)
$emailContent = '';
while (($line = fgets(STDIN)) !== false) {
    $emailContent .= $line;
}

// Parse the email content
if (preg_match('/^To:\s*(.*?)$/m', $emailContent, $matchesTo) &&
    preg_match('/^From:\s*(.*?)$/m', $emailContent, $matchesFrom) &&
    preg_match('/^Subject:\s*(.*?)$/m', $emailContent, $matchesSubject)) {

    $to = trim($matchesTo[1]);
    $from = trim($matchesFrom[1]);
    $subject = trim($matchesSubject[1]);

    // Database connection
    $db = new mysqli($databaseHost, $databaseUser, $databasePassword, $databaseName);

    // Check the database connection
    if ($db->connect_error) {
        die("Database connection failed: " . $db->connect_error);
    }

    // Insert the email details into the database
    $query = "INSERT INTO email (time, email_to, email_from, subject, message) VALUES (NOW(), ?, ?, ?, ?)";
    $statement = $db->prepare($query);
    $statement->bind_param("ssss", $to, $from, $subject, $emailContent);

    if ($statement->execute()) {
        echo "Email inserted into the database.\n";
    } else {
        echo "Failed to insert the email into the database.\n";
    }

    // Close the database connection
    $statement->close();
    $db->close();
} else {
    echo "Failed to parse email details.\n";
}
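
The three preg_match() calls above search the whole message, so a line in the body that happens to start with "Subject:" could match if the header is missing, and folded (multi-line) headers are cut short. A minimal sketch of a header-only parser follows; parseHeaders() is a hypothetical helper, not part of the original script, and it assumes the headers end at the first blank line.

```php
<?php
// Hypothetical helper (not part of the original script): parse only the
// header block of a raw message into an array keyed by lowercase header name.
function parseHeaders(string $raw): array
{
    // Normalise line endings, then keep everything before the first blank line.
    $raw = str_replace("\r\n", "\n", $raw);
    $headerBlock = explode("\n\n", $raw, 2)[0];

    // Unfold continuation lines (a newline followed by a space or tab).
    $headerBlock = preg_replace('/\n[ \t]+/', ' ', $headerBlock);

    $headers = [];
    foreach (explode("\n", $headerBlock) as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip lines that are not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $key = strtolower(trim($name));
        // Keep the first occurrence, as the preg_match() calls do.
        if (!isset($headers[$key])) {
            $headers[$key] = trim($value);
        }
    }
    return $headers;
}

// Made-up sample message for illustration.
$sample = "From: Alice <alice@example.com>\r\n"
        . "To: test@aimailtest.swin.es\r\n"
        . "Subject: A long\r\n subject line\r\n"
        . "\r\n"
        . "Subject: this line is body text\r\n";

$headers = parseHeaders($sample);
echo $headers['subject'], "\n";
```

With a helper like this, $to, $from and $subject could be read from $headers['to'], $headers['from'] and $headers['subject'] instead of the regex matches.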

Create database and setup test user

Log in to the database and create a new user:

GRANT ALL ON *.* TO 'aimailtest'@'localhost' IDENTIFIED BY 'aimailtest' WITH GRANT OPTION;

Test emails can be sent with:

$> echo "Hello world" | mail -s "Test" test@aimailtest.swin.es

Setup OpenAI API

Create a new account and generate an API key

Create /var/www/openAITest.php

<?php

$openAISecretKey = "sk-2-snip-Eb9Yr";

$search = "Give me 2 words related to php";
$data = [
        "model" => "gpt-3.5-turbo",
        'messages' => [
            [
               "role" => "user",
               "content" => $search
           ]
        ],
        'temperature' => 0.5,
        "max_tokens" => 200,
        "top_p" => 1.0,
        "frequency_penalty" => 0.52,
        "presence_penalty" => 0.5,
        "stop" => ["11."],
      ];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://api.openai.com/v1/chat/completions');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));

$headers = [];
$headers[] = 'Content-Type: application/json';
$headers[] = 'Authorization: Bearer '.$openAISecretKey;
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$response = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}

curl_close($ch);
print_r($response);

Let’s test this script:

$> php openAITest.php
{
  "id": "chatcmpl-89DCdN7pTTyBgKno4vn3YuCbf9fYk",
  "object": "chat.completion",
  "created": 1697206395,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1) Hypertext Preprocessor (PHP is a server-side scripting language widely used for web development)\n2) MySQL (a popular open-source relational database management system often used in conjunction with PHP)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 39,
    "total_tokens": 54
  }
}
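
The text of the reply sits in choices[0].message.content of that response. A small sketch of pulling it out safely is shown here; extractReply() is a hypothetical helper name, and the sample string is a shortened copy of the response above.

```php
<?php
// Hypothetical helper: return the assistant's reply text, or null when the
// response is not valid JSON or is an API error object.
function extractReply(string $responseJson): ?string
{
    $decoded = json_decode($responseJson, true);
    if (!is_array($decoded) || isset($decoded['error'])) {
        return null;
    }
    return $decoded['choices'][0]['message']['content'] ?? null;
}

// Shortened copy of the response shown above (single quotes keep the JSON
// "\n" escape intact so json_decode() turns it into a real newline).
$response = '{"choices":[{"index":0,"message":{"role":"assistant",'
          . '"content":"1) Hypertext Preprocessor\n2) MySQL"},'
          . '"finish_reason":"stop"}]}';

echo extractReply($response), "\n";
```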

Full integration

Now that we know all the individual components are working, it’s time to bring everything together.
First I decided what I wanted the AI to check in the emails. I went for “AI percentage”, “tag” and “description”.
AI percentage is how likely, as a percentage, the AI thinks it is that the email was written by another AI.
Tag is a single word used to describe the email, such as “spam”, “phishing” or “malware”, or “legit” if the email is safe.
Description is a couple of sentences on what the AI thinks the email is about.

So the new database structure looks like:

id (int 11)
time (timestamp)
ai (int 255)
tag (varchar 255)
ai_desc (mediumtext)
email_to (varchar 255)
email_from (varchar 255)
subject (mediumtext)
message (longtext)
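
One way to create a table matching that list is sketched below. The column names and types come from the list above; AUTO_INCREMENT and PRIMARY KEY on id are assumptions, since the post does not say how the table was actually created.

```php
<?php
// Column names and types are taken from the list above; AUTO_INCREMENT and
// PRIMARY KEY on id are assumptions, not something the post states.
$createTable = <<<SQL
CREATE TABLE IF NOT EXISTS email (
    id         INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    time       TIMESTAMP,
    ai         INT(255),
    tag        VARCHAR(255),
    ai_desc    MEDIUMTEXT,
    email_to   VARCHAR(255),
    email_from VARCHAR(255),
    subject    MEDIUMTEXT,
    message    LONGTEXT
)
SQL;

echo $createTable, "\n";
```

Passing $createTable to $db->query() on a mysqli connection like the one used in the scripts would create the table; printing it first makes it easy to review or paste into the database client.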

Edit the Postfix configuration to send all emails to /var/www/html/email_to_ai.php, which contains:

<?php
  
$openAISecretKey = "sk-2-snip-EEb9Yr";
// Database configuration
$databaseHost = "localhost";  // Replace with your database host
$databaseName = "aimailtest";
$databaseUser = "aimailtest";
$databasePassword = "aimailtest";
$pretext = file_get_contents('/var/www/html/pretext.txt');


// Log the received parameters for debugging
$logFile = '/tmp/debug.log';
$debugMessage = date('Y-m-d H:i:s') . " - Received Parameters:\n";
foreach ($argv as $arg) {
    $debugMessage .= $arg . "\n";
}
file_put_contents($logFile, $debugMessage, FILE_APPEND);

// Read the email content from standard input
// (compare against false so a line containing only "0" does not end the loop early)
$emailContent = '';
while (($line = fgets(STDIN)) !== false) {
    $emailContent .= $line;
}

// Parse the email content
if (preg_match('/^To:\s*(.*?)$/m', $emailContent, $matchesTo) &&
    preg_match('/^From:\s*(.*?)$/m', $emailContent, $matchesFrom) &&
    preg_match('/^Subject:\s*(.*?)$/m', $emailContent, $matchesSubject)) {

    $to = trim($matchesTo[1]);
    $from = trim($matchesFrom[1]);
    $subject = trim($matchesSubject[1]);

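    // get openAI response
    $airesponse = getAI($openAISecretKey, $pretext, $emailContent);
    file_put_contents($logFile, $airesponse."\n", FILE_APPEND);
    // Extract relevant data from $airesponse
    $aiData = json_decode((string) $airesponse, true);
    if (isset($aiData['error'])) {
        $aiContent['ai'] = 0;
        $aiContent['tag'] = $aiData['error']['code'];
        $aiContent['ai_desc'] = $aiData['error']['message'];
    } else {
        // The model's reply should itself be JSON, but it may be missing or malformed
        $aiContent = json_decode($aiData['choices'][0]['message']['content'] ?? '', true);
    }
    if (!is_array($aiContent)) {
        // Fall back to placeholder values so the email is still stored
        $aiContent = ['ai' => 0, 'tag' => 'unparsed', 'ai_desc' => 'Could not parse the AI response.'];
    }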

    // Database connection
    $db = new mysqli($databaseHost, $databaseUser, $databasePassword, $databaseName);

    // Check the database connection
    if ($db->connect_error) {
        die("Database connection failed: " . $db->connect_error);
    }

    // Insert the email details into the database
    $query = "INSERT INTO email (time, ai, tag, ai_desc, email_to, email_from, subject, message) VALUES (NOW(), ?, ?, ?, ?, ?, ?, ?)";
    $statement = $db->prepare($query);
    $statement->bind_param("issssss", $aiContent['ai'], $aiContent['tag'], $aiContent['ai_desc'], $to, $from, $subject, $emailContent);

    if ($statement->execute()) {
        echo "Email inserted into the database.\n";
    } else {
        echo "Failed to insert the email into the database.\n";
    }

    // Close the database connection
    $statement->close();
    $db->close();
} else {
    echo "Failed to parse email details.\n";
}

function getAI($openAISecretKey, $pretext, $email){   
  $data = [
          "model" => "gpt-3.5-turbo-16k",
          'messages' => [
            [
              "role" => "system",
              "content" => $pretext
            ],
            [
              "role" => "user",
              "content" => $email
            ]
          ],
          'temperature' => 0.5,
          "max_tokens" => 200,
          "top_p" => 1.0,
          "frequency_penalty" => 0.52,
          "presence_penalty" => 0.5,
          "stop" => ["11."],
        ];
    
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, 'https://api.openai.com/v1/chat/completions');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
    
  $headers = [];
  $headers[] = 'Content-Type: application/json';
  $headers[] = 'Authorization: Bearer '.$openAISecretKey;
  curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    
  $response = curl_exec($ch);
  if (curl_errno($ch)) {
      echo 'Error:' . curl_error($ch);
  }

  curl_close($ch);
  return $response;
}

The pretext (/var/www/html/pretext.txt) sent to OpenAI is:

return a JSON response to the provided email.
The json should look as follows:
{
"ai":[aiInt],
"tag":[tag],
"ai_desc":[aiDesc]
}
Replace "[aiInt]" with the percentage likelihood that the email was created by another AI or LLM.
Determine a meaningful tag to describe the email such as "spam", "mailinglist", "malware", "phishing" etc.
If a mail is legitimate set the tag to "legit". There must be 1 word as a tag.
Replace "[aiDesc]" with one or two sentences describing the contents of the email.

That is all that is needed to get AI to check all incoming emails and store them in a database.
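
The model is asked for JSON with "ai", "tag" and "ai_desc", but nothing guarantees it will comply. A hedged sketch of checking the reply before it is stored is shown here; normaliseAiReply() and the 'unparsed' and 'unknown' placeholder tags are hypothetical names chosen for this example, not part of the original code.

```php
<?php
// Hypothetical helper: turn the model's reply text into the three values the
// database expects, falling back to placeholders when the reply is unusable.
function normaliseAiReply(?string $content): array
{
    $fallback = ['ai' => 0, 'tag' => 'unparsed', 'ai_desc' => 'AI reply was not valid JSON.'];

    $data = json_decode((string) $content, true);
    if (!is_array($data)) {
        return $fallback;
    }

    // Clamp the percentage to the range 0..100.
    $ai = max(0, min(100, (int) ($data['ai'] ?? 0)));

    // Keep only the first word of the tag, lowercased.
    $tagWords = preg_split('/\s+/', strtolower(trim((string) ($data['tag'] ?? ''))));
    $tag = $tagWords[0] !== '' ? $tagWords[0] : 'unknown';

    return [
        'ai'      => $ai,
        'tag'     => $tag,
        'ai_desc' => (string) ($data['ai_desc'] ?? ''),
    ];
}

// Made-up replies for illustration.
print_r(normaliseAiReply('{"ai": 150, "tag": "Mailing list", "ai_desc": "A newsletter."}'));
print_r(normaliseAiReply('Sorry, I cannot help with that.'));
```

Lowercasing the tag also helps the viewer page, whose switch statement compares against lowercase tags such as 'legit' and 'spam'.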

Of course I want a web app to display this data in a pretty manner. The code for /var/www/html/viewEmails.php is:

<html>
<head>
<style>
   html * {
   font-size: 0.98em !important;
   color: #000 !important;
   font-family: Arial !important;
}

   body {
     font-size: 0.75em;
   }
    .email-container {
        width: 100%;
    }

    .email {
        margin: 10px;
        padding: 7px;
        border: 1px solid #ddd;
        border-radius: 10px;
    }

    .toggle-message {
        cursor: pointer;
        background-color: #eee;
    }

    .hidden-message {
        display: none;
    }
</style>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
</head>
<body>

<?php
$databaseHost = "localhost";
$databaseName = "aimailtest";
$databaseUser = "aimailtest";
$databasePassword = "aimailtest";
$databaseTable = "email";

try {
    $pdo = new PDO("mysql:host=$databaseHost;dbname=$databaseName", $databaseUser, $databasePassword);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Query to retrieve data ordered by id in descending order
    $query = "SELECT * FROM $databaseTable ORDER BY id DESC";
    $stmt = $pdo->query($query);

    echo '<div class="email-container">';

    //$isOdd = true; // Variable to track even/odd rows

    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        //$rowClass = $isOdd ? 'powderblue' : 'thistle';
        //$isOdd = !$isOdd; // Toggle for the next row

        switch ($row['tag']) {
            case 'legit':
                $rowClass = 'DarkSeaGreen';
                break;
            case 'phishing':
            case 'spam':
            case 'malware':
                $rowClass = 'IndianRed';
                break;
            default:
                $rowClass = 'thistle';
                break;
        }


        echo '<div class="email" style="background-color:'.$rowClass.'">';
        echo '<table style="width:100%">';
        echo '<tr class="toggle-message">';
        echo '<td style="width: 20%; padding-left:10px">' . htmlentities($row['ai']) . '% AI</td>';
        echo '<td style="width: 20%; padding-left:10px; font-weight:bold">' . htmlentities($row['tag']) . '</td>';
        echo '<td style="width: 60%; padding:5px" colspan=2>' . htmlentities($row['ai_desc']) . '</td>';
        echo '</tr>';
        echo '<tr class="toggle-message">';
        echo '<td style="width: 25%; padding-left:10px">' . htmlentities($row['time']) . '</td>';
        echo '<td style="width: 25%; padding-left:10px">To: ' . htmlentities($row['email_to']) . '</td>';
        echo '<td style="width: 25%; padding-left:10px">From: ' . htmlentities($row['email_from']) . '</td>';
        echo '<td style="width: 25%">' . htmlentities($row['subject']) . '</td>';
        echo '</tr>';
        echo '<tr class="hidden-message">';
        echo '<td colspan="4" style="background-color:cornsilk"><pre>' . htmlentities($row['message']) . '</pre></td>';
        echo '</tr>';
        echo '</table>';
        echo '</div>';
    }

    echo '</div>';

} catch (PDOException $e) {
    echo "Error: " . $e->getMessage();
}
?>

<script>
    $(document).ready(function() {
        $('.toggle-message').click(function() {
            $(this).closest('.email').find('.hidden-message').toggle();
        });
    });
</script>

</body>
</html>

Which looks as follows:

The background colours are dependent on the tag. You can click on the table to view the contents of the email:

Results

Emails sent from my personal inbox as expected came through as “legitimate”:

Also as expected, emails from my personal Gmail’s spam folder were flagged as such:

Emails sent from my work inbox caused OpenAI to return an error. This is due to work adding headers and footers to the email content, with the images base64-embedded in the emails. Even a blank email would be too large for the OpenAI model I was using (gpt-3.5-turbo-16k).

Overall, I would say this was a successful experiment. For the most part GPT-3.5 was able to determine the contents of the email and whether it was legitimate or malicious.

Improvements

A common issue I came across was exceeding the model’s size limits. Removing base64-encoded data, images and attachments would help. Headers and one of the HTML/plaintext parts could also be removed (if the other exists); however, including them gives the LLM more reference points and areas of interest for detecting malicious content.
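
As a rough sketch of the base64 removal idea, the function below replaces runs of base64-looking lines with a short placeholder before the message is sent to the API. stripBase64Blocks(), the 40-character threshold and the three-line minimum are all assumptions made for this example; it is a line-based heuristic rather than a real MIME parser, and a short final line of an encoded block may survive.

```php
<?php
// Hypothetical helper: replace runs of base64-looking lines with a short
// placeholder so the message is more likely to fit the model's size limit.
function stripBase64Blocks(string $email, int $minLines = 3): string
{
    $out = [];
    $run = []; // consecutive base64-looking lines seen so far

    // Emit either a placeholder (long run) or the original lines (short run).
    $flush = function () use (&$out, &$run, $minLines) {
        if (count($run) >= $minLines) {
            $out[] = '[base64 data removed]';
        } else {
            $out = array_merge($out, $run);
        }
        $run = [];
    };

    foreach (preg_split('/\r?\n/', $email) as $line) {
        // Heuristic: 40+ characters from the base64 alphabet, optional padding.
        if (preg_match('/^[A-Za-z0-9+\/]{40,}={0,2}$/', $line) === 1) {
            $run[] = $line;
        } else {
            $flush();
            $out[] = $line;
        }
    }
    $flush();

    return implode("\n", $out);
}

// Made-up sample: a short message followed by four 76-character encoded lines.
$sample = "Subject: Logo attached\n"
        . "\n"
        . "See the attached image.\n"
        . str_repeat(str_repeat('QUJD', 19) . "\n", 4)
        . "--boundary--\n";

echo stripBase64Blocks($sample);
```

In email_to_ai.php this could be applied to the copy of the message passed to getAI(), while the unmodified $emailContent is still stored in the database.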

Attachments could also be extracted and sent to a multi-AV checking API such as VirusTotal.

If a log of what action the user performed on each email existed, it would be possible to feed the model all past emails along with those actions to prioritise things. For example, if the user always replies to person A within an hour, push those emails to the top as they are likely to be important; or if emails with a subject tag of “[help]” always get put into spam, do this automatically.

Given an entire organization’s emails, it would be possible for AI to understand relationships between people and flag anomalies when they arise. For example, if an employee has started sending emails to people in a department they don’t usually or shouldn’t need to, perhaps the account has been compromised and the attacker is doing some recon/intel gathering or just plain phishing.

Cost

I estimate I sent 30-40 emails in total to OpenAI. You are given $5 credit and the total use was $0.22.
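
Those figures give a rough per-email cost. The sketch below works it out for both ends of the 30-40 estimate, assuming the cost per email stays about the same (in practice it depends on the size of each email); emailsPerCredit() is a hypothetical helper written for this example.

```php
<?php
// Hypothetical helper: how many emails a given credit would cover, assuming
// every email costs the same as the observed average.
function emailsPerCredit(float $totalCost, int $emailsSent, float $credit): int
{
    $perEmail = $totalCost / $emailsSent;
    return (int) floor($credit / $perEmail);
}

// Figures from the post: $0.22 total for an estimated 30-40 emails, $5 credit.
$totalCost = 0.22;
$credit    = 5.00;

foreach ([30, 40] as $emails) {
    printf(
        "%d emails: about \$%.4f per email, roughly %d emails per \$%.2f credit\n",
        $emails,
        $totalCost / $emails,
        emailsPerCredit($totalCost, $emails, $credit),
        $credit
    );
}
```

That works out to somewhere between half a cent and three quarters of a cent per email, so the free credit would cover several hundred emails of similar size.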

Final Thoughts

I would like to thank ChatGPT 3.5 for writing the majority of the code for this experiment (with some coaxing). It made this project super quick and easy!
