This post explores the expected value framework as a way to reason about decision-making under uncertainty in ML contexts. More specifically, it demonstrates a business-oriented optimization of the classification threshold for a binary classification model used to target an email marketing campaign.
1. Expected Value Framework Fundamentals
In probability theory, the expected value (EV) of a random variable is the long-run average of outcomes weighted by their probabilities:
\[\mathbb{E}[X] = \sum_{i} P(y_i) \cdot V(y_i)\]
Where:
- $ y_i $: possible class label or outcome (e.g., 0 or 1 in binary classification)
- $ P(y_i) $: predicted probability of class $ y_i $
- $ V(y_i) $: business benefit or cost associated with class $ y_i $
It reflects the average business outcome you can expect from acting under uncertainty, weighted by the model’s estimated probabilities. This bridges predictions and real-world decision-making by accounting for both uncertainty and impact.
In the context of machine learning, especially for classification models, we can interpret model outputs (typically class probabilities) as estimated likelihoods of different outcomes. We can then associate business values or costs with those outcomes to compute the expected value of taking a specific action.
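As a minimal sketch of the computation itself, assuming a binary model that outputs $P(y=1)$ and two illustrative business values (the numbers here are hypothetical placeholders, not from a real model):

```python
# Expected value of one action for a single customer.
# Probabilities and values below are illustrative placeholders.
p_positive = 0.2                        # model's estimated P(y=1)
probs = {0: 1 - p_positive, 1: p_positive}
values = {0: 2.50, 1: -15.0}            # business value of each outcome

expected_value = sum(probs[y] * values[y] for y in probs)
print(f"EV = {expected_value:.2f}")     # 0.8 * 2.50 + 0.2 * -15.0 = -1.00
```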
2. The Email Marketing Campaign Use Case
In email marketing, there is always a risk that customers will unsubscribe or mark emails as spam if they receive too many promotional messages. The goal of the ML model is to predict whether a customer will unsubscribe from the mailing list if they receive a promotional email.
The ML model is defined as follows:
- Label 0: Customer will not unsubscribe → good candidate to send email
- Label 1: Customer will unsubscribe → should not send email
- Prediction = 1 → the model recommends not sending the promotional email
- Prediction = 0 → the model recommends sending the promotional email
2.1 Understanding the Confusion Matrix
Case | Prediction | Actual | Action Taken | Outcome Type | business_values Key | Description |
---|---|---|---|---|---|---|
✅ True Positive (TP) | 1 | 1 | Don’t send email | Correct action | "correct_no_send" | We avoided sending email to someone who would have unsubscribed |
❌ False Positive (FP) | 1 | 0 | Don’t send email | Incorrect action | "false_no_send" | We wrongly withheld email from an engaged customer |
❌ False Negative (FN) | 0 | 1 | Send email | Incorrect action | "false_send" | We sent email to someone who unsubscribed |
✅ True Negative (TN) | 0 | 0 | Send email | Correct action | "correct_send" | We correctly sent email to someone who engaged positively |
3. Mapping Model Predictions to Business Impact
To map model predictions to meaningful business impact, we define a value for each type of prediction outcome. These values are based on expected revenue per email and customer lifetime value impacts from A/B testing.
We are interested in this framework because not all prediction errors are equal. Some outcomes carry a much higher cost or benefit than others: wrongly sending emails to unsubscribe-prone customers damages list health, while correctly targeting engaged customers drives substantial revenue. Assigning a different business value to each outcome captures these asymmetries and supports more informed threshold decisions.
"correct_send"
(True Negative)
- Customer received email and engaged positively (clicked, purchased, etc.)
- Measured as average revenue per email for engaged customers
- Typical value: $2.50 per successful email based on conversion rates
"false_send"
(False Negative)
- Customer received email and unsubscribed
- Captures customer lifetime value loss and list health damage
- Reflects long-term revenue impact from losing the customer
"false_no_send"
(False Positive)
- Customer was not sent email but would have engaged positively
- Value set to 0 to reflect missed opportunity (no direct cost but lost revenue)
"correct_no_send"
(True Positive)
- Customer was not sent email and would have unsubscribed
- Assigned +0.25 to capture soft benefits such as improved deliverability and sender reputation
# Business values calculated from A/B testing email campaigns
avg_revenue_per_engaged_email = 2.50   # Revenue from customers who engage
avg_clv_loss_from_unsubscribe = -15.0  # Customer lifetime value loss

business_values = {
    # When you do send email:
    "correct_send": avg_revenue_per_engaged_email,
    "false_send": avg_clv_loss_from_unsubscribe,
    # When you don't send email:
    "false_no_send": 0,       # Missed opportunity cost
    "correct_no_send": 0.25,  # Soft benefits: deliverability, sender reputation
}
4. Threshold Optimization via Expected Value
Rather than optimizing for metrics like accuracy, recall, or F1, we select the classification threshold that maximizes expected business value.
Each model prediction leads to an action (send email or don’t send), with asymmetric outcomes. To make the best decision, we identify the threshold that yields the highest total expected value:
- Send email if $P(\text{unsubscribe}) < \text{threshold}$
- Don’t send email if $P(\text{unsubscribe}) \geq \text{threshold}$
We evaluate a range of thresholds and simulate outcomes using the assigned business value of each possible decision (TP, FP, FN, TN).
import pandas as pd


def evaluate_threshold_expected_value(
    df, prob_col, target_col, business_values, thresholds
):
    """
    Evaluate expected business value and standard classification metrics
    across a range of thresholds.
    """
    results = []
    for threshold in thresholds:
        # Prediction 1 ("will unsubscribe") means don't send; 0 means send
        predicted = (df[prob_col] >= threshold).astype(int)
        actual = df[target_col]
        tp = ((predicted == 1) & (actual == 1)).sum()
        fp = ((predicted == 1) & (actual == 0)).sum()
        fn = ((predicted == 0) & (actual == 1)).sum()
        tn = ((predicted == 0) & (actual == 0)).sum()

        # Business value of the resulting send/no-send decisions
        total_ev = (
            tp * business_values["correct_no_send"]
            + fp * business_values["false_no_send"]
            + fn * business_values["false_send"]
            + tn * business_values["correct_send"]
        )

        # Classification metrics
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall) > 0
            else 0
        )

        results.append({
            "threshold": threshold,
            "total_expected_value": total_ev,
            "avg_ev_per_user": total_ev / len(df),
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "tn": tn,
        })
    return pd.DataFrame(results)
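A minimal usage sketch follows. The scores_df below is synthetic, calibrated-by-construction data purely for illustration; in practice df would hold held-out predictions and true labels:

```python
import numpy as np

rng = np.random.default_rng(42)
probs = rng.uniform(0, 1, size=10_000)
scores_df = pd.DataFrame({
    "prob_unsubscribe": probs,
    # Synthetic labels drawn so that higher scores unsubscribe more often
    "unsubscribed": (rng.uniform(0, 1, size=10_000) < probs).astype(int),
})

ev_df = evaluate_threshold_expected_value(
    scores_df, "prob_unsubscribe", "unsubscribed",
    business_values, np.linspace(0.05, 0.95, 19),
)
best = ev_df.loc[ev_df["total_expected_value"].idxmax()]
print(f"Best threshold: {best['threshold']:.2f}, "
      f"total EV: {best['total_expected_value']:,.2f}")
```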
4.1 Results
The analysis shows that the model performs best with a 30% threshold when maximizing expected value:
threshold | total_expected_value | avg_ev_per_user | precision | recall | f1 |
---|---|---|---|---|---|
0.30 | $127,450 | $0.85 | 0.385 | 0.946 | 0.548 |
0.50 | $89,230 | $0.59 | 0.469 | 0.682 | 0.556 |
0.70 | $45,180 | $0.30 | 0.640 | 0.180 | 0.280 |
The model performs best when it sends emails to more customers (with a 30% threshold) — even if some will unsubscribe — because:
- Correct email sends drive substantial revenue gains
- False email sends have manageable customer lifetime value impact
- Reaching engaged customers is far more valuable than avoiding every possible unsubscribe
✅ It’s better to risk sending to a potentially unengaged customer than miss the opportunity to generate revenue from someone who would have converted.
5. Customer Personalization using the EV Framework
For each customer (or instance), we compare:
- EV of Acting: Take an action (i.e. send promotional email to customer)
- EV of Not Acting: Do nothing (i.e. do not send email to the customer)
Let:
- $ p $ = probability that the customer will unsubscribe (from model)
- Business outcomes:
- $ V_{\text{correct send}} $: reward when we send email and the customer engages
- $ V_{\text{false send}} $: loss when we send email but the customer unsubscribes
- $ V_{\text{false no send}} $: loss when we don’t send email and the customer would have engaged
- $ V_{\text{correct no send}} $: outcome when we don’t send email and the customer would have unsubscribed
Then:
\[\text{EV}_{\text{send}} = (1 - p) \cdot V_{\text{correct send}} + p \cdot V_{\text{false send}}\]
\[\text{EV}_{\text{no send}} = (1 - p) \cdot V_{\text{false no send}} + p \cdot V_{\text{correct no send}}\]
✅ Decision Rule
Choose to send email to the customer if:
\[\text{EV}_{\text{send}} > \text{EV}_{\text{no send}}\]
This ensures that each decision maximizes the expected return: rather than merely acting on probabilities, we’re optimizing for value.
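As a worked example using the illustrative business values from Section 3, the breakeven probability $p^*$ where the two expected values are equal can be solved directly:

\[(1 - p^*) \cdot 2.50 + p^* \cdot (-15.0) = (1 - p^*) \cdot 0 + p^* \cdot 0.25\]
\[2.50 - 17.50 \, p^* = 0.25 \, p^* \quad \Rightarrow \quad p^* = \frac{2.50}{17.75} \approx 0.141\]

Under these particular values, the per-customer rule amounts to sending email whenever the predicted unsubscribe probability is below roughly 14%.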
def compute_expected_value(df, prob_col, business_values):
    """
    Compute expected value for both action and inaction,
    vectorized over a DataFrame.
    """
    p = df[prob_col]  # probability of unsubscribe, P(class=1)

    # Action: send email
    ev_send = (
        (1 - p) * business_values["correct_send"]
        + p * business_values["false_send"]
    )
    # Action: don't send email
    ev_no_send = (
        (1 - p) * business_values["false_no_send"]
        + p * business_values["correct_no_send"]
    )

    # Send whenever acting has a higher expected value than not acting
    should_send = ev_send > ev_no_send

    # Return a copy with the expected value columns attached
    df = df.copy()
    df["ev_send"] = ev_send
    df["ev_no_send"] = ev_no_send
    df["should_send"] = should_send
    return df
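For example, applied to the hypothetical scores_df from Section 4 (synthetic data, so the exact numbers are illustrative):

```python
decisions_df = compute_expected_value(scores_df, "prob_unsubscribe", business_values)
print(f"Send rate under the EV rule: {decisions_df['should_send'].mean():.1%}")
# With the Section 3 values, this sends exactly when p < ~0.141,
# the breakeven probability worked out above.
```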
6. Comparing Customer Personalization vs. Global Thresholding
There are two ways to operationalize inference decisions using the model’s predicted unsubscribe probabilities: a personalized expected value rule or a global classification threshold (e.g. 30%).
6.1 EV-based Personalization (per-user decision)
Each customer is evaluated individually by comparing the expected value of sending vs. not sending them emails. This method takes into account the customer’s predicted probability of unsubscribing and the assigned business value of each possible outcome.
Pros:
- Tailored decision for each customer
- Maximizes total expected business value, since EV is optimized at the granularity of individual customers
- Ideal when model probabilities are well-calibrated
Cons:
- Harder to explain or control (e.g., no fixed email send rate)
- Slightly more complex to deploy in production systems compared to global thresholding
- Less transparent in policy testing or experimentation setups
6.2 Global Thresholding (fixed policy)
A single probability threshold is chosen (e.g., 30%), above which customers are not sent emails. This creates a simple rule that applies equally to all customers.
Pros:
- Easy to communicate and implement
- Directly controls email send rate
- Suitable for A/B testing and policy evaluation
Cons:
- Less precise — treats customers near the threshold identically
- Ignores differences in expected value between customers
- Can underperform in value maximization compared to EV-based personalization
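As a rough sketch of how the two policies relate in practice (reusing the hypothetical decisions_df from the previous section, with an assumed global threshold of 0.30), we can compare the resulting send decisions directly:

```python
# Global threshold policy: send when P(unsubscribe) < 0.30 (assumed threshold)
threshold_send = decisions_df["prob_unsubscribe"] < 0.30
# EV-based personalization: send when EV(send) > EV(no send)
ev_rule_send = decisions_df["should_send"]

print(f"Threshold send rate: {threshold_send.mean():.1%}")
print(f"EV-rule send rate:   {ev_rule_send.mean():.1%}")
print(f"Decision agreement:  {(threshold_send == ev_rule_send).mean():.1%}")
```

Note that when the same business_values apply to every customer, the EV rule is itself equivalent to a global threshold at the breakeven probability derived in Section 5; the two approaches genuinely diverge once the values (e.g., revenue or CLV impact) are estimated per customer.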
7. Policy Evaluation
After building a model to predict customer unsubscribe behavior, we want to evaluate how different decision strategies perform when applied in practice.
Each strategy (e.g., email everyone, email no one, or use a threshold) leads to different business outcomes and classification trade-offs. To choose the best approach, we simulate their performance using two complementary perspectives:
- Expected Business Value: How much value does the policy generate (or lose) based on predicted vs actual outcomes?
- Classification Quality: How accurately does the policy identify who should or shouldn’t receive emails?
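A minimal simulation sketch, under the conventions above (prediction 1 means "don't send", so a threshold of 0.0 emails no one and a threshold above every score emails everyone), reusing evaluate_threshold_expected_value and the hypothetical scores_df:

```python
# Map each policy to an equivalent classification threshold
policies = {
    "Best Threshold": 0.30,
    "Email Everyone": 1.01,  # no score reaches the threshold -> always send
    "Fixed Threshold (0.50)": 0.50,
    "Email No One": 0.0,     # every score reaches the threshold -> never send
}

policy_df = evaluate_threshold_expected_value(
    scores_df, "prob_unsubscribe", "unsubscribed",
    business_values, list(policies.values()),
)
policy_df.insert(0, "policy", list(policies.keys()))
print(policy_df[["policy", "threshold", "total_expected_value",
                 "avg_ev_per_user", "f1"]])
```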
7.1 Policy Comparison Results
Policy | Threshold | Total Expected Value | Avg EV per User | F1 |
---|---|---|---|---|
Best Threshold | 0.30 | $127,450 | $0.85 | 0.548 |
Email Everyone | 1.00 | $89,680 | $0.60 | 0.000 |
Fixed Threshold (0.50) | 0.50 | $89,230 | $0.59 | 0.556 |
Email No One | 0.00 | $12,580 | $0.08 | 0.506 |
The results show that the optimized threshold (30%) delivers the highest expected value, even outperforming the “email everyone” strategy while maintaining better precision and protecting sender reputation.
Conclusion
The Expected Value framework provides a powerful approach to optimize ML decision-making by:
- Aligning model outputs with business objectives rather than just statistical accuracy
- Accounting for asymmetric costs of different prediction errors
- Enabling personalized decision-making based on individual risk-reward profiles
- Providing interpretable business metrics for model evaluation
This framework is particularly valuable when prediction errors have different business impacts, making it essential for real-world ML applications where maximizing business value is more important than maximizing traditional classification metrics.
Key takeaway: It’s better to risk sending to a potentially unengaged customer than miss the opportunity to generate revenue from someone who would have converted from the email campaign.