
Project Background: Why SQL Matters
In any data-driven organization, raw data means nothing until it’s turned into insights that drive growth.
At a fast-growing OTA, I worked with massive datasets across:
- Marketing campaigns (Google Ads, Meta, Email)
- Customer transactions & purchase history
- Product performance (flights, hotels, packages)
- User behavior (web & app analytics)
The challenge was extracting insights fast to support growth decisions. SQL became my main engine — transforming unstructured data into actionable insights before sending it to BI tools like Looker Studio or Power BI.
This project demonstrates a full SQL workflow: from raw exploration to advanced analytics, connecting technical mastery with real business impact.
Step 1: Data Exploration and Cleaning
🧩 Scenario:
Inspect customer, order, and campaign tables to check structure and missing data.
💻 SQL Query:
-- Inspect sample data
SELECT * FROM customers LIMIT 10;
SELECT * FROM orders LIMIT 10;
SELECT * FROM marketing_campaigns LIMIT 10;
-- Check for missing values
SELECT
COUNT(*) AS total_rows,
COUNT(email) AS email_present,
COUNT(first_purchase_date) AS purchase_date_present
FROM customers;📊 Outcome Table (sample):
| total_rows | email_present | purchase_date_present |
|---|---|---|
| 10,000 | 9,950 | 9,870 |
💡 Insight:
This identifies incomplete data that could break joins or aggregations later.
Step 2: Joining Tables for Holistic Analysis
🧩 Scenario:
Combine multiple datasets to find which campaigns bring high-value customers.
💻 SQL Query:
SELECT
c.campaign_id,
mc.campaign_name,
COUNT(DISTINCT c.customer_id) AS new_customers,
SUM(o.order_value) AS total_revenue
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
JOIN marketing_campaigns mc
ON c.campaign_id = mc.campaign_id
WHERE c.first_purchase_date BETWEEN '2025-01-01' AND '2025-06-30'
GROUP BY c.campaign_id, mc.campaign_name
ORDER BY total_revenue DESC;📊 Outcome Table (sample):
| campaign_id | campaign_name | new_customers | total_revenue |
|---|---|---|---|
| 12 | “Spring Promo” | 1,200 | 89,000 |
| 9 | “Google Ads – Europe” | 800 | 75,500 |
💡 Insight:
Ranks campaigns by both revenue and acquisition impact.
Step 3: Cohort Analysis Using Window Functions
🧩 Scenario:
Track repeat purchase behavior by cohort.
💻 SQL Query:
WITH customer_cohorts AS (
SELECT
customer_id,
DATE_TRUNC('month', first_purchase_date) AS cohort_month,
order_id,
order_date,
order_value
FROM orders
)
SELECT
cohort_month,
DATE_TRUNC('month', order_date) AS order_month,
COUNT(DISTINCT customer_id) AS active_customers,
SUM(order_value) AS revenue
FROM customer_cohorts
GROUP BY cohort_month, order_month
ORDER BY cohort_month, order_month;📊 Outcome Table (sample):
| cohort_month | order_month | active_customers | revenue |
|---|---|---|---|
| 2025-01 | 2025-02 | 1,200 | 45,000 |
| 2025-01 | 2025-03 | 950 | 38,000 |
💡 Insight:
Shows which cohorts have better retention or purchase frequency.
Step 4: Calculating Customer Lifetime Value (CLV)
🧩 Scenario:
Compute profit-based CLV using transaction history.
💻 SQL Query:
SELECT
c.customer_id,
SUM(o.order_value * 0.10) AS lifetime_profit
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
GROUP BY c.customer_id
ORDER BY lifetime_profit DESC
LIMIT 20;📊 Outcome Table (sample):
| customer_id | lifetime_profit |
|---|---|
| 10322 | 820 |
| 10987 | 790 |
💡 Insight:
Pinpoints high-value customers for loyalty programs or remarketing.
Step 5: Segmentation Using CASE
🧩 Scenario:
Segment users by lifetime profit value.
💻 SQL Query:
WITH customer_clv AS (
SELECT
customer_id,
SUM(order_value * 0.10) AS lifetime_profit
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
lifetime_profit,
CASE
WHEN lifetime_profit >= 200 THEN 'High Value'
WHEN lifetime_profit BETWEEN 100 AND 199 THEN 'Medium Value'
ELSE 'Low Value'
END AS segment
FROM customer_clv
ORDER BY lifetime_profit DESC;📊 Outcome Table (sample):
| customer_id | lifetime_profit | segment |
|---|---|---|
| 10322 | 820 | High Value |
| 10102 | 120 | Medium Value |
💡 Insight:
Supports campaign personalization and audience prioritization.
Step 6: Time-Series Analysis
🧩 Scenario:
Monitor campaign revenue trends over time.
💻 SQL Query:
SELECT
mc.campaign_name,
DATE_TRUNC('day', o.order_date) AS order_day,
SUM(o.order_value) AS daily_revenue,
COUNT(DISTINCT o.customer_id) AS daily_customers
FROM orders o
JOIN marketing_campaigns mc
ON o.campaign_id = mc.campaign_id
WHERE o.order_date BETWEEN '2025-07-01' AND '2025-09-30'
GROUP BY mc.campaign_name, order_day
ORDER BY order_day, mc.campaign_name;📊 Outcome Table (sample):
| campaign_name | order_day | daily_revenue | daily_customers |
|---|---|---|---|
| “Google Ads – Asia” | 2025-07-15 | 12,300 | 180 |
| “Instagram Summer” | 2025-07-15 | 9,800 | 145 |
💡 Insight:
Reveals daily patterns and anomalies to optimize spend timing.
Step 7: Funnel Analysis
🧩 Scenario:
Calculate campaign conversion rates from clicks to bookings.
💻 SQL Query:
WITH funnel AS (
SELECT
customer_id,
campaign_id,
MAX(CASE WHEN action = 'click' THEN 1 ELSE 0 END) AS clicked,
MAX(CASE WHEN action = 'booking' THEN 1 ELSE 0 END) AS booked
FROM campaign_events
GROUP BY customer_id, campaign_id
)
SELECT
campaign_id,
SUM(booked) AS total_bookings,
SUM(clicked) AS total_clicks,
ROUND(SUM(booked)::decimal / SUM(clicked) * 100, 2) AS conversion_rate
FROM funnel
GROUP BY campaign_id
ORDER BY conversion_rate DESC;📊 Outcome Table (sample):
| campaign_id | total_clicks | total_bookings | conversion_rate |
|---|---|---|---|
| 9 | 8,000 | 520 | 6.5% |
| 12 | 12,000 | 780 | 6.2% |
💡 Insight:
Shows which campaigns drive conversions most efficiently.
Step 8: CLV by Campaign (Advanced CTE Example)
🧩 Scenario:
Find average CLV for new users acquired from each campaign.
💻 SQL Query:
WITH new_users AS (
SELECT
c.customer_id,
c.campaign_id,
MIN(o.order_date) AS first_order_date
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.campaign_id
),
clv_6_months AS (
SELECT
n.campaign_id,
SUM(o.order_value * 0.10) AS profit_6_months
FROM new_users n
JOIN orders o
ON n.customer_id = o.customer_id
WHERE o.order_date BETWEEN n.first_order_date AND n.first_order_date + INTERVAL '6 months'
GROUP BY n.campaign_id
)
SELECT
mc.campaign_name,
profit_6_months
FROM clv_6_months c
JOIN marketing_campaigns mc
ON c.campaign_id = mc.campaign_id
ORDER BY profit_6_months DESC;📊 Outcome Table (sample):
| campaign_name | profit_6_months |
|---|---|
| “Google Ads – New Users” | 18,700 |
| “Meta Retargeting” | 12,300 |
💡 Insight:
Identifies campaigns bringing profitable new customers early.
Step 9: Query Optimization
💻 SQL Optimization Example:
WITH orders_summary AS (
SELECT customer_id, SUM(order_value) AS total_value
FROM orders
GROUP BY customer_id
)
SELECT c.customer_id, os.total_value
FROM customers c
JOIN orders_summary os
ON c.customer_id = os.customer_id;💡 Tip: Always pre-aggregate large tables and avoid SELECT * for performance.
Step 10: Predicting Churn Using SQL Probability
🧩 Scenario:
Estimate customer churn probability based on inactivity.
💻 SQL Query:
WITH activity_gap AS (
SELECT
customer_id,
MAX(order_date) AS last_order_date,
CURRENT_DATE - MAX(order_date) AS days_since_last_order
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
days_since_last_order,
CASE
WHEN days_since_last_order > 180 THEN 'High Risk'
WHEN days_since_last_order BETWEEN 90 AND 180 THEN 'Medium Risk'
ELSE 'Low Risk'
END AS churn_risk
FROM activity_gap
ORDER BY days_since_last_order DESC;📊 Outcome Table (sample):
| customer_id | days_since_last_order | churn_risk |
|---|---|---|
| 10401 | 240 | High Risk |
| 10110 | 110 | Medium Risk |
💡 Insight:
Helps target at-risk customers with reactivation campaigns.
Step 11: Multi-Dimensional Campaign Profitability
🧩 Scenario:
Combine acquisition cost, CLV, and retention rate to find true ROI per campaign.
💻 SQL Query:
WITH campaign_metrics AS (
SELECT
o.campaign_id,
SUM(o.order_value * 0.10) AS total_profit,
COUNT(DISTINCT o.customer_id) AS total_customers,
AVG(DATE_PART('day', MAX(o.order_date) - MIN(o.order_date))) AS avg_retention_days
FROM orders o
GROUP BY o.campaign_id
),
campaign_costs AS (
SELECT campaign_id, SUM(spend) AS total_cost
FROM marketing_spend
GROUP BY campaign_id
)
SELECT
mc.campaign_name,
cm.total_profit,
cc.total_cost,
ROUND((cm.total_profit - cc.total_cost) / cc.total_cost * 100, 2) AS roi_percentage,
cm.avg_retention_days
FROM campaign_metrics cm
JOIN campaign_costs cc
ON cm.campaign_id = cc.campaign_id
JOIN marketing_campaigns mc
ON mc.campaign_id = cm.campaign_id
ORDER BY roi_percentage DESC;📊 Outcome Table (sample):
| campaign_name | total_profit | total_cost | roi_percentage | avg_retention_days |
|---|---|---|---|---|
| “Google Ads – New Users” | 21,500 | 10,000 | 115.0% | 142 |
| “Meta Summer” | 9,800 | 6,200 | 58.0% | 110 |
💡 Insight:
Gives a complete profitability picture — not just revenue, but cost, retention, and ROI.
Final Takeaways
✅ SQL isn’t just for querying — it’s a strategic growth weapon.
✅ Advanced techniques (CTEs, window functions, CASE, joins) reveal hidden drivers of growth.
✅ This end-to-end project connects technical mastery with real marketing decisions.





