SQL Mastery for Data-Driven Insights: How I Turned Raw Data into Actionable Business Intelligence

Project Background: Why SQL Matters

In any data-driven organization, raw data means nothing until it’s turned into insights that drive growth.

At a fast-growing OTA, I worked with massive datasets across:

  • Marketing campaigns (Google Ads, Meta, Email)
  • Customer transactions & purchase history
  • Product performance (flights, hotels, packages)
  • User behavior (web & app analytics)

The challenge was extracting insights fast to support growth decisions. SQL became my main engine — transforming unstructured data into actionable insights before sending it to BI tools like Looker Studio or Power BI.

This project demonstrates a full SQL workflow: from raw exploration to advanced analytics, connecting technical mastery with real business impact.


Step 1: Data Exploration and Cleaning

🧩 Scenario:

Inspect customer, order, and campaign tables to check structure and missing data.

💻 SQL Query:

-- Inspect sample data
SELECT * FROM customers LIMIT 10;
SELECT * FROM orders LIMIT 10;
SELECT * FROM marketing_campaigns LIMIT 10;

-- Check for missing values
SELECT
    COUNT(*) AS total_rows,
    COUNT(email) AS email_present,
    COUNT(first_purchase_date) AS purchase_date_present
FROM customers;

📊 Outcome Table (sample):

total_rowsemail_presentpurchase_date_present
10,0009,9509,870

💡 Insight:

This identifies incomplete data that could break joins or aggregations later.


Step 2: Joining Tables for Holistic Analysis

🧩 Scenario:

Combine multiple datasets to find which campaigns bring high-value customers.

💻 SQL Query:

SELECT
    c.campaign_id,
    mc.campaign_name,
    COUNT(DISTINCT c.customer_id) AS new_customers,
    SUM(o.order_value) AS total_revenue
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
JOIN marketing_campaigns mc
    ON c.campaign_id = mc.campaign_id
WHERE c.first_purchase_date BETWEEN '2025-01-01' AND '2025-06-30'
GROUP BY c.campaign_id, mc.campaign_name
ORDER BY total_revenue DESC;

📊 Outcome Table (sample):

campaign_idcampaign_namenew_customerstotal_revenue
12“Spring Promo”1,20089,000
9“Google Ads – Europe”80075,500

💡 Insight:

Ranks campaigns by both revenue and acquisition impact.


Step 3: Cohort Analysis Using Window Functions

🧩 Scenario:

Track repeat purchase behavior by cohort.

💻 SQL Query:

WITH customer_cohorts AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', first_purchase_date) AS cohort_month,
        order_id,
        order_date,
        order_value
    FROM orders
)
SELECT
    cohort_month,
    DATE_TRUNC('month', order_date) AS order_month,
    COUNT(DISTINCT customer_id) AS active_customers,
    SUM(order_value) AS revenue
FROM customer_cohorts
GROUP BY cohort_month, order_month
ORDER BY cohort_month, order_month;

📊 Outcome Table (sample):

cohort_monthorder_monthactive_customersrevenue
2025-012025-021,20045,000
2025-012025-0395038,000

💡 Insight:

Shows which cohorts have better retention or purchase frequency.


Step 4: Calculating Customer Lifetime Value (CLV)

🧩 Scenario:

Compute profit-based CLV using transaction history.

💻 SQL Query:

SELECT
    c.customer_id,
    SUM(o.order_value * 0.10) AS lifetime_profit
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
GROUP BY c.customer_id
ORDER BY lifetime_profit DESC
LIMIT 20;

📊 Outcome Table (sample):

customer_idlifetime_profit
10322820
10987790

💡 Insight:

Pinpoints high-value customers for loyalty programs or remarketing.


Step 5: Segmentation Using CASE

🧩 Scenario:

Segment users by lifetime profit value.

💻 SQL Query:

WITH customer_clv AS (
    SELECT
        customer_id,
        SUM(order_value * 0.10) AS lifetime_profit
    FROM orders
    GROUP BY customer_id
)
SELECT
    customer_id,
    lifetime_profit,
    CASE
        WHEN lifetime_profit >= 200 THEN 'High Value'
        WHEN lifetime_profit BETWEEN 100 AND 199 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS segment
FROM customer_clv
ORDER BY lifetime_profit DESC;

📊 Outcome Table (sample):

customer_idlifetime_profitsegment
10322820High Value
10102120Medium Value

💡 Insight:

Supports campaign personalization and audience prioritization.


Step 6: Time-Series Analysis

🧩 Scenario:

Monitor campaign revenue trends over time.

💻 SQL Query:

SELECT
    mc.campaign_name,
    DATE_TRUNC('day', o.order_date) AS order_day,
    SUM(o.order_value) AS daily_revenue,
    COUNT(DISTINCT o.customer_id) AS daily_customers
FROM orders o
JOIN marketing_campaigns mc
    ON o.campaign_id = mc.campaign_id
WHERE o.order_date BETWEEN '2025-07-01' AND '2025-09-30'
GROUP BY mc.campaign_name, order_day
ORDER BY order_day, mc.campaign_name;

📊 Outcome Table (sample):

campaign_nameorder_daydaily_revenuedaily_customers
“Google Ads – Asia”2025-07-1512,300180
“Instagram Summer”2025-07-159,800145

💡 Insight:

Reveals daily patterns and anomalies to optimize spend timing.


Step 7: Funnel Analysis

🧩 Scenario:

Calculate campaign conversion rates from clicks to bookings.

💻 SQL Query:

WITH funnel AS (
    SELECT
        customer_id,
        campaign_id,
        MAX(CASE WHEN action = 'click' THEN 1 ELSE 0 END) AS clicked,
        MAX(CASE WHEN action = 'booking' THEN 1 ELSE 0 END) AS booked
    FROM campaign_events
    GROUP BY customer_id, campaign_id
)
SELECT
    campaign_id,
    SUM(booked) AS total_bookings,
    SUM(clicked) AS total_clicks,
    ROUND(SUM(booked)::decimal / SUM(clicked) * 100, 2) AS conversion_rate
FROM funnel
GROUP BY campaign_id
ORDER BY conversion_rate DESC;

📊 Outcome Table (sample):

campaign_idtotal_clickstotal_bookingsconversion_rate
98,0005206.5%
1212,0007806.2%

💡 Insight:

Shows which campaigns drive conversions most efficiently.


Step 8: CLV by Campaign (Advanced CTE Example)

🧩 Scenario:

Find average CLV for new users acquired from each campaign.

💻 SQL Query:

WITH new_users AS (
    SELECT
        c.customer_id,
        c.campaign_id,
        MIN(o.order_date) AS first_order_date
    FROM customers c
    JOIN orders o
        ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2025-01-01'
    GROUP BY c.customer_id, c.campaign_id
),
clv_6_months AS (
    SELECT
        n.campaign_id,
        SUM(o.order_value * 0.10) AS profit_6_months
    FROM new_users n
    JOIN orders o
        ON n.customer_id = o.customer_id
    WHERE o.order_date BETWEEN n.first_order_date AND n.first_order_date + INTERVAL '6 months'
    GROUP BY n.campaign_id
)
SELECT
    mc.campaign_name,
    profit_6_months
FROM clv_6_months c
JOIN marketing_campaigns mc
    ON c.campaign_id = mc.campaign_id
ORDER BY profit_6_months DESC;

📊 Outcome Table (sample):

campaign_nameprofit_6_months
“Google Ads – New Users”18,700
“Meta Retargeting”12,300

💡 Insight:

Identifies campaigns bringing profitable new customers early.


Step 9: Query Optimization

💻 SQL Optimization Example:

WITH orders_summary AS (
    SELECT customer_id, SUM(order_value) AS total_value
    FROM orders
    GROUP BY customer_id
)
SELECT c.customer_id, os.total_value
FROM customers c
JOIN orders_summary os
    ON c.customer_id = os.customer_id;

💡 Tip: Always pre-aggregate large tables and avoid SELECT * for performance.


Step 10: Predicting Churn Using SQL Probability

🧩 Scenario:

Estimate customer churn probability based on inactivity.

💻 SQL Query:

WITH activity_gap AS (
    SELECT
        customer_id,
        MAX(order_date) AS last_order_date,
        CURRENT_DATE - MAX(order_date) AS days_since_last_order
    FROM orders
    GROUP BY customer_id
)
SELECT
    customer_id,
    days_since_last_order,
    CASE
        WHEN days_since_last_order > 180 THEN 'High Risk'
        WHEN days_since_last_order BETWEEN 90 AND 180 THEN 'Medium Risk'
        ELSE 'Low Risk'
    END AS churn_risk
FROM activity_gap
ORDER BY days_since_last_order DESC;

📊 Outcome Table (sample):

customer_iddays_since_last_orderchurn_risk
10401240High Risk
10110110Medium Risk

💡 Insight:

Helps target at-risk customers with reactivation campaigns.


Step 11: Multi-Dimensional Campaign Profitability

🧩 Scenario:

Combine acquisition cost, CLV, and retention rate to find true ROI per campaign.

💻 SQL Query:

WITH campaign_metrics AS (
    SELECT
        o.campaign_id,
        SUM(o.order_value * 0.10) AS total_profit,
        COUNT(DISTINCT o.customer_id) AS total_customers,
        AVG(DATE_PART('day', MAX(o.order_date) - MIN(o.order_date))) AS avg_retention_days
    FROM orders o
    GROUP BY o.campaign_id
),
campaign_costs AS (
    SELECT campaign_id, SUM(spend) AS total_cost
    FROM marketing_spend
    GROUP BY campaign_id
)
SELECT
    mc.campaign_name,
    cm.total_profit,
    cc.total_cost,
    ROUND((cm.total_profit - cc.total_cost) / cc.total_cost * 100, 2) AS roi_percentage,
    cm.avg_retention_days
FROM campaign_metrics cm
JOIN campaign_costs cc
    ON cm.campaign_id = cc.campaign_id
JOIN marketing_campaigns mc
    ON mc.campaign_id = cm.campaign_id
ORDER BY roi_percentage DESC;

📊 Outcome Table (sample):

campaign_nametotal_profittotal_costroi_percentageavg_retention_days
“Google Ads – New Users”21,50010,000115.0%142
“Meta Summer”9,8006,20058.0%110

💡 Insight:

Gives a complete profitability picture — not just revenue, but cost, retention, and ROI.


Final Takeaways

✅ SQL isn’t just for querying — it’s a strategic growth weapon.

✅ Advanced techniques (CTEs, window functions, CASE, joins) reveal hidden drivers of growth.

✅ This end-to-end project connects technical mastery with real marketing decisions.

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button