ETL Transformations
Discover how our ETL pipeline transforms raw data into actionable insights using Informatica Cloud.
ETL Transformations
The heart of our customer analytics project lies in the ETL pipeline built using Informatica Cloud. Multiple transformations were applied to extract meaningful and accurate information from raw customer data. Each transformation stage contributed to creating a unified dataset suitable for downstream analytics.
1️⃣ Joiner Transformation
The Joiner transformation was used to merge CRM customer data and e-commerce purchases based on the customer_id. This created a comprehensive view of each customer with their purchase history.
- Type: Inner Join
- Fields: customer_id
- Output: Enriched customer profiles with purchase records
Simulated Output
| eco_order_id | eco_customer_id | eco_product_id | eco_amount | eco_purchase_date | eco_payment_method | eco_Total_amount | eco_last_purchase_date | cmr_customer_id | cmr_email | cmr_phone_number | cmr_created_date | cmr_country | cmr_Full_Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ORD009 | CUST001 | PROD08 | 217.82 | 25-05-2025 | Credit Card | 530.40 | 25-05-2025 | CUST001 | alice@example.com | 1234567890 | 01-01-2023 | USA | Jagannath Malode |
| ORD010 | CUST002 | PROD09 | 389.43 | 26-05-2025 | Credit Card | 389.43 | 26-05-2025 | CUST002 | bob@example.com | 1987654321 | 02-01-2023 | UK | Varun Gani |
| ORD007 | CUST003 | PROD06 | 259.03 | 23-05-2025 | Debit Card | 1012.80 | 23-05-2025 | CUST003 | charlie@example.com | 1472583690 | 03-01-2023 | Canada | Charlie Brown |
| ORD004 | CUST004 | PROD03 | 194.68 | 20-05-2025 | Credit Card | 669.64 | 20-05-2025 | CUST004 | david@example.com | 1122334455 | 04-01-2023 | USA | David Wilson |
| ORD008 | CUST005 | PROD07 | 355.88 | 24-05-2025 | PayPal | 355.88 | 24-05-2025 | CUST005 | eva@example.com | 1098765432 | 05-01-2023 | Australia | Eva Taylor |
2️⃣ Union Transformation
Support tickets and marketing contact data were combined using the Union transformation to create a single timeline of customer interactions.
- Sources: support_tickets.csv + marketing_contacts.csv
- Fields: customer_id, interaction_date, action_type
- Output: Unified customer interaction timeline
Simulated Output
| interaction_channel | interaction_type | customer_id | interaction_date |
|---|---|---|---|
| Marketing | CUST001 | 03-02-2024 | |
| Marketing | CUST003 | 04-02-2024 | |
| Login | Support | CUST005 | 03-01-2024 |
| Payment | Support | CUST005 | 04-01-2024 |
| Payment | Support | CUST005 | 05-01-2024 |
| SMS | Marketing | CUST002 | 01-02-2024 |
| SMS | Marketing | CUST002 | 05-02-2024 |
| Shipping | Support | CUST004 | 01-01-2024 |
| Shipping | Support | CUST005 | 02-01-2024 |
| Social Media | Marketing | CUST003 | 02-02-2024 |
3️⃣ Sorter Transformation
The combined interaction data was sorted using the Sorter transformation to arrange interactions in chronological order.
- Sort Keys: customer_id (ASC), interaction_date (DESC)
- Purpose: Enables sequential behavioral analysis
Simulated Output
| customer_id | phone_number | created_date | country | Full_Name | |
|---|---|---|---|---|---|
| CUST001 | alice@example.com | 1234567890 | 01-01-2023 | USA | Jagannath Malode |
| CUST002 | bob@example.com | 1987654321 | 02-01-2023 | UK | Varun Gani |
| CUST003 | charlie@example.com | 1472583690 | 03-01-2023 | Canada | Charlie Brown |
| CUST004 | david@example.com | 1122334455 | 04-01-2023 | USA | Abhinandan Onajol |
| CUST005 | eva@example.com | 1098765432 | 05-01-2023 | Australia | Eva Taylor |
4️⃣ Aggregator Transformation
To derive customer-level metrics, the Aggregator transformation was applied on the joined dataset.
- Metrics:
- SUM(amount) → Total Spent
- COUNT(order_id) → Purchase Count
- MAX(purchase_date) → Last Purchase Date
Simulated Output
| order_id | customer_id | product_id | amount | purchase_date | payment_method | Total_amount | last_purchase_date |
|---|---|---|---|---|---|---|---|
| ORD009 | CUST001 | PROD08 | 217.82 | 25-05-2025 | Credit Card | 530.40 | 25-05-2025 |
| ORD010 | CUST002 | PROD09 | 389.43 | 26-05-2025 | Credit Card | 389.43 | 26-05-2025 |
| ORD007 | CUST003 | PROD06 | 259.03 | 23-05-2025 | Debit Card | 1012.80 | 23-05-2025 |
| ORD004 | CUST004 | PROD03 | 194.68 | 20-05-2025 | Credit Card | 669.64 | 20-05-2025 |
| ORD008 | CUST005 | PROD07 | 355.88 | 24-05-2025 | PayPal | 355.88 | 24-05-2025 |
Final Output
The final output was a clean, unified, analytics-ready customer dataset containing demographics, purchases, support history, and marketing engagement data — all connected via customer_id.
| interaction_channel | interaction_type | customer_id | interaction_date | eco_order_id | eco_customer_id | eco_product_id | eco_amount | eco_purchase_date | eco_payment_method | eco_Total_amount | eco_last_purchase_date | cmr_customer_id | cmr_email | cmr_phone_number | cmr_created_date | cmr_country | cmr_Full_Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marketing | CUST001 | 03-02-2024 | ORD009 | CUST001 | PROD08 | 217.82 | 25-05-2025 | Credit Card | 530.40 | 25-05-2025 | CUST001 | alice@example.com | 1234567890 | 01-01-2023 | USA | Jagannath Malode | |
| Marketing | CUST003 | 04-02-2024 | ORD007 | CUST003 | PROD06 | 259.03 | 23-05-2025 | Debit Card | 1012.80 | 23-05-2025 | CUST003 | charlie@example.com | 1472583690 | 03-01-2023 | Canada | Charlie Brown | |
| Login | Support | CUST005 | 03-01-2024 | ORD008 | CUST005 | PROD07 | 355.88 | 24-05-2025 | PayPal | 355.88 | 24-05-2025 | CUST005 | eva@example.com | 1098765432 | 05-01-2023 | Australia | Eva Taylor |