iceberg-streaming-risk-engine

Build a real-time fraud detection system that analyzes credit card transactions as they happen and flags suspicious activity in under 2 seconds.

Scala

Project Overview

What You'll Build

A production-ready fraud detection system that monitors credit card transactions in real time. When someone swipes a card at a store, your system analyzes the transaction and flags potential fraud within 2 seconds, before the payment is approved. Blocking fraudulent charges at this point saves banks money.

Why This Project Matters

  • Real Business Impact: Credit card fraud costs billions annually. Catching fraud in real time (before authorization) can save 0.1-0.5% of transaction value.
  • Learn Stream Processing: Master Apache Flink, the same technology used by Uber, Netflix, and Alibaba
  • Modern Data Architecture: Work with cutting-edge tools like Apache Iceberg for data versioning and time-travel queries

Tech Stack Explained for Beginners

Technology | What it is | Why you'll use it
Apache Flink | A stream processing engine that handles data in real time | Analyzes transactions as they happen, detecting patterns like "5 purchases in 5 different cities in 1 hour"
Apache Kafka | A distributed event log that stores messages in order within each partition | Receives credit card swipes from point-of-sale systems and ensures no transaction is lost
Apache Iceberg | A modern table format that versions your data | Stores transaction history with "time travel": you can query data as it looked yesterday
Trino | A fast SQL query engine | Lets you join real-time transactions with historical data to calculate risk scores
Grafana | A monitoring dashboard tool | Shows system performance metrics like "How fast are we processing transactions?"
Terraform | An infrastructure-as-code tool | Deploys your entire system to AWS with one command

Step-by-Step Build Plan

  1. Week 1: Set up development environment and create sample transaction data
  2. Week 2: Build Kafka pipeline to ingest transactions
  3. Week 3: Implement Flink rules to detect fraud patterns (velocity, geography)
  4. Week 4: Store results in Iceberg and build historical analysis
  5. Week 5: Add monitoring and performance testing
  6. Week 6: Deploy to cloud and run chaos tests
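
For the Week 1 sample data, a small seeded generator is one way to get reproducible test transactions. The following is a minimal sketch in plain Scala; the `SampleTransaction` fields, the category names, and the `SampleData.generate` helper are illustrative assumptions, not the project's actual schema.

```scala
import scala.util.Random

// Hypothetical record shape; field names are assumptions for illustration.
final case class SampleTransaction(
    cardId: String,
    merchantCategory: String,
    amount: BigDecimal,
    lat: Double,
    lon: Double,
    epochMillis: Long
)

object SampleData {
  // Assumed category names, used only to give the data some variety.
  private val categories = Vector("grocery", "fuel", "electronics", "travel")

  // Generates `n` synthetic transactions spread across `cards` card ids.
  // A fixed seed makes the data set identical between runs.
  def generate(n: Int, cards: Int, startMillis: Long, seed: Long): Vector[SampleTransaction] = {
    val rng = new Random(seed)
    Vector.tabulate(n) { i =>
      SampleTransaction(
        cardId = "card-" + rng.nextInt(cards),
        merchantCategory = categories(rng.nextInt(categories.size)),
        // Amounts between 1.00 and 500.99, built from whole cents.
        amount = BigDecimal(rng.nextInt(50000) + 100) / 100,
        lat = -90.0 + rng.nextDouble() * 180.0,
        lon = -180.0 + rng.nextDouble() * 360.0,
        // One transaction every 100 ms, so timestamps are already ordered.
        epochMillis = startMillis + i * 100L
      )
    }
  }
}
```

Because locations here are uniformly random, this data will trip a geographic rule far more often than real traffic would; a more realistic generator would keep each card near a home location and inject fraud patterns deliberately.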

Detailed Requirements

Functional Requirements

  • Fraud Detection Rules:
    • Velocity check: Flag a card with more than 3 transactions within 5 minutes
    • Geographic anomaly: Flag if two transactions on the same card occur more than 500 km apart within 1 hour
    • Amount spike: Flag if a transaction exceeds 3x the average amount for that merchant category
  • Data Processing:
    • Handle 10,000 transactions per second
    • Process each transaction in under 2 seconds (99th percentile)
    • Store all transactions for 90 days with hourly partitions
  • Analytics:
    • Real-time dashboard showing fraud rate by region
    • Historical reports on false positive rates
    • Ability to replay past transactions for testing
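
The three fraud detection rules above can be sketched as pure functions before wiring them into Flink. This is a plain-Scala illustration, not the project's implementation: the `Txn` shape and the `FraudRules` object are hypothetical, it assumes the rules are evaluated per card against that card's earlier transactions, and it uses the haversine formula for distance. In the real pipeline this logic would presumably sit inside keyed, windowed Flink operators.

```scala
// Hypothetical event shape; field names are assumptions for illustration.
final case class Txn(cardId: String, category: String, amount: Double,
                     lat: Double, lon: Double, epochMillis: Long)

object FraudRules {
  private val EarthRadiusKm = 6371.0
  private val FiveMinutesMs = 5L * 60 * 1000
  private val OneHourMs = 60L * 60 * 1000

  // Great-circle distance in km between two lat/lon points (haversine formula).
  def distanceKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(a))
  }

  // Velocity check: more than 3 transactions in the 5 minutes ending at `current`.
  // `history` is assumed to hold earlier transactions on the same card.
  def velocity(history: Seq[Txn], current: Txn): Boolean = {
    val recent = history.count { t =>
      val dt = current.epochMillis - t.epochMillis
      dt >= 0 && dt <= FiveMinutesMs
    }
    recent + 1 > 3 // +1 counts the current transaction itself
  }

  // Geographic anomaly: an earlier transaction within 1 hour that is >500 km away.
  def geoAnomaly(history: Seq[Txn], current: Txn): Boolean =
    history.exists { t =>
      val dt = current.epochMillis - t.epochMillis
      dt >= 0 && dt <= OneHourMs &&
        distanceKm(t.lat, t.lon, current.lat, current.lon) > 500.0
    }

  // Amount spike: amount is more than 3x the average for the merchant category.
  // The category average is passed in; how it is maintained is left open here.
  def amountSpike(categoryAverage: Double, current: Txn): Boolean =
    categoryAverage > 0 && current.amount > 3 * categoryAverage
}
```

Keeping the rules free of Flink types makes them easy to unit test and to reuse when replaying past transactions.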

Technical Requirements

  • Performance:
    • End-to-end latency < 2 seconds at 99th percentile
    • Support 10,000 events/second sustained load
    • Zero data loss (exactly-once processing guarantee)
  • Reliability:
    • Survive loss of any single component
    • Automatic failover and recovery
    • Pass chaos engineering tests (random pod kills)
  • Observability:
    • Trace every transaction through the system
    • Alert if latency exceeds 2 seconds
    • Monitor memory/CPU usage of all components
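
To make the latency target concrete, the 99th percentile can be computed from a batch of end-to-end latency samples and compared with the 2-second threshold. The sketch below uses the nearest-rank percentile definition, which is one of several common choices; `LatencyCheck` and its method names are hypothetical, and a production setup would more likely rely on histogram metrics exported to Grafana.

```scala
object LatencyCheck {
  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  def percentile(samplesMs: Seq[Long], p: Double): Long = {
    require(samplesMs.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samplesMs.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt
    sorted(math.max(rank, 1) - 1)
  }

  // True when the 99th percentile latency exceeds the threshold (2 seconds by default).
  def breachesSlo(samplesMs: Seq[Long], thresholdMs: Long = 2000L): Boolean =
    percentile(samplesMs, 99.0) > thresholdMs
}
```

With 100 samples, a single slow transaction does not breach the target under this definition, but two do, since the 99th-ranked sample is then one of the slow ones.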

Prerequisites & Learning Path

  • Required: Basic SQL and one programming language (Python/Java/Scala)
  • Helpful: Understanding of event-driven systems
  • You'll Learn: Stream processing, distributed systems, cloud deployment

Success Metrics

  • Your system processes 1 million test transactions with end-to-end latency under 2 seconds at the 99th percentile
  • Correctly identifies at least 95% of fraudulent patterns
  • Maintains <1% false positive rate
  • Deploys successfully to Kubernetes
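
The detection and false positive targets above can be checked against labeled test data. The following sketch assumes "identifies 95% of fraudulent patterns" means recall (flagged fraud divided by all fraud) and that the false positive rate is the share of legitimate transactions that get flagged; both readings are assumptions, as are the `Outcome` and `Evaluation` names.

```scala
// One labeled test result: what the system decided vs. the ground truth.
final case class Outcome(flagged: Boolean, actuallyFraud: Boolean)

object Evaluation {
  final case class Report(detectionRate: Double, falsePositiveRate: Double) {
    // Targets taken from the success metrics: at least 95% detection, under 1% false positives.
    def meetsTargets: Boolean = detectionRate >= 0.95 && falsePositiveRate < 0.01
  }

  def evaluate(outcomes: Seq[Outcome]): Report = {
    val tp = outcomes.count(o => o.flagged && o.actuallyFraud)
    val fn = outcomes.count(o => !o.flagged && o.actuallyFraud)
    val fp = outcomes.count(o => o.flagged && !o.actuallyFraud)
    val tn = outcomes.count(o => !o.flagged && !o.actuallyFraud)
    // Guard against empty classes so the ratios never divide by zero.
    def ratio(num: Int, den: Int): Double = if (den == 0) 0.0 else num.toDouble / den
    Report(detectionRate = ratio(tp, tp + fn), falsePositiveRate = ratio(fp, fp + tn))
  }
}
```

Because fraud is rare, the two rates are computed over separate denominators; a single overall accuracy figure would hide a poor detection rate behind the large number of legitimate transactions.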

Technologies Used

Flink, Kafka, Iceberg, Scala, Trino, Grafana, Terraform

Project Topics

#flink #iceberg #fraud-detection #streaming
