iceberg-streaming-risk-engine
Build a real-time fraud detection system that analyzes credit card transactions as they happen, flagging suspicious activity in under 2 seconds
Scala
Flink
Kafka
Iceberg
Scala
Trino
Grafana
Project Overview
What You'll Build
A production-ready fraud detection system that monitors credit card transactions in real time. When someone swipes their card at a store, your system analyzes the transaction instantly and flags potential fraud before the payment is approved, saving banks money by preventing fraudulent charges.
Why This Project Matters
- Real Business Impact: Credit card fraud costs billions annually. Catching fraud in real time (before authorization) can save 0.1-0.5% of transaction value
- Learn Stream Processing: Master Apache Flink, the same technology used by Uber, Netflix, and Alibaba
- Modern Data Architecture: Work with cutting-edge tools like Apache Iceberg for data versioning and time-travel queries
Tech Stack Explained for Beginners
| Technology | What it is | Why you'll use it |
|---|---|---|
| Apache Flink | A powerful engine that processes data streams in real time | Analyzes transactions as they happen, detecting patterns like "5 purchases in 5 different cities in 1 hour" |
| Apache Kafka | A message queue that stores events in order | Receives credit card swipes from point-of-sale systems and ensures no transaction is lost |
| Apache Iceberg | A modern table format that versions your data | Stores transaction history with "time travel" - you can query data as it looked yesterday |
| Trino | A fast SQL query engine | Lets you join real-time transactions with historical data to calculate risk scores |
| Grafana | A monitoring dashboard tool | Shows system performance metrics like "How fast are we processing transactions?" |
| Terraform | An infrastructure-as-code tool | Deploys your entire system to AWS with one command |
Step-by-Step Build Plan
- Week 1: Set up development environment and create sample transaction data
- Week 2: Build Kafka pipeline to ingest transactions
- Week 3: Implement Flink rules to detect fraud patterns (velocity, geography)
- Week 4: Store results in Iceberg and build historical analysis
- Week 5: Add monitoring and performance testing
- Week 6: Deploy to cloud and run chaos tests
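For Week 1, a minimal sketch of a transaction model and random data generator in plain Scala. The field names here are assumptions, not a fixed schema; adapt them to whatever your simulated point-of-sale feed emits.

```scala
import java.time.Instant
import scala.util.Random

// Hypothetical transaction schema; adjust fields to your own feed.
final case class Transaction(
  txId: String,
  cardId: String,
  merchantCategory: String,
  amount: BigDecimal,
  lat: Double,
  lon: Double,
  timestamp: Instant
)

object SampleData {
  private val categories = Vector("grocery", "fuel", "electronics", "travel")

  // Generates one random transaction for the given card at the given time.
  def randomTransaction(rng: Random, cardId: String, at: Instant): Transaction =
    Transaction(
      txId = rng.alphanumeric.take(12).mkString,
      cardId = cardId,
      merchantCategory = categories(rng.nextInt(categories.size)),
      amount = BigDecimal(rng.nextInt(20000)) / 100, // 0.00 to 199.99
      lat = -90 + rng.nextDouble() * 180,
      lon = -180 + rng.nextDouble() * 360,
      timestamp = at
    )
}
```

A seeded `Random` makes the generated stream reproducible, which helps when replaying the same data through later pipeline versions.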
Detailed Requirements
Functional Requirements
- Fraud Detection Rules:
- Velocity check: Flag if >3 transactions in 5 minutes
- Geographic anomaly: Flag if transactions >500 km apart within 1 hour
- Amount spike: Flag if transaction >3x average for that merchant category
- Data Processing:
- Handle 10,000 transactions per second
- Process each transaction in under 2 seconds (99th percentile)
- Store all transactions for 90 days with hourly partitions
- Analytics:
- Real-time dashboard showing fraud rate by region
- Historical reports on false positive rates
- Ability to replay past transactions for testing
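The velocity and geographic rules above can be sketched in plain Scala before porting them to Flink keyed windows. This is a simplified sketch: in production the per-card history would live in Flink keyed state, not a `Seq`, and the event shape is an assumption.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Minimal event shape for rule evaluation (a sketch, not the full schema).
final case class TxEvent(cardId: String, lat: Double, lon: Double, at: Instant)

object FraudRules {
  // Velocity check: flag if the card has >3 transactions within 5 minutes,
  // counting the current one.
  def velocityFlag(history: Seq[TxEvent], current: TxEvent): Boolean = {
    val cutoff = current.at.minus(5, ChronoUnit.MINUTES)
    val recent = history.count(e => e.cardId == current.cardId && !e.at.isBefore(cutoff))
    recent + 1 > 3
  }

  // Haversine great-circle distance in kilometres.
  def distanceKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val r = 6371.0
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * r * math.asin(math.sqrt(a))
  }

  // Geographic anomaly: flag if any transaction on the same card within the
  // last hour was more than 500 km away from the current one.
  def geoFlag(history: Seq[TxEvent], current: TxEvent): Boolean = {
    val cutoff = current.at.minus(1, ChronoUnit.HOURS)
    history.exists { e =>
      e.cardId == current.cardId && !e.at.isBefore(cutoff) &&
        distanceKm(e.lat, e.lon, current.lat, current.lon) > 500
    }
  }
}
```

In Flink, `velocityFlag` would typically become a sliding window count keyed by `cardId`, and `geoFlag` a `KeyedProcessFunction` holding the last known location in state.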
Technical Requirements
- Performance:
- End-to-end latency < 2 seconds at 99th percentile
- Support 10,000 events/second sustained load
- Zero data loss (exactly-once processing guarantee)
- Reliability:
- Survive loss of any single component
- Automatic failover and recovery
- Pass chaos engineering tests (random pod kills)
- Observability:
- Trace every transaction through the system
- Alert if latency exceeds 2 seconds
- Monitor memory/CPU usage of all components
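One way to verify the 99th-percentile latency target against recorded measurements is a nearest-rank percentile check. This is an offline sketch in plain Scala (the percentile definition and the 2000 ms default are assumptions); in the running system the same number would come from Flink metrics surfaced in Grafana.

```scala
object LatencyStats {
  // Nearest-rank percentile: the smallest value such that at least p% of
  // samples are at or below it.
  def percentile(latenciesMs: Seq[Long], p: Double): Long = {
    require(latenciesMs.nonEmpty && p > 0 && p <= 100)
    val sorted = latenciesMs.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(rank - 1)
  }

  // True if the p99 latency is under the SLO (default 2 seconds).
  def meetsSlo(latenciesMs: Seq[Long], sloMs: Long = 2000): Boolean =
    percentile(latenciesMs, 99) < sloMs
}
```

This is also a convenient pass/fail gate for the Week 5 load-test results.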
Prerequisites & Learning Path
- Required: Basic SQL and one programming language (Python/Java/Scala)
- Helpful: Understanding of event-driven systems
- You'll Learn: Stream processing, distributed systems, cloud deployment
Success Metrics
- Your system processes 1 million test transactions with <2s latency
- Correctly identifies 95% of fraudulent patterns
- Maintains <1% false positive rate
- Deploys successfully to Kubernetes
Technologies Used
Flink, Kafka, Iceberg, Scala, Trino, Grafana, Terraform
Project Topics
#flink #iceberg #fraud-detection #streaming
Ready to explore the code?
Dive deep into the implementation, check out the documentation, and feel free to contribute!