vector-rag-feature-store
Build a real-time vector database system that powers AI chatbots with up-to-date knowledge, achieving <100ms semantic search across millions of documents
Project Overview
What You'll Build
A production-ready feature store that converts customer support tickets into searchable vectors in real-time. When users ask questions, your AI chatbot finds the most relevant past tickets instantly, providing accurate answers based on your company's actual support history.
Why This Project Matters
- AI/LLM Skills: Vector databases are the backbone of modern AI applications
- Real Business Value: Reduce support ticket resolution time by 50%
- Hot Technology: RAG (Retrieval-Augmented Generation) is how companies customize LLMs
Tech Stack Explained for Beginners
| Technology | What it is | Why you'll use it |
| --- | --- | --- |
| Debezium | Change Data Capture (CDC) tool | Streams database changes in real time as events (registration sketch below) |
| Apache Flink | Stream processing engine | Deduplicates and orders events before processing |
| FastAPI | Python web framework | Hosts the embedding service that converts text to vectors |
| pgvector | PostgreSQL extension for vectors | Stores and searches embeddings with SQL simplicity |
| Pinecone | Managed vector database | Alternative for scaling beyond a single server |
| OpenTelemetry | Distributed tracing | Tracks request flow to find bottlenecks |
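To make the Debezium row concrete, here is a minimal sketch of how the CDC connector might be registered, assuming a local Kafka Connect worker on its default port 8083. The connector name, hostnames, credentials, and table names are illustrative placeholders, not values from this repo.

```python
# Hypothetical sketch: register a Debezium Postgres connector with Kafka Connect.
import json
import requests

connector = {
    "name": "support-tickets-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",               # logical decoding plugin built into Postgres 10+
        "database.hostname": "localhost",        # assumption: local dev database
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "support",
        "topic.prefix": "tickets",               # Debezium 2.x key (1.x used database.server.name)
        "table.include.list": "public.tickets",  # capture inserts/updates on the tickets table
    },
}

# Kafka Connect exposes a REST API; POST /connectors creates the connector.
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```

Once registered, Debezium publishes one Kafka event per ticket insert or update, which the Flink job can then deduplicate and order before embedding.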
Step-by-Step Build Plan
- Week 1: Set up PostgreSQL with pgvector and sample data
- Week 2: Build embedding service with sentence transformers (see the sketch after this plan)
- Week 3: Implement CDC pipeline with Debezium
- Week 4: Create Flink job for deduplication and ordering
- Week 5: Optimize vector search performance
- Week 6: Add monitoring and observability
- Week 7: Build chatbot integration and testing
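As a concrete starting point for Week 2 (and the embedding requirements listed further down), here is a minimal sketch of the embedding service. The model choice is an assumption: intfloat/multilingual-e5-large emits 1024-dimension vectors and covers English and Spanish. The endpoint shape and chunking strategy are illustrative, not this repo's actual API.

```python
# Hypothetical sketch of the embedding service (FastAPI + sentence-transformers).
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import numpy as np

app = FastAPI()
model = SentenceTransformer("intfloat/multilingual-e5-large")  # 1024-dim output
MAX_TOKENS = 512  # model context limit; longer tickets are chunked

class Ticket(BaseModel):
    text: str

def chunk(text: str) -> list[str]:
    """Split text into chunks of at most MAX_TOKENS tokens via the model tokenizer."""
    ids = model.tokenizer(text, add_special_tokens=False)["input_ids"]
    pieces = [ids[i : i + MAX_TOKENS] for i in range(0, len(ids), MAX_TOKENS)]
    return [model.tokenizer.decode(p) for p in pieces] or [text]

@app.post("/embed")
def embed(ticket: Ticket) -> dict:
    # Encode each chunk, then mean-pool and re-normalize so long and short
    # tickets produce comparable unit-length vectors.
    # Note: e5-family models are usually fed "query: "/"passage: " prefixes;
    # omitted here for brevity.
    vectors = model.encode(chunk(ticket.text), normalize_embeddings=True)
    pooled = vectors.mean(axis=0)
    pooled /= np.linalg.norm(pooled)
    return {"embedding": pooled.tolist()}  # 1024 floats
```

Mean-pooling chunk vectors is one simple policy; storing one vector per chunk and deduplicating at query time is a common alternative.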
Detailed Requirements
Functional Requirements
- Data Pipeline:
  - Capture inserts/updates from the support ticket database
  - Handle schema changes gracefully
  - Process tickets within 30 seconds of creation
  - Support backfilling historical data
- Embedding Generation:
  - Convert ticket text to 1024-dimension vectors
  - Use GPU acceleration for <50ms processing
  - Support multiple languages (English, Spanish)
  - Handle long documents (chunk if >512 tokens)
- Vector Search:
  - Find the top-10 similar tickets
  - Support hybrid search (vector + keyword; query sketch after this list)
  - Filter by date, category, status
  - Explain why matches were returned
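The vector search requirements map fairly directly onto SQL once embeddings live in pgvector. A hedged sketch follows, assuming the table and column names used above and the pgvector Python adapter; a production system would likely fuse the two scores (e.g., reciprocal rank fusion) rather than rank on vector distance alone.

```python
# Hypothetical hybrid search: vector similarity plus Postgres full-text
# keyword matching, with date/category/status filters. Table and column
# names are assumptions for illustration.
import psycopg
from pgvector.psycopg import register_vector

SEARCH_SQL = """
SELECT id, subject,
       1 - (embedding <=> %(query_vec)s) AS vector_score,
       ts_rank(to_tsvector('english', body),
               plainto_tsquery('english', %(query_text)s)) AS keyword_score
FROM tickets
WHERE created_at >= %(since)s
  AND category = %(category)s
  AND status = 'resolved'              -- example status filter
ORDER BY embedding <=> %(query_vec)s   -- cosine distance, ascending
LIMIT 10;
"""

def search(conn, query_text, query_vec, since, category):
    # Returning both scores lets the API explain *why* a match was returned.
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, {
            "query_text": query_text,
            "query_vec": query_vec,  # 1024-dim numpy array from the embedding service
            "since": since,
            "category": category,
        })
        return cur.fetchall()

conn = psycopg.connect("dbname=support")  # assumption: local dev database
register_vector(conn)                     # lets psycopg pass numpy arrays as vectors
```

Surfacing both the vector and keyword scores per hit is also a cheap way to satisfy the "explain why matches were returned" requirement.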
Technical Requirements
- Performance:
  - 95th percentile search latency < 100ms (index-tuning sketch after this list)
  - Process 1,000 tickets/second
  - Support 10M+ vectors in the database
  - Concurrent searches: 100/second
- Accuracy:
  - Embedding drift detection
  - A/B testing framework
  - Relevance scoring metrics
  - Human feedback loop
- Operations:
  - Zero-downtime reindexing
  - Automated vector dimension upgrades
  - Cost monitoring (GPU usage)
  - Data privacy compliance
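For the latency and zero-downtime targets above, the main pgvector levers are the HNSW index parameters and the per-session hnsw.ef_search setting. A hedged sketch, with starting-point values rather than benchmarked ones:

```python
# Hypothetical tuning sketch using pgvector's HNSW index.
import psycopg

DDL = """
-- Approximate-nearest-neighbor index; cosine ops match the <=> queries.
CREATE INDEX CONCURRENTLY IF NOT EXISTS tickets_embedding_hnsw
ON tickets USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
"""

with psycopg.connect("dbname=support", autocommit=True) as conn:
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    # hence autocommit; it also avoids blocking writes.
    conn.execute(DDL)
    # Per-session recall/latency knob: higher ef_search = better recall,
    # slower queries. Tune until p95 < 100ms at the target vector count.
    conn.execute("SET hnsw.ef_search = 80;")
```

CREATE INDEX CONCURRENTLY builds the new index without blocking writes, which is the usual route to zero-downtime reindexing; dropping the old index afterwards completes the swap.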
Prerequisites & Learning Path
- Required: Python, basic ML concepts, PostgreSQL
- Helpful: Understanding of embeddings and vector similarity
- You'll Learn: Vector databases, LLM integration, real-time ML systems
Success Metrics
- Achieve <100ms search latency at 1M vectors (see the benchmark sketch below)
- Process live updates with <30s lag
- Maintain 90% search relevance score
- Handle 100 concurrent searches
- Reduce chatbot response time by 60%
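To verify the latency metric honestly, measure the 95th percentile over many repeated queries rather than an average. A tiny harness sketch; the run_query callable stands in for whatever search function you build:

```python
# Hypothetical p95 latency harness.
import time
import statistics

def p95_latency_ms(run_query, n=200):
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples, n=20)[18]
```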
Technologies Used
Flink, pgvector, Pinecone, FastAPI, OpenTelemetry, GPU
Project Topics
#flink #pgvector #rag #llm
Ready to explore the code?
Dive into the implementation, check out the documentation, and feel free to contribute or use it in your own projects.