
vector-rag-feature-store

Build a real-time vector database system that powers AI chatbots with up-to-date knowledge, achieving sub-100 ms semantic search across millions of documents

Scala

Project Overview

What You'll Build

A production-ready feature store that converts customer support tickets into searchable vectors in real-time. When users ask questions, your AI chatbot finds the most relevant past tickets instantly, providing accurate answers based on your company's actual support history.

Why This Project Matters

  • AI/LLM Skills: Vector databases are the backbone of modern AI applications
  • Real Business Value: Reduce support ticket resolution time by 50%
  • Hot Technology: RAG (Retrieval Augmented Generation) is how companies customize LLMs

Tech Stack Explained for Beginners

| Technology | What it is | Why you'll use it |
| --- | --- | --- |
| Debezium | Change Data Capture (CDC) tool | Streams database changes in real time as events |
| Apache Flink | Stream processing engine | Deduplicates and orders events before processing |
| FastAPI | Python web framework | Hosts the embedding service that converts text to vectors |
| pgvector | PostgreSQL extension for vectors | Stores and searches embeddings with SQL simplicity |
| Pinecone | Managed vector database | Alternative for scaling beyond a single server |
| OpenTelemetry | Distributed tracing | Tracks request flow to find bottlenecks |
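The core operation pgvector and Pinecone provide is nearest-neighbor search over embeddings. As a mental model, here is a pure-Python sketch of brute-force cosine-similarity search (the toy 3-dimensional vectors and the `search` helper are illustrative only, not part of either library's API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query, corpus, top_k=10):
    """Brute-force top-k similarity search. pgvector's IVFFlat/HNSW
    indexes approximate exactly this ranking at scale."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (real ticket embeddings are much wider)
corpus = {
    "ticket-1": [0.9, 0.1, 0.0],    # "password reset"
    "ticket-2": [0.0, 0.8, 0.2],    # "billing question"
    "ticket-3": [0.85, 0.15, 0.05], # "account locked out"
}
print(search([1.0, 0.0, 0.0], corpus, top_k=2))
```

A query vector near the "password" cluster ranks ticket-1 and ticket-3 first; the real system does the same thing over millions of stored vectors.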

Step-by-Step Build Plan

  1. Week 1: Set up PostgreSQL with pgvector and sample data
  2. Week 2: Build embedding service with sentence transformers
  3. Week 3: Implement CDC pipeline with Debezium
  4. Week 4: Create Flink job for deduplication and ordering
  5. Week 5: Optimize vector search performance
  6. Week 6: Add monitoring and observability
  7. Week 7: Build chatbot integration and testing
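Week 4's Flink job boils down to keeping only the latest version of each ticket event before it reaches the embedding service. A minimal Python sketch of that dedup-and-order logic (field names like `ticket_id` and `version` are assumptions about the CDC payload, not Debezium's actual schema; in Flink this would live in keyed state with event-time watermarks):

```python
def deduplicate(events):
    """Keep the highest-version event per ticket, then emit in ticket order."""
    latest = {}
    for event in events:
        key = event["ticket_id"]
        if key not in latest or event["version"] > latest[key]["version"]:
            latest[key] = event
    return sorted(latest.values(), key=lambda e: e["ticket_id"])

# Out-of-order CDC stream: ticket 1 is updated after ticket 2 arrives
events = [
    {"ticket_id": 1, "version": 1, "text": "Cannot log in"},
    {"ticket_id": 2, "version": 1, "text": "Refund request"},
    {"ticket_id": 1, "version": 2, "text": "Cannot log in -- resolved"},
]
print(deduplicate(events))
```

Only the version-2 text of ticket 1 should ever be embedded; embedding stale versions wastes GPU time and pollutes search results.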

Detailed Requirements

Functional Requirements

  • Data Pipeline:
    • Capture inserts/updates from support ticket database
    • Handle schema changes gracefully
    • Process tickets within 30 seconds of creation
    • Support backfilling historical data
  • Embedding Generation:
    • Convert ticket text to 1024-dimension vectors
    • Use GPU acceleration for <50ms processing
    • Support multiple languages (English, Spanish)
    • Handle long documents (chunk if >512 tokens)
  • Vector Search:
    • Find top-10 similar tickets
    • Support hybrid search (vector + keyword)
    • Filter by date, category, status
    • Explain why matches were returned

Technical Requirements

  • Performance:
    • 95th percentile search latency < 100ms
    • Process 1000 tickets/second
    • Support 10M+ vectors in database
    • Concurrent searches: 100/second
  • Accuracy:
    • Embedding drift detection
    • A/B testing framework
    • Relevance scoring metrics
    • Human feedback loop
  • Operations:
    • Zero-downtime reindexing
    • Automated vector dimension upgrades
    • Cost monitoring (GPU usage)
    • Data privacy compliance
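Verifying the "95th percentile < 100 ms" target means computing percentiles over raw per-request measurements, never averages. A sketch using the nearest-rank method (one common convention; OpenTelemetry-backed dashboards would normally do this for you):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at position ceil(pct/100 * n)
    in sorted order."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Simulated per-request search latencies in milliseconds
latencies_ms = [12, 18, 25, 30, 41, 47, 52, 60, 75, 140]
p95 = percentile(latencies_ms, 95)
print(p95, "meets SLO" if p95 < 100 else "violates SLO")
```

Note how a single 140 ms outlier blows the p95 budget even though the mean is comfortably under 100 ms; that is exactly why the requirement is stated as a percentile.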

Prerequisites & Learning Path

  • Required: Python, basic ML concepts, PostgreSQL
  • Helpful: Understanding of embeddings and vector similarity
  • You'll Learn: Vector databases, LLM integration, real-time ML systems

Success Metrics

  • Achieve <100ms search latency at 1M vectors
  • Process live updates with <30s lag
  • Maintain 90% search relevance score
  • Handle 100 concurrent searches
  • Reduce chatbot response time by 60%
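The "90% search relevance" metric needs a concrete definition before it can be tracked; mean reciprocal rank (MRR) over labeled query/ticket pairs is one common choice. A sketch (the labeled-data format with one known-correct ticket per query is an assumption for illustration):

```python
def mean_reciprocal_rank(results, relevant):
    """results: query -> ranked list of ticket ids returned by search;
    relevant: query -> the known-correct ticket id for that query."""
    total = 0.0
    for query, ranked in results.items():
        target = relevant[query]
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(results)

results = {
    "reset password": ["t1", "t7", "t3"],  # correct answer ranked first
    "billing error":  ["t9", "t2", "t4"],  # correct answer ranked second
}
relevant = {"reset password": "t1", "billing error": "t2"}
print(mean_reciprocal_rank(results, relevant))  # (1 + 0.5) / 2 = 0.75
```

Recomputing this on a held-out label set after every model or index change is also the backbone of the embedding-drift detection and A/B testing requirements above.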

Technologies Used

Flink · pgvector · Pinecone · FastAPI · OpenTelemetry · GPU

Project Topics

#flink #pgvector #rag #llm

Ready to explore the code?

Dive deep into the implementation, check out the documentation, and feel free to contribute!

Open in GitHub →
