Other AI & Machine Learning Case Studies

NEW ON MY PORTFOLIO

The following case studies highlight standalone machine learning projects, demonstrating data science and AI applications completely separate from the Rewura portfolio.

LLM-Based Projects

Job-Specific CV Generator

A custom CV generator that tailors a CV to a specific job description based on your work experience.


Machine Learning

Employee Turnover Analytics Dashboard

A comprehensive analysis of employee attrition to help HR departments proactively identify flight risks and understand key drivers of turnover.

Cohorts of Songs Recommendation System

A machine learning approach for analyzing Spotify song data to build cohort-based recommendations and explore musical feature patterns.


Deep Learning

Coming soon ...

AI • RAG • AWS • React • Spring Security • LangGraph • MCP

AI Based Portfolio Management

This project showcases an AI-powered portfolio management application designed to enhance investment decision-making through:

  • Continuous news ingestion
  • RAG-based conversational interface
  • Integrated web search
  • MCP: technical indicators by symbol
A live deployment of the application is available via the link below.

AWS architecture

The platform is deployed on AWS with clear separation across edge routing, API access, application services, data persistence, observability, and management tooling.

AWS architecture diagram

Core services

  • Authorization Service for authentication, validation, and secure access control.
  • Portfolio Service A cost-basis portfolio tracking system that maintains a full history of buy and sell transactions and calculates realized and unrealized P&L per position.
  • LangGraph Service A LangGraph-driven system responsible for retrieval, document scoring, and orchestration of reasoning workflows, with real-time response streaming using SSE, enabling word-by-word output for improved responsiveness.
  • News Ingestion Implements periodic ingestion and chunking of crypto news data, generates embeddings, and persists them in a Pinecone vector database for efficient retrieval.
  • MCP Indicator Service Computes technical indicators and generates final buy/sell signals based on portfolio positions.
Route 53 CloudFront API Gateway RDS Pinecone

Frontend experience

A modern React-based UI provides a conversational interface, portfolio views, and explainable AI responses tailored for investment workflows.

React Responsive UI Chat Experience

Backend foundation

Python services and Spring Security-based authorization separate business capabilities cleanly while enforcing secure access boundaries.

Python Spring Security Service Design

AI decision pipeline

LangGraph drives retrieval, scoring, reasoning, and generation so the system can select the most fitting information before answering.

LangGraph RAG Reasoning Flow

Deployment with Terraform on AWS

Infrastructure is defined as code using Terraform, allowing reproducible environments, controlled changes, and clear separation between application evolution and cloud provisioning.

Infrastructure as code

Terraform provisions edge routing, API components, service infrastructure, database resources, and monitoring primitives in a repeatable way.

Operational governance

CloudTrail, CloudWatch, and Trusted Advisor support compliance visibility, cost awareness, and production health review.

Scalable platform delivery

AWS-native deployment patterns allow the project to evolve from portfolio showcase to production-grade application architecture.

Periodic ingestion and vector freshness

The ingestion service uses the CryptoNews API to fetch the latest market news every 15 minutes. Incoming articles are chunked, enriched with metadata, and written into Pinecone for retrieval by the LangGraph workflow.

Source
Crypto news feed integration via cryptonews-api.com.
Cadence
Polling runs every 15 minutes to keep the knowledge base aligned with current market narratives.
Chunking
Articles are split into retrieval-friendly segments before embedding, improving answer precision and evidence coverage.
Retention
Documents older than 1 week are removed from Pinecone using upload timestamp metadata, reducing stale financial context.
Outcome
The RAG pipeline stays relevant, fast, and focused on fresh market signals rather than outdated sentiment.
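The chunking and retention rules above can be sketched in Python. The chunk size, overlap, function names, and thresholds here are illustrative assumptions, and the actual Pinecone upsert/delete calls are omitted:

```python
def chunk_article(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split an article into overlapping, retrieval-friendly segments
    before embedding (sizes are assumed, not the service's values)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def is_stale(uploaded_at: float, now: float, max_age_days: int = 7) -> bool:
    """Retention rule: vectors older than one week (by upload
    timestamp metadata) are candidates for deletion from the index."""
    return now - uploaded_at > max_age_days * 86400

# Example: a 2,000-character article yields three overlapping chunks.
article = "x" * 2000
print(len(chunk_article(article)))  # → 3
```

In the real service these helpers would run inside the 15-minute polling job, with the chunks embedded and upserted to Pinecone alongside their metadata.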

LangGraph reasoning flow

The platform uses a retrieval-and-reasoning graph to normalize user requests, gather candidate documents, score relevance, and decide whether generation should continue or fall back to web search.

LangGraph workflow diagram

Flow summary

  • normalize_input standardizes the prompt and prepares it for retrieval.
  • retrieve fetches candidate documents from the vector database.
  • grade_documents scores relevance and filters weak context.
  • generate produces the answer when support is sufficient.
  • websearch is used when retrieved material is not useful or support is incomplete.
  • end is reached when the answer is considered useful and supported.
Normalize Retrieve Score Generate Fallback Search
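The routing logic of this flow can be sketched in plain Python. In the real service these functions are wired as LangGraph nodes and conditional edges; the 0.7 relevance threshold and the state shape are assumed for illustration:

```python
def normalize_input(state: dict) -> dict:
    # Standardize the prompt before retrieval.
    state["question"] = state["question"].strip().lower()
    return state

def grade_documents(state: dict) -> dict:
    # Keep only documents whose (hypothetical) relevance score clears a threshold.
    state["documents"] = [d for d in state["documents"] if d["score"] >= 0.7]
    return state

def route_after_grading(state: dict) -> str:
    # Conditional edge: generate when support is sufficient, else fall back to web search.
    return "generate" if state["documents"] else "websearch"

state = {"question": "  What moved BTC today?  ",
         "documents": [{"text": "...", "score": 0.9}, {"text": "...", "score": 0.4}]}
state = grade_documents(normalize_input(state))
print(route_after_grading(state))  # → generate
```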

MCP Server: AI-Powered Technical Analysis with Linear Regression

The MCP Indicator Service uses scikit-learn's Linear Regression combined with traditional technical indicators (SMA, RSI, MACD) to generate weighted buy/sell signals. The system fetches 5 years of historical data, builds a regression model to detect long-term trends, and scores assets on a 0-100 scale.

Regression Model & Technical Indicators

  • Linear Regression Model: Fits a scikit-learn LinearRegression on 5 years of closing prices to identify the overall growth trajectory (slope) and predicted price.
  • SMA (50 & 200 day): Simple Moving Averages detect short-term and macro-term trends. Golden Cross (SMA50 > SMA200) signals bullish momentum.
  • RSI (14-day): Relative Strength Index measures momentum. RSI < 30 indicates oversold (buy signal); RSI > 70 indicates overbought (sell signal).
  • MACD: Moving Average Convergence Divergence identifies momentum shifts. MACD > Signal line suggests buy pressure.
  • Weighted Scoring Engine: Combines all indicators with weights (+/- points) to produce a final 0-100 score determining action: Strong Sell (0-20), Sell (20-40), Hold (40-60), Buy (60-80), Strong Buy (80-100).
scikit-learn Linear Regression SMA/RSI/MACD Weighted Scoring
{
  "ticker": "BTC-USD",
  "metrics": {
    "current_trade_price": 29.65,
    "predicted_linear_price": 32.32,
    "sma_50": 31.78,
    "sma_200": 43.12,
    "rsi_14_day": 39.21,
    "macd": -0.56,
    "macd_signal": -0.51,
    "regression_trend_slope": -0.1018
  },
  "analysis": {
    "macro_trend_context": "BEARISH",
    "buy_sell_signal_percentage": 20,
    "recommended_target_action": "🔴 DISTRIBUTE / SELL"
  }
}
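A minimal sketch of the weighted scoring idea, fed with the sample metrics from the JSON above. The point weights and band boundaries here are illustrative assumptions rather than the service's actual values, which is why the resulting score (15) differs from the payload's reported 20:

```python
def score_asset(price, sma50, sma200, rsi, macd, macd_signal, slope):
    """Weighted scoring sketch: each indicator adds or subtracts points
    around a neutral 50, then the total maps to an action band."""
    score = 50
    score += 15 if sma50 > sma200 else -15                  # Golden/Death Cross
    score += 15 if rsi < 30 else (-15 if rsi > 70 else 0)   # oversold / overbought
    score += 10 if macd > macd_signal else -10              # MACD crossover
    score += 10 if slope > 0 else -10                       # long-term regression trend
    score = max(0, min(100, score))
    bands = [(20, "Strong Sell"), (40, "Sell"), (60, "Hold"),
             (80, "Buy"), (100, "Strong Buy")]
    action = next(label for bound, label in bands if score <= bound)
    return score, action

print(score_asset(29.65, 31.78, 43.12, 39.21, -0.56, -0.51, -0.1018))  # → (15, 'Strong Sell')
```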

ML Model Lifecycle: Continuous Monitoring & Refinement

Machine learning models require ongoing validation and retraining to remain accurate. The indicator service demonstrates a production ML workflow where models are continuously monitored for drift, validated against new data, tuned with updated parameters, and redeployed.

ML model lifecycle diagram

  • Data collection: fetch 5 years of historical data from Yahoo Finance (yfinance API).
  • Model training: fit LinearRegression and compute indicators (SMA, RSI, MACD).
  • Validation: score accuracy checks and backtest signals for historical accuracy.
  • Deployment: MCP server (FastMCP) on AWS App Runner with an SSE endpoint (/sse).
  • Monitoring: track prediction drift and accuracy degradation via CloudWatch metrics.
  • Model tuning: adjust weights and update parameters (RSI thresholds, etc.).

The cycle is continuous: automated retraining is triggered by drift detection.

Key ML production principles

  • Data freshness: models are retrained with the latest 5 years of data to adapt to market regime changes.
  • Drift detection: accuracy degradation is monitored as predictions deviate from actual prices.
  • Backtesting: signal accuracy is validated against historical data before deployment.
  • A/B testing: model versions (e.g., weight adjustments) are compared to optimize signal quality.
  • Automated pipelines: CI/CD triggers retraining and redeployment when code or data changes.

Why Continuous Monitoring Matters

Financial markets evolve constantly. A model trained on 2020-2025 data may perform poorly in 2026 due to regime changes, new regulations, or macro shifts. Continuous monitoring detects when predictions diverge from reality, triggering retraining.

Validation & Backtesting

Before deploying a new model version, backtesting ensures signals would have produced profitable outcomes historically. This prevents deploying models that overfit recent noise rather than capturing true trends.
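A toy backtest of signal quality might look like the following sketch. The hit-rate metric, function name, and sample data are assumptions; a production backtest would also model fees, slippage, and position sizing:

```python
def backtest_hit_rate(prices: list[float], signals: list[str]) -> float:
    """Fraction of Buy/Sell signals that agreed with the next day's
    price move. Hold signals are skipped."""
    hits = total = 0
    for i, sig in enumerate(signals[:-1]):
        if sig == "Hold":
            continue
        move_up = prices[i + 1] > prices[i]
        hits += (sig == "Buy") == move_up  # Buy should precede a rise, Sell a fall
        total += 1
    return hits / total if total else 0.0

prices = [100, 103, 101, 104, 108]
signals = ["Buy", "Sell", "Buy", "Buy", "Hold"]
print(backtest_hit_rate(prices, signals))  # → 1.0
```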

Automated Retraining Pipeline

When drift is detected (e.g., prediction error exceeds threshold for 7 consecutive days), the system automatically fetches fresh data, retrains the model, validates accuracy, and redeploys via App Runner with zero downtime.
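The 7-consecutive-days drift rule described above can be sketched as a simple consecutive-run check; the error units and threshold semantics are assumptions:

```python
def should_retrain(abs_errors: list[float], threshold: float, window: int = 7) -> bool:
    """Trigger retraining when the daily prediction error exceeds the
    threshold for `window` consecutive days."""
    run = 0
    for err in abs_errors:
        run = run + 1 if err > threshold else 0  # reset the streak on a good day
        if run >= window:
            return True
    return False

# Seven straight days above the threshold after one good day → retrain.
errors = [0.5, 2.1, 2.3, 2.2, 2.8, 2.4, 2.9, 3.1]
print(should_retrain(errors, threshold=2.0))  # → True
```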

Contract-First Development with CI/CD Automation

The project follows a contract-first approach where OpenAPI specifications are the source of truth. Changes to contracts trigger automated generation, publishing, and deployment across all services.

Contract pipeline diagram

  1. OpenAPI specification change: a developer updates openapi/*.yaml and pushes to GitHub.
  2. GitHub Actions workflow (build-and-promote.yaml) detects changed services (changed-services.sh) and generates stubs: Python (FastAPI) and TypeScript (fetch clients).
  3. Generated artifacts: Python FastAPI server stubs, plus TypeScript fetch-based clients under generated/typescript/.
  4. Publish to AWS CodeArtifact: Python wheels and NPM packages, versioned 1.0.0+….
  5. Integration: services import the Python stubs; the frontend imports the TypeScript clients.

Automated Contract Pipeline

  • Source of Truth: OpenAPI YAML specs in openapi/ directory define all API contracts.
  • Change Detection: GitHub Actions detect which services changed and trigger builds only for affected services.
  • Code Generation: generate_stubs.sh uses OpenAPI Generator to create Python FastAPI stubs and TypeScript fetch clients.
  • Versioning: Git hash + timestamp ensures every build has a unique, traceable version.
  • Publishing: publish_contracts.sh authenticates with AWS CodeArtifact and publishes both Python wheels and NPM packages.
  • Consumption: Services import from CodeArtifact during Docker builds using UV_INDEX_REWURA_PASSWORD.
OpenAPI GitHub Actions AWS CodeArtifact Type Safety
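The versioning step might look like the following sketch; the exact scheme (base version plus timestamp plus short git hash) and the commented generator invocations, including the spec path openapi/portfolio.yaml, are assumptions:

```shell
# Hypothetical version stamping: base version + UTC timestamp + short git hash.
BASE="1.0.0"
STAMP=$(date -u +%Y%m%d%H%M%S)
HASH=$(git rev-parse --short HEAD 2>/dev/null || echo "nogit")
VERSION="${BASE}+${STAMP}.${HASH}"
echo "$VERSION"

# Illustrative OpenAPI Generator invocations (spec path is hypothetical):
# openapi-generator-cli generate -i openapi/portfolio.yaml -g python-fastapi -o generated/python/portfolio
# openapi-generator-cli generate -i openapi/portfolio.yaml -g typescript-fetch -o generated/typescript/portfolio
```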

Security Architecture: OAuth2 + JWT Flow

The authorization service implements OAuth2 with support for Google and Facebook providers, issuing JWTs that downstream services validate to enforce resource-level access control.

OAuth2 + JWT sequence diagram (User/Browser ↔ Authorization Service (Spring Security) ↔ OAuth Provider (Google/Facebook)):

  1. The browser calls GET /oauth2/authorize/google.
  2. The authorization service redirects to the OAuth provider, where the user authenticates and grants consent.
  3. The provider calls back to /oauth2/callback/* with an authorization code. OAuth2UserService exchanges the code for a token, fetches the user info, and creates or updates the user in the DB. The service then generates a JWT signed with its secret, with payload {sub, email, roles}.
  4. The service redirects back with the JWT, which the frontend stores (localStorage/cookie).
  5. API requests to the Portfolio Service (Python/FastAPI) carry Authorization: Bearer <JWT>. JWTAuthMiddleware validates the token with the auth service, decodes the payload (user_id), and checks resource ownership.
  6. The protected resource data is returned.

OAuth2 + JWT Security Flow

  • OAuth2 Authorization: Spring Security handles Google/Facebook OAuth2 flows with custom user services.
  • Token Generation: After successful OAuth, the auth service issues a signed JWT containing user ID, email, and roles.
  • Token Distribution: Frontend stores JWT and includes it in Authorization: Bearer header for all API calls.
  • Token Validation: Downstream services (Portfolio, LangGraph) validate JWT by calling auth service's /api/auth/check_token endpoint.
  • Resource-Level Authorization: Services decode JWT payload to extract user_id and verify resource ownership before granting access.
  • Stateless Sessions: SessionCreationPolicy.STATELESS ensures scalability without server-side session storage.
Spring Security OAuth2 JWT Resource Authorization
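The resource-ownership check in the middleware can be sketched as follows. Function names are hypothetical, and the signature verification via the auth service's /api/auth/check_token endpoint is deliberately reduced to a comment:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def owns_resource(token: str, resource_owner_id: str) -> bool:
    # 1. (omitted) verify the signature by calling the auth service's check_token endpoint
    # 2. decode the payload and compare the subject to the resource owner
    return decode_jwt_payload(token).get("sub") == resource_owner_id

# Build a demo token (header.payload.signature; signature faked for the sketch):
def b64(obj) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

token = f"{b64({'alg': 'HS256'})}.{b64({'sub': 'user-1', 'email': 'a@b.c'})}.sig"
print(owns_resource(token, "user-1"))  # → True
```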

Security, validation, error handling, and PROD support

User stories are connected to secure authorization, validation, rate limiting, consistent error contracts, and observability requirements. The goal is not only correctness, but also operability in production.

Quality requirements

  • Spring Security-based authorization with resource-level access checks.
  • Clear validation behavior for malformed requests and missing permissions.
  • Rate limiting to protect APIs and AI workloads from abuse.
  • Structured error payloads for support teams, dashboards, and alerting.
  • Trace IDs for cross-service debugging in distributed environments.
  • Monitoring dashboards for PROD support using CloudWatch metrics and logs.
{
  "type": "security",
  "title": "Unauthorized access to a resource",
  "status": 401,
  "detail": "The user: cf5c709c-c68e-405b-852b-19b6f7fc1bc3 has no access to the resource: 1f95abbb-127f-4f0e-85df-681665f9849b",
  "instance": "/api/portfolios/v1/detail/1f95abbb-127f-4f0e-85df-681665f9849b",
  "service": "portfolio service",
  "trace_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "priority": "HIGH",
  "errorCode": "ERR-PORT-021",
  "context": {
    "userId": "cf5c709c-c68e-405b-852b-19b6f7fc1bc3",
    "resource_type": "portfolio",
    "resource_id": "1f95abbb-127f-4f0e-85df-681665f9849b"
  }
}

Contract-Based Error Handling and Production Monitoring

All services use a centralized error code registry that maps business errors to structured payloads with error codes, priorities, trace IDs, and i18n support. This enables clear edge case handling during story writing and production debugging.

Error Code Architecture

  • Planning Errors Before They Happen: When writing user stories, developers identify and catalog potential failure scenarios upfront. Each error gets a unique code, making it easier to discuss edge cases during design.
  • Consistent Error Experience: Every error across all services follows the same structure with a code, priority level, and human-readable message. Users and support teams always know what went wrong and how urgent it is.
  • Production Detective Work: When something breaks in production, error codes act like fingerprints. Operations teams can search logs for specific codes, filter by priority, and trace errors across multiple services using trace IDs.
  • Speaking Multiple Languages: Error messages automatically adapt to the user's language preference, providing clear explanations in English, German, and other languages without code changes.
  • Building Dashboards and Alerts: Because every error has a priority and code, it's simple to create CloudWatch dashboards showing high-priority errors or set up alerts when critical failures occur.
  • Better Than Stack Traces: Instead of cryptic technical errors, the system returns structured, meaningful information with context about what the user was trying to do and why it failed.
RFC 7807 Error Codes Trace IDs i18n
{
  "error": {
    "type": "https://api.rewura.com/errors/por-unauthenticated",
    "title": "Unauthorized Access",
    "status": 401,
    "detail": "The user cf5c709c-c68e-405b-852b-19b6f7fc1bc3 has no access to resource 1f95abbb-127f-4f0e-85df-681665f9849b",
    "instance": "/api/portfolios/v1/detail/1f95abbb-127f-4f0e-85df-681665f9849b",
    "service": "portfolio-service",
    "trace_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "priority": "HIGH",
    "errorCode": "POR_001",
    "context": {
      "userId": "cf5c709c-c68e-405b-852b-19b6f7fc1bc3",
      "resource_type": "portfolio",
      "resource_id": "1f95abbb-127f-4f0e-85df-681665f9849b"
    }
  }
}

Error Code Examples from Registry

POR_UNAUTHENTICATED
HTTP 401 | Invalid or expired JWT token | Priority: HIGH
BAC_001
HTTP 502 | Bedrock AI model unavailable | Priority: HIGH
ING_001
HTTP 409 | Ingestion job already running | Priority: LOW
ING_002
HTTP 500 | CryptoNews API fetch failed | Priority: HIGH
BAC_002
HTTP 500 | Conversation persistence failed | Priority: MEDIUM
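A centralized registry like the one excerpted above could be sketched as a simple mapping; the entries are drawn from the examples, while the render_error helper, field subset, and message wording are assumptions:

```python
# Illustrative registry: error code → (HTTP status, priority, title).
ERROR_REGISTRY = {
    "POR_UNAUTHENTICATED": (401, "HIGH", "Invalid or expired JWT token"),
    "BAC_001": (502, "HIGH", "Bedrock AI model unavailable"),
    "ING_001": (409, "LOW", "Ingestion job already running"),
    "ING_002": (500, "HIGH", "CryptoNews API fetch failed"),
    "BAC_002": (500, "MEDIUM", "Conversation persistence failed"),
}

def render_error(code: str, trace_id: str, **context) -> dict:
    """Build a structured payload (subset of the contract shown above)."""
    status, priority, title = ERROR_REGISTRY[code]
    return {"errorCode": code, "status": status, "priority": priority,
            "title": title, "trace_id": trace_id, "context": context}

print(render_error("ING_001", "trace-123")["status"])  # → 409
```

Because every entry carries a status and a priority, dashboards and alerts can filter on those fields directly, as described in the section above.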