Case Study

FinSight AI

Short-term stock-movement classification fusing market indicators, financial-news sentiment, and document retrieval.

Python
PyTorch
FinBERT
Transformers
RAG
scikit-learn

Metric shown
SHAP explainability
Reproducible pipeline
Docker included
CI/tests included

At a glance

Problem: Can short-term stock direction be predicted using market data, news sentiment and retrieved document context?
Built: End-to-end financial ML pipeline with technical indicators, FinBERT sentiment and document retrieval.
Models / methods: Logistic regression, random forest, gradient boosting, LSTM, GRU and Transformer-style models.
Result: All models scored only slightly above the random macro-F1 baseline (0.33), showing how hard short-horizon prediction is.
Strength shown: Leakage-free evaluation, reproducible experiments, honest reporting.
Links: Dashboard, Case Study, Repo available on request

Visual proof

Live Streamlit dashboard — prediction, sentiment, backtest & document Q&A

Model comparison — macro-F1 vs random baseline

Pipeline architecture — data, features, models, RAG & serving

The charts are real project outputs, and the diagram shows the project's actual architecture.

01 Objective

Predict whether a stock moves UP, DOWN, or SIDEWAYS over the next few trading days (a 5-day horizon in the reported results), and test honestly whether news sentiment and document context improve on price-only models.
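As a rough illustration of the three-class target, the sketch below labels each day from its forward return. The function name make_direction_labels, the default horizon and the 1% threshold are assumptions chosen for illustration, not the project's actual settings.

```python
import numpy as np

def make_direction_labels(close, horizon=5, threshold=0.01):
    """Label each day UP / DOWN / SIDEWAYS from its forward return.

    horizon (>= 1) and threshold are illustrative assumptions.
    The last `horizon` entries stay None because their forward
    return is not yet observable.
    """
    close = np.asarray(close, dtype=float)
    labels = np.full(close.shape, None, dtype=object)
    # Return from day t to day t + horizon.
    fwd_ret = close[horizon:] / close[:-horizon] - 1.0
    labels[:-horizon] = np.where(
        fwd_ret > threshold, "UP",
        np.where(fwd_ret < -threshold, "DOWN", "SIDEWAYS"),
    )
    return labels

# Tiny hypothetical price series with a 1-day horizon.
example = make_direction_labels([100, 102, 102.5, 100, 100], horizon=1)
```

Rows whose label is None should be dropped before training, since their outcome lies beyond the end of the data.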

02 Dataset / input

Daily OHLCV price history for a basket of large-cap tickers
Engineered technical indicators (returns, moving averages, RSI, MACD, volatility)
Financial-news headlines scored for sentiment with FinBERT
Filings and reports indexed for retrieval-augmented context
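The technical indicators listed above could be derived from daily closes roughly as follows. This is a minimal sketch: the column name close, the window lengths, the 12/26/9 MACD spans and the simple-moving-average RSI variant are all assumptions, and the project may compute these differently. Every feature uses only current and past rows, so nothing looks ahead.

```python
import numpy as np
import pandas as pd

def add_indicators(df, ma_window=20, rsi_window=14, vol_window=20):
    """Append backward-looking indicators to a frame with a 'close' column."""
    out = df.copy()
    close = out["close"]

    out["ret_1d"] = close.pct_change()             # daily return
    out["sma"] = close.rolling(ma_window).mean()   # simple moving average

    # RSI from rolling means of gains and losses (SMA variant).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(rsi_window).mean()
    loss = (-delta).clip(lower=0).rolling(rsi_window).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)

    # MACD line and its signal line.
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema_fast - ema_slow
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()

    # Rolling standard deviation of daily returns.
    out["volatility"] = out["ret_1d"].rolling(vol_window).std()
    return out

# Synthetic random-walk prices, for demonstration only.
rng = np.random.default_rng(0)
prices = pd.DataFrame({"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250)))})
features = add_indicators(prices)
```

The first rows of each rolling feature are NaN until the window fills, so they would normally be dropped before modelling.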

03 Model approach

Framed as a 3-class problem with strict, leakage-free time-based splits
Classical baselines: logistic regression, random forest, gradient boosting
Deep sequence models: LSTM, GRU and a Transformer-style encoder
Ablations to isolate the contribution of sentiment and retrieval
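One way to keep a time-based split leakage-free is to cut the data chronologically and drop the last few rows before each boundary, because a label at row t depends on the price at row t + horizon. The sketch below assumes rows are already sorted by date; the function name chronological_split, the 70/15/15 fractions and the 5-row embargo are illustrative assumptions.

```python
import numpy as np

def chronological_split(n_rows, train_frac=0.7, val_frac=0.15, embargo=5):
    """Return train/val/test row indices in time order.

    The last `embargo` rows before each boundary are dropped so that no
    training or validation label is computed from prices that fall inside
    the next segment. Set embargo equal to the label horizon.
    """
    train_end = round(n_rows * train_frac)
    val_end = train_end + round(n_rows * val_frac)
    train_idx = np.arange(0, train_end - embargo)
    val_idx = np.arange(train_end, val_end - embargo)
    test_idx = np.arange(val_end, n_rows)
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = chronological_split(1000)
```

Any scaler or feature selector should be fitted on the training indices only and then applied to the later segments.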

04 Results / metrics

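Models were evaluated with macro-F1, which weights all three classes equally regardless of class imbalance. Every model clustered just above the random baseline (a macro-F1 of about 0.33 for uniform guessing over three classes). That is the honest finding: 5-day direction is close to unpredictable on this ticker universe. SHAP attributions indicated that the models leaned on volatility and momentum signals rather than artefacts.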
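To show where a random baseline of roughly 0.33 comes from, the sketch below scores two scikit-learn dummy classifiers with macro-F1 on synthetic labels. The class proportions and sample size are invented for illustration and are not the project's data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Synthetic, mildly imbalanced labels (assumed proportions).
rng = np.random.default_rng(0)
y_true = rng.choice(["DOWN", "SIDEWAYS", "UP"], size=3000, p=[0.3, 0.4, 0.3])
X = np.zeros((len(y_true), 1))  # dummy classifiers ignore the features

# Uniform random guessing over the three classes.
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X, y_true)
baseline_f1 = f1_score(y_true, uniform.predict(X), average="macro")

# Always predicting the most frequent class.
majority = DummyClassifier(strategy="most_frequent").fit(X, y_true)
majority_f1 = f1_score(y_true, majority.predict(X), average="macro", zero_division=0)

print(f"uniform: {baseline_f1:.3f}  majority: {majority_f1:.3f}")
```

Macro-F1 penalises the majority-class strategy because two of the three classes score zero, which is why it is a stricter yardstick than accuracy on imbalanced labels.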

05 Deployment / reproducibility

Packaged as a modular pipeline with a single config, unit tests, a Streamlit dashboard, a Dockerfile, and CI — so every experiment is reproducible end to end.

06 Limitations

Short-horizon market direction is inherently noisy and near-random
Sentiment used a reproducible sample feed rather than a paid dated archive
Single market, daily resolution, a handful of tickers

07 Future improvements

Sector-specific market context and a real dated news API
Walk-forward cross-validation and probability calibration
A FastAPI inference service alongside the dashboard
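The walk-forward cross-validation proposed above could be sketched with scikit-learn's TimeSeriesSplit, where each fold trains on the past and tests on the block that follows. The data here is random noise, and the fold count, the 5-row gap and the logistic regression model are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit

# Placeholder features and 3-class labels (pure noise).
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 4))
y = rng.integers(0, 3, size=600)

# gap leaves unused rows between train and test, matching a 5-day label horizon.
splitter = TimeSeriesSplit(n_splits=5, gap=5)

fold_scores = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_scores.append(f1_score(y[test_idx], preds, average="macro"))

print(f"macro-F1 per fold: {np.round(fold_scores, 3)}")
```

Reporting the spread across folds, not just the mean, makes it easier to tell a small real edge from noise.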

08 Key takeaway

A negative result, reported honestly: the engineering rigour (leakage-free splits, ablations, reproducibility) matters more than chasing an inflated score on a near-random problem.
