System Architecture
The system combines three complementary model components, each producing a distinct market view that is fused into a single allocation signal. These are model components, not autonomous agents; the dashboard's advisor personas are a separate, rule-based presentation layer.
Technical Agent
Three-model stacking ensemble — XGBoost[1], LightGBM, and CatBoost as base learners with a Ridge meta-learner for final prediction.
Temporal Agent
PyTorch GRU with a scaled dot-product attention head, capturing sequential dependencies in price dynamics over variable-length lookback windows.
NLP Agent
ProsusAI/FinBERT[2] transformer fine-tuned on financial text. Produces headline-level sentiment with confidence-weighted aggregation.
Feature Engineering
33 technical features are computed from OHLCV data and external signals, organized into eight categories.
| Category | Features |
| --- | --- |
| Momentum | RSI, ROC-5, ROC-10, Stochastic %K, Stochastic %D, Williams %R |
| Trend | SMA short / long, MA crossover, trend slope 5d / 20d, trend direction |
| Volatility | ATR%, Bollinger %B, Bollinger width, Garman-Klass volatility, volatility squeeze, volatility regime |
| Volume | Relative volume, volume shock |
| Time | Hour / day-of-week cyclical encoding (sin & cos pairs) |
| Pattern | Candlestick pattern confidence, chart pattern confidence |
| Statistical | Hurst exponent, rolling Sharpe 20d, efficiency ratio, return skewness, return kurtosis |
| Sentiment | VIX z-score, Fear & Greed Index |
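The cyclical encoding in the Time row can be sketched as follows. This is a minimal illustration, assuming hours run 0-23 and days of the week run 0-6; the function name `cyclical_encode` is hypothetical and not taken from the system's code.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 0 differ by 23 as raw integers,
# but they are neighbours once encoded on the circle.
h23 = cyclical_encode(23, 24)
h0 = cyclical_encode(0, 24)
gap = math.dist(h23, h0)

# Day-of-week uses the same function with a period of 7.
friday = cyclical_encode(4, 7)
```

Encoding each time field as a pair keeps the wrap-around (23:00 to 00:00, Sunday to Monday) continuous, which a single integer column cannot do.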
Signal Composition
The deployed daily scanner produces its signal as a five-component weighted composite that blends mean-reversion, momentum, trend, ML, and sentiment views. This composite is a hand-set heuristic prototype. The thesis's validated allocation differs: it uses standardized causal signals, weights tuned on held-out data, a dedicated forward-cheapness ML model, and budget neutrality via a self-funded reserve. The validated allocation, not the prototype, produces the results in the Validation section.
Validation
Purged Walk-Forward Cross-Validation
Following de Prado[5], training and test folds are separated by a purge gap equal to the maximum label horizon, preventing information leakage from overlapping samples. An embargo period further removes observations whose features could span the train/test boundary.
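A minimal sketch of such a split is shown below. It assumes an expanding training window, observations indexed in time order, and a purge and embargo both expressed as a number of observations dropped from the end of the training window; the function name and parameters are hypothetical, and the thesis's actual fold layout may differ.

```python
def purged_walk_forward(n_samples, n_folds, purge, embargo=0):
    """Yield (train_idx, test_idx) pairs for expanding walk-forward folds.

    The last `purge + embargo` observations before each test block are
    excluded from training, so no training label or feature window can
    overlap the test period.
    """
    gap = purge + embargo
    fold_size = n_samples // (n_folds + 1)  # the first block is train-only
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = n_samples if k == n_folds else test_start + fold_size
        train_end = max(0, test_start - gap)
        if train_end == 0:
            continue  # nothing left to train on after purging
        yield list(range(train_end)), list(range(test_start, test_end))

# Example: 100 observations, 4 folds, 5-step label horizon, 2-step embargo.
splits = list(purged_walk_forward(n_samples=100, n_folds=4, purge=5, embargo=2))
```

Setting `purge` to the maximum label horizon matches the rule stated above: a training sample whose label window reaches into the test block is never used.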
Causal, Budget-Neutral Evaluation
The allocation backtest is strictly causal: both strategies buy on the same fixed day, and budget neutrality is enforced by a self-funded reserve. There is no in-window day selection and no full-horizon renormalization, either of which would inflate a naive backtest's apparent gain toward 3%. Allocation weights are tuned on a set of symbols and rolling windows that is disjoint from the reported instruments, which are then scored once with frozen weights.
Results
On held-out assets (2024 onward), the system consistently reduces average cost basis by about 0.5%, with a positive result on seven of eight instruments at roughly 100% capital deployment. A cross-asset extension that reallocates the budget across a basket produces a larger effect that is statistically significant: the mean cross-sectional active return has t = 2.22 (p = 0.028) over 128 monthly observations.
Explainability
Per-prediction feature attributions are computed using SHAP[3] TreeExplainer on the XGBoost base model. Each signal displayed in the dashboard includes a waterfall chart decomposing the allocation score into additive contributions from individual features — enabling the investor to understand why the model recommends increasing or decreasing allocation on any given day.
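The additive structure behind a waterfall chart can be illustrated without the SHAP library itself. SHAP attributions sum, together with a base value, to the model output; the sketch below only shows that bookkeeping. The function name, feature values, and base value are made up for illustration and are not model output.

```python
def waterfall_steps(base_value, contributions):
    """Return (feature, contribution, running_total) rows,
    ordered by absolute contribution, largest first."""
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    running = base_value
    rows = []
    for name, value in ordered:
        running += value
        rows.append((name, value, running))
    return rows

# Illustrative numbers only (hypothetical attributions for one prediction).
attributions = {"RSI": -0.12, "Bollinger %B": 0.05, "VIX z-score": 0.20}
rows = waterfall_steps(base_value=1.0, contributions=attributions)
final_score = rows[-1][2]  # base value plus the sum of all contributions
```

Each row corresponds to one bar of the chart: the bar's length is the contribution and its end point is the running total.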
NLP Pipeline
Financial sentiment is extracted from live news via a three-stage pipeline. Up to 50 articles are analyzed per symbol per day.
- Collection: The Google News RSS feed is scraped for headlines matching tracked asset tickers and sector keywords.
- Classification: Each headline is scored by FinBERT[2] (Araci, 2019), a BERT model pre-trained on financial corpora, producing a 3-class probability distribution (positive / neutral / negative).
- Aggregation: Per-headline scores are aggregated using confidence-weighted averaging, where the weight is the model's softmax confidence. This suppresses low-conviction predictions.
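The aggregation stage can be sketched as below. Two details are assumptions, since the text does not specify them: the per-headline score is taken as P(positive) minus P(negative), and the softmax confidence is taken as the probability of the predicted (highest-probability) class. The function name and the sample probabilities are hypothetical.

```python
def aggregate_sentiment(headline_probs, max_articles=50):
    """Confidence-weighted average of per-headline sentiment scores.

    headline_probs: list of dicts with 'positive', 'neutral', 'negative'
    probabilities that sum to 1 for each headline.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for probs in headline_probs[:max_articles]:
        score = probs["positive"] - probs["negative"]  # assumed signed score in [-1, 1]
        confidence = max(probs.values())               # probability of the predicted class
        weighted_sum += confidence * score
        total_weight += confidence
    return weighted_sum / total_weight if total_weight else 0.0

# Made-up probabilities: one confident positive headline, one ambiguous one.
headlines = [
    {"positive": 0.90, "neutral": 0.08, "negative": 0.02},
    {"positive": 0.30, "neutral": 0.40, "negative": 0.30},
]
daily_sentiment = aggregate_sentiment(headlines)
```

In this example the ambiguous headline carries a weight of 0.40 against 0.90 for the confident one, so the aggregate sits closer to the confident headline's score than a plain average would.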
Portfolio Optimization
Multi-asset allocation follows Modern Portfolio Theory[4] (Markowitz, 1952). The efficient frontier is computed via SciPy SLSQP constrained optimization with per-asset-class bounds (equities 40-80%, bonds 10-40%, alternatives 0-20%). Three pre-configured risk profiles — conservative, balanced, and aggressive — correspond to target annualized volatilities of 8%, 12%, and 18%, respectively.
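A minimal sketch of one constrained optimization of this kind follows. It solves only the minimum-variance problem under the asset-class bounds quoted above; tracing the full efficient frontier or hitting a target volatility would need an additional constraint or a sweep, which is not shown. The covariance matrix, the one-asset-per-class setup, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_weights(cov, asset_classes, class_bounds):
    """Long-only minimum-variance weights with per-asset-class sum bounds."""
    n = len(asset_classes)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    for name, (lower, upper) in class_bounds.items():
        mask = np.array([c == name for c in asset_classes], dtype=float)
        # SLSQP inequality constraints are satisfied when fun(w) >= 0.
        constraints.append({"type": "ineq", "fun": lambda w, m=mask, lo=lower: m @ w - lo})
        constraints.append({"type": "ineq", "fun": lambda w, m=mask, hi=upper: hi - m @ w})
    result = minimize(
        lambda w: 1e4 * (w @ cov @ w),  # variance, rescaled so the default tolerance is meaningful
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=constraints,
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x

# Illustrative inputs: one asset per class, uncorrelated, made-up volatilities.
asset_classes = ["equities", "bonds", "alternatives"]
cov = np.diag([0.18 ** 2, 0.06 ** 2, 0.15 ** 2])
class_bounds = {"equities": (0.40, 0.80), "bonds": (0.10, 0.40), "alternatives": (0.00, 0.20)}
weights = min_variance_weights(cov, asset_classes, class_bounds)
```

With these inputs the bounds are binding: the optimizer would prefer more bonds and less equity, but the class limits hold the weights at the edge of the feasible region.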
Multi-Asset Portfolio & Currency Conversion
The system supports a multi-asset portfolio across 10 tracked instruments: VWCE.DE, SPY, QQQ, EFA, EEM, GLD, TLT, BTC-USD, AAPL, XOM. Portfolio value is computed by fetching the latest price for each invested symbol and converting to the user's chosen display currency using live exchange rates.
Exchange Rate API
Currency conversion uses the fawazahmed0/currency-api (free, daily-updated, 150+ currencies) with a 1-hour cache. Supported display currencies include USD, EUR, GBP, CHF, RON, JPY, CAD, and AUD. Investments are stored in the asset's native trading currency and converted on-the-fly for display.
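The caching and on-display conversion described above might look like the following sketch. The rate source is injected as a callable so that no endpoint or response format of the real API is assumed; `RateCache`, `fetch_rates`, and the sample rates are hypothetical.

```python
import time

class RateCache:
    """Cache exchange rates per base currency for a fixed time-to-live."""

    def __init__(self, fetch_rates, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch_rates  # hypothetical callable: base -> {currency: rate}
        self._ttl = ttl_seconds    # 1 hour, as stated in the text
        self._clock = clock
        self._store = {}           # base -> (fetched_at, rates)

    def rates(self, base):
        now = self._clock()
        cached = self._store.get(base)
        if cached is None or now - cached[0] >= self._ttl:
            cached = (now, self._fetch(base))  # refresh on miss or expiry
            self._store[base] = cached
        return cached[1]

    def convert(self, amount, native, display):
        """Convert an amount stored in its native currency for display."""
        if native == display:
            return amount
        return amount * self.rates(native)[display]

calls = []

def fake_fetch(base):
    calls.append(base)
    return {"EUR": 0.9, "USD": 1.0}  # made-up rates for illustration

cache = RateCache(fake_fetch)
value_eur = cache.convert(200.0, "USD", "EUR")
value_eur_again = cache.convert(100.0, "USD", "EUR")  # served from the cache
```

Because amounts stay in the native currency and are converted only when displayed, a stale or refreshed rate changes what is shown but never what is stored.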
Alpha Calculation
The dashboard's simulated alpha compares Smart DCA against fixed-schedule DCA over the period currently shown. Both buy on the same fixed day each month and invest the same total; Smart only tilts the monthly amount by that day's allocation signal (causal, no look-ahead). Because it runs on the displayed window of recent daily signals from the deployed scanner, it is a short, noisy figure, not the thesis's long-run, tuned-model result. Strategy-level risk metrics (Sharpe ratio, maximum drawdown) are computed from monthly portfolio returns.
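One way to sketch this comparison is shown below. Several details are assumptions, because the text does not spell them out: the allocation signal is represented as a multiplier around 1.0, unspent budget accumulates in a reserve that funds later above-baseline purchases, any reserve left over is deployed on the final purchase date so both strategies invest the same total, and alpha is expressed as the relative reduction in average cost basis. All function names and numbers are hypothetical.

```python
def average_cost(amounts, prices):
    """Average cost basis: total cash invested divided by units acquired."""
    units = sum(a / p for a, p in zip(amounts, prices))
    return sum(amounts) / units

def smart_dca_amounts(base, multipliers):
    """Tilt each monthly purchase by that day's multiplier, funded by a reserve.

    Each decision uses only the current multiplier and the reserve built up
    so far, so no future information enters the schedule.
    """
    reserve = 0.0
    amounts = []
    for m in multipliers:
        available = base + reserve
        spend = min(base * m, available)  # cannot spend more than has been set aside
        reserve = available - spend
        amounts.append(spend)
    amounts[-1] += reserve  # assumption: deploy the leftover on the last date
    return amounts

# Made-up prices and multipliers for four monthly purchase dates.
prices = [100.0, 80.0, 125.0, 100.0]
multipliers = [0.5, 1.5, 0.5, 1.0]
fixed = [100.0] * len(prices)
smart = smart_dca_amounts(100.0, multipliers)

fixed_cost = average_cost(fixed, prices)
smart_cost = average_cost(smart, prices)
alpha = (fixed_cost - smart_cost) / fixed_cost  # relative cost-basis reduction
```

Because both schedules buy on the same dates and spend the same total, any difference in average cost comes only from how the monthly amounts are tilted.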
References
[1] Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD.
[2] Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
[3] Lundberg, S. & Lee, S. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30.
[4] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), 77-91.
[5] López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.