Methodology | Smart DCA

System Architecture

The system combines three complementary model components, each producing a distinct market view that is fused into a single allocation signal. These are model components, not autonomous agents; the dashboard's advisor personas are a separate, rule-based presentation layer.


Technical Agent

Three-model stacking ensemble — XGBoost^[1], LightGBM, and CatBoost as base learners with a Ridge meta-learner for final prediction.

XGBoost, LightGBM, CatBoost, Ridge


Temporal Agent

PyTorch GRU with a scaled dot-product attention head, capturing sequential dependencies in price dynamics over variable-length lookback windows.

GRU, Attention, PyTorch


NLP Agent

ProsusAI/FinBERT^[2] transformer fine-tuned on financial text. Produces headline-level sentiment with confidence-weighted aggregation.

FinBERT, Transformers

Feature Engineering

33 technical features are computed from OHLCV data and external signals, organized into eight categories.

Momentum	RSI, ROC-5, ROC-10, Stochastic %K, Stochastic %D, Williams %R
Trend	SMA short / long, MA crossover, trend slope 5d / 20d, trend direction
Volatility	ATR%, Bollinger %B, Bollinger width, Garman-Klass volatility, volatility squeeze, volatility regime
Volume	Relative volume, volume shock
Time	Hour / day-of-week cyclical encoding (sin & cos pairs)
Pattern	Candlestick pattern confidence, chart pattern confidence
Statistical	Hurst exponent, rolling Sharpe 20d, efficiency ratio, return skewness, return kurtosis
Sentiment	VIX z-score, Fear & Greed Index
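Two of the listed features have simple closed forms. The sketch below is illustrative only: the function names are hypothetical, the Monday = 0 convention is an assumption, and the per-bar Garman-Klass estimator is the standard textbook form, which may differ from the deployed implementation (for example in its rolling window or annualization).

```python
import math

def cyclical_encode(value, period) -> tuple:
    """Map a periodic value (hour of day, day of week) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def garman_klass_variance(o, h, l, c) -> float:
    """Single-bar Garman-Klass variance estimate from OHLC prices."""
    hl = math.log(h / l)   # log high/low range
    co = math.log(c / o)   # log close/open return
    return 0.5 * hl ** 2 - (2.0 * math.log(2.0) - 1.0) * co ** 2

hour_sin, hour_cos = cyclical_encode(23, 24)   # 23:00 lands next to 00:00
dow_sin, dow_cos = cyclical_encode(0, 7)       # Monday = 0 (assumed convention)
gk_vol = math.sqrt(garman_klass_variance(100.0, 102.0, 99.0, 101.0))
```

The sin/cos pair keeps 23:00 and 00:00 close together in feature space, which a raw integer hour would not.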

Signal Composition

The deployed daily scanner produces its signal as a five-component weighted composite that blends mean-reversion, momentum, trend, ML, and sentiment views. This composite is a hand-set heuristic prototype. The allocation validated in the thesis differs: it uses standardized causal signals, weights tuned on held-out data, a dedicated forward-cheapness ML model, and budget neutrality via a self-funded reserve. The thesis allocation, not this composite, produces the results reported in the Validation section.

30%	Trailing 20-day return	Mean reversion: buy more after drawdowns
25%	RSI signal	Oversold / overbought oscillator
20%	Price vs. SMA	Trend-following confirmation
15%	ML ensemble score	Stacking ensemble predicted return
10%	VIX / sentiment fear signal	Contrarian fear indicator
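The five weights listed above combine as a plain weighted sum. The following is a minimal sketch, assuming each component has already been scaled to the range [-1, 1] with positive values meaning "buy more"; the component keys and the clipping step are hypothetical and not taken from the scanner's code.

```python
# Weights as listed above; the dictionary keys are hypothetical labels.
WEIGHTS = {
    "mean_reversion": 0.30,  # trailing 20-day return
    "rsi": 0.25,             # RSI signal
    "trend": 0.20,           # price vs. SMA
    "ml": 0.15,              # ML ensemble score
    "fear": 0.10,            # VIX / sentiment fear signal
}

def composite_signal(components: dict) -> float:
    """Weighted sum of component scores, each assumed to lie in [-1, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Clip defensively so one out-of-range component cannot dominate.
    clipped = {k: max(-1.0, min(1.0, components[k])) for k in WEIGHTS}
    return sum(WEIGHTS[k] * clipped[k] for k in WEIGHTS)

example = composite_signal(
    {"mean_reversion": 0.8, "rsi": 0.5, "trend": -0.2, "ml": 0.1, "fear": 0.4}
)
```

Because the weights sum to 1, the composite stays in [-1, 1] whenever the inputs do.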

Validation

Purged Walk-Forward Cross-Validation

Following López de Prado^[5], training and test folds are separated by a purge gap equal to the maximum label horizon, preventing information leakage from overlapping samples. An embargo period further removes observations whose features could span the train/test boundary.
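The split logic can be sketched as an index generator. This is a simplified illustration, assuming an expanding training window and treating the embargo as additional observations dropped next to the purge gap; the function name, parameters, and fold layout are hypothetical rather than the thesis's exact procedure.

```python
def purged_walk_forward(n_samples, n_folds, horizon, embargo=0, min_train=1):
    """Yield (train_idx, test_idx) lists for expanding-window walk-forward CV.

    The last `horizon + embargo` observations before each test block are
    dropped from training, so no training label overlaps the test period.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = n_samples if k == n_folds else test_start + fold_size
        train_end = test_start - horizon - embargo
        if train_end < min_train:
            continue  # not enough history left after purging
        yield list(range(train_end)), list(range(test_start, test_end))

splits = list(purged_walk_forward(n_samples=100, n_folds=4, horizon=5, embargo=2))
```

With a 5-step label horizon and a 2-step embargo, every fold leaves seven unused observations between the last training index and the first test index.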

Causal, Budget-Neutral Evaluation

The allocation backtest is strictly causal. Both strategies buy on the same fixed day, and budget neutrality is enforced by a self-funded reserve. There is no in-window day selection and no full-horizon renormalization, either of which would inflate a naive backtest's apparent improvement toward 3%. Allocation weights are tuned on a set of symbols and rolling windows that is disjoint from the reported instruments, which are scored once with frozen weights.
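One way to read "self-funded reserve" is that the tilted strategy can overspend in a period only by drawing on what it underspent earlier. The sketch below encodes that reading; the exact reserve rules are an assumption, and the function name, the multipliers, and the example prices are made up for illustration.

```python
def smart_dca_with_reserve(prices, multipliers, budget):
    """Return (smart_cost_basis, fixed_cost_basis, final_reserve).

    Assumed mechanics: each period adds `budget` to the available cash.
    Smart DCA targets `budget * multiplier` but never spends more than the
    cash on hand, so overweights are funded by earlier underweights.
    """
    reserve = 0.0
    smart_units = smart_spent = fixed_units = 0.0
    for price, m in zip(prices, multipliers):
        cash = reserve + budget
        spend = min(max(budget * m, 0.0), cash)  # causal: uses only past cash
        reserve = cash - spend
        smart_units += spend / price
        smart_spent += spend
        fixed_units += budget / price            # fixed DCA buys the same day
    fixed_spent = budget * len(prices)
    return smart_spent / smart_units, fixed_spent / fixed_units, reserve

smart_basis, fixed_basis, leftover = smart_dca_with_reserve(
    prices=[100.0, 90.0, 80.0, 100.0],
    multipliers=[0.8, 1.0, 1.4, 1.0],   # illustrative signal-driven tilts
    budget=100.0,
)
```

In this toy run the third-period overweight is capped by the reserve, the full budget ends up deployed, and the tilted strategy's average cost basis is lower than the fixed schedule's.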

Results

On held-out assets (2024 onward), the system consistently reduces average cost basis by about 0.5%, with a positive result on seven of eight instruments at roughly 100% capital deployment. A cross-asset extension that reallocates the budget across a basket shows a larger, statistically significant effect: a mean cross-sectional active return with t = 2.22 (p = 0.028) over 128 monthly observations.
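The reported p-value can be sanity-checked from the t-statistic alone. The sketch below assumes a two-sided one-sample t-test with 128 - 1 = 127 degrees of freedom (the source does not name the test) and integrates the Student t density numerically using only the standard library.

```python
import math

def student_t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution, computed in log space."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on the density over [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, df) + student_t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3.0   # P(-|t| < T < |t|)
    return 1.0 - central_mass

p = two_sided_p_value(2.22, df=128 - 1)
```

Under these assumptions the result comes out near 0.028, consistent with the figure quoted above.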

Explainability

Per-prediction feature attributions are computed using SHAP^[3] TreeExplainer on the XGBoost base model. Each signal displayed in the dashboard includes a waterfall chart decomposing the allocation score into additive contributions from individual features — enabling the investor to understand why the model recommends increasing or decreasing allocation on any given day.
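The waterfall chart relies on the additive property of Shapley values: a base value plus the per-feature contributions equals the prediction. The sketch below illustrates that property with a brute-force computation on a toy function. It is not the TreeExplainer algorithm, and the toy model, feature names, and all-zero baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all
    feature orderings. Features not yet revealed keep their baseline value."""
    names = list(x)
    phi = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = x[name]
            new = predict(current)
            phi[name] += new - prev   # marginal contribution in this ordering
            prev = new
    return {k: v / len(orderings) for k, v in phi.items()}

def toy_model(f):
    # Hypothetical score with one interaction term.
    return 0.5 * f["rsi"] + 0.3 * f["vix_z"] + 0.2 * f["rsi"] * f["trend"]

x = {"rsi": 1.0, "vix_z": 2.0, "trend": -1.0}
baseline = {"rsi": 0.0, "vix_z": 0.0, "trend": 0.0}
phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
```

The interaction term is split evenly between the two features involved, and `base_value + sum(phi.values())` reproduces `toy_model(x)`.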

NLP Pipeline

Financial sentiment is extracted from live news via a three-stage pipeline. Up to 50 articles are analyzed per symbol per day.

Collection: The Google News RSS feed is queried for headlines matching tracked asset tickers and sector keywords.
Classification: Each headline is scored by FinBERT^[2] (Araci, 2019), a BERT model pre-trained on financial corpora, producing a 3-class probability distribution (positive / neutral / negative).
Aggregation: Per-headline scores are aggregated using confidence-weighted averaging, where the weight is the model's softmax confidence. This suppresses low-conviction predictions.
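The aggregation stage can be sketched as follows. The source specifies the weight (softmax confidence) but not the per-headline score, so this example assumes score = P(positive) - P(negative) and takes confidence to be the largest class probability; the function name and the sample probabilities are hypothetical.

```python
def aggregate_sentiment(headlines):
    """Confidence-weighted mean of per-headline sentiment.

    `headlines` is a list of (p_positive, p_neutral, p_negative) tuples.
    Score = p_positive - p_negative (assumed); weight = max class probability.
    """
    weighted_sum = total_weight = 0.0
    for p_pos, p_neu, p_neg in headlines:
        confidence = max(p_pos, p_neu, p_neg)
        weighted_sum += confidence * (p_pos - p_neg)
        total_weight += confidence
    return weighted_sum / total_weight if total_weight else 0.0

daily = aggregate_sentiment([
    (0.90, 0.07, 0.03),   # confident positive
    (0.40, 0.35, 0.25),   # low conviction, contributes little
    (0.05, 0.15, 0.80),   # confident negative
])
```

The low-conviction headline carries less than half the weight of either confident one, which is the suppression effect described above.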

Portfolio Optimization

Multi-asset allocation follows Modern Portfolio Theory^[4] (Markowitz, 1952). The efficient frontier is computed via SciPy SLSQP constrained optimization with per-asset-class bounds (equities 40-80%, bonds 10-40%, alternatives 0-20%). Three pre-configured risk profiles — conservative, balanced, and aggressive — correspond to target annualized volatilities of 8%, 12%, and 18%, respectively.
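A single point on the frontier, the minimum-variance portfolio under the class bounds above, can be sketched with SciPy's SLSQP solver. The four assets, their covariance matrix, and the class assignment are invented for illustration; tracing the full frontier or hitting a target volatility would add a return or volatility constraint on top of this.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical annualized covariance for four assets (illustrative numbers).
assets = ["equity_a", "equity_b", "bond", "alt"]
cov = np.array([
    [0.0400, 0.0240, 0.0020, 0.0060],
    [0.0240, 0.0625, 0.0010, 0.0080],
    [0.0020, 0.0010, 0.0036, 0.0005],
    [0.0060, 0.0080, 0.0005, 0.0900],
])
classes = {"equities": [0, 1], "bonds": [2], "alternatives": [3]}
class_bounds = {"equities": (0.40, 0.80), "bonds": (0.10, 0.40),
                "alternatives": (0.00, 0.20)}

def variance(w):
    return float(w @ cov @ w)

# Fully invested: weights sum to 1.
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
for name, idx in classes.items():
    lo, hi = class_bounds[name]
    # SLSQP treats "ineq" constraints as fun(w) >= 0.
    constraints.append({"type": "ineq",
                        "fun": lambda w, i=idx, lo=lo: np.sum(w[i]) - lo})
    constraints.append({"type": "ineq",
                        "fun": lambda w, i=idx, hi=hi: hi - np.sum(w[i])})

x0 = np.array([0.30, 0.30, 0.30, 0.10])   # feasible starting point
result = minimize(variance, x0, method="SLSQP",
                  bounds=[(0.0, 1.0)] * len(assets), constraints=constraints)
weights = result.x
```

The default-argument trick (`i=idx, lo=lo`) pins each constraint to its own class; without it every lambda would see the last loop values.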

Multi-Asset Portfolio & Currency Conversion

The system supports a multi-asset portfolio across 10 tracked instruments: VWCE.DE, SPY, QQQ, EFA, EEM, GLD, TLT, BTC-USD, AAPL, XOM. Portfolio value is computed by fetching the latest price for each invested symbol and converting to the user's chosen display currency using live exchange rates.

Exchange Rate API

Currency conversion uses the fawazahmed0/currency-api (free, daily-updated, 150+ currencies) with a 1-hour cache. Supported display currencies include USD, EUR, GBP, CHF, RON, JPY, CAD, and AUD. Investments are stored in the asset's native trading currency and converted on-the-fly for display.
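The one-hour cache can be sketched as a small time-to-live wrapper around whatever function fetches the rate table. The class name, the USD-based rate table layout, and the stand-in fetcher with made-up rates are all assumptions for illustration; no real API call is shown.

```python
import time
from typing import Callable, Dict

class CachedRates:
    """Cache a {currency_code: units_per_USD} table for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[], Dict[str, float]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl_seconds, clock
        self._rates: Dict[str, float] = {}
        self._fetched_at = None

    def rates(self) -> Dict[str, float]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._rates = self._fetch()   # refresh only when stale
            self._fetched_at = now
        return self._rates

    def convert(self, amount: float, src: str, dst: str) -> float:
        table = self.rates()
        return amount * table[dst] / table[src]

# Stand-in fetcher with made-up rates; a real one would call the rate API.
calls = []
def fake_fetch():
    calls.append(1)
    return {"USD": 1.0, "EUR": 0.5, "GBP": 0.25}

fake_now = [0.0]
cache = CachedRates(fake_fetch, clock=lambda: fake_now[0])
eur_value = cache.convert(200.0, "USD", "EUR")
```

Injecting the clock and the fetcher keeps the cache testable without network access or real waiting.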

Alpha Calculation

The dashboard's simulated alpha compares Smart DCA against fixed-schedule DCA over the period currently shown. Both buy on the same fixed day each month and invest the same total; Smart only tilts the monthly amount by that day's allocation signal (causal, no look-ahead). Because it runs on the displayed window of recent daily signals from the deployed scanner, it is a short, noisy figure, not the thesis's long-run, tuned-model result. Strategy-level risk metrics (Sharpe ratio, maximum drawdown) are computed from monthly portfolio returns.
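The two risk metrics named above have standard definitions, sketched here from a list of monthly returns. The annualization factor of sqrt(12), the zero default risk-free rate, and the sample returns are assumptions; the dashboard's exact conventions are not stated in the source.

```python
import math
import statistics

def annualized_sharpe(monthly_returns, risk_free_monthly=0.0):
    """Mean excess return over its sample standard deviation, scaled by sqrt(12)."""
    excess = [r - risk_free_monthly for r in monthly_returns]
    sd = statistics.stdev(excess)
    return math.sqrt(12) * statistics.mean(excess) / sd if sd else float("nan")

def max_drawdown(monthly_returns):
    """Largest peak-to-trough decline of the compounded equity curve,
    returned as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in monthly_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

returns = [0.02, -0.10, 0.05, 0.03, -0.01]   # illustrative monthly returns
sharpe = annualized_sharpe(returns)
mdd = max_drawdown(returns)
```

With only a handful of monthly observations both figures are very noisy, which matches the caveat about the short displayed window.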


References

[1] Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

[2] Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.

[3] Lundberg, S. M. & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30.

[4] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), 77-91.

[5] López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

ML-Powered Value-Weighted Dollar-Cost Averaging