Sarbhanu Baidya

Corporate Bond Spread Forecasting

MIT License

Objective

This project aims to forecast corporate bond spreads using historical market and macroeconomic data. The analysis demonstrates exploratory data analysis, feature engineering, and multiple time series and machine learning models. The final output highlights a linear regression model with lagged features as an interpretable and robust forecasting tool.


0. Browse Notebooks

| Notebook | Description |
| --- | --- |
| 01_data_collection.ipynb | Data collection and merging |
| 02_EDA.ipynb | Exploratory data analysis and feature engineering |
| 03_Modeling.ipynb | Time series and machine learning modeling, hyperparameter tuning, final model export |

1. Datasets

The dataset includes:

  • Corporate bond yields
  • 10-year Treasury yields
  • VIX (market volatility index)
  • CPI (Consumer Price Index)
  • Fed Funds rate

The corporate bond spread, our primary target, is calculated as:

Spread = Corporate Bond Yield - 10-Year Treasury Yield
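As a minimal sketch of the calculation above, the snippet below derives the spread column with pandas. The column names `corp_yield` and `treasury_10y` and the yield values are hypothetical placeholders, not the project's actual schema or data.

```python
import pandas as pd

# Hypothetical column names and made-up yields (in percent), for illustration only.
df = pd.DataFrame(
    {
        "corp_yield": [5.10, 5.25, 5.40],
        "treasury_10y": [3.60, 3.70, 3.95],
    },
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

# Spread = Corporate Bond Yield - 10-Year Treasury Yield (in percentage points).
df["spread"] = df["corp_yield"] - df["treasury_10y"]

print(df["spread"].round(2))
```

Both yields should be aligned on the same dates before subtracting; pandas does this automatically when the two series share an index.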


2. Exploratory Data Analysis

Corporate Spread Over Time

[Figure: Corporate Spread Over Time]

Correlation Structure

[Figure: Correlation Matrix]


3. Forecasting Models

ARIMA

Captures autocorrelation in the spread series.

[Figure: ARIMA Forecast]
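To illustrate what "autocorrelation in the spread series" means, the sketch below computes the sample autocorrelation at a given lag with NumPy. It is not an ARIMA fit (that would normally come from a time series library); the series is simulated and the helper name `autocorr` is hypothetical.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))

# Simulated, strongly persistent series standing in for the spread (not real data).
rng = np.random.default_rng(0)
n = 500
series = np.empty(n)
series[0] = 1.5
for t in range(1, n):
    series[t] = 0.15 + 0.9 * series[t - 1] + 0.05 * rng.standard_normal()

lag1 = autocorr(series, 1)
lag10 = autocorr(series, 10)
print(f"lag-1 autocorrelation:  {lag1:.3f}")
print(f"lag-10 autocorrelation: {lag10:.3f}")
```

A lag-1 value close to 1 indicates that the most recent observation carries most of the information about the next one, which is the structure autoregressive models exploit.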

VAR

Incorporates interdependencies between spread and macroeconomic variables.

[Figure: VAR Forecast]

SARIMA

Accounts for potential seasonality in the spread.

[Figure: SARIMA Forecast]

Holt-Winters

Captures both trend and seasonal components for robust forecasting.

[Figure: Holt-Winters Forecast]

Regression on Lagged Features

Linear regression on lagged spreads and macro variables.

[Figure: Regression Lagged Forecast]
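The sketch below shows one way a regression on lagged features could be set up, using ordinary least squares from NumPy. The helper `make_lagged_design`, the simulated data, and the choice of one lag are assumptions for illustration; the project's notebooks may build features differently.

```python
import numpy as np

def make_lagged_design(y, exog, n_lags):
    """Return (X, target): X holds an intercept, y at lags 1..n_lags,
    and exog at lag 1; target is y from index n_lags onward."""
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float)
    n = len(y)
    lag_cols = [y[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    X = np.column_stack([np.ones(n - n_lags)] + lag_cols + [exog[n_lags - 1 : n - 1]])
    return X, y[n_lags:]

# Simulated data standing in for the spread and log VIX (not real data).
rng = np.random.default_rng(0)
n = 300
log_vix = 3.0 + 0.2 * rng.standard_normal(n)
spread = np.empty(n)
spread[0] = 1.5
for t in range(1, n):
    spread[t] = 0.1 + 0.9 * spread[t - 1] + 0.05 * log_vix[t - 1] + 0.01 * rng.standard_normal()

X, target = make_lagged_design(spread, log_vix, n_lags=1)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)  # ordinary least squares

print("intercept, lag-1 spread, lag-1 log_vix:", np.round(coef, 3))
```

Every feature in a row is dated strictly before its target, so the fitted model only uses information that would have been available at forecast time. Passing a larger `n_lags` adds further lagged-spread columns.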

Gradient Boosting

Non-linear tree-based model capturing complex interactions.

[Figure: Gradient Boosting Forecast]

Random Forest

Ensemble model with robust performance and low sensitivity to hyperparameters.

[Figure: Random Forest Forecast]

Final Linear Regression Model

Interpretable model using five lagged spreads and macro variables.

[Figure: Final Linear Regression Forecast]


4. Feature Contribution

| Feature | Coefficient | Impact |
| --- | --- | --- |
| spread | 0.9594 | 1.6013 |
| log_vix | 0.1356 | 0.3959 |
| cpi | -0.00016 | -0.0372 |
| fed_funds_rate | 0.00568 | 0.00734 |

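Interpretation: The lagged spread dominates the forecast (coefficient 0.9594), reflecting strong autocorrelation. Log VIX has the second-largest coefficient (0.1356), so higher market volatility is associated with a wider forecast spread. CPI and the Fed Funds rate have only a minor short-term influence.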
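The README does not define how the Impact column is computed. Dividing Impact by Coefficient in the table gives roughly 1.67, 2.92, 232, and 1.29, which resemble typical levels of the spread, log VIX, CPI, and the Fed Funds rate, so one plausible reading is coefficient times a typical feature level. The sketch below illustrates that reading only; it is an assumption, and the feature means are hypothetical placeholders rather than values from the project data.

```python
import numpy as np

feature_names = ["spread", "log_vix", "cpi", "fed_funds_rate"]

# Coefficients copied from the feature contribution table.
coefficients = np.array([0.9594, 0.1356, -0.00016, 0.00568])

# Hypothetical feature means, NOT taken from the project data.
feature_means = np.array([1.7, 2.9, 230.0, 1.3])

# Assumed definition: contribution of each feature at its typical level.
impact = coefficients * feature_means

for name, c, i in zip(feature_names, coefficients, impact):
    print(f"{name:>15s}  coef={c: .5f}  impact={i: .4f}")
```

Scaling by a typical level puts features with very different units (an index near 230 versus a rate near 1) on a comparable footing, which raw coefficients alone do not.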


5. Model Evaluation

Models are evaluated using root mean squared error (RMSE) and mean absolute percentage error (MAPE).

  • RMSE quantifies the average magnitude of forecast errors, penalizing larger deviations.
  • MAPE expresses errors as a percentage, providing an intuitive measure of forecast accuracy across varying scales.

These metrics are standard in quantitative finance for short-term risk and spread forecasting.
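For reference, both metrics can be written in a few lines of NumPy. This is a generic sketch with made-up numbers, not the project's evaluation code; the function names `rmse` and `mape` are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Undefined when y_true contains zeros and unstable when values are near zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Made-up observed and predicted spreads, for illustration only.
observed = [2.0, 2.5, 4.0]
predicted = [2.2, 2.5, 3.0]

print(f"RMSE: {rmse(observed, predicted):.4f}")
print(f"MAPE: {mape(observed, predicted):.2f}%")
```

In this toy example the single large miss (4.0 versus 3.0) accounts for most of the RMSE, which shows how squaring penalizes larger deviations.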

Key Observations:

  • Linear regression with lagged features outperforms more complex tree-based and ARIMA models in rolling out-of-sample validation.
  • Walk-forward validation provides a realistic estimate of forecast performance, avoiding data leakage.
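A minimal sketch of walk-forward validation follows, assuming an expanding training window, a one-step-ahead forecast, and a least-squares refit at every step. The function name `walk_forward_rmse`, the `min_train` parameter, and the simulated series are hypothetical; the notebooks may use a different window scheme.

```python
import numpy as np

def walk_forward_rmse(X, y, min_train):
    """Refit OLS on rows [0, t) and forecast row t, for t = min_train..len(y)-1."""
    preds = []
    for t in range(min_train, len(y)):
        coef, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)  # uses past rows only
        preds.append(float(X[t] @ coef))
    preds = np.array(preds)
    errors = y[min_train:] - preds
    return float(np.sqrt(np.mean(errors ** 2))), preds

# Simulated persistent series standing in for the spread (not real data).
rng = np.random.default_rng(1)
n = 200
series = np.empty(n)
series[0] = 1.5
for t in range(1, n):
    series[t] = 0.15 + 0.9 * series[t - 1] + 0.05 * rng.standard_normal()

# Features: intercept and the previous value; target: the current value.
X = np.column_stack([np.ones(n - 1), series[:-1]])
y = series[1:]

score, preds = walk_forward_rmse(X, y, min_train=50)
print(f"walk-forward RMSE over {len(preds)} forecasts: {score:.4f}")
```

Because each forecast is produced by a model fitted only on earlier rows, no information from the forecast date or later can leak into training.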

6. Project Artifacts

Notebooks

| Notebook | Description | Path |
| --- | --- | --- |
| 01_data_collection.ipynb | Data collection and merging | Notebooks/01_data_collection.ipynb |
| 02_EDA.ipynb | Exploratory data analysis and feature engineering | Notebooks/02_EDA.ipynb |
| 03_Modeling.ipynb | Time series and machine learning modeling, hyperparameter tuning, final model export | Notebooks/03_Modeling.ipynb |

Models and Outputs

| File | Description | Path |
| --- | --- | --- |
| final_lr_model.pkl | Trained linear regression model | model/final_lr_model.pkl |
| feature_impact.csv | Feature contribution table | model/feature_impact.csv |
| predictions.csv | Observed vs predicted spreads for test period | model/predictions.csv |

7. Insights and Takeaways

  1. Feature Engineering
    Lagged spreads and macroeconomic transformations are critical for accurate short-term forecasting.

  2. Model Selection
    Linear models offer interpretability and stability; tree-based and ARIMA models may require more data or hyperparameter tuning.

  3. Interpretability
    Coefficients provide actionable insights into which features drive spreads, supporting quantitative risk management decisions.

  4. Robust Forecasting
    Walk-forward validation ensures models generalize to unseen data, providing a realistic measure of prediction risk.


8. Conclusion

An interpretable linear regression model with well-engineered lag features provides robust short-term forecasts for corporate bond spreads. While complex models capture non-linearities, the linear approach balances accuracy, stability, and interpretability, aligning with quantitative risk management objectives.

All plots, forecasts, notebooks, and model artifacts are included above for full transparency of methodology and results.