Price forecasting of SGB and IRFC Bonds and comparing their returns

Overview

Project_Bonds

Project Title : Price forecasting of SGB and IRFC Bonds and comparing their returns.

Introduction of the Project

The 2008-09 global financial crisis and the 2020-21 pandemic have shown us how volatile the market can be. Many people are looking for ways to invest money to secure their future, ideally in a secure investment with minimal financial risk and higher returns. However, every investment comes with some risk. As the saying in the world of investment goes, "Do not put all your eggs in one basket." We need to diversify our portfolio, so that if one investment does not yield enough due to fluctuations in market rates, another may give a higher yield. Bonds are one of the investments people prefer most. The bonds we selected are two government bonds: SGB (Sovereign Gold Bond) and IRFC (Indian Railway Finance Corporation). The objective was to forecast the prices of the SGB and IRFC bonds, calculate their returns, compare them, and recommend to the client which one to pick based on the input, that is, the number of months to forecast.

Technologies Used

  • Python – ML model (auto_arima to search for the p, d, q values; ARIMA for forecasting)
  • SQLite – Database
  • Flask – Web framework serving the front end for deployment
  • Python Libraries – numpy, pandas, statsmodels, pmdarima, scikit-learn, SQLAlchemy, seaborn, re, nsepy, matplotlib
  • HTML/CSS

General info

This project is a simple forecasting model. Taxes were not taken into account when calculating returns: the IRFC bond is tax-free, whereas SGB gains are taxed if the bond is sold before the maturity period is over. Inflation and the global pandemic situation are rare phenomena beyond anyone's control; they have been treated as business constraints and are not modelled.
Data has been collected from the National Stock Exchange of India (NSE). The two bonds selected from NSE were:

  • SGBAUG24 (series GB) – Sovereign Gold Bond
  • IRFC (series N2) – Indian Railway Finance Corporation bond

Requirement file (contains libraries and their versions)

Libraries Used

Project Architecture

(Figure: Project Architecture diagram)

Explaining Project Architecture

Live data extraction

Historical data is collected from the NSE website using the nsepy library, which can also fetch live daily data. The data then goes to Python, where two things happen. First, out of all the attributes, only the "Close Price" is kept. Second, the daily data is converted into monthly data by taking the mean closing price of each month.
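A minimal sketch of the daily-to-monthly step, assuming the daily prices sit in a pandas DataFrame indexed by date with a `Close` column (the shape nsepy's `get_history` returns). It uses `resample("MS").mean()` instead of the string-splitting approach in the project code; the helper name `to_monthly_close` and the sample prices are made up for illustration.

```python
import pandas as pd

def to_monthly_close(daily: pd.DataFrame) -> pd.DataFrame:
    """Keep only the closing price and average it per calendar month."""
    close = daily[["Close"]].copy()
    # Make sure the index is datetime so it can be resampled by month.
    close.index = pd.to_datetime(close.index)
    # "MS" labels each month by its first day; mean() gives the monthly average.
    monthly = close.resample("MS").mean()
    monthly.columns = ["Avg_price"]
    # Months without any trading days would be NaN, so drop them.
    return monthly.dropna()

# Illustrative daily data (made-up prices, not NSE data).
daily = pd.DataFrame(
    {"Close": [100.0, 102.0, 110.0, 114.0], "Volume": [5, 7, 3, 9]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-02-01", "2021-02-02"]),
)
print(to_monthly_close(daily))
```

Every other column (volume, open, high, and so on) is discarded, which matches the univariate "Close Price" approach described above.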

Data storage in sqlite

We chose SQLite because it is very easy to use and one does not need SQL knowledge to inspect the data. The database is created locally and is updated each time the user uses the application. The user can easily take the database file and view the data in any SQL viewer available online.
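A minimal sketch of the store-and-reload round trip. It uses Python's built-in `sqlite3` module with an in-memory database instead of the SQLAlchemy engine and the `gold_database.db` file used in the project; the helper name `save_and_reload` is hypothetical.

```python
import sqlite3
import pandas as pd

def save_and_reload(monthly: pd.DataFrame, table: str, conn: sqlite3.Connection) -> pd.DataFrame:
    """Write the monthly series to SQLite (replacing any old table) and read it back."""
    # if_exists="replace" rewrites the table on every run, so it always holds the latest data.
    monthly.to_sql(table, con=conn, if_exists="replace", index_label="Dates")
    # The table name is interpolated directly, so only pass trusted, fixed names.
    df = pd.read_sql(f"SELECT * FROM {table}", conn)
    # SQLite stores timestamps as text, so parse them back into a DatetimeIndex.
    df["Dates"] = pd.to_datetime(df["Dates"])
    return df.set_index("Dates")

# Usage with illustrative values; a file path such as "gold_database.db" works the same way.
monthly = pd.DataFrame(
    {"Avg_price": [101.0, 112.0]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-02-01"], name="Dates"),
)
conn = sqlite3.connect(":memory:")
print(save_and_reload(monthly, "SGB", conn))
```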

Data is then used by the model

When the data is read back into Python, a differencing method is applied to remove trend and seasonality so that the series becomes stationary. For successful forecasting with ARIMA, the time series needs to be stationary.

p,d,q Hyperparameters

We use the auto_arima function to find the p, d, q values. The summary of the auto_arima model is converted to a string, and the regex function "re.findall()" is used to extract the p, d, q values from it. The drawback of auto_arima here is that it runs twice every time the program is executed, because it calculates the hyperparameter values separately for the SGB and IRFC data.
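A minimal sketch of the extraction idea, using `re.search` on an illustrative summary fragment. The exact summary layout and the helper name `parse_order` are assumptions. Alternatively, the model returned by pmdarima's `auto_arima` exposes the chosen order as its `order` attribute, which avoids parsing text altogether.

```python
import re

def parse_order(summary_text: str) -> tuple[int, int, int]:
    """Extract (p, d, q) from the first 'SARIMAX(p, d, q)' found in a summary string."""
    match = re.search(r"SARIMAX\((\d+),\s*(\d+),\s*(\d+)\)", summary_text)
    if match is None:
        raise ValueError("no ARIMA order found in summary text")
    p, d, q = (int(g) for g in match.groups())
    return p, d, q

# Illustrative fragment of a summary table (hypothetical values).
sample = "Dep. Variable:  y   No. Observations:  60\nModel:  SARIMAX(2, 1, 1)   Log Likelihood  -301.2"
print(parse_order(sample))  # (2, 1, 1)
```

For a seasonal model the summary reads like `SARIMAX(0, 1, 0)x(1, 0, 0, 12)`; the pattern still returns only the non-seasonal (p, d, q).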

ARIMA

This is the part where the data is taken and the ARIMA model is fitted and used to predict. The predictions shown cover 12 months.
(Figure: Actual Data vs Predicted Data)

Model Evaluation

SGB

RMSE: Rs. 93.27 & MAPE: 0.0185 (1.85%)

IRFC

RMSE: Rs. 21.62 & MAPE: 0.0139 (1.39%)
(Pretty good: both MAPE values are below 2%.)
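For reference, the two metrics can be computed as below. This is a standard-library sketch, and the sample numbers are made up. RMSE is in the same unit as the prices (Rs.), and MAPE is a fraction, so 0.0185 corresponds to 1.85%. With scikit-learn, which the project imports, RMSE is equivalently `math.sqrt(mean_squared_error(actual, predicted))`.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the prices (Rs.)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.0185 means 1.85%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted), mape(actual, predicted))  # 10.0 and about 0.075
```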

Forecasting (12 Months)

(Figure: Forecasted Data (12 Months))

Returns

This is the part where the forecasted data of both SGB and IRFC is collected and the returns are calculated from it. The two returns are then compared, and the customer is told the amount of return of each bond for the selected time period, with the higher-return bond listed first.
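A minimal sketch of the return calculation and comparison. The source does not show how the returns are computed, so this assumes a simple price return: the bond is bought at the latest observed monthly average price and sold at the forecasted price for the chosen month, ignoring taxes, coupons and SGB interest. The helper names and sample prices are hypothetical. `compare_returns` mirrors the wording of the project's `output_` function.

```python
def price_return(buy_price: float, forecast_prices: list[float], months: int) -> float:
    """Percentage return if bought at buy_price and sold at the forecast `months` ahead.

    Assumption for illustration: price change only; no taxes, coupons or SGB interest.
    """
    if not 1 <= months <= len(forecast_prices):
        raise ValueError("months must be between 1 and the forecast length")
    sell_price = forecast_prices[months - 1]
    return (sell_price - buy_price) / buy_price * 100

def compare_returns(gain_sgb: float, gain_bond: float, months: int) -> str:
    """Report both returns, naming the bond with the higher return first."""
    if gain_sgb > gain_bond:
        return (f"The return of SGB is {gain_sgb:.2f}% and the return of IRFC Bond "
                f"is {gain_bond:.2f}% after {months} months")
    return (f"The return of IRFC Bond is {gain_bond:.2f}% and the return of SGB "
            f"is {gain_sgb:.2f}% after {months} months")

# Made-up prices for illustration.
gain_sgb = price_return(4800.0, [4850.0, 4900.0, 4950.0, 5000.0], months=4)
gain_bond = price_return(1200.0, [1205.0, 1210.0, 1215.0, 1220.0], months=4)
print(compare_returns(gain_sgb, gain_bond, 4))
```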

User Input

The user is given 3 input options and selects a time period from a drop-down list. The options are:

  1. 4 Months (Quarterly)
  2. 6 Months (Half-yearly)
  3. 12 Months (Annually)
    These options are the time periods to forecast. If the user selects 6, the output page shows 6 forecasted values, each with an Upper Price, Forecasted Price and Lower Price, for both bonds side by side. Below them, a text displays the returns the user would get by selling the bonds at the end of that period.
    (Figure: 12 Months Forecasted Prices)
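The hypothetical helper below sketches how the selected option could be turned into the side-by-side table on the output page. It assumes each bond's 12-month forecast is already in a DataFrame with "Upper Price", "Forecasted Price" and "Lower Price" columns. The function name and column layout are assumptions, not the project's Flask code.

```python
import pandas as pd

ALLOWED_MONTHS = (4, 6, 12)  # the drop-down options

def forecast_table(sgb: pd.DataFrame, irfc: pd.DataFrame, months) -> pd.DataFrame:
    """Side-by-side Upper/Forecasted/Lower prices of both bonds for the first `months` rows."""
    months = int(months)  # a submitted form value arrives as a string, e.g. "6"
    if months not in ALLOWED_MONTHS:
        raise ValueError(f"months must be one of {ALLOWED_MONTHS}")
    # keys= builds two-level column labels such as ("SGB", "Forecasted Price").
    return pd.concat([sgb.head(months), irfc.head(months)], axis=1, keys=["SGB", "IRFC"])

# Illustrative 12-month forecasts with made-up prices.
idx = pd.date_range("2023-01-01", periods=12, freq="MS")
cols = ["Upper Price", "Forecasted Price", "Lower Price"]
sgb = pd.DataFrame([[110.0, 100.0, 90.0]] * 12, index=idx, columns=cols)
irfc = pd.DataFrame([[55.0, 50.0, 45.0]] * 12, index=idx, columns=cols)
print(forecast_table(sgb, irfc, "6"))
```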

Python_code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pylab import rcParams
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from pmdarima.arima import auto_arima
from sklearn.metrics import mean_squared_error
import math
import re
from datetime import date
import nsepy 
import warnings
warnings.filterwarnings("ignore")
####################################          Live data extraction              ###################################################
##Extracting data from nsepy package
da=date.today()
gold= pd.DataFrame(nsepy.get_history(symbol="SGBAUG24",series="GB", start=date(2016,9,1), end=da))
bond= pd.DataFrame(nsepy.get_history(symbol="IRFC",series="N2", start=date(2012,1,1), end=da))

#############################                 Live data  extraction end                  ###############################################

# Heatmap - to check collinearity
def heatmap(x):
    plt.figure(figsize=(16,16))
    sns.heatmap(x.corr(),annot=True,cmap='Blues',linewidths=0.2) #data.corr()-->correlation matrix
    fig=plt.gcf()
    fig.set_size_inches(10,8)
    plt.show()
heatmap(gold)
heatmap(bond)
###############################                Live data to Feature engineering            ################################################

##Taking close price as our univariate variable
##For gold
gold=pd.DataFrame(gold["Close"])
gold["date"]=gold.index
gold["date"]=gold['date'].astype(str)
gold[["year", "month", "day"]] = gold["date"].str.split(pat="-", expand=True)
gold['Dates'] = gold['month'].str.cat(gold['year'], sep ="-")
gold.Dates=pd.to_datetime(gold.Dates)
gold.set_index('Dates',inplace=True)
col_sgb=pd.DataFrame(gold.groupby(gold.index).Close.mean())

##For bond
bond=pd.DataFrame(bond["Close"])
bond["date"]=bond.index
bond["date"]=bond['date'].astype(str)
bond[["year", "month", "day"]] = bond["date"].str.split(pat="-", expand=True)
bond['Dates'] = bond['month'].str.cat(bond['year'], sep ="-")
bond.Dates=pd.to_datetime(bond.Dates)
bond.set_index('Dates',inplace=True)
col_bond=pd.DataFrame(bond.groupby(bond.index).Close.mean())

col_sgb.columns = ["Avg_price"]
col_bond.columns = ["Avg_price"]

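# Count missing monthly values (printed so the check is visible when run as a script)
print(col_bond.isnull().sum())
print(col_sgb.isnull().sum())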

############################                  SQL connection with monthly data           ################################################ 
###############################                SQL database is created                  ################################################

# Connect to the database
from sqlalchemy import create_engine
engine_sgb = create_engine('sqlite:///gold_database.db', echo=False)
col_sgb.to_sql('SGB', con=engine_sgb,if_exists='replace')
df_sgb = pd.read_sql('select * from SGB',engine_sgb )

df_sgb.Dates=pd.to_datetime(df_sgb.Dates)
df_sgb.set_index('Dates',inplace=True)


engine_irfcb = create_engine('sqlite:///irfcb_database.db', echo=False)
col_bond.to_sql('IRFCB', con=engine_irfcb,if_exists='replace')
df_bond = pd.read_sql('select * from IRFCB',engine_irfcb)

df_bond.Dates=pd.to_datetime(df_bond.Dates)
df_bond.set_index('Dates',inplace=True)
###############################                SQL data to python                 ################################################



# Plotting
def plotting_bond(y):
    fig, ax = plt.subplots(figsize=(20, 6))
    ax.plot(y,marker='.', linestyle='-', linewidth=0.5, label='Monthly Average')
    ax.plot(y.resample('Y').mean(),marker='o', markersize=8, linestyle='-', label='Yearly Mean Resample')
    ax.set_ylabel('Avg_price')
    ax.legend();
plotting_bond(df_sgb)
plotting_bond(df_bond)

#univariate analysis of Average Price
df_sgb.hist(bins = 50)
df_bond.hist(bins = 50)

# Check stationarity: rolling statistics plot and the augmented Dickey-Fuller (ADF) test
def test_stationarity(timeseries):
    # Determine rolling statistics over a 12-month window
    rolmean = timeseries.rolling(12).mean()
    rolstd = timeseries.rolling(12).std()
    #Plot rolling statistics:
    fig, ax = plt.subplots(figsize=(16, 4))
    ax.plot(timeseries, label = "Original Price")
    ax.plot(rolmean, label='rolling mean');
    ax.plot(rolstd, label='rolling std');
    plt.legend(loc='best')
    plt.title('Rolling Mean and Standard Deviation - Removed Trend and Seasonality')
    plt.show(block=False)
    
    print("Results of dickey fuller test")
    adft = adfuller(timeseries,autolag='AIC')
    print('Test statistic = {:.3f}'.format(adft[0]))
    print('P-value = {:.3f}'.format(adft[1]))
    print('Critical values :')
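    for k, v in adft[4].items():
        # Keys are '1%', '5%', '10%'; a test statistic above the critical value means not stationary
        print('\t{}: {} - The data is {} stationary with {}% confidence'.format(k, v, 'not' if v < adft[0] else '', 100 - int(k[:-1])))

# ...

# Report both forecasted returns, naming the bond with the higher return first.
# x = return of SGB, y = return of IRFC Bond, t = number of months
def output_(x, y, t):
    if x > y:
        msg = "The return of SGB is {a} and the return of IRFC Bond is {b} after {c} months".format(a=x, b=y, c=t)
    else:
        msg = "The return of IRFC Bond is {a} and the return of SGB is {b} after {c} months".format(a=y, b=x, c=t)
    print(msg)
    return msg
output_(gain_sgb, gain_bond, n)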

    

Home Page (Used HTML and CSS)

(Screenshot: Home Page)

Predict Page

(Screenshot: Predict Page)

Output Page

(Screenshot: Output Page)

Project Completed --

Owner
Tishya S
Data Science aspirant