I coded up something that scrapes Federal Reserve data for machine learning (multiple and logistic regression).
I call it SemanticFilter. It takes keywords, scrapes the matching Federal Reserve series, and dumps a joined list with the NA values backfilled. It's growing in capability: I just started implementing multiple regression analysis, and sliding windows are almost done. My goal is to grow this into a financial flag/alert generator using binary logistic regression. I can use the Alpha Vantage API to pull in other markets.
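For anyone wondering what the join-and-backfill step might look like, here is a minimal sketch in plain Python. It is not SemanticFilter's actual code: the function name `join_and_backfill`, the series names, and the sample values are all made up for illustration, and dates are assumed to be ISO strings so they sort chronologically.

```python
def join_and_backfill(series_by_name):
    """Outer-join several {date: value} series on date, then backfill gaps.

    Hypothetical helper, not taken from SemanticFilter.
    """
    # Union of every date seen in any series (ISO strings sort chronologically).
    dates = sorted({d for s in series_by_name.values() for d in s})
    names = list(series_by_name)
    # One row per date; a series with no observation on that date gets None.
    rows = [{"date": d, **{n: series_by_name[n].get(d) for n in names}} for d in dates]
    # Backfill: walk from newest to oldest, copying the next known value backward.
    for n in names:
        next_known = None
        for row in reversed(rows):
            if row[n] is None:
                row[n] = next_known
            else:
                next_known = row[n]
    return rows


# Made-up sample data: one monthly series and one quarterly series.
monthly = {"2019-01-01": 1.0, "2019-02-01": 1.1, "2019-03-01": 1.2, "2019-04-01": 1.3}
quarterly = {"2019-01-01": 100.0, "2019-04-01": 104.0}

table = join_and_backfill({"monthly": monthly, "quarterly": quarterly})
for row in table:
    print(row)
```

One thing to keep in mind with this approach: backfilling copies a later observation into earlier rows, so a model trained on the result can end up seeing information that was not available on that date.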
Sample dataset (gold). These predictors predict gold's up/down movement roughly 80% of the time (I use confusion matrices to get the scores).
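As a rough illustration of scoring up/down calls with a confusion matrix, here is a small sketch. The labels below are invented (1 = up, 0 = down) and are not the gold results; the function names are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = up, 0 = down)."""
    cm = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            cm["tp"] += 1
        elif a == 0 and p == 0:
            cm["tn"] += 1
        elif a == 0 and p == 1:
            cm["fp"] += 1
        else:
            cm["fn"] += 1
    return cm


def accuracy(cm):
    """Share of all predictions that were correct."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Invented example labels, not real market data.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```

Accuracy alone can flatter a model when one direction dominates the sample, so it is worth looking at the false positive and false negative counts as well.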
I'm halfway through a master's in Data Science at California State University and I'm learning a lot of neat, applicable ideas.
The next two things I'm working on are testing for collinearity (tbh I've been doing this post hoc by just eyeballing the graphs) and comparing the influence of [grouped] standardized coefficients (using SPSS). However, I've read that the effects of collinearity are reduced once Principal Component Analysis is applied, which I plan to do after I implement binary logistic regression.
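A simple step up from eyeballing graphs is a pairwise correlation screen. The sketch below is only one possible approach: the predictor names and values are invented, and the 0.8 cutoff is a common rule of thumb rather than a fixed standard.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Invented predictors: "rate_x2" is almost an exact multiple of "rate".
predictors = {
    "rate": [1, 2, 3, 4, 5],
    "rate_x2": [2, 4, 6, 8, 10.5],
    "noise": [3, 1, 4, 1, 5],
}

flagged = collinear_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

A pairwise screen only catches collinearity between two predictors at a time. A predictor that is a combination of several others can slip through, which is what variance inflation factors are meant to catch.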
Here's a picture of a multiple linear regression model applied to gold, with an adjusted R² of .933
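For reference, adjusted R² is the plain R² penalized for the number of predictors: 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch follows; the inputs in the example call are hypothetical, not the values behind the gold model.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Hypothetical inputs for illustration only.
example = adjusted_r2(r2=0.94, n_obs=100, n_predictors=5)
print(round(example, 4))
```

Because the penalty grows with p, adding a predictor that explains almost nothing can lower adjusted R² even though plain R² never goes down.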
This is the third financial market I've tried to figure out. I started with the 20 Shiller index with some limited success: the best predictor was a 3-quarter linear regression of the dependent variable, though I could sometimes get closer with multiple regression analysis, and I expect I would get closer up/down predictions using binary logistic regression. I have yet to test binary logistic regression because of my novice skills with R, but I know how to derive the values manually if I need to.
submitted by /u/Thistleknot