Stock Analysis with Machine Learning Documentation

The code performs stock analysis using machine learning techniques. It uses historical stock data to train a random forest classifier that predicts whether the price will rise or fall over a fixed window, and it also includes functionality to perform the same analysis with Apache Spark's machine learning library.

Let's understand the code and its components in detail:

1. Importing Required Libraries: The code starts by importing the necessary libraries and modules, including yfinance for fetching stock data, datetime for handling dates, pandas for data manipulation, and various modules from sklearn and pyspark for machine learning tasks.
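A minimal sketch of the imports implied by this description (the exact modules used in the repository may differ slightly):

```python
# Representative imports implied by the description; the repository's
# actual import list may differ.
import time
from datetime import datetime, timedelta

import pandas as pd
import yfinance as yf
from finta import TA

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
```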

2. Ticker Class: The Ticker class encapsulates the functionality related to stock analysis. It is responsible for fetching historical data, generating technical indicators, creating predictions, splitting the data, and training the machine learning model.

Properties:

NUM_DAYS: the number of days of historical data to retrieve.

INTERVAL: the sampling interval of the historical data.

INDICATORS: a list of symbols naming the technical indicators to compute.

Methods:

__init__(self, symbol): The constructor initializes the Ticker object with the given stock symbol and calls the _get_historical_data() method.
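A sketch of the class skeleton, assuming the constants listed under Properties are class attributes; the values and indicator names below are illustrative, not the repository's actual settings:

```python
class Ticker:
    # Illustrative values; the repository may use different settings.
    NUM_DAYS = 365            # days of history to retrieve
    INTERVAL = "1d"           # sampling interval of the historical data
    INDICATORS = ["RSI", "ATR", "MOM", "EMA"]

    def __init__(self, symbol):
        self.symbol = symbol
        self.data = None
        self._get_historical_data()   # fetch history immediately, as described

    def _get_historical_data(self):
        # See the yfinance sketch below.
        ...
```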

_get_historical_data(self): This method uses the yfinance API to fetch the historical stock data for the specified number of days. The data is stored in the data attribute.
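A standalone sketch of what this step looks like with yfinance; the date arithmetic and the interval value are assumptions:

```python
from datetime import datetime, timedelta

import yfinance as yf


def get_historical_data(symbol: str, num_days: int = 365, interval: str = "1d"):
    """Fetch num_days of OHLCV history for `symbol` via yfinance."""
    end = datetime.today()
    start = end - timedelta(days=num_days)
    return yf.download(symbol, start=start, end=end,
                       interval=interval, progress=False)
```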

_get_indicator_data(self): This method uses the finta API to calculate the technical indicators specified in the INDICATORS list. The calculated indicators are merged with the existing stock data.
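A standalone sketch of the indicator step using finta's TA module; finta expects lowercase OHLCV column names, and the indicator list here is illustrative:

```python
import pandas as pd
from finta import TA


def add_indicators(data: pd.DataFrame, indicators=("RSI", "ATR", "MOM", "EMA")):
    """Compute each finta indicator and merge it into the price DataFrame."""
    # Normalise column names so finta finds open/high/low/close/volume.
    df = data.rename(columns=str.lower)
    for name in indicators:
        # Look up the indicator function by name, e.g. TA.RSI, TA.ATR.
        df[name] = getattr(TA, name)(df)
    return df
```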

_produce_prediction(self, window=10): This method generates the "truth" values for the classification problem. It compares the current stock price with the price window days ahead to determine if the price increased (1) or decreased (0). The predictions are stored in the pred column of the data.
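A standalone sketch of the labelling logic, assuming the price column is named close and the label column pred, as described:

```python
import pandas as pd


def produce_prediction(data: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Label each row 1 if the close is higher `window` days ahead, else 0."""
    df = data.copy()
    df["pred"] = (df["close"].shift(-window) > df["close"]).astype(int)
    # The last `window` rows have no future price to compare against.
    return df.iloc[:-window]
```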

_produce_data(self, window): This is the main data processing method that calls other methods to smooth the data, calculate technical indicators, and create predictions based on a specified window. It also removes unnecessary columns from the data.
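A sketch of how these steps might be chained, reusing the helper sketches above; the exponential-smoothing step and the exact columns dropped are assumptions, since the description does not specify them:

```python
def produce_data(data, window=10, smoothing=0.9):
    """End-to-end data preparation: smooth prices, add indicators, label."""
    df = data.rename(columns=str.lower)
    price_cols = ["open", "high", "low", "close"]
    # Assumed smoothing step; the repository may smooth differently.
    df[price_cols] = df[price_cols].ewm(alpha=1 - smoothing).mean()
    df = add_indicators(df)              # sketch above
    df = produce_prediction(df, window)  # sketch above
    # Drop raw columns not used as model features (illustrative choice).
    return df.drop(columns=["open", "high", "low", "volume"], errors="ignore")
```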

_split_data(self): This method splits the data into training and testing sets. It assigns the target variable (pred) to y and selects the feature columns for X.
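A standalone sketch using scikit-learn's train_test_split; keeping the chronological order of the series (shuffle=False) and the 80/20 split are assumptions:

```python
from sklearn.model_selection import train_test_split


def split_data(df, test_size=0.2):
    """Separate the features from the `pred` label and split train/test."""
    y = df["pred"]
    X = df.drop(columns=["pred"])
    return train_test_split(X, y, test_size=test_size, shuffle=False)
```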

_train_random_forest(self): This method trains a random forest classifier using the RandomForestClassifier from sklearn. It fits the classifier on the training data, makes predictions on the testing data, and prints the classification report, confusion matrix, and feature importances.
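A standalone sketch of the training and reporting step; the hyperparameters are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix


def train_random_forest(X_train, X_test, y_train, y_test):
    """Fit a random forest and report its test-set performance."""
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    print(classification_report(y_test, preds))
    print(confusion_matrix(y_test, preds))
    # Rank the features by how much the forest relied on them.
    ranked = sorted(zip(X_train.columns, clf.feature_importances_),
                    key=lambda p: p[1], reverse=True)
    for name, importance in ranked:
        print(f"{name}: {importance:.3f}")
    return clf
```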

_data_clean(self, x=15): This method is responsible for cleaning and preparing the data for analysis. It calls the necessary data processing methods and splits the data. It also measures the time taken for data cleaning.

_model(self): This method trains the random forest classifier and prints the time taken for training.
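Both wrappers follow the same pattern: run a step and print how long it took. A small helper illustrating that pattern (the helper name is hypothetical, not part of the repository):

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run a step and print its duration, mirroring _data_clean()/_model()."""
    start = time.time()
    result = fn(*args, **kwargs)
    print(f"{label} took {time.time() - start:.2f}s")
    return result

# e.g. prepared = timed("Data cleaning", produce_data, raw_data, 10)
```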

_spark_rf(self): This method performs stock analysis using Spark's machine learning library. It creates a Spark DataFrame from the data, defines the features, splits the data into training and testing sets, assembles the features using VectorAssembler, and creates a pipeline with a Gradient Boosting Tree (GBT) classifier. It fits the pipeline on the training data, makes predictions on the testing data, and evaluates the accuracy.
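A sketch of the Spark pipeline described here; the 80/20 split, the seed, and the default GBT settings are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator


def spark_gbt(pandas_df, feature_cols, label_col="pred"):
    """Train and evaluate a GBT classifier on the prepared data with Spark."""
    spark = SparkSession.builder.appName("StockMarketML").getOrCreate()
    sdf = spark.createDataFrame(pandas_df)

    train, test = sdf.randomSplit([0.8, 0.2], seed=42)

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    gbt = GBTClassifier(labelCol=label_col, featuresCol="features")
    pipeline = Pipeline(stages=[assembler, gbt])

    model = pipeline.fit(train)
    predictions = model.transform(test)

    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="accuracy")
    print("Accuracy:", evaluator.evaluate(predictions))
    return model
```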

3. Code Execution

After defining the Ticker class and its methods, the code creates an instance with the stock symbol 'SPY' and calls its _data_clean() and _spark_rf() methods to perform the analysis and print the results.
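A minimal usage sketch, assuming the method names described above:

```python
# Mirrors the execution flow described above.
ticker = Ticker("SPY")
ticker._data_clean()   # fetch, clean, label, and split the data
ticker._spark_rf()     # run the Spark GBT pipeline and print its accuracy
```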


Note that the code is a simplified version and may require additional modifications to run successfully. It also assumes that the necessary libraries and dependencies (yfinance, finta, pandas, scikit-learn, and pyspark) are installed in the environment.
