PhishGPT: Text/Email content Phishing Detector based on OpenAI API
The goal of this project is to show the capabilities of the gpt-3.5-turbo model to analyze and detect phishing content. Given the current situation caused by this type of attack, this is one of the approaches that uses the power of NLP (transformers). This is also a capstone project for Wizeline Developers Sprint.
This file contains the main Flask application responsible for handling user requests, interacting with the data processing module (`data.py`), and querying the OpenAI GPT-3.5 model through the `chat.py` module.
`app.py` serves as the entry point for the Flask web application. It handles incoming HTTP requests, processes user input, interacts with the `data` module for text handling, and utilizes the `chat` module to obtain AI-driven responses based on user input prompts.
This file depends on:
- `Flask`: Used for creating the web application and handling routes.
- `data`: Module containing functions for text processing and API handling.
- `chat`: Module responsible for interacting with the OpenAI GPT-3.5 model.
- Renders the `index.html` template when the user accesses the root URL.
- Handles POST requests sent to `/text`.
- Processes user input obtained from a form submission.
- Constructs a prompt based on the user's input.
- Queries the GPT-3.5 model using the `query()` function from the `chat` module.
- Prints the classification and likelihood obtained from the model.
- Renders the `results.html` template with relevant information obtained from the model for display to the user.
This file contains the functionality for interacting with the OpenAI API: conducting conversations and processing the responses.
`chat.py` serves as a module responsible for communication with the OpenAI API using the OpenAI Python library. It is primarily used to generate AI-driven responses based on given prompts.
This file depends on:
- `openai` library: Used to interact with OpenAI's GPT models.
- `os` module: Utilized for accessing environment variables.
The `query()` function sends a prompt to the OpenAI GPT-3.5 model and processes the generated response.
- Parameter `prompt`: The text prompt sent to the GPT-3.5 model for generating a response.
- Environment Setup: Retrieves the OpenAI API key from environment variables.
- OpenAI API Interaction: Utilizes the OpenAI Python library to create a chat completion by sending a prompt to the GPT-3.5 model.
- Response Processing: Extracts the response from the model dump and sends it to the `apiHandler` function in the `data.py` file for further processing.
- Return Values:
  - `queryResponse`: The processed response obtained from `apiHandler`.
  - `likelihood`: Likelihood score related to the response (from `apiHandler`).
  - `classification`: Classification of the response (from `apiHandler`).
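The flow described for `query()` can be sketched without touching the network. This is only an illustration, not the project's code: `send_chat` and `api_handler` are hypothetical injected callables standing in for the OpenAI call and for `apiHandler`, and the environment variable name `OPENAI_API_KEY` is an assumption. The nested `choices[0]["message"]["content"]` lookup follows the shape of a chat completion response once it has been dumped to a dictionary.

```python
import os
from typing import Callable, Dict, Tuple

Parsed = Tuple[str, str, str]


def query(
    prompt: str,
    send_chat: Callable[[str, str], Dict],    # hypothetical: (api_key, prompt) -> response dict
    api_handler: Callable[[str], Parsed],     # hypothetical stand-in for data.apiHandler
    env_var: str = "OPENAI_API_KEY",          # assumed variable name
) -> Parsed:
    # Environment setup: read the API key from the environment.
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")

    # API interaction: delegate the actual request to the injected callable.
    response_dump = send_chat(api_key, prompt)

    # Response processing: pull the assistant's text out of the dumped response.
    content = response_dump["choices"][0]["message"]["content"]

    # Hand the raw text to the parser and return its three values.
    query_response, likelihood, classification = api_handler(content)
    return query_response, likelihood, classification
```

Passing the request function in as an argument keeps the sketch runnable in tests; in the real module the call would go through the `openai` library directly.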
This file contains functions responsible for handling text manipulation and processing the API response obtained from the OpenAI GPT-3.5 model.
`data.py` serves as a module primarily focused on text handling and processing API responses to extract relevant information such as classification, likelihood scores, and details.
The `textHandler` function processes the input text by converting it to lowercase and removing extra whitespace using regular expressions.
- Parameter `text`: The input text to be processed.
- Text Processing: Converts the input text to lowercase and removes excess whitespace using regular expressions (`re.sub()`).
- Return Value: Returns the processed text.
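A minimal sketch of the text processing described above. The function name `text_handler`, the `\s+` pattern, and the final `strip()` are assumptions; the README only states that the text is lowercased and that extra whitespace is removed with `re.sub()`.

```python
import re


def text_handler(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (illustrative only)."""
    lowered = text.lower()
    # Replace any run of spaces, tabs, or newlines with a single space.
    collapsed = re.sub(r"\s+", " ", lowered)
    # Drop the leading/trailing space left over from the collapse.
    return collapsed.strip()
```

For example, `text_handler("  Click   HERE\n now ")` yields `"click here now"`.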
The `apiHandler` function extracts classification, likelihood, and details from the GPT-3.5 model's response.
- Parameter `gpt_response`: The response obtained from the GPT-3.5 model.
- Response Parsing: Splits the response into parts based on whitespace.
- Initialization: Initializes variables to store classification, likelihood, and details.
- Iteration: Iterates through the response parts to identify and extract specific information such as classification, likelihood, and details based on predefined markers ("Classification:", "Likelihood:", "Details:").
- Return Values:
  - `details`: Extracted details from the response.
  - `likelihood`: Likelihood score extracted from the response.
  - `classification`: Classification extracted from the response.
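The marker-based parsing described above could look roughly like the following. It is a sketch under assumptions rather than the project's implementation: the name `api_handler` is a stand-in for `apiHandler`, and the response is assumed to contain the three markers exactly as written, each followed by its value.

```python
from typing import Tuple

# Marker tokens named in the README, mapped to the field they introduce.
MARKERS = {
    "Classification:": "classification",
    "Likelihood:": "likelihood",
    "Details:": "details",
}


def api_handler(gpt_response: str) -> Tuple[str, str, str]:
    """Split on whitespace and collect the words that follow each marker."""
    fields = {"classification": [], "likelihood": [], "details": []}
    current = None  # field currently being filled; None until a marker is seen

    for part in gpt_response.split():
        if part in MARKERS:
            current = MARKERS[part]
        elif current is not None:
            fields[current].append(part)
        # Words that appear before the first marker are ignored.

    details = " ".join(fields["details"])
    likelihood = " ".join(fields["likelihood"])
    classification = " ".join(fields["classification"])
    return details, likelihood, classification
```

The return order (details, likelihood, classification) follows the order listed under Return Values.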
All the dependencies required to run the project.
- Flask==1.1.4
- openai==0.10.2
- Cloning the repository
- Installing dependencies (`pip install -r requirements.txt`)
- Environment setup: install Anaconda and create an environment (use any name you like).
- Once everything is set up and installed, go to your terminal, make sure the environment is activated, then type `python app.py` and press Enter.
- If the gods of coding bless you and no errors appear, you should see an IP address, something like this: `Running on http://127.0.0.1:5000`. Press CTRL+click on this address and it will open your default browser with the app running.
- Now press the button to access the text area, then paste any phishing-related text and press the Submit button. The `results.html` page will then show the results.
- This is a capstone project; in the future it will be updated with more functionality and a better UI/UX.
- Error Handling: The code does not appear to have error handling. If the OpenAI API call fails or does not return the expected format, the application might crash or behave unexpectedly.
- Input Sanitization: The `textHandler` function in `data.py` sanitizes input to some extent but does not guard against potential security risks that might arise from user input.
- Parsing Reliability: The `apiHandler` function assumes a specific response format and is prone to break if that format changes. It is based on splitting the response text and finding keywords, which is not very robust.
- Code Structure: The application lacks modularization; the functionality could be better separated into distinct components. Additionally, there is little to no in-code documentation, which makes understanding and maintaining the code difficult.
- Security: There are potential security issues with the way the API key is accessed and the lack of input validation/sanitization.
- Innovation: While the application's base functionality is not highly innovative, the idea of using GPT-3.5 for phishing detection is a novel approach.
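One possible way to address the error-handling and parsing-reliability points is a more tolerant parser that reports missing fields instead of failing. The sketch below is a suggestion, not part of the project: `parse_response` is a hypothetical name, and the case-insensitive matching and `None` defaults are design assumptions.

```python
import re
from typing import Dict, Optional

# Matches a field label, tolerating case differences and spaces around the colon.
_FIELD = re.compile(r"(classification|likelihood|details)\s*:\s*", re.IGNORECASE)


def parse_response(text: str) -> Dict[str, Optional[str]]:
    """Return each field's text, or None when the model omitted it."""
    result: Dict[str, Optional[str]] = {
        "classification": None,
        "likelihood": None,
        "details": None,
    }
    matches = list(_FIELD.finditer(text))
    for i, match in enumerate(matches):
        # A field's value runs up to the next label, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end].strip()
        key = match.group(1).lower()
        # Keep the first non-empty value seen for each field.
        if value and result[key] is None:
            result[key] = value
    return result
```

A caller can then check for `None` values and show a fallback message rather than rendering incomplete results. Note that a label word followed by a colon inside the details text would still be treated as a new field, so this reduces the fragility without removing it.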