Worksheet Generator Feature PR #102
Changes from all commits
a9d0436
d60d37f
7c2cd78
28778b5
9ce7580
ebadf0c
0e1ae95
874fcb6
dd4aa20
da07544
7ac5a18
1dbe456
1b90333
9708ba2
be0750c
702c174
8408e3d
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Dockerfile | ||
.gitignore | ||
contribution.md | ||
diagram.png | ||
LICENSE | ||
load_env.sh | ||
local-start.sh | ||
README.md | ||
.env | ||
.pytest_cache/ | ||
.github/ | ||
app/__pycache__/ | ||
__pycache__/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# Worksheet Question Generator | ||
|
||
This project provides a framework for generating quiz and worksheet questions using a machine learning model accessed through the LangChain framework. Users supply a topic, a difficulty level, and a hint, and the system returns customized questions. Generated content is checked for format and, using cosine similarity scores, for semantic relevance, which reduces hallucinations and filters out questions whose answers or explanations do not match them. | ||
|
||
## Key Features | ||
|
||
### 1. Dual-Prompt Setup for Optimal Performance | ||
Fine-tuning and prompt engineering experiments showed that two distinct prompts give the best results: one for quiz question generation and one for worksheet question creation. Each prompt captures the specifics of its question type, which leads to higher-quality and more relevant outputs. | ||
|
||
### 2. Worksheet and Quiz Question Generation | ||
The core functionality of this tool is to generate questions based on the provided topic, level, hint, and question type (quiz or worksheet). The `WorksheetBuilder` class is responsible for this generation process. It invokes a machine learning model, configured through LangChain's VertexAI integration, to produce the customized questions. | ||
|
||
### 3. Parameters | ||
- **Topic**: The subject matter for the questions. | ||
- **Level**: The difficulty level of the questions. The prompts insert it as "a {level} degree", so an academic level such as "Masters" fits best. | ||
- **Hint**: A hint to guide the style or focus of the questions (e.g., "single sentence answer questions" or "multiple choice questions"). | ||
- **q_type**: Specifies whether to generate quiz questions (`"quiz"`) or worksheet questions (`"worksheet"`). | ||
|
||
### 4. Question Validation | ||
The generated questions undergo several layers of validation to ensure quality, relevance, and correctness. | ||
|
||
#### a. Format Validation | ||
For **quiz questions**, validation checks for essential components such as the question, multiple answer choices, the correct answer, and an explanation. For **worksheet questions**, it ensures the presence of the question, the correct answer, and an explanation. | ||
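
The format check described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the key names (`question`, `choices`, `answer`, `explanation`) are taken from the prompt templates, while `REQUIRED_KEYS` and `has_valid_format` are hypothetical names introduced here.

```python
# Hypothetical helper: required keys per question type, using the key names
# that the prompt templates ask the model to return.
REQUIRED_KEYS = {
    "quiz": {"question", "choices", "answer", "explanation"},
    "worksheet": {"question", "answer", "explanation"},
}


def has_valid_format(response: dict, q_type: str) -> bool:
    """Return True if every required key is present with a non-empty value."""
    required = REQUIRED_KEYS.get(q_type)
    if required is None:
        raise ValueError(f"Unknown question type: {q_type!r}")
    return all(response.get(key) for key in required)
```

A worksheet-style response with no `choices` key passes the worksheet check but fails the quiz check, which is the distinction the two prompts rely on.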
|
||
#### b. Cosine Similarity Validation | ||
To further ensure the relevance of the question-answer pair: | ||
- The system uses a `SentenceTransformer` model to calculate cosine similarity scores between: | ||
  - The question and its answer. | ||
  - The question and its explanation. | ||
- These scores help validate whether the generated answer and explanation are semantically aligned with the question. | ||
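
For reference, the cosine similarity between two embedding vectors can be computed as in the sketch below. It works on plain lists of floats so that it runs without dependencies; in the real pipeline the vectors would presumably come from the `SentenceTransformer` model's `encode` method. The function name and the zero-vector handling are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, so a higher score indicates that the two texts are closer in the embedding space.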
|
||
### 5. Correctness and Avoiding Hallucinations | ||
To avoid irrelevant or incorrect outputs (hallucinations) from the language model, the system implements the following approach: | ||
- **Cosine Similarity Score**: The system computes similarity scores between the question-answer pair and the question-explanation pair. | ||
- **Maximum Similarity Score**: The higher score between these two pairs is chosen as a measure of content relevance. | ||
- **Validation Threshold**: Only questions where the cosine similarity score exceeds a pre-set threshold (typically 0.6) are deemed valid and added to the final set of generated questions. | ||
|
||
This validation pipeline checks that each generated question is well formed and that its answer or explanation is semantically aligned with it, reducing the risk of irrelevant or nonsensical questions. | ||
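
The three steps in this section (score both pairs, take the maximum, compare against the threshold) can be sketched as below. The names `relevance_score` and `filter_valid` are hypothetical, and the similarity function is passed in as a parameter so the sketch does not depend on any particular embedding model; the 0.6 default is the threshold quoted above.

```python
from typing import Callable, Dict, List

# Any function that maps two strings to a similarity score.
Similarity = Callable[[str, str], float]


def relevance_score(question: Dict[str, str], similarity: Similarity) -> float:
    """Higher of the question-answer and question-explanation scores."""
    qa_score = similarity(question["question"], question["answer"])
    qe_score = similarity(question["question"], question["explanation"])
    return max(qa_score, qe_score)


def filter_valid(
    questions: List[Dict[str, str]],
    similarity: Similarity,
    threshold: float = 0.6,
) -> List[Dict[str, str]]:
    """Keep only questions whose relevance score exceeds the threshold."""
    return [q for q in questions if relevance_score(q, similarity) > threshold]
```

Because the maximum of the two scores is used, a question passes if either its answer or its explanation is close to it, which is more lenient than requiring both.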
|
||
### 6. Logging and Error Handling | ||
The system includes detailed logging and error handling mechanisms. Logs capture key stages such as question generation, validation, and any encountered errors. This makes debugging and system monitoring more efficient. | ||
|
||
## How to Use | ||
|
||
1. Instantiate the `WorksheetBuilder` class with the `db` object returned by the RAG pipeline, followed by the topic, level, hint, and question type. | ||
2. Call the method that matches the question type; each one uses its own dedicated prompt: | ||
   - For quiz generation, call the `create_questions()` method with the number of questions. | ||
   - For worksheet generation, call the `create_worksheet_questions()` method with the number of questions. | ||
3. The system generates, validates, and returns the questions as a list of dictionaries. Each dictionary contains a question, its answer, and an explanation; quiz questions also include the answer choices. | ||
|
||
## Example Usage | ||
```python
# Positional arguments: files, topic, level, hint, hint_num, num_questions
executor([ToolFile(url="https://courses.edx.org/asset-v1:ColumbiaX+CSMM.101x+1T2017+type@asset+block@AI_edx_ml_5.1intro.pdf")],
         "machine learning", "Masters", "single sentence answer questions", 5, 5)
```
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# import sys | ||
# import os | ||
from features.quizzify.tools import RAGpipeline | ||
from services.tool_registry import ToolFile | ||
from services.logger import setup_logger | ||
from features.worksheet_generator.tools import WorksheetBuilder | ||
from api.error_utilities import LoaderError, ToolExecutorError | ||
logger = setup_logger() | ||
|
||
|
||
def executor(files: list[ToolFile], topic: str, level: str, hint: str, hint_num: int, num_questions: int, verbose=True): | ||
Review comment: You can use a Pydantic schema to decouple the args. |
||
try: | ||
if verbose: logger.debug(f"Files: {files}") | ||
# Instantiate RAG pipeline with default values | ||
pipeline = RAGpipeline(verbose=verbose) | ||
Review comment: Change the name of this class, as it does not convey what it does for the Worksheet Generator. |
||
pipeline.compile() | ||
# Process the uploaded files | ||
db = pipeline(files) | ||
|
||
# Create and return the quiz questions | ||
output = WorksheetBuilder(db, topic, level, hint, "quiz").create_questions(num_questions) | ||
Review comment: Instantiate your WorksheetBuilder class first so that the same variable can be reused, which improves readability. |
||
output.extend(WorksheetBuilder(db, topic, level, hint, "worksheet").create_worksheet_questions(hint_num)) | ||
|
||
# Try-Except blocks on custom defined exceptions to provide better logging | ||
except LoaderError as e: | ||
error_message = e | ||
logger.error(f"Error in RAGPipeline -> {error_message}") | ||
raise ToolExecutorError(error_message) | ||
|
||
# These help differentiate user-input errors and internal errors. Use 4XX and 5XX status respectively. | ||
except Exception as e: | ||
error_message = f"Error in executor: {e}" | ||
logger.error(error_message) | ||
raise ValueError(error_message) | ||
|
||
return output |
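
The reviewer's suggestion to decouple the `executor` arguments into a schema could look roughly like the sketch below. It uses a standard-library dataclass as a dependency-free stand-in for the Pydantic schema the reviewer mentions; a Pydantic `BaseModel` with the same fields would fill the same role. The class name `WorksheetGeneratorArgs` and the validation rules (non-empty strings, positive counts) are assumptions, not part of the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorksheetGeneratorArgs:
    """Hypothetical container for the non-file arguments of executor()."""

    topic: str
    level: str
    hint: str
    hint_num: int        # number of worksheet questions
    num_questions: int   # number of quiz questions

    def __post_init__(self) -> None:
        # Assumed rule: text fields must not be blank.
        for name in ("topic", "level", "hint"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be a non-empty string")
        # Assumed rule: question counts must be positive.
        for name in ("hint_num", "num_questions"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")
```

With such a schema, `executor` could accept the files plus a single arguments object, and invalid input would be rejected before the RAG pipeline is started.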
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
{ | ||
"inputs": [ | ||
{ | ||
"label": "Topic", | ||
"name": "topic", | ||
"type": "text" | ||
}, | ||
{ | ||
"label": "Level", | ||
"name": "level", | ||
"type": "text" | ||
}, | ||
{ | ||
"label": "Hint", | ||
"name": "hint", | ||
"type": "text" | ||
}, | ||
{ | ||
"label": "Number of Questions", | ||
"name": "num_questions", | ||
"type": "number" | ||
}, | ||
{ | ||
"label": "Upload PDF files", | ||
"name": "files", | ||
"type": "file" | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
You are a subject matter expert on the topic: | ||
{topic} | ||
|
||
You have to generate {q_type} type questions on the topic based on academic qualification level of a {level} degree | ||
|
||
Follow these instructions if you are creating a quiz question: | ||
1. Generate a question based on the topic provided and context as key "question" | ||
2. Provide 4 multiple choice answers to the question as a list of key-value pairs "choices" | ||
3. Provide the correct answer for the question from the list of answers as key "answer" | ||
4. Provide an explanation as to why the answer is correct as key "explanation" | ||
|
||
You must respond as a JSON object: | ||
{format_instructions} | ||
|
||
Context: | ||
{context} |
Review comment: What's the difference between this file and the other |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
You are a subject matter expert on the topic: | ||
{topic} | ||
|
||
You have to generate {q_type} type questions on the topic based on academic qualification level of a {level} degree | ||
|
||
Follow these instructions if you are creating worksheet type questions: | ||
1. Generate a question based on the topic provided, which should follow the constraint "{hint}" and should be value of key "question" | ||
2. There should be no answer choices for this type of question | ||
3. Provide the correct answer to the question following the constraint {hint} as a string value of key "answer" | ||
4. Provide an explanation as to why the answer is correct as key "explanation" | ||
|
||
You must respond as a JSON object: | ||
{format_instructions} | ||
|
||
Context: | ||
{context} |
Review comment: Good README.md. Consider adding the request and response interfaces so the dev team knows what to expect when sending requests to the AI endpoint.
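
One possible shape for the request and response interfaces the reviewer asks for is sketched below. The request fields mirror the inputs listed in the metadata JSON, and the response fields mirror the keys requested in the prompt templates. The type names, the use of URL strings for files, and the placeholder values are all assumptions. Note that `hint_num` is an `executor` parameter but is not listed in the metadata inputs, so it is left out here.

```python
from typing import Dict, List, TypedDict


class WorksheetRequest(TypedDict):
    """Hypothetical request body, one field per input in the metadata JSON."""

    topic: str
    level: str
    hint: str
    num_questions: int
    files: List[str]  # assumed to be file URLs


class _QuestionBase(TypedDict):
    question: str
    answer: str
    explanation: str


class GeneratedQuestion(_QuestionBase, total=False):
    """Hypothetical response item; 'choices' is present only for quiz questions."""

    choices: List[Dict[str, str]]


# Placeholder values for illustration only, not real model output.
example_request: WorksheetRequest = {
    "topic": "machine learning",
    "level": "Masters",
    "hint": "single sentence answer questions",
    "num_questions": 5,
    "files": ["<file url>"],
}
example_response: List[GeneratedQuestion] = [
    {"question": "<question text>", "answer": "<answer text>", "explanation": "<explanation text>"},
]
```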