Worksheet Generator Feature PR #102

Closed
wants to merge 17 commits into from

Conversation

@vash02 vash02 commented Sep 4, 2024

Worksheet Question Generator

This project enables the generation of quiz and worksheet questions using a machine learning model, integrated with LangChain. Users can provide topics, difficulty levels, and hints to generate customized questions, with robust validation processes to ensure accuracy and relevance.

Key Features

Dual-Prompt Setup

Based on experimentation, using two distinct prompts (one for quiz questions and another for worksheet questions) gave the best results. This dual-prompt approach improves the quality and relevance of the generated content.

Question Generation

The WorksheetBuilder class supports the generation of both quiz and worksheet questions based on:

  • Topic: Subject of the questions.
  • Level: Difficulty (beginner, intermediate, advanced).
  • Hint: Question style (e.g., "single sentence" or "multiple choice").
  • q_type: Specifies quiz or worksheet.

Validation Mechanism

  • Format Validation: Ensures quiz questions contain choices, answers, and explanations; worksheet questions contain a question, answer, and explanation.
  • Cosine Similarity Validation: Checks question-answer relevance with cosine similarity scores and rejects responses that fall below a similarity threshold, which filters out likely hallucinations.
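
The two checks above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's implementation: the function names, the required-key sets, the bag-of-words vectors used in place of real embeddings, the choice to compare the question against the answer plus explanation, and the 0.2 threshold are all assumptions, since the description does not state which embeddings or threshold are used.

```python
import math
import re
from collections import Counter

# Assumed key sets, based on the fields named in the description above.
QUIZ_KEYS = {"question", "choices", "answer", "explanation"}
WORKSHEET_KEYS = {"question", "answer", "explanation"}


def has_required_keys(item: dict, q_type: str) -> bool:
    """Format check: every required field is present and non-empty."""
    required = QUIZ_KEYS if q_type == "quiz" else WORKSHEET_KEYS
    return all(item.get(key) for key in required)


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(item: dict, threshold: float = 0.2) -> bool:
    """Relevance check: question vs. answer + explanation (assumed pairing)."""
    question_vec = bag_of_words(item["question"])
    response_vec = bag_of_words(f"{item['answer']} {item['explanation']}")
    return cosine_similarity(question_vec, response_vec) >= threshold
```

A response would be kept only if both `has_required_keys` and `is_relevant` return True; with real embeddings the threshold would need to be tuned on actual model output.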

Logging & Error Handling

Detailed logs at key stages help trace execution and debug issues.

How to Use

  1. Instantiate the WorksheetBuilder class with the required parameters.
  2. Use create_questions() for quiz questions or create_worksheet_questions() for worksheet questions.
  3. The validated questions are returned as dictionaries containing the question, answer, and explanation.

Example

executor([ToolFile(url="https://example.com/resource.pdf")],
         "machine learning", "Masters", "single sentence answer questions", 5, 5)

Contributor

@hash2004 hash2004 left a comment

Please add a description to your PR as well. You can use this pull request as inspiration.

app/utils/auth.py Outdated Show resolved Hide resolved
app/utils/auth.py Outdated Show resolved Hide resolved
@AaronSosaRamos AaronSosaRamos changed the base branch from main to Develop November 17, 2024 15:06
@AaronSosaRamos AaronSosaRamos self-requested a review November 17, 2024 15:06
Contributor

@AaronSosaRamos AaronSosaRamos left a comment

Please address the given comments.

- For worksheet generation, use the `create_worksheet_questions()` method.
3. The system will generate, validate, and return the questions as a list of dictionaries. Each dictionary contains a question, its answer, and an explanation.

## Example Usage
Contributor

Good README.md. Consider adding the request and response interfaces so the dev team knows what to expect when sending requests to the AI endpoint.
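
One possible shape for those interfaces, as a sketch only. The request fields mirror the `executor` arguments in this PR and the response fields follow the README's description (question, answer, explanation, plus choices for quiz items). The type names `WorksheetRequest`, `QuestionItem`, and `WorksheetResponse`, the `questions` wrapper key, and the use of plain URL strings for `files` are assumptions, not something the PR defines.

```python
import json
from typing import TypedDict


class WorksheetRequest(TypedDict):
    """Hypothetical request body; fields mirror the executor() arguments."""
    files: list[str]  # assumed: file URLs
    topic: str
    level: str
    hint: str
    hint_num: int
    num_questions: int


class QuestionItem(TypedDict, total=False):
    """One generated question, as described in the README."""
    question: str
    answer: str
    explanation: str
    choices: list[str]  # assumed: present for quiz items only


class WorksheetResponse(TypedDict):
    """Hypothetical response body: a list of generated questions."""
    questions: list[QuestionItem]


example_request: WorksheetRequest = {
    "files": ["https://example.com/resource.pdf"],
    "topic": "machine learning",
    "level": "Masters",
    "hint": "single sentence answer questions",
    "hint_num": 5,
    "num_questions": 5,
}

example_response: WorksheetResponse = {
    "questions": [
        {
            "question": "<generated question>",
            "answer": "<generated answer>",
            "explanation": "<generated explanation>",
        }
    ]
}

# Both payloads are plain dicts, so they serialize directly to JSON.
request_json = json.dumps(example_request, indent=2)
response_json = json.dumps(example_response, indent=2)
```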

logger = setup_logger()


def executor(files: list[ToolFile], topic: str, level: str, hint: str, hint_num: int, num_questions: int, verbose=True):
Contributor

You can use a Pydantic schema for decoupling the args.
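
A rough sketch of that decoupling. To keep the example dependency-free it uses a standard-library dataclass as a stand-in; the same fields could be declared on a Pydantic `BaseModel` instead. `WorksheetArgs` and the simplified `executor` body are hypothetical, and `files` is shown as a list of URL strings rather than `ToolFile` objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorksheetArgs:
    """Hypothetical bundle of the executor() arguments from this PR."""
    files: list[str]  # assumed: URLs standing in for ToolFile objects
    topic: str
    level: str
    hint: str
    hint_num: int
    num_questions: int
    verbose: bool = True

    def __post_init__(self) -> None:
        # Basic sanity checks; a Pydantic model would express these as field constraints.
        if not self.files:
            raise ValueError("at least one file is required")
        if self.hint_num < 0 or self.num_questions < 0:
            raise ValueError("question counts must be non-negative")


def executor(args: WorksheetArgs) -> dict:
    """Placeholder body: only shows how the single args object is consumed."""
    return {"topic": args.topic, "requested": args.num_questions + args.hint_num}
```

With this shape the endpoint passes one validated object around, so adding a field later does not change the `executor` signature.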

    try:
        if verbose: logger.debug(f"Files: {files}")
        # Instantiate RAG pipeline with default values
        pipeline = RAGpipeline(verbose=verbose)
Contributor

Change the name of this class; the current name does not convey what it does for the Worksheet Generator.

        db = pipeline(files)

        # Create and return the quiz questions
        output = WorksheetBuilder(db, topic, level, hint, "quiz").create_questions(num_questions)
Contributor

Instantiate your WorksheetBuilder class first so that the same variable can be reused; this also improves readability.
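
For illustration, the suggested shape might look like this. The `WorksheetBuilder` below is a stub so the snippet runs on its own; the constructor arguments and method names are taken from this PR, but the stub bodies, the argument passed to `create_worksheet_questions`, and the `build_outputs` helper are hypothetical. Whether one instance can really serve both calls depends on how the real class uses its `q_type` argument.

```python
class WorksheetBuilder:
    """Stub standing in for the PR's class; real generation is omitted."""

    def __init__(self, db, topic, level, hint, q_type):
        self.db = db
        self.topic = topic
        self.level = level
        self.hint = hint
        self.q_type = q_type

    def create_questions(self, num_questions):
        return [{"question": f"{self.topic} quiz question {i + 1}"} for i in range(num_questions)]

    def create_worksheet_questions(self, hint_num):
        return [{"question": f"{self.topic} worksheet question {i + 1}"} for i in range(hint_num)]


def build_outputs(db, topic, level, hint, hint_num, num_questions):
    # Instantiate once, then reuse the named builder for both calls.
    builder = WorksheetBuilder(db, topic, level, hint, "quiz")
    quiz_questions = builder.create_questions(num_questions)
    worksheet_questions = builder.create_worksheet_questions(hint_num)
    return {"quiz": quiz_questions, "worksheet": worksheet_questions}
```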

Contributor

What's the difference between this file and the other worksheet-generator-quiz-prompt.txt? If they are different, please make sure to use different names for better readability.

attempts = 0
max_attempts = num_questions * 5 # Allow for more attempts to generate questions
print("len of gen qs", len(generated_questions))
while len(generated_questions) < num_questions and attempts < max_attempts:
Contributor

Make sure this while loop is not too costly: it can run up to num_questions * 5 generation attempts.
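
One way to keep the loop's cost visible is to make the attempt budget explicit and report how many calls were actually spent. The sketch below is an illustration under assumptions, not the PR's code: `generate_one` and `is_valid` are hypothetical callables standing in for the model call and the validation step, and the default factor of 5 mirrors `max_attempts = num_questions * 5` from the diff.

```python
from typing import Callable, Optional


def generate_with_budget(
    generate_one: Callable[[], Optional[dict]],
    is_valid: Callable[[dict], bool],
    num_questions: int,
    attempts_per_question: int = 5,
) -> tuple[list[dict], int]:
    """Collect up to num_questions valid items without exceeding the attempt budget."""
    max_attempts = num_questions * attempts_per_question
    generated: list[dict] = []
    attempts = 0
    while len(generated) < num_questions and attempts < max_attempts:
        attempts += 1  # every call costs tokens, whether or not the result is kept
        candidate = generate_one()
        if candidate is not None and is_valid(candidate):
            generated.append(candidate)
    return generated, attempts
```

Returning the attempt count makes it easy to log the real cost per request; the worst case is `num_questions * attempts_per_question` model calls.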


# Return the list of questions
return generated_questions[:num_questions]
def validate_qtype_response(self, hint: str, answer) -> bool:
Contributor

This is not required, as the JsonOutputParser already supports this.

except ValueError:
    return False

def is_response_relevant(self, response: dict) -> bool:
Contributor

This proposal is interesting but costly in terms of production resources. Google already provides metrics for this kind of semantic search with their embeddings.

max_attempts = hint_num * 5

print("len of gen qs", len(generated_questions))
while len(generated_questions) < hint_num and attempts < max_attempts:
Contributor

Ensure that this while loop is not costly in terms of token generation.

if os.environ['ENV_TYPE'] == "production":
    set_key = access_secret_file("backend-access")
else:
    set_key = "dev"

if api_key is None:
Contributor

Please remove this, as it affects the authentication of the AI endpoint.

@AaronSosaRamos AaronSosaRamos self-assigned this Nov 17, 2024
@AaronSosaRamos AaronSosaRamos added the type:enhancement (For minor updates or changes that improve an existing feature or process.), TOOL (This is a tool that is currently being worked on), and Worksheet Generator (For the Worksheet Generator tool) labels Nov 17, 2024
@Ahmedr275 Ahmedr275 linked an issue Nov 19, 2024 that may be closed by this pull request
32 tasks
@AaronSosaRamos
Contributor

Closed due to inactivity

4 participants