
Reproduce Constitutional AI Steps #2

@Mistobaan


Overview

This issue captures the key steps required to reproduce the Constitutional AI paper: fine-tuning an RLHF-style model using feedback generated by the model itself (RLAIF) instead of human labels.

Phase One


  • Gather a dataset of harmful prompts
  • Create a base script that composes critique / revision prompts from a base constitution
  • Generate a new dataset of prompts + responses using Carper's GPT-J RLHF model to review / critique and revise its own outputs (see the sketch after this list)
  • Fine-tune the original model on the revised responses using supervised learning
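
A minimal sketch of the critique → revision loop, assuming a Hugging Face causal LM as a stand-in for Carper's GPT-J RLHF checkpoint; the model name, constitutional principle, and prompt templates below are illustrative assumptions, not the exact ones from the paper or the Carper codebase:

```python
# Phase One sketch: initial response -> self-critique -> revision.
# Model name, principle text, and templates are placeholders / assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-j-6B"  # placeholder for Carper's GPT-J RLHF checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# One principle drawn from a constitution; a real run would sample from the full set.
PRINCIPLE = (
    "Identify specific ways in which the assistant's last response is harmful, "
    "unethical, or otherwise objectionable."
)
REVISION_REQUEST = (
    "Please rewrite the assistant's response to remove the harmful content "
    "while remaining helpful."
)

def generate(text: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=True, top_p=0.95)
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

def critique_and_revise(harmful_prompt: str) -> dict:
    # 1. Initial (possibly harmful) response from the model.
    response = generate(f"Human: {harmful_prompt}\n\nAssistant:")
    # 2. Self-critique against the constitutional principle.
    critique = generate(
        f"Human: {harmful_prompt}\n\nAssistant: {response}\n\n"
        f"Critique request: {PRINCIPLE}\n\nCritique:"
    )
    # 3. Revision conditioned on the critique.
    revision = generate(
        f"Human: {harmful_prompt}\n\nAssistant: {response}\n\n"
        f"Critique: {critique}\n\n{REVISION_REQUEST}\n\nRevision:"
    )
    # The (prompt, revision) pairs become the supervised fine-tuning dataset.
    return {"prompt": harmful_prompt, "response": response,
            "critique": critique, "revision": revision}
```

The last step of the phase is then ordinary supervised fine-tuning on the collected (prompt, revision) pairs.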

Phase Two


  • Sample the fine-tuned model on the dataset of harmful prompts to create a new dataset with multiple outputs per prompt
  • Train a "reward model" (e.g. https://github.com/Dahoas/reward-modeling) that selects the best response, i.e. a fine-tuned preference model (see the sketch after this list)
  • Use RLAIF training against that preference model to fine-tune the RLHF model
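
A minimal sketch of the pairwise preference objective such a reward model typically uses, assuming a PyTorch value head on top of a Hugging Face backbone; the model name and data layout are assumptions here, and the linked Dahoas/reward-modeling repo provides its own trainer:

```python
# Phase Two sketch: score sequences with a value head and train with a
# Bradley-Terry style pairwise loss (chosen response should outscore rejected).
import torch
import torch.nn as nn
from transformers import AutoModel

class RewardModel(nn.Module):
    def __init__(self, base_name: str = "EleutherAI/gpt-j-6B"):  # placeholder backbone
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Score each sequence by applying the value head to its final non-padding token.
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(last_hidden).squeeze(-1)

def preference_loss(chosen_scores: torch.Tensor,
                    rejected_scores: torch.Tensor) -> torch.Tensor:
    # Maximize the margin between the preferred and rejected responses.
    return -torch.nn.functional.logsigmoid(chosen_scores - rejected_scores).mean()
```

The trained preference model then supplies the reward signal for the RL step (e.g. PPO, as implemented in CarperAI's trlX) to fine-tune the policy on AI feedback.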
