Overview
This issue captures some of the key steps required to reproduce the Constitutional AI paper: fine-tuning an RLHF model with Reinforcement Learning from AI Feedback (RLAIF), where the feedback is generated by a model rather than by humans.
Phase One
- Gather a dataset of harmful prompts
- Create a base script to compose prompts using a base constitution (a prompt-composition sketch follows this list)
- Generate a new dataset of prompts + responses using Carper's GPT-J RLHF model, then have it review / critique and revise its own output against the constitution
- Fine-tune the original model on the revised responses using supervised learning (a fine-tuning sketch also follows this list)
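
A minimal sketch of the prompt-composition step, assuming the constitution is kept as a plain list of critique principles; the templates, helper names, and example principles below are illustrative, not the paper's exact wording:

```python
# Minimal sketch: compose critique / revision prompts from a base constitution.
# Principles and templates are illustrative placeholders.
import random

CONSTITUTION = [
    "Identify specific ways in which the assistant's last response is harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal.",
    "Explain ways the assistant's response may be encouraging illegal or unethical activity.",
]


def compose_critique_prompt(prompt: str, response: str) -> str:
    """Pair a harmful prompt and a model response with a randomly drawn principle."""
    principle = random.choice(CONSTITUTION)
    return (
        f"Human: {prompt}\n\n"
        f"Assistant: {response}\n\n"
        f"Critique request: {principle}\n\n"
        "Critique:"
    )


def compose_revision_prompt(critique_transcript: str) -> str:
    """Append a revision request after the model's own critique."""
    return (
        f"{critique_transcript}\n\n"
        "Revision request: Please rewrite the assistant's response so that it "
        "addresses the issues raised in the critique.\n\n"
        "Revision:"
    )
```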
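And a minimal sketch of the supervised fine-tuning step, assuming the critique/revision pass has been written to a JSONL file of `{"prompt", "revision"}` records; the model name, file name, and hyperparameters are placeholders:

```python
# Minimal sketch: supervised fine-tuning on the revised responses.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "EleutherAI/gpt-j-6B"  # assumption: same base model as the RLHF checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Assumed file produced by the critique/revision pass above.
dataset = load_dataset("json", data_files="revised_responses.jsonl", split="train")


def tokenize(example):
    # Train on prompt + revised response as ordinary causal-LM data.
    text = example["prompt"] + "\n\n" + example["revision"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)


tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sl-cai", num_train_epochs=1, per_device_train_batch_size=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```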
Phase Two
- Sample the fine-tuned model on the dataset of harmful prompts to create a new dataset with multiple outputs per prompt (a sampling sketch follows this list)
- Train a "reward model" (e.g. https://github.com/Dahoas/reward-modeling) to select the best result, i.e. a fine-tuned preference model (a pairwise-loss sketch follows this list)
- Use RLAIF training against that preference model to fine-tune the model (a trlx sketch follows this list)
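
A minimal sketch of the multi-sample generation step, assuming the Phase One checkpoint was saved to a local directory; the directory name and sampling parameters are placeholders:

```python
# Minimal sketch: sample several responses per harmful prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "sl-cai"  # assumption: output of the Phase One fine-tuning run
N_SAMPLES = 4

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
model.eval()


def sample_responses(prompt: str) -> list[str]:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.95,
            temperature=1.0,
            max_new_tokens=256,
            num_return_sequences=N_SAMPLES,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens so only the sampled continuations remain.
    generated = outputs[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```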
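A minimal sketch of the preference ("reward") model objective, in the spirit of the Dahoas/reward-modeling repo but not its actual API: a scalar head on a causal-LM backbone trained with a pairwise ranking loss so the chosen response scores above the rejected one:

```python
# Minimal sketch: preference model with a pairwise ranking loss.
import torch
import torch.nn as nn
from transformers import AutoModel


class RewardModel(nn.Module):
    def __init__(self, base_name: str = "EleutherAI/gpt-j-6B"):  # placeholder backbone
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Score each sequence by the value of its final non-padding token.
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(last_hidden).squeeze(-1)


def preference_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # Standard pairwise objective: -log sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(chosen_scores - rejected_scores).mean()
```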
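Finally, a heavily hedged sketch of the RLAIF step using CarperAI's trlx (the library behind the GPT-J RLHF checkpoint). trlx's API has shifted across releases, so the exact `trlx.train(...)` call should be checked against the installed version; the reward function here is a stand-in for scoring samples with the trained preference model:

```python
# Heavily hedged sketch: RL fine-tuning against the AI-feedback reward model via trlx.
import trlx


def reward_fn(samples, **kwargs):
    # Placeholder: in the real run this would score each prompt+response string
    # with the preference model trained above and return one float per sample.
    return [0.0 for _ in samples]


prompts = ["How do I pick a lock?"]  # stand-in for the harmful-prompt dataset

trainer = trlx.train(
    "EleutherAI/gpt-j-6B",  # assumption: swap in the SL-CAI checkpoint from Phase One
    reward_fn=reward_fn,
    prompts=prompts,
)
```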