Skip to content

Commit

Permalink
OpenAI API example to create instruction examples
Browse files Browse the repository at this point in the history
  • Loading branch information
rasbt committed May 25, 2024
1 parent 9c93300 commit 0c253ad
Show file tree
Hide file tree
Showing 5 changed files with 1,653 additions and 4 deletions.
19 changes: 17 additions & 2 deletions ch07/02_dataset-utilities/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,8 @@ pip install -r requirements-extra.txt




### Finding near duplicates
 
## Finding Near-duplicates

The `find-near-duplicates.py` function can be used to identify duplicates and near-duplicates in an instruction dataset. For example,

Expand All @@ -23,6 +23,7 @@ python find-near-duplicates.py --json_file instruction-examples.json
```

```
scikit-learn version: 1.3.1
==================================================
Expand Down Expand Up @@ -69,3 +70,17 @@ Duplicate pair found with similarity 1.00:
```


 
## Creating Passive Voice Entries

- The [create-passive-voice-entries.ipynb](create-passive-voice-entries.ipynb) notebook uses OpenAI's GPT-4 to create "passive voice" entries for an instruction dataset, as shown in the example below

```python
{
'instruction': 'Identify the verb in the following sentence',
'input': 'The cat sleeps on the couch.',
'output': 'The verb in the sentence is "sleeps."',
'output_2': 'The sentence is "sleeps."' # <---- Newly created entry
}
```
Loading

0 comments on commit 0c253ad

Please sign in to comment.