This repository contains the source code for the complete workshop. You will follow the step-by-step guide below, completing all the steps while working with data and GitHub Copilot within Codespaces.
Note
This repo is intended to give an introduction to various GitHub Copilot features, such as Copilot Chat and inline chat. Hence the step-by-step guides below contain the general description of what needs to be done, and Copilot Chat or inline chat can support you in generating the necessary commands.
Each step (where applicable) also contains a Cheatsheet
which can be used to validate the Copilot suggestion(s) against the correct command.
💡 Play around with different prompts and see how it affects the accuracy of the GitHub Copilot suggestions. For example, when using inline chat, you can use an additional prompt to refine the response without having to rewrite the whole prompt.
In this workshop, you will be working with data from a CSV file included in this repository, as well as a Python script file that will interact with the CSV file. Here are some features of the project you will work with:
- Consume a CSV dataset and perform transformations
- Identify and implement validations
- Create a command-line tool that can be used in CI/CD environments
This repository is Codespaces-ready and is pre-configured so that you have all dependencies installed including the Visual Studio Code extensions necessary to work with GitHub Copilot and Python:
- GitHub Copilot
- Python extension
- Pre-installed Python dependencies with an activated Virtual Environment
Note
If using this repository in your account or a non-GitHub-Universe organization, you might incur in charges or consumption of your free quoota for Codespaces.
Progress: [🟢⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪] 1/12 (8%)
⏳ ~2min
- Click
Use this template
👉Create a new repository
- Set the owner to the GitHub Workshop organization:
githubuniverseworkshop
- Give it a name
- Set visibility to
Private
- Click
Create repository
Progress: [🟢🟢⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪] 2/12 (16%)
⏳ ~3min
-
In the newly created repo, click
Code
👉Codespaces
👉[ellipsis menu]
👉New with options
👉 Ensure thatDev container configuration
is set toDefault project configuration
👉Create codespace
-
❗If you're having problems launching the Codespace then you can also clone the repo and continue from here in your IDE:
git clone https://github.com/<YOUR_NAME_SPACE>/<YOUR_REPO_NAME>.git cd <YOUR_REPO_NAME>
📝 Note: There is no need to push changes back to the repo during the workshop
Progress: [🟢🟢🟢⚪⚪⚪⚪⚪⚪⚪⚪⚪] 3/12 (25%)
⏳ ~2min
- Use the command palette to toggle the terminal (search for "Create new terminal")
- Run
which python
and make sure it points to the Virtual Environment (home/vscode/venv/bin/python
) - Run
which pip
and ensure that it also points to the Virtual Environment (home/vscode/venv/bin/pip
)
Progress: [🟢🟢🟢🟢⚪⚪⚪⚪⚪⚪⚪⚪] 4/12 (33%)
⏳ ~2min
-
Run the
main.py
script and confirm no errors occur:python main.py
-
Run the
check.py
script and confirm no errors occur:python check.py
There should be some OK and some FAIL lines:
[OK ] verify_drop_notes [FAIL] verify_high_ratings - Not all ratings are 90 or higher. [FAIL] verify_one_hot_encoding - The 'Red Wine' column was not one-hot encoded correctly. [OK ] verify_remove_newlines_carriage_returns [FAIL] verify_ratings_to_int - The 'ratings' column was not converted to integers correctly.
Progress: [🟢🟢🟢🟢🟢⚪⚪⚪⚪⚪⚪⚪] 5/12 (41%)
⏳ ~2min
GitHub Copilot benefits from having context. One way to enhance context is by opening relevant files.
- Open the
main.py
,check.py
,train.csv
, andtransformed_train.csv
files
Progress: [🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪⚪⚪] 6/12 (50%) - Half way through!
⏳ ~5min
- Open GitHub Copilot Chat
- Use the
@workspace
agent to ask Copilot what themain.py
andcheck.py
files are doing - Again, with the
@workspace
agent, ask Copilot what is the nature of the data you are going to work with
Progress: [🟢🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪⚪] 7/12 (58%)
⏳ ~3min
- In
main.py
find theselect_high_ratings()
function - Insert your cursor in the function signature or body somewhere
- Open inline chat with
Ctrl-i
orCmd-i
(on Mac) - Prefix your prompt with the
/explain
slash command - Ask why this function might not be working and implement any potential fixes
- Re-generate
transformed_train.csv
by runningpython main.py
- Verify the fix by running
python check.py
with this function returning anOK
Cheatsheet
/explain why this function might not be working correctly?
Based on the provided code, there are a few potential reasons why the `select_high_ratings` function might not be working correctly:
1. Typo in Column Name: The function checks if the column name 'ratings' exists in the DataFrame `df.columns` If there is a typo in the column name, such as 'rating' instead of 'ratings', the condition `if 'ratings' in df.columns`
Note
Feel free to use the GitHub Copilot Chat to implement or corroborate the fix
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪] 8/12 (66%)
⏳ ~3min
- Select the
drop_and_one_hot_encode_red_wine()
function and open inline chat (Ctrl-i/ Cmd-i) - Use the
/fix
slash command to ask GitHub Copilot to suggest a potential fix - Re-generate
transformed_train.csv
by runningpython main.py
- Verify the fix by running
python check.py
with this function returning anOK
Cheatsheet
This function is not doing one-hot encoding on the variety column
def drop_and_one_hot_encode_red_wine(df):
"""
Create a 'Red_Wine' column that is 1 if 'variety' is 'Red Wine' and 0 otherwise.
Drop the original 'variety' column.
"""
if 'variety' in df.columns:
df['Red_Wine'] = df['variety'].apply(lambda x: 1 if x == 'Red Wine' else 0)
df = df.drop(columns=['variety'])
return df
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢⚪⚪⚪] 9/12 (75%)
⏳ ~3min
- If your
verify_ratings_to_int
check is already "OK", then Copilot may have already fixed this issue for you. If not, continue with the following steps. - Select the
convert_ratings_to_int()
function and open inline chat (Ctrl-i / Cmd-i) - Use the
/explain
slash command to ask GitHub Copilot why this function might not be working correctly - Identify the problem and implement the fix
- Re-generate
transformed_train.csv
by runningpython main.py
- Verify the fix by running
python check.py
with this function returning anOK
Cheatsheet
This function is not converting the ratings to integers
def convert_ratings_to_int(df):
"""
Convert the 'rating' column from float to integer.
"""
if 'rating' in df.columns:
df['rating'] = df['rating'].to_int()
return df
A robust Python project can benefit from testing. You will add some unit tests and find more on how to run these tests with the help of GitHub Copilot.
Make sure you have the terminal open by running the command palette and typing toggle terminal
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢⚪⚪] 10/12 (83%)
⏳ ~5min
- This project has the Pytest test runner and framework pre-installed, make sure it is available by running
pytest --version
- Run the tests while in the
workshop
directory by using the following command:pytest -v
The output should give you a failure.
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢⚪] 11/12 (91%)
⏳ ~5min
- Select the output in the terminal and paste it into GitHub Copilot Chat, prefixing with a request for a fix and an explanation
- Use the output from the chat to implement a fix in
test_main.py
Cheatsheet
Fix the test and explain your fix:
// Copy and paste the output from the terminal here
The error occurs because the drop_notes function is likely modifying the DataFrame in place, rather than returning a new DataFrame with the 'notes' column dropped. To fix the test, we need to ensure that the drop_notes function returns a new DataFrame without the 'notes' column and then check the columns of this new DataFrame.
Here is the corrected test function:
def test_drop_notes_column_dropped():
# Create a sample DataFrame with 'notes' column
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'notes': ['note1', 'note2', 'note3']})
# Call the drop_notes function and get the result
result = main.drop_notes(df)
# Assert that 'notes' column in result dataframe is dropped
columns = result.columns.to_list()
assert columns == ['A', 'B']
By storing the result of the drop_notes function in a new variable and checking its columns, we ensure that the test correctly verifies the behavior of the function.
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) - 🎉 Celebrate!
⏳ ~5min
- Ensure that the
python check.py
has all OK output, otherwise this step might not work. - Select the
drop_and_one_hot_encode_red_wine
function inmain.py
- Open inline chat with
Ctrl-i
/Cmd-i
- Use the
/tests
slash command to get suggestions on tests - Review the output, accept it, and verify with the
pytest -v
command. All tests should be passing, and you should have more tests than before. Sometimes Copilot will indent the suggested tests incorrectly (nesting them inside an existing test function), so make sure the tests are correctly indented before running them.
Cheatsheet
/tests
def test_drop_and_one_hot_encode_red_wine_red_wine_column_created():
# Create a sample DataFrame with 'variety' column
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'variety': ['Red Wine', 'White Wine', 'Red Wine']})
# Call the drop_and_one_hot_encode_red_wine function
result = main.drop_and_one_hot_encode_red_wine(df)
# Assert that 'Red_Wine_Red Wine' column is created
assert 'Red_Wine_Red Wine' in result.columns
def test_drop_and_one_hot_encode_red_wine_red_wine_column_encoded():
# Create a sample DataFrame with 'variety' column
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'variety': ['Red Wine', 'White Wine', 'Red Wine']})
# Call the drop_and_one_hot_encode_red_wine function
result = main.drop_and_one_hot_encode_red_wine(df)
# Assert that 'Red_Wine_Red Wine' column is encoded correctly
assert result['Red_Wine_Red Wine'].tolist() == [1, 0, 1]
def test_drop_and_one_hot_encode_red_wine_variety_column_dropped():
# Create a sample DataFrame with 'variety' column
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'variety': ['Red Wine', 'White Wine', 'Red Wine']})
# Call the drop_and_one_hot_encode_red_wine function
result = main.drop_and_one_hot_encode_red_wine(df)
# Assert that 'variety' column is dropped
assert 'variety' not in result.columns
def test_drop_and_one_hot_encode_red_wine_dataframe_unchanged():
# Create a sample DataFrame with 'variety' column
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'variety': ['Red Wine', 'White Wine', 'Red Wine']})
# Call the drop_and_one_hot_encode_red_wine function
result = main.drop_and_one_hot_encode_red_wine(df)
# Assert that the shape of the DataFrame is unchanged
assert result.shape == (3, 3)
# Assert that the original DataFrame is not modified
assert 'variety' in df.columns
assert df.shape == (3, 3)
There are a couple of bonus challenges if you've completed all the tasks and your scripts are in good shape.
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉
Optional: [🚀 Turbo Boost! (1/2) 33%]
- Use GitHub Copilot chat with the
@workspace
prefix to convert the project into a CLI with options and a help menu - Ensure that the prompt specifies no external dependencies and it should only use the standard library
Cheatsheet
@workspace I want to convert this project into a CLI with a help menu. Help me do this without using any dependencies, just pure Python standard library
def main():
parser = argparse.ArgumentParser(description="DataFrame manipulation CLI")
subparsers = parser.add_subparsers(dest="command")
subparsers.add_parser("drop_notes", help="Drop the 'notes' column from the DataFrame")
subparsers.add_parser("select_high_ratings", help="Select rows where the 'rating' column is 90 or higher")
subparsers.add_parser("drop_and_one_hot_encode_red_wine", help="One-hot encode 'Red Wine' and drop 'variety' column")
subparsers.add_parser("remove_newlines_carriage_returns", help="Remove newlines and carriage returns from string columns")
subparsers.add_parser("convert_ratings_to_int", help="Convert the 'rating' column from float to integer")
args = parser.parse_args()
# Load your DataFrame here
df = pd.read_csv('workshop/train.csv')
if args.command == "drop_notes":
df = drop_notes(df)
elif args.command == "select_high_ratings":
df = select_high_ratings(df)
elif args.command == "drop_and_one_hot_encode_red_wine":
df = drop_and_one_hot_encode_red_wine(df)
elif args.command == "remove_newlines_carriage_returns":
df = remove_newlines_carriage_returns(df)
elif args.command == "convert_ratings_to_int":
df = convert_ratings_to_int(df)
else:
parser.print_help()
# Save the transformed DataFrame
df.to_csv('workshop/transformed_train.csv', index=False)
if __name__ == "__main__":
main()
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉
Optional: [🚀 Turbo Boost! (1/2) 33%]
Optional: [🌟 Extra Points! (2/2) 66%]
- Create a new
docs/
folder and a file calledREADME.md
- Use GitHub Copilot chat with the
@workspace
prefix to get started documenting your project using Markdown in a README.md - In your prompt, ask for help documenting the project goals, the nature of the data in the CSV files, and how the CLI works.
Cheatsheet
@workspace help me create a good README.md file in Markdown so that I can document this and help others understand how it works and the nature of the data
# DataFrame Manipulation CLI
This project provides a command-line interface (CLI) for manipulating a DataFrame using various operations. The CLI is built using the Python standard library and does not require any external dependencies.
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [Commands](#commands)
- [Data](#data)
- [Contributing](#contributing)
- [License](#license)
## Installation
1. Clone the repository:
```sh
git clone https://github.com/yourusername/your-repo.git
cd your-repo
```
2. Set up a virtual environment:
```sh
python3 -m venv venv
source venv/bin/activate
```
3. Install the required packages:
```sh
pip install -r workshop/requirements.txt
```
## Usage
To use the CLI, navigate to the `workshop` directory and run the `main.py` script with the desired command.
### Commands
- `drop_notes`: Drop the 'notes' column from the DataFrame.
- `select_high_ratings`: Select rows where the 'rating' column is 90 or higher.
- `drop_and_one_hot_encode_red_wine`: One-hot encode 'Red Wine' and drop the 'variety' column.
- `remove_newlines_carriage_returns`: Remove newlines and carriage returns from string columns.
- `convert_ratings_to_int`: Convert the 'rating' column from float to integer.
### Examples
1. Drop the 'notes' column:
```sh
python main.py drop_notes
```
2. Select rows with high ratings:
```sh
python main.py select_high_ratings
```
3. One-hot encode 'Red Wine' and drop 'variety' column:
```sh
python main.py drop_and_one_hot_encode_red_wine
```
4. Remove newlines and carriage returns from string columns:
```sh
python main.py remove_newlines_carriage_returns
```
5. Convert 'rating' column to integer:
```sh
python main.py convert_ratings_to_int
```
## Data
The data used in this project is stored in CSV files located in the `workshop` directory. The primary file is `train.csv`, which contains the following columns:
- `notes`: Text notes about the data.
- `ratings`: Numeric ratings for the data.
- `variety`: The variety of the data (e.g., 'Red Wine').
The transformed data is saved to `transformed_train.csv` after applying the CLI commands.
## Contributing
Contributions are welcome! Please follow these steps to contribute:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add new feature'`).
5. Push to the branch (`git push origin feature-branch`).
6. Open a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](../LICENSE) file for details.
Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉
Optional: [🚀 Turbo Boost! (1/2) 33%]
Optional: [🌟 Extra Points! (2/2) 66%]
Optional: [🏆 Triple Threat! (2/2) 100%]
- Create a new file called
transform-data.yaml
in the.github/workflows/
directory. - Use GitHub Copilot chat with the
@workspace
prefix to create a GitHub Action that will transform the data whenever a push or pull request is made to the repository. - Open a pull request to test the action. If any errors occur, use the GitHub Copilot chat to help you fix them.
Cheatsheet
@workspace Generate a GitHub action that transforms the data on every push and pull request
To create a GitHub Action that will transform the data using your CLI, you can create a workflow file in the `.github/workflows` directory. Here is an example of a GitHub Action workflow file named `transform-data.yml`:
```yaml
// Workflow omitted, since this is the final bonus!
```
This workflow will:
* Trigger on pushes to the main branch and on manual dispatch.
* Check out the repository.
* Set up Python.
* Install the necessary dependencies (in this case, pandas).
* Run the data transformation using the `run_all` command from your CLI.
* Upload the transformed data as an artifact.
⏳ ~1min
Before deleting, if you wish, you can push your changes. Remember workshop repositories are temporary too.
Go to https://github.com/codespaces and find your current running Codespace and delete it.
If you want to learn more about using GitHub Copilot, check out these resources:
- GitHub Copilot Documentation
- VS Code video series: GitHub Copilot
- Blog: Best practices for prompting Copilot
Also check out the GitHub Foundations learning path for more resources on GitHub and GitHub Copilot.