Feature/doc edits feb 6 #147

Open
wants to merge 7 commits into
base: main
126 changes: 47 additions & 79 deletions README.md
@@ -2,72 +2,67 @@

<img src="https://raw.githubusercontent.com/SFDO-Community-Sprints/DataGenerationToolkit/master/Assets/DataGenerationLogoFinal051320.png" width="300" alt="Data Generation Toolkit Logo featuring Astro in a rain of data" style="float:right" >

> This project aims to create toolkit of tools and documentation for generating test data sets based on admin-selected criteria.
> This project aims to document a toolkit of tools for generating test data sets based on admin-selected criteria.

> This project is a proud part of the Salesforce.org Open Source Commons initiative.

**[Read our guide on getting data into your Salesforce Sandbox!](https://sfdo-community-sprints.github.io/DataGenerationToolkit/DataGenGuide)**

### Problem
Salesforce Admins, Developers, and Consultants need to populate test environments with valid data. The existing methods for completing this task are costly, inefficient, or inaccessible. Relying on workaround solutions - or worse, making demos/changes in Production - is disruptive and potentially hazardous.
Salesforce Admins, Developers, and Consultants need to populate test environments with [valid test data](#working-shared-definitions). The existing methods for completing this task can be costly, inefficient, or inaccessible. Relying on workaround solutions - or worse, making demos/changes in Production - is disruptive and potentially hazardous.

### Solution
We propose a combination of tooling, documentation, and advocacy efforts that offer technical and recommended best practices the problem described above. By developing tools like Snowfakery, as well as documenting existing tools and methods, we empower Admins, Developers, and Consultants to populate Salesforce environments with valid data sets.
We provide a combination of tooling, documentation, and advocacy efforts that offer technical and recommended best practices to solve the problem described above. By developing tools like Snowfakery, as well as documenting other tools and methods, we empower Admins, Developers, and Consultants to populate Salesforce environments with [valid test data](#working-shared-definitions) sets.

### Project Vision/Goals
- Every Admin, Developer, and Consultant has access to test environments populated with valid data
- Ability to generate data sets that follow a story
- Ability to generate data sets that are valid for small, medium, and large data volumes (including LDV) for testing purposes
- Ability to generate data sets that are fake but realistic-seeming
- Open source and free tools facilitate not only data generation, but also sharing best practices *about* data generation between orgs
- High quality documentation guides users to the appropriate method for their data generation goals and how to execute that method
- Support for declarative and developer focused solutions
- Striving for admin-friendly UI for creating data sets [tentatively titled SnowMakery]
- This project places a particular emphasis on leadership development, with active participation from many community members who are underrepresented in technology, less familiar with code solutions, and/or early in their career. Contributors who "stick" can be described as: ambitious, stubborn, playful, creative
- The Data Generation Toolkit leadership team also maintains the [Snowfakery Recipe Repo](https://github.com/SFDO-Community-Sprints/Snowfakery-Recipe-Templates)
## Sub-Projects

### Working Shared Definitions
- Test Environments: this project supports working in sandbox, scratch orgs, or dev orgs. Certain use cases for production orgs are valid, but should be attempted with caution and care!
- Data Set: a collection of related records that include multiple objects and fields.
- Valid test data: the resulting data set produced through these methods should meet user-defined criteria, including matching org schema, volume of records, support for record types, custom fields, records related to other records etc.
The Data Generation Toolkit Project consists of several sub-projects, each focused on one area of the larger challenge we are addressing.

#### Example Use Cases We Aim to Support:
- Quality assurance (QA) testing an org populated with permutations of all, or nearly all, the types of data that are relevant to the project.
- Be able to reliably and easily load sample data into any connected Salesforce org
- Have a data set for demos, potentially with the ability to add specific sets of data that could be used for story based training materials.
- Have a data set that ensures the privacy of people represented in the dataset (for example, not real names).
- Have data sets at scales that allow for testing bulk data processing.
### Snowfakery Recipes

### Ongoing Task Streams:
Our largest project is our collection of Snowfakery recipes, meant to be used for learning and as starting points to extend. There are recipes to generate test data for Salesforce Core, NPSP, EDA, and several Open Source Commons projects.

Currently the project team has several major efforts for our work:
### Guides

- [Maintaining a repository of community-sourced Snowfakery recipes](https://github.com/SFDO-Community-Sprints/Snowfakery-Recipe-Templates)
- Documenting declarative methods and tools to seed Sandboxes with valid data
- Evangelizing Snowfakery through [public events, blog posts,](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki/Talks---Webinars)
We have written several guides to support admins and others who need to seed sandboxes or use Snowfakery for other projects.

- On hold: Creation of _Snowmakery_ a community supported tool that will empower easy creation of data sets tailored to an org's needs.
You can see all the guides on our [guide index](https://sfdo-community-sprints.github.io/DataGenerationToolkit/).

### Project Accomplishments:
* [Meet Alex](https://sfdo-community-sprints.github.io/DataGenerationToolkit/DataGenGuide.html) (Alex's user story)
* [Guide to getting data into your Salesforce Sandbox!](https://sfdo-community-sprints.github.io/DataGenerationToolkit/DataGenGuide)
* Snowfakery Best Practices (planned)
* Snowfakery cheat sheet (planned)
* Using Recipes (planned)
* [Contributing Recipes](https://github.com/SFDO-Community-Sprints/Snowfakery-Recipe-Templates/tree/main)

- [Project Personas](https://raw.githubusercontent.com/SFDO-Community-Sprints/DataGenerationToolkit/master/Assets/DataGenPersonas_202102.pdf): a collection of personas describing the people our documentation is written for and about.
- [Meet Alex](https://sfdo-community-sprints.github.io/DataGenerationToolkit/DataGenGuide): a guide to populating sandboxes with useful data
- [Snowfakery Example Library](https://github.com/SFDO-Community-Sprints/Snowfakery-Recipe-Templates): a growing collection of Snowfakery data generator recipes for Salesforce.

We keep our most recent full notes from meetings and Sprints in the [project wiki](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki).
### Education Fakery Provider

#### Proof of Concept Code:
A [Python-based Faker project](https://pypi.org/project/faker-edu/) to create higher education fake data. Included in Snowfakery since 3.6. Also available as a Python extension that can be used with the main Faker library. [Code for this library is found on GitHub.](https://github.com/SFDO-Community-Sprints/Snowfakery-Edu)

During the course of this project some ideas have been tested in code, and to ensure that code isn't lost and doesn't confuse other efforts in this repo they are kept in the proofs-of-concept directory.
### Nonprofit Fakery Provider

Currently those include:
A [Python-based Faker project to create nonprofit-related fake data](https://pypi.org/project/faker-nonprofit/). Included in Snowfakery since 3.6. Also available as a Python extension that can be used with the main Faker library. [Code for this library is found on GitHub.](https://github.com/SFDO-Community-Sprints/Snowfakery-nonprofit)

1. [A CumulusCI Task](proofs-of-concept/OriginalCciTask) created during this project's first sprint to generate permutations of data.
1. [A web-based Snowfakery recipe editor](proofs-of-concept/SnowmakeryEditor) created during the fall 2020 virtual sprint. This was the second tool named Snowmakery and largely confirmed that this project's UI will bear that name.
## General Project Vision/Goals

- Open source and free tools facilitate not only data generation, but also sharing best practices *about* data generation between orgs
- High quality documentation guides users to the appropriate method for their data generation goals and how to execute that method
- We support both declarative and developer focused solutions

Every Admin, Developer, and Consultant will have:
- ability to generate data sets that are fake but realistic-seeming
- ability to generate data sets that follow a story
- ability to generate data sets that are valid for small, medium, and large data volumes (including LDV) for testing purposes
- access to test environments populated with valid data
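
As a rough illustration of the goals above, the sketch below generates a small, related data set (accounts with child contacts) using only the Python standard library. It is not Snowfakery and not part of this project's tooling: the function name `generate_data_set`, the field names, and the name pools are all assumptions made for the example. A fixed seed keeps the output reproducible, and `account_count` scales the volume.

```python
import random

# Hypothetical name pools; a real tool such as Snowfakery draws on much larger sources.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Riley", "Casey"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Haddad"]
COMPANY_WORDS = ["Harbor", "Summit", "Cedar", "Beacon", "Prairie"]


def generate_data_set(account_count, contacts_per_account, seed=0):
    """Return (accounts, contacts) as lists of dicts; contacts reference accounts by id."""
    rng = random.Random(seed)  # seeded so the same inputs give the same data
    accounts, contacts = [], []
    for account_id in range(1, account_count + 1):
        accounts.append({
            "id": account_id,
            "Name": f"{rng.choice(COMPANY_WORDS)} Foundation {account_id}",
        })
        for _ in range(contacts_per_account):
            contacts.append({
                "id": len(contacts) + 1,
                "FirstName": rng.choice(FIRST_NAMES),
                "LastName": rng.choice(LAST_NAMES),
                "AccountId": account_id,  # relationship to the parent account
            })
    return accounts, contacts


if __name__ == "__main__":
    accounts, contacts = generate_data_set(account_count=3, contacts_per_account=2)
    print(len(accounts), "accounts,", len(contacts), "contacts")
```

Raising `account_count` is the only change needed to move from a demo-sized set to a bulk-testing one, which is the property the volume goal describes.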


### Working Shared Definitions

- *Valid test data*: the resulting data set produced through these methods should meet user-defined criteria, including matching org schema, volume of records, support for record types, custom fields, records related to other records, etc.
- *Test Environments*: this project supports working in sandbox, scratch orgs, or dev orgs. Certain use cases for production orgs are valid, but should be attempted with caution and care!
- *Data Set*: a collection of related records that include multiple objects and fields.
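
To make the "valid test data" definition more concrete, here is a minimal sketch of checking a data set against user-defined criteria: required fields per object (a stand-in for org schema), a minimum record count, and intact relationships between records. The `validate_data_set` function and the shape of its arguments are hypothetical, not an API of this project or of Snowfakery.

```python
def validate_data_set(data_set, required_fields, min_counts, references):
    """Return a list of problems; an empty list means the criteria are met.

    data_set:        {"Account": [record, ...], "Contact": [record, ...]}
    required_fields: {"Contact": ["LastName", "AccountId"]}
    min_counts:      {"Account": 1}
    references:      {("Contact", "AccountId"): "Account"}  # child field -> parent object
    """
    problems = []

    # Schema check: every record must carry a non-empty value for each required field.
    for obj, fields in required_fields.items():
        for i, record in enumerate(data_set.get(obj, [])):
            for field in fields:
                if record.get(field) in (None, ""):
                    problems.append(f"{obj}[{i}] is missing {field}")

    # Volume check: each object must have at least the requested number of records.
    for obj, minimum in min_counts.items():
        actual = len(data_set.get(obj, []))
        if actual < minimum:
            problems.append(f"{obj} has {actual} records, expected at least {minimum}")

    # Relationship check: each lookup value must point at an existing parent record.
    for (child, field), parent in references.items():
        parent_ids = {r.get("id") for r in data_set.get(parent, [])}
        for i, record in enumerate(data_set.get(child, [])):
            if record.get(field) not in parent_ids:
                problems.append(f"{child}[{i}].{field} does not match any {parent}")

    return problems
```

Returning a list of problems rather than a single pass/fail flag is a deliberate choice in this sketch: an admin fixing a data set usually wants to see every issue at once.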

### Project Meetings

The main project team meets monthly to maintain momentum between sprints. You can contact us through the [Trailblazer Community](https://trailhead.salesforce.com/trailblazer-community/groups/0F94S000000kHjVSAU) to get details for joining those meetings. Notes from project meetings are recorded on the [wiki for this repository](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki).
The main project team meets monthly to maintain momentum between sprints. You can contact us through the [Trailblazer Community](https://trailhead.salesforce.com/trailblazer-community/groups/0F94S000000kHjVSAU) to get details for joining those meetings. Notes from project meetings and Sprints are recorded in the [wiki for this repository](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki).

### Project Audience

@@ -81,47 +76,19 @@ To help people identify with specific audiences while creating documentation we

Help us give you the thanks you deserve and ensure future contributors know who to contact if they have questions! Please ensure that all contributing members of the team are included.

- Team Leader(s):
- Aaron Crosman (Attain)
Team Leader(s):
- Aaron Crosman (Coastal Cloud)
- Eilleen Kapp
- Allison Letts (Attain Partners)
- Jung Mun
- Samantha Shain (William Penn Foundation)
- Cassie Supilowski (OneGoal)
- Salesforce Liaison:
- Paul Prescod (Salesforce.org)

We also keep a [complete list of contributors](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki/Project-Contributors). Please add yourself to the list!

### Past Project Accomplishments

- *Between Feb '21 Virtual Sprint and present*
- Moved Snowfakery recipe cookbook into an independent, community maintained repo
- Designed Meeting in a Box for User Group and Conference presentations on sandbox seeding
- Defined use case for seeding a sandbox with a Flow
- Scheduled quarterly, public Snowfakery trainings
- *Between Sept '20 Virtual Sprint and Feb '21 Virtual Sprint*
- Completed first full draft of Data Generation Guide document, including a technical edit
- Kicked the tires on Snowfakery, for real
- Made several public presentations
- Implemented Project Boards to track issues and discussion topics
- Survived despite pandemic, fascism, etc.
- *9/23/20-9/24/20 Virtual Sprint*
- Drafted architecture diagrams for a UI to sit on top of Snowfakery. This will be called _Snowmakery!_ Two proofs of concept have been started: one inside the org, the other on Heroku.
- Analyzed survey data from 75 community members; updated relevant Personas based on survey insights
- Established document outline and Admin Story for documentation project that describes how to Generate Data and Move Data between orgs
- Documented limitations and considerations for Partial Data sampling algorithm and manual steps, third party apps, and code for creating mock data records
- Documented steps for an Admin audience to use CCI to move data records from one Dev Sandbox to another Dev Sandbox (or any two persistent orgs)
- QA-ed documentation steps for CCI steps (referenced above)
- *3/31/20 -> 4/1/20 virtual sprint*
- Determined that Snowfakery accomplishes many of the original requirements brainstormed at the Philly Sprint (fall 2019) (namely: ability to generate mock data with related tables, random names and values, standard and custom objects, datasets of any size/scale, ability to populate Salesforce orgs)
- Socialized Snowfakery to community members
- Onboarded project leadership from multiple orgs and began application for inclusion in Open Source Commons program
- Refined use cases and differentiated Snowfakery from (1) existing tools in the market (2) proprietary tools at Salesforce (3) Full Sandbox product
- Brainstormed 2+ potential directions for extending Snowfakery to include an admin-friendly web interface
- Reviewed documentation and install steps
- Overhauled ReadMe file

## Additional Useful References

- [Project Meeting Notes](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki).
## Project References

- [Project Meeting Notes](https://github.com/SFDO-Community-Sprints/DataGenerationToolkit/wiki)
- [Snowfakery](https://github.com/SFDO-Tooling/Snowfakery)
- [Main Docs](https://snowfakery.readthedocs.io/en/docs/)
- [Paul's SF Architects blog on Snowfakery](https://medium.com/salesforce-architects/generate-realistic-datasets-with-snowfakery-5349225b033d)
@@ -137,3 +104,4 @@ We also keep a [complete list of contributors](https://github.com/SFDO-Community
- [NPSP Data Dictionary](https://attain-projects.quip.com/yD1wAsdz1m1Q/NPSP-Public-Data-Dictionary)
- [Wave Data Generator](https://github.com/ttse-sfdc/sfdc-wave-data-generator) (generates data for Salesforce org, and builds linkages between objects)
- [JSON/YAML Editor](https://json-editor.github.io/json-editor/)
- [Project History](docs/ProjectHistory.md)
37 changes: 37 additions & 0 deletions docs/ProjectHistory.md
@@ -0,0 +1,37 @@
### Proof of Concept Code:

During the course of this project some ideas have been tested in code, and to ensure that code isn't lost and doesn't confuse other efforts in this repo they are kept in the proofs-of-concept directory.

Currently those include:

1. [A CumulusCI Task](proofs-of-concept/OriginalCciTask) created during this project's first sprint to generate permutations of data.
1. [A web-based Snowfakery recipe editor](proofs-of-concept/SnowmakeryEditor) created during the fall 2020 virtual sprint. This was the second tool named Snowmakery and largely confirmed that this project's UI will bear that name.

### Past Project Accomplishments

- *Between Feb '21 Virtual Sprint and present*
- Moved Snowfakery recipe cookbook into an independent, community maintained repo
- Designed Meeting in a Box for User Group and Conference presentations on sandbox seeding
- Defined use case for seeding a sandbox with a Flow
- Scheduled quarterly, public Snowfakery trainings
- *Between Sept '20 Virtual Sprint and Feb '21 Virtual Sprint*
- Completed first full draft of Data Generation Guide document, including a technical edit
- Kicked the tires on Snowfakery, for real
- Made several public presentations
- Implemented Project Boards to track issues and discussion topics
- Survived despite pandemic, fascism, etc.
- *9/23/20-9/24/20 Virtual Sprint*
- Drafted architecture diagrams for a UI to sit on top of Snowfakery. This will be called _Snowmakery!_ Two proofs of concept have been started: one inside the org, the other on Heroku.
- Analyzed survey data from 75 community members; updated relevant Personas based on survey insights
- Established document outline and Admin Story for documentation project that describes how to Generate Data and Move Data between orgs
- Documented limitations and considerations for Partial Data sampling algorithm and manual steps, third party apps, and code for creating mock data records
- Documented steps for an Admin audience to use CCI to move data records from one Dev Sandbox to another Dev Sandbox (or any two persistent orgs)
- QA-ed documentation steps for CCI steps (referenced above)
- *3/31/20 -> 4/1/20 virtual sprint*
- Determined that Snowfakery accomplishes many of the original requirements brainstormed at the Philly Sprint (fall 2019) (namely: ability to generate mock data with related tables, random names and values, standard and custom objects, datasets of any size/scale, ability to populate Salesforce orgs)
- Socialized Snowfakery to community members
- Onboarded project leadership from multiple orgs and began application for inclusion in Open Source Commons program
- Refined use cases and differentiated Snowfakery from (1) existing tools in the market (2) proprietary tools at Salesforce (3) Full Sandbox product
- Brainstormed 2+ potential directions for extending Snowfakery to include an admin-friendly web interface
- Reviewed documentation and install steps
- Overhauled ReadMe file