
Generation of new IDs for entities #312

Open
silvester-pari opened this issue Jan 27, 2025 · 5 comments · May be fixed by ESA-EarthCODE/open-science-catalog-validation#18

Comments

@silvester-pari
Collaborator

Currently, contributors are required to add an ID to files manually. This is error-prone, since the ID might, for example, not be unique. As a remedy, we should auto-generate IDs on behalf of the contributor. Possible options:

  • generate id from title property:
    • pro: gives a hint about the file contents (project, product etc.)
    • contra: if the title is later changed, the id might diverge substantially from it, possibly causing confusion
  • generate id randomly (UUID):
    • pro: unique for all practical purposes (collisions between random UUIDs are vanishingly unlikely)
    • contra: gives no hint at all about the file contents
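Option 2 needs almost no logic of its own in Node.js. The following is a minimal sketch, assuming Node.js 14.17 or later (where `crypto.randomUUID()` is built in). `generateUuidId` is a hypothetical helper name, not part of any existing OSC tooling:

```javascript
const { randomUUID } = require('crypto');

// Option 2 (hypothetical helper): a random version 4 UUID.
// Unique in practice, but it carries no information about the title or contents.
function generateUuidId() {
  return randomUUID();
}

console.log(generateUuidId()); // a random 36-character string, different on every call
```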

In general, we should remind ourselves that the way to browse the catalog is by using the OSC GUI (powered by STAC Browser) and not the repository itself - this might influence our decision.

@edobrowolska
Collaborator

From my perspective, the user should be guided in choosing a title for the dataset (e.g. a limited title length, a reference to the version and date of the dataset). The ID, as it works now, is confusing for users not familiar with the approach, so I would suggest going for the second option, to spare users the struggle of assigning a unique ID. Since the ID does not influence the accessibility and findability of the product (through the OSC GUI), I think a randomly assigned UUID would be the better option.
@aapopescu what are your thoughts?

@silvester-pari
Collaborator Author

After gathering feedback from @edobrowolska and @aapopescu, the second option (UUID) is the preferred one.

@silvester-pari
Collaborator Author

silvester-pari commented Feb 14, 2025

A concern has come up: enforcing/validating the UUID format would make all existing entities invalid; see the discussion at ESA-EarthCODE/open-science-catalog-validation#18
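To illustrate why strict enforcement breaks existing entries, here is a hypothetical format check (not the rule actually proposed in ESA-EarthCODE/open-science-catalog-validation#18, which may differ). It accepts only the canonical 8-4-4-4-12 hexadecimal UUID form, so any slug-style ID such as `polaris` (the example name used in this thread) is rejected:

```javascript
// Hypothetical check: canonical 8-4-4-4-12 hex UUID form (any UUID version, case-insensitive)
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(id) {
  return UUID_PATTERN.test(id);
}

// Illustrative IDs: one slug-style ID and one UUID
const existingIds = ['polaris', 'e59e411c-20ed-4dc7-ba85-e2df001e9f0b'];

// Every ID that would fail validation under strict UUID enforcement
const invalid = existingIds.filter((id) => !isUuid(id));
console.log(invalid); // [ 'polaris' ]
```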

@m-mohr
Collaborator

m-mohr commented Feb 20, 2025

I find UUIDs very user-unfriendly, especially as they bubble up to the user through the various osc: fields, e.g. "osc:experiment": "e59e411c-20ed-4dc7-ba85-e2df001e9f0b". If you just slugify the title and ensure that there's no folder with the same name already (otherwise add a suffix or so), you can generate more user-friendly IDs, and it would be "osc:experiment": "polaris" instead.

Here's example code showing how a unique ID could be generated based on a given folder (assuming git-clerk does it on the server side):

```javascript
const fs = require('fs');
const path = require('path');

const FOLDER = './projects';
const TITLE = 'polaris';

function slugify(value) {
  return String(value)
    .replace(/[^a-z0-9]+/gi, '-') // Replace all sequences of non-alphanumeric characters with a dash
    .replace(/^-+|-+$/g, '') // Trim leading/trailing dashes
    .toLowerCase();
}

// Returns the slugified title, appending -2, -3, ... until <folder>/<id>/<childFileName> does not exist yet
function generateId(folder, title, childFileName) {
  const slug = slugify(title);
  if (!slug) {
    // A title without any ASCII letters or digits would otherwise produce an empty ID
    throw new Error(`Cannot derive an ID from title: ${title}`);
  }
  let i = 2;
  let id = slug;
  while (fs.existsSync(path.join(folder, id, childFileName))) { // this should be made async in production
    id = slug + '-' + i++;
  }
  return id;
}

// Generates e.g. polaris-2 if the ID polaris already exists
console.log(generateId(FOLDER, TITLE, 'collection.json'));
```
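Following up on the "should be made async in production" note in the code, a minimal asynchronous variant could look like the sketch below. It assumes Node.js 14 or later (for `fs/promises`); `exists` and `generateIdAsync` are hypothetical names, and the slug logic is unchanged:

```javascript
const fs = require('fs/promises');
const path = require('path');

function slugify(value) {
  return String(value)
    .replace(/[^a-z0-9]+/gi, '-') // Replace all sequences of non-alphanumeric characters with a dash
    .replace(/^-+|-+$/g, '') // Trim leading/trailing dashes
    .toLowerCase();
}

// Resolves to true if the path is accessible; any error (not only "not found") counts as absent
async function exists(p) {
  try {
    await fs.access(p);
    return true;
  } catch {
    return false;
  }
}

// Same strategy as generateId, but without blocking the event loop on each existence check
async function generateIdAsync(folder, title, childFileName) {
  const slug = slugify(title);
  if (!slug) {
    throw new Error(`Cannot derive an ID from title: ${title}`);
  }
  let i = 2;
  let id = slug;
  while (await exists(path.join(folder, id, childFileName))) {
    id = `${slug}-${i++}`;
  }
  return id;
}

generateIdAsync('./projects', 'polaris', 'collection.json').then(console.log);
```

Note that checking for a folder and later creating it is not atomic, so two concurrent requests could still pick the same ID. If git-clerk handles requests in parallel (an assumption, not stated here), it may need to serialize ID generation or retry on conflict.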

@Schpidi
Member

Schpidi commented Mar 3, 2025

I tend to agree with you, @m-mohr, and have changed the IDs in the new workflow and experiment examples accordingly.


4 participants