25 Feb 21:53

kenlhlui

fe51828

v0.1.4 Latest

1. Feature updates

Added a count of crawled deaccessioned/draft datasets to the log.
Added an end-of-crawl message (✅ Crawling process completed successfully.)

2. Bug fixes

Removed deaccessioned/draft dataset metadata from failed_metadata_uris_yyyymmdd-HHMMSS.json. These metadata records now appear only in pid_dict_dd_yyyymmdd-HHMMSS.json.
The log no longer lists JSON output files that were not created.

Full Changelog: v0.1.3...v0.1.4

04 Feb 22:10

kenlhlui

v0.1.3

1b559a5

v0.1.3

1. Feature updates

Renamed example.ipynb to colud_cli.ipynb to better reflect how the notebook is used.
Updated colud_cli.ipynb to accept BASE_URL and API_KEY interactively when creating the .env file.
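
As a rough illustration of the interactive .env setup described above, the sketch below writes the two values to a file. The function name `write_env_file` and the quoted `KEY="value"` layout are assumptions made for this example; the notebook's actual code may differ.

```python
from pathlib import Path


def write_env_file(path, base_url, api_key=""):
    """Write BASE_URL and API_KEY to a .env-style file and return the text.

    Hypothetical helper: the quoting style and key order are assumptions,
    not taken from the project's notebook.
    """
    # Trim stray whitespace that often sneaks in with pasted input
    base_url = base_url.strip()
    api_key = api_key.strip()
    text = f'BASE_URL="{base_url}"\nAPI_KEY="{api_key}"\n'
    Path(path).write_text(text, encoding="utf-8")
    return text
```

In a notebook, the two arguments could come from `input()` and `getpass.getpass()`, so the API key is not echoed to the cell output.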

2. Others

Updated poetry-export_dependencies.yml (the GitHub workflow file) so that requirements.txt and poetry.lock are updated automatically through CI/CD.

Full Changelog: v0.1.2...v0.1.3

03 Feb 16:34

kenlhlui

v0.1.2

d0b024d

v0.1.2

1. Feature updates

Added example.ipynb for launching the tool from a notebook; no Git or Python installation is required.
Improved the connection check: if the API_KEY supplied by the user is invalid, the tool now falls back to an unauthenticated connection for crawling.
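
The fallback behaviour can be pictured with a minimal sketch like the one below. `resolve_headers` and the injected `key_is_valid` callable are hypothetical names for this example, not the tool's own functions; `X-Dataverse-key` is the header the Dataverse API uses for API tokens.

```python
def resolve_headers(api_key, key_is_valid):
    """Return (headers, authenticated) for subsequent GET requests.

    key_is_valid is a hypothetical callable that takes the key and returns
    True when the server accepts it (for example, by probing an endpoint
    that requires authentication).
    """
    if api_key and key_is_valid(api_key):
        # Valid key: send it with every request
        return {"X-Dataverse-key": api_key}, True
    # Missing or rejected key: crawl anonymously instead of aborting
    return {}, False
```

Passing the validity check in as a callable keeps the decision logic separate from the network call, so it can be exercised without a live Dataverse installation.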

2. Others

Changed how the headers for GET requests are defined in MetaDataCrawler.

Full Changelog: v0.1.1...v0.1.2

28 Jan 22:17

kenlhlui

v0.1.1

7c60b04

v0.1.1

1. Schema changes

The top-level keys in ds_metadata are now dataset IDs (unique identifiers for each dataset version in the Dataverse system) rather than dataset persistent IDs. Example:

# Old version
{
  "doi:10.5072/FK2/DUGFC4": {  # datasetPersistentId
    "status": "OK",
    "data": {
      "id": 850,
      "datasetId": 2663,
      "datasetPersistentId": "doi:10.5072/FK2/DUGFC4",
...

# New version
{
  "2663": {  # datasetId
    "status": "OK",
    "data": {
      "id": 850,
      "datasetId": 2663,
      "datasetPersistentId": "doi:10.5072/FK2/DUGFC4",
...
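
For files produced by the old version, a conversion along these lines may help. This is a sketch under the assumption that every record carries `data.datasetId`, as in the example above; `rekey_by_dataset_id` is a hypothetical helper, not part of the tool.

```python
def rekey_by_dataset_id(old_metadata):
    """Re-key a PID-keyed ds_metadata mapping by datasetId.

    Assumes each record has record["data"]["datasetId"], as shown in the
    example. Keys become strings because JSON object keys are strings.
    """
    new_metadata = {}
    for record in old_metadata.values():
        dataset_id = record["data"]["datasetId"]
        new_metadata[str(dataset_id)] = record
    return new_metadata
```

The persistent ID is not lost by dropping it as the key, since it remains available in each record as `datasetPersistentId`.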

ds_metadata_yyyymmdd-HHMMSS.json now contains data, path_info and permission_info at the second level.

{
  ...
    "status": "OK",
    "data": {
    ...
    },
    "path_info": {
    ...
    },
    "permission_info": {
    ...
    },

The following fields in path_info were renamed for consistency with the new schema:

collection_alias -> CollectionAlias
collection_id -> CollectionID
pid -> datasetPersistentId
ds_id -> datasetId
path_ids -> pathIds

# Old version
...
    "path_info": {
      "collection_alias": "toronto",
      "collection_id": 22,
      "pid": "doi:10.5072/FK2/DUGFC4",
      "ds_id": 2663,
      "path": "/Nick Field Dataverse",
      "path_ids": [
        2641
      ]
    }

# New version
...
    "path_info": {
      "CollectionAlias": "toronto",
      "CollectionID": 22,
      "datasetPersistentId": "doi:10.5072/FK2/DUGFC4",
      "datasetId": 2663,
      "path": "/Nick Field Dataverse",
      "pathIds": [
        2641
      ]
    }
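
Downstream scripts that still read the old field names could be adapted with a small mapping such as the sketch below. `PATH_INFO_RENAMES` and `rename_path_info` are hypothetical names for this example; the `path_ids` to `pathIds` entry follows the new-version example above.

```python
# Old path_info field name -> new field name
PATH_INFO_RENAMES = {
    "collection_alias": "CollectionAlias",
    "collection_id": "CollectionID",
    "pid": "datasetPersistentId",
    "ds_id": "datasetId",
    "path_ids": "pathIds",
}


def rename_path_info(path_info):
    """Return a copy of path_info with old field names replaced.

    Fields without an entry in the mapping (such as "path") are kept as-is.
    """
    return {PATH_INFO_RENAMES.get(key, key): value for key, value in path_info.items()}
```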

2. Feature updates

Combined the representation (-d) and permission (-p) metadata into a single JSON file, ds_metadata_yyyymmdd-HHMMSS.json.
Added per-dataset counts of the following permission roles to the spreadsheet output (available only when -p is enabled): DS_Collab, DS_Admin, DS_Contrib, DS_ContribPlus, DS_Curator, DS_FileDown, DS_Member.

3. Bug fixes

Corrected spelling mistakes in the README file.
Restored missing fields for representation metadata in the spreadsheet:

TermsOfUse
CM_AuthorAff
CM_TimeEnd
CM_CollectionStart
CM_CollectionEnd

Fixed the handling of -f responses that contain None objects.

28 Jan 21:52

kenlhlui

v0.1.0

3ef6dab

v0.1.0

Initial release

Full Changelog: https://github.com/scholarsportal/dataverse-metadata-crawler/commits/v0.1.0

Releases: scholarsportal/dataverse-metadata-crawler