Implementation: Doaj fetch script for open access journals #247
base: main
- Migrate from DOAJ API v3 to v4 for enhanced metadata access
- Add comprehensive CC license analysis for academic journals
- Implement publisher and geographic distribution analysis
- Add programmatic ISO 3166-1 alpha-2 country code generation
- Include automatic dependency resolution and error handling
- Apply date filtering (default ≥2002) to prevent false positives
- Generate 5 CSV files plus provenance for comprehensive analysis
- Ensure static analysis compliance and comprehensive testing

This integration enables quantification of institutional commitment to Creative Commons licensing in the scholarly publishing ecosystem.
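As an illustration of the "CC license analysis" and "CSV files" items, here is a minimal sketch of how per-license journal counts might be aggregated into one CSV. It assumes each journal has already been reduced to a list of license type strings; the function name `license_counts_csv` and the column names `TOOL_IDENTIFIER` and `COUNT` are hypothetical, not taken from the PR.

```python
import csv
import io
from collections import Counter


def license_counts_csv(license_lists):
    """Count journals per CC license type and render the counts as CSV text.

    `license_lists` holds one list of license types per journal. A journal
    listing several different licenses is counted once under each of them.
    Column names are hypothetical.
    """
    counts = Counter()
    for licenses in license_lists:
        # set() avoids counting the same license twice for one journal
        counts.update(set(licenses))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["TOOL_IDENTIFIER", "COUNT"])
    for tool, count in sorted(counts.items()):
        writer.writerow([tool, count])
    return buffer.getvalue()
```

Counting journals rather than license mentions keeps the numbers consistent with the journal-level interpretation described in the data quality note.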
@TimidRobot, hello! I have attempted to implement the fetch script to collect CC license information from the DOAJ data source using its API. To eliminate false positives, the script reads the license from the field that records each journal's actual license. I have also set a …
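A minimal sketch of reading licenses from a journal record follows. It assumes (unverified here) that a DOAJ v4 journal record stores its licenses as a list of dicts under `bibjson["license"]`, each with a `"type"` such as `"CC BY-NC"`; the helper name `extract_cc_licenses` is hypothetical.

```python
def extract_cc_licenses(journal):
    """Return the CC license types declared in a DOAJ journal record.

    Assumes (hypothetically) that licenses live in bibjson["license"] as a
    list of dicts with a "type" key, e.g. {"type": "CC BY"}.
    """
    bibjson = journal.get("bibjson", {})
    licenses = bibjson.get("license") or []
    found = []
    for entry in licenses:
        license_type = (entry.get("type") or "").strip()
        # Keep only Creative Commons license types (this also matches CC0)
        if license_type.upper().startswith("CC"):
            found.append(license_type)
    return found
```

An empty result corresponds to the `if not license_info: continue` branch discussed in the review below: the journal declares no CC license and is skipped.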
scripts/1-fetch/doaj_fetch.py
```python
LOGGER.error(f"Failed to generate country codes file: {e}")
raise shared.QuantifyingException(
    f"Critical error generating country codes: {e}", exit_code=1
)
```
I don't think you need to log an error here, as the raised exception will log a message to the terminal.
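The reviewer's suggestion can be sketched as "raise, don't log". In this sketch `QuantifyingException` is a hypothetical stand-in for `shared.QuantifyingException` (only the `message` and `exit_code` arguments seen in the diff are mirrored), and `write_rows` is an illustrative helper, not code from the PR.

```python
import csv


class QuantifyingException(Exception):
    """Hypothetical stand-in for shared.QuantifyingException."""

    def __init__(self, message, exit_code=1):
        super().__init__(message)
        self.exit_code = exit_code


def write_rows(path, rows):
    """Write rows to a CSV file, turning I/O failures into a fatal error."""
    try:
        with open(path, "w", newline="", encoding="utf-8") as file_obj:
            csv.writer(file_obj).writerows(rows)
    except OSError as e:
        # No LOGGER.error() here: the top-level handler reports the raised
        # exception, so logging it first would print the same problem twice.
        raise QuantifyingException(
            f"Critical error writing {path}: {e}", exit_code=1
        ) from e
```

Chaining with `from e` keeps the original `OSError` available as `__cause__` for debugging without a separate log line.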
```python
if not license_info:
    continue
```
I was wondering, wouldn't it be better to log a warning here saying that you skipped this journal because there is no CC license?
I think this will depend on how many warnings are generated. If warnings are a minority of the log messages, I think they'll be helpful. If they are the majority, they become noise.
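One possible middle ground between the two comments above, offered as a sketch rather than the PR's behavior: log each skipped journal at DEBUG level and emit a single WARNING with the total, so the warning stays useful even when most journals are skipped. The function name `keep_cc_journals` and the `"id"` key on journal records are assumptions.

```python
import logging

LOGGER = logging.getLogger(__name__)


def keep_cc_journals(journals, extract_licenses):
    """Return (kept, skipped): (journal, licenses) pairs and a skip count.

    Each journal without a CC license is logged at DEBUG level; a single
    WARNING summarizes how many were skipped.
    """
    kept = []
    skipped = 0
    total = 0
    for journal in journals:
        total += 1
        license_info = extract_licenses(journal)
        if not license_info:
            skipped += 1
            # Assumes journal records are dicts with an "id" key
            LOGGER.debug("Skipping journal %s: no CC license", journal.get("id"))
            continue
        kept.append((journal, license_info))
    if skipped:
        LOGGER.warning(
            "Skipped %d of %d journals without a CC license", skipped, total
        )
    return kept, skipped
```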
dev/generate_country_codes.py
Please remove this script and use pycountry instead (which will also require updating the pipenv files).
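A minimal sketch of the pycountry approach: look country names up at runtime instead of generating a codes file. `pycountry.countries.get(alpha_2=...)` and the `.name` attribute are pycountry's; the `country_name` helper, its `lookup` parameter (added so it can be tested without pycountry installed), and the `"Unknown"` fallback are hypothetical choices.

```python
def country_name(alpha_2, lookup=None):
    """Return the English short name for an ISO 3166-1 alpha-2 code.

    Uses pycountry by default (a third-party package that would need to be
    added to the Pipfile); `lookup` lets callers or tests supply their own.
    """
    if lookup is None:
        import pycountry

        def lookup(code):
            try:
                return pycountry.countries.get(alpha_2=code)
            except LookupError:  # defensive: treat lookup errors as "not found"
                return None

    country = lookup(alpha_2.strip().upper()) if alpha_2 else None
    return country.name if country is not None else "Unknown"
```

For example, `country_name("de")` should return `"Germany"` when pycountry is installed.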
The data returned appears to focus primarily on articles. Given the lack of licensing information on the articles, I think the focus should be on the journals with article information providing context. Even though a lot of the data currently returned is really interesting, I think it is out of scope for this project.
@TimidRobot, the script actually focuses on journals, as these are the only records with license fields. Articles in the DOAJ database do not have license fields, and doing a full …
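Since only journal records carry license fields, the fetch would page through journal search results rather than articles. The sketch below only builds request URLs; the endpoint layout `https://doaj.org/api/v4/search/journals/{query}`, the `page`/`pageSize` parameter names, and the example field `bibjson.license.type` are assumptions to check against the DOAJ API v4 documentation.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint layout; verify against the DOAJ API v4 documentation.
BASE_URL = "https://doaj.org/api/v4/search/journals/"


def journal_search_url(query, page=1, page_size=100):
    """Build a DOAJ journal search URL for one page of results."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # The query is part of the URL path, so percent-encode every reserved char
    path = quote(query, safe="")
    params = urlencode({"page": page, "pageSize": page_size})
    return f"{BASE_URL}{path}?{params}"
```

For example, `journal_search_url('bibjson.license.type:"CC BY"', page=2)` yields a URL ending in `bibjson.license.type%3A%22CC%20BY%22?page=2&pageSize=100`.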
Fixes
Description
This PR adds comprehensive DOAJ API v4 integration to the quantifying commons project, enabling collection and analysis of Creative Commons licensed academic journals. The implementation includes two main components:
- `scripts/1-fetch/doaj_fetch.py` - Main data collection script for DOAJ journals
- `dev/generate_country_codes.py` - Utility for programmatic ISO country code generation

Key Features
Useful Links
Articles:
Journals:
Technical details
API Integration
Data Quality Measures
- `--date-back=2002` to avoid retroactive CC license false positives

Output Files Generated
Query Strategy
License Extraction
Date Filtering Implementation
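A minimal sketch of how the `--date-back` option could be wired up. The option name and its default of 2002 come from this PR; the assumption that the filter compares a journal's open access start year, stored as `bibjson["oa_start"]`, is hypothetical, as are the helper names.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (only --date-back is shown here)."""
    parser = argparse.ArgumentParser(description="Fetch DOAJ journal data")
    parser.add_argument(
        "--date-back",
        type=int,
        default=2002,
        help="Only include journals whose open access start year is >= this",
    )
    return parser.parse_args(argv)


def within_date_range(journal, date_back):
    """Return True if the journal's OA start year is known and >= date_back.

    Assumes (hypothetically) that the year is stored as bibjson["oa_start"].
    """
    oa_start = journal.get("bibjson", {}).get("oa_start")
    if oa_start is None:
        return False  # unknown start year: exclude rather than risk a false positive
    try:
        return int(oa_start) >= date_back
    except (TypeError, ValueError):
        return False
```

Excluding journals with a missing or malformed start year trades some coverage for fewer false positives, which matches the stated purpose of the filter.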
Publisher Analysis
Auto-Dependency Resolution
Tests
Basic Code Execution
Data Quality Note
Please Note: DOAJ data represents journal-level licensing policies, not individual article licenses. This data should be interpreted as indicators of institutional commitment to CC licensing rather than precise counts of CC-licensed articles. The `--date-back=2002` default prevents false positives from journals that retroactively adopted CC licenses.

Checklist
- … `Update index.md` …
- … `main` or `master` …
- … visible errors.
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."