JOSS paper preparation #1249

Draft
danielfromearth wants to merge 23 commits into main from joss-paper

@danielfromearth danielfromearth commented Mar 5, 2026

Contributor

Manuscript draft

This PR is intended for revisions and improvements to the manuscript draft being prepared for submission to the Journal of Open Source Software (JOSS).

Paper format: The manuscript is prepared as a Markdown (paper.md) file with references in a paper.bib file, following the JOSS formatting guidelines.

For a PDF preview: With Docker installed locally, a PDF preview of the draft manuscript can be generated by running the following from the earthaccess root directory (as described in the Docker section of the JOSS guidelines):

docker run --rm \
    --volume $PWD/paper:/data \
    --user $(id -u):$(id -g) \
    --env JOURNAL=joss \
    openjournals/inara

📚 Documentation preview 📚: https://earthaccess--1249.org.readthedocs.build/en/1249/

github-actions Bot commented Mar 5, 2026

Binder 👈 Launch a binder notebook on this branch for commit 16fb7b9

I will automatically update this comment whenever this PR is modified

Binder 👈 Subsequent updates launched binder notebooks on this branch for commits 38cad6a, 6af0701, 767ad52, dce192c, ae74db7, 05f7616, bb5fd2f, db3a969, cf0f975, 5852fa8, 1b479c5, 5029e59, 2f8cab3, 0e89b5e, 5691bf8, 1af1724, 81b1384, 13a9c14, ce0ada4, 77428e6, 06ce30b

@jules32 jules32 left a comment

Contributor

Hi! Great work on this Danny! A few commits and some suggestions to consider.

Comment thread paper/paper.md Outdated
Comment thread paper/paper.md
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md

Member

We could symlink this in to our docs!

Contributor Author

@mfisher87, want to create an issue for it?

Comment thread paper/paper.md Outdated
Comment thread paper/paper.md Outdated
@mfisher87

Member

> after the v1.0.0 release

I would say let's not wait. We've demonstrated impact and I think that matters more.

Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.

I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.

Co-authored-by: Matt Fisher <3608264+mfisher87@users.noreply.github.com>
@danielfromearth danielfromearth changed the title Joss paper JOSS paper preparation Mar 6, 2026
@danielfromearth

Contributor Author

> after the v1.0.0 release
>
> I would say let's not wait. We've demonstrated impact and I think that matters more.
>
> Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.
>
> I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.

I'm fine with either too. I also think the decision could be on hold until one of the two things – (i) co-author reviews/revisions, (ii) development for v1.0.0 – is completely ready-to-go.

Comment thread paper/paper.md Outdated
JessicaS11 and others added 3 commits April 1, 2026 09:44
Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>

danielfromearth commented Apr 17, 2026

Contributor Author

Hey all, it's been a couple weeks since activity here, so pinging to keep this moving. Would be great to have a complete draft ready to submit before Northern Hemisphere summer!

If there's not a specific note next to your username, a general read-through and comments are welcome:

@andypbarrett
@betolink (there's also this specific question about region-detection)
@chuckwondo (there's also this specific question about region-detection)
@jhkennedy
@jrbourbeau
@battistowx
@Sherwin-14
@jules32 (there's a placeholder to fill in the Openscapes award number)
@asteiker (there are a few follow-ups to your previous comments to address)
@JessicaS11 (there are a few follow-ups to your previous comments to address)

Comment thread paper/paper.md Outdated
Comment thread paper/paper.md Outdated

@jules32 jules32 left a comment

Contributor

Thank you @danielfromearth! I've added the award number. Thanks for leading this!

asteiker previously approved these changes Apr 17, 2026
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md Outdated
Comment thread paper/paper.md

**Peer-reviewed publications.** `earthaccess` has been used in published research,
including studies on multi-sensor drought observations in forested environments
[@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025].

Contributor

Cool, thanks!

Comment thread paper/paper.md Outdated
Co-authored-by: Daniel Kaufman <114174502+danielfromearth@users.noreply.github.com>
Co-authored-by: Julia Stewart Lowndes <julia@openscapes.org>
Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>
Comment thread paper/paper.md Outdated
JessicaS11 previously approved these changes Apr 22, 2026

danielfromearth commented Apr 22, 2026

Contributor Author

Friendly ping for co-authors who haven't yet had a chance to review (or at least, approve): @andypbarrett @jhkennedy @jrbourbeau @battistowx @Sherwin-14 @betolink @chuckwondo

Things have been coming together and I think we are getting close to a complete draft that's ready. Would be great to have everyone's eyes on it, even briefly, before we finalize. Could you each take a look in the next week or two?

In particular, please confirm your name, affiliation, and ORCID are correct in the author list. And of course, all other comments welcome.

If timing doesn't work, just comment as such so we know where things stand. Thanks!

Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
itcarroll previously approved these changes Apr 22, 2026
@mfisher87

Member

Related: https://earthaccess.zulipchat.com/#narrow/channel/480557-general/topic/JOSS.3F/with/590557057

We're considering / planning to go for pyOpenSci review first, which will give us a stronger review and expedite the JOSS acceptance process if accepted by pyOpenSci.

Thanks @sampottinger for sharing this with me :)

betolink previously approved these changes Apr 29, 2026

@betolink betolink left a comment

Member

I left some comments and suggestions, but nothing major. I think this is a good draft, so I'm approving as is. Thanks for leading the effort @danielfromearth.

Comment thread paper/paper.md Outdated
Comment thread paper/paper.md

3. **Access**: Attempts to detect at runtime whether the process is running within AWS `us-west-2`
and automatically selects the optimal access path -- direct S3 reads for in-region
access or HTTPS downloads otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible

Member

I like the concise way of presenting this. Maybe we can add that being format-agnostic and compatible with Python file-like objects makes the library interoperable with the rest of the scientific Python ecosystem (aka PyData/Pangeo).
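
For readers following this thread, the access-path choice described in the quoted manuscript text can be sketched roughly as below. This is only an illustration under stated assumptions: `choose_access_path`, `VALID_PATHS`, and the `in_region` and `override` parameters are hypothetical names, not part of the `earthaccess` API, and the sketch deliberately leaves out how the AWS region would be detected, taking the answer as an input flag instead.

```python
from typing import Optional

# Hypothetical labels for the two access paths named in the manuscript text.
VALID_PATHS = ("direct", "https")


def choose_access_path(in_region: bool, override: Optional[str] = None) -> str:
    """Return "direct" (S3 reads) or "https" for a data request.

    in_region: whether the caller believes the process runs in AWS us-west-2.
               How that is determined is outside the scope of this sketch.
    override:  an explicit user choice; when given, it always wins.
    """
    if override is not None:
        if override not in VALID_PATHS:
            raise ValueError(f"unknown access path: {override!r}")
        return override
    # No explicit choice: direct S3 reads in-region, HTTPS otherwise.
    return "direct" if in_region else "https"
```

Keeping the user override ahead of any automatic choice means the function still behaves predictably when the in-region flag is wrong or unavailable.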

Comment thread paper/paper.md
open-source tools -- `python-cmr` for search, `fsspec` and `s3fs` for file I/O,
VirtualiZarr and kerchunk for virtual datasets -- rather than reimplementing their
functionality. The library's unique contribution is the NASA-specific integration
layer that binds these tools together.

Member

This is the awesomeness: integrating and simplifying the steps a scientist usually does when working with NASA data. Maybe we could add an example of the time-to-science reduction, both in lines of code and in speed through performance optimizations via fsspec and VirtualiZarr. TEMPO or ICESat-2 could be used for this: before N minutes, now N seconds; before 10 lines of code, now 1.
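
As a rough illustration of the "fewer lines of code" point (not taken from the manuscript, and making no timing claims), the sketch below chains the three `earthaccess` calls a user would typically make: `login`, `search_data`, and `open`. It assumes `earthaccess` is installed and Earthdata Login credentials are configured. The wrapper name `open_granules`, its `ea` parameter (there only so the flow can be exercised without network access), and the example dataset values are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def open_granules(
    short_name: str,
    temporal: Tuple[str, str],
    count: int = 5,
    ea: Optional[Any] = None,
) -> List[Any]:
    """Authenticate, search, and open matching granules as file-like objects.

    ea: object exposing login/search_data/open. Defaults to the real
        earthaccess module, imported lazily (hypothetical convenience).
    """
    if ea is None:
        import earthaccess as ea  # needs Earthdata Login credentials at runtime

    ea.login()  # authenticate with NASA Earthdata Login
    results = ea.search_data(short_name=short_name, temporal=temporal, count=count)
    return ea.open(results)  # file-like objects usable by e.g. xarray


# Example call (requires network access and credentials, so it is commented out):
# files = open_granules("ATL06", ("2020-01-01", "2020-01-31"))
```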

Comment thread paper/paper.md
[@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025].

**Community adoption.** The library is a dependency of 230 public GitHub
repositories (as of 5 March 2026), spanning data analysis workflows, Jupyter-based tutorials, and

Member

Let's mention machine learning projects here, some of the projects using earthaccess do AI or ML workflows even at production scale.

Member

Each of these projects didn't have to reinvent the wheel to access NASA Earth data.

Member

Oh never mind, it's mentioned below.

@jhkennedy

Contributor

> Friendly ping for co-authors who haven't yet had a chance to review (or at least, approve): @andypbarrett @jhkennedy @jrbourbeau @battistowx @Sherwin-14 @betolink @chuckwondo
>
> Things have been coming together and I think we are getting close to a complete draft that's ready. Would be great to have everyone's eyes on it, even briefly, before we finalize. Could you each take a look in the next week or two?
>
> In particular, please confirm your name, affiliation, and ORCID are correct in the author list. And of course, all other comments welcome.
>
> If timing doesn't work, just comment as such so we know where things stand. Thanks!

Thanks for the ping @danielfromearth! I've been traveling the last few weeks and will finally make it back into the office on Monday. I'll have a look ASAP, but I suspect it's already in good shape judging from my quick glance here.

Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>
@danielfromearth danielfromearth dismissed stale reviews from betolink and itcarroll via 06ce30b May 1, 2026 18:10

@jhkennedy jhkennedy left a comment

Contributor

Well, it turns out I do have a bit of feedback 😊. I think it's in a very good place, and it really would be fine to submit with or without my feedback.

Other than the specific things discussed below, I have a pretty big concern around publishing this discussing the automatic cloud-detection logic. That's something we know is technically infeasible to do reliably and we've decided to rip out:
https://github.com/earthaccess-dev/earthaccess/blob/main/docs/governance/decisions/231-aws-us-west-2-checking-method.md

So I'd like to either not mention it or abstract that away in the manuscript language.

Since I have a lot of feedback, I could open a PR into this PR with how I'd resolve my comments, if that's easier. Just let me know.

Comment thread paper/paper.md
- name: "Booz Allen Hamilton, Inc., McLean, VA, USA"
  index: 8
  ror: 051rcp357
- name: "University of Alaska Fairbanks, Fairbanks, AK, USA"

Contributor

Suggested change
-  - name: "University of Alaska Fairbanks, Fairbanks, AK, USA"
+  - name: "Alaska Satellite Facility, Geophysical Institute, University of Alaska Fairbanks, Fairbanks, AK, USA"

Comment thread paper/paper.md
must now contend with two possible access paradigms, traditional HTTPS downloads and S3-based
access. These both may even occur within a single analysis workflow. During workshops organized by NASA
Openscapes [@nasa_openscapes; @lowndes2019], the need for simpler tools became evident.
`earthaccess` was created to address this gap: it provides uniform access to NASA

Contributor

Suggested change
- `earthaccess` was created to address this gap: it provides uniform access to NASA
+ `earthaccess` is a community project that was created to address this gap: it provides uniform access to NASA

We don't mention community at all until the end of the Software Design section, and we don't talk about the community aspect of developing this library, which I think is pretty integral to its success and would be nice to represent somewhere in the introduction.

Comment thread paper/paper.md
error, and DAAC-specific configurations further compound the challenge.

NASA's ongoing migration to the Earthdata Cloud adds further complexity, as researchers
must now contend with two possible access paradigms, traditional HTTPS downloads and S3-based

Contributor

I think this sentence should be moved up into the previous paragraph before (5) and (6), or made part of a standalone paragraph with (5) and (6).

Comment thread paper/paper.md
and decision-makers globally [@nasa_esds_data_metrics]. However, the complexity of the underlying data infrastructure
presents a significant barrier to scientific productivity. A typical data access workflow
requires a researcher to: (1) authenticate with NASA Earthdata Login; (2) discover
relevant datasets and granules through the CMR API; (3) parse metadata to obtain download

Contributor

Suggested change
- relevant datasets and granules through the CMR API; (3) parse metadata to obtain download
+ relevant datasets and granules through the CMR API; (3) parse metadata to obtain access

This is true for downloading, in-place HTTP access, or S3 "direct" access.

Comment thread paper/paper.md
Comment on lines +135 to +137
URLs; (4) manage HTTP sessions with tokens and redirect handling; (5) determine whether
data are hosted on-premises or in the Earthdata Cloud; and (6) obtain temporary AWS S3
credentials when accessing cloud-hosted data. Each step introduces opportunities for

Contributor

You only need to do (5) if you're doing (6)...

Comment thread paper/paper.md
Comment on lines +166 to +170
- **python-cmr** [@python_cmr] provides a Python wrapper around the CMR API for dataset
and granule queries. `earthaccess` builds on `python-cmr`, extending it with
DAAC-aware provider resolution, cloud-hosting filters, and rich result objects that
encapsulate metadata. However, `python-cmr` does not handle authentication, data
download, or cloud access -- the areas where researchers face many workflow difficulties.

Contributor

We should also call out asf_search. It sits between `python-cmr` and `earthaccess`: it is focused on search and discovery but also handles auth, etc. It is, however, primarily focused on SAR data, so it has domain-specific tools and functionality added to it.

It was started 2 months before `earthaccess` and came out of the same needs and problems, but with a different focus.

Comment thread paper/paper.md
Comment on lines +181 to +183
- **earthdatalogin** [@earthdatalogin_r] provides similar authentication and access
functionality for the R programming ecosystem. The two projects share a common motivation and
serve as complementary tools for their respective language communities.

Contributor

🤔 are there other R/Julia things we should call out?

Comment thread paper/paper.md
NASA's Earth science data archive is one of the largest and most diverse collections of
Earth observation data in the world, used by over ten million researchers, educators,
and decision-makers globally [@nasa_esds_data_metrics]. However, the complexity of the underlying data infrastructure
presents a significant barrier to scientific productivity. A typical data access workflow

Contributor

I'm not sure I like how we've ordered the "data access workflow". Right now we have:

  1. auth
  2. "discover"
  3. parse metadata
  4. sessions + redirects
  5. is cloud?
  6. S3 credentials

I think (1) and (4) should be combined and indeed that's how we discuss it on L196
https://github.com/earthaccess-dev/earthaccess/pull/1249/changes#diff-e504eb580b095a7e65428b098183a581e475f0fb316db95287eacd7d4f344424R196

Similarly, (5) and (6) are also optional and only for in-place cloud access with performance constraints or if you want to use S3 aware tools, and really, fit into (1) and (4) as well, which is also discussed this way on L196.

I also think (2) is better described as "search" and (2) + (3) is what I would call discovery. At least for me, I am always parsing metadata as part of what I'd call discovery -- typically searching broadly and then refining with sensor/bands/variable/etc, so that I end up with the actual set of granules I want to use in my workflow. I don't really see why getting the access URLs is special compared to getting any of the other metadata along the way.

We don't talk about data preparation at all, except as features of Harmony and Icepyx, which seems like a missed opportunity.

I think I'd restructure this like:

  1. Discovery
  2. Auth (EDL, S3, Sessions + redirects)
  3. Access
  4. Data prep (includes virtual datasets and transformations)

which is similar to the Software Design section. Note, I've put auth after discovery since you generally only need it to access data, unless you're trying to discover restricted datasets... so it could go before or after discovery. I think it just flows a little better narratively after, but 🤷.

Comment thread paper/paper.md

3. **Access**: Attempts to detect at runtime whether the process is running within AWS `us-west-2`
and automatically selects the optimal access path -- direct S3 reads for in-region
access or HTTPS downloads otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible

Contributor

Suggested change
- access or HTTPS downloads otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible
+ access or HTTPS access otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible

You can download or stream via HTTPS

Comment thread paper/paper.md

# AI usage disclosure

No generative AI tools were used in the development of the `earthaccess` software; all architectural and design decisions were made exclusively by the authors and contributors.

Contributor

Hmm, is this true anymore? @betolink have you been using Claude for the VirtualiZarr work?

I wonder if we need to adopt an AI policy and say something like "...developers may use AI tools but are responsible for their contributions...".
