JOSS paper preparation by danielfromearth · Pull Request #1249 · earthaccess-dev/earthaccess

danielfromearth · 2026-03-05T21:51:40Z

Manuscript draft

This PR is intended for revisions and improvements to the manuscript draft being prepared for submission to the Journal of Open Source Software (JOSS).

Paper format: The manuscript is prepared as a Markdown (paper.md) file with references in a paper.bib file, following the JOSS formatting guidelines.

For a PDF preview: With Docker installed locally, a PDF preview of the draft manuscript can be generated by running the following from the earthaccess root directory (as described in the Docker section of the JOSS guidelines):

docker run --rm \
    --volume $PWD/paper:/data \
    --user $(id -u):$(id -g) \
    --env JOURNAL=joss \
    openjournals/inara
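
For anyone who would rather script the preview step, here is a minimal sketch that builds the same Docker invocation as an argument list. The helper name `inara_command` is hypothetical and not part of the repository; the flags and image name are copied from the command above.

```python
from pathlib import Path


def inara_command(repo_root: str, uid: int, gid: int) -> list[str]:
    """Build the argv for the Docker invocation quoted above.

    `repo_root` is the earthaccess root directory; `uid` and `gid` are the
    user and group IDs that should own the generated PDF.
    """
    paper_dir = Path(repo_root).resolve() / "paper"
    return [
        "docker", "run", "--rm",
        "--volume", f"{paper_dir}:/data",   # mount the paper/ directory
        "--user", f"{uid}:{gid}",           # avoid root-owned output files
        "--env", "JOURNAL=joss",
        "openjournals/inara",
    ]

# Usage (requires Docker; os.getuid/os.getgid are POSIX-only):
#   subprocess.run(inara_command(os.getcwd(), os.getuid(), os.getgid()), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if the repository path contains spaces.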

📚 Documentation preview 📚: https://earthaccess--1249.org.readthedocs.build/en/1249/


github-actions · 2026-03-05T21:51:51Z

👈 Launch a binder notebook on this branch for commit 16fb7b9

I will automatically update this comment whenever this PR is modified


jules32

Hi! Great work on this Danny! A few commits and some suggestions to consider.


mfisher87 · 2026-03-05T22:10:04Z

We could symlink this in to our docs!

@mfisher87, want to create an issue for it?

mfisher87 · 2026-03-06T16:06:10Z

after the v1.0.0 release

I would say let's not wait. We've demonstrated impact and I think that matters more.

Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.

I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.


danielfromearth · 2026-03-06T21:48:24Z

after the v1.0.0 release

I would say let's not wait. We've demonstrated impact and I think that matters more.

Alternatively, let's just go 1.0.0 in the short term and be OK with quickly moving to a 2.0.0 release with breaking changes.

I think both are fine, but the latter sets more of a precedent of maintainers taking the user impact of breaking changes too lightly.

I'm fine with either too. I also think the decision could be on hold until one of the two things – (i) co-author reviews/revisions, (ii) development for v1.0.0 – is completely ready-to-go.

done


danielfromearth · 2026-04-17T14:27:04Z

Hey all, it's been a couple weeks since activity here, so pinging to keep this moving. Would be great to have a complete draft ready to submit before Northern Hemisphere summer!

If there's not a specific note next to your username, a general read-through and comments are welcome:

@andypbarrett
@betolink (there's also this specific question about region-detection)
@chuckwondo (there's also this specific question about region-detection)
@jhkennedy
@jrbourbeau
@battistowx
@Sherwin-14
@jules32 (there's a placeholder to fill in the Openscapes award number)
@asteiker (there are a few follow-ups to your previous comments to address)
@JessicaS11 (there are a few follow-ups to your previous comments to address)

jules32

Thank you @danielfromearth ! I've added the award number. Thanks for leading this!

asteiker · 2026-04-17T22:48:22Z

+
+**Peer-reviewed publications.** `earthaccess` has been used in published research,
+including studies on multi-sensor drought observations in forested environments
+[@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025].


Cool, thanks!


danielfromearth · 2026-04-22T14:42:19Z

Friendly ping for co-authors who haven't yet had a chance to review (or at least, approve): @andypbarrett @jhkennedy @jrbourbeau @battistowx @Sherwin-14 @betolink @chuckwondo

Things have been coming together and I think we are getting close to a complete draft that's ready. Would be great to have everyone's eyes on it, even briefly, before we finalize. Could you each take a look in the next week or two?

In particular, please confirm your name, affiliation, and ORCID are correct in the author list. And of course, all other comments welcome.

If timing doesn't work, just comment as such so we know where things stand. Thanks!


mfisher87 · 2026-04-23T22:24:06Z

Related: https://earthaccess.zulipchat.com/#narrow/channel/480557-general/topic/JOSS.3F/with/590557057

We're considering / planning to go for pyOpenSci review first, which will give us a stronger review and expedite the JOSS acceptance process if the package is accepted by pyOpenSci.

Thanks @sampottinger for sharing this with me :)

betolink

I left some comments and suggestions but nothing major, I think this is a good draft so I'm approving as is. Thanks for leading the effort @danielfromearth

betolink · 2026-04-22T18:58:32Z

+
+3. **Access**: Attempts to detect at runtime whether the process is running within AWS `us-west-2`
+   and automatically selects the optimal access path -- direct S3 reads for in-region
+   access or HTTPS downloads otherwise. Users can manually specify an access path if needed.  Files can be opened as `fsspec`-compatible


I like the concise way of presenting this. Maybe we can add that being format-agnostic and compatible with Python file-like objects makes the library interoperable with the rest of the scientific Python ecosystem (aka PyData/Pangeo).

betolink · 2026-04-22T19:03:11Z

+open-source tools -- `python-cmr` for search, `fsspec` and `s3fs` for file I/O,
+VirtualiZarr and kerchunk for virtual datasets -- rather than reimplementing their
+functionality. The library's unique contribution is the NASA-specific integration
+layer that binds these tools together.


This is the awesomeness: integrating and simplifying the steps a scientist usually does when working with NASA data. Maybe we could add an example of the reduction in time to science, both in lines of code and in speed through performance optimizations via fsspec and VirtualiZarr. TEMPO or ICESat-2 can be used for this: before N minutes, now N seconds. Before 10 lines of code, now 1.
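
A minimal sketch of the kind of short example suggested in this comment, assuming the top-level `earthaccess` functions `login`, `search_data`, and `open`. The helper name `open_first_granules` and the sample arguments in the usage note are illustrative and not taken from the manuscript, and the sketch makes no claim about timing.

```python
def open_first_granules(short_name, temporal, bounding_box, count=5):
    """Search for granules and open them as file-like objects.

    temporal is a (start, end) pair of date strings; bounding_box is
    (west, south, east, north) in degrees.
    """
    # Imported lazily so this sketch can be loaded without earthaccess installed.
    import earthaccess

    # Authenticate with Earthdata Login (e.g. from ~/.netrc or environment variables).
    earthaccess.login()

    # Query CMR for matching granules, capped at `count` results.
    granules = earthaccess.search_data(
        short_name=short_name,
        temporal=temporal,
        bounding_box=bounding_box,
        count=count,
    )

    # Open the granules as fsspec-compatible file-like objects.
    return earthaccess.open(granules)
```

For instance, `open_first_granules("ATL06", ("2024-01-01", "2024-01-31"), (-46.5, 61.0, -42.5, 63.0))` would need valid Earthdata Login credentials and network access; the dataset, dates, and bounding box are placeholders.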

betolink · 2026-04-22T19:04:49Z

+[@andreadis2024] and tidal bore detection using SWOT satellite data [@arildsen2025].
+
+**Community adoption.** The library is a dependency of 230 public GitHub
+repositories (as of 5 March 2026), spanning data analysis workflows, Jupyter-based tutorials, and


Let's mention machine learning projects here, some of the projects using earthaccess do AI or ML workflows even at production scale.

Each of these projects didn't have to reinvent the wheel to access NASA Earth data.

Oh never mind, it's mentioned below.

jhkennedy · 2026-04-30T12:48:05Z

Friendly ping for co-authors who haven't yet had a chance to review (or at least, approve): @andypbarrett @jhkennedy @jrbourbeau @battistowx @Sherwin-14 @betolink @chuckwondo

Things have been coming together and I think we are getting close to a complete draft that's ready. Would be great to have everyone's eyes on it, even briefly, before we finalize. Could you each take a look in the next week or two?

In particular, please confirm your name, affiliation, and ORCID are correct in the author list. And of course, all other comments welcome.

If timing doesn't work, just comment as such so we know where things stand. Thanks!

Thanks for the ping @danielfromearth! I've been traveling the last few weeks and will finally make it back in to the office on Monday. I'll have a look ASAP, but I suspect it's already in good shape judging from my quick glance here.


jhkennedy

Well, it turns out I do have a bit of feedback 😊 . I think it's in a very good place, and really would be fine to submit with or without my feedback.

Other than the specific things discussed below, I have a pretty big concern about publishing this with a discussion of the automatic cloud-detection logic. That's something we know is technically infeasible to do reliably and we've decided to rip out:
https://github.com/earthaccess-dev/earthaccess/blob/main/docs/governance/decisions/231-aws-us-west-2-checking-method.md

So I'd like to either not mention it or abstract that away in the manuscript language.

Since I have a lot of feedback, I could open a PR into this PR with how I'd resolve my comments, if that's easier. Just let me know.

jhkennedy · 2026-05-05T18:25:57Z

+ - name: "Booz Allen Hamilton, Inc., McLean, VA, USA"
+   index: 8
+   ror: 051rcp357
+ - name: "University of Alaska Fairbanks, Fairbanks, AK, USA"


Suggested change

- name: "University of Alaska Fairbanks, Fairbanks, AK, USA"

- name: "Alaska Satellite Facility, Geophysical Institute, University of Alaska Fairbanks, Fairbanks, AK, USA"

jhkennedy · 2026-05-05T18:28:58Z

+must now contend with two possible access paradigms, traditional HTTPS downloads and S3-based
+access. These both may even occur within a single analysis workflow. During workshops organized by NASA
+Openscapes [@nasa_openscapes; @lowndes2019], the need for simpler tools became evident.
+`earthaccess` was created to address this gap: it provides uniform access to NASA


Suggested change

`earthaccess` was created to address this gap: it provides uniform access to NASA

`earthaccess` is a community project that was created to address this gap: it provides uniform access to NASA

We don't mention community at all until the end of the Software Design section, and we don't talk about the community aspect of developing this library at all, which I think is pretty integral to its success and would be nice to represent somewhere in the introduction.

jhkennedy · 2026-05-05T18:31:05Z

+error, and DAAC-specific configurations further compound the challenge.
+
+NASA's ongoing migration to the Earthdata Cloud adds further complexity, as researchers
+must now contend with two possible access paradigms, traditional HTTPS downloads and S3-based


I think this sentence should be moved up into the previous paragraph before (5) and (6), or made part of a standalone paragraph with (5) and (6).

jhkennedy · 2026-05-05T18:31:50Z

+and decision-makers globally [@nasa_esds_data_metrics]. However, the complexity of the underlying data infrastructure
+presents a significant barrier to scientific productivity. A typical data access workflow
+requires a researcher to: (1) authenticate with NASA Earthdata Login; (2) discover
+relevant datasets and granules through the CMR API; (3) parse metadata to obtain download


Suggested change

relevant datasets and granules through the CMR API; (3) parse metadata to obtain download

relevant datasets and granules through the CMR API; (3) parse metadata to obtain access

This is true for downloading, in-place HTTP access, or S3 "direct" access.

jhkennedy · 2026-05-05T18:33:00Z

+URLs; (4) manage HTTP sessions with tokens and redirect handling; (5) determine whether
+data are hosted on-premises or in the Earthdata Cloud; and (6) obtain temporary AWS S3
+credentials when accessing cloud-hosted data. Each step introduces opportunities for


You only need to do (5) if you're doing (6)...

jhkennedy · 2026-05-05T18:46:34Z

+- **python-cmr** [@python_cmr] provides a Python wrapper around the CMR API for dataset
+  and granule queries. `earthaccess` builds on `python-cmr`, extending it with
+  DAAC-aware provider resolution, cloud-hosting filters, and rich result objects that
+  encapsulate metadata. However, `python-cmr` does not handle authentication, data
+  download, or cloud access -- the areas where researchers face many workflow difficulties.


We should also call out asf_search. It sits in between `python-cmr` and `earthaccess`: it is focused on search and discovery but also handles auth, etc. It is, however, primarily focused on SAR data, so it has domain-specific tools and functionality added to it.

It was started 2 months before `earthaccess` and came out of the same needs and problems, but with a different focus.

jhkennedy · 2026-05-05T18:47:48Z

+- **earthdatalogin** [@earthdatalogin_r] provides similar authentication and access
+  functionality for the R programming ecosystem. The two projects share a common motivation and
+  serve as complementary tools for their respective language communities.


🤔 are there other R/Julia things we should call out?

jhkennedy · 2026-05-05T19:07:43Z

+NASA's Earth science data archive is one of the largest and most diverse collections of
+Earth observation data in the world, used by over ten million researchers, educators,
+and decision-makers globally [@nasa_esds_data_metrics]. However, the complexity of the underlying data infrastructure
+presents a significant barrier to scientific productivity. A typical data access workflow


I'm not sure I like how we've ordered the "data access workflow". Right now we have:

1. auth
2. "discover"
3. parse metadata
4. sessions + redirects
5. is cloud?
6. S3 credentials

I think (1) and (4) should be combined and indeed that's how we discuss it on L196
https://github.com/earthaccess-dev/earthaccess/pull/1249/changes#diff-e504eb580b095a7e65428b098183a581e475f0fb316db95287eacd7d4f344424R196

Similarly, (5) and (6) are also optional and only for in-place cloud access with performance constraints or if you want to use S3 aware tools, and really, fit into (1) and (4) as well, which is also discussed this way on L196.

I also think (2) is better described as "search" and (2) + (3) is what I would call discovery. At least for me, I am always parsing metadata as part of what I'd call discovery -- typically searching broadly and then refining with sensor/bands/variable/etc, so that I end up with the actual set of granules I want to use in my workflow. I don't really see why getting the access URLs are special compared to getting any of the other metadata along the way.

We don't talk about data preparation at all, except as features of Harmony and Icepyx, which seems like a missed opportunity.

I think I'd restructure this like:

1. Discovery
2. Auth (EDL, S3, Sessions + redirects)
3. Access
4. Data prep (includes virtual datasets and transformations)

which is similar to the Software Design section. Note, I've put auth after discovery since you generally only need it to access data, unless you're trying to discover restricted datasets... so it could go before or after discovery. I think it just flows a little better narratively after, but 🤷.
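
The proposed restructuring can be sketched as a small data model, which makes the point about optional steps concrete: S3 credentials only appear when direct S3 access is requested. The `Stage` class, the `access_workflow` function, and the task labels are all hypothetical names used for illustration; none of them come from `earthaccess` or the manuscript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the data access workflow and the tasks it covers."""
    name: str
    tasks: tuple[str, ...]


def access_workflow(direct_s3: bool) -> list[Stage]:
    """Return the four proposed stages, in narrative order.

    S3 credentials are only part of Auth when direct (in-place) S3 access
    is wanted; plain HTTPS access does not need them.
    """
    auth_tasks = ["Earthdata Login", "HTTP sessions and redirects"]
    if direct_s3:
        auth_tasks.append("temporary S3 credentials")

    access_tasks = ("direct S3 reads",) if direct_s3 else ("HTTPS download or streaming",)

    return [
        Stage("Discovery", ("search CMR", "refine using metadata")),
        Stage("Auth", tuple(auth_tasks)),
        Stage("Access", access_tasks),
        Stage("Data prep", ("virtual datasets", "transformations")),
    ]
```

Calling `access_workflow(False)` and `access_workflow(True)` yields the same four stage names; only the Auth and Access task lists differ.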

jhkennedy · 2026-05-05T20:07:07Z

+
+3. **Access**: Attempts to detect at runtime whether the process is running within AWS `us-west-2`
+   and automatically selects the optimal access path -- direct S3 reads for in-region
+   access or HTTPS downloads otherwise. Users can manually specify an access path if needed.  Files can be opened as `fsspec`-compatible


Suggested change

access or HTTPS downloads otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible

access or HTTPS access otherwise. Users can manually specify an access path if needed. Files can be opened as `fsspec`-compatible

You can download or stream via HTTPS

jhkennedy · 2026-05-05T20:13:53Z

+
+# AI usage disclosure
+
+No generative AI tools were used in the development of the `earthaccess` software; all architectural and design decisions were made exclusively by the authors and contributors.


Hmm, is this true anymore? @betolink have you been using Claude for the VirtualiZarr work?

I wonder if we need to adopt an AI policy and say something like "...developers may use AI tools but are responsible for their contributions...".

danielfromearth added 2 commits March 3, 2026 14:52

add initial manuscript drafting for Journal of Open Source Software (…

e709155

…JOSS)

add content and revise manuscript draft

16fb7b9

jules32 approved these changes Mar 5, 2026


danielfromearth and others added 4 commits March 6, 2026 09:48

Apply suggestions from code review

38cad6a

Co-authored-by: Julia Stewart Lowndes <julia@openscapes.org>

Apply revisions from @jules32 review comments

6af0701

add icepack example

767ad52

add blurb about contributing upstream

dce192c

mfisher87 previously requested changes Mar 6, 2026


Merge branch 'main' into joss-paper

ae74db7

danielfromearth mentioned this pull request Mar 6, 2026

Add entries to citation file #1254

Merged

Update affiliations per code review

05f7616

Co-authored-by: Matt Fisher <3608264+mfisher87@users.noreply.github.com>

danielfromearth changed the title ~~Joss paper~~ JOSS paper preparation Mar 6, 2026

add entries to authors and affiliations

bb5fd2f

danielfromearth requested a review from andypbarrett March 10, 2026 19:07

danielfromearth assigned betolink, itcarroll, chuckwondo, jhkennedy, jrbourbeau, JessicaS11, battistowx, asteiker and Sherwin-14 and unassigned betolink, itcarroll and chuckwondo Mar 10, 2026

JessicaS11 reviewed Apr 1, 2026


JessicaS11 and others added 3 commits April 1, 2026 09:44

minor updates to word choice in paper

0e89b5e

Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com> Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>

fix names of things and apply a few more suggested phrasing edits

5691bf8

Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com> Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>

docs(paper): add citation for NASA ESDS data metrics claim

1af1724

danielfromearth requested review from JessicaS11 and asteiker April 16, 2026 14:23

jules32 reviewed Apr 17, 2026

jules32 reviewed Apr 17, 2026

jules32 approved these changes Apr 17, 2026

asteiker previously approved these changes Apr 17, 2026

Apply suggestions from code review

81b1384

Co-authored-by: Daniel Kaufman <114174502+danielfromearth@users.noreply.github.com> Co-authored-by: Julia Stewart Lowndes <julia@openscapes.org> Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>

danielfromearth dismissed asteiker’s stale review via 81b1384 April 20, 2026 19:28

JessicaS11 reviewed Apr 22, 2026

JessicaS11 previously approved these changes Apr 22, 2026

Update contributors listing per @JessicaS11 comment

13a9c14

Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>

danielfromearth dismissed JessicaS11’s stale review via 13a9c14 April 22, 2026 14:43

itcarroll previously approved these changes Apr 22, 2026


Merge branch 'main' into joss-paper

ce0ada4

asteiker mentioned this pull request Apr 22, 2026

Consider creating and adding an "output register" to Readthedocs #1216

Open

Sherwin-14 approved these changes Apr 23, 2026


Merge branch 'main' into joss-paper

77428e6

betolink previously approved these changes Apr 29, 2026


Apply suggestions from code review

06ce30b

Co-authored-by: Jessica Scheick <JessicaS11@users.noreply.github.com>

danielfromearth dismissed stale reviews from betolink and itcarroll via 06ce30b May 1, 2026 18:10

jhkennedy reviewed May 5, 2026
