Static diagnostics for library() and require() calls #870

Merged
lionel- merged 31 commits into main from feature/static-imports
Jul 29, 2025

Conversation

lionel-
Contributor

@lionel- lionel- commented Jul 11, 2025

Addresses posit-dev/positron#1325
Progress towards posit-dev/positron#2321

This PR partially fixes the issue of "unknown symbols" diagnostics in fresh console sessions by moving towards static analysis of library() and require() calls:

  • We analyse DESCRIPTION (for the Depends: field) and NAMESPACE (for export() directives) files.

  • Exported symbols are put in scope at the call site of library() and require() calls.
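As a rough illustration of the DESCRIPTION side of this analysis, here is a minimal sketch of reading the `Depends:` field from a single-record DCF file. It is not the parser from this PR: the function names `parse_dcf` and `parse_depends` are hypothetical, and the sketch assumes a well-formed file where continuation lines start with whitespace.

```rust
use std::collections::HashMap;

/// Parse a single-record DCF document (such as an installed `DESCRIPTION`)
/// into a map of field name to value. Lines starting with whitespace are
/// treated as continuations of the previous field.
fn parse_dcf(contents: &str) -> HashMap<String, String> {
    let mut fields: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;

    for line in contents.lines() {
        if line.starts_with(' ') || line.starts_with('\t') {
            // Continuation line: append to the field seen last
            if let Some(name) = &current {
                if let Some(value) = fields.get_mut(name) {
                    value.push(' ');
                    value.push_str(line.trim());
                }
            }
        } else if let Some((name, value)) = line.split_once(':') {
            // New field: everything after the first colon is the value
            let name = name.trim().to_string();
            fields.insert(name.clone(), value.trim().to_string());
            current = Some(name);
        }
    }

    fields
}

/// Extract package names from a `Depends:` value, dropping version
/// constraints such as `(>= 1.0)` and the `R` pseudo-dependency.
fn parse_depends(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(|dep| dep.split('(').next().unwrap_or("").trim())
        .filter(|name| !name.is_empty() && *name != "R")
        .map(String::from)
        .collect()
}
```

Calling `parse_depends()` on the `Depends` entry returned by `parse_dcf()` yields the packages that should be attached alongside the one named in the `library()` call.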

This takes care of most unknown symbol diagnostics but not all:

Since this mechanism is currently limited and is a new approach, we still use the current dynamic approach as a fallback. This means the same gestures Positron users currently use to silence diagnostics (such as evaluating library() calls) still work as before.

This also means we are in a weird in-between state where diagnostics are not fully static, unless the session is 100% fresh. Once the limitations of the static diagnostics have been lifted, I think we should remove the dynamic fallback. The UX consequences of removing this fallback are discussed in posit-dev/positron#2321 (comment).

Approach:

We now examine package files installed in the session's library paths.

  • New Description and Namespace structs with parse() methods. For DESCRIPTION we implement our own DCF parser. For NAMESPACE we use a TS query for convenience, using the TSQuery helper implemented in Emit R6Class methods as workspace symbols #861.

  • New Library and Package structs with load() methods. A library is loaded from a set of library paths, and a package is loaded from a single library path.

    The packages in a library are loaded lazily and cached in the library. For simplicity, packages are not invalidated when installed files change. In the future, once we have Salsa and the VFS infrastructure from Rust-Analyzer, we will be able to watch for changes and automatically cache updates in a simple and efficient way.

  • .libPaths() is called at the start of the session. This is a static value that doesn't change throughout the session. When the LSP is decoupled we'll call R to get the lib paths and this will be static as well. If the lib paths change, the LSP must be restarted.

    Side note: I'm realising that the decoupled LSP will generally require an R binary in order to work well. This is similar to Rust-Analyzer requiring cargo to e.g. fetch metadata, so I no longer think this is a problem.

  • When a library() or require() call is encountered, we get the package from the library. This causes the package to be loaded if it hasn't been loaded yet. We get the exports from the namespace file to put them in scope at that point in the file, and the depends field from the description file to attach other needed packages.

  • The symbols exported by a package are stored in a BTreeMap keyed by the position of the attaching call in the file, so entries are sorted by position. When we look up whether a symbol is defined, we simply discard exports whose position is greater than the symbol's. We don't need to take masking or package ordering into account as we currently only need to check for existence of the symbol, not its type.
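The position-keyed lookup described in the last bullet can be sketched as follows. This is an illustration under assumptions rather than the code in this PR: `Scope`, `attach()` and `is_attached()` are hypothetical names, and positions are simplified to `(row, column)` tuples, whose lexicographic ordering matches document order.

```rust
use std::collections::{BTreeMap, HashSet};

/// (row, column) position in a file
type Position = (usize, usize);

#[derive(Default)]
struct Scope {
    /// Exports attached by each `library()` call, keyed by the call position
    library_symbols: BTreeMap<Position, HashSet<String>>,
}

impl Scope {
    /// Record the exports put in scope by a `library()` call at `at`
    fn attach(&mut self, at: Position, exports: &[&str]) {
        self.library_symbols
            .entry(at)
            .or_default()
            .extend(exports.iter().map(|s| s.to_string()));
    }

    /// Is `name` exported by a package attached at or before `use_pos`?
    /// Only existence matters, so masking and attach order are ignored.
    fn is_attached(&self, name: &str, use_pos: Position) -> bool {
        self.library_symbols
            .range(..=use_pos)
            .any(|(_, exports)| exports.contains(name))
    }
}
```

Using `range(..=use_pos)` lets the sorted map skip every `library()` call that comes after the symbol, which is equivalent to iterating in order and breaking at the first later position.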

Note that {tidyverse} and {tidymodels} don't declare packages in Depends:, instead they attach packages from .onAttach(). I've hard-coded them for now but in the longer term we need to nudge package authors towards an explicit declaration in DESCRIPTION, such as Config/Needs/attach:. I've opened an issue about this in tidyverse/tidyverse#359.
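For illustration, a hard-coded fallback for meta-packages could look roughly like the sketch below. The function name `attached_with` is hypothetical, and the package list is the set of core tidyverse packages as an assumption about what such a table might contain, not the list used in this PR. A similar entry would be needed for {tidymodels}.

```rust
/// Packages attached as a side effect of attaching `package`.
/// `depends` is the parsed `Depends:` field of its DESCRIPTION file.
fn attached_with(package: &str, depends: &[String]) -> Vec<String> {
    // Illustrative table for a meta-package that attaches from `.onAttach()`
    // instead of declaring `Depends:`. Assumed contents, not taken from the PR.
    const TIDYVERSE_CORE: &[&str] = &[
        "dplyr",
        "forcats",
        "ggplot2",
        "lubridate",
        "purrr",
        "readr",
        "stringr",
        "tibble",
        "tidyr",
    ];

    // Start from the statically declared dependencies
    let mut attached: Vec<String> = depends.to_vec();

    // Then add the hard-coded ones for known meta-packages
    if package == "tidyverse" {
        attached.extend(TIDYVERSE_CORE.iter().map(|name| name.to_string()));
    }

    attached
}
```

An explicit DESCRIPTION field such as the proposed Config/Needs/attach: would make this kind of table unnecessary, since the list could be read by the same parser that handles `Depends:`.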

QA Notes

With:

mutate(mtcars) # Not in scope

library(dplyr)

mutate(mtcars) # In scope


ggplot() # Not in scope

# Attach handled specially
library(tidyverse)

ggplot() # In scope


plan() # Not in scope

# `future` attached via `Depends`
library(furrr)

plan() # In scope

You should see:

Screenshot 2025-07-15 at 14 19 50

When you evaluate one of the library() calls, the corresponding diagnostics about unknown symbols before the library call should disappear. That would ideally not be the case, but for now we allow this as an escape hatch to work around shortcomings of the new system.

Edit: This should also work without any diagnostics (exported S4 classes and generics):

library(terra)
SpatExtent
rast()
add_legend()

We have backend tests for these various cases.

@@ -0,0 +1,16 @@
//
Contributor Author

I didn't end up using this here but will use it in the next PR for package imports.

Comment on lines 94 to 99
fn children_of(node: Self) -> impl Iterator<Item = Self>;
fn next_siblings(&self) -> impl Iterator<Item = Self>;
fn arguments(&self) -> impl Iterator<Item = (Option<Self>, Option<Self>)>;
fn arguments_values(&self) -> impl Iterator<Item = Self>;
fn arguments_names(&self) -> impl Iterator<Item = Self>;
fn arguments_names_as_string(&self, contents: &ropey::Rope) -> impl Iterator<Item = String>;
Contributor Author

Bunch of new helpers for iteration over TS nodes using the cursor API. I didn't end up using all of them but they could be useful later.

Comment on lines +210 to +217
// We'd ideally use the cursor API here too but
// `ts_tree_cursor_goto_parent()` doesn't behave like
// `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
// former. So for now we accept the performance hit of tree traversal at
// each `parent()` call.
Contributor Author

That was a bummer.

Contributor

Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.

Contributor Author

It mattered to unit tests and I did not want to dig in and see what assumptions/behaviour were broken. I'm sure nothing of importance is actually broken though. But since this is all going to be replaced by rowan, I did not bother looking into this further.

@lionel-
Contributor Author

lionel- commented Jul 15, 2025

Another thing to consider in the future is transitive imports.

File A:

library(ggplot2)

File B:

source("file_a.R")
ggplot() # In scope

The ggplot symbol is imported via source().

For now this will have to be worked around by evaluating the source() call or the relevant library() calls.

Contributor

@DavisVaughan DavisVaughan left a comment

Did a full deep code review and it seems quite nice and quite easy to understand!

I haven't done much interactive playing around with this in scripts yet, but I will and will report back if there are issues

Comment on lines +187 to +200
// FIXME: We shouldn't call R code in the kernel to figure this out
if let Err(err) = crate::r_task(|| -> anyhow::Result<()> {
    let paths: Vec<String> = harp::RFunction::new("base", ".libPaths")
        .call()?
        .try_into()?;

    log::info!("Using library paths: {paths:#?}");
    let paths: Vec<PathBuf> = paths.into_iter().map(PathBuf::from).collect();
    state.world.library = Library::new(paths);

    Ok(())
}) {
    log::error!("Can't evaluate `libPaths()`: {err:?}");
};
Contributor

What if you made this a KernelNotification that the kernel sends the LSP in refresh_lsp()?

Then you'd get updates to .libPaths() as well, like if a user changes it after startup, which is totally possible.

(This assumes you'd have a way to throw out and rebuild any information that utilized the libpaths when they change, but I figure we would have this)

Contributor

Yea ok after reviewing this in more detail, I feel like all we'd have to do after getting the kernel notif is update the WorldState's Library with a fresh new empty one created with Library::new(paths) and we'd be good to go?

Contributor

(I also really like the idea of having clear-ish boundaries about what information comes from the Kernel, rather than having the LSP try and query it itself)

Contributor Author

I initially had the same thought as you, but I'm no longer sure that would be the right place.

With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke R to figure out the .libPaths(). It's true that we could potentially also make it a dynamic LSP input from the kernel to get updates when run inside Positron. But it's probably not worth the complication. I think we should first see whether we can get away with a static libPath that is a constant for the duration of the session.

So the r_task() querying implemented here for convenience really stands for "invoke R to get .libPaths()". Squinting a bit you can compare this to rust-analyzer calling cargo to get information about the project.

Does that make sense?

Contributor

@DavisVaughan DavisVaughan Jul 23, 2025

> With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke R to figure out the .libPaths().

That seems impossible to me, as .libPaths() can be set dynamically by the user of the current R session at any time

IIUC you are suggesting that the LSP fire up an independent side car R session to get this info, which I think is not right, as it won't reflect the current state of the world.

Contributor Author

> IIUC you are suggesting that the LSP fire up an independent side car R session to get this info, which I think is not right, as it won't reflect the current state of the world.

yep I think this is exactly what it should do. It could also get updates from a running session, and that would be reasonable, but that shouldn't be required. If a project happens to have specific libPaths requirements that are not fulfilled by the R in the current environment, then this should be set in a way that can be statically figured out. Otherwise our static LSP will not be able to make sense of the project.

It sounds like that would be a good topic to discuss one on one.

Comment on lines +33 to +37
// Only consider libraries that have a folder named after the
// requested package and that contains a description file
if !description_path.is_file() {
    return Ok(None);
}
Contributor

You seem to also require a NAMESPACE below, so bail early here too?

Contributor Author

@lionel- lionel- Jul 24, 2025

hmm actually we probably should be robust to missing NAMESPACE files?

Edit: I went ahead and made NAMESPACE optional, with an info-level message when it's missing.

Comment on lines +900 to +904
fn insert_package_exports(
    package_name: &str,
    attach_pos: Point,
    context: &mut DiagnosticContext,
) -> anyhow::Result<Arc<Package>> {
Contributor

Yea I'm fairly certain we can and should just return anyhow::Result<&Package>? And that get() method really should return a reference, not an Arc.

We'd still clone out the package.namespace.exports, which is fine and seems required, but otherwise Package is immutable

Contributor Author

Can't do that for the reason mentioned in the other comment (sharing across threads).

Just to be clear, Package is immutable. An Arc confers shared ownership in the sense that it's only deallocated when all instances go out of scope. But the ownership is not in terms of being able to mutate it. Essentially all instances are like non-mutable references that you can share across threads without lifetime issues.
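To make the point about shared, immutable ownership concrete, here is a small sketch of a lazily populated package cache that hands out `Arc<Package>` values usable from other threads. Everything in it is an assumption for illustration: the field layout of `Package`, the `Loader` function pointer, and `stub_loader` (which stands in for reading DESCRIPTION and NAMESPACE from disk) are not taken from the PR.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, RwLock};
use std::thread;

/// Stand-in for the parsed contents of an installed package (hypothetical fields)
struct Package {
    name: String,
    exports: Vec<String>,
}

/// Hypothetical loader signature: find `name` in the given library paths
type Loader = fn(&[PathBuf], &str) -> Option<Package>;

/// Packages are loaded on first access and cached; entries are never invalidated
struct Library {
    paths: Vec<PathBuf>,
    loader: Loader,
    cache: RwLock<HashMap<String, Option<Arc<Package>>>>,
}

impl Library {
    fn new(paths: Vec<PathBuf>, loader: Loader) -> Self {
        Self { paths, loader, cache: RwLock::new(HashMap::new()) }
    }

    /// Return the cached package, loading it if needed. Two threads racing on
    /// the same name may both load it; the later insert wins, which is harmless
    /// because packages are immutable.
    fn get(&self, name: &str) -> Option<Arc<Package>> {
        if let Some(entry) = self.cache.read().unwrap().get(name) {
            return entry.clone();
        }
        let loaded = (self.loader)(&self.paths, name).map(Arc::new);
        self.cache.write().unwrap().insert(name.to_string(), loaded.clone());
        loaded
    }
}

/// Stub loader used for the example: knows a single package
fn stub_loader(_paths: &[PathBuf], name: &str) -> Option<Package> {
    match name {
        "dplyr" => Some(Package {
            name: name.to_string(),
            exports: vec!["mutate".to_string(), "filter".to_string()],
        }),
        _ => None,
    }
}

/// Hand the same immutable package to several worker threads
fn summaries_from_workers(pkg: Arc<Package>, n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let pkg = Arc::clone(&pkg);
            thread::spawn(move || format!("{}:{}", pkg.name, pkg.exports.len()))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Each `Arc::clone()` only bumps a reference count. The workers can read the package but have no way to mutate it, and no lifetime ties them to the `Library` that produced it, which is what a plain `&Package` could not offer across threads.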

Comment on lines +119 to +125
// Check all symbols exported by `library()` calls before the given position
for (library_position, exports) in self.library_symbols.iter() {
    if *library_position > start_position {
        break;
    }
    if exports.contains(name) {
        return true;
    }
Contributor

NEAT

@@ -5,8 +5,10 @@
//
Contributor

library(terra)
rast(r)

This doesn't work right now due to the custom namespace in terra

https://github.com/rspatial/terra/blob/master/NAMESPACE

  • export() is a comma separated list
  • exportClasses() is also a comma separated list for S4

Contributor Author

@lionel- lionel- Jul 25, 2025

Looks like multiple args in export(), import(), and importFrom() were already supported with the existing query!

I've added support for exportClasses() and exportMethods(). Surprisingly the latter also exports the generic!
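As a rough picture of what collecting these directives involves, here is a plain-text sketch that gathers names from `export()`, `exportClasses()` and `exportMethods()`, including comma-separated and multi-line argument lists. It deliberately does not use the tree-sitter query from the PR. The function name `parse_exports` is hypothetical, and the sketch assumes that `#` and parentheses never appear inside quoted names.

```rust
/// Collect names listed in `export()`, `exportClasses()` and `exportMethods()`
/// directives of a NAMESPACE file. Other directives are skipped.
fn parse_exports(namespace: &str) -> Vec<String> {
    // Drop comments, then treat the file as one string so that directives
    // spanning several lines are handled like single-line ones.
    let text: String = namespace
        .lines()
        .map(|line| line.split('#').next().unwrap_or(""))
        .collect::<Vec<_>>()
        .join(" ");

    let mut exports = Vec::new();
    let mut rest = text.as_str();

    while let Some(open) = rest.find('(') {
        // Text before the parenthesis is the directive name
        let directive = rest[..open].trim();
        let Some(close) = rest[open..].find(')') else {
            break;
        };
        let args = &rest[open + 1..open + close];

        if matches!(directive, "export" | "exportClasses" | "exportMethods") {
            exports.extend(
                args.split(',')
                    .map(|arg| arg.trim().trim_matches(|c: char| c == '"' || c == '\''))
                    .filter(|arg| !arg.is_empty())
                    .map(String::from),
            );
        }

        // Continue scanning after the closing parenthesis
        rest = &rest[open + close + 1..];
    }

    exports
}
```

Names found in `exportMethods()` are collected like any other export, which matches the observation above that this directive also makes the generic available.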

@lionel- lionel- force-pushed the feature/static-imports branch 3 times, most recently from 409190a to 8a2c58b Compare July 25, 2025 10:25
pub(crate) fn all_captures<'tree, 'query>(
    &'query mut self,
    node: tree_sitter::Node<'tree>,
    contents: &'query [u8],
Contributor

Do the contents not have their own 'contents lifetime? It's tied to the TSQuery object?

@lionel- lionel- force-pushed the feature/static-imports branch from 4b260ff to 411a694 Compare July 25, 2025 16:34
@lionel- lionel- force-pushed the feature/static-imports branch from 411a694 to 9f8277e Compare July 29, 2025 17:11
@lionel- lionel- merged commit 1e3b986 into main Jul 29, 2025
6 checks passed
@lionel- lionel- deleted the feature/static-imports branch July 29, 2025 17:12
@github-actions github-actions bot locked and limited conversation to collaborators Jul 29, 2025
2 participants