Static diagnostics for `library()` and `require()` calls #870

lionel- · 2025-07-11T16:31:22Z

Addresses posit-dev/positron#1325
Progress towards posit-dev/positron#2321

This PR partially fixes the issue of "unknown symbols" diagnostics in fresh console sessions by moving towards static analysis of library() and require() calls:

We analyse DESCRIPTION (for the Depends: field) and NAMESPACE (for export() directives) files.
Exported symbols are put in scope at the call site of library() and require() calls.

This takes care of most unknown symbol diagnostics but not all:

If the symbol is used before the library() call, it's still unknown and will still cause a diagnostic (this is expected behaviour).
exportPattern() directives are not supported yet (see "Static analysis of exportPattern() directives", positron#8520).
Exported data sets are not supported yet (see "Static analysis of data/ exports in R packages", positron#8521).

Since this mechanism is currently limited and is a new approach, we still use the current dynamic approach as a fallback. This means the same gestures Positron users currently use to silence diagnostics (such as evaluating library() calls) still work as before.

This also means we are in a weird in-between state where diagnostics are not fully static, unless the session is 100% fresh. Once the limitations of the static diagnostics have been lifted, I think we should remove the dynamic fallback. The UX consequences of removing this fallback are discussed in posit-dev/positron#2321 (comment).

Approach:

We now examine package files installed in the session's library paths.

New Description and Namespace structs with parse() methods. For DESCRIPTION we implement our own DCF parser. For NAMESPACE we use a TS query for convenience, using the TSQuery helper implemented in "Emit R6Class methods as workspace symbols" (#861).
New Library and Package structs with load() methods. A library is loaded from a set of library paths, and a package is loaded from a single library path.
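
To make the DESCRIPTION side concrete, here is a minimal sketch of a DCF reader that folds continuation lines and extracts package names from a `Depends:` field. It is an illustration under simplifying assumptions, not the parser in this PR: the names `parse_dcf` and `parse_depends` are hypothetical, only a single record is handled, and the sample DESCRIPTION content in the usage note is made up.

```rust
use std::collections::HashMap;

/// Parse a single DCF record into a map of field name -> value.
/// Lines starting with a space or tab continue the previous field.
/// (Hypothetical helper for illustration only.)
fn parse_dcf(input: &str) -> HashMap<String, String> {
    let mut fields: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;

    for line in input.lines() {
        if line.starts_with(|c: char| c == ' ' || c == '\t') {
            // Continuation line: append to the field we are currently in
            if let Some(name) = &current {
                if let Some(value) = fields.get_mut(name) {
                    value.push(' ');
                    value.push_str(line.trim());
                }
            }
            continue;
        }

        // `Field: value` line; split at the first colon only so that
        // values containing colons (e.g. URLs) stay intact
        if let Some((name, value)) = line.split_once(':') {
            let name = name.trim().to_string();
            fields.insert(name.clone(), value.trim().to_string());
            current = Some(name);
        }
    }

    fields
}

/// Extract package names from a `Depends:` value, dropping version
/// constraints such as `(>= 1.0.0)` and the `R` pseudo-dependency.
fn parse_depends(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(|dep| dep.split('(').next().unwrap_or("").trim().to_string())
        .filter(|dep| !dep.is_empty() && dep != "R")
        .collect()
}
```

For instance, feeding `parse_dcf()` a record whose `Depends:` field spans two continuation lines, `R (>= 3.4.0),` and `future (>= 1.25.0)`, and passing the resulting value to `parse_depends()` yields only `future`.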

The packages in a library are loaded lazily and cached in the library. For simplicity, packages are not invalidated when installed files change. In the future, once we have Salsa and the VFS infrastructure from Rust-Analyzer, we will be able to watch for changes and automatically cache updates in a simple and efficient way.
.libPaths() is called at the start of the session. This is a static value that doesn't change throughout the session. When the LSP is decoupled we'll call R to get the lib paths and this will be static as well. If the lib paths change, the LSP must be restarted.

Side note: I'm realising that the decoupled LSP will generally require an R binary in order to work well. This is similar to Rust-Analyzer requiring cargo to e.g. fetch metadata, so I no longer think this is a problem.
When a library() or require() call is encountered, we get the package from the library. This causes the package to load if it isn't loaded yet. We get the exports from the namespace file to put them in scope at that point in the file, and the depends field from the description file to attach other needed packages.
The symbols exported by a package are stored in a BTreeMap keyed by sorted positions in the file. When we look up whether a symbol is defined, we simply discard exports whose position is greater than the symbol's position. We don't need to take masking or package ordering into account as we currently only need to check for existence of the symbol, not its type.
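
The position-keyed lookup described in the last item can be sketched roughly as follows. This is a simplified illustration rather than the actual data structure in the PR: the `AttachedExports` type, the `(line, column)` tuple used as a position, and the method names are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, HashSet};

/// A position in a document as (line, column). Tuples compare
/// lexicographically, which matches document order.
type Position = (u32, u32);

/// Exports put in scope by `library()` / `require()` calls, keyed by the
/// position of the call. (Hypothetical type for illustration only.)
#[derive(Default)]
struct AttachedExports {
    by_position: BTreeMap<Position, HashSet<String>>,
}

impl AttachedExports {
    /// Record that `exports` become visible from position `at` onwards.
    fn attach(&mut self, at: Position, exports: impl IntoIterator<Item = String>) {
        self.by_position.entry(at).or_default().extend(exports);
    }

    /// Existence check only: is `symbol` exported by any package attached
    /// at or before `at`? Entries located after `at` are never visited.
    fn is_in_scope(&self, symbol: &str, at: Position) -> bool {
        self.by_position
            .range(..=at)
            .any(|(_, exports)| exports.contains(symbol))
    }
}
```

Because only existence matters here, the sketch does not record which package a symbol came from or which attachment masks which.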

Note that {tidyverse} and {tidymodels} don't declare packages in Depends:, instead they attach packages from .onAttach(). I've hard-coded them for now but in the longer term we need to nudge package authors towards an explicit declaration in DESCRIPTION, such as Config/Needs/attach:. I've opened an issue about this in tidyverse/tidyverse#359.
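
A hard-coded fallback of this kind could look roughly like the sketch below. The function name `attached_packages` is hypothetical, and the package lists are a deliberately incomplete, illustrative subset of what these meta-packages attach, not the lists used in the PR.

```rust
/// Packages to treat as attached by `library(package)`: the package itself,
/// its `Depends:` entries, and a hard-coded list for meta-packages that
/// attach from `.onAttach()`. (Hypothetical helper for illustration only.)
fn attached_packages(package: &str, depends: &[String]) -> Vec<String> {
    let mut attached = vec![package.to_string()];
    attached.extend(depends.iter().cloned());

    // Illustrative, incomplete subsets of what these packages attach
    let extra: &[&str] = match package {
        "tidyverse" => &["ggplot2", "dplyr", "tidyr"],
        "tidymodels" => &["parsnip", "recipes"],
        _ => &[],
    };
    attached.extend(extra.iter().map(|name| name.to_string()));

    attached
}
```

An explicit declaration in DESCRIPTION would let the `match` arm disappear, since the extra packages could then be read the same way as `Depends:`.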

QA Notes

With:

mutate(mtcars) # Not in scope

library(dplyr)

mutate(mtcars) # In scope


ggplot() # Not in scope

# Attach handled specially
library(tidyverse)

ggplot() # In scope


plan() # Not in scope

# `future` attached via `Depends`
library(furrr)

plan() # In scope

You should see:

When you evaluate one of the library() calls, the corresponding diagnostics about unknown symbols before the library call should disappear. That would ideally not be the case, but for now we allow this as an escape hatch to work around shortcomings of the new system.

Edit: This should also work without any diagnostics (exported S4 classes and generics):

library(terra)
SpatExtent
rast()
add_legend()

We have backend tests for these various cases.

lionel- · 2025-07-15T12:26:30Z

crates/ark/src/lsp/inputs/source_root.rs

@@ -0,0 +1,16 @@
+//


I didn't end up using this here but will use it in the next PR for package imports.

lionel- · 2025-07-15T12:28:17Z

crates/ark/src/lsp/traits/node.rs

+    fn children_of(node: Self) -> impl Iterator<Item = Self>;
+    fn next_siblings(&self) -> impl Iterator<Item = Self>;
+    fn arguments(&self) -> impl Iterator<Item = (Option<Self>, Option<Self>)>;
+    fn arguments_values(&self) -> impl Iterator<Item = Self>;
+    fn arguments_names(&self) -> impl Iterator<Item = Self>;
+    fn arguments_names_as_string(&self, contents: &ropey::Rope) -> impl Iterator<Item = String>;


Bunch of new helpers for iteration over TS nodes using the cursor API. I didn't end up using all of them but they could be useful later.

lionel- · 2025-07-15T12:28:40Z

crates/ark/src/lsp/traits/node.rs

+        // We'd ideally use the cursor API here too but
+        // `ts_tree_cursor_goto_parent()` doesn't behave like
+        // `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
+        // former. So for now we accept the performance hit of tree traversal at
+        // each `parent()` call.


That was a bummer.

Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.

It mattered to unit tests and I did not want to dig in and see what assumptions/behaviour were broken. I'm sure nothing of importance is actually broken though. But since this is all going to be replaced by rowan, I did not bother looking into this further.

lionel- · 2025-07-15T13:54:00Z

Another thing to consider in the future is transitive imports.

File A:

library(ggplot2)

File B:

source("file_a.R")
ggplot() # In scope

The ggplot symbol is imported via source().

For now this will have to be worked around by evaluating the source() call or the relevant library() calls.

DavisVaughan

Did a full deep code review and it seems quite nice and quite easy to understand!

I haven't done much interactive playing around with this in scripts yet, but I will and will report back if there are issues

DavisVaughan · 2025-07-17T16:42:11Z

crates/ark/src/lsp/main_loop.rs

+        // FIXME: We shouldn't call R code in the kernel to figure this out
+        if let Err(err) = crate::r_task(|| -> anyhow::Result<()> {
+            let paths: Vec<String> = harp::RFunction::new("base", ".libPaths")
+                .call()?
+                .try_into()?;
+
+            log::info!("Using library paths: {paths:#?}");
+            let paths: Vec<PathBuf> = paths.into_iter().map(PathBuf::from).collect();
+            state.world.library = Library::new(paths);
+
+            Ok(())
+        }) {
+            log::error!("Can't evaluate `libPaths()`: {err:?}");
+        };


What if you made this a KernelNotification that the kernel sends the LSP in refresh_lsp()?

Then you'd get updates to .libPaths() as well, like if a user changes it after startup, which is totally possible.

(This assumes you'd have a way to throw out and rebuild any information that utilized the libpaths when they change, but I figure we would have this)

Yea ok after reviewing this in more detail, I feel like all we'd have to do after getting the kernel notif is update the WorldState's Library with a fresh new empty one created with Library::new(paths) and we'd be good to go?

(I also really like the idea of having clear-ish boundaries about what information comes from the Kernel, rather than having the LSP try and query it itself)

I initially had the same thought as you, but I'm no longer sure that would be the right place.

With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke R to figure out the .libPaths(). It's true that we could potentially also make it a dynamic LSP input from the kernel to get updates when run inside Positron. But it's probably not worth the complication. I think we should first see whether we can get away with a static libPath that is a constant for the duration of the session.

So the r_task() querying implemented here for convenience really stands for "invoke R to get .libPaths()". Squinting a bit you can compare this to rust-analyzer calling cargo to get information about the project.

Does that make sense?

With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke R to figure out the .libPaths().

That seems impossible to me, as .libPaths() can be set dynamically by the user of the current R session at any time

IIUC you are suggesting that the LSP fire up an independent side car R session to get this info, which I think is not right, as it won't reflect the current state of the world.

yep I think this is exactly what it should do. It could also get updates from a running session, and that would be reasonable, but that shouldn't be required. If a project happens to have specific libPaths requirements that are not fulfilled by the R in the current environment, then this should be set in a way that can be statically figured out. Otherwise our static LSP will not be able to make sense of the project.

It sounds like that would be a good topic to discuss one on one.

DavisVaughan · 2025-07-17T16:48:47Z

crates/ark/src/lsp/inputs/package.rs

+        // Only consider libraries that have a folder named after the
+        // requested package and that contains a description file
+        if !description_path.is_file() {
+            return Ok(None);
+        }


You seem to also require a NAMESPACE below, so bail early here too?

hmm actually we probably should be robust to missing NAMESPACE files?

Edit: I went ahead and made NAMESPACE optional, with an info-level message when it's missing.
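
As a rough sketch of what that behaviour amounts to (not the code in this PR), a loader can require DESCRIPTION but tolerate a missing NAMESPACE. The `Package` struct below is a simplified stand-in that stores raw file contents, `load_package` is a hypothetical name, and `eprintln!` stands in for an info-level log call.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Simplified stand-in for a loaded package: raw file contents only.
#[derive(Debug)]
struct Package {
    description: String,
    namespace: Option<String>,
}

/// Load `name` from a single library path. Returns `Ok(None)` when the
/// library has no such package, i.e. no `<lib>/<name>/DESCRIPTION` file.
fn load_package(lib_path: &Path, name: &str) -> io::Result<Option<Package>> {
    let package_path = lib_path.join(name);

    // DESCRIPTION is mandatory: without it this is not a package folder
    let description_path = package_path.join("DESCRIPTION");
    if !description_path.is_file() {
        return Ok(None);
    }
    let description = fs::read_to_string(&description_path)?;

    // NAMESPACE is optional: report it and carry on with no exports
    let namespace_path = package_path.join("NAMESPACE");
    let namespace = if namespace_path.is_file() {
        Some(fs::read_to_string(&namespace_path)?)
    } else {
        eprintln!("info: package `{name}` has no NAMESPACE file");
        None
    };

    Ok(Some(Package {
        description,
        namespace,
    }))
}
```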

DavisVaughan · 2025-07-17T16:55:56Z

crates/ark/src/lsp/traits/node.rs

+        // We'd ideally use the cursor API here too but
+        // `ts_tree_cursor_goto_parent()` doesn't behave like
+        // `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
+        // former. So for now we accept the performance hit of tree traversal at
+        // each `parent()` call.


Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.

crates/ark/src/lsp/inputs/library.rs