Static diagnostics for `library()` and `require()` calls #870
3600378 to 391138e
@@ -0,0 +1,16 @@
//
I didn't end up using this here but will use it in the next PR for package imports.
crates/ark/src/lsp/traits/node.rs
Outdated
fn children_of(node: Self) -> impl Iterator<Item = Self>;
fn next_siblings(&self) -> impl Iterator<Item = Self>;
fn arguments(&self) -> impl Iterator<Item = (Option<Self>, Option<Self>)>;
fn arguments_values(&self) -> impl Iterator<Item = Self>;
fn arguments_names(&self) -> impl Iterator<Item = Self>;
fn arguments_names_as_string(&self, contents: &ropey::Rope) -> impl Iterator<Item = String>;
Bunch of new helpers for iteration over TS nodes using the cursor API. I didn't end up using all of them but they could be useful later.
// We'd ideally use the cursor API here too but
// `ts_tree_cursor_goto_parent()` doesn't behave like
// `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
// former. So for now we accept the performance hit of tree traversal at
// each `parent()` call.
That was a bummer.
Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.
It mattered to unit tests and I did not want to dig in and see what assumptions/behaviour were broken. I'm sure nothing of importance is actually broken though. But since this is all going to be replaced by rowan, I did not bother looking into this further.
Another thing to consider in the future is transitive imports.
File A:
library(ggplot2)
File B:
source("file_a.R")
ggplot() # In scope
The For now this will have to be worked around by evaluating the
Did a full deep code review and it seems quite nice and quite easy to understand!
I haven't done much interactive playing around with this in scripts yet, but I will and will report back if there are issues
// FIXME: We shouldn't call R code in the kernel to figure this out
if let Err(err) = crate::r_task(|| -> anyhow::Result<()> {
    let paths: Vec<String> = harp::RFunction::new("base", ".libPaths")
        .call()?
        .try_into()?;

    log::info!("Using library paths: {paths:#?}");
    let paths: Vec<PathBuf> = paths.into_iter().map(PathBuf::from).collect();
    state.world.library = Library::new(paths);

    Ok(())
}) {
    log::error!("Can't evaluate `libPaths()`: {err:?}");
};
What if you made this a `KernelNotification` that the kernel sends the LSP in `refresh_lsp()`?
Then you'd get updates to `.libPaths()` as well, like if a user changes it after startup, which is totally possible.
(This assumes you'd have a way to throw out and rebuild any information that utilized the libpaths when they change, but I figure we would have this)
Yea ok after reviewing this in more detail, I feel like all we'd have to do after getting the kernel notif is update the `WorldState`'s `Library` with a fresh new empty one created with `Library::new(paths)` and we'd be good to go?
(I also really like the idea of having clear-ish boundaries about what information comes from the Kernel, rather than having the LSP try and query it itself)
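For illustration, the suggestion above could be sketched roughly as follows. Everything here is a simplified stand-in: the `DidChangeLibPaths` variant is hypothetical and not an existing ark message, and `Library` and `WorldState` are reduced to the fields needed to show the swap.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the library type discussed above.
#[derive(Debug, Default, PartialEq)]
struct Library {
    paths: Vec<PathBuf>,
    /// Names of packages cached so far (simplified).
    loaded: Vec<String>,
}

impl Library {
    fn new(paths: Vec<PathBuf>) -> Self {
        Self { paths, loaded: Vec::new() }
    }
}

#[derive(Default)]
struct WorldState {
    library: Library,
}

/// Hypothetical notification variant, assumed for this sketch only.
enum KernelNotification {
    DidChangeLibPaths(Vec<PathBuf>),
}

fn handle_notification(state: &mut WorldState, notification: KernelNotification) {
    match notification {
        KernelNotification::DidChangeLibPaths(paths) => {
            // Swap in a fresh, empty library; packages would reload lazily on demand
            state.library = Library::new(paths);
        }
    }
}
```

Dropping the whole `Library` sidesteps per-package invalidation, at the cost of reloading packages that did not change.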
I initially had the same thought as you, but I'm no longer sure that would be the right place.
With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke `R` to figure out the `.libPaths()`. It's true that we could potentially also make it a dynamic LSP input from the kernel to get updates when run inside Positron. But it's probably not worth the complication. I think we should first see whether we can get away with a static libPath that is a constant for the duration of the session.
So the `r_task()` querying implemented here for convenience really stands for "invoke R to get `.libPaths()`". Squinting a bit you can compare this to rust-analyzer calling `cargo` to get information about the project.
Does that make sense?
> With an independent kernel, what should ideally happen is that the LSP, not the kernel, would invoke R to figure out the .libPaths().

That seems impossible to me, as `.libPaths()` can be set dynamically by the user of the current R session at any time.
IIUC you are suggesting that the LSP fire up an independent side car R session to get this info, which I think is not right, as it won't reflect the current state of the world.
> IIUC you are suggesting that the LSP fire up an independent side car R session to get this info, which I think is not right, as it won't reflect the current state of the world.

yep I think this is exactly what it should do. It could also get updates from a running session, and that would be reasonable, but that shouldn't be required. If a project happens to have specific libPaths requirements that are not fulfilled by the `R` in the current environment, then this should be set in a way that can be statically figured out. Otherwise our static LSP will not be able to make sense of the project.
It sounds like that would be a good topic to discuss one on one.
// Only consider libraries that have a folder named after the
// requested package and that contains a description file
if !description_path.is_file() {
    return Ok(None);
}
You seem to also require a NAMESPACE below, so bail early here too?
hmm actually we probably should be robust to missing `NAMESPACE` files?
Edit: I went ahead and made `NAMESPACE` optional, with an info-level message when it's missing.
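As a rough sketch of the behaviour described in this thread (DESCRIPTION required, NAMESPACE optional), under the assumption of a simplified `Package` that only holds raw file contents. The `load_package` name and its signature are hypothetical, and `eprintln!` stands in for the info-level log message.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, simplified package record holding raw file contents.
#[derive(Debug)]
struct Package {
    description: String,
    /// `None` when the package has no NAMESPACE file.
    namespace: Option<String>,
}

/// Tries to load `name` from a single library path.
/// Returns `Ok(None)` when the folder is not an installed package.
fn load_package(lib_path: &Path, name: &str) -> io::Result<Option<Package>> {
    let package_path = lib_path.join(name);
    let description_path = package_path.join("DESCRIPTION");
    let namespace_path = package_path.join("NAMESPACE");

    // DESCRIPTION is required: without it this is not a package
    if !description_path.is_file() {
        return Ok(None);
    }
    let description = fs::read_to_string(&description_path)?;

    // NAMESPACE is optional: fall back to "no exports" instead of failing
    let namespace = if namespace_path.is_file() {
        Some(fs::read_to_string(&namespace_path)?)
    } else {
        eprintln!("Package `{name}` has no NAMESPACE file");
        None
    };

    Ok(Some(Package { description, namespace }))
}
```

A caller iterating over several library paths would presumably take the first path for which this returns `Some`.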
fn insert_package_exports(
    package_name: &str,
    attach_pos: Point,
    context: &mut DiagnosticContext,
) -> anyhow::Result<Arc<Package>> {
Yea I'm fairly certain we can and should just return `anyhow::Result<&Package>`? And that `get()` method really should return a reference, not an `Arc`.
We'd still clone out the `package.namespace.exports`, which is fine and seems required, but otherwise `Package` is immutable.
Can't do that for the reason mentioned in the other comment (sharing across threads).
Just to be clear, `Package` is immutable. An `Arc` confers shared ownership in the sense that it's only deallocated when all instances go out of scope. But the ownership is not in terms of being able to mutate it. Essentially all instances are like non-mutable references that you can share across threads without lifetime issues.
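To make the point about `Arc` concrete, here is a minimal sketch with a hypothetical `Library` cache and a stripped-down `Package`. It is not the PR's implementation: the `Mutex<HashMap<..>>` cache and the fake loading step are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical, simplified package record; immutable once built.
#[derive(Debug)]
struct Package {
    name: String,
    exports: Vec<String>,
}

/// Hypothetical library that loads packages lazily and caches them.
#[derive(Default)]
struct Library {
    cache: Mutex<HashMap<String, Arc<Package>>>,
}

impl Library {
    /// Returns a shared handle. Cloning the `Arc` bumps a reference count;
    /// the `Package` itself is never copied or mutated.
    fn get(&self, name: &str) -> Arc<Package> {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry(name.to_string())
            .or_insert_with(|| {
                // Stand-in for reading DESCRIPTION and NAMESPACE from disk
                Arc::new(Package {
                    name: name.to_string(),
                    exports: vec![format!("{name}_fn")],
                })
            })
            .clone()
    }
}

/// Looks up a package and reads it from another thread.
fn exports_from_worker(library: &Library, name: &str) -> Vec<String> {
    let package = library.get(name);
    // The `Arc` moves into the thread; no lifetime ties it to `library`
    thread::spawn(move || package.exports.clone())
        .join()
        .unwrap()
}
```

Returning `&Package` from `get()` would tie the result to the lock guard or to the library borrow, which is what makes handing it to another thread awkward.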
// Check all symbols exported by `library()` calls before the given position
for (library_position, exports) in self.library_symbols.iter() {
    if *library_position > start_position {
        break;
    }
    if exports.contains(name) {
        return true;
    }
NEAT
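For readers skimming the thread, the position-keyed lookup can be sketched in isolation like this. `Point` is a hypothetical stand-in for tree-sitter's type, and `LibrarySymbols`, `attach` and `is_in_scope` are names invented for the example. The sketch uses `BTreeMap::range` instead of the loop with an early `break`, which should visit the same entries.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for tree-sitter's `Point` (row, column).
/// Deriving `Ord` with the fields in this order gives document order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Point {
    row: usize,
    column: usize,
}

/// Exports attached by `library()` calls, keyed by call position.
#[derive(Default)]
struct LibrarySymbols {
    library_symbols: BTreeMap<Point, HashSet<String>>,
}

impl LibrarySymbols {
    /// Records the exports attached by a `library()` call at `at`.
    fn attach(&mut self, at: Point, exports: &[&str]) {
        self.library_symbols
            .entry(at)
            .or_default()
            .extend(exports.iter().map(|s| s.to_string()));
    }

    /// True if `name` is exported by a package attached at or before `start_position`.
    fn is_in_scope(&self, name: &str, start_position: Point) -> bool {
        // `range(..=pos)` visits only the entries at or before `pos`
        self.library_symbols
            .range(..=start_position)
            .any(|(_, exports)| exports.contains(name))
    }
}
```

Because only existence is checked, the order in which packages mask each other does not matter here.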
@@ -5,8 +5,10 @@
//
library(terra)
rast(r)
This doesn't work right now due to the custom namespace in terra: https://github.com/rspatial/terra/blob/master/NAMESPACE
- `export()` is a comma separated list
- `exportClasses()` is also a comma separated list for S4
Looks like multiple args in `export()`, `import()`, and `importFrom()` were already supported with the existing query!
I've added support for `exportClasses()` and `exportMethods()`. Surprisingly the latter also exports the generic!
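As a text-based illustration of which directives contribute exports, here is a small sketch. It deliberately does not use the tree-sitter query from the PR: `parse_exports` is a hypothetical helper that does not handle comments, `if` blocks or `exportPattern()`, and it assumes names contain no commas or parentheses.

```rust
use std::collections::BTreeSet;

/// Directives whose arguments become visible after `library()`.
/// Per the comment above, `exportMethods()` also exports the generic.
const EXPORT_DIRECTIVES: [&str; 3] = ["export", "exportClasses", "exportMethods"];

/// Collects the names listed in export directives of a NAMESPACE file.
fn parse_exports(namespace: &str) -> BTreeSet<String> {
    let mut exports = BTreeSet::new();

    for directive in EXPORT_DIRECTIVES {
        let needle = format!("{directive}(");
        let mut rest = namespace;

        while let Some(start) = rest.find(&needle) {
            // Skip matches that are the tail of a longer identifier
            let is_word_start = rest[..start]
                .chars()
                .next_back()
                .map_or(true, |c| !(c.is_alphanumeric() || c == '_' || c == '.'));

            let after = &rest[start + needle.len()..];
            let Some(end) = after.find(')') else { break };

            if is_word_start {
                for arg in after[..end].split(',') {
                    // Drop whitespace and optional quotes around the name
                    let name = arg
                        .trim()
                        .trim_matches(|c: char| matches!(c, '"' | '\'' | '`'));
                    if !name.is_empty() {
                        exports.insert(name.to_string());
                    }
                }
            }
            // Continue scanning after this directive's closing parenthesis
            rest = &after[end..];
        }
    }

    exports
}
```

Searching for the directive name together with its opening parenthesis keeps `export(` from matching inside `exportClasses(` or `exportPattern(`.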
409190a to 8a2c58b
crates/ark/src/treesitter.rs
Outdated
pub(crate) fn all_captures<'tree, 'query>(
    &'query mut self,
    node: tree_sitter::Node<'tree>,
    contents: &'query [u8],
Do the contents not have their own `'contents` lifetime? It's tied to the `TSQuery` object?
4b260ff to 411a694
411a694 to 9f8277e
Addresses posit-dev/positron#1325
Progress towards posit-dev/positron#2321
This PR partially fixes the issue of "unknown symbols" diagnostics in fresh console sessions by moving towards static analysis of `library()` and `require()` calls:

- We analyse `DESCRIPTION` (for the `Depends:` field) and `NAMESPACE` (for `export()` directives) files.
- Exported symbols are put in scope at the call site of `library()` and `require()` calls.

This takes care of most unknown symbol diagnostics but not all:

- If the symbol is used before the `library()` call, it's still unknown and will still cause a diagnostic (this is expected behaviour).
- `exportPattern()` directives are not supported yet (see "Static analysis of `exportPattern()` directives", positron#8520).
- Exported data sets are not supported yet (see "Static analysis of `data/` exports in R packages", positron#8521).

Since this mechanism is currently limited and is a new approach, we still use the current dynamic approach as a fallback. This means the same gestures Positron users currently use to silence diagnostics (such as evaluating `library()` calls) still work as before.

This also means we are in a weird in-between state where diagnostics are not fully static, unless the session is 100% fresh. Once the limitations of the static diagnostics have been lifted, I think we should remove the dynamic fallback. The UX consequences of removing this fallback are discussed in posit-dev/positron#2321 (comment).
Approach:

- We now examine package files installed in the session's library paths.
- New `Description` and `Namespace` structs with `parse()` methods. For DESCRIPTION we implement our own DCF parser. For NAMESPACE we use a TS query for convenience, using the `TSQuery` helper implemented in "Emit R6Class methods as workspace symbols" (#861).
- New `Library` and `Package` structs with `load()` methods. A library is loaded from a set of library paths, and a package is loaded from a single library path.
- The packages in a library are loaded lazily and cached in the library. For simplicity, packages are not invalidated when installed files change. In the future, once we have Salsa and the VFS infrastructure from Rust-Analyzer, we will be able to watch for changes and automatically cache updates in a simple and efficient way.
- `.libPaths()` is called at the start of the session. This is a static value that doesn't change throughout the session. When the LSP is decoupled we'll call `R` to get the lib paths and this will be static as well. If the lib paths change, the LSP must be restarted.
  Side note: I'm realising that the decoupled LSP will generally require an `R` binary in order to work well. This is similar to Rust-Analyzer requiring `cargo` to e.g. fetch metadata, so I no longer think this is a problem.
- When a `library()` or `require()` call is encountered, we get the package from the library. This causes it to load if not loaded yet. We get the exports from the namespace file to put them in scope at that point in the file, and the depends field from the description file to attach other needed packages.
- The symbols exported by a package are stored in a `BTreeMap` keyed by sorted positions in the file. When we look up whether a symbol is defined, we simply discard exports whose position is greater than the symbol's. We don't need to take masking or package ordering into account as we currently only need to check for existence of the symbol, not its type.
- Note that {tidyverse} and {tidymodels} don't declare packages in `Depends:`, instead they attach packages from `.onAttach()`. I've hard-coded them for now but in the longer term we need to nudge package authors towards an explicit declaration in DESCRIPTION, such as `Config/Needs/attach:`. I've opened an issue about this in tidyverse/tidyverse#359.

QA Notes
With:
You should see:
When you evaluate one of the `library()` calls, the corresponding diagnostics about unknown symbols before the library call should disappear. That would ideally not be the case, but for now we allow this as an escape hatch to work around shortcomings of the new system.

Edit: This should also work without any diagnostics (exported S4 classes and generics):

We have backend tests for these various cases.
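As an aside on the DESCRIPTION side of the approach, reading the `Depends:` field from a DCF file might look roughly like the sketch below. `parse_dcf` and `parse_depends` are hypothetical names and this is not the parser added in the PR: it assumes a single-record file and ignores edge cases such as comment lines.

```rust
use std::collections::HashMap;

/// Parses a single-record DCF document (such as DESCRIPTION) into fields.
/// Lines starting with whitespace continue the previous field.
fn parse_dcf(text: &str) -> HashMap<String, String> {
    let mut fields: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;

    for line in text.lines() {
        if line.trim().is_empty() {
            continue;
        }
        if line.starts_with(|c: char| c.is_whitespace()) {
            // Continuation line: append to the field being built
            if let Some(key) = &current {
                if let Some(value) = fields.get_mut(key) {
                    value.push(' ');
                    value.push_str(line.trim());
                }
            }
        } else if let Some((key, value)) = line.split_once(':') {
            current = Some(key.trim().to_string());
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }

    fields
}

/// Extracts package names from `Depends:`, dropping version constraints
/// and the `R` entry, which constrains the R version rather than a package.
fn parse_depends(fields: &HashMap<String, String>) -> Vec<String> {
    let Some(depends) = fields.get("Depends") else {
        return Vec::new();
    };

    depends
        .split(',')
        .filter_map(|entry| {
            // "ggplot2 (>= 3.0.0)" -> "ggplot2"
            let name = entry.split('(').next()?.trim();
            (!name.is_empty() && name != "R").then(|| name.to_string())
        })
        .collect()
}
```

The names returned by `parse_depends` are the packages that would be attached alongside the one named in the `library()` call.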