Commit bfea120

Lower memory usage for lazily-loaded files and faster go-to-implementation
1 parent 3e07c35 commit bfea120

7 files changed

Lines changed: 115 additions & 110 deletions


docs/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
5252
- **Diagnostic delivery model.** Editors that support pull diagnostics now get diagnostics on first file open without waiting for a debounce timer. Updates from external tools no longer re-run the entire native diagnostic pipeline.
5353
- **Virtual member resolution.** Mixins and virtual accessors are now resolved completely on every class, eliminating cases where they were missing after edits.
5454
- **Diagnostic code identifiers.** All diagnostic codes now use a consistent `snake_case` noun-phrase scheme: `unknown_variable`, `type_mismatch_argument`, `argument_count_mismatch`, `deprecated_usage`, `missing_implementation`. Users with editor filters matching on these codes will need to update them.
55+
- **Lower memory usage for lazily-loaded files.** Vendor and stub files no longer store per-file import tables and namespace maps after parsing, and go-to-implementation uses a dedicated reverse-inheritance index instead of scanning all parsed files.
5556
- **Lower memory usage for variable type tracking.**
5657
- **Updated embedded phpstorm-stubs.**
5758

docs/todo.md

Lines changed: 0 additions & 3 deletions
@@ -25,8 +25,6 @@ within the same impact tier.
2525

2626
| # | Item | Impact | Effort |
2727
| --- | ----------------------------------------------------------------------------------------------------------------------- | ----------- | ------ |
28-
| P13 | [Tiered storage: drop per-file maps for non-open files](todo/performance.md#p13-tiered-storage-drop-per-file-maps-for-non-open-files) | Medium-High | Medium-High |
29-
| P10 | [Redundant `parse_and_cache_file` from multiple threads](todo/performance.md#p10-redundant-parse_and_cache_file-from-multiple-threads) | Medium | Low |
3028
| D10 | [PHPMD diagnostic proxy](todo/diagnostics.md#d10-phpmd-diagnostic-proxy) | Low | Medium |
3129
| | **Release 0.8.0** | | |
3230

@@ -165,7 +163,6 @@ unlikely to move the needle for most users.
165163
| E7 | [Stub-based framework patches](todo/external-stubs.md#e7-stub-based-framework-patches) | Medium | Medium |
166164
| | **[Performance](todo/performance.md) / [Eager Resolution](todo/eager-resolution.md)** | | |
167165
| ER5 | [Mago-style separated metadata](todo/eager-resolution.md#er5--mago-style-separated-metadata) | High | High |
168-
| P13 | [Tiered storage: drop per-file maps for non-open files](todo/performance.md#p13-tiered-storage-drop-per-file-maps-for-non-open-files) | Medium-High | Medium-High |
169166
| P14 | [Eager docblock parsing into structured fields](todo/performance.md#p14-eager-docblock-parsing-into-structured-fields) | Medium | Medium |
170167
| P9 | [`resolved_class_cache` generic-arg specialisation](todo/performance.md#p9-resolved_class_cache-generic-arg-specialisation) | Medium | Medium |
171168
| P11 | [Uncached base-resolution in `build_scope_methods_for_builder`](todo/performance.md#p11-uncached-base-resolution-in-build_scope_methods_for_builder) | Low-Medium | Low |

docs/todo/performance.md

Lines changed: 1 addition & 78 deletions
@@ -263,83 +263,6 @@ and eliminates the blocking fallback during interactive use.
263263

264264
---
265265

266-
## P13. Tiered storage: drop per-file maps for non-open files
267-
268-
**Impact: Medium-High · Effort: Medium-High**
269-
270-
> **Note.** This item needs refinement when we work on it. The
271-
> codebase and feature set may change significantly before then.
272-
273-
Not every file needs the same data at runtime. Storage should be
274-
split into three tiers based on how the file is used:
275-
276-
| Data | Open files | Closed user files | Vendor files |
277-
| ----------------- | ---------- | ------------------- | ------------------- |
278-
| ClassInfo (full) | keep | keep (via fqn_index)| keep (via fqn_index)|
279-
| SymbolMap | keep | drop (on-demand for find-refs) | never |
280-
| use_map | keep | drop after index | drop after index |
281-
| namespace_map | keep | drop after index | drop after index |
282-
| parse_errors | keep | never | never |
283-
| ast_map entry | keep | drop (redundant with fqn_index) | drop |
284-
| fqn_index | keep | keep | keep |
285-
| class_index | keep | keep | keep |
286-
| GTI index (new) | keep | keep | keep |
287-
288-
**Key observations:**
289-
290-
- **SymbolMap is the biggest win.** Each SymbolMap stores a
291-
SymbolSpan for every symbol reference in the file, plus
292-
VarDefSite, CallSite, and scope data. A typical file with
293-
100-500 symbols is several KB. Across thousands of files this
294-
adds up to tens or hundreds of MB.
295-
296-
- **ast_map entries are redundant with fqn_index** once indexing
297-
is complete. The slow linear fallback in `find_class_in_ast_map`
298-
should not fire when fqn_index is fully populated. Go-to-definition
299-
can re-parse on demand using the file path from class_index.
300-
301-
- **Vendor files are rarely edited but can be diagnosed.** Users
302-
working in monorepos or with `--prefer-source` packages edit
303-
vendor files directly, and diagnostics run on any file open in
304-
the editor. Tiered storage must still keep enough data to
305-
support diagnostics for open vendor files, but non-open vendor
306-
files only need ClassInfo for type resolution and class_index
307-
for go-to-definition file lookup.
308-
309-
- **Go-to-implementation currently scans all ast_map entries.**
310-
A dedicated GTI index (parent FQN to list of child FQNs, built
311-
during indexing) would decouple it from ast_map and allow
312-
ast_map entries for non-open files to be dropped without
313-
breaking implementation search. GTI needs vendor data (to find
314-
chains through vendor classes) but only the parent/child
315-
relationship, not the full per-file maps.
316-
317-
- **Find-references only needs SymbolMaps for user code.** These
318-
could be built on demand (parse, scan, drop) rather than kept
319-
resident.
320-
321-
- **Analyse mode benefits from laziness.** It never loads vendor
322-
files that are not referenced by any user chain. LSP mode with
323-
full vendor indexing would load everything since it cannot
324-
predict what the user will type next. This makes the tiered
325-
cleanup more important for LSP than for analyse.
326-
327-
### Implementation sketch
328-
329-
1. Track which URIs are "open" (already done via `open_files`).
330-
2. On `did_close`, drop the SymbolMap, use_map, namespace_map,
331-
parse_errors, and ast_map entries for that URI. The fqn_index
332-
entry (Arc\<ClassInfo\>) stays.
333-
3. For vendor files, use `parse_and_cache_content` (not
334-
`update_ast`) so SymbolMaps are never created. After indexing,
335-
sweep vendor URIs out of ast_map/use_map/namespace_map.
336-
4. Build a dedicated GTI index during indexing so that
337-
`find_implementors` does not need ast_map.
338-
5. For find-references, build SymbolMaps on demand by re-parsing
339-
from disk.
340-
341-
---
342-
343266
## P14. Eager docblock parsing into structured fields
344267

345268
**Impact: Medium · Effort: Medium**
@@ -439,7 +362,7 @@ a `PhpVersion` parameter or build the filtered maps inline.
439362
Low priority. The current `RwLock` overhead is unmeasurable in
440363
practice (~10-20 ns per completion request). Worth revisiting if
441364
the stub indexes grow significantly or if `Backend` construction
442-
is restructured for other reasons (e.g. P13 tiered storage).
365+
is restructured for other reasons.
443366

444367
---
445368
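The first two steps of the removed P13 implementation sketch (track open URIs, drop per-file maps on close while keeping the FQN index) could look roughly like the following. This is a minimal sketch, not the project's code: `Store`, its fields, and the simplified map value types are hypothetical stand-ins for the real `Backend` state.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical stand-in for the real class metadata.
struct ClassInfo {
    fqn: String,
}

/// Hypothetical store: per-file maps keyed by URI, plus a global FQN index.
#[derive(Default)]
struct Store {
    open_files: HashSet<String>,
    symbol_maps: HashMap<String, Vec<String>>, // uri -> symbols (simplified)
    use_map: HashMap<String, HashMap<String, String>>, // uri -> alias -> FQN
    fqn_index: HashMap<String, Arc<ClassInfo>>, // FQN -> class metadata
}

impl Store {
    /// Opening a file keeps every tier of data resident.
    fn did_open(&mut self, uri: &str, class_fqn: &str) {
        self.open_files.insert(uri.to_string());
        self.symbol_maps
            .insert(uri.to_string(), vec![class_fqn.to_string()]);
        self.use_map.insert(uri.to_string(), HashMap::new());
        let info = Arc::new(ClassInfo { fqn: class_fqn.to_string() });
        self.fqn_index.insert(class_fqn.to_string(), info);
    }

    /// Closing a file drops its per-file maps; the FQN index entry stays,
    /// so type resolution for the class still works.
    fn did_close(&mut self, uri: &str) {
        self.open_files.remove(uri);
        self.symbol_maps.remove(uri);
        self.use_map.remove(uri);
    }
}
```

After `did_close`, only the `fqn_index` entry remains for the file's class, which matches the "keep (via fqn_index)" column of the table in the removed section.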

src/definition/implementation.rs

Lines changed: 41 additions & 20 deletions
@@ -625,28 +625,49 @@ impl Backend {
625625
// Track by FQN to avoid short-name collisions across namespaces.
626626
let mut seen_fqns: HashSet<String> = HashSet::new();
627627

628-
// ── Phase 1: scan ast_map ───────────────────────────────────────
629-
// Collect all candidate classes first, then drop the lock before
630-
// calling class_loader (which may re-lock ast_map).
631-
let ast_candidates: Vec<ClassInfo> = {
632-
let map = self.ast_map.read();
633-
map.values()
634-
.flat_map(|classes| classes.iter().map(|c| ClassInfo::clone(c)))
635-
.collect()
628+
// ── Phase 1: GTI index lookup ───────────────────────────────────
629+
// Use the reverse inheritance index for O(1) lookup of classes
630+
// that directly extend/implement/use the target. Unless
631+
// `direct_only` is set, also collect transitive children.
632+
let gti_candidates: Vec<String> = {
633+
let gti = self.gti_index.read();
634+
if direct_only {
635+
gti.get(target_fqn).cloned().unwrap_or_default()
636+
} else {
637+
// Transitive: worklist traversal (LIFO) collecting all descendants.
638+
let mut all_children: Vec<String> = Vec::new();
639+
let mut queue: Vec<String> = vec![target_fqn.to_string()];
640+
let mut visited: HashSet<String> = HashSet::new();
641+
visited.insert(target_fqn.to_string());
642+
while let Some(parent) = queue.pop() {
643+
if let Some(children) = gti.get(&parent) {
644+
for child in children {
645+
if visited.insert(child.clone()) {
646+
all_children.push(child.clone());
647+
queue.push(child.clone());
648+
}
649+
}
650+
}
651+
}
652+
all_children
653+
}
636654
};
637655

638-
for cls in &ast_candidates {
639-
let cls_fqn = crate::util::build_fqn(&cls.name, cls.file_namespace.as_deref());
640-
if self.class_implements_or_extends(
641-
cls,
642-
target_short,
643-
target_fqn,
644-
class_loader,
645-
include_abstract,
646-
direct_only,
647-
) && seen_fqns.insert(cls_fqn)
648-
{
649-
result.push(cls.clone());
656+
for child_fqn in &gti_candidates {
657+
if seen_fqns.contains(child_fqn) {
658+
continue;
659+
}
660+
if let Some(cls) = class_loader(child_fqn) {
661+
if !direct_only {
662+
if cls.kind == ClassLikeKind::Interface {
663+
continue;
664+
}
665+
if cls.is_abstract && !include_abstract {
666+
continue;
667+
}
668+
}
669+
seen_fqns.insert(child_fqn.clone());
670+
result.push(Arc::unwrap_or_clone(cls));
650671
}
651672
}
652673
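The transitive lookup in this hunk can be isolated as a small standalone function. The sketch below assumes the index is a plain `HashMap<String, Vec<String>>` without the `RwLock` wrapper; `descendants`, `sample_index`, and the `App\...` class names are hypothetical. It uses a `VecDeque` for a strict breadth-first order, whereas the hunk pops from the end of a `Vec` (depth-first order); the set of descendants is the same either way.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reverse inheritance index: parent FQN -> FQNs of direct children.
type GtiIndex = HashMap<String, Vec<String>>;

/// Collect every transitive descendant of `target`, breadth-first.
/// The `visited` set guards against diamonds and accidental cycles.
fn descendants(index: &GtiIndex, target: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert(target.to_string());
    let mut queue: VecDeque<String> = VecDeque::new();
    queue.push_back(target.to_string());

    while let Some(parent) = queue.pop_front() {
        if let Some(children) = index.get(&parent) {
            for child in children {
                // `insert` returns false if the child was already seen.
                if visited.insert(child.clone()) {
                    found.push(child.clone());
                    queue.push_back(child.clone());
                }
            }
        }
    }
    found
}

/// Hypothetical sample hierarchy used only for illustration.
fn sample_index() -> GtiIndex {
    let mut index = GtiIndex::new();
    index.entry("App\\Shape".into()).or_default().push("App\\Polygon".into());
    index.entry("App\\Shape".into()).or_default().push("App\\Circle".into());
    index.entry("App\\Polygon".into()).or_default().push("App\\Square".into());
    index
}
```

The cost is proportional to the number of descendants visited, not to the number of parsed files, which is the point of replacing the `ast_map` scan.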

src/lib.rs

Lines changed: 64 additions & 0 deletions
@@ -376,6 +376,16 @@ pub struct Backend {
376376
/// inside `ClassInfo.methods`; future phases will make this the
377377
/// authoritative source and shrink `ClassInfo.methods` to just names.
378378
pub(crate) method_store: types::MethodStore,
379+
/// Reverse inheritance index: parent FQN → list of child FQNs.
380+
///
381+
/// For each class/interface/trait, maps the FQNs of its parents
382+
/// (parent_class, interfaces, used_traits) to the child's FQN.
383+
/// Used by `find_implementors` for O(1) lookup of direct children
384+
/// instead of scanning all `ast_map` entries.
385+
///
386+
/// Populated incrementally in `update_ast_inner` and
387+
/// `parse_and_cache_content_versioned` as files are parsed.
388+
pub(crate) gti_index: Arc<RwLock<HashMap<String, Vec<String>>>>,
379389
/// Embedded PHP stubs for built-in functions (e.g. `array_map`,
380390
/// `str_contains`, …). Maps function name → raw PHP source code.
381391
///
@@ -651,6 +661,7 @@ impl Backend {
651661
stub_constant_index: RwLock::new(stubs::build_stub_constant_index()),
652662
resolved_class_cache: virtual_members::new_resolved_class_cache(),
653663
method_store: Arc::new(RwLock::new(HashMap::new())),
664+
gti_index: Arc::new(RwLock::new(HashMap::new())),
654665
php_version: Mutex::new(types::PhpVersion::default()),
655666
diag_version: Arc::new(AtomicU64::new(0)),
656667
diag_notify: Arc::new(tokio::sync::Notify::new()),
@@ -727,6 +738,7 @@ impl Backend {
727738
stub_constant_index: RwLock::new(HashMap::new()),
728739
resolved_class_cache: virtual_members::new_resolved_class_cache(),
729740
method_store: Arc::new(RwLock::new(HashMap::new())),
741+
gti_index: Arc::new(RwLock::new(HashMap::new())),
730742
php_version: Mutex::new(types::PhpVersion::default()),
731743
diag_version: Arc::new(AtomicU64::new(0)),
732744
diag_notify: Arc::new(tokio::sync::Notify::new()),
@@ -977,6 +989,57 @@ impl Backend {
977989
}
978990
}
979991

992+
/// Populate the GTI (go-to-implementation) reverse inheritance index
993+
/// for the given classes. For each class, inserts the class's FQN
994+
/// into the child list of every parent (parent_class, interfaces,
995+
/// used_traits).
996+
pub(crate) fn populate_gti_index(&self, classes: &[Arc<ClassInfo>]) {
997+
let mut gti = self.gti_index.write();
998+
for cls in classes {
999+
if cls.name.starts_with("__anonymous@") {
1000+
continue;
1001+
}
1002+
let child_fqn = cls.fqn().to_string();
1003+
1004+
if let Some(ref parent) = cls.parent_class {
1005+
let parent_str = parent.to_string();
1006+
let children = gti.entry(parent_str).or_default();
1007+
if !children.contains(&child_fqn) {
1008+
children.push(child_fqn.clone());
1009+
}
1010+
}
1011+
for iface in &cls.interfaces {
1012+
let iface_str = iface.to_string();
1013+
let children = gti.entry(iface_str).or_default();
1014+
if !children.contains(&child_fqn) {
1015+
children.push(child_fqn.clone());
1016+
}
1017+
}
1018+
for tr in &cls.used_traits {
1019+
let tr_str = tr.to_string();
1020+
let children = gti.entry(tr_str).or_default();
1021+
if !children.contains(&child_fqn) {
1022+
children.push(child_fqn.clone());
1023+
}
1024+
}
1025+
}
1026+
}
1027+
1028+
/// Remove each FQN in `fqns` from every child list in the GTI index.
1029+
/// Called before re-populating when a file is re-parsed.
1030+
pub(crate) fn evict_gti_for_fqns(&self, fqns: &[String]) {
1031+
if fqns.is_empty() {
1032+
return;
1033+
}
1034+
let fqn_set: HashSet<&str> = fqns.iter().map(|s| s.as_str()).collect();
1035+
let mut gti = self.gti_index.write();
1036+
for children in gti.values_mut() {
1037+
children.retain(|child| !fqn_set.contains(child.as_str()));
1038+
}
1039+
// Remove empty entries to avoid unbounded growth.
1040+
gti.retain(|_, v| !v.is_empty());
1041+
}
1042+
9801043
/// Create a shallow clone of this `Backend` that shares every
9811044
/// `Arc`-wrapped field with the original.
9821045
///
@@ -1023,6 +1086,7 @@ impl Backend {
10231086
stub_index: RwLock::new(self.stub_index.read().clone()),
10241087
resolved_class_cache: Arc::clone(&self.resolved_class_cache),
10251088
method_store: Arc::clone(&self.method_store),
1089+
gti_index: Arc::clone(&self.gti_index),
10261090
stub_function_index: RwLock::new(self.stub_function_index.read().clone()),
10271091
stub_constant_index: RwLock::new(self.stub_constant_index.read().clone()),
10281092
php_version: Mutex::new(self.php_version()),
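The evict-then-populate cycle that keeps the index consistent when a file is re-parsed can be sketched on a bare map. This is an illustration under simplifying assumptions: no locking, parents passed as string slices instead of `ClassInfo` fields, and the helper names `add_child`, `evict`, and `reindex` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Reverse inheritance index: parent FQN -> FQNs of direct children.
type GtiIndex = HashMap<String, Vec<String>>;

/// Register `child` under each of its parents, skipping duplicates.
fn add_child(index: &mut GtiIndex, child: &str, parents: &[&str]) {
    for parent in parents {
        let children = index.entry(parent.to_string()).or_default();
        if !children.iter().any(|c| c.as_str() == child) {
            children.push(child.to_string());
        }
    }
}

/// Remove the given FQNs from every child list, then drop empty entries
/// so stale parents do not accumulate.
fn evict(index: &mut GtiIndex, fqns: &[&str]) {
    let gone: HashSet<&str> = fqns.iter().copied().collect();
    for children in index.values_mut() {
        children.retain(|c| !gone.contains(c.as_str()));
    }
    index.retain(|_, v| !v.is_empty());
}

/// Re-index one class after its file is re-parsed: evict, then re-add.
fn reindex(index: &mut GtiIndex, child: &str, new_parents: &[&str]) {
    evict(index, &[child]);
    add_child(index, child, new_parents);
}
```

Note that eviction walks every child list, so it is linear in the size of the index; lookups stay a single hash access.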

src/parser/ast_update.rs

Lines changed: 2 additions & 0 deletions
@@ -519,8 +519,10 @@ impl Backend {
519519

520520
// Populate the global method store for O(1) method lookup.
521521
self.evict_methods_for_fqns(&old_fqns);
522+
self.evict_gti_for_fqns(&old_fqns);
522523
if let Some(arc_classes) = self.ast_map.read().get(&uri_string) {
523524
self.populate_method_store(arc_classes);
525+
self.populate_gti_index(arc_classes);
524526
}
525527

526528
self.symbol_maps

src/resolution.rs

Lines changed: 6 additions & 9 deletions
@@ -443,15 +443,11 @@ impl Backend {
443443
self.ast_map
444444
.write()
445445
.insert(uri.to_owned(), arc_classes.clone());
446-
self.use_map.write().insert(uri.to_owned(), file_use_map);
447-
self.namespace_map.write().insert(
448-
uri.to_owned(),
449-
vec![crate::types::NamespaceSpan {
450-
namespace: file_namespace.clone(),
451-
start: 0,
452-
end: content.len() as u32,
453-
}],
454-
);
446+
// NOTE: use_map and namespace_map are intentionally NOT stored
447+
// for lazily-loaded files (vendor, stubs, PSR-4). These maps
448+
// are only needed for files open in the editor (populated by
449+
// update_ast_inner). Skipping them reduces memory usage across
450+
// thousands of vendor files. See P13 (tiered storage).
455451

456452
// Populate the fqn_index so that `find_class_in_ast_map` can
457453
// resolve these classes via O(1) hash lookup.
@@ -476,6 +472,7 @@ impl Backend {
476472
self.evict_methods_for_fqns(&fqns);
477473
}
478474
self.populate_method_store(&arc_classes);
475+
self.populate_gti_index(&arc_classes);
479476

480477
// Remove newly-discovered FQNs from the negative-result cache.
481478
{
