Description
Godot has longer compile times than we'd like. According to previous analyses, the main cause of this is parsing of header files.
Workaround
If you're just looking to decrease your personal compile time during development, use scu_build=yes with SCons:
scons scu_build=yes
If you're looking to help decrease the compile time for everyone, keep reading.
Methodology
To most effectively reduce the amount of cross includes between our headers, we need metrics. I came up with two:
- Fresh compile cost: An estimate of the from-scratch compile time caused by the header. It is calculated by adding up its self time across all compile units' time traces, plus partial responsibility for its includees.
- Optimizing this is useful for all contributors, and anyone else compiling the engine.
- Recompile cost: An estimate of each file's potential to cause recompiles. It is calculated from its self time across all compile units' time traces, weighted by its size (to estimate churn potential), plus partial responsibility for its includees.
- Optimizing this is useful for regular contributors and CI.
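To make "partial responsibility over includees" more concrete, here is a minimal sketch of one possible reading. It is an assumption for illustration, not necessarily what the tool computes: each header is charged its own self time plus an equal share of every direct includee's cost, split among that includee's direct includers. The function name `fresh_compile_cost` and the input shapes are hypothetical, and the include graph is assumed to be acyclic.

```python
from functools import lru_cache


def fresh_compile_cost(
    self_time: dict[str, float],
    includes: dict[str, set[str]],
) -> dict[str, float]:
    """Toy cost model (an assumption, not the tool's actual algorithm).

    self_time: header -> self time in seconds, summed over all compile units.
    includes:  header -> set of headers it includes directly (acyclic).
    """
    # Count how many headers directly include each header.
    includer_count: dict[str, int] = {}
    for includees in includes.values():
        for includee in includees:
            includer_count[includee] = includer_count.get(includee, 0) + 1

    @lru_cache(maxsize=None)
    def cost(header: str) -> float:
        total = self_time.get(header, 0.0)
        for includee in includes.get(header, ()):
            # Each direct includer takes an equal share of the includee's cost.
            total += cost(includee) / includer_count[includee]
        return total

    headers = set(self_time) | set(includes) | set(includer_count)
    return {header: cost(header) for header in headers}


# Hypothetical example: a.h and b.h both include c.h.
costs = fresh_compile_cost(
    {"a.h": 1.0, "b.h": 2.0, "c.h": 4.0},
    {"a.h": {"c.h"}, "b.h": {"c.h"}},
)
```

With this split, the costs of the top-level headers add up to the total self time, so no time is counted twice. A size-weighted variant along the lines of the recompile cost could multiply each header's self time by a churn weight before sharing it.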
Both are mainly based on clang -ftime-trace and include trees (the second one also uses file size to estimate churn potential). I've consolidated my efforts into a public tool:
https://github.com/Ivorforce/clang-project-profiler
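As a rough illustration of what "self time" means here, the sketch below derives it from one already-loaded time trace. It is not taken from the tool above. It assumes the trace uses Chrome trace format with complete (`"ph": "X"`) events named `Source`, the file path in `args.detail`, and timestamps in microseconds; some clang versions may emit these events in a different shape. The function name `self_times_from_trace` is hypothetical.

```python
from collections import defaultdict


def self_times_from_trace(trace: dict) -> dict[str, float]:
    """Return file path -> self time in seconds for one time trace.

    Self time is the time spent in a file excluding nested includes.
    Assumes complete ("X") events named "Source" (see lead-in).
    """
    events = [
        e for e in trace.get("traceEvents", [])
        if e.get("name") == "Source" and e.get("ph") == "X"
    ]
    # Outer events first: by start time, longest first on ties.
    events.sort(key=lambda e: (e["ts"], -e["dur"]))

    self_us: dict[str, float] = defaultdict(float)
    stack: list[tuple[int, str]] = []  # (end timestamp, path) of open events
    for event in events:
        start, dur = event["ts"], event["dur"]
        path = event["args"]["detail"]
        # Drop events that already ended before this one started.
        while stack and stack[-1][0] <= start:
            stack.pop()
        self_us[path] += dur
        if stack:
            # Time spent in a nested include is not the parent's self time.
            self_us[stack[-1][1]] -= dur
        stack.append((start + dur, path))
    return {path: us / 1e6 for path, us in self_us.items()}


# Hypothetical trace: a.h includes b.h and c.h, and c.h includes d.h.
example_trace = {
    "traceEvents": [
        {"name": "Source", "ph": "X", "ts": 0, "dur": 10_000_000, "args": {"detail": "a.h"}},
        {"name": "Source", "ph": "X", "ts": 1_000_000, "dur": 3_000_000, "args": {"detail": "b.h"}},
        {"name": "Source", "ph": "X", "ts": 5_000_000, "dur": 2_000_000, "args": {"detail": "c.h"}},
        {"name": "Source", "ph": "X", "ts": 5_500_000, "dur": 500_000, "args": {"detail": "d.h"}},
    ]
}
example_self_times = self_times_from_trace(example_trace)
```

Summing these per-file values over the traces of all compile units gives the kind of per-header self time that the two metrics build on.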
I also used betweenness centrality in an earlier version. You can find the source code for that here:
measure_includes_centrality.py
#!/usr/bin/env python3
import io
import json
import subprocess
import os
import sys
import pathlib
import shlex
from collections import deque, defaultdict
import concurrent.futures
import multiprocessing


def graph_from_compile_entry(idx: int, entry: dict):
    directory = entry.get("directory", os.getcwd())
    command = entry.get("command")
    arguments = entry.get("arguments")

    # Prefer 'arguments' if present, otherwise split 'command'
    if arguments:
        cmd = arguments
    else:
        cmd = shlex.split(command)

    if idx % 10 == 0:
        print(f"{idx:04d}", cmd[2])

    cmd = [cmd[0]] + cmd[3:] + ["--trace-includes"]
    result = subprocess.run(cmd, cwd=directory, check=True, capture_output=True, text=True)
    # Not sure why but it prints to stderr
    graph = parse_graph(result.stderr)
    return graph


def parse_graph(input_str: str) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = defaultdict(set)
    stack: list[tuple[int, str]] = []

    for line in input_str.strip().splitlines():
        depth = len(line) - len(line.lstrip('.'))
        node = line[depth:].strip()

        # Adjust the stack to the current depth
        while stack and stack[-1][0] >= depth:
            stack.pop()

        # If there is a parent node, link it
        if stack:
            graph[stack[-1][1]].add(node)

        stack.append((depth, node))

    return dict(graph)


def betweenness_centrality(graph: dict[str, set[str]], directed: bool = False) -> dict[str, float]:
    # Include isolated nodes
    all_nodes = set(graph.keys()) | {n for nbrs in graph.values() for n in nbrs}
    graph = {n: graph.get(n, set()) for n in all_nodes}

    centrality = dict.fromkeys(graph.keys(), 0.0)

    for s in graph:
        stack = []
        predecessors = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0.0)
        sigma[s] = 1.0
        distance = dict.fromkeys(graph, -1)
        distance[s] = 0
        queue = deque([s])

        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    queue.append(w)
                    distance[w] = distance[v] + 1
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)

        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]

    # Normalise
    n = len(graph)
    if n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
        if not directed:
            scale *= 0.5
        for v in centrality:
            centrality[v] *= scale

    return centrality


def main():
    # Default filename
    filename = "compile_commands.json"
    if len(sys.argv) > 1:
        filename = sys.argv[1]

    # Read the compile_commands.json file
    try:
        with open(filename, "r") as f:
            compile_commands = json.load(f)
    except Exception as e:
        print(f"Error reading {filename}: {e}")
        sys.exit(1)

    print(f"Starting {len(compile_commands)} commands...")

    all_graph: dict[str, set[str]] = {}
    failure_count = 0

    executor = concurrent.futures.ProcessPoolExecutor(multiprocessing.cpu_count())
    futures = [executor.submit(graph_from_compile_entry, *item) for item in enumerate(compile_commands)]
    completed, not_completed = concurrent.futures.wait(futures)

    for future in completed:
        try:
            graph = future.result()
            for filename, targets in graph.items():
                prev_targets: set[str] = all_graph.setdefault(filename, set())
                prev_targets.update(targets)
        except:
            failure_count += 1

    centrality = betweenness_centrality(all_graph)
    centrality = { key: value for key, value in centrality.items() if key.startswith("./") and value > 0 }
    centrality_list = list(centrality.items())
    centrality_list.sort(key=lambda kv: kv[1], reverse=True)
    final_string = "\n".join(f"{kv[0]}: {kv[1]}" for kv in centrality_list)

    print(f"Done. {failure_count} object files failed to analyze.")

    result_path = pathlib.Path("./betweenness-centrality.txt")
    result_path.write_text(final_string)
    print(result_path.absolute())


if __name__ == "__main__":
    main()
Tracker
Below are the top 50 headers for each metric (100 entries in total) that should be investigated. Each value is the header's estimated cost in seconds, summed across all compile units as described under Methodology.
Please remember that not all of these files can be 'fixed'. Also, note that this list was gathered on macOS. On other systems, libcpp entries will evaluate differently.
last update: 2025-10-13
Fresh compile cost
core/object/class_db.h: 378s
core/object/ref_counted.h: 358s
core/object/object.h: 316s
<libcpp>/string: 293s
<libcpp>/system_error: 272s
<libcpp>/mutex: 246s
core/os/mutex.h: 239s
core/object/message_queue.h: 237s
core/io/resource.h: 230s
core/os/thread_safe.h: 225s
core/variant/binder_common.h: 197s
servers/rendering/rendering_server.h: 190s
<libcpp>/algorithm: 180s
<libcpp>/__system_error/error_category.h: 168s
scene/gui/control.h: 151s
scene/main/canvas_item.h: 131s
core/variant/variant.h: 123s
core/object/gdvirtual.gen.inc: 120s
core/object/script_instance.h: 119s
core/object/method_bind.h: 117s
core/object/callable_method_pointer.h: 103s
<libcpp>/string_view: 103s
<libcpp>/__system_error/error_condition.h: 98s
<libcpp>/__system_error/error_code.h: 93s
core/typedefs.h: 89s
editor/plugins/editor_plugin.h: 86s
scene/main/node.h: 86s
<libcpp>/utility: 84s
core/variant/typed_dictionary.h: 82s
scene/3d/node_3d.h: 79s
core/variant/typed_array.h: 79s
servers/rendering/rendering_device.h: 79s
servers/display/display_server.h: 75s
scene/resources/texture.h: 71s
<libcpp>/memory: 71s
<libcpp>/__system_error/system_error.h: 69s
core/string/ustring.h: 67s
<libcpp>/iterator: 67s
scene/gui/container.h: 66s
core/templates/safe_refcount.h: 66s
scene/resources/material.h: 65s
scene/3d/camera_3d.h: 64s
editor/editor_node.h: 62s
<libcpp>/atomic: 61s
core/os/rw_lock.h: 59s
core/math/geometry_3d.h: 57s
<libcpp>/shared_mutex: 57s
<libcpp>/functional: 55s
core/io/resource_uid.h: 55s
scene/resources/3d/world_3d.h: 55s
Recompile cost
core/object/class_db.h: 469s
core/object/ref_counted.h: 397s
core/object/object.h: 344s
core/variant/variant.h: 294s
core/variant/binder_common.h: 272s
servers/rendering/rendering_server.h: 257s
core/io/resource.h: 252s
scene/gui/control.h: 210s
scene/main/canvas_item.h: 186s
core/object/method_bind.h: 170s
core/object/callable_method_pointer.h: 154s
servers/rendering/rendering_device.h: 138s
core/object/script_instance.h: 137s
core/object/gdvirtual.gen.inc: 133s
core/string/ustring.h: 130s
servers/display/display_server.h: 111s
scene/main/node.h: 110s
scene/gui/container.h: 94s
scene/3d/node_3d.h: 91s
core/extension/gdextension_interface.h: 91s
editor/plugins/editor_plugin.h: 83s
core/variant/typed_array.h: 71s
scene/resources/material.h: 71s
editor/editor_node.h: 68s
scene/main/scene_tree.h: 66s
scene/resources/texture.h: 65s
core/object/message_queue.h: 64s
scene/resources/font.h: 63s
scene/resources/3d/world_3d.h: 62s
core/io/image.h: 62s
core/io/resource_uid.h: 61s
core/variant/variant_internal.h: 61s
core/os/os.h: 61s
core/variant/callable_bind.h: 59s
core/string/char_utils.h: 58s
core/string/string_name.h: 54s
scene/3d/camera_3d.h: 54s
scene/resources/theme.h: 54s
scene/resources/environment.h: 54s
core/string/char_range.inc: 53s
core/variant/method_ptrcall.h: 53s
scene/resources/mesh.h: 48s
servers/rendering/rendering_device_driver.h: 47s
core/input/input.h: 47s
core/math/vector3.h: 46s
scene/resources/sky.h: 45s
servers/rendering/rendering_device_graph.h: 44s
scene/2d/node_2d.h: 42s
core/error/error_macros.h: 42s
core/io/logger.h: 37s
How to contribute
You can contribute by picking out any of the above files, and investigating its contents, includes, and includers. Some ideas:
- Can any include be removed? Prioritize includees that also appear on the list, otherwise your change might not be impactful.
- Can some includers be changed not to include this file? Prioritize includers that also appear on the list, otherwise your change might not be impactful.
- Can this file be split into independent headers with less contents or includes?
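When investigating a header, it helps to list its includers and includees. The sketch below does this for an include graph in the same shape that parse_graph in the script above returns (file -> set of directly included files). The helper names `invert_graph` and `reachable` and the file names are hypothetical, chosen only for illustration.

```python
def invert_graph(includes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn 'file -> direct includees' into 'file -> direct includers'."""
    includers: dict[str, set[str]] = {}
    for includer, includees in includes.items():
        for includee in includees:
            includers.setdefault(includee, set()).add(includer)
    return includers


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """All files reachable from 'start' by following edges transitively."""
    seen: set[str] = set()
    todo = [start]
    while todo:
        node = todo.pop()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                todo.append(neighbour)
    return seen


# Hypothetical include graph, not taken from the engine.
includes = {
    "main.cpp": {"a.h"},
    "a.h": {"b.h"},
    "other.cpp": {"b.h"},
}
includers = invert_graph(includes)

direct_includers_of_b = includers["b.h"]       # files that include b.h directly
all_includers_of_b = reachable(includers, "b.h")  # everything that pulls in b.h
all_includees_of_main = reachable(includes, "main.cpp")
```

Comparing the direct and transitive includers of a header shows whether removing a single include is likely to cut it out of many compile units or only a few.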
If you believe there's nothing that can be improved about a header, please say so in a comment on this tracker.
If you manage to remove the include, recompile the engine. It is likely that some files will fail to compile because they're now missing includes. In small numbers, this is expected. However, if too many files fail to compile, it may be time to reconsider: this may mean that the include was logical after all, and that most files that include the header you were trying to improve also need the include you removed.
In some cases, you may want to guard against regressions. As a guideline, you should guard against a regression if it is illogical for the header to need to include the file / type (and you expect it to happen by accident). To do this, use STATIC_ASSERT_INCOMPLETE_TYPE in the associated .cpp file of the header. You can find some examples of how it's done in the repository.
Lines 453 to 458 in 6d33ad2
/// Enforces the requirement that a class is not fully defined.
/// This can be used to reduce include coupling and keep compile times low.
/// The check must be made at the top of the corresponding .cpp file of a header.
#define STATIC_ASSERT_INCOMPLETE_TYPE(m_keyword, m_type) \
	m_keyword m_type; \
	static_assert(!is_fully_defined_v<m_type>, #m_type " was unexpectedly fully defined. Please check the include hierarchy of '" __FILE__ "' and remove includes that resolve the " #m_keyword ".");
Finally, if you feel comfortable with your changes, submit a PR, and link back to this tracker. With enough of these kinds of PRs, hopefully we can decrease Godot's compile time to a minimum.