Commit a84c23a

[IMP] snippets: do not return unneeded data from mp worker
In `convert_html_columns()`, we select 100 MiB worth of DB tuples and pass them to a ProcessPoolExecutor together with a converter callable. So far, the converter returns every tuple, changed or unchanged, together with a flag telling whether anything was changed. All of this is sent back through IPC to the parent process, where the caller only acts on the changed tuples; the rest is ignored. In every scenario I've seen, only a small proportion of the input tuples is actually changed, meaning that a large proportion is returned through IPC unnecessarily.

What makes it worse is that processing the converted results in the parent process is often slower than the conversion itself, leading to two effects:

1) The results of all workers sit in the parent process's memory, possibly leading to a MemoryError (upg-2021031).
2) The parallel processing is serialized on this feedback, defeating a large part of the intended performance gains.

This commit fixes (1) by returning only `None` for unchanged tuples. An improvement for (2) follows in the next commit.
1 parent 062d11a commit a84c23a
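To make the pattern described above concrete, here is a minimal worker-side sketch in the spirit of the change (illustrative only, not the actual upgrade-util code; `COLUMNS`, `convert_html`, and `convert_row` are made-up names):

COLUMNS = ("body_html",)  # hypothetical list of converted columns

def convert_html(content):
    # Stand-in for the real converter callback: returns (has_changed, new_content).
    new = content.replace("<font>", "<span>") if content else content
    return (new != content, new)

def convert_row(row):
    # Runs in a worker process. Only rows with at least one changed column
    # produce a payload; unchanged rows yield None, which is all that travels
    # back over IPC for them.
    res_id, *contents = row
    changes = {}
    for column, content in zip(COLUMNS, contents):
        has_changed, new_content = convert_html(content)
        if has_changed:
            changes[column] = new_content
    if not changes:
        return None
    changes["id"] = res_id
    return changes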

File tree

1 file changed, +2 -2 lines changed


src/util/snippets.py

Lines changed: 2 additions & 2 deletions
@@ -267,7 +267,7 @@ def __call__(self, row):
                 changes[column] = new_content
         if has_changed:
             changes["id"] = res_id
-        return changes
+        return changes if "id" in changes else None


 def convert_html_columns(cr, table, columns, converter_callback, where_column="IS NOT NULL", extra_where="true"):
@@ -310,7 +310,7 @@ def convert_html_columns(cr, table, columns, converter_callback, where_column="I
         for query in log_progress(split_queries, logger=_logger, qualifier=f"{table} updates"):
             cr.execute(query)
             for data in executor.map(convert, cr.fetchall(), chunksize=1000):
-                if "id" in data:
+                if data:
                     cr.execute(update_query, data)


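Taken together, the two hunks amount to the following consumption pattern on the parent side; a runnable toy sketch under assumed names (a print stands in for the actual UPDATE statement):

from concurrent.futures import ProcessPoolExecutor

def convert(row):
    # Toy converter: pretend only rows containing "old" need rewriting.
    res_id, text = row
    if "old" not in text:
        return None  # unchanged row: only a None travels back over IPC
    return {"id": res_id, "body": text.replace("old", "new")}

if __name__ == "__main__":
    rows = [(1, "old snippet"), (2, "untouched"), (3, "also untouched")]
    with ProcessPoolExecutor() as executor:
        for data in executor.map(convert, rows, chunksize=1000):
            if data:  # mirrors the `if data:` check in the diff above
                print("UPDATE ... SET body = %(body)s WHERE id = %(id)s ->", data)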