Open
Description
After 52a17f1, cache entries in CachingHiveMetastore are keyed on the set of columns requested (previously, stats for all columns were pulled from the metastore).
As a result, a query that consults HiveMetastore multiple times for different sets of columns of a single table may end up making more roundtrips to the metastore.
When communication with the metastore is costly, this causes a performance regression.
Edit: actually, caching has been on a per-column basis since #16203, i.e. already before 52a17f1. However, 52a17f1 changes the call pattern, so we sometimes observe more calls to CachingHiveMetastore.
E.g. for the queries:
CREATE TABLE test_self_join_table AS SELECT 2 AS age, 0 parent, 3 AS id;
SELECT child.age, parent.age FROM test_self_join_table child JOIN test_self_join_table parent ON child.parent = parent.id;
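
To make the effect of the call pattern concrete, here is a toy model in Python. It is not Trino code: `CountingMetastore`, `PerColumnCache`, and `count_roundtrips` are hypothetical names, and the column subsets requested are an assumption chosen for illustration, not the actual requests made for the query above. It only shows that with a per-column cache, asking for different column subsets in separate calls can cost more backend roundtrips than asking for all columns at once.

```python
class CountingMetastore:
    """Hypothetical stand-in for the remote metastore; counts roundtrips."""

    def __init__(self):
        self.roundtrips = 0

    def get_column_stats(self, table, columns):
        # Every call here models one network roundtrip.
        self.roundtrips += 1
        return {column: f"stats({table}.{column})" for column in columns}


class PerColumnCache:
    """Hypothetical cache with one entry per (table, column) pair."""

    def __init__(self, delegate):
        self.delegate = delegate
        self.cache = {}

    def get_column_stats(self, table, columns):
        # Only columns that are not cached yet are fetched from the delegate.
        missing = [c for c in columns if (table, c) not in self.cache]
        if missing:
            loaded = self.delegate.get_column_stats(table, missing)
            for column, stats in loaded.items():
                self.cache[(table, column)] = stats
        return {c: self.cache[(table, c)] for c in columns}


def count_roundtrips(requests):
    """Replay a sequence of column-set requests and count backend calls."""
    backend = CountingMetastore()
    cache = PerColumnCache(backend)
    for columns in requests:
        cache.get_column_stats("test_self_join_table", columns)
    return backend.roundtrips


# Old-style pattern (assumed): every request asks for all columns.
all_at_once = count_roundtrips([["age", "parent", "id"], ["age", "parent", "id"]])

# New-style pattern (assumed): each request asks only for the columns it needs.
by_subset = count_roundtrips([["age", "parent"], ["age", "id"]])

print(all_at_once, by_subset)
```

In this model the first pattern needs a single roundtrip, because the second request is served entirely from the cache, while the second pattern needs two, because `id` is missing after the first request.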