Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects by Yurashku · Pull Request #42 · Yurashku/OffPolicyLab

Yurashku · 2026-04-14T14:35:11Z

Motivation

Fix an obvious tutorial bug where replay_value(...) was called with probabilities instead of policy-B actions and remove noisy package-level logging side-effects on import.
Improve onboarding and interpretation so users can run the library on their own data productively and understand when OPE outputs are trustworthy.
Reorganize examples and docs by user intent (quickstart, synthetic comparisons, interpretation guidance) while keeping estimator math and architecture unchanged.

Description

Remove logging.basicConfig(...) from src/policyscope/__init__.py so importing the package no longer mutates global logging configuration.
Fix legacy tutorial bug by changing replay usage to compute a_B = policyB.action_argmax(...) and call replay_value(logged.df, a_B, ...), and compute overlap from action matches.
Add a concise quickstart notebook examples/quickstart_own_data_ru.ipynb that demonstrates BanditSchema, LoggedBanditDataset, compare_policies(...), propensity-source modes, and how to read Delta, CI, p_value, diagnostics and trust_level.
Add a pedagogical synthetic notebook examples/compare_estimators_vs_oracle_ru.ipynb that compares Replay / IPS / SNIPS / DM / DR / SNDR / Switch-DR against oracle under healthy and poor-overlap scenarios and logged-vs-estimated propensity contrasts, and add a practical RU interpretation guide docs/how_to_interpret_ope_outputs_ru.md and README navigation updates.

Testing

Executed the two new notebooks end-to-end and saved clean outputs for examples/quickstart_own_data_ru.ipynb and examples/compare_estimators_vs_oracle_ru.ipynb (notebook execution completed successfully).
Ran docs consistency checks (tests/test_docs_consistency.py) under the repository Python path and they passed.
Ran the full test suite with PYTHONPATH=src pytest -q and 59 passed (note: an initial pytest without PYTHONPATH=src failed at collection due to import path configuration, which is addressed by the PYTHONPATH=src invocation used in CI/validation).

Codex Task

Yurashku · 2026-04-14T14:55:01Z

Updated plan from PR #42 as the current application point.

What PR #42 already closes

removes package-level logging side effects on import;
fixes the obvious tutorial Replay bug;
introduces a split user-facing structure:
- quickstart for own data,
- synthetic estimator comparison notebook,
- RU interpretation guide;
improves README navigation.

What still remains as the next true priority

These are now the most important remaining issues because they affect correctness/portability for real user datasets:

Target-type inference for outcome models
The outcome-model path still keys binary-vs-continuous behavior off the target name rather than target semantics/data.
This breaks the promised bring-your-own-data story for binary targets with names other than accept.
Action-label-safe behavior diagnostics
Behavior-model top-1 diagnostics should work correctly for arbitrary valid action labels, not only 0..k-1 integer labels.
Docs cleanup of migration-era artifacts
Some docs still read like they are in the middle of an architecture migration.
Now that the new architecture is already in place, docs should be rewritten in stable-present tense rather than migration-phase language.

Updated plan

Phase 1 — correctness / portability fixes

fix target-type inference for outcome models
fix action-label bug in nuisance diagnostics
add focused tests for both

Phase 2 — docs cleanup from migration-era wording

clean docs/architecture.md
remove or sharply reduce sections that are artifacts of past migration work:
- migration status
- migration strategy
- non-goals of this phase
- stale migration-note wording where it no longer helps
keep the docs describing the current stable architecture, not the past transition

Phase 3 — final repo polish

tighten canonical user path (compare_policies(...) as the main orchestration path)
keep OPEEvaluator as convenience wrapper unless explicitly expanded
re-check README / examples / docs cross-links after the above fixes

First next Codex prompt (Phase 1 only)

Fix the next correctness-and-portability issues in the Policyscope repository without changing estimator mathematics.

Important context:

PR Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects #42 already improved the user-facing learning materials, removed import-time logging side effects, and split the learning materials by user intent.
The next priority is not more notebook polish, but fixing real portability/correctness issues that still break the “bring your own data” story.
Keep estimator formulas unchanged.
Do not add new estimators or new architecture layers.
Keep scope focused.

Main goals:

Fix target-type inference for outcome models
Current outcome-model fitting still keys binary-vs-continuous behavior off the target name rather than the target semantics / data.

Update the outcome-model path so that binary-vs-continuous behavior is determined from the target values/data itself rather than a hard-coded target name.

Requirements:

a binary target with values like {0,1} or {0.0,1.0} should use the classification-style outcome path even if the column name is not "accept"
continuous targets should keep the regression path
keep backward compatibility where practical
update any related docs/examples only if necessary

Fix action-label handling in nuisance diagnostics
Current behavior-model top-1 diagnostics should work correctly even when action labels are not 0..k-1 integers (for example strings or arbitrary labels).

Requirements:

do not compare raw np.argmax(...) indices directly to original action labels
use the actual action-space / class labels consistently
make behavior diagnostics robust for arbitrary valid action labels
keep outputs structurally compatible where practical

Add focused tests
Add targeted tests covering:

binary target with a non-accept column name using the correct outcome-model path
continuous target still using the regression path
nuisance behavior top-1 diagnostics with nontrivial action labels (for example strings)
no regressions for existing standard integer-label workflows

Minimal docs sync only if needed
If any public-facing docs make claims that are now better expressed after these fixes, update them concisely.
Do not start a broad docs rewrite in this pass.

Non-goals:

no tutorial redesign in this pass
no new estimators
no trust-layer redesign
no validation-harness expansion
no broad docs cleanup yet

Deliverables:

corrected target-type inference behavior
corrected action-label-safe nuisance diagnostics
focused tests
minimal doc sync if needed

Expected next pass after that

After Phase 1 is merged, the next Codex pass should be a docs cleanup pass focused on removing migration-era wording and making the architecture docs describe the current stable state.

Yurashku · 2026-04-15T10:04:22Z

Next step after core portability fixes: focused docs cleanup.

Current status:

target-type inference fix is now present in estimators.py;
action-label-safe behavior diagnostics are now present in nuisance_diagnostics.py;
PR Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects #42 still contains migration-era wording in docs/architecture.md.

Next Codex prompt:

Perform a focused documentation cleanup pass for the Policyscope repository now that the major architecture migration is complete.

Important context:

The core architecture is already in place.
Recent passes already added:
- data contract layer
- official comparison/orchestration API
- inference/trust metadata
- user-facing quickstart and synthetic comparison notebooks
- interpretation guide
- target-type inference fixes
- action-label-safe diagnostics fixes
The problem now is that some docs still read like the repository is in the middle of an architecture migration, even though that migration is effectively complete.
This pass should clean wording and structure only.
Do not change estimator code.
Do not redesign notebooks.
Do not add new methodology features.

Main goals:

Clean docs/architecture.md
Rewrite it so it describes the repository in stable-present tense rather than migration-phase language.

Specifically remove or sharply reduce sections that are artifacts of past transition work, such as:

“Статус текущей миграции”
“Migration strategy”
“Non-goals этой фазы”
stale “migration note” wording that no longer helps users
wording like “в этой фазе добавлен”, “готов для поэтапного внедрения”, “текущий этап” when the feature is now part of the current architecture

Keep the useful architectural content:

repository purpose
domain model
layer boundaries
official orchestration path compare_policies(...)
scalar target abstraction / multi-target wrapper
inference and diagnostics
propensity source modes
nuisance diagnostics
cross-fitting support and limitations
validation harness role and limitations
recommended defaults / trust metadata

The document should read as: “this is how Policyscope is structured now”, not “this is how we migrated it”.

Light README cleanup
Review README.md and remove or tighten leftover transitional / legacy wording if any remains.

Keep README as a navigator:

quickstart notebook
synthetic comparison notebook
interpretation guide
architecture doc
validation harness doc
official high-level entrypoint

Do not turn README back into a long migration document.

Preserve methodological caution
Do not remove important caveats about:

OPE not being an automatic replacement for A/B tests
diagnostics/trust metadata being heuristic support signals, not guarantees
validation harness not guaranteeing correctness on arbitrary real-world logs

Keep scope minimal
Do not touch estimator math.
Do not touch notebook logic or outputs.
Do not move files.
Do not add new docs unless absolutely necessary.
Tests/checks
Keep docs consistency tests passing.
Update tests only if they explicitly check changed headings or links.

Deliverables:

cleaned docs/architecture.md
lightly cleaned README.md if needed
no migration-era narrative unless it still adds real user value
stable-present-tense documentation aligned with the current repository state

After this pass, the next review should focus on final consistency across README, docs, examples, and public API surface.

Yurashku · 2026-04-15T11:02:43Z

Practical UX gap to consider before/after merge:

A user may arrive with a dataframe that already contains:

customer features,
logged actions from policy A,
observed rewards,
and recommendations/actions from policy B (action_B),
without access to policyB.action_probs(...) or probabilities from the black-box policy.

In the current API this is still representable by wrapping action_B as a deterministic one-hot policyB, but the repo should make this pattern explicit in docs/examples.

Recommended follow-up:

add a small helper/adapter example such as DeterministicActionColumnPolicy in the quickstart or docs;
explain that if B only provides recommendations, pi_B(a|x) becomes one-hot;
clarify that this is valid to run, but overlap/ESS may become the limiting factor;
teach users to read replay overlap, ESS ratio, weight tails, warnings, and trust_level before trusting the result.

This is not necessarily a blocker for PR #42, but it is important for real bring-your-own-data usability.

Yurashku · 2026-04-15T11:09:23Z

Another practical docs/tutorial gap: explain propensity_A clearly.

Users need to understand that propensity_A is the probability with which the historical behavior/logging policy A selected the actually logged action for a given user/context: pi_A(a_A | x).

Recommended docs/example addition:

explain where propensity_A comes from in real product systems;
distinguish logged propensity from estimated propensity;
give examples:
1. randomized / epsilon-greedy logging policy where propensity is known by design;
2. ranker/recommender that can return action probabilities or randomized allocation probabilities;
3. deterministic historical policy where true propensity is unknown/unavailable;
explain that if true logged propensities are unavailable, propensity_source='estimated' or auto fallback can fit a behavior model, but this is weaker and should lower trust;
clarify that wrong or reconstructed propensities can bias IPS/DR-family estimates.

This should probably be added to the quickstart and/or docs/how_to_interpret_ope_outputs_ru.md.

Yurashku · 2026-04-15T13:19:12Z

Architectural note on where to place DeterministicActionColumnPolicy.

Recommended placement: create a new module src/policyscope/policy_adapters.py.

Rationale:

src/policyscope/policies.py currently contains synthetic/demo policies (GreedyUnknown, EpsilonGreedy, SoftmaxFromScores, make_policy). Putting real-data adapters there would blur synthetic policy examples with production-facing adapters.
src/policyscope/data.py should stay focused on dataset/schema validation (BanditSchema, LoggedBanditDataset), not target-policy behavior.
src/policyscope/comparison.py should remain orchestration logic, not input adapter definitions.
A dedicated policy_adapters.py makes the concept explicit: these are small user-facing adapter objects that turn real-world artifacts into the policyB.action_probs(df) contract.

Suggested public API:

from policyscope.policy_adapters import DeterministicActionColumnPolicy

policyB = DeterministicActionColumnPolicy(
    action_col="action_B",
    action_space=["card", "loan", "deposit"],
)

Implementation expectations:

action_probs(df) -> np.ndarray returns one-hot probabilities over action_space;
validates action_col exists;
validates all observed actions are present in action_space;
raises clear ValueError on unknown actions;
supports arbitrary labels, including strings;
action_argmax(df) should return real action labels, not column indices.

Do not auto-wrap action_B inside compare_policies(...) at this stage. Explicit adapter construction is more transparent and avoids magic behavior.

Optional export choice:

safe minimal: do not export from package root, import from policyscope.policy_adapters;
user-friendly later: export from root only if the package wants a broader top-level API surface.

Rework user-facing learning materials and quickstart flow

53db9f7

Yurashku added the codex label Apr 14, 2026 — with ChatGPT Codex Connector

Yurashku added 2 commits April 14, 2026 20:23

Fix binary outcome inference and label-safe behavior diagnostics

ac4e904

Harden auto feature inference against target leakage

e898152

Yurashku added 2 commits April 15, 2026 13:09

Clean architecture docs to stable present-tense wording

03624c7

Align docs around compare_policies as official path

dfba281

Yurashku added 2 commits April 15, 2026 15:34

Clarify real-world policy adapters and propensity guidance

1286cd1

Polish quickstart adapter validation and propensity caveat

6eca1a4

Add reusable deterministic action-column policy adapter

7c25e73

Yurashku merged commit cff2953 into main Apr 15, 2026
0 of 2 checks passed

Yurashku deleted the codex/new-task branch April 15, 2026 13:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects#42

Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects#42
Yurashku merged 8 commits into
mainfrom
codex/new-task

Yurashku commented Apr 14, 2026

Uh oh!

Yurashku commented Apr 14, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Yurashku commented Apr 14, 2026

Motivation

Description

Testing

Uh oh!

Yurashku commented Apr 14, 2026

What PR #42 already closes

What still remains as the next true priority

Updated plan

Phase 1 — correctness / portability fixes

Phase 2 — docs cleanup from migration-era wording

Phase 3 — final repo polish

First next Codex prompt (Phase 1 only)

Expected next pass after that

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Yurashku commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant