Skip to content

Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects#42

Merged
Yurashku merged 8 commits into
mainfrom
codex/new-task
Apr 15, 2026
Merged

Rework user-facing learning materials; fix Replay tutorial bug; remove package logging side-effects#42
Yurashku merged 8 commits into
mainfrom
codex/new-task

Conversation

@Yurashku
Copy link
Copy Markdown
Owner

Motivation

  • Fix an obvious tutorial bug where replay_value(...) was called with probabilities instead of policy-B actions and remove noisy package-level logging side-effects on import.
  • Improve onboarding and interpretation so users can run the library on their own data productively and understand when OPE outputs are trustworthy.
  • Reorganize examples and docs by user intent (quickstart, synthetic comparisons, interpretation guidance) while keeping estimator math and architecture unchanged.

Description

  • Remove logging.basicConfig(...) from src/policyscope/__init__.py so importing the package no longer mutates global logging configuration.
  • Fix legacy tutorial bug by changing replay usage to compute a_B = policyB.action_argmax(...) and call replay_value(logged.df, a_B, ...), and compute overlap from action matches.
  • Add a concise quickstart notebook examples/quickstart_own_data_ru.ipynb that demonstrates BanditSchema, LoggedBanditDataset, compare_policies(...), propensity-source modes, and how to read Delta, CI, p_value, diagnostics and trust_level.
  • Add a pedagogical synthetic notebook examples/compare_estimators_vs_oracle_ru.ipynb that compares Replay / IPS / SNIPS / DM / DR / SNDR / Switch-DR against oracle under healthy and poor-overlap scenarios and logged-vs-estimated propensity contrasts, and add a practical RU interpretation guide docs/how_to_interpret_ope_outputs_ru.md and README navigation updates.

Testing

  • Executed the two new notebooks end-to-end and saved clean outputs for examples/quickstart_own_data_ru.ipynb and examples/compare_estimators_vs_oracle_ru.ipynb (notebook execution completed successfully).
  • Ran docs consistency checks (tests/test_docs_consistency.py) under the repository Python path and they passed.
  • Ran the full test suite with PYTHONPATH=src pytest -q and 59 passed (note: an initial pytest without PYTHONPATH=src failed at collection due to import path configuration, which is addressed by the PYTHONPATH=src invocation used in CI/validation).

Codex Task

Copy link
Copy Markdown
Owner Author

Updated plan from PR #42 as the current application point.

What PR #42 already closes

  • removes package-level logging side effects on import;
  • fixes the obvious tutorial Replay bug;
  • introduces a split user-facing structure:
    • quickstart for own data,
    • synthetic estimator comparison notebook,
    • RU interpretation guide;
  • improves README navigation.

What still remains as the next true priority

These are now the most important remaining issues because they affect correctness/portability for real user datasets:

  1. Target-type inference for outcome models
    The outcome-model path still keys binary-vs-continuous behavior off the target name rather than target semantics/data.
    This breaks the promised bring-your-own-data story for binary targets with names other than accept.

  2. Action-label-safe behavior diagnostics
    Behavior-model top-1 diagnostics should work correctly for arbitrary valid action labels, not only 0..k-1 integer labels.

  3. Docs cleanup of migration-era artifacts
    Some docs still read like they are in the middle of an architecture migration.
    Now that the new architecture is already in place, docs should be rewritten in stable-present tense rather than migration-phase language.

Updated plan

Phase 1 — correctness / portability fixes

  • fix target-type inference for outcome models
  • fix action-label bug in nuisance diagnostics
  • add focused tests for both

Phase 2 — docs cleanup from migration-era wording

  • clean docs/architecture.md
  • remove or sharply reduce sections that are artifacts of past migration work:
    • migration status
    • migration strategy
    • non-goals of this phase
    • stale migration-note wording where it no longer helps
  • keep the docs describing the current stable architecture, not the past transition

Phase 3 — final repo polish

  • tighten canonical user path (compare_policies(...) as the main orchestration path)
  • keep OPEEvaluator as convenience wrapper unless explicitly expanded
  • re-check README / examples / docs cross-links after the above fixes

First next Codex prompt (Phase 1 only)

Fix the next correctness-and-portability issues in the Policyscope repository without changing estimator mathematics.

Important context:

Main goals:

  1. Fix target-type inference for outcome models
    Current outcome-model fitting still keys binary-vs-continuous behavior off the target name rather than the target semantics / data.

Update the outcome-model path so that binary-vs-continuous behavior is determined from the target values/data itself rather than a hard-coded target name.

Requirements:

  • a binary target with values like {0,1} or {0.0,1.0} should use the classification-style outcome path even if the column name is not "accept"
  • continuous targets should keep the regression path
  • keep backward compatibility where practical
  • update any related docs/examples only if necessary
  1. Fix action-label handling in nuisance diagnostics
    Current behavior-model top-1 diagnostics should work correctly even when action labels are not 0..k-1 integers (for example strings or arbitrary labels).

Requirements:

  • do not compare raw np.argmax(...) indices directly to original action labels
  • use the actual action-space / class labels consistently
  • make behavior diagnostics robust for arbitrary valid action labels
  • keep outputs structurally compatible where practical
  1. Add focused tests
    Add targeted tests covering:
  • binary target with a non-accept column name using the correct outcome-model path
  • continuous target still using the regression path
  • nuisance behavior top-1 diagnostics with nontrivial action labels (for example strings)
  • no regressions for existing standard integer-label workflows
  1. Minimal docs sync only if needed
    If any public-facing docs make claims that are now better expressed after these fixes, update them concisely.
    Do not start a broad docs rewrite in this pass.

Non-goals:

  • no tutorial redesign in this pass
  • no new estimators
  • no trust-layer redesign
  • no validation-harness expansion
  • no broad docs cleanup yet

Deliverables:

  • corrected target-type inference behavior
  • corrected action-label-safe nuisance diagnostics
  • focused tests
  • minimal doc sync if needed

Expected next pass after that

After Phase 1 is merged, the next Codex pass should be a docs cleanup pass focused on removing migration-era wording and making the architecture docs describe the current stable state.

Copy link
Copy Markdown
Owner Author

Next step after core portability fixes: focused docs cleanup.

Current status:

Next Codex prompt:


Perform a focused documentation cleanup pass for the Policyscope repository now that the major architecture migration is complete.

Important context:

  • The core architecture is already in place.
  • Recent passes already added:
    • data contract layer
    • official comparison/orchestration API
    • inference/trust metadata
    • user-facing quickstart and synthetic comparison notebooks
    • interpretation guide
    • target-type inference fixes
    • action-label-safe diagnostics fixes
  • The problem now is that some docs still read like the repository is in the middle of an architecture migration, even though that migration is effectively complete.
  • This pass should clean wording and structure only.
  • Do not change estimator code.
  • Do not redesign notebooks.
  • Do not add new methodology features.

Main goals:

  1. Clean docs/architecture.md
    Rewrite it so it describes the repository in stable-present tense rather than migration-phase language.

Specifically remove or sharply reduce sections that are artifacts of past transition work, such as:

  • “Статус текущей миграции”
  • “Migration strategy”
  • “Non-goals этой фазы”
  • stale “migration note” wording that no longer helps users
  • wording like “в этой фазе добавлен”, “готов для поэтапного внедрения”, “текущий этап” when the feature is now part of the current architecture

Keep the useful architectural content:

  • repository purpose
  • domain model
  • layer boundaries
  • official orchestration path compare_policies(...)
  • scalar target abstraction / multi-target wrapper
  • inference and diagnostics
  • propensity source modes
  • nuisance diagnostics
  • cross-fitting support and limitations
  • validation harness role and limitations
  • recommended defaults / trust metadata

The document should read as: “this is how Policyscope is structured now”, not “this is how we migrated it”.

  1. Light README cleanup
    Review README.md and remove or tighten leftover transitional / legacy wording if any remains.

Keep README as a navigator:

  • quickstart notebook
  • synthetic comparison notebook
  • interpretation guide
  • architecture doc
  • validation harness doc
  • official high-level entrypoint

Do not turn README back into a long migration document.

  1. Preserve methodological caution
    Do not remove important caveats about:
  • OPE not being an automatic replacement for A/B tests
  • diagnostics/trust metadata being heuristic support signals, not guarantees
  • validation harness not guaranteeing correctness on arbitrary real-world logs
  1. Keep scope minimal
    Do not touch estimator math.
    Do not touch notebook logic or outputs.
    Do not move files.
    Do not add new docs unless absolutely necessary.

  2. Tests/checks
    Keep docs consistency tests passing.
    Update tests only if they explicitly check changed headings or links.

Deliverables:

  • cleaned docs/architecture.md
  • lightly cleaned README.md if needed
  • no migration-era narrative unless it still adds real user value
  • stable-present-tense documentation aligned with the current repository state

After this pass, the next review should focus on final consistency across README, docs, examples, and public API surface.

Copy link
Copy Markdown
Owner Author

Practical UX gap to consider before/after merge:

A user may arrive with a dataframe that already contains:

  • customer features,
  • logged actions from policy A,
  • observed rewards,
  • and recommendations/actions from policy B (action_B),
    without access to policyB.action_probs(...) or probabilities from the black-box policy.

In the current API this is still representable by wrapping action_B as a deterministic one-hot policyB, but the repo should make this pattern explicit in docs/examples.

Recommended follow-up:

  • add a small helper/adapter example such as DeterministicActionColumnPolicy in the quickstart or docs;
  • explain that if B only provides recommendations, pi_B(a|x) becomes one-hot;
  • clarify that this is valid to run, but overlap/ESS may become the limiting factor;
  • teach users to read replay overlap, ESS ratio, weight tails, warnings, and trust_level before trusting the result.

This is not necessarily a blocker for PR #42, but it is important for real bring-your-own-data usability.

Copy link
Copy Markdown
Owner Author

Another practical docs/tutorial gap: explain propensity_A clearly.

Users need to understand that propensity_A is the probability with which the historical behavior/logging policy A selected the actually logged action for a given user/context: pi_A(a_A | x).

Recommended docs/example addition:

  • explain where propensity_A comes from in real product systems;
  • distinguish logged propensity from estimated propensity;
  • give examples:
    1. randomized / epsilon-greedy logging policy where propensity is known by design;
    2. ranker/recommender that can return action probabilities or randomized allocation probabilities;
    3. deterministic historical policy where true propensity is unknown/unavailable;
  • explain that if true logged propensities are unavailable, propensity_source='estimated' or auto fallback can fit a behavior model, but this is weaker and should lower trust;
  • clarify that wrong or reconstructed propensities can bias IPS/DR-family estimates.

This should probably be added to the quickstart and/or docs/how_to_interpret_ope_outputs_ru.md.

Copy link
Copy Markdown
Owner Author

Architectural note on where to place DeterministicActionColumnPolicy.

Recommended placement: create a new module src/policyscope/policy_adapters.py.

Rationale:

  • src/policyscope/policies.py currently contains synthetic/demo policies (GreedyUnknown, EpsilonGreedy, SoftmaxFromScores, make_policy). Putting real-data adapters there would blur synthetic policy examples with production-facing adapters.
  • src/policyscope/data.py should stay focused on dataset/schema validation (BanditSchema, LoggedBanditDataset), not target-policy behavior.
  • src/policyscope/comparison.py should remain orchestration logic, not input adapter definitions.
  • A dedicated policy_adapters.py makes the concept explicit: these are small user-facing adapter objects that turn real-world artifacts into the policyB.action_probs(df) contract.

Suggested public API:

from policyscope.policy_adapters import DeterministicActionColumnPolicy

policyB = DeterministicActionColumnPolicy(
    action_col="action_B",
    action_space=["card", "loan", "deposit"],
)

Implementation expectations:

  • action_probs(df) -> np.ndarray returns one-hot probabilities over action_space;
  • validates action_col exists;
  • validates all observed actions are present in action_space;
  • raises clear ValueError on unknown actions;
  • supports arbitrary labels, including strings;
  • action_argmax(df) should return real action labels, not column indices.

Do not auto-wrap action_B inside compare_policies(...) at this stage. Explicit adapter construction is more transparent and avoids magic behavior.

Optional export choice:

  • safe minimal: do not export from package root, import from policyscope.policy_adapters;
  • user-friendly later: export from root only if the package wants a broader top-level API surface.

@Yurashku Yurashku merged commit cff2953 into main Apr 15, 2026
0 of 2 checks passed
@Yurashku Yurashku deleted the codex/new-task branch April 15, 2026 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant