Add Data Model #157
base: master
Conversation
le1nux
left a comment
Left a few minor comments. Otherwise LGTM :)
```python
strategy: DecodingStrategy = Field(default=DecodingStrategy.TOP_K)
top_k: int = Field(..., gt=0, description="Number of top candidates to consider. Must be greater than 0.")
temperature: float = Field(..., gt=0, description="Sampling temperature. Must be greater than 0.")
```
Do we really need temperature for top-k decoding?
From my understanding, temperature makes the probability distribution flatter or more peaked.
If we apply top-k to a flattened or peaked probability distribution, shouldn't the result be the same?
For top-k sampling, temperature actually makes sense. The top k tokens you sample from do not change; however, the probability with which you then randomly sample one of these k tokens is affected significantly by the temperature.
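The point above can be sketched in a few lines of plain Python (the logits are made up for illustration): rescaling by temperature never changes *which* tokens land in the top k, but it does reshape the probabilities used to sample among them.

```python
import math

def top_k_probs(logits, k, temperature):
    """Softmax with temperature, then keep only the k most likely tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Indices of the k most likely tokens, highest probability first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]

# Two temperatures over the same hypothetical logits: the top-2 *set*
# is identical, but the renormalized sampling probabilities differ sharply.
idx_cold, p_cold = top_k_probs([4.0, 3.0, 1.0, 0.5], k=2, temperature=0.5)
idx_hot, p_hot = top_k_probs([4.0, 3.0, 1.0, 0.5], k=2, temperature=2.0)
assert set(idx_cold) == set(idx_hot)
print(p_cold, p_hot)
```

Here the most likely token gets roughly 0.88 of the renormalized mass at temperature 0.5 but only about 0.62 at temperature 2.0, so temperature clearly matters even after the top-k cut.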
```python
class GreedyParameters(DecodingParameters):
    """Greedy decoding strategy parameters"""

    strategy: DecodingStrategy = Field(default=DecodingStrategy.GREEDY)
```
Shouldn't greedy also have a temperature flag?
A temperature of 0 -> argmax (special case);
otherwise we would sample a single token from the probability distribution.
I think temperature does not affect the results when decoding greedily. Greedy decoding always selects the most likely token as the next one. The temperature flattens or steepens the distribution over tokens, but the most likely token stays the most likely one, regardless of the chosen temperature.
I guess what you mean is called temperature sampling
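The invariance claimed above is easy to check: dividing logits by any positive temperature is a strictly monotonic transformation, so the token greedy decoding picks never changes. A minimal sketch with made-up logits:

```python
logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits

def argmax(xs):
    """Index of the largest element, i.e. the token greedy decoding selects."""
    return max(range(len(xs)), key=lambda i: xs[i])

# A positive temperature rescales all logits by the same positive factor,
# which preserves their ordering, so the greedy choice is unchanged.
for temperature in (0.1, 1.0, 10.0):
    assert argmax([x / temperature for x in logits]) == argmax(logits)
```

Temperature only starts to matter once you actually sample from the tempered distribution (temperature sampling), rather than taking the argmax.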
```python
prompt_lang: str
raw_data_path: str
model: str
decoding_parameters: Union[GreedyParameters, BeamSearchParameters, TopKParameters, TopPParameters]
```
Suggested change:

```python
decoding_parameters: GreedyParameters | BeamSearchParameters | TopKParameters | TopPParameters
```
```python
from enum import Enum
from typing import Dict, Union
```
old type annotations
```python
def test_greedy_parameters():
    params = GreedyParameters()
    assert params.strategy == DecodingStrategy.GREEDY


def test_beam_search_parameters():
    params = BeamSearchParameters(num_beams=10, early_stopping=False)
    assert params.strategy == DecodingStrategy.BEAM_SEARCH
    assert params.num_beams == 10
    assert not params.early_stopping


def test_top_k_parameters():
    params = TopKParameters(top_k=30, temperature=0.7)
    assert params.strategy == DecodingStrategy.TOP_K
    assert params.top_k == 30
    assert params.temperature == 0.7


def test_top_p_parameters():
    params = TopPParameters(top_p=0.85, temperature=0.9)
    assert params.strategy == DecodingStrategy.TOP_P
    assert params.top_p == 0.85
    assert params.temperature == 0.9


def test_invalid_decoding_parameters():
    with pytest.raises(ValidationError):
        BeamSearchParameters(num_beams=-1, early_stopping=False)  # Invalid num_beams
    with pytest.raises(ValidationError):
        TopKParameters(top_k=-5, temperature=0.7)  # Invalid top_k
    with pytest.raises(ValidationError):
        TopPParameters(top_p=1.5, temperature=0.8)  # Invalid top_p


def test_document_info_with_greedy():
    doc_info = DocumentInfo(
        document_id="doc_001",
        prompt="Assess the educational value of the text.",
        prompt_lang="en",
        raw_data_path="/path/to/raw_data.json",
        model="test_model",
        decoding_parameters=GreedyParameters(),
    )
    assert doc_info.document_id == "doc_001"
    assert doc_info.decoding_parameters.strategy == DecodingStrategy.GREEDY


def test_document_info_with_top_p():
    doc_info = DocumentInfo(
        document_id="doc_002",
        prompt="Assess whether the text contains adult content.",
        prompt_lang="en",
        raw_data_path="/path/to/raw_data.json",
        model="test_model",
        decoding_parameters=TopPParameters(top_p=0.8, temperature=0.6),
    )
    assert doc_info.document_id == "doc_002"
    assert doc_info.decoding_parameters.top_p == 0.8
    assert doc_info.decoding_parameters.temperature == 0.6


def test_statistic_report():
    doc_info = DocumentInfo(
        document_id="doc_003",
        prompt="Assess whether the text contains chain of thoughts.",
        prompt_lang="en",
        raw_data_path="/path/to/raw_data.json",
        model="test_model",
        decoding_parameters=BeamSearchParameters(num_beams=5, early_stopping=True),
    )
    correlation_metrics = CorrelationMetrics(
        correlation={
            "average": {"pearson": 0.85, "spearman": 0.82},
            "min": {"pearson": 0.75, "spearman": 0.72},
        }
    )
    t_test_results = TTestResults(t_test_p_values={"average": 0.03, "min": 0.05})
    report = StatisticReport(
        document_info=doc_info,
        correlation_metrics=correlation_metrics,
        t_test_results=t_test_results,
    )

    assert report.document_info.document_id == "doc_003"
    assert report.correlation_metrics.correlation["average"]["pearson"] == 0.85
    assert report.t_test_results.t_test_p_values["average"] == 0.03
```
I think none of these tests really tests any functionality; they are a bit redundant.
I agree
rrutmann
left a comment
The classes in general look good, apart from the minor things I mentioned in the comments. However, if I see it correctly, only data classes are defined here, and they are not used in the rest of the code, correct? The actual functionality that makes use of these configurations is currently missing.
```python
strategy: DecodingStrategy = Field(default=DecodingStrategy.BEAM_SEARCH)
num_beams: int = Field(..., gt=0, description="Number of beams must be greater than 0.")
early_stopping: bool
```
We should also add a parameter for the total number of beams that are tracked in parallel, not just for each generated token (=num_beams)
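If such a parameter were added, the model could look like the sketch below (the field name `num_parallel_beams` is a hypothetical suggestion, not from the PR; Pydantic is assumed to be available):

```python
from pydantic import BaseModel, Field

class BeamSearchParameters(BaseModel):
    """Sketch of beam search parameters with a separate parallel-beam count."""

    num_beams: int = Field(..., gt=0, description="Beams expanded per generated token.")
    # Hypothetical field raised in review: total beams tracked in parallel.
    num_parallel_beams: int = Field(..., gt=0, description="Total beams tracked in parallel.")
    early_stopping: bool

params = BeamSearchParameters(num_beams=4, num_parallel_beams=8, early_stopping=True)
assert params.num_parallel_beams == 8
```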
```python
class DecodingParameters(BaseModel):
    """Decoding strategy parameters"""

    strategy: DecodingStrategy
```
Why is the strategy a parameter? This seems a little bit redundant when we then define a separate class for each decoding strategy anyway
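One possible answer: a shared `strategy` field can double as a discriminator, letting Pydantic dispatch a tagged union to the right class when parsing raw dicts or JSON. A minimal sketch (assuming Pydantic's discriminated unions with `Literal` tags instead of the enum; only two of the PR's classes are shown):

```python
from typing import Literal, Union

from pydantic import BaseModel, Field

class GreedyParameters(BaseModel):
    strategy: Literal["greedy"] = "greedy"

class TopKParameters(BaseModel):
    strategy: Literal["top_k"] = "top_k"
    top_k: int = Field(..., gt=0)
    temperature: float = Field(..., gt=0)

class DocumentInfo(BaseModel):
    # "strategy" acts as the discriminator: Pydantic inspects it to decide
    # which parameter class to instantiate.
    decoding_parameters: Union[GreedyParameters, TopKParameters] = Field(
        discriminator="strategy"
    )

doc = DocumentInfo(decoding_parameters={"strategy": "top_k", "top_k": 5, "temperature": 0.7})
assert isinstance(doc.decoding_parameters, TopKParameters)
```

Without the field, callers would have to pick and construct the concrete class themselves before validation.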
```python
class CorrelationMetrics(BaseModel):
    """Correlation metrics for performance evaluation"""

    correlation: Dict[str, Dict[str, float]]  # Correlation per ground truth approach
```
What do the keys of the dict represent? The model (prompt + prompt_lang + llm) that generated the scores? That would mean the correlation is always measured against the ground truth, correct? If we allowed tuples of strings as keys, we could also measure the correlation between different models.
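The tuple-key idea might look like this sketch (scorer names and scores are made up; Pearson is implemented inline to stay dependency-free). One caveat: JSON object keys must be strings, so serializing tuple keys through Pydantic would need custom handling.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical scorers: the ground truth plus two model configurations.
scores: Dict[str, List[float]] = {
    "ground_truth": [1.0, 2.0, 3.0, 4.0],
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [4.0, 3.0, 2.0, 1.0],
}

def pearson(xs: List[float], ys: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Tuple keys name both sides of each correlation, so model-vs-model
# pairs fit alongside model-vs-ground-truth pairs.
correlation: Dict[Tuple[str, str], float] = {
    (a, b): pearson(scores[a], scores[b]) for a, b in combinations(scores, 2)
}
```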
This PR introduces a modular and extensible structure for configuring decoding strategies in a Pydantic-based framework. The changes include dedicated data classes for each decoding strategy (e.g., Greedy, Beam Search, Top-K, Top-P) with configurable parameters. These classes are seamlessly integrated into the existing statistical reporting framework.

Key Changes

Decoding Strategy Parameter Classes:
- `GreedyParameters`: No additional parameters beyond the strategy name.
- `BeamSearchParameters`: Configurable fields: `num_beams` (number of beams for beam search), `early_stopping` (whether to stop early).
- `TopKParameters`: Configurable fields: `top_k` (number of top candidates to consider), `temperature` (sampling temperature).
- `TopPParameters`: Configurable fields: `top_p` (probability mass for nucleus sampling), `temperature` (sampling temperature).

Integration into DocumentInfo:
- The `decoding_parameters` field in the `DocumentInfo` class now accepts any of the decoding parameter classes (`GreedyParameters`, `BeamSearchParameters`, `TopKParameters`, `TopPParameters`).

Statistical Reporting:
- `StatisticReport` integrates `DocumentInfo` and statistical metrics (e.g., `CorrelationMetrics` and `TTestResults`).