
Commit acb3700

classifier_wrapper -> classifier + readme improvements

1 parent: d6466aa

File tree: 8 files changed, +91, -224 lines changed
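At its core, the commit renames the public attribute through which `torchTextClassifiers` exposes its wrapped implementation: `classifier_wrapper` becomes `classifier`, with every call site in the examples, notebook, and tests updated to match. A minimal before/after sketch of the user-facing change (the `create_fasttext` helper is documented in the repository's README; the variable names and argument values here are illustrative):

```python
from torchTextClassifiers import create_fasttext

# Hypothetical setup; any of the documented create_fasttext kwargs would do.
model = create_fasttext(embedding_dim=100, num_classes=3)

# Before this commit, the wrapped implementation was reached as:
#   model.classifier_wrapper
# After this commit, the same object is reached as:
impl = model.classifier
print(type(impl).__name__)  # e.g. FastTextWrapper
```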

README.md

Lines changed: 3 additions & 136 deletions

````diff
@@ -2,8 +2,6 @@
 
 A unified, extensible framework for text classification built on [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).
 
-
-
 ## 🚀 Features
 
 - **Unified API**: Consistent interface for different classifier wrappers
@@ -114,52 +112,6 @@ classifier.build(X_train, y_train)
 ```
 
 
-## 🔧 Advanced Usage
-
-### Custom Configuration
-
-```python
-from torchTextClassifiers import torchTextClassifiers
-from torchTextClassifiers.classifiers.fasttext.config import FastTextConfig
-from torchTextClassifiers.classifiers.fasttext.wrapper import FastTextWrapper
-
-# Create custom configuration
-config = FastTextConfig(
-    embedding_dim=200,
-    sparse=True,
-    num_tokens=20000,
-    min_count=3,
-    min_n=2,
-    max_n=8,
-    len_word_ngrams=3,
-    num_classes=5,
-    direct_bagging=False,  # Custom FastText parameter
-)
-
-# Create classifier with custom config
-wrapper = FastTextWrapper(config)
-classifier = torchTextClassifiers(wrapper)
-```
-
-### Using Pre-trained Tokenizers
-
-```python
-from torchTextClassifiers import build_fasttext_from_tokenizer
-
-# Assume you have a pre-trained tokenizer
-# my_tokenizer = ... (previously trained NGramTokenizer)
-
-classifier = build_fasttext_from_tokenizer(
-    tokenizer=my_tokenizer,
-    embedding_dim=100,
-    num_classes=3,
-    sparse=False
-)
-
-# Model and tokenizer are already built, ready for training
-classifier.train(X_train, y_train, X_val, y_val, ...)
-```
-
 ### Training Customization
 
 ```python
@@ -181,67 +133,6 @@ classifier.train(
 )
 ```
 
-## 📊 API Reference
-
-### Main Classes
-
-#### `torchTextClassifiers`
-The main classifier class providing a unified interface.
-
-**Key Methods:**
-- `build(X_train, y_train)`: Build text preprocessing and model
-- `train(X_train, y_train, X_val, y_val, ...)`: Train the model
-- `predict(X)`: Make predictions
-- `validate(X, Y)`: Evaluate on test data
-- `to_json(filepath)`: Save configuration
-- `from_json(filepath)`: Load configuration
-
-#### `BaseClassifierWrapper`
-Base class for all classifier wrappers. Each classifier implementation extends this class.
-
-#### `FastTextWrapper`
-Wrapper for FastText classifier implementation with tokenization-based preprocessing.
-
-### FastText Specific
-
-#### `create_fasttext(**kwargs)`
-Convenience function to create FastText classifiers.
-
-**Parameters:**
-- `embedding_dim`: Embedding dimension
-- `sparse`: Use sparse embeddings
-- `num_tokens`: Vocabulary size
-- `min_count`: Minimum token frequency
-- `min_n`, `max_n`: Character n-gram range
-- `len_word_ngrams`: Word n-gram length
-- `num_classes`: Number of output classes
-
-#### `build_fasttext_from_tokenizer(tokenizer, **kwargs)`
-Create FastText classifier from existing tokenizer.
-
-## 🏗️ Architecture
-
-The framework follows a wrapper-based architecture:
-
-```
-torchTextClassifiers/
-├── torchTextClassifiers.py        # Main classifier interface
-├── classifiers/
-│   ├── base.py                    # Abstract base wrapper classes
-│   ├── fasttext/                  # FastText implementation
-│   │   ├── config.py              # Configuration
-│   │   ├── wrapper.py             # FastText wrapper (tokenization)
-│   │   ├── factory.py             # Convenience methods
-│   │   ├── tokenizer.py           # N-gram tokenizer
-│   │   ├── pytorch_model.py       # PyTorch model
-│   │   ├── lightning_module.py    # Lightning module
-│   │   └── dataset.py             # Dataset implementation
-│   └── simple_text_classifier.py  # Example TF-IDF wrapper
-├── utilities/
-│   └── checkers.py                # Input validation utilities
-└── factories.py                   # Convenience factory functions
-```
-
 ## 🔬 Testing
 
 Run the test suite:
@@ -257,24 +148,6 @@ uv run pytest --cov=torchTextClassifiers
 uv run pytest tests/test_torchTextClassifiers.py -v
 ```
 
-## 🤝 Contributing
-
-We welcome contributions! See our [Developer Guide](docs/developer_guide.md) for information on:
-
-- Adding new classifier types
-- Code organization and patterns
-- Testing requirements
-- Documentation standards
-
-## 📄 License
-
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
-
-## 🙏 Acknowledgments
-
-- Built with [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/)
-- Inspired by [FastText](https://fasttext.cc/) for efficient text classification
-- Uses [uv](https://github.com/astral-sh/uv) for dependency management
 
 ## 📚 Examples
 
@@ -285,14 +158,8 @@ See the [examples/](examples/) directory for:
 - Custom classifier implementation
 - Advanced training configurations
 
-## 🐛 Support
 
-If you encounter any issues:
 
-1. Check the [examples](examples/) for similar use cases
-2. Review the API documentation above
-3. Open an issue on GitHub with:
-   - Python version
-   - Package versions (`uv tree` or `pip list`)
-   - Minimal reproduction code
-   - Error messages/stack traces
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
````
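Although the Advanced Usage and API Reference sections are deleted above, only the documentation moved; the methods they described remain in the library. A hedged sketch of the surviving unified workflow, reconstructed from the removed API Reference (the data, parameter values, and filename are invented):

```python
import numpy as np
from torchTextClassifiers import create_fasttext

X_train = np.array(["great product", "terrible service", "works fine"])
y_train = np.array([1, 0, 1])

clf = create_fasttext(embedding_dim=50, num_classes=2)
clf.build(X_train, y_train)    # build text preprocessing and model
preds = clf.predict(X_train)   # make predictions
clf.to_json("config.json")     # save configuration, reloadable via from_json()
```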

examples/using_additional_features.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -107,15 +107,15 @@ def train_and_evaluate_model(X, y, model_name, use_categorical=False, use_simple
     )
     wrapper = SimpleTextWrapper(simple_text_config)
     classifier = torchTextClassifiers(wrapper)
-    print(f"Classifier type: {type(classifier.classifier_wrapper).__name__}")
-    print(f"Uses tokenizer: {hasattr(classifier.classifier_wrapper, 'tokenizer')}")
-    print(f"Uses vectorizer: {hasattr(classifier.classifier_wrapper, 'vectorizer')}")
+    print(f"Classifier type: {type(classifier.classifier).__name__}")
+    print(f"Uses tokenizer: {hasattr(classifier.classifier, 'tokenizer')}")
+    print(f"Uses vectorizer: {hasattr(classifier.classifier, 'vectorizer')}")
 
     # Build the model (this will use TF-IDF vectorization instead of tokenization)
     print("\n🔨 Building model with TF-IDF preprocessing...")
     classifier.build(X_train, y_train)
     print("✅ Model built successfully!")
-    print(f"TF-IDF features: {len(classifier.classifier_wrapper.vectorizer.get_feature_names_out())}")
+    print(f"TF-IDF features: {len(classifier.classifier.vectorizer.get_feature_names_out())}")
 
     # Train the model
     print("\n🎯 Training model...")
```

notebooks/example.ipynb

Lines changed: 13 additions & 13 deletions

```diff
@@ -900,7 +900,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": null,
    "id": "ebf5608b",
    "metadata": {},
    "outputs": [
@@ -916,7 +916,7 @@
     }
    ],
    "source": [
-    "type(model.classifier_wrapper)"
+    "type(model.classifier)"
    ]
   },
   {
@@ -1002,7 +1002,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": null,
    "id": "091024e6",
    "metadata": {},
    "outputs": [
@@ -1027,12 +1027,12 @@
     }
    ],
    "source": [
-    "model.classifier_wrapper.pytorch_model"
+    "model.classifier.pytorch_model"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "id": "d983b113",
    "metadata": {},
    "outputs": [
@@ -1048,12 +1048,12 @@
     }
    ],
    "source": [
-    "model.classifier_wrapper.tokenizer"
+    "model.classifier.tokenizer"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "id": "9b23f1ba",
    "metadata": {},
    "outputs": [
@@ -1082,7 +1082,7 @@
     }
    ],
    "source": [
-    "model.classifier_wrapper.lightning_module"
+    "model.classifier.lightning_module"
    ]
   },
   {
@@ -1097,7 +1097,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": null,
    "id": "00c077b0",
    "metadata": {},
    "outputs": [
@@ -1172,7 +1172,7 @@
    "source": [
     "from pprint import pprint \n",
     "sentence = [\"lorem ipsum dolor sit amet\"]\n",
-    "pprint(model.classifier_wrapper.tokenizer.tokenize(sentence)[2][0])"
+    "pprint(model.classifier.tokenizer.tokenize(sentence)[2][0])"
    ]
   },
   {
@@ -1208,7 +1208,7 @@
    "loaded_model = torchTextClassifiers.from_json('torchTextClassifiers_config.json')\n",
    "\n",
    "print(\"✅ Model loaded from JSON successfully!\")\n",
-   "print(f\"Loaded wrapper type: {type(loaded_model.classifier_wrapper).__name__}\")\n",
+   "print(f\"Loaded wrapper type: {type(loaded_model.classifier).__name__}\")\n",
    "print(f\"Config parameters: embedding_dim={loaded_model.config.embedding_dim}, sparse={loaded_model.config.sparse}\")\n",
    "\n",
    "# The loaded model needs to be built before use\n",
@@ -1296,7 +1296,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": null,
    "id": "g0rmedya9eb",
    "metadata": {},
    "outputs": [
@@ -1350,7 +1350,7 @@
    "direct_model.build(X_train, y_train, lightning=True, lr=parameters_train.get(\"lr\"))\n",
    "\n",
    "print(\"✅ Direct wrapper model created successfully!\")\n",
-   "print(f\"Model type: {type(direct_model.classifier_wrapper).__name__}\")\n",
+   "print(f\"Model type: {type(direct_model.classifier).__name__}\")\n",
    "print(f\"Config type: {type(direct_model.config).__name__}\")"
    ]
   },
```
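Alongside the rename, every touched code cell has its `execution_count` reset to `null`, the usual way of scrubbing stale run metadata from a committed notebook. A minimal sketch of automating that scrub with the `nbformat` library (not used in this repo; shown only as an assumed alternative to editing the JSON by hand):

```python
import nbformat

# Load the notebook, clear execution counts on code cells, and write it back.
nb = nbformat.read("notebooks/example.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.execution_count = None  # serializes as "execution_count": null
nbformat.write(nb, "notebooks/example.ipynb")
```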

tests/test_core_functionality.py

Lines changed: 5 additions & 5 deletions

```diff
@@ -79,7 +79,7 @@ def test_torchTextClassifiers_initialization_pattern():
     classifier = torchTextClassifiers(mock_wrapper)
 
     # Verify initialization
-    assert classifier.classifier_wrapper == mock_wrapper
+    assert classifier.classifier == mock_wrapper
     assert classifier.config == mock_config
 
 
@@ -123,7 +123,7 @@ def test_create_fasttext_classmethod():
 
     # Verify the result is a proper torchTextClassifiers instance
     assert isinstance(result, torchTextClassifiers)
-    assert isinstance(result.classifier_wrapper, FastTextWrapper)
+    assert isinstance(result.classifier, FastTextWrapper)
     assert result.config.embedding_dim == 50
     assert result.config.sparse == True
     assert result.config.num_tokens == 5000
@@ -135,17 +135,17 @@ def test_method_delegation_pattern():
 
     # Create a mock instance
     classifier = Mock(spec=torchTextClassifiers)
-    classifier.classifier_wrapper = Mock()
+    classifier.classifier = Mock()
 
     # Test predict delegation
     expected_result = np.array([1, 0, 1])
-    classifier.classifier_wrapper.predict.return_value = expected_result
+    classifier.classifier.predict.return_value = expected_result
 
     # Apply the real predict method to our mock
     sample_X = np.array(["test1", "test2", "test3"])
     result = torchTextClassifiers.predict(classifier, sample_X)
 
-    classifier.classifier_wrapper.predict.assert_called_once_with(sample_X)
+    classifier.classifier.predict.assert_called_once_with(sample_X)
     assert result is expected_result
 
 
```