This report summarizes and compares experimental results of four different image classification models:
- EfficientNet
- ResNet34
- Vision Transformer (ViT)
- YOLOv8n
| Parameter | EfficientNet | ResNet34 | Vision Transformer | YOLOv8n |
|---|---|---|---|---|
| Number of epochs | 15 | 100 | 100 | 10 |
| Learning rate | 0.001 (assumed) | 0.1 → 0.01 → 0.001 → 0.0001 | 0.00001 (fixed) | Increase-then-decrease schedule |
| Special techniques | Early stopping (patience=5) | Learning rate schedule | None | Learning rate schedule |
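
The table lists early stopping with patience=5 for EfficientNet. As a rough illustration of what that setting means, here is a minimal sketch in plain Python. The `EarlyStopping` class, the `min_delta` parameter, and the validation accuracies are all hypothetical and are not taken from the original training code or results.

```python
class EarlyStopping:
    """Signal a stop once the score has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as an improvement
        self.best_score = None
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record one epoch's validation score; return True if training should stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only (not the reported run).
val_acc = [0.92, 0.94, 0.95, 0.949, 0.948, 0.95, 0.947, 0.946, 0.945]

stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, acc in enumerate(val_acc, start=1):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
```

In practice the model weights from the best epoch would be saved and restored, which is consistent with reporting the best epoch rather than the last one.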
| Model | Highest Accuracy | On Dataset | Timing |
|---|---|---|---|
| EfficientNet | 97.04% | Test | Epoch 11 (best) |
| ResNet34 | 80.74% | Test | After 100 epochs |
| Vision Transformer | 74.81% | Test | After 100 epochs |
| YOLOv8n | 95.9% | Test | Epoch 10 |
| Model | Initial Loss | Final Loss | Characteristics |
|---|---|---|---|
| EfficientNet | 0.0717 | 0.0583 | Irregular fluctuations between 0.0330 and 0.1081 |
| ResNet34 | 4.41 | 0.4341 | Decreases in LR phases, stable at end |
| Vision Transformer | 1.073 | 0.0004 | Steady decrease, converges near 0 |
| YOLOv8n | 0.83442 | 0.07431 | Fast decrease and stable |
| Model | Training Time | Inference Performance | Convergence |
|---|---|---|---|
| EfficientNet | Not specified | Not specified | Achieves >92% from first epoch |
| ResNet34 | Not specified | Not specified | Slow loss decrease, stable |
| Vision Transformer | Not specified | Not specified | Steady loss decrease, final convergence |
| YOLOv8n | Short (10 epochs) | 17.4 ms/image | Achieves >90% after just 2 epochs |
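
To put the 17.4 ms/image figure in perspective, the per-image latency can be converted to an approximate throughput. The sketch below assumes images are processed one at a time and that the reported latency covers the whole per-image cost; neither is stated in the source. The function names and the 1000-image batch are hypothetical.

```python
def images_per_second(latency_ms):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


def total_seconds(num_images, latency_ms):
    """Estimated time to process `num_images` sequentially."""
    return num_images * latency_ms / 1000.0


LATENCY_MS = 17.4  # YOLOv8n figure from the table above

throughput = images_per_second(LATENCY_MS)
print(f"throughput: {throughput:.1f} images/s")
print(f"1000 images: {total_seconds(1000, LATENCY_MS):.1f} s")
```

At roughly 57 images per second, the model would keep up with a typical 30 fps video stream under these assumptions.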
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Adult | 1.0000 | 1.0000 | 1.0000 |
| Normal | 0.9528 | 0.9712 | 0.9619 |
| Violent | 0.9595 | 0.9342 | 0.9467 |
| True \ Predicted | Adult | Normal | Violent |
|---|---|---|---|
| Adult | 90 | 0 | 0 |
| Normal | 0 | 101 | 3 |
| Violent | 0 | 5 | 71 |
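
The class-wise metrics can be recomputed directly from the confusion matrix, which is a useful sanity check. The sketch below uses only the counts reported above; the function and variable names are illustrative. The resulting overall accuracy (262 of 270 correct) matches the 97.04% reported for EfficientNet.

```python
LABELS = ["Adult", "Normal", "Violent"]

# Rows are true classes, columns are predicted classes (counts from the report).
CONFUSION = [
    [90, 0, 0],
    [0, 101, 3],
    [0, 5, 71],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted as another class
        fp = sum(row[i] for row in matrix) - tp   # predicted i, true class is another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


METRICS = per_class_metrics(CONFUSION, LABELS)
ACCURACY = accuracy(CONFUSION)

for label, (p, r, f1) in METRICS.items():
    print(f"{label:8s} precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"accuracy={ACCURACY:.4f}")
```

All remaining errors are confusions between "Normal" and "Violent" (3 in one direction, 5 in the other), which is why the recommendations focus on those two classes.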
- ResNet34: Only reports overall accuracy (80.74%), no detailed class-wise metrics
- Vision Transformer: Only reports overall accuracy (74.81%), no detailed class-wise metrics
- YOLOv8n: Reports Top-1 Accuracy (95.9%) and Top-5 Accuracy (100%)
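
A note on the Top-5 figure: top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. If the YOLOv8n model was evaluated on the same three classes (an assumption; the source does not state the class count for that run), then any k of 3 or more gives 100% by construction, so only the Top-1 value is informative. The sketch below illustrates this with made-up scores; `top_k_accuracy` is a hypothetical helper, not part of any library.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Made-up class scores for a three-class problem, for illustration only.
SCORES = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
]
TARGETS = [0, 1, 1, 2]

for k in (1, 2, 5):
    print(f"top-{k} accuracy: {top_k_accuracy(SCORES, TARGETS, k):.2f}")
```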
- EfficientNet advantages: Highest accuracy (97.04%), perfect classification of the "Adult" class
- EfficientNet characteristics: Validation accuracy fluctuates significantly, and the loss does not decrease evenly
- ResNet34 advantages: Effective learning rate schedule
- ResNet34 characteristics: Stable training, no sudden loss fluctuations
- Vision Transformer advantages: Training loss decreases to near 0 (0.0004)
- Vision Transformer characteristics: Low test accuracy (74.81%) despite the near-zero training loss suggests possible overfitting
- YOLOv8n advantages: Fast convergence, high inference speed (17.4 ms/image)
- YOLOv8n characteristics: High efficiency with few epochs (10)
| Criteria | EfficientNet | ResNet34 | Vision Transformer | YOLOv8n |
|---|---|---|---|---|
| Accuracy | ★★★★★ (97.04%) | ★★★☆☆ (80.74%) | ★★☆☆☆ (74.81%) | ★★★★★ (95.9%) |
| Convergence speed | ★★★★★ (15 epochs) | ★★☆☆☆ (100 epochs) | ★★☆☆☆ (100 epochs) | ★★★★★ (10 epochs) |
| Loss stability | ★★☆☆☆ (fluctuating) | ★★★★★ (stable decrease) | ★★★★★ (steady decrease) | ★★★★★ (fast decrease) |
| Inference time | No information | No information | No information | ★★★★★ (17.4 ms/image) |
| Overall rating | ★★★★★ | ★★★☆☆ | ★★☆☆☆ | ★★★★★ |
From the experimental results of all four models, the following conclusions can be drawn:
- EfficientNet and YOLOv8n are the two most effective models, each with >95% accuracy on the test set, making them suitable for content-based image classification tasks.
- ResNet34 has moderate performance (80.74%) but a stable training process; it may need additional optimization.
- Vision Transformer has the lowest performance (74.81%) on the MNIST dataset, suggesting that the transformer architecture may not be suitable for simple data or needs hyperparameter adjustments.
- YOLOv8n stands out for its fast convergence (only 10 epochs) and high inference speed, making it suitable for practical applications and real-time processing.
- EfficientNet: Augment the data for the "Normal" and "Violent" classes to reduce confusion between them
- EfficientNet: Apply a learning rate scheduler to stabilize the training process
- ResNet34: Optimize the learning rate schedule
- ResNet34: Apply additional regularization techniques (dropout, weight decay)
- ResNet34: Experiment with larger architectures (ResNet50, ResNet101)
- Vision Transformer: Apply regularization to avoid overfitting
- Vision Transformer: Experiment with a learning rate schedule instead of a fixed rate
- Vision Transformer: Adjust hyperparameters (number of layers, attention heads, patch size)
- YOLOv8n: Optimize the dataset (increase the number of hard samples)
- YOLOv8n: Adjust hyperparameters to reduce validation loss fluctuation
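
One of the recommendations above is to replace a fixed learning rate with a schedule. A common choice for transformer models is linear warm-up followed by cosine decay, which also matches the general "increase then decrease" shape. The sketch below is one possible schedule, not the one used in any of the reported runs; the function name, the base rate of 1e-4, and the 5 warm-up epochs are all assumed values for illustration.

```python
import math


def warmup_cosine_lr(epoch, total_epochs, base_lr, warmup_epochs=5, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly up to base_lr over the warm-up epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 epochs, peak rate 1e-4, 5 warm-up epochs.
SCHEDULE = [warmup_cosine_lr(e, total_epochs=100, base_lr=1e-4) for e in range(100)]

for e in (0, 4, 5, 50, 99):
    print(f"epoch {e:3d}: lr = {SCHEDULE[e]:.2e}")
```

The same per-epoch function can be plugged into most training loops by setting the optimizer's learning rate at the start of each epoch.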
- Need highest accuracy: EfficientNet (97.04%)
- Need fast processing speed: YOLOv8n (17.4ms/image)
- Balanced practical application: YOLOv8n (95.9% accuracy, 10 epochs, fast inference)
- Complex data with many details: ResNet34 with learning rate schedule
- Complex spatial structure data: Vision Transformer (after optimization)
- Analysis report of EfficientNet model for three-label image classification
- Experimental results report of ResNet34 model
- Experimental results report of Vision Transformer model
- Performance analysis report of 16+ image classification project using YOLOv8n model
If you find any errors or areas for improvement, please don't hesitate to open an issue so the AI community can continue to grow :))