diff --git a/README.md b/README.md
index 9f3b3bc..b510cea 100644
--- a/README.md
+++ b/README.md
@@ -1,152 +1,35 @@
-# 📸 **BetterCam** 🚀
-![World's Best AI Aimbot Banner](images/banner.png)
+Test system: RTX 4070, Ryzen 7 5800X, 48 GB RAM
 
-[![Pull Requests Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com)
-> ***🌟 World's Fastest Python Screenshot Library for Windows 🐍***
+Screen capture resolution: 1920x1080
 
-```python
-import bettercam
-camera = bettercam.create()
-camera.grab()
-```
+Benchmark script: BetterCam/benchmarks/bettercam_capture.py
 
-## 🌈 Introduction
-BetterCam is the World's 🌏 Fastest Publicly available Python screenshot library for Windows, boasting 240Hz+ capturing using the Desktop Duplication API 🖥️💨. Born from [DXCam](https://github.com/ra1nty/DXcam), it shines in deep learning pipelines for FPS games, outpacing other Python solutions like [python-mss](https://github.com/BoboTiG/python-mss) and [D3DShot](https://github.com/SerpentAI/D3DShot/).
 
-BetterCam's superpowers include:
-- 🚅 Insanely fast screen capturing (> 240Hz)
-- 🎮 Capture from Direct3D exclusive full-screen apps without interruption, even during alt+tab.
-- 🔧 Auto-adjusts to scaled / stretched resolutions.
-- 🎯 Precise FPS targeting for Video output.
-- 👌 Smooth NumPy, OpenCV, PyTorch integration, etc.
+Screen Capture FPS: 640
+[BetterCam] Capture benchmark with NVIDIA GPU
+Elapsed time: 3.53 seconds
+Frames per second: 283.38 FPS
 
-> ***💞 Community contributions warmly invited!***
 
-## 🛠️ Installation
-### From PyPI:
-```bash
-pip install bettercam
-```
+Screen Capture FPS: 622
+[BetterCam] Capture benchmark with Torch CUDA
+Elapsed time: 3.36 seconds
+Frames per second: 297.91 FPS
 
-**Note:** 🧩 OpenCV is needed by BetterCam for color space conversion. Install it with `pip install opencv-python` if not yet available.
 
+Screen Capture FPS: 657
+[BetterCam] Capture benchmark without GPU acceleration
+Elapsed time: 3.50 seconds
+Frames per second: 286.06 FPS
 
-## 📚 Usage
-Each monitor is paired with a `BetterCam` instance.
-To get started:
-```python
-import bettercam
-camera = bettercam.create()  # Primary monitor's BetterCam instance
-```
-### 📷 Screenshot
-For a quick snap, call `.grab`:
-```python
-frame = camera.grab()
-```
-`frame` is a `numpy.ndarray` in the `(Height, Width, 3[RGB])` format by default. Note: `.grab` may return `None` if there's no update since the last `.grab`.
+Comparison Results:
+NVIDIA GPU - Elapsed time: 3.53 seconds, FPS: 283.38
+Torch CUDA - Elapsed time: 3.36 seconds, FPS: 297.91
+No GPU - Elapsed time: 3.50 seconds, FPS: 286.06
 
-To display your screenshot:
-```python
-from PIL import Image
-Image.fromarray(frame).show()
-```
-For a specific region, provide the `region` parameter with a tuple for the bounding box coordinates:
-```python
-left, top = (1920 - 640) // 2, (1080 - 640) // 2
-right, bottom = left + 640, top + 640
-region = (left, top, right, bottom)
-frame = camera.grab(region=region)  # A 640x640x3 numpy ndarray snapshot
-```
+`camera = bettercam.create(output_idx=0, output_color="BGRA", nvidia_gpu=True)` selects the CuPy processor (`CupyProcessor`).
 
-### 📹 Screen Capture
-Start and stop screen capture with `.start` and `.stop`:
-```python
-camera.start(region=(left, top, right, bottom))  # Capture a region (optional)
-camera.is_capturing  # True
-# ... Your Code
-camera.stop()
-camera.is_capturing  # False
-```
+`camera = bettercam.create(output_idx=0, output_color="BGRA", torch_cuda=True)` selects the Torch CUDA processor (`TorchProcessor`).
 
-### 🔄 Retrieving Captured Data
-When capturing, grab the latest frame with `.get_latest_frame`:
-```python
-camera.start()
-for i in range(1000):
-    image = camera.get_latest_frame()  # Waits for a new frame
-camera.stop()
-```
+`camera = bettercam.create(output_idx=0, output_color="BGRA")` selects the default NumPy processor (`NumpyProcessor`).
 
-## ⚙️ Advanced Usage & Notes
-### 🖥️ Multiple Monitors / GPUs
-```python
-cam1, cam2, cam3 = [bettercam.create(device_idx=d, output_idx=o) for d, o in [(0, 0), (0, 1), (1, 1)]]
-img1, img2, img3 = [cam.grab() for cam in (cam1, cam2, cam3)]
-```
-To list devices and outputs:
-```pycon
->>> import bettercam
->>> bettercam.device_info()
->>> bettercam.output_info()
-```
-
-### 🎨 Output Format
-Select your color mode when creating a BetterCam instance:
-```python
-bettercam.create(output_idx=0, output_color="BGRA")
-```
-We support "RGB", "RGBA", "BGR", "BGRA", "GRAY" (for grayscale). Right now only `numpy.ndarray` shapes are supported: `(Height, Width, Channels)`.
-
-### 🔄 Video Buffer
-Frames go into a fixed-size ring buffer. Customize its max length with `max_buffer_len` on creation:
-```python
-camera = bettercam.create(max_buffer_len=512)
-```
-
-### 🎥 Target FPS
-For precise FPS targeting, we use the high-resolution `CREATE_WAITABLE_TIMER_HIGH_RESOLUTION`:
-```python
-camera.start(target_fps=120)  # Ideally, not beyond 240Hz.
-```
-
-### 🔄 Video Mode
-For constant framerate video recording, use `video_mode=True` during `.start`:
-```python
-# Example: Record a 5-second, 120Hz video
-camera.start(target_fps=target_fps, video_mode=True)
-# ... Video writing code goes here
-```
-
-### 🛠️ Resource Management
-Call `.release` to stop captures and free resources. Manual deletion also possible:
-```python
-del camera
-```
-
-## 📊 Benchmarks
-### Max FPS Achievement:
-```python
-cam = bettercam.create()
-# ... Benchmarking code...
-```
-|         | BetterCam Nvidia GPU :checkered_flag: | BetterCam :checkered_flag: | DXCam  | python-mss | D3DShot |
-|---------|---------------------------------------|--------------------------|--------|------------|---------|
-| Avg FPS | 111.667                               | 123.667                  | 39     | 34.667     | N/A     |
-| Std Dev | 0.889                                 | 1.778                    | 1.333  | 2.222      | N/A     |
-
-### FPS Targeting:
-```python
-# ... Sample code to test target FPS ...
-```
-| Target/Result | BetterCam Nvidia GPU :checkered_flag: | BetterCam :checkered_flag:   | DXCam | python-mss | D3DShot |
-|---------------|---------------------------------------|--------------------------|-------|------------|---------|
-| 120fps        | 111.667, 0.889                        | 88.333, 2.444            | 36.667, 0.889   | N/A        | N/A     |
-| 60fps         | 60, 0                                 | 60, 0                    | 35, 5.3   | N/A        | N/A     |
-
-## 📝 Referenced Work
-- [DXCam](https://github.com/ra1nty/DXcam): Our origin story.
-- [D3DShot](https://github.com/SerpentAI/D3DShot/): Provided foundational ctypes.
-- [OBS Studio](https://github.com/obsproject/obs-studio): A treasure trove of knowledge.
-
-[^1]: [Preemption (computing)](https://en.wikipedia.org/wiki/Preemption_(computing))
-[^2]: [Time.sleep precision improvement](https://github.com/python/cpython/issues/65501)
\ No newline at end of file
diff --git a/benchmarks/bettercam_capture.py b/benchmarks/bettercam_capture.py
index 68a5a62..3508a3a 100644
--- a/benchmarks/bettercam_capture.py
+++ b/benchmarks/bettercam_capture.py
@@ -1,18 +1,103 @@
 import time
 import bettercam
 
+def benchmark_nvidia_gpu():
+    TOP = 0
+    LEFT = 0
+    RIGHT = 1920
+    BOTTOM = 1080
+    region = (LEFT, TOP, RIGHT, BOTTOM)
+    title = "[BetterCam] Capture benchmark with NVIDIA GPU"
+
+    camera = bettercam.create(output_idx=0, output_color="BGRA", nvidia_gpu=True)
+    camera.start(target_fps=0, video_mode=True)
+
+    start_time = time.time()
+
+    for i in range(1000):
+        image = camera.get_latest_frame()
+
+    end_time = time.time()
+    elapsed_time = end_time - start_time
+    fps = 1000 / elapsed_time
+
+    camera.stop()
+    del camera
+
+    print(f"{title}")
+    print(f"Elapsed time: {elapsed_time:.2f} seconds")
+    print(f"Frames per second: {fps:.2f} FPS")
+    return elapsed_time, fps
+
+def benchmark_torch_cuda():
+    TOP = 0
+    LEFT = 0
+    RIGHT = 1920
+    BOTTOM = 1080
+    region = (LEFT, TOP, RIGHT, BOTTOM)
+    title = "[BetterCam] Capture benchmark with Torch CUDA"
+
+    camera = bettercam.create(output_idx=0, output_color="BGRA", torch_cuda=True)
+    camera.start(target_fps=0, video_mode=True)
+
+    start_time = time.time()
+
+    for i in range(1000):
+        image = camera.get_latest_frame()
+
+    end_time = time.time()
+    elapsed_time = end_time - start_time
+    fps = 1000 / elapsed_time
+
+    camera.stop()
+    del camera
+
+    print(f"{title}")
+    print(f"Elapsed time: {elapsed_time:.2f} seconds")
+    print(f"Frames per second: {fps:.2f} FPS")
+    return elapsed_time, fps
+
+def benchmark_no_gpu():
+    TOP = 0
+    LEFT = 0
+    RIGHT = 1920
+    BOTTOM = 1080
+    region = (LEFT, TOP, RIGHT, BOTTOM)
+    title = "[BetterCam] Capture benchmark without GPU acceleration"
+
+    camera = bettercam.create(output_idx=0, output_color="BGRA")
+    camera.start(target_fps=0, video_mode=True)
+
+    start_time = time.time()
+
+    for i in range(1000):
+        image = camera.get_latest_frame()
+
+    end_time = time.time()
+    elapsed_time = end_time - start_time
+    fps = 1000 / elapsed_time
+
+    camera.stop()
+    del camera
+
+    print(f"{title}")
+    print(f"Elapsed time: {elapsed_time:.2f} seconds")
+    print(f"Frames per second: {fps:.2f} FPS")
+    return elapsed_time, fps
+
+# Benchmark with NVIDIA GPU
+nvidia_time, nvidia_fps = benchmark_nvidia_gpu()
+
+# Benchmark with Torch CUDA
+torch_time, torch_fps = benchmark_torch_cuda()
+
+# Benchmark without GPU acceleration
+no_gpu_time, no_gpu_fps = benchmark_no_gpu()
+
+# Print comparison results
+print("\nComparison Results:")
+print(f"NVIDIA GPU - Elapsed time: {nvidia_time:.2f} seconds, FPS: {nvidia_fps:.2f}")
+print(f"Torch CUDA - Elapsed time: {torch_time:.2f} seconds, FPS: {torch_fps:.2f}")
+print(f"No GPU - Elapsed time: {no_gpu_time:.2f} seconds, FPS: {no_gpu_fps:.2f}")
+
 
-TOP = 0
-LEFT = 0
-RIGHT = 1920
-BOTTOM = 1080
-region = (LEFT, TOP, RIGHT, BOTTOM)
-title = "[BetterCam] Capture benchmark"
-
-fps = 0
-camera = bettercam.create(output_idx=0, output_color="BGRA")
-camera.start(target_fps=60, video_mode=True)
-for i in range(1000):
-    image = camera.get_latest_frame()
-camera.stop()
-del camera
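Review note: the three benchmark functions above differ only in the keyword arguments passed to `bettercam.create`. As a minimal sketch (not part of this patch), the shared timing loop could be factored into one helper. The names `measure_fps` and `fake_frame` are hypothetical, and `fake_frame` is a stand-in so the sketch runs without a capture device.

```python
import time


def measure_fps(get_frame, frame_count=1000):
    """Call get_frame() frame_count times and return (elapsed_seconds, fps)."""
    start = time.perf_counter()  # monotonic clock, better suited to intervals than time.time()
    for _ in range(frame_count):
        get_frame()
    elapsed = time.perf_counter() - start
    return elapsed, frame_count / elapsed


def fake_frame():
    """Hypothetical stand-in for a frame source; sleeps briefly to simulate work."""
    time.sleep(0.001)
    return b"frame"


elapsed, fps = measure_fps(fake_frame, frame_count=20)
print(f"Elapsed time: {elapsed:.2f} seconds, FPS: {fps:.2f}")
```

With a real camera, `camera.get_latest_frame` would be passed as `get_frame` between `camera.start(...)` and `camera.stop()`.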
diff --git a/bettercam/__init__.py b/bettercam/__init__.py
index e90a205..819eee7 100644
--- a/bettercam/__init__.py
+++ b/bettercam/__init__.py
@@ -39,8 +39,9 @@ def create(
         device_idx: int = 0,
         output_idx: int = None,
         region: tuple = None,
-        output_color: str = "RGB",
+        output_color: str = "BGRA",
         nvidia_gpu: bool = False,
+        torch_cuda: bool = False,
         max_buffer_len: int = 64,
     ):
         device = self.devices[device_idx]
@@ -75,6 +76,7 @@ def create(
             region=region,
             output_color=output_color,
             nvidia_gpu=nvidia_gpu,
+            torch_cuda=torch_cuda,
             max_buffer_len=max_buffer_len,
         )
         self._camera_instances[instance_key] = camera
@@ -108,8 +110,9 @@ def create(
     device_idx: int = 0,
     output_idx: int = None,
     region: tuple = None,
-    output_color: str = "RGB",
+    output_color: str = "BGRA",
     nvidia_gpu: bool = False,
+    torch_cuda: bool = False,
     max_buffer_len: int = 64,
 ):
     return __factory.create(
@@ -118,6 +121,7 @@ def create(
         region=region,
         output_color=output_color,
         nvidia_gpu=nvidia_gpu,
+        torch_cuda=torch_cuda,
         max_buffer_len=max_buffer_len,
     )
 
@@ -128,3 +132,4 @@ def device_info():
 
 def output_info():
     return __factory.output_info()
+
diff --git a/bettercam/bettercam.py b/bettercam/bettercam.py
index 6d993e7..1b0fb5f 100644
--- a/bettercam/bettercam.py
+++ b/bettercam/bettercam.py
@@ -22,8 +22,9 @@ def __init__(
         output: Output,
         device: Device,
         region: Tuple[int, int, int, int],
-        output_color: str = "RGB",
+        output_color: str = "BGRA",
         nvidia_gpu: bool = False,
+        torch_cuda: bool = False,
         max_buffer_len=64,
     ) -> None:
         self._output: Output = output
@@ -35,13 +36,20 @@ def __init__(
             output=self._output, device=self._device
         )
         self.nvidia_gpu = nvidia_gpu
-        # if nvidia_gpu:
-        #     import cupy as np
-        self._processor: Processor = Processor(output_color=output_color, nvidia_gpu=nvidia_gpu)
+        self.torch_cuda = torch_cuda
+
+        # Set the rotation angle from the output device
+        self.rotation_angle: int = self._output.rotation_angle
+        # Initialize Processor with the rotation angle and backend
+        self._processor: Processor = Processor(
+            output_color=output_color, 
+            nvidia_gpu=nvidia_gpu,
+            torch_cuda=torch_cuda,
+            rotation_angle=self.rotation_angle
+        )
 
         self.width, self.height = self._output.resolution
         self.channel_size = len(output_color) if output_color != "GRAY" else 1
-        self.rotation_angle: int = self._output.rotation_angle
 
         self._region_set_by_user = region is not None
         self.region: Tuple[int, int, int, int] = region
@@ -66,14 +74,18 @@ def __init__(
 
         self.__frame_count = 0
         self.__capture_start_time = 0
-
+        
     def grab(self, region: Tuple[int, int, int, int] = None):
         if region is None:
             region = self.region
         else:
             self._validate_region(region)
         frame = self._grab(region)
-        return frame
+        if frame is not None:
+            return frame
+        else:
+            self._on_output_change()
+            return None
 
     def _grab(self, region: Tuple[int, int, int, int]):
         if self._duplicator.update_frame():
@@ -84,9 +96,7 @@ def _grab(self, region: Tuple[int, int, int, int]):
             )
             self._duplicator.release_frame()
             rect = self._stagesurf.map()
-            frame = self._processor.process(
-                rect, self.width, self.height, region, self.rotation_angle
-            )
+            frame = self._processor.process(rect, self.width, self.height, region)
             self._stagesurf.unmap()
             return frame
         else:
@@ -94,7 +104,7 @@ def _grab(self, region: Tuple[int, int, int, int]):
             return None
 
     def _on_output_change(self):
-        time.sleep(0.1)  # Wait for Display mode change (Access Lost)
+        time.sleep(0.05)  # Wait for Display mode change (Access Lost)
         self._duplicator.release()
         self._stagesurf.release()
         self._output.update_desc()
@@ -120,17 +130,16 @@ def start(
         video_mode=False,
         delay: int = 0,
     ):
-        if delay != 0:
-            time.sleep(delay)
-            self._on_output_change()
+        #if delay != 0:
+            #time.sleep(delay)
+            #self._on_output_change()
         if region is None:
             region = self.region
-        self._validate_region(region)
+        #self._validate_region(region)
         self.is_capturing = True
         frame_shape = (region[3] - region[1], region[2] - region[0], self.channel_size)
-        self.__frame_buffer = np.ndarray(
-            (self.max_buffer_len, *frame_shape), dtype=np.uint8
-        )
+        #self.__frame_buffer = np.ndarray((self.max_buffer_len, *frame_shape), dtype=np.uint8)
+        self.__frame_buffer = np.zeros((self.max_buffer_len, *frame_shape), dtype=np.uint8)
         self.__thread = Thread(
             target=self.__capture,
             name="BetterCam",
@@ -169,6 +178,7 @@ def __capture(
         self.__capture_start_time = time.perf_counter()
 
         capture_error = None
+        last_valid_frame = None  # Keep the last valid frame
 
         while not self.__stop_capture.is_set():
             if self.__timer_handle:
@@ -179,26 +189,18 @@ def __capture(
                     continue
             try:
                 frame = self._grab(region)
+                if frame is None and video_mode and last_valid_frame is not None:
+                    # In video mode, reuse the last valid frame when no new frame is available
+                    frame = last_valid_frame
                 if frame is not None:
                     with self.__lock:
                         self.__frame_buffer[self.__head] = frame
-                        if self.__full:
-                            self.__tail = (self.__tail + 1) % self.max_buffer_len
                         self.__head = (self.__head + 1) % self.max_buffer_len
-                        self.__frame_available.set()
-                        self.__frame_count += 1
-                        self.__full = self.__head == self.__tail
-                elif video_mode:
-                    with self.__lock:
-                        self.__frame_buffer[self.__head] = np.array(
-                            self.__frame_buffer[(self.__head - 1) % self.max_buffer_len]
-                        )
-                        if self.__full:
+                        if self.__head == self.__tail:
                             self.__tail = (self.__tail + 1) % self.max_buffer_len
-                        self.__head = (self.__head + 1) % self.max_buffer_len
                         self.__frame_available.set()
                         self.__frame_count += 1
-                        self.__full = self.__head == self.__tail
+                        last_valid_frame = frame
             except Exception as e:
                 import traceback
 
@@ -216,6 +218,7 @@ def __capture(
             f"Screen Capture FPS: {int(self.__frame_count/(time.perf_counter() - self.__capture_start_time))}"
         )
 
+
     def _rebuild_frame_buffer(self, region: Tuple[int, int, int, int]):
         if region is None:
             region = self.region
@@ -254,4 +257,6 @@ def __repr__(self) -> str:
             self._output,
             self._stagesurf,
             self._duplicator,
-        )
\ No newline at end of file
+        )
+
+
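Review note: the `__capture` hunk above changes the ring-buffer bookkeeping so that `head` is advanced first and `tail` is pushed forward only when `head` catches up with it. The following is a minimal, standalone sketch of that indexing scheme using plain lists, not the library's NumPy buffer. The class `RingBuffer` and its methods are hypothetical names for illustration. Under this scheme one slot is always left unused, so at most `capacity - 1` items are readable.

```python
class RingBuffer:
    """Fixed-size ring buffer that overwrites the oldest entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # next slot to write
        self.tail = 0  # oldest readable slot

    def push(self, item):
        self.slots[self.head] = item
        self.head = (self.head + 1) % self.capacity
        if self.head == self.tail:
            # The writer caught up with the reader: drop the oldest entry.
            self.tail = (self.tail + 1) % self.capacity

    def __len__(self):
        return (self.head - self.tail) % self.capacity

    def items(self):
        """Return the stored items from oldest to newest."""
        return [self.slots[(self.tail + i) % self.capacity] for i in range(len(self))]


buf = RingBuffer(4)
for frame_id in range(6):
    buf.push(frame_id)
print(buf.items())
```

Because `head == tail` doubles as the "empty" state, a buffer created with `max_buffer_len=64` would expose at most 63 frames under this scheme.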
diff --git a/bettercam/core/device.py b/bettercam/core/device.py
index cb9a4e2..8e424bd 100644
--- a/bettercam/core/device.py
+++ b/bettercam/core/device.py
@@ -79,3 +79,4 @@ def __repr__(self) -> str:
             self.desc.DedicatedVideoMemory // 1048576,
             self.desc.VendorId,
         )
+
diff --git a/bettercam/core/duplicator.py b/bettercam/core/duplicator.py
index 45135e1..355f7f3 100644
--- a/bettercam/core/duplicator.py
+++ b/bettercam/core/duplicator.py
@@ -55,3 +55,4 @@ def __repr__(self) -> str:
             self.__class__.__name__,
             self.duplicator is not None,
         )
+
diff --git a/bettercam/core/output.py b/bettercam/core/output.py
index 4ad88bd..62fb5f6 100644
--- a/bettercam/core/output.py
+++ b/bettercam/core/output.py
@@ -57,3 +57,4 @@ def __repr__(self) -> str:
             self.resolution,
             self.rotation_angle,
         )
+
diff --git a/bettercam/core/stagesurf.py b/bettercam/core/stagesurf.py
index f76cff3..25044da 100644
--- a/bettercam/core/stagesurf.py
+++ b/bettercam/core/stagesurf.py
@@ -66,3 +66,4 @@ def __repr__(self) -> str:
             (self.width, self.height),
             "DXGI_FORMAT_B8G8R8A8_UNORM",
         )
+
diff --git a/bettercam/processor/TorchCuda_Processor.py b/bettercam/processor/TorchCuda_Processor.py
new file mode 100644
index 0000000..e00bd83
--- /dev/null
+++ b/bettercam/processor/TorchCuda_Processor.py
@@ -0,0 +1,88 @@
+import ctypes
+import torch
+from .base import Processor
+
+class TorchProcessor(Processor):
+    def __init__(self, color_mode):
+        if color_mode not in ['BGRA', 'BGR', 'RGB', 'GRAY']:
+            raise ValueError("Unsupported color mode. Supported modes are 'BGRA', 'BGR', 'RGB', and 'GRAY'.")
+        self.color_mode = color_mode
+
+    def process_cvtcolor(self, image):
+        if self.color_mode == 'RGB':
+            return image[:, :, [2, 1, 0]]  # BGRA to RGB
+        elif self.color_mode == 'BGR':
+            return image[:, :, :3]  # BGRA to BGR
+        elif self.color_mode == 'GRAY':
+            # BGRA to grayscale (luminosity method); cast back to uint8 and keep a trailing channel axis
+            return (0.2989 * image[:, :, 2] + 0.5870 * image[:, :, 1] + 0.1140 * image[:, :, 0]).to(torch.uint8).unsqueeze(-1)
+        return image
+
+    def processCPA0(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = region[1] * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((height, pitch // 4, 4)).cuda()
+
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+            
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+    
+    def processCPA90(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (width - region[2]) * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((height, pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 1, (0, 1))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+    
+    def processCPA180(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (height - region[3]) * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((height, pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 2, (0, 1))
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+    
+    def processCPA270(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = region[0] * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((height, pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 3, (0, 1))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
diff --git a/bettercam/processor/base.py b/bettercam/processor/base.py
index c80e888..decffec 100644
--- a/bettercam/processor/base.py
+++ b/bettercam/processor/base.py
@@ -1,32 +1,66 @@
-import enum
-
-
-class ProcessorBackends(enum.Enum):
-    PIL = 0
-    NUMPY = 1
-    CUPY = 2
+from enum import Enum
 
+class ProcessorBackends(Enum):
+    NUMPY = 'numpy'
+    CUPY = 'cupy'
+    TORCHCUDA = 'TORCH-CUDA'
 
 class Processor:
-    def __init__(self, backend=ProcessorBackends.NUMPY, output_color: str = "RGB", nvidia_gpu: bool = False):
-        self.color_mode = output_color
+    def __init__(self, nvidia_gpu=False, torch_cuda=False, output_color="RGB", rotation_angle=0):
         if nvidia_gpu:
             backend = ProcessorBackends.CUPY
+        elif torch_cuda:
+            backend = ProcessorBackends.TORCHCUDA
+        else:
+            backend = ProcessorBackends.NUMPY
+
+        self.color_mode = output_color
         self.backend = self._initialize_backend(backend)
 
-    def process(self, rect, width, height, region, rotation_angle):
-        return self.backend.process(rect, width, height, region, rotation_angle)
+        if nvidia_gpu or torch_cuda:
+            self.set_rotation_CPfunction(rotation_angle)
+        else:
+            self.set_rotation_function(rotation_angle)
+
+    def set_rotation_function(self, angle):
+        angle_to_function = {
+            0: self.backend.processA0,
+            90: self.backend.processA90,
+            180: self.backend.processA180,
+            270: self.backend.processA270
+        }
+        self.process = angle_to_function.get(angle)
+        if not self.process:
+            raise ValueError(f"Unsupported rotation angle: {angle}")
+
+    def set_rotation_CPfunction(self, angle):
+        angle_to_function = {
+            0: self.backend.processCPA0,
+            90: self.backend.processCPA90,
+            180: self.backend.processCPA180,
+            270: self.backend.processCPA270
+        }
+        self.process = angle_to_function.get(angle)
+        if not self.process:
+            raise ValueError(f"Unsupported rotation angle: {angle}")
+
+    def process(self, rect, width, height, region):
+        pass
 
     def _initialize_backend(self, backend):
+        print(f"Initializing backend: {backend}")
         if backend == ProcessorBackends.NUMPY:
             from bettercam.processor.numpy_processor import NumpyProcessor
-
             return NumpyProcessor(self.color_mode)
-        
         elif backend == ProcessorBackends.CUPY:
             from bettercam.processor.cupy_processor import CupyProcessor
-
             return CupyProcessor(self.color_mode)
-        
+        elif backend == ProcessorBackends.TORCHCUDA:
+            from bettercam.processor.TorchCuda_Processor import TorchProcessor
+            return TorchProcessor(self.color_mode)
         else:
-            print(f"Unknown backend: {backend}")
+            raise ValueError(f"Unknown backend: {backend}")
+
+
+
+
diff --git a/bettercam/processor/cupy_processor.py b/bettercam/processor/cupy_processor.py
index ee10eb5..5b41f43 100644
--- a/bettercam/processor/cupy_processor.py
+++ b/bettercam/processor/cupy_processor.py
@@ -5,72 +5,93 @@
 
 class CupyProcessor(Processor):
     def __init__(self, color_mode):
-        self.cvtcolor = None
+        if color_mode not in ['BGRA', 'BGR', 'RGB', 'GRAY']:
+            raise ValueError("Unsupported color mode. Supported modes are 'BGRA', 'BGR', 'RGB', and 'GRAY'.")
         self.color_mode = color_mode
-        if self.color_mode=='BGRA':
-            self.color_mode = None
 
     def process_cvtcolor(self, image):
-        import cv2
+        if self.color_mode == 'RGB':
+            return image[:, :, [2, 1, 0]]  # BGRA to RGB
+        elif self.color_mode == 'BGR':
+            return image[:, :, :3]  # BGRA to BGR
+        elif self.color_mode == 'GRAY':
+            # BGRA to grayscale (luminosity method); cast back to uint8 and keep a trailing channel axis
+            return (0.2989 * image[:, :, 2] + 0.5870 * image[:, :, 1] + 0.1140 * image[:, :, 0]).astype(cp.uint8)[..., cp.newaxis]
+        return image
 
-        # only one time process
-        if self.cvtcolor is None:
-            color_mapping = {
-                "RGB": cv2.COLOR_BGRA2RGB,
-                "RGBA": cv2.COLOR_BGRA2RGBA,
-                "BGR": cv2.COLOR_BGRA2BGR,
-                "GRAY": cv2.COLOR_BGRA2GRAY
-            }
-            cv2_code = color_mapping[self.color_mode]
-            if cv2_code != cv2.COLOR_BGRA2GRAY:
-                self.cvtcolor = lambda image: cv2.cvtColor(image, cv2_code)
-            else:
-                self.cvtcolor = lambda image: cv2.cvtColor(image, cv2_code)[
-                    ..., cp.newaxis
-                ] 
-        return self.cvtcolor(image)
-
-    def process(self, rect, width, height, region, rotation_angle):
+    def processCPA0(self, rect, width, height, region):
         pitch = int(rect.Pitch)
-
-        if rotation_angle in (0, 180):
-            offset = (region[1] if rotation_angle==0 else height-region[3])*pitch
-            height = region[3] - region[1]
-        else:
-            offset = (region[0] if rotation_angle==270 else width-region[2])*pitch
-            width = region[2] - region[0]
-
-        if rotation_angle in (0, 180):
-            size = pitch * height
-        else:
-            size = pitch * width
-
-        buffer = (ctypes.c_char*size).from_address(ctypes.addressof(rect.pBits.contents)+offset)#Pointer arithmetic
-        pitch = pitch // 4
-        if rotation_angle in (0, 180):
-            image = cp.frombuffer(buffer, dtype=cp.uint8).reshape(height, pitch, 4)
-
-        elif rotation_angle in (90, 270):
-            image = cp.frombuffer(buffer, dtype=cp.uint8).reshape(width, pitch, 4)
-
-        if not self.color_mode is None:
+        offset = region[1] * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = cp.frombuffer((ctypes.c_char * size).from_address(buffer_ptr),dtype=cp.uint8)
+        #pitch = pitch // 4
+        image = cp.asarray(buffer, dtype=cp.uint8).reshape((height, pitch // 4, 4))
+        #image = image[:, :width, :]
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+            
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return cp.asnumpy(image)
+    
+    def processCPA90(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (width - region[2]) * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = cp.frombuffer((ctypes.c_char * size).from_address(buffer_ptr),dtype=cp.uint8)
+        #pitch = pitch // 4
+        image = cp.asarray(buffer, dtype=cp.uint8).reshape((height, pitch // 4, 4))
+        image = cp.rot90(image, axes=(1, 0))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
             image = self.process_cvtcolor(image)
+        return cp.asnumpy(image)
+    
+    def processCPA180(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (height - region[3]) * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = cp.frombuffer((ctypes.c_char * size).from_address(buffer_ptr),dtype=cp.uint8)
+        #pitch = pitch // 4
+        image = cp.asarray(buffer, dtype=cp.uint8).reshape((height, pitch // 4, 4))
+        image = cp.rot90(image, k=2, axes=(0, 1))
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return cp.asnumpy(image)
+    
+    def processCPA270(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = region[0] * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + offset
+        buffer = cp.frombuffer((ctypes.c_char * size).from_address(buffer_ptr),dtype=cp.uint8)
+        image = cp.asarray(buffer, dtype=cp.uint8).reshape((width, pitch // 4, 4))
+        image = cp.rot90(image, axes=(0, 1))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return cp.asnumpy(image)
+
 
-        if rotation_angle == 90:
-            image = cp.rot90(image, axes=(1, 0))
-        elif rotation_angle == 180:
-            image = cp.rot90(image, k=2, axes=(0, 1))
-        elif rotation_angle == 270:
-            image = cp.rot90(image, axes=(0, 1))
 
-        if rotation_angle in (0, 180) and pitch != width:
-            image = image[:, :width, :]
-        elif rotation_angle in (90, 270) and pitch != height:
-            image = image[:height, :, :]
 
-        if region[3] - region[1] != image.shape[0]:
-            image = image[region[1] : region[3], :, :]
-        if region[2] - region[0] != image.shape[1]:
-            image = image[:, region[0] : region[2], :]
 
-        return image
\ No newline at end of file
diff --git a/bettercam/processor/numpy_processor.py b/bettercam/processor/numpy_processor.py
index c8a0da0..b1c5526 100644
--- a/bettercam/processor/numpy_processor.py
+++ b/bettercam/processor/numpy_processor.py
@@ -22,6 +22,7 @@ def process_cvtcolor(self, image):
                 "RGB": cv2.COLOR_BGRA2RGB,
                 "RGBA": cv2.COLOR_BGRA2RGBA,
                 "BGR": cv2.COLOR_BGRA2BGR,
+                "HSV": cv2.COLOR_BGR2HSV,
                 "GRAY": cv2.COLOR_BGRA2GRAY
             }
             cv2_code = color_mapping[self.color_mode]
@@ -36,46 +37,79 @@ def process_cvtcolor(self, image):
     def shot(self, image_ptr, rect, width, height):
         ctypes.memmove(image_ptr, rect.pBits, height*width*4)
 
-    def process(self, rect, width, height, region, rotation_angle):
-        pitch = int(rect.Pitch)
 
-        if rotation_angle in (0, 180):
-            offset = (region[1] if rotation_angle==0 else height-region[3])*pitch
-            height = region[3] - region[1]
-        else:
-            offset = (region[0] if rotation_angle==270 else width-region[2])*pitch
-            width = region[2] - region[0]
-
-        if rotation_angle in (0, 180):
-            size = pitch * height
-        else:
-            size = pitch * width
-
-        buffer = (ctypes.c_char*size).from_address(ctypes.addressof(rect.pBits.contents)+offset)#Pointer arithmetic
-        pitch = pitch // 4
-        if rotation_angle in (0, 180):
-            image = np.ndarray((height, pitch, 4), dtype=np.uint8, buffer=buffer)
-        elif rotation_angle in (90, 270):
-            image = np.ndarray((width, pitch, 4), dtype=np.uint8, buffer=buffer)
-
-        if not self.color_mode is None:
+    def processA0(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = region[1] * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer = (ctypes.c_char * size).from_address(ctypes.addressof(rect.pBits.contents) + offset)
+        image = np.ndarray((height, pitch // 4, 4), dtype=np.uint8, buffer=buffer)
+        #image = image[:, :width, :]        
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+            
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image
+    
+    def processA90(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (width - region[2]) * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer = (ctypes.c_char * size).from_address(ctypes.addressof(rect.pBits.contents) + offset)
+        image = np.ndarray((width, pitch // 4, 4), dtype=np.uint8, buffer=buffer)
+        image = np.rot90(image, axes=(1, 0))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
             image = self.process_cvtcolor(image)
+        return image
+    
+    def processA180(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = (height - region[3]) * pitch
+        height = region[3] - region[1]
+        size = pitch * height
+        buffer = (ctypes.c_char * size).from_address(ctypes.addressof(rect.pBits.contents) + offset)
+        image = np.ndarray((height, pitch // 4, 4), dtype=np.uint8, buffer=buffer)
+        image = np.rot90(image, k=2, axes=(0, 1))
+        if region[3] - region[1] != image.shape[0]:
+            image = image[region[1]:region[3], :, :]
+        if region[2] - region[0] != image.shape[1]:
+            image = image[:, region[0]:region[2], :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image
+    
+    def processA270(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        offset = region[0] * pitch
+        width = region[2] - region[0]
+        size = pitch * width
+        buffer = (ctypes.c_char * size).from_address(ctypes.addressof(rect.pBits.contents) + offset)
+        image = np.ndarray((width, pitch // 4, 4), dtype=np.uint8, buffer=buffer)
+        image = np.rot90(image, axes=(0, 1))
+        if width != image.shape[0]:
+            image = image[:width, :, :]
+        if height != image.shape[1]:
+            image = image[:, :height, :]
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image
+
+
+
+
+
+
+
 
-        if rotation_angle == 90:
-            image = np.rot90(image, axes=(1, 0))
-        elif rotation_angle == 180:
-            image = np.rot90(image, k=2, axes=(0, 1))
-        elif rotation_angle == 270:
-            image = np.rot90(image, axes=(0, 1))
 
-        if rotation_angle in (0, 180) and pitch != width:
-            image = image[:, :width, :]
-        elif rotation_angle in (90, 270) and pitch != height:
-            image = image[:height, :, :]
 
-        if region[3] - region[1] != image.shape[0]:
-            image = image[region[1] : region[3], :, :]
-        if region[2] - region[0] != image.shape[1]:
-            image = image[:, region[0] : region[2], :]
 
-        return image
\ No newline at end of file
diff --git a/bettercam/processor/prova b/bettercam/processor/prova
new file mode 100644
index 0000000..cc96f5b
--- /dev/null
+++ b/bettercam/processor/prova
@@ -0,0 +1,70 @@
+import ctypes
+import torch
+from .base import Processor
+
+class TorchProcessor(Processor):
+    def __init__(self, color_mode):
+        if color_mode not in ['BGRA', 'BGR', 'RGBA', 'RGB', 'GRAY']:
+            raise ValueError("Unsupported color mode. Supported modes are 'BGRA', 'BGR', 'RGBA', 'RGB', and 'GRAY'.")
+        self.color_mode = color_mode
+
+    def process_cvtcolor(self, image):
+        if self.color_mode == 'GRAY':
+            # BGRA to grayscale using luminosity weights (yields a float tensor)
+            return 0.2989 * image[:, :, 2] + 0.5870 * image[:, :, 1] + 0.1140 * image[:, :, 0]
+        if self.color_mode == 'BGRA':
+            return image  # already in the native channel order
+        # Channel indices into the BGRA source for the remaining modes (RGBA previously fell through as BGRA)
+        order = {'BGR': [0, 1, 2], 'RGBA': [2, 1, 0, 3], 'RGB': [2, 1, 0]}
+        return image[:, :, order[self.color_mode]]
+
+    def processCPA0(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + region[1] * pitch
+        size = (region[3] - region[1]) * pitch
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((region[3] - region[1], pitch // 4, 4)).cuda()
+        image = image[:, region[0]:region[2], :]
+        
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+
+    def processCPA90(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + (width - region[2]) * pitch
+        size = (region[2] - region[0]) * pitch
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((region[2] - region[0], pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 1, (0, 1))
+        image = image[:region[2] - region[0], :region[3] - region[1], :]
+        
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+
+    def processCPA180(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + (height - region[3]) * pitch
+        size = (region[3] - region[1]) * pitch
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((region[3] - region[1], pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 2, (0, 1))
+        image = image[:, region[0]:region[2], :]
+        
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()
+
+    def processCPA270(self, rect, width, height, region):
+        pitch = int(rect.Pitch)
+        buffer_ptr = ctypes.addressof(rect.pBits.contents) + region[0] * pitch
+        size = (region[2] - region[0]) * pitch
+        buffer = (ctypes.c_char * size).from_address(buffer_ptr)
+        image = torch.frombuffer(buffer, dtype=torch.uint8).reshape((region[2] - region[0], pitch // 4, 4)).cuda()
+        image = torch.rot90(image, 3, (0, 1))
+        image = image[:region[2] - region[0], :region[3] - region[1], :]
+        
+        if self.color_mode is not None:
+            image = self.process_cvtcolor(image)
+        return image.cpu().numpy()