diff --git a/docs/tutorial/installation.md b/docs/tutorial/installation.md
index 356e8a2a8..63cdcc7c3 100644
--- a/docs/tutorial/installation.md
+++ b/docs/tutorial/installation.md
@@ -90,6 +90,58 @@ python3 examples/env/validate_installation.py
 
 After installation validation passed, you are good to go!
 
+(install-skypilot)=
+
+## (Optional) Install SkyPilot
+
+SkyPilot helps you run AReaL easily on 17+ clouds or your own Kubernetes
+infrastructure. For more details about SkyPilot, check the
+[SkyPilot Documentation](https://docs.skypilot.co/en/latest/overview.html). The
+following are the minimal steps to set up SkyPilot on GCP or Kubernetes.
+
+### Install SkyPilot via pip
+
+```bash
+# In your conda environment
+# NOTE: SkyPilot requires 3.7 <= python <= 3.13
+pip install -U "skypilot[gcp,kubernetes]"
+```
+
+### GCP setup
+
+```bash
+# Install Google Cloud SDK
+conda install -y -c conda-forge google-cloud-sdk
+
+# Initialize gcloud and select your account/project
+gcloud init
+
+# (Optional) choose a project explicitly (replace <project-id> with yours)
+gcloud config set project <project-id>
+
+# Create Application Default Credentials
+gcloud auth application-default login
+```
+
+### Kubernetes setup
+
+Check
+[here](https://docs.skypilot.co/en/latest/reference/kubernetes/kubernetes-setup.html)
+for a comprehensive guide on how to set up a Kubernetes cluster for SkyPilot.
+
+### Verify
+
+```bash
+sky check
+```
+
+If `GCP: enabled` or `Kubernetes: enabled` is shown, you're ready to use SkyPilot with
+AReaL. Check
+[here](https://github.com/inclusionAI/AReaL/blob/main/examples/skypilot/README.md) for a
+detailed example of running AReaL with SkyPilot. For more options and details, see the
+official
+[SkyPilot installation guide](https://docs.skypilot.co/en/latest/getting-started/installation.html).
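Since the pip package only supports 3.7 <= Python <= 3.13 (per the note above), a quick pre-flight check of the interpreter in your environment can save a failed install. A minimal sketch in plain POSIX shell; nothing here is SkyPilot-specific:

```shell
# Query the active interpreter's major.minor version.
pyver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
major=${pyver%%.*}
minor=${pyver#*.}

# SkyPilot's supported range is 3.7 <= Python <= 3.13.
if [ "$major" -eq 3 ] && [ "$minor" -ge 7 ] && [ "$minor" -le 13 ]; then
  echo "Python $pyver: inside SkyPilot's supported range"
else
  echo "Python $pyver: outside SkyPilot's supported range"
fi
```

Run this in the same conda environment you plan to `pip install` into.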
+
 ## (Optional) Launch Ray Cluster for Distributed Training
 
 On the first node, start the Ray Head:
diff --git a/docs/tutorial/quickstart.md b/docs/tutorial/quickstart.md
index 7d963b738..143b92e18 100644
--- a/docs/tutorial/quickstart.md
+++ b/docs/tutorial/quickstart.md
@@ -111,6 +111,29 @@ Additional references:
 > **Note**: Ray and Slurm launchers only work for distributed experiments with more than 1 node (`cluster.n_nodes > 1`). They allocate GPUs for training and generation at the granularity of **nodes**, which means the number of GPUs allocated for generation and training must be integer multiples of `cluster.n_gpus_per_node`. -->
 
+## Distributed Experiments on Cloud or K8s with SkyPilot
+
+If you want to run an experiment directly on a cloud or your own Kubernetes
+infrastructure, we recommend using SkyPilot. After installing and setting up SkyPilot
+(see [Install SkyPilot](installation.md#install-skypilot)), you can launch a
+distributed experiment based on our SkyPilot example (two 8xA100 GPU nodes) with a
+single command:
+
+```bash
+# Launch on GCP
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra gcp
+# Launch on AWS
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra aws
+# Launch on your K8s Cluster
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra k8s
+```
+
+Check
+[Running AReaL with SkyPilot](https://github.com/inclusionAI/AReaL/blob/main/examples/skypilot/README.md)
+for more details about the examples. Check the
+[SkyPilot Documentation](https://docs.skypilot.co/en/latest/docs/index.html) for more
+information about SkyPilot.
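If you always target the same backend, you can also pin it inside the task YAML instead of passing `--infra` on every launch. A sketch, with the caveat that the `infra` field under `resources` is only accepted by recent SkyPilot releases — check the docs for the version you installed before relying on it:

```yaml
resources:
  infra: gcp            # same effect as passing --infra gcp on the command line
  accelerators: A100:8
```

With `infra` pinned, `sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml` needs no extra flags.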
+
 (switching-from-legacy-areal-to-areal-lite)=
 
 ## Switching from legacy AReaL to AReaL-lite
diff --git a/examples/skypilot/README.md b/examples/skypilot/README.md
new file mode 100644
index 000000000..ac5b21126
--- /dev/null
+++ b/examples/skypilot/README.md
@@ -0,0 +1,196 @@
+# Running AReaL with SkyPilot
+
+This README includes examples and guidelines for running AReaL experiments with
+SkyPilot. Make sure you have SkyPilot properly installed following
+[our installation guide](../../docs/tutorial/installation.md#optional-install-skypilot)
+before running these examples. Note that all commands shown in this file are assumed to
+be executed from the root of the AReaL repository.
+
+## Running a Single-Node Experiment
+
+To run a single-node experiment, you only need to set up the node with SkyPilot and
+launch the experiment with the AReaL local launcher.
+[The following file](single_node.sky.yaml) shows a SkyPilot YAML that launches a simple
+GSM8K GRPO experiment with a single command. This example is tested on both GCP and a
+K8s cluster.
+
+```yaml
+name: areal-test-skypilot
+
+resources:
+  accelerators: A100:2
+  autostop:
+    idle_minutes: 10
+    down: true
+  cpus: 8+
+  memory: 32GB+
+  disk_size: 256GB
+  image_id: docker:ghcr.io/inclusionai/areal-runtime:v0.3.4
+
+num_nodes: 1
+
+file_mounts:
+  /storage: # Should be consistent with the storage paths set in gsm8k_grpo_ray.yaml
+    source: s3://my-bucket/ # or gs://, https://<azure_storage_account>.blob.core.windows.net/<container_name>, r2://, cos://<region>/<bucket_name>, oci://
+    mode: MOUNT # MOUNT or COPY or MOUNT_CACHED. Defaults to MOUNT. Optional.
+
+workdir: .
+
+run: |
+  python3 -m areal.launcher.local examples/math/gsm8k_grpo.py \
+    --config examples/math/gsm8k_grpo.yaml \
+    experiment_name=gsm8k-grpo \
+    trial_name=trial0 \
+    cluster.n_nodes=1 \
+    cluster.n_gpus_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
+    allocation_mode=sglang.d1+d1 \
+    train_dataset.batch_size=4 \
+    actor.mb_spec.max_tokens_per_mb=4096
+```
+
+To run the experiment, execute:
+
+```bash
+sky launch -c areal-test examples/skypilot/single_node.sky.yaml
+```
+
+You can designate the cloud or infrastructure to run your experiment on by adding
+`--infra xxx`. For example:
+
+```bash
+sky launch -c areal-test examples/skypilot/single_node.sky.yaml --infra gcp
+sky launch -c areal-test examples/skypilot/single_node.sky.yaml --infra aws
+sky launch -c areal-test examples/skypilot/single_node.sky.yaml --infra k8s
+```
+
+## Running a Multi-Node Experiment
+
+### Running AReaL with the Ray Launcher
+
+The following example shows how to set up a Ray cluster with SkyPilot and then use
+AReaL to run GRPO on the GSM8K dataset on 2 nodes, each with 8 A100 GPUs. This example
+is tested on GCP and a K8s cluster.
+
+Specify the resources and image used to run the experiment:
+
+```yaml
+resources:
+  accelerators: A100:8
+  image_id: docker:ghcr.io/inclusionai/areal-runtime:v0.3.4
+  memory: 256+
+  cpus: 32+
+
+num_nodes: 2
+
+workdir: .
+```
+
+Designate shared storage. You can either use an existing cloud bucket or volume:
+
+```yaml
+file_mounts:
+  /storage: # Should be consistent with the storage paths set in gsm8k_grpo_ray.yaml
+    source: s3://my-bucket/ # or gs://, https://<azure_storage_account>.blob.core.windows.net/<container_name>, r2://, cos://<region>/<bucket_name>, oci://
+    mode: MOUNT # MOUNT or COPY or MOUNT_CACHED. Defaults to MOUNT. Optional.
+```
+
+or create a new bucket or volume with SkyPilot:
+
+```yaml
+# Create an empty gcs bucket
+file_mounts:
+  /storage: # Should be consistent with the storage paths set in gsm8k_grpo_ray.yaml
+    name: my-sky-bucket
+    store: gcs # Optional: one of s3, gcs, azure, r2, ibm, oci
+```
+
+For more information about shared storage with SkyPilot, check
+[SkyPilot Cloud Buckets](https://docs.skypilot.co/en/latest/reference/storage.html) and
+[SkyPilot Volumes](https://docs.skypilot.co/en/latest/reference/volumes.html).
+
+Next, prepare the commands used to set up the Ray cluster and run the experiment:
+
+```yaml
+envs:
+  EXPERIMENT_NAME: my-areal-experiment
+  TRIAL_NAME: my-trial-name
+
+run: |
+  # Get the head node's IP and total number of nodes (environment variables injected by SkyPilot).
+  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
+
+  if [ "$SKYPILOT_NODE_RANK" = "0" ]; then
+    echo "Starting Ray head node..."
+    ray start --head --port=6379
+
+    while [ $(ray status | grep node_ | wc -l) -lt $SKYPILOT_NUM_NODES ]; do
+      echo "Waiting for all nodes to join... Current nodes: $(ray status | grep node_ | wc -l) / $SKYPILOT_NUM_NODES"
+      sleep 5
+    done
+
+    echo "Executing training script on head node..."
+    python3 -m areal.launcher.ray examples/math/gsm8k_grpo.py \
+      --config examples/skypilot/gsm8k_grpo_ray.yaml \
+      experiment_name=gsm8k-grpo \
+      trial_name=trial0 \
+      cluster.n_nodes=$SKYPILOT_NUM_NODES \
+      cluster.n_gpus_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
+      allocation_mode=sglang.d8+d8
+  else
+    sleep 10
+    echo "Starting Ray worker node..."
+    ray start --address $head_ip:6379
+    sleep 5
+  fi
+
+  echo "Node setup complete for rank $SKYPILOT_NODE_RANK."
+```
+
+**Note**: If you are running on a cluster in which nodes are connected via InfiniBand,
+you might need to add an additional config field to the example YAML file for the
+experiment to run:
+
+```yaml
+config:
+  kubernetes:
+    pod_config:
+      spec:
+        containers:
+          - securityContext:
+              capabilities:
+                add:
+                  - IPC_LOCK
+```
+
+### Launch the Ray Cluster and Run the AReaL Experiment
+
+Then you are ready to run AReaL with the following command:
+
+```bash
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml
+```
+
+You can designate the cloud or infrastructure to run your experiment on by adding
+`--infra xxx`. For example:
+
+```bash
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra gcp
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra aws
+sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra k8s
+```
+
+You should see your AReaL experiment running and producing training logs in your
+terminal.
+
+Successfully launched 2 nodes on GCP and deployed a Ray cluster:
+
+![Launching Ray Cluster](ray_launch.png)
+
+Successfully ran a training step:
+
+![Running a train step](train_step_success.png)
+
+### Running AReaL with the SkyPilot Launcher
+
+AReaL plans to support a native SkyPilot launcher built on the
+[SkyPilot Python SDK](https://docs.skypilot.co/en/latest/reference/api.html), which is
+currently under development.
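The head/worker branching used in the `run` section of `ray_cluster.sky.yaml` can be sanity-checked locally by mocking the environment variables SkyPilot injects on each node. A minimal sketch — the variable names are SkyPilot's real ones, but the IPs and rank value here are made up:

```shell
# Mock the per-node variables SkyPilot injects (values are illustrative).
SKYPILOT_NODE_IPS="10.0.0.1
10.0.0.2"
SKYPILOT_NODE_RANK=1

# Same extraction as in the run section: the head node's IP is the first line.
head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)

if [ "$SKYPILOT_NODE_RANK" = "0" ]; then
  echo "rank 0: would run 'ray start --head --port=6379'"
else
  echo "rank $SKYPILOT_NODE_RANK: would run 'ray start --address $head_ip:6379'"
fi
```

Because every node receives the same script and differs only in these variables, the identical `run` section works unmodified on the head and on every worker.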
diff --git a/examples/skypilot/gsm8k_grpo_ray.yaml b/examples/skypilot/gsm8k_grpo_ray.yaml
new file mode 100644
index 000000000..97cf55a15
--- /dev/null
+++ b/examples/skypilot/gsm8k_grpo_ray.yaml
@@ -0,0 +1,153 @@
+experiment_name: gsm8k-grpo-on-ray
+trial_name: trial0
+
+seed: 1
+total_train_epochs: 10
+tokenizer_path: ${actor.path}
+async_training: true
+
+cluster:
+  n_nodes: 2
+  n_gpus_per_node: 8
+  fileroot: /storage/experiments
+  name_resolve:
+    type: ray
+    ray_actor_name: ray_kv_store
+
+allocation_mode: sglang.d8+d8
+
+rollout:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  max_concurrent_rollouts: 256
+  queue_size: null
+  consumer_batch_size: ${train_dataset.batch_size}
+  max_head_offpolicyness: 2
+  enable_rollout_tracing: false
+
+gconfig:
+  n_samples: 4
+  min_new_tokens: 0
+  max_new_tokens: 1024
+  greedy: false
+  temperature: 1.0
+
+actor:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  path: Qwen/Qwen2.5-1.5B-Instruct
+  init_from_scratch: false
+  disable_dropout: true
+  gradient_checkpointing: false
+  dtype: bfloat16
+  mb_spec:
+    max_tokens_per_mb: 4096
+  optimizer:
+    type: adam
+    lr: 1.70e-5
+    weight_decay: 0.017
+    beta1: 0.9
+    beta2: 0.999
+    eps: 1e-8
+    lr_scheduler_type: constant
+    gradient_clipping: 1.0
+    warmup_steps_proportion: 0.001
+  backend: fsdp
+  group_size: ${gconfig.n_samples}
+  eps_clip: 0.4
+  temperature: ${gconfig.temperature}
+  reward_scaling: 10.0
+  reward_bias: -0.5
+  kl_ctl: 0.0
+  ppo_n_minibatches: 1
+  recompute_logprob: true
+  use_decoupled_loss: true
+  behav_imp_weight_cap: 5.0
+  dynamic_sampling: false
+  reward_norm:
+    mean_level: group
+    std_level: group
+    group_size: ${gconfig.n_samples}
+  adv_norm:
+    mean_level: batch
+    std_level: batch
+  max_new_tokens: ${gconfig.max_new_tokens}
+
+ref:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  path: ${actor.path}
+  init_from_scratch: false
+  disable_dropout: true
+  dtype: ${actor.dtype}
+  mb_spec:
+    max_tokens_per_mb: 10240
+  optimizer: null
+  backend: fsdp
+
+# SGLang
+sglang:
+  model_path: ${actor.path}
+  random_seed: ${seed}
+  skip_tokenizer_init: true
+  dtype: ${actor.dtype}
+  max_running_requests: null
+  context_length: 32768
+  mem_fraction_static: 0.8
+
+# datasets
+train_dataset:
+  batch_size: 128
+  shuffle: true
+  pin_memory: true
+  num_workers: 4
+  path: openai/gsm8k
+  type: rl
+  max_length: 1024
+
+valid_dataset:
+  batch_size: 128
+  shuffle: true
+  pin_memory: true
+  num_workers: 4
+  path: openai/gsm8k
+  type: rl
+
+# Utilities
+saver:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  fileroot: ${cluster.fileroot}
+  freq_epochs: 1
+  freq_steps: null
+  freq_secs: null
+
+recover:
+  mode: disabled
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  fileroot: ${cluster.fileroot}
+  freq_epochs: 1
+  freq_steps: null
+  freq_secs: 3600
+
+evaluator:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  fileroot: ${cluster.fileroot}
+  freq_epochs: 1
+  freq_steps: null
+  freq_secs: null
+
+stats_logger:
+  experiment_name: ${experiment_name}
+  trial_name: ${trial_name}
+  fileroot: ${cluster.fileroot}
+  wandb:
+    mode: disabled
+
+launcher:
+  inference_server_cpus_per_gpu: 4
+  inference_server_mem_per_gpu: 32768
+  trainer_cpus_per_gpu: 4
+  trainer_mem_per_gpu: 32768
diff --git a/examples/skypilot/ray_cluster.sky.yaml b/examples/skypilot/ray_cluster.sky.yaml
new file mode 100644
index 000000000..f04112c37
--- /dev/null
+++ b/examples/skypilot/ray_cluster.sky.yaml
@@ -0,0 +1,45 @@
+
+resources:
+  accelerators: A100:8
+  image_id: docker:ghcr.io/inclusionai/areal-runtime:v0.3.4
+  memory: 32+
+  cpus: 8+
+
+num_nodes: 2
+
+workdir: .
+
+file_mounts:
+  /storage: # Should be consistent with the storage paths set in gsm8k_grpo_ray.yaml
+    source: s3://my-bucket/ # or gs://, https://<azure_storage_account>.blob.core.windows.net/<container_name>, r2://, cos://<region>/<bucket_name>, oci://
+    mode: MOUNT # MOUNT or COPY or MOUNT_CACHED. Defaults to MOUNT. Optional.
+
+run: |
+  # Get the head node's IP and total number of nodes (environment variables injected by SkyPilot).
+  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
+
+  if [ "$SKYPILOT_NODE_RANK" = "0" ]; then
+    echo "Starting Ray head node..."
+    ray start --head --port=6379
+
+    while [ $(ray status | grep node_ | wc -l) -lt $SKYPILOT_NUM_NODES ]; do
+      echo "Waiting for all nodes to join... Current nodes: $(ray status | grep node_ | wc -l) / $SKYPILOT_NUM_NODES"
+      sleep 5
+    done
+
+    echo "Executing training script on head node..."
+    python3 -m areal.launcher.ray examples/math/gsm8k_grpo.py \
+      --config examples/skypilot/gsm8k_grpo_ray.yaml \
+      experiment_name=gsm8k-grpo \
+      trial_name=trial0 \
+      cluster.n_nodes=$SKYPILOT_NUM_NODES \
+      cluster.n_gpus_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
+      allocation_mode=sglang.d8+d8
+  else
+    sleep 10
+    echo "Starting Ray worker node..."
+    ray start --address $head_ip:6379
+    sleep 5
+  fi
+
+  echo "Node setup complete for rank $SKYPILOT_NODE_RANK."
diff --git a/examples/skypilot/ray_launch.png b/examples/skypilot/ray_launch.png
new file mode 100644
index 000000000..207251b28
Binary files /dev/null and b/examples/skypilot/ray_launch.png differ
diff --git a/examples/skypilot/single_node.sky.yaml b/examples/skypilot/single_node.sky.yaml
new file mode 100644
index 000000000..11f5015cd
--- /dev/null
+++ b/examples/skypilot/single_node.sky.yaml
@@ -0,0 +1,31 @@
+name: areal-test-skypilot
+
+resources:
+  accelerators: A100:2
+  autostop:
+    idle_minutes: 10
+    down: true
+  cpus: 8+
+  memory: 32GB+
+  disk_size: 256GB
+  image_id: docker:ghcr.io/inclusionai/areal-runtime:v0.3.4
+
+num_nodes: 1
+
+file_mounts:
+  /storage: # Should be consistent with the storage paths set in gsm8k_grpo_ray.yaml
+    source: s3://my-bucket/ # or gs://, https://<azure_storage_account>.blob.core.windows.net/<container_name>, r2://, cos://<region>/<bucket_name>, oci://
+    mode: MOUNT # MOUNT or COPY or MOUNT_CACHED. Defaults to MOUNT. Optional.
+
+workdir: .
+
+run: |
+  python3 -m areal.launcher.local examples/math/gsm8k_grpo.py \
+    --config examples/math/gsm8k_grpo.yaml \
+    experiment_name=gsm8k-grpo \
+    trial_name=trial0 \
+    cluster.n_nodes=1 \
+    cluster.n_gpus_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
+    allocation_mode=sglang.d1+d1 \
+    train_dataset.batch_size=4 \
+    actor.mb_spec.max_tokens_per_mb=4096
diff --git a/examples/skypilot/train_step_success.png b/examples/skypilot/train_step_success.png
new file mode 100644
index 000000000..4acb5f6ba
Binary files /dev/null and b/examples/skypilot/train_step_success.png differ