
The unavoidable UX–security trade-offs in TLOS arising from interface design and physical constraints. #78

@SoraSuegami

Description


I will first summarize, to the best of my understanding, TLOS’s interface and the security it provides, and then point out unavoidable UX–security trade-offs in TLOS that arise from interface design and physical constraints.

Summary of TLOS

TLOS is an obfuscation scheme for a point function that returns 1 if the given input $x$ equals the hardcoded (planted) secret, and 0 otherwise. Relative to a simple hash-based instantiation of point-function obfuscation, TLOS claims to increase the computational cost of evaluating the function on each input. Consequently, an honest user who knows the planted secret only needs to evaluate the obfuscated circuit once, whereas an attacker must attempt evaluations on many candidate inputs. If, in the application design, the benefit of learning the planted secret is larger than the cost of evaluating the obfuscated circuit once, but smaller than the total cost an attacker would incur to brute-force the secret (i.e., to evaluate the obfuscated circuit as many times as needed), then TLOS is secure against an economically rational adversary. In particular, even when the input domain is small, security can be maintained as long as this cost–benefit relationship continues to hold.
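For concreteness, the simple hash-based baseline can be sketched as follows. This is an illustrative sketch of plain salted-hash point-function obfuscation (SHA-256 and the 16-byte salt are my assumptions), not TLOS itself, which adds cost-amplifying layers on top of such a check:

```python
import hashlib
import os

def obfuscate(secret: bytes) -> tuple[bytes, bytes]:
    """Plant a secret: publish (salt, H(salt || secret)) instead of the secret."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + secret).digest()
    return salt, digest

def evaluate(obf: tuple[bytes, bytes], x: bytes) -> int:
    """Point function: returns 1 iff x equals the planted secret."""
    salt, digest = obf
    return 1 if hashlib.sha256(salt + x).digest() == digest else 0
```

In this baseline, each evaluation costs only one hash, which is exactly why TLOS needs to make each evaluation more expensive.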


The unavoidable UX–security trade-offs in TLOS arising from interface design and physical constraints

I will analyze the properties of point-function obfuscation itself, independently of any particular implementation. The procedure for evaluating the obfuscated circuit can be classified into the following three types:

  1. Computation that does not depend on the input $x$.
  2. Computation that depends on $x$ and is necessary to learn the output of the obfuscated circuit.
  3. Computation that may depend on $x$ but is not necessary to learn the output of the obfuscated circuit. (For example, TLOS’s PoW layer falls into this category, since it can be executed after the LWE puzzle layer reveals the output—namely, after one learns that a particular input is the planted one.)

Let $c_1$, $c_2$, and $c_3$ denote the computational costs of the computations in categories 1, 2, and 3, respectively. An honest user pays $c_1 + c_2 + c_3$ for a single evaluation, while an adversary who tries $n$ inputs pays the category-1 and category-3 costs once and the category-2 cost once per trial, for a total of $(c_1 + c_3) + n c_2$. Therefore, the difference in computational cost between the adversary and an honest user is $(n-1)c_2$.
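This cost accounting can be written out directly. The concrete values of $c_1$, $c_2$, $c_3$ and $n$ below are arbitrary placeholders in abstract cost units:

```python
def honest_cost(c1, c2, c3):
    # Honest user: a single evaluation on the known planted secret.
    return c1 + c2 + c3

def adversary_cost(c1, c2, c3, n):
    # Adversary: pays c1 and c3 once, and c2 for each of n candidate inputs.
    return (c1 + c3) + n * c2

# Example with placeholder cost units: the gap is exactly (n - 1) * c2.
c1, c2, c3, n = 1, 10, 5, 10_000
print(adversary_cost(c1, c2, c3, n) - honest_cost(c1, c2, c3))  # 99990 = 9999 * 10
```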

The number of trials $n$ that an adversary needs in order to find an input equal to the planted secret with non-negligible probability depends only on the application, and is independent of the particular implementation of point-function obfuscation. If the planted secret is chosen uniformly at random from a domain $\mathcal{D}$ of size $|\mathcal{D}|$, then an adversary who tries $n$ distinct inputs hits the planted secret with probability $\frac{n}{|\mathcal{D}|}$.

For example, in an application such as a wallet with human-memorable recovery codes mentioned in the paper, if the code is a 6-digit decimal string, then with $n = 10000$ trials the adversary guesses the planted secret with probability 1%. If this wallet is intended to remain secure for balances up to $1,000 even against such an attack, the adversary's total cost $10000 \cdot c_2$ must exceed $1,000, so the per-evaluation cost $c_2$ must be on the order of $0.10 even for an honest user.
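The arithmetic behind this example, under the stated assumptions (a 6-digit code, a $1,000 balance, and a 1% target success probability), works out as:

```python
domain = 10**6        # 6-digit decimal recovery codes
benefit = 1000.0      # USD balance the wallet should protect
target_prob = 0.01    # attack success probability to price out

n = int(target_prob * domain)   # trials needed to hit the secret w.p. 1%
c2_min = benefit / n            # per-evaluation cost at which the attack breaks even
print(n, c2_min)                # 10000 0.1
```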

The appendix presents a ChatGPT-generated analysis of the relationship between electricity cost and the computation time required on a smartphone, a laptop PC, and a GPU server. To force an electricity cost of $0.10—assuming the lowest electricity price in the US and measuring computation in 64-bit integer multiplications—a smartphone would need to compute continuously for more than five days. Note that in practice, TLOS computation is not composed solely of 64-bit multiplications, so this estimate does not directly carry over as-is.

Summarizing the analysis above, to ensure that a wallet protected by a six-digit recovery code safeguards $1,000 of assets against an economically rational adversary with probability at least 99%, an honest user with a smartphone would need to make the phone compute continuously for a few days—incurring about $0.10 in electricity cost—each time they authenticate. This is hardly an ideal user experience.

Even when considering the problem more generally, in settings with a small input domain—such as the applications discussed in the TLOS paper—the required attack cost is fundamentally limited by brute-force search over the input space and thus can be at most a small constant factor larger than the honest evaluation cost. As a result, if one aims to guarantee a sufficiently high attack cost except with sufficiently small probability, it is unavoidable that the honest evaluation cost also increases accordingly, which in turn degrades the UX by increasing the honest user’s computation (waiting) time.


Appendix: ChatGPT-generated analysis of the relationship between electricity cost and the computation time required on a smartphone, a laptop PC, and a GPU server

1) Scope

This report estimates:

  1. How much electrical energy you can buy for $0.10 at the lowest U.S. residential retail electricity price (state-level average).
  2. How many 64‑bit integer multiplications that energy corresponds to on:
    • a smartphone‑class CPU,
    • a laptop‑class CPU,
    • a server‑class GPU.
  3. How long each device would need to run (at assumed power draw) to consume $0.10 of electricity.

All compute figures are order‑of‑magnitude engineering estimates, not benchmark results.


2) Electricity price assumption: lowest U.S. residential retail price (state average)

Using the Electric Power Monthly Table 5.6.A (Average Price of Electricity to Ultimate Customers by End‑Use Sector, by State), data for November 2025 (release date Jan 26, 2026), the lowest residential state average listed is North Dakota: 11.93 ¢/kWh. (eia.gov)

Let:

  • Price $p = 11.93 \text{¢/kWh} = 0.1193 \text{\$/kWh}$

3) Energy purchasable with $0.10


$$ E_{\text{kWh}} = \frac{0.10}{p} = \frac{0.10}{0.1193} \approx 0.8382\ \text{kWh} $$

Convert to joules using $1\ \text{kWh} = 3.6\times 10^6\ \text{J}$:

$$ E_{\text{J}} \approx 0.8382\times 3.6\times 10^6 \approx 3.018\times 10^6\ \text{J} = 3.018\ \text{MJ} $$

So, $0.10 buys ~0.838 kWh (~3.02 MJ) under the price assumption above.
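This calculation can be reproduced directly from the price assumption in Section 2:

```python
price = 0.1193           # $/kWh, ND residential average (per the EIA table cited above)
budget = 0.10            # dollars to spend on electricity

e_kwh = budget / price   # energy purchasable, in kWh
e_joules = e_kwh * 3.6e6 # 1 kWh = 3.6e6 J
print(round(e_kwh, 4), round(e_joules / 1e6, 3))  # ~0.8382 kWh, ~3.018 MJ
```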


4) Device models and compute models

4.1 Smartphone‑class CPU model (A17 Pro‑class)

Core configuration (representative): 6‑core CPU with 2 performance + 4 efficiency cores. (support.apple.com)
Clock assumptions used for the model: P‑cores up to 3.78 GHz; E‑cores up to 2.11 GHz. (notebookcheck.net)

Throughput model (idealized):

  • Assume 1 × int64 multiply per cycle per core at the assumed clocks.
  • Aggregate cycle budget: $R_{\text{phone}} = 2\cdot 3.78 + 4\cdot 2.11 = 16.0 \text{GHz} \Rightarrow 1.6\times 10^{10} \text{int64 mul/s} $

Power assumption (workload power, not a spec): 3–6 W.


4.2 Laptop‑class CPU model (Intel Core i7‑1360P‑class)

Representative parameters:

  • 4 P‑cores (max turbo 5.00 GHz)
  • 8 E‑cores (max turbo 3.70 GHz)
  • Processor base power 28 W; maximum turbo power 64 W (intel.com)

Throughput model (idealized): $R_{\text{laptop}} = 4\cdot 5.0 + 8\cdot 3.7 = 49.6 \text{GHz} \Rightarrow 4.96\times 10^{10}\ \text{int64 mul/s}$

Power assumption: use 28 W and 64 W as a bracket (CPU power limits, not whole‑system wall power). (intel.com)


4.3 Server‑class GPU model (NVIDIA H100 SXM‑class)

Representative parameters (H100 SXM):

  • FP32 peak: 67 TFLOPS
  • Max TDP: up to 700 W (configurable)
    (nvidia.com)

Architecture detail used:

Estimating 32‑bit integer throughput from NVIDIA’s instruction‑throughput table: The instruction‑throughput table defines throughput in operations per clock cycle per multiprocessor; it also states that for warp size 32, one instruction corresponds to 32 operations (i.e., $N$ ops/clock implies $N/32$ warp‑instructions/clock). (docs.nvidia.com)

For 32‑bit integer multiply / multiply‑add (mad.lo.s32), Table 5 gives a throughput of 64 results per clock per multiprocessor (shown as “64” with a footnote). (docs.nvidia.com)

Step A: infer an effective clock from FP32 peak (approximation): Assuming the FP32 TFLOPS spec counts FMA as 2 FLOPs:

$$ f \approx \frac{67\times 10^{12}}{(16896)\cdot 2} \approx 1.98\times 10^9\ \text{Hz} $$

Step B: estimate 32‑bit integer “ops/s”: $R_{32} \approx 64\ \frac{\text{ops}}{\text{cycle·SM}}\cdot 132\ \text{SM}\cdot 1.98\times 10^9\ \frac{\text{cycles}}{\text{s}} \approx 1.675\times 10^{13}\ \text{ops/s}$

Step C: map 32‑bit ops to int64 multiplications: A 64‑bit integer multiplication typically expands to multiple narrower operations (sequence depends on compiler/codegen). Model: $R_{64} \approx \frac{R_{32}}{k}$

Report a sensitivity band:

  • optimistic: $k=8$
  • baseline: $k=20$
  • conservative: $k=32$

For the baseline $k=20$: $R_{64} \approx 8.38\times 10^{11}\ \text{int64 mul/s}$
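Steps A–C above can be reproduced as follows; the expansion factor $k$ is the modeling assumption described in Step C, and the SM count and cores/SM come from the Hopper architecture reference cited below:

```python
fp32_peak = 67e12       # FLOP/s, H100 SXM spec (FMA counted as 2 FLOPs)
sms = 132               # SM count (H100 SXM)
fp32_cores = 16896      # total FP32 cores = 132 SMs x 128 cores/SM
int32_per_clk_sm = 64   # mad.lo.s32 results/clock/SM (CUDA throughput table)

f = fp32_peak / (fp32_cores * 2)    # Step A: inferred effective clock, ~1.98 GHz
r32 = int32_per_clk_sm * sms * f    # Step B: ~1.67e13 int32 ops/s
k = 20                              # Step C: assumed int64 -> int32 expansion factor
r64 = r32 / k                       # baseline: ~8.4e11 int64 mul/s
print(f"f ~ {f:.3e} Hz, R32 ~ {r32:.3e} ops/s, R64 ~ {r64:.3e} int64 mul/s")
```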


5) Method (common to all devices)

Given energy budget $E_J$ and average device power $P$:

  • Runtime to consume \$0.10: $t = \frac{E_J}{P}$
  • Total int64 multiplies: $N = R\cdot t$
  • Energy efficiency: $\eta = \frac{R}{P}\quad[\text{int64 mul/J}]$ and $N = \eta\cdot E_J$.
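Applying this method to the representative device parameters from Section 4 yields the figures tabulated in Section 6.2 (device labels below are my shorthand):

```python
E_J = 3.018e6   # energy budget for $0.10 (Section 3)

# (R, P): estimated int64 mul/s and assumed power draw in watts
devices = {
    "smartphone @3W": (1.6e10, 3),
    "laptop @28W": (4.96e10, 28),
    "H100 @700W (k=20)": (8.38e11, 700),
}

results = {}
for name, (R, P) in devices.items():
    t = E_J / P      # seconds of continuous compute to consume $0.10
    N = R * t        # total int64 multiplies performed in that time
    eta = R / P      # energy efficiency, int64 mul per joule
    results[name] = (t, N, eta)
    print(f"{name}: t = {t / 3600:.1f} h, N = {N:.2e}, eta = {eta:.2e}")
```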

6) Results

6.1 Energy for $0.10 at the assumed minimum U.S. residential rate

  • 0.838 kWh
  • 3.02 MJ

6.2 Compute and time per $0.10

| Device class | Representative model | Assumed power (W) | Est. int64 mul/s | Time to spend $0.10 | Est. int64 mul for $0.10 | Est. int64 mul/J |
| --- | --- | --- | --- | --- | --- | --- |
| Smartphone CPU | A17 Pro‑class | 3 | 1.6e10 | 279 h (11.6 d) | 1.61e16 | 5.33e9 |
| Smartphone CPU | A17 Pro‑class | 6 | 1.6e10 | 140 h (5.82 d) | 8.05e15 | 2.67e9 |
| Laptop CPU | i7‑1360P‑class | 28 | 4.96e10 | 29.9 h (1.25 d) | 5.35e15 | 1.77e9 |
| Laptop CPU | i7‑1360P‑class | 64 | 4.96e10 | 13.1 h | 2.34e15 | 7.75e8 |
| Server GPU | H100 SXM‑class (baseline $k=20$) | 700 | 8.38e11 | 1.20 h (71.8 min) | 3.61e15 | 1.20e9 |

GPU sensitivity to the int64 expansion factor $k$ (700 W):

  • $k=8$: $N \approx 9.03\times 10^{15}$ int64 multiplies per $0.10
  • $k=20$: $N \approx 3.61\times 10^{15}$ int64 multiplies per $0.10
  • $k=32$: $N \approx 2.26\times 10^{15}$ int64 multiplies per $0.10

7) Interpretation

For a fixed energy budget ($0.10 → ~3.02 MJ at the assumed price), total work is:

$$ N = E_J \cdot \left(\frac{R}{P}\right) $$

So “multiplies per $0.10” depends on energy efficiency $(R/P)$, not wattage alone:

  • Lower power increases the time to spend the same dollar amount.
  • Higher $R/P$ increases the amount of work per dollar.

The smartphone runs for days to spend $0.10, while the GPU spends $0.10 in ~1.2 hours (under the chosen assumptions).


8) Major limitations (important)

  1. Electricity price definition: This uses EIA state‑average residential “average price” for one month; the cheapest achievable retail rate can differ by utility, tariff, plan, time‑of‑use, etc. (eia.gov)
  2. Power draw basis mismatch: CPU “W” here is not consistently wall‑power; system overheads and PSU losses are ignored. GPU TDP is used as a proxy for sustained board power. (nvidia.com)
  3. Clock sustainability: Real devices may not sustain peak clocks under long continuous load; DVFS/thermal throttling will reduce sustained $R$.
  4. GPU int64 model uncertainty: The instruction‑throughput table provides theoretical maxima and notes that achieving them can require specific sequences/care; mapping int64 to narrower ops is compiler‑ and kernel‑dependent. (docs.nvidia.com)

References

  1. U.S. Energy Information Administration. Electric Power Monthly, Table 5.6.A “Average Price of Electricity to Ultimate Customers by End‑Use Sector, by State, November 2025 and 2024 (Cents per Kilowatthour)”; data for Nov 2025; release date Jan 26, 2026. (eia.gov)
  2. Apple. “iPad mini (A17 Pro) — Tech Specs” (CPU core configuration). (support.apple.com)
  3. NotebookCheck.net. “Apple A17 Pro Processor — Benchmarks and Specs” (clock assumptions used in the model). (notebookcheck.net)
  4. Intel. “Intel Core i7‑1360P Processor — Specifications” (core counts, turbo frequencies, power limits). (intel.com)
  5. NVIDIA. “H100 GPU” product specifications (FP32 peak; max TDP up to 700 W). (nvidia.com)
  6. NVIDIA Technical Blog. “NVIDIA Hopper Architecture In‑Depth” (H100 SXM SM count; FP32 cores/SM). (developer.nvidia.com)
  7. CUDA C++ Best Practices Guide. Section 12.1 / Table 5 (definition of ops/clock per multiprocessor; mad.lo.s32 throughput row). (docs.nvidia.com)
