From 17311708ed6b947180ccab24994a0c3e74d3a835 Mon Sep 17 00:00:00 2001
From: Mike Kistler
Date: Sun, 28 Jan 2024 14:12:47 -0600
Subject: [PATCH 1/2] Add guidelines on returning string offsets & lengths

---
 azure/ConsiderationsForServiceDesign.md | 83 ++++++++++++++++++++-----
 azure/Guidelines.md                     | 13 +++-
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/azure/ConsiderationsForServiceDesign.md b/azure/ConsiderationsForServiceDesign.md
index 0b37c2f4..c219c1e4 100644
--- a/azure/ConsiderationsForServiceDesign.md
+++ b/azure/ConsiderationsForServiceDesign.md
@@ -293,12 +293,12 @@ The operation is initiated with a POST operation and the operation path ends in
 
 ```text
 POST /:?api-version=2022-05-01
-Operation-Id: 22
-
-{
-  "arg1": 123
-  "arg2": "abc"
-}
+Operation-Id: 22
+
+{
+  "arg1": 123
+  "arg2": "abc"
+}
 ```
 
 The response is a `202 Accepted` as described above.
@@ -306,7 +306,7 @@ The response is a `202 Accepted` as described above.
 ```text
 HTTP/1.1 202 Accepted
 Operation-Location: https:///22
-
+
 {
   "id": "22",
   "status": "NotStarted"
@@ -323,7 +323,7 @@ When the operation completes successfully, the result (if there is one) will be
 
 ```text
 HTTP/1.1 200 OK
-
+
 {
   "id": "22",
   "status": "Succeeded",
@@ -344,7 +344,7 @@ PUT /items/FooBar&api-version=2022-05-01
 Operation-Id: 22
 
 {
-  "prop1": 555,
+  "prop1": 555,
   "prop2": "something"
 }
 ```
@@ -358,13 +358,13 @@ The response may also include an `Operation-Location` header for backward compat
 If the resource supports ETags, the response may contain an `etag` header and possibly an `etag` property in the resource.
 ```text
-HTTP/1.1 201 Created
+HTTP/1.1 201 Created
 Operation-Id: 22
 Operation-Location: https://items/operations/22
 etag: "123abc"
 
 {
-  "id": "FooBar",
+  "id": "FooBar",
   "etag": "123abc",
   "prop1": 555,
   "prop2": "something"
@@ -381,7 +381,7 @@ When the additional processing completes, the status monitor will indicate if it
 
 ```text
 HTTP/1.1 200 OK
-
+
 {
   "id": "22",
   "status": "Succeeded"
@@ -412,8 +412,8 @@ POST /:cancel?api-version=2022-05-01
 A successful response to a control operation should be a `200 OK` with a representation of the status monitor.
 
 ```text
-HTTP/1.1 200 OK
-
+HTTP/1.1 200 OK
+
 {
   "id": "22",
   "status": "Canceled"
@@ -515,6 +515,61 @@ For example, the client can specify an `If-Match` header with the last ETag valu
 The service processes the update only if the ETag value in the header matches the ETag of the current resource on the server.
 By computing and returning ETags for your resources, you enable clients to avoid using a strategy where the "last write always wins."
 
+## Returning String Offsets & Lengths (Substrings)
+
+Some Azure services return substring offset & length values within a string. For example, the offset & length within a string to a name, email address, or phone #.
+When a service response includes a string, the client's programming language deserializes that string into that language's internal string encoding. Below are the possible encodings and examples of languages that use each encoding:
+
+| Encoding | Example languages |
+| -------- | ------- |
+| UTF-8 | Go, Rust, Ruby, PHP |
+| UTF-16 | JavaScript, Java, C# |
+| CodePoint (UTF-32) | Python |
+
+Because the service doesn't know what language a client is written in and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string.
To address this, the service response must include offset & length values for all 3 possible encodings and then the client code must select the encoding it required by its language's internal string encoding.
+
+For example, if a service response needed to identify offset & length values for "name" and "email" substrings, the JSON response would look like this:
+
+```
+{
+  (... other properties not shown...)
+  "fullString": "(...some string containing a name and an email address...)",
+  "name": {
+    "offset": {
+      "utf8": 12,
+      "utf16": 10,
+      "codePoint": 4
+    },
+    "length": {
+      "utf8": 10,
+      "utf16": 8,
+      "codePoint": 2
+    }
+  },
+  "email": {
+    "offset": {
+      "utf8": 12,
+      "utf16": 10,
+      "codePoint": 4
+    },
+    "length": {
+      "utf8": 10,
+      "utf16": 8,
+      "codePoint": 4
+    }
+  }
+}
+```
+
+Then, the Go developer, for example, would get the substring containing the name using code like this:
+
+```
+  response := client.SomeMethodReturningJSONShownAbove(...)
+  name := response.fullString[response.name.offset.utf8 : response.name.offset.utf8+response.name.length.utf8]
+```
+
+The service must calculate the offset & length for all 3 encodings and return them because clients find it difficult to work with Unicode encodings and to convert from one encoding to another. In other words, we do this to simplify client development and ensure customer success when isolating a substring.
+
 ## Getting Help: The Azure REST API Stewardship Board
 
 The Azure REST API Stewardship board is a collection of dedicated architects that are passionate about helping Azure service teams build interfaces that are intuitive, maintainable, consistent, and most importantly, delight our customers. Because APIs affect nearly all downstream decisions, you are encouraged to reach out to the Stewardship board early in the development process.
These architects will work with you to apply these guidelines and identify any hidden pitfalls in your design.
diff --git a/azure/Guidelines.md b/azure/Guidelines.md
index dee45afa..b720d29d 100644
--- a/azure/Guidelines.md
+++ b/azure/Guidelines.md
@@ -16,6 +16,7 @@ Please ensure that you add an anchor tag to any new guidelines that you add and
 
 | Date | Notes |
 | ----------- | -------------------------------------------------------------- |
+| 2024-Jan-17 | Added guidelines on returning string offsets & lengths |
 | 2023-May-12 | Explain service response for missing/unsupported `api-version` |
 | 2023-Apr-21 | Update/clarify guidelines on POST method repeatability |
 | 2023-Apr-07 | Update/clarify guidelines on polymorphism |
@@ -438,7 +439,7 @@ This indicates to client libraries and customers that values of the enumeration
 
 Polymorphism types in REST APIs refers to the possibility to use the same property of a request or response to have similar but different shapes. This is commonly expressed as a `oneOf` in JsonSchema or OpenAPI. In order to simplify how to determine which specific type a given request or response payload corresponds to, Azure requires the use of an explicit discriminator field.
 
-Note: Polymorphic types can make your service more difficult for nominally typed languages to consume. See the corresponding section in the [Considerations for service design](./ConsiderationsForServiceDesign.md#avoid-surprises) for more information.
+Note: Polymorphic types can make your service more difficult for nominally typed languages to consume. See the corresponding section in the [Considerations for service design](./ConsiderationsForServiceDesign.md#avoid-surprises) for more information.
 
 :white_check_mark: **DO** define a discriminator field indicating the kind of the resource and include any kind-specific fields in the body.
@@ -838,7 +839,7 @@ For example:
 ### Repeatability of requests
 
 Fault tolerant applications require that clients retry requests for which they never got a response, and services must handle these retried requests idempotently. In Azure, all HTTP operations are naturally idempotent except for POST used to create a resource and [POST when used to invoke an action](
-https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#performing-an-action).
+https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#performing-an-action).
 
 :ballot_box_with_check: **YOU SHOULD** support repeatable requests as defined in [OASIS Repeatable Requests Version 1.0](https://docs.oasis-open.org/odata/repeatable-requests/v1.0/repeatable-requests-v1.0.html) for POST operations to make them retriable.
 - The tracked time window (difference between the `Repeatability-First-Sent` value and the current time) **MUST** be at least 5 minutes.
@@ -1098,6 +1099,14 @@ While it may be tempting to use a revision/version number for the resource as th
 
 :white_check_mark: **DO**, when supporting multiple representations (e.g. Content-Encodings) for the same resource, generate different ETag values for the different representations.
 
+
+### Returning String Offsets & Lengths (Substrings)
+
+All string values in JSON are inherently Unicode and UTF-8 encoded, but clients written in a high-level programming language must work with strings in that language's string encoding, which may be UTF-8, UTF-16, or CodePoints (UTF-32).
+When a service response includes a string offset or length value, it should specify these values in all 3 encodings to simplify client development and ensure customer success when isolating a substring.
+
+:white_check_mark: **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response.
+
 ### Distributed Tracing & Telemetry
 
 Azure SDK client guidelines specify that client libraries must send telemetry data through the `User-Agent` header, `X-MS-UserAgent` header, and Open Telemetry.

From 03c0af7286853a1d30f6b8e9801d72e92d5f477c Mon Sep 17 00:00:00 2001
From: Mike Kistler
Date: Sun, 4 Feb 2024 06:32:05 -0600
Subject: [PATCH 2/2] Address PR review feedback

---
 azure/ConsiderationsForServiceDesign.md | 19 ++++++++++---------
 azure/Guidelines.md                     |  8 ++++++++
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/azure/ConsiderationsForServiceDesign.md b/azure/ConsiderationsForServiceDesign.md
index c219c1e4..ae1e4213 100644
--- a/azure/ConsiderationsForServiceDesign.md
+++ b/azure/ConsiderationsForServiceDesign.md
@@ -6,8 +6,9 @@
 
 | Date | Notes |
 | ----------- | -------------------------------------------------------------- |
+| 2024-Jan-17 | Added guidelines on returning string offsets & lengths |
 | 2022-Jul-15 | Update guidance on long-running operations |
-| 2022-Feb-01 | Updated error guidance |
+| 2022-Feb-01 | Updated error guidance |
 | 2021-Sep-11 | Add long-running operations guidance |
 | 2021-Aug-06 | Updated Azure REST Guidelines per Azure API Stewardship Board. |
 
@@ -517,7 +518,7 @@ By computing and returning ETags for your resources, you enable clients to avoid
 
 ## Returning String Offsets & Lengths (Substrings)
 
-Some Azure services return substring offset & length values within a string. For example, the offset & length within a string to a name, email address, or phone #.
+Some Azure services return substring offset & length values within a string. For example, the offset & length within a string to a name, email address, or phone number.
 When a service response includes a string, the client's programming language deserializes that string into that language's internal string encoding.
Below are the possible encodings and examples of languages that use each encoding:
 
 | Encoding | Example languages |
@@ -526,11 +527,11 @@ When a service response includes a string, the client's programming language des
 | UTF-16 | JavaScript, Java, C# |
 | CodePoint (UTF-32) | Python |
 
-Because the service doesn't know what language a client is written in and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string. To address this, the service response must include offset & length values for all 3 possible encodings and then the client code must select the encoding it required by its language's internal string encoding.
+Because the service doesn't know in what language a client is written and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string. To address this, the service response must include offset & length values for all 3 possible encodings and then the client code must select the encoding required by its language's internal string encoding.
 
 For example, if a service response needed to identify offset & length values for "name" and "email" substrings, the JSON response would look like this:
 
-```
+```json
 {
   (... other properties not shown...)
"fullString": "(...some string containing a name and an email address...)", @@ -538,24 +539,24 @@ For example, if a service response needed to identify offset & length values for "offset": { "utf8": 12, "utf16": 10, -      "codePoint": 4 +      "codePoint": 4    },    "length": {    "uft8": 10,       "utf16": 8, -      "codePoint": 2 +      "codePoint": 2     }   },   "email": {  "offset": {       "utf8": 12,       "utf16": 10, -      "codePoint": 4 +      "codePoint": 4     },     "length": {       "uft8": 10,       "utf16": 8, -      "codePoint": 4 +      "codePoint": 4     }   } } @@ -563,7 +564,7 @@ For example, if a service response needed to identify offset & length values for Then, the Go developer, for example, would get the substring containing the name using code like this: -``` +```go var response := client.SomeMethodReturningJSONShownAbove(...) name := response.fullString[ response.name.offset.utf8 : response.name.offset.utf8 + response.name.length.utf8] ``` diff --git a/azure/Guidelines.md b/azure/Guidelines.md index b720d29d..09ab4cdf 100644 --- a/azure/Guidelines.md +++ b/azure/Guidelines.md @@ -1107,6 +1107,14 @@ When a service response includes a string offset or length value, it should spec :white_check_mark: **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response. 
+:white_check_mark: **DO** define every string offset or length value in a service response as an object with the following structure:
+
+| Property | Type | Required | Description |
+| ----------- | ------- | :------: | ----------- |
+| `utf8` | integer | true | The offset or length of the substring in UTF-8 encoding |
+| `utf16` | integer | true | The offset or length of the substring in UTF-16 encoding |
+| `codePoint` | integer | true | The offset or length of the substring in CodePoint encoding |
+
 ### Distributed Tracing & Telemetry
 
 Azure SDK client guidelines specify that client libraries must send telemetry data through the `User-Agent` header, `X-MS-UserAgent` header, and Open Telemetry.
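
Reviewer note on the string offset guideline in this series: the patches describe what a service must return but not how the three values might be derived. The following is a minimal sketch in Python, assuming the service already knows the substring's position in code points. The helper names `measure` and `locate` and the sample string are hypothetical and do not come from the guidelines.

```python
def measure(text: str) -> dict:
    """Size of `text` in each unit named by the guideline's object structure."""
    return {
        # Number of bytes in the UTF-8 encoding.
        "utf8": len(text.encode("utf-8")),
        # Number of 16-bit code units; "utf-16-le" avoids the BOM that
        # the plain "utf-16" codec would prepend.
        "utf16": len(text.encode("utf-16-le")) // 2,
        # Python strings are indexed by code point, so len() is the count.
        "codePoint": len(text),
    }


def locate(full_string: str, start: int, end: int) -> dict:
    """Describe full_string[start:end] (code point indices) in all 3 encodings."""
    return {
        # The offset is the size of everything that precedes the substring.
        "offset": measure(full_string[:start]),
        "length": measure(full_string[start:end]),
    }


# Hypothetical sample: an emoji (outside the BMP) and an accented letter make
# the three encodings disagree about offsets and lengths.
full_string = "\U0001F600 Jos\u00e9 <jose@example.com>"
start = full_string.index("Jos\u00e9")
name = locate(full_string, start, start + len("Jos\u00e9"))
```

A client whose strings are UTF-8 byte sequences would slice the encoded bytes with `name["offset"]["utf8"]` and `name["length"]["utf8"]`, while a Python client would use the `codePoint` values directly on the string.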