Skip to content

Add data and start_offset methods for StringView #2090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2025

Conversation

MINGtoMING
Copy link
Contributor

The StringView currently lacks methods to obtain raw data and starting offset, which hinders seamless switching between StringView and String.

Here is an example:

  • Original:
///|
pub type Oid

///|
extern "C" fn oid_from_string_ffi(
  str : String,
  oid_buf : FixedArray[Oid?],
  err_buf : FixedArray[GitErrorMeta?]
) -> Unit = "oid_from_string_ffi"

///|
pub fn Oid::from_string(str : String) -> Oid!GitError {
  let oid_buf : FixedArray[Oid?] = [None]
  let err_buf : FixedArray[GitErrorMeta?] = [None]
  oid_from_string_ffi(str, oid_buf, err_buf)
  match (oid_buf, err_buf) {
    ([Some(oid)], [None]) => oid
    ([None], [Some(err)]) => raise GitError(err)
    _ => abort("unreachable")
  }
}
  • After adding:
///|
pub type Oid

///|
extern "C" fn oid_from_string_ffi(
  str : String,
  start_offset : Int,
  length : Int,
  oid_buf : FixedArray[Oid?],
  err_buf : FixedArray[GitErrorMeta?]
) -> Unit = "oid_from_string_ffi"

///|
pub fn Oid::from_string(str : @string.StringView) -> Oid!GitError {
  let oid_buf : FixedArray[Oid?] = [None]
  let err_buf : FixedArray[GitErrorMeta?] = [None]
  oid_from_string_ffi(
    str.data(),
    str.start_offset(),
    str.length(),
    oid_buf,
    err_buf,
  )
  match (oid_buf, err_buf) {
    ([Some(oid)], [None]) => oid
    ([None], [Some(err)]) => raise GitError(err)
    _ => abort("unreachable")
  }
}

Copy link

peter-jerry-ye-code-review bot commented May 13, 2025

Documentation for StringView::data() could be more specific about lifetime implications

Category
Maintainability
Code Snippet
/// Returns the original string that is being viewed.
pub fn data(self : View) -> String {
self.str
}
Recommendation
Add more detailed documentation explaining ownership and lifetime implications:

/// Returns the original string that is being viewed.
/// Note: This returns the entire underlying string, not just the viewed portion.
/// The returned string remains valid as long as the original string exists.
pub fn data(self : View) -> String {
  self.str
}```
**Reasoning**
For APIs dealing with string views and raw data access, it's important to be explicit about lifetime and ownership semantics to prevent misuse

</details>
<details>

<summary> The test case for start_offset() uses a string literal in inspect! that could be misleading </summary>

**Category**
Correctness
**Code Snippet**
test "stringview start_offset" {
  let str = "Hello🤣🤣🤣"
  let view = str.view(start_offset=1, end_offset=7)
  inspect!(view.start_offset(), content="1")
}
**Recommendation**
Use a numeric literal instead of string:
```moonbit
  inspect!(view.start_offset(), content=1)

Reasoning
Using a string literal "1" for comparing an integer value could be confusing and might hide type-related issues. Using the actual numeric type makes the expected type more clear

The test case for data() could be more comprehensive

Category
Maintainability
Code Snippet
test "stringview data" {
let str = "Hello🤣🤣🤣"
let view = str.view(start_offset=1, end_offset=7)
inspect!(view.data(), content="Hello🤣🤣🤣")
}
Recommendation
Add more test cases to verify behavior:

test "stringview data" {
  let str = "Hello🤣🤣🤣"
  let view = str.view(start_offset=1, end_offset=7)
  inspect!(view.data(), content="Hello🤣🤣🤣")
  // Verify that it returns the full string
  assert(view.data().length() > view.length())
  // Verify identity
  assert(view.data() === str)
}```
**Reasoning**
The current test only verifies basic functionality. Additional assertions would help ensure the method behaves correctly with respect to string identity and full string access

</details>

@coveralls
Copy link
Collaborator

coveralls commented May 13, 2025

Pull Request Test Coverage Report for Build 7073

Details

  • 2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 93.254%

Totals Coverage Status
Change from base Build 7072: 0.002%
Covered Lines: 8902
Relevant Lines: 9546

💛 - Coveralls

@Yu-zh
Copy link
Collaborator

Yu-zh commented May 14, 2025

Is there a reason that you are using String and @string.View for C FFI instead of Bytes and @bytes.View? We have provided APIs for accessing the data and start_offset in @bytes.View. See #1822 for more background information.

@MINGtoMING
Copy link
Contributor Author

MINGtoMING commented May 14, 2025

The reason for using @string.View instead of @bytes.View here is as follows:

  1. Semantic consistency: The purpose of Oid::from_string is to create an object id from a hexadecimal git id string (or prefix slice). Using @bytes.View would essentially treat the parameter as a CString, which would require additional clarification to avoid ambiguity.
  2. Ergonomics: The caller can use it directly without worrying about whether the encoding of the passed @bytes.View is correct. For example: let oid = Oid::from_string!("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
  3. Interface conflict: There is another function, Oid::from_raw, which directly copies and creates a git oid from the passed Bytes.

@peter-jerry-ye
Copy link
Collaborator

Although I don't think it's a bad idea to add these functions, I don't think your reasons hold.

You can pass the Bytes at the FFI boundary, i.e. oid_from_string_ffi, and perform the encoding / decoding on the MoonBit side. And as far as I'm aware of, the git_oid_from_string does not accept UTF16 encoded strings.

@MINGtoMING
Copy link
Contributor Author

Yes, the C prototype of oid_from_string_ffi accepts a hex string of length 40. Perhaps directly accepting a Bytes with \x00 at the end would be more convenient, but on the Moonbit interface side, it feels somewhat inelegant and ambiguous. My previous solution was to place the encoding conversion in the C code to utilize the system-provided encoding conversion functions(iconv / win).

@peter-jerry-ye
Copy link
Collaborator

Solely for your case, I think you don't even need the encoder or iconv / win since you should be accepting only asciis. A simple for loop should be enough for you.

@MINGtoMING
Copy link
Contributor Author

Right, in this func, it can be handled better as you suggested.

@MINGtoMING
Copy link
Contributor Author

However, in other cases, when the amount of data contained in StringView is large, converting it to Bytes(or other) before interaction may incur significant overhead.

@peter-jerry-ye
Copy link
Collaborator

I think it's fine to expose them. @Yu-zh

@Yu-zh
Copy link
Collaborator

Yu-zh commented Jun 1, 2025

I think it's fine to expose them. @Yu-zh

Then it looks good to me

@peter-jerry-ye peter-jerry-ye enabled auto-merge (rebase) June 5, 2025 07:18
@peter-jerry-ye peter-jerry-ye merged commit d2c2066 into moonbitlang:main Jun 5, 2025
34 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants