Skip to content

Conversation

@dennwc
Copy link
Contributor

@dennwc dennwc commented Dec 2, 2025

This PR refactors and optimises the SIP message parsers.

Short list of changes:

  • Parser will now track the number of bytes consumed, removing the need for hacks in streaming mode.
  • Steaming parser now properly skips CRLF between messages (RFC 3261 - 7.5).
  • Content-Length is now required for streaming mode (RFC 3261 - 7.5).
  • Parser postpones the []byte -> string conversions for headers, improving performance by 4% and reducing allocations by 10%.
  • Streaming parser reuses more code from the regular parser.
  • Parser exposes new API for parsing the message headers without the body.
  • Streaming parser exposes API for getting individual messages and how many bytes were read.
  • Streaming parser can now discard data to recover the connection state after malformed messages.

Benchmarking results

4% faster parsing, up to 10% less allocations.

                               │   old.txt   │               new.txt                │
                               │   sec/op    │    sec/op     vs base                │
ParserStream/NoChunks-16         2.500µ ± 3%   2.377µ ±  4%   -4.94% (p=0.004 n=10)
ParserStream/SingleRoutine-16    2.612µ ± 2%   2.518µ ±  1%   -3.62% (p=0.000 n=10)
ParserStream/Paralel-16          13.18µ ± 6%   13.33µ ±  6%        ~ (p=0.912 n=10)
Parser/SingleRoutine-16          2.843µ ± 4%   2.785µ ±  1%   -2.04% (p=0.000 n=10)
Parser/Paralel-16                14.21µ ± 5%   14.10µ ±  5%        ~ (p=0.239 n=10)
ParserNoData/New-16              2.120µ ± 1%   2.014µ ±  3%   -5.00% (p=0.000 n=10)

                               │    old.txt     │                new.txt                 │
                               │      B/op      │     B/op      vs base                  │
ParserStream/NoChunks-16         2.790Ki ± 0%     2.656Ki ± 0%   -4.80% (p=0.000 n=10)
ParserStream/SingleRoutine-16    2.845Ki ± 0%     2.656Ki ± 0%   -6.63% (p=0.000 n=10)
ParserStream/Paralel-16          3.021Ki ± 0%     2.661Ki ± 0%  -11.90% (p=0.000 n=10)
Parser/SingleRoutine-16          3.117Ki ± 0%     2.984Ki ± 0%   -4.26% (p=0.000 n=10)
Parser/Paralel-16                3.123Ki ± 0%     2.989Ki ± 0%   -4.28% (p=0.000 n=10)
ParserNoData/New-16              2.520Ki ± 0%     2.430Ki ± 0%   -3.57% (p=0.000 n=10)

                               │   old.txt    │              new.txt                │
                               │  allocs/op   │ allocs/op   vs base                 │
ParserStream/NoChunks-16         39.00 ± 0%     37.00 ± 0%  -5.13% (p=0.000 n=10)
ParserStream/SingleRoutine-16    41.00 ± 0%     37.00 ± 0%  -9.76% (p=0.000 n=10)
ParserStream/Paralel-16          41.00 ± 0%     37.00 ± 0%  -9.76% (p=0.000 n=10)
Parser/SingleRoutine-16          44.00 ± 0%     43.00 ± 0%  -2.27% (p=0.000 n=10)
Parser/Paralel-16                44.00 ± 0%     43.00 ± 0%  -2.27% (p=0.000 n=10)
ParserNoData/New-16              32.00 ± 0%     30.00 ± 0%  -6.25% (p=0.000 n=10)

Changes to Parser

Original parsing method is kept unchanged:

func (p *Parser) ParseSIP(data []byte) (Message, error)

Instead, a new method is introduced:

func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)

This method now returns number of bytes used to parse the message. It always stops at the beginning of the line that caused a failure, which helps reuse the same underlying code for streaming mode.

The stream flag adjust the behaviour of the parser slightly, for example, in streaming mode the Content-Length is required and CRLF at the beginning of the message is silently skipped (RFC 3261 - 7.5).

This new method now always returns io.UnexpectedEOF if message was not read completely and more data is required. The old ParseSIP is not affected and still returns ParseEOF error.

Additionally, another new method is added for parsing the message without the body:

func (p *Parser) ParseHeaders(data []byte, stream bool) (Message, int, error)

This allows using the parser more efficiently in proxies that only act on the headers. It will not require allocating the body separately, instead, the proxy may write the headers followed by the body from the original buffer.

Changes to ParserStream

Existing methods for ParserStream are kept unchanged:

func (p *ParserStream) ParseSIPStream(data []byte) ([]Message, error)
func (p *ParserStream) ParseSIPStreamEach(data []byte, cb func(msg Message)) error

New method was added for reading messages separately:

func (p *ParserStream) Write(data []byte) (int, error)
func (p *ParserStream) ParseNext() (Message, int, error)

After using Write to append data to the internal buffer, ParseNext can be called multiple times to get SIP messages. As opposed to ParseSIPStream and ParseSIPStreamEach, the caller can decide when to stop reading the messages.

Under the hood, all ParserStream methods now reuse the methods for parsing headers and start line from the Parser, reducing code duplication and differences in behaviour.

Similar to the new Parse method, ParseNext returns the number of bytes that were used to read the message, or io.UnexpectedEOF in case the message was parsed only partially. Old methods like ParseSIPStreamEach still return ErrParseSipPartial instead.

Additionally, there are a few new methods to control the stream state:

func (p *ParserStream) Buffer() *bytes.Buffer
func (p *ParserStream) Discard(n int)
func (p *ParserStream) Reset()
func (p *ParserStream) Close()

Buffer allows the caller to examine the underlying parser state after an error. ParseNext will return the offset into this buffer at the beginning of the failed line.

Discard allows the caller to drop N first bytes and reset the parser to recover the remaining stream. This API assumes that the caller has some heuristic to decide how and when to recover.

Reset completely resets the internal state and the buffer, allowing reuse of the ParserStream.

Close is similar, but calling it is now required to reuse the underlying buffer for other parsers (using the Pool). Previously, the parser was always discarding the buffer after each message. Now the caller can control it by calling Reset or Close.

@emiago
Copy link
Owner

emiago commented Dec 2, 2025

Like a bomb :), just we are almost at 1.0.0 and I am now not sure where to go with this.
I like the performance improvements and all and would like to merge but...

So do I understand you need some exposure here of Parser because you are running some manual parsing?
I would like to see is there need to expose this right now all, and give chance for merging prio 1.0.0

I have to look more, but here some first looks.
This I think could be better as 2 seperate functions. Reason is that caller has to know this upfront, and
error handling will anyway be determined by flag, but it can be underhood left the rest.

func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)

Also btw reason for ParseSIP and not Parse was historifcal as we had interface usage :)


// headerParserTo generates ToHeader
func headerParserTo(headerName string, headerText string) (header Header, err error) {
func headerParserTo(headerName []byte, headerText string) (header Header, err error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could understand this comes from line buffer, but value is mostly not used, so not sure do we need this change anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it would be great to switch values to byte slices - that should give even better performance. But that would be a very large change. So I only did header names for now.

Answering your question, tbh I'm not sure why it helps here specifically. As you mentioned, the string here is unused and is mostly generated from static strings in the lower case header names switch. So it should not matter if we do bytes here. But it does affect performance in the end.


colonIdx := strings.Index(headerText, ":")
// ParseHeader parses a SIP header from the line and appends it to out.
func (headersParser HeadersParser) ParseHeader(out []Header, line []byte) ([]Header, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was avoiding to expose this, but I guess there is use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite happy with the current API having the ParseHeader which avoids the body, so we could unexport it back. The reason I still exposed it is just in case we'll need something more exotic for parsing, while still keeping compatibility with sipgo data structures. We'll need to call this directly then. Maybe the answer is similar to the one for NextLine - move it to another package like parser. Would that work?

// It returns io.ErrUnexpectedEOF is there's no CRLF (\r\n) in the data.
// If there's a CR (\r) which is not followed by LF (\n), a ErrParseLineNoCRLF is returned.
// As a special case, it returns io.EOF if data is empty.
func NextLine(data []byte) ([]byte, int, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not have this package exposed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, would it be more acceptable if I introduce a new parser package that will expose this method without polluting the sip package?

@dennwc
Copy link
Contributor Author

dennwc commented Dec 3, 2025

First, thank you for a quick reply!

So do I understand you need some exposure here of Parser because you are running some manual parsing?

That's correct. We now have a separate SIP proxy that needs the parser, but not the other layers. And we've hit a few issues with it. This is why we'd appreciate a bit more flexibility for the parser, which this PR attempts.

I would like to see is there need to expose this right now all, and give chance for merging prio 1.0.0

No rush at all! We are totally fine with running this code from our own branch for now, until you are ready to upstream these changes in one form or another (1.1.0 maybe?).

This I think could be better as 2 seperate functions.

You mean having two separate parsers? PR still keeps ParseSIP method intact, so it doe use two different function for old vs new signatures and error semantics. Under the hood it's the same code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants