-
-
Notifications
You must be signed in to change notification settings - Fork 147
Improve SIP message parsers #271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Like a bomb :), just we are almost at 1.0.0 and I am now not sure where to go with this. So do I understand you need some exposure here of Parser because you are running some manual parsing? I have to look more, but here some first looks.
Also btw reason for |
|
|
||
| // headerParserTo generates ToHeader | ||
| func headerParserTo(headerName string, headerText string) (header Header, err error) { | ||
| func headerParserTo(headerName []byte, headerText string) (header Header, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could understand this comes from line buffer, but value is mostly not used, so not sure do we need this change anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally it would be great to switch values to byte slices - that should give even better performance. But that would be a very large change. So I only did header names for now.
Answering your question, tbh I'm not sure why it helps here specifically. As you mentioned, the string here is unused and is mostly generated from static strings in the lower case header names switch. So it should not matter if we do bytes here. But it does affect performance in the end.
|
|
||
| colonIdx := strings.Index(headerText, ":") | ||
| // ParseHeader parses a SIP header from the line and appends it to out. | ||
| func (headersParser HeadersParser) ParseHeader(out []Header, line []byte) ([]Header, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was avoiding to expose this, but I guess there is use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite happy with the current API having the ParseHeader which avoids the body, so we could unexport it back. The reason I still exposed it is just in case we'll need something more exotic for parsing, while still keeping compatibility with sipgo data structures. We'll need to call this directly then. Maybe the answer is similar to the one for NextLine - move it to another package like parser. Would that work?
| // It returns io.ErrUnexpectedEOF is there's no CRLF (\r\n) in the data. | ||
| // If there's a CR (\r) which is not followed by LF (\n), a ErrParseLineNoCRLF is returned. | ||
| // As a special case, it returns io.EOF if data is empty. | ||
| func NextLine(data []byte) ([]byte, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not have this package exposed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, would it be more acceptable if I introduce a new parser package that will expose this method without polluting the sip package?
|
First, thank you for a quick reply!
That's correct. We now have a separate SIP proxy that needs the parser, but not the other layers. And we've hit a few issues with it. This is why we'd appreciate a bit more flexibility for the parser, which this PR attempts.
No rush at all! We are totally fine with running this code from our own branch for now, until you are ready to upstream these changes in one form or another (1.1.0 maybe?).
You mean having two separate parsers? PR still keeps |
This PR refactors and optimises the SIP message parsers.
Short list of changes:
Content-Lengthis now required for streaming mode (RFC 3261 - 7.5).[]byte -> stringconversions for headers, improving performance by 4% and reducing allocations by 10%.Parserexposes new API for parsing the message headers without the body.Benchmarking results
4% faster parsing, up to 10% less allocations.
Changes to Parser
Original parsing method is kept unchanged:
Instead, a new method is introduced:
This method now returns number of bytes used to parse the message. It always stops at the beginning of the line that caused a failure, which helps reuse the same underlying code for streaming mode.
The
streamflag adjust the behaviour of the parser slightly, for example, in streaming mode theContent-Lengthis required and CRLF at the beginning of the message is silently skipped (RFC 3261 - 7.5).This new method now always returns
io.UnexpectedEOFif message was not read completely and more data is required. The oldParseSIPis not affected and still returnsParseEOFerror.Additionally, another new method is added for parsing the message without the body:
This allows using the parser more efficiently in proxies that only act on the headers. It will not require allocating the body separately, instead, the proxy may write the headers followed by the body from the original buffer.
Changes to ParserStream
Existing methods for
ParserStreamare kept unchanged:New method was added for reading messages separately:
After using
Writeto append data to the internal buffer,ParseNextcan be called multiple times to get SIP messages. As opposed toParseSIPStreamandParseSIPStreamEach, the caller can decide when to stop reading the messages.Under the hood, all
ParserStreammethods now reuse the methods for parsing headers and start line from theParser, reducing code duplication and differences in behaviour.Similar to the new
Parsemethod,ParseNextreturns the number of bytes that were used to read the message, orio.UnexpectedEOFin case the message was parsed only partially. Old methods likeParseSIPStreamEachstill returnErrParseSipPartialinstead.Additionally, there are a few new methods to control the stream state:
Bufferallows the caller to examine the underlying parser state after an error.ParseNextwill return the offset into this buffer at the beginning of the failed line.Discardallows the caller to drop N first bytes and reset the parser to recover the remaining stream. This API assumes that the caller has some heuristic to decide how and when to recover.Resetcompletely resets the internal state and the buffer, allowing reuse of theParserStream.Closeis similar, but calling it is now required to reuse the underlying buffer for other parsers (using thePool). Previously, the parser was always discarding the buffer after each message. Now the caller can control it by callingResetorClose.