Skip to content

Add guard against long topics in (un)subscribe #164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wnelis opened this issue Jan 16, 2025 · 18 comments
Closed

Add guard against long topics in (un)subscribe #164

wnelis opened this issue Jan 16, 2025 · 18 comments

Comments

@wnelis
Copy link

wnelis commented Jan 16, 2025

In file mqtt_as/__init__.py, in both methods subscribe and unsubscribe, the field remaining length is encoded with a single octet. Lines 536 and 560 both read:

    struct.pack_into("!BH", pkt, 1, sz, pid)

The (implicit) assumption is that the payload, a topic in both cases, is short enough to have the message fit in at most 127 octets. A check to see if this assumption is correct, might be needed.

Personally, I would prefer a check like:

    assert sz < 128, "Topic too long"
    struct.pack_into("!BH", pkt, 1, sz, pid)

Once the application using this module is tested and it is known that topics in use are not too long, the assert-statements can be effectively removed by increasing the optimization level of microPython, thus decreasing the size of the byte-code.

@bobveringa
Copy link
Contributor

The assert statement would at least prevent users from getting confused when using longer topic lengths.

But looking at the spec, and other elements of the code, I would argue for just supporting any length for (un)subscribe. The logic is simple, and (un)subscribes happen rarely enough that it shouldn't really impact performance. The downside is that it cannot be optimized away with MicroPython optimization levels. In my opinion, this small tradeoff is worth it for being able to use any length topic.

@wnelis
Copy link
Author

wnelis commented Jan 17, 2025

As a side note: if methods (un)subscribe are to support any length topics, encoding (and sending?) field remaining length will probably be implemented in a separate method. If so, this new method can also be used in method connect. In that case a cosmetic change in the latter method could be implemented, by effectively changing (at line 309)

    premsg = bytearray(b"\x10\0\0\0\0\0")
    msg = bytearray(b"\x04MQTT\x00\0\0\0")

into

    premsg = bytearray(b"\x10\0\0\0\0")
    msg = bytearray(b"\0\x04MQTT\x00\0\0\0")

(which requires an update of all indices in byte-array msg). The new method would replace code snippet (at line 336)

        i = 1
        while sz > 0x7F:
            premsg[i] = (sz & 0x7F) | 0x80
            sz >>= 7
            i += 1
        premsg[i] = sz
        await self._as_write(premsg, i + 2)

in which the last line looks like a coding error, but is not with the current definitions of the aforementioned byte-arrays.

@bobveringa
Copy link
Contributor

Seems like you have a good handle on the situation. I wish I could give more input at this time, but I am leaving for a holiday soon and do not have time at the moment. I can investigate further (or implement the feature) in about 2 weeks.

@peterhinch
Copy link
Owner

peterhinch commented Jan 18, 2025

[EDIT]
I have now spent some time with the specs for V3.1.1 and V5 and I think we have a few problems with [un]subscribe.

The Remaining Length byte sz

In V3.1.1 this is a single byte value <= 127 (limiting allowable topic length).
In V5 this is 1-4 bytes encoded as a Variable Byte Integer (spec 1.5.5) as per @wnelis. We need to address this.

UNSUBACK

This is currently not handled; unsubscribe causes the connection to drop after a period of waiting.

Suggested Plan

I will fix the V3.1.1 bugs, namely

  • Unsubscribe error as per original post.
  • Guard on topic length.
  • Handle UNSUBACK for V3.1.1 (will require adapting for V5).
  • Add code comments re possible V5 issues (labelled TODO).

I will also refactor to remove code duplicated in the [un]subscribe methods.

@bobveringa

By the time you return I will have pushed an update and reported back. Please could you check my observations on the V5 spec and when you get time submit a PR?

@wnelis
Copy link
Author

wnelis commented Jan 18, 2025

The Remaining Length byte sz

In V3.1.1 this is a single byte value <= 127 (limiting allowable topic length).

The specification I've used is taken from https://mqtt.org/mqtt-specification/. I read in the specification of v3.1.1 that field remaining length is always encoded in a variable length, see section 2.2. In the description of the subscribe control message, section 3.8.1, there is no mention of encoding the length in a single octet. The same is true for the unsubscribe, section 3.10.1. Moreover, both allow for multiple topics to be included in the payload, increasing the probability that the payload will be longer than 127 octets. Thus there seems to be no difference between v3.1.1 en v5 with respect to encoding the length of the payload.

@peterhinch
Copy link
Owner

The V5 spec 3.8.1 is specific:

This is the length of Variable Header plus the length of the Payload, encoded as a Variable Byte Integer.

However I've not spotted anything to that effect in the V3.1.1. It would simplify the code if both versions use VBI and remove the need for a guard. Please can you point me at a specific statement to that effect in the V3.1.1 spec?

@bobveringa
Copy link
Contributor

bobveringa commented Jan 19, 2025

I only have access to my phone at the moment (and MQTT docs are not ideal to read on a phone)

But I think section 2.2.3 in the v3.1.1 spec is clear on the matter.

The Remaining Length is the number of bytes remaining within the current packet, including data in the variable header and the payload. The Remaining Length does not include the bytes used to encode the Remaining Length.

I think that it is fair to assume this applies to all packets. The only reason to doubt this is that a lot of packets say byte 2... - remaining length but for subscribe it just says byte 2. But given section 2.2.3 I think this is just a visualization error.

EDIT: I just looked at the source code of paho.mqtt (python) and it seems like they use a variable length header for (un)subscribe on both v5 and v3.1.1.

@wnelis
Copy link
Author

wnelis commented Jan 20, 2025

Using paho mqtt client, version 2.1.0, selecting MQTT version 3.1.1, a subscription to a topic with a name of 131 (ASCII) characters was performed. The subscription was successful. Using tcpdump and wireshark the network packets are captured and analyzed. Wireshark showed a remaining length of 136 octets, encoded in two octets as 0x88 0x01, in the subscribe control message.

@peterhinch
Copy link
Owner

I now have a build which supports long topics, both for publication and subscription. This works under V3.1.1 and V5.

A historical note.

mqtt_as was conceived as an adaptation of the official library on the assumption that the implementation of the V3.1.1 protocol was correct. This bug originated here. It's presumably never been spotted because topics in microcontroller applications tend to be short.

@peterhinch
Copy link
Owner

I have raised this issue against the official library.

@peterhinch
Copy link
Owner

peterhinch commented Jan 31, 2025

I have pushed a branch protocol_tests. Changes to the code are nominally complete, but the V5 test scripts are a work in progress.

Changes in V0.8.3

Scope

  1. Fix bug where Variable Byte Integers not correctly represented.
  2. Fix bug where unsubscribe failed.
  3. Add protocol test scripts for V3.1.1 and V5.
  4. Refactor as detailed below. Objectives: reduce code duplication, use new language features in the hope of improving readablity.

Files changed

mqtt_v5_properties.py

No code changes.
Add copyright and licence comments.
Replace u libraries with normal name.
Code format changes: my text editor runs black by default which applied PEP8 formatting.

__init__.py

Bugfixes

Provide vbi function to encode an integer as a VBI (variable byte integer).
Fix .unsubscribe for V5 compatibility.
Fix .wait_msg to handle UNSUBACK packets, combining SUBACK and UNSUBACK code.

Refactoring

Replace u libraries with normal name.
Combine subscribe and unsubscribe methods to reduce duplicated code.
Replace string modulus operator with f-strings, remove .dprint method.
On protocol errors replace exceptions with assert statements. Rationale: these
errors never occur in the wild due to TCP/IP integrity. Optimisation can remove
asserts, saving bytes.
Create .kill_pid method to replace duplicated code.

Comments

The VBI functions are recursive. This simplifies the code. Further, in normal cases of reasonably small strings, recursion will not actually occur.

Test scripts

The library was developed as a fix for the official MQTT library, whose handling of the MQTT protocol was assumed correct. Testing focussed on proving resilience. The VBI bug and the failure of unsubscribe, coupled with the protocol changes involved with V5, indicated that we need additional tests to prove the protocol handling. I have made an attempt at this, in the tests directory.

The approach is to use two targets, one running target.py and the other running test.py. The role of target.py is to respond to challenges sourced by test.py. Note that it is valid to challenge the V5 target with V5 or V3 tests. The converse - sending V5 messages to a V3 target - is invalid.

Comments on protocol tests

I would be grateful for comments. An alternative methodology would be to use Paho MQTT under CPython to challenge the MP target. My approach was based purely on my own familiarity with MP/mqtt_as.

[EDIT]
Tests pass in their current form, including challenging a V5 target with the V3 test. If tests of properties are to be expanded there is an issue over the use of JSON to encode properties because of its treatment of integers. The code fixes the property dict key, but not the value, meaning that some properties will not work. Options are to use Pickle or MessagePack for serialisation.

Comments are welcome on the scope of protocol testing.

The new tests do not attempt to be resilient: they assume a good WiFi connection.

@peterhinch
Copy link
Owner

@bobveringa Have you had a chance to look at this?

@bobveringa
Copy link
Contributor

@peterhinch I have not yet had a chance to look at this yet. I've added an item to look at this next week.

@bobveringa
Copy link
Contributor

It, took me a bit more time to get this set up than desired. Partially because self.dprint() has been removed, and we override this method in our subclass to allow use of the logger module.

If possible, I'd like to keep dprint.

Other than that, I seem to have no issues running the new version and performance is either better, or not noticeably different.

@peterhinch
Copy link
Owner

Thanks for testing.

Re dprint I'll revert the self.DEBUG and ... statements. Are you happy with the assert statements? I believe the three instances of assert cover errors that can never actually occur in the wild. However I'll revert those too if you wish.

I'd be interested to hear your views on testing. The kind of scenario where these tests would be run is where a bug affecting the protocol was identified. With suitable tests I could fix the bug and run the tests. My thought is that beyond notifying you, there would be no need for you to be involved. Examples of protocol bugs are the one in this thread's OP, and the fact that unsubscribe didn't actually work.

The scope of testing could be enhanced to test all possible properties (using Pickle in place of JSON), but I'm not sure this is necessary. Again your thoughts would be welcome.

@bobveringa
Copy link
Contributor

The use of assert statements seems valid to me. And as you said, they shouldn't trigger in the wild.

I've been giving testing some thought as part of our own testing strategy. Ideally, there is some form of automated unit testing/integration testing for these protocols (maybe through GitHub actions). This is more difficult than for most other projects as mqtt_as uses both a lot of MicroPython modules and has some hardware-specific behavior (but that is mostly for Wi-Fi).

Once I have time again (currently in the process of moving, so I have even less time than normal). I would like to take a look at introducing this kind of testing with GitHub actions. Once some initial tests are in place, it will be easier to expand with more and more complicated tests.

@peterhinch
Copy link
Owner

That sounds good, but GitHub actions are outside of my experience.

I'll revert the dprint statements and merge in the next week.

Good luck with the move!

@peterhinch
Copy link
Owner

I have now pushed an update. Closing this as the issue in the OP is now resolved (by correcting the code rather than by guarding).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants