Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorthand IPv4Network and IPv6Network #128810

Closed
jfuruness opened this issue Jan 14, 2025 · 24 comments
Closed

Shorthand IPv4Network and IPv6Network #128810

jfuruness opened this issue Jan 14, 2025 · 24 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@jfuruness
Copy link

jfuruness commented Jan 14, 2025

Feature or enhancement

Proposal:

IPv4Networks and IPv6Networks are often written in shorthand by network operators. For example, 1.2.0.0/16 would be written as 1.2/16. I've added a very simple .shorthand function (along with a few simple tests) to these networks in the hopes that this can be a supported feature.

For the IPv4Network:

    @property
    def shorthand(self):
        """
        Returns the shorthand representation of the IPv4 network.

        This method abbreviates the IPv4 network by removing trailing
        zero octets from the network address.

        Returns:
            str: The shorthand IPv4 network in the format 'X.X/X'.

        Example:
            >>> network = IPv4Network('192.168.0.0/24')
            >>> network.shorthand
            '192.168/24'
        """
        # Split the network address into octets
        octets = str(self.network_address).split('.')
        # Remove trailing zero octets
        while octets and octets[-1] == '0':
            octets.pop()
        # Rejoin the remaining octets and append the prefix length
        return '.'.join(octets) + f"/{self.prefixlen}"

For the IPv6Network:

    @property
    def shorthand(self):
        """
        Returns the shorthand representation of the IPv6 network.

        This method compresses the IPv6 address to its shortest form
        and appends the prefix length.

        Returns:
            str: The shorthand IPv6 network in the format 'X::/Y'.

        Example:
            >>> network = IPv6Network('2001:db8:0:0:0:0:0:0/32')
            >>> network.shorthand
            '2001:db8::/32'
        """
        return f"{self.network_address.compressed}/{self.prefixlen}"

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@jfuruness
Copy link
Author

I'm a first time contributor so if I've missed a step please let me know! Thank you for your time.

@rruuaanng
Copy link
Contributor

            >>> network = IPv4Network('192.168.0.0/24')
            >>> network.shorthand
            '192.168/24'

The case you mentioned should be abbreviated as 192.168.0/24 because the mask is 24. Is it my misunderstanding of the abbreviation rule?

Typically, such behavior in network systems should follow the rules outlined below, if I remember correctly. In other words, the non-zero byte portions are retained based on the subnet mask.

192.0.0.0/8 => 192/8
192.1.0.0/16 => 192.1/16
192.168.1.0/24 => 192.168.1/24
192.168.0.0/24 => 192.168.0/24

@jfuruness
Copy link
Author

jfuruness commented Jan 14, 2025 via email

@picnixz picnixz added the stdlib Python modules in the Lib dir label Jan 14, 2025
@picnixz
Copy link
Member

picnixz commented Jan 14, 2025

Are "shorthand" IPv{4,6}Network standardized or is it just something that network operators came up with?

@ZeroIntensity
Copy link
Member

It's standard for IPv4, but I'm not sure about IPv6. The relevant RFC is RFC 4632. Though, I'm not really sure it's right to call it "shorthand"--it's really a notation for subnet masks, not necessarily a short version of the address. (Also, if we're going to support generating the notation from an IPv4Address object we should support parsing it into one as well.)

@jfuruness
Copy link
Author

jfuruness commented Jan 14, 2025 via email

@rruuaanng
Copy link
Contributor

To be honest, I like the term "shorthand", It perfectly captures this behavior, this notation is called CIDR, which can serve as an alternative name. Naming is a significant challenge in the field of computer science!

@picnixz
Copy link
Member

picnixz commented Jan 14, 2025

It's not the term "shorthand" that I'm worried about. What I'm worried about is whether this is a standardized notation and feature. If this is not, then we shouldn't include it in the standard library. Now, for IPv4 it seems that it is the case. If it's not the case for IPv6, I would prefer to amend the current PR.

@picnixz
Copy link
Member

picnixz commented Jan 14, 2025

Note that RFC 4632 is currently in "Best Current Practice" and not in the "Standard Tracks". Since BCP tracks may change in the future and evolve, it's not necessarily something that we want in the standard library. Is there a strong motivation to support shorthand notations (or a use case that cannot be achieved with standard notations)? Maybe we can instead add a comparision function that checks if two representations are equal instead. That way we could have more flexibility in the future.

@ZeroIntensity
Copy link
Member

Bikeshedding: cidr_shorthand is probably a better name, but I'd want good docs + docstring on that. (I'm all too familiar with having no idea what an acronym in the stdlib means.)

Google is saying that IPv6 supports CIDR notation as well, but yeah, being a "best practice" rather than a standard isn't convincing.

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Jan 14, 2025

A thought I had: instead of explicitly adding cases for this, how about we add some sort of converter protocol for IP addresses? So something like:

class IPConverter:
    def parse_to_ipv4(self, text: str) -> IPv4Address:
        ...

    def convert_from_ipv4(self, address: IPv4Address) -> str:
        ...

    def parse_to_ipv6(self, text: str) -> IPv4Address:
        ...

    def convert_from_ipv6(self, address: IPv6Address) -> str:
        ...

ipaddress.add_converter(IPConverter())

Then, for this case, we could expose an optional CIDRConverter class that users could decide whether or not to use.

@picnixz
Copy link
Member

picnixz commented Jan 14, 2025

Then, for this case, we could expose an optional CIDRConverter class that users could decide whether or not to use.

How useful would it be if we only have one converter? I would prefer standalone functions rather than a namespace like this. OTOH, when I fixed ipaddress reverse pointers, I wanted to refactor the API as I thought some things could be improved here and there so maybe it could be a first step to that refactoring. That being said, this is just an idea and I would prefer a PoC to see first how it would be used and how it could improve the readability of the module in general.

@ZeroIntensity
Copy link
Member

How useful would it be if we only have one converter?

No idea, I'm thinking out loud. We could define a converter for every notation, and just keep the ones that currently exist enabled by default.

@jfuruness
Copy link
Author

jfuruness commented Jan 15, 2025

Thank you everyone for your time and comments, they are very insightful. My proposal would be to change this feature and amend the PR as follows:

I will fix the issue identified by rruuaanng (great catch!), to bring the proposal in line with RFC 4632, which was identified by ZeroIntensity.

ZeroIntensity also brings up a good point, that parsing from this shorthand format defined by RFC 4632 should be supported as well. I will do my best to add that.

As far as the naming concerns go, I like ZeroIntensity’s suggestion of cidr_shorthand. I do wonder if the cidr_ part is not a bit redundant though, since this can only be called on a CIDR (IPv4Network or IPv6Network), or am I wrong?

Picnixz and ZeroIntensity both point out that IPv6 “shorthand” is not standardized. I agree with them that the IPv6 version should be removed in that case. However, my thought is that we still have a property called cidr_shorthand on the IPv6Network, and just return the string of the class. The reason being that without this, every single call to cidr_shorthand for a generic ip_network would need to be wrapped in some type of try except in case the ip_network happened to be an IPv6Network, which I don’t think makes much sense.

Picnixz and ZeroIntensity have also expressed concern that RFC 4632 is the “best current practice” and not in the “Standard Tracks”. Thank you for this feedback. While the RFC is not in the standard tracks, in my experience, the full CIDR notation is very rarely written out when people are familiar with CIDR notation. The shorthand is extremely common in my opinion. From a practical perspective, writing out 1.2/16 is much faster, shorter, and more readable than 1.2.0.0/16 (for people that are familiar with CIDRs), which is why the shorthand is the default notation used by many. This is especially true when drawing diagrams of prefixes sent across autonomous systems, where a long prefix makes the diagram very hard to read. Given how common it is I figured it made sense for this to be supported.

ZeroIntensity also brought up an interesting idea of a converter protocol for IP addresses. I agree with picnixz that I prefer a standalone function in this case. I think that offers the easiest use from a user perspective. Just my opinion though :)

I’ll do my best to fix the PR although I need another day or two. Thanks again for all the feedback and happy to continue to iterate on this based on further discussion.

@jfuruness
Copy link
Author

jfuruness commented Jan 15, 2025

Upon further research and consulting some BGP researchers (Thanks Dr. Amir Herzberg!) it appears that the .with_prefixlen method returns the compressed version of the IPv6 prefix, in line with RFC 5952. This is essentially the cidr_shorthand notation used for IPv6. So I think it likely makes sense to return self.with_prefixlen for the .cidr_shorthand method in the IPv6Network.

@jfuruness
Copy link
Author

Considering that I had no idea that the CIDR shorthand was already supported by IPv6, perhaps it makes sense to add this to the documentation with a link to the RFC. I'm also quite confused in regards to "with_prefixlen" now. It's not really documented either. In the IPv6Network class, it returns the shortest form of the prefix, in CIDR notation. But in the IPv4Network class, it returns the exact prefix, in CIDR notation - a different behavior than in IPv6. Perhaps the "with_prefixlen of IPv4Network should have been returning the compressed version, and was not? Even if that's the case, I still think it makes sense to do this as a separate method (cidr_shorthand) to avoid any backwards compatibility breaking changes.

@jfuruness
Copy link
Author

Looking at the source code for "with_prefixlen", I think this is a weird mismatch in behavior. IPv6 returns the compressed version, while IPv4 does not. Both functions are undocumented in the code, and the main docs only have this to say about IPv4, while the IPv6 versions are undocumented: https://docs.python.org/3/library/ipaddress.html#ipaddress.IPv4Interface.with_prefixlen

Also this specific example has an invalid prefix with host bits set unless I am misunderstanding. That should probably also be fixed. I'll open a separate PR for that when this one is done.

@picnixz
Copy link
Member

picnixz commented Jan 15, 2025

As far as the naming concerns go, I like ZeroIntensity’s suggestion of cidr_shorthand. I do wonder if the cidr_ part is not a bit redundant though, since this can only be called on a CIDR (IPv4Network or IPv6Network), or am I wrong?

It's not redundant because we don't know how it can evolve in the future. It's better to precise. Remember that we are a standard library, not an application, so we try to follow standards as much as possible.

The reason being that without this, every single call to cidr_shorthand for a generic ip_network would need to be wrapped in some type of try except in case the ip_network happened to be an IPv6Network, which I don’t think makes much sense.

EDIT: Forget about my answer, I was wrong.

The shorthand is extremely common in my opinion. From a practical perspective, writing out 1.2/16 is much faster, shorter, and more readable than 1.2.0.0/16 (for people that are familiar with CIDRs), which is why the shorthand is the default notation used by many.

I would like to see numbers to reflect that claim. How many usage of this pattern can you find on GitHub for instance? what about Rust? (AFAICT, https://doc.rust-lang.org/std/net/struct.Ipv4Addr.html does not CIDR shorthand) We need some convincing use cases to include something in the standard library (the reason being is that once a feature is in the standard library, it's more or less frozen and we need to maintain it; we can't change the implementation since it could cause breaking changes so we need to be careful).

Honestly, I've never seen CIDR shorthand. Applications may decide to render a shorthand notation or not, but should the standard library decides to follow BCP and nost ST? I'm not really sure (for instance, we rejected RFC 9562 implementations (UUID v6-8) some years ago because it was not in the standard tracks). Now, we also seem to rely on Proposed Standards and not Standard Tracks, so it should be fine (maybe?)

This is especially true when drawing diagrams of prefixes sent across autonomous systems, where a long prefix makes the diagram very hard to read. Given how common it is I figured it made sense for this to be supported.

I agree with that.

It's not really documented either. In the IPv6Network class, it returns the shortest form of the prefix, in CIDR notation. But in the IPv4Network class, it returns the exact prefix, in CIDR notation - a different behavior than in IPv6

EDIT: Forget about my answer, I was wrong as well. Compressed IPv6 should also return compressed form here (or should be documented as not returning compressed forms...).

Even if that's the case, I still think it makes sense to do this as a separate method (cidr_shorthand) to avoid any backwards compatibility breaking changes

What I see is that:

  • IPv4's compressed attribute is not returning a compressed form and is always returning the with_prefixlen form. Should it be changed? I'm not sure. We can see it as a bug fix or a new feature, but both changes would be breaking. So we can also deprecate the compressed attribute in favor of a shorthand attribute? (though the canonical terminology would be compressed...)

  • IPv6's compressed attribute is the shorthand notation. Compressed IPv6 addressed are defined in https://datatracker.ietf.org/doc/html/rfc4291.html#section-2.2 (we refer to that RFC in our docs).

Let's ask the original code author @ncoghlan

@jfuruness
Copy link
Author

Cidr_shorthand as the name sounds good to me. As far as IPv6Network’s not having that attribute - I don’t see why this would be the case. To me, the IPv6Network already have a function that returns the CIDR shorthand. While it’s undocumented, this is how with_prefixlen functions.

In terms of usage statistics, I haven’t parsed all github repos, but there is an RFC for it (4632). I would say that the most convincing argument is that IPv6Network already does this by default, it’s just strange that IPv4Networks does not, for the same function name (the "with_prefixlen" function).

Maybe I’m not understanding something here but I don’t see why CIDR shorthand would be supported and the default in IPv6 networks but not in IPv4 networks. Definitely if the original author could provide some context that would be great.

I'm going to wait on further modifications to the PR since it sounds like this feature may be rejected.

@picnixz
Copy link
Member

picnixz commented Jan 15, 2025

Maybe I’m not understanding something here but I don’t see why CIDR shorthand would be supported and the default in IPv6 networks but not in IPv4 networks

I think it's mainly because the notion of a compressed IPv4 is not defined in the RFC actually. I can see https://datatracker.ietf.org/doc/html/rfc4632#section-3.1 which explains the prefix notation but I don't see the shorthand notation. For IPv6, it's explicitly defined (this may be the reason why we don't have this for IPv4 actually!)

@picnixz
Copy link
Member

picnixz commented Jan 15, 2025

I've edited my previous answer because I was wrong:

The reason being that without this, every single call to cidr_shorthand for a generic ip_network would need to be wrapped in some type of try except in case the ip_network happened to be an IPv6Network, which I don’t think makes much sense.

Both IPv4 and IPv6 have the notion of CIDR, but IPv4 does not define what a "compressed" form is.

It's not really documented either. In the IPv6Network class, it returns the shortest form of the prefix, in CIDR notation. But in the IPv4Network class, it returns the exact prefix, in CIDR notation - a different behavior than in IPv6

Different behaviour so we sahould document it. We should be able to return non-compressed forms as well I think and compressed forms when they are well-defined.


Now once we decide whether IPv4 shorthand notation can be well-defined or not, then we can move forward. However, I have no answer on this question. For IPv6, omitting parts between : is simple but for IPv4, omitting trailing zeroes does not seem to be standardized explicitly (or maybe I missed the paragraph?)

@jfuruness
Copy link
Author

You're right. I took the RFC at face value, but reading it, while it implies the possibility of an abbreviated notation, it doesn't mention it explicitly. That explains why IPv4 has different behavior.

Your comments have all been very insightful and I'm sorry you've spent your time on a PR that ultimately I think should not go through, I should have done better research before and I definitely understand this process a lot better. I think my plan from here is:

  1. Open a new ticket to add documentation around some of these functions, with links to the RFCs and explanations as to the differences
  2. I'll reach out to some guys at the IETF and see if there's any interest in an RFC for IPv4 shorthand.

Unless anyone is opposed I'm going to close this ticket and the PR

@picnixz picnixz closed this as not planned Won't fix, can't repro, duplicate, stale Jan 15, 2025
@picnixz
Copy link
Member

picnixz commented Jan 15, 2025

You're welcome! Feel free to open a new issue and a PR for the docs. It would be much appreciated!

I'll reach out to some guys at the IETF and see if there's any interest in an RFC for IPv4 shorthand.

If you have such contacts, or if they can provide confirmation whether there is a standardized way to defined IPv4 shorthands, I would be happy to continue the discussion.

@jfuruness
Copy link
Author

jfuruness commented Jan 16, 2025

Thanks a bunch! I'll open a new issue and PR for the docs in the next day or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants