It seems that the `hdfs://host:port/path/file.txt` URL scheme is currently not supported by smart_open. As a result, smart_open can only access the local (default) HDFS filesystem via the `hdfs://` protocol. The URI is parsed correctly by `urlsplit`, but the network location is always ignored.
The `hdfs dfs` command supports `host:port` in the URI out of the box, so adding this would be easy. However, the URL `hdfs://tmp/test.txt` would then no longer work, because `tmp` would be interpreted as the network location, which it actually is if you read the URI literally according to the specification.
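To illustrate how the standard library's `urlsplit` treats these URIs, here is a small sketch. The helper name `split_hdfs_uri` and the port `8020` are placeholders chosen for this example, not part of smart_open.

```python
from urllib.parse import urlsplit
from typing import Tuple


def split_hdfs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: return (netloc, path) as parsed by urlsplit."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


for uri in [
    "hdfs://host:8020/path/file.txt",  # explicit namenode host and port (placeholder values)
    "hdfs:///path/file.txt",           # empty authority: no network location
    "hdfs://tmp/test.txt",             # 'tmp' is parsed as the network location, not a directory
]:
    print(uri, "->", split_hdfs_uri(uri))
```

Per the URI specification, the third case yields netloc `'tmp'` and path `'/test.txt'`, which is exactly the ambiguity described above.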
I would expect the following results:
- `ParseUri('hdfs://host:port/path/file.txt', 'rb')` ==> `["hdfs", "dfs", "-text", "hdfs://host:port/path/file.txt"]`
- `ParseUri('hdfs:///path/file.txt', 'rb')` ==> `["hdfs", "dfs", "-text", "/path/file.txt"]`
- `ParseUri('hdfs://host/path/file.txt', 'rb')` ==> `["hdfs", "dfs", "-text", "hdfs://host/path/file.txt"]`
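A minimal sketch of the proposed mapping, assuming a hypothetical helper `hdfs_read_command` (not an existing smart_open function) that builds the `hdfs dfs -text` argument list for reading:

```python
from urllib.parse import urlsplit
from typing import List


def hdfs_read_command(uri: str) -> List[str]:
    """Hypothetical helper: build the `hdfs dfs -text` argv for an hdfs:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri!r}")
    if parts.netloc:
        # An authority is present: keep it so `hdfs dfs` contacts that namenode.
        target = f"hdfs://{parts.netloc}{parts.path}"
    else:
        # No authority: pass only the path, so the client's default filesystem is used.
        target = parts.path
    return ["hdfs", "dfs", "-text", target]
```

With this approach `hdfs://tmp/test.txt` would be sent to a host named `tmp`, which is the trade-off mentioned above; users would need to write `hdfs:///tmp/test.txt` instead.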
Let me know what you think.