Conversation
Craigacp left a comment
I've gone through it with some initial comments. Do you have any of this refactoring implemented? I think it might be useful to look at some examples of the changes it implies in user code.
> accessing a tensor instance. Actually, this information is carried by the [`DataType` enum class](https://github.com/karllessard/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/DataType.java),
> which converts back and forth a type alias (such as `Integer`) to a TF data type (such as `INT32`).
>
> Again, this conversion can sometimes be painful and could be avoided by changing the `DataType` enum class to a
I'm confused as to how this is better than an enum. The enum can carry all this information already, and is just as accessible from JNI as an anonymous class is, if not more so. Do you have an example which shows the benefit of this approach over the current one? The placeholder factory could be modified to ask the enum what the ordinal is (in the same way it would need to be updated to call `DataType.ordinal`).
Ok, maybe I mixed two different concepts here.
The main idea was not to get rid of the enum but to get rid of the conversion between a Java type and an enum, using this method, like here.
Now, why use a static class instead of an enum? Well, since we already have a specific class per supported tensor type (in this RFC, I mean), it sounds more intuitive to me to store all information related to that type inside this class, so the user does not have to deduce the corresponding value in an enum (e.g. the type info of `TInt32` is `TInt32.DTYPE`, instead of the type of `TInt32` being `DataType.INT32`).
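As a rough sketch of that idea (the `DataType` fields and the native code value below are assumptions for illustration, not the prototype's actual code):

```java
// Sketch only: the type information lives on the tensor type itself, so user
// code refers to TInt32.DTYPE instead of mapping Integer -> DataType.INT32.
final class DataType<T> {
  private final String name;
  private final int nativeCode; // value handed to the TF C API (illustrative)

  DataType(String name, int nativeCode) {
    this.name = name;
    this.nativeCode = nativeCode;
  }

  int nativeCode() { return nativeCode; }

  @Override
  public String toString() { return name; }
}

interface TInt32 {
  DataType<TInt32> DTYPE = new DataType<>("INT32", 3); // hypothetical constant
}
```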
Ok. That approach precludes switching over the types or other things which need to know about all the Tensor types in the C API. Which might be fine; as you point out later, it's much more extensible, but I'm not sure it's worth exposing types in Java that aren't understood by the C API underneath.
> but I'm not sure if it's worth exposing types in Java that aren't understood by the C API underneath.
That is not the plan either; all tensor classes such as `TInt32` would match a type supported by the C API. Or maybe you mean the "custom" types I was proposing later? Yes, those need more rethinking for sure.
Yeah, I mean the custom types proposed at the end of the document.
However there are things like fp16 and bf16 which aren't easily representable in Java at the moment, but are probably useful.
```java
  return allocate(DTYPE, Shape.make()).set(value);
}

public static TInt32 vector(Integer... values)
```
The boxing here makes this very inefficient, and also it's not possible to pass in an int[]. We'd need a signature that is int... in addition to these methods.
Can you please explain a bit more about what makes boxed varargs slow, other than the required conversion itself? I'm lacking a bit of knowledge in this field and I think you know a lot more about it.
I'm just trying to avoid supporting primitive arrays in the NdArray API, like I used to, because it greatly complicates the hierarchy of objects required if we want to keep control over which tensor supports which primitive array (i.e. you can pass an `int[]` to a `TInt32` but not to a `TBool`).
The varargs construction allocates an array and writes all the values into it. So for an `f(Integer... inputs)` called as `obj.f(1000,2000,3000,4000)`, it first creates an `Integer` object which contains the field storing 1000, then an `Integer` representing 2000, etc. Then it stores a reference to each of those `Integer`s into the created `Integer[4]`. This means that the call allocated 5 objects on the heap. For an entry point `g(int... inputs)` called as `obj.g(1000,2000,3000,4000)`, it allocates a single `int[4]` and writes all the values into that array. So that's only a single object allocation. This isn't too important for short arrays like in your example, but it's really important for longer arrays.
The integers used are important too, as the JDK has a cache of the integers in the range [-128, 127], so those integers shouldn't trigger allocation. But values outside that range (and `Float`s, `Double`s, etc.) will allocate a fresh object.
The code to convert an array of primitives into an array of boxed values is a single line, `IntStream.of(intArr).boxed().toArray(Integer[]::new)`, and there isn't a FloatStream or BooleanStream to do the equivalent transformation for those types quickly.
I think it's important to be able to pass in a primitive array, as that's what people will be using for efficiency reasons, and if we require the intermediate boxing then it's going to generate a lot of garbage and object allocations.
Either that, or we get the NDArray `set(T value, int... indices)` method to be extremely fast and remove all ways of creating things with arrays or varargs, so the natural way of using it is generating a blank one and then filling it with the feature extractor. If it's backed by a ByteBuffer then it's probably a quick method, but if it's writing directly into the TF Tensor then we pay the JNI cost every single time.
The split between primitives and objects is painful in current Java, as you need to use the primitives to get any kind of speed, but the lack of generics over primitives leads to ugly APIs with edges. Once specialised generics lands we should be able to get around this, but that's still a while away.
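To make that allocation difference concrete, here is a small self-contained illustration; the method names are invented for the example:

```java
import java.util.stream.IntStream;

class VarargsAllocationExample {

  // Boxed varargs: f(1000, 2000, 3000, 4000) allocates four Integer objects
  // plus the Integer[4] holding them (five heap objects in total).
  static void f(Integer... inputs) { /* ... */ }

  // Primitive varargs: g(1000, 2000, 3000, 4000) allocates a single int[4].
  static void g(int... inputs) { /* ... */ }

  public static void main(String[] args) {
    int[] intArr = {1000, 2000, 3000, 4000};

    // The one-line boxing conversion mentioned above; there is no FloatStream
    // or BooleanStream to do the same for float[] or boolean[].
    Integer[] boxed = IntStream.of(intArr).boxed().toArray(Integer[]::new);

    f(boxed);   // every element had to be boxed first
    g(intArr);  // the existing primitive array is passed directly
  }
}
```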
Thanks a lot, that was very helpful. And yes, your last comment says it all:

> The split between primitives and objects is painful in current Java, as you need to use the primitives to get any kind of speed, but the lack of generics over primitives leads to ugly APIs with edges.

I guess we can start with a clean, simple API relying principally on boxed types, while being careful to avoid allocation overhead like you stated, and then add the primitive arrays if performance targets are not met. But as you also said:

> This isn't too important for short arrays like in your example, but it's really important for longer arrays.

and short arrays are probably the most common case when an NdArray is built using one of those methods (`vector()`, `set()`, ...). For bulk initialization of a large set of data, `NdArray.write(DataBuffer)` will probably be a better pick.

> If it's backed by a ByteBuffer then it's probably a quick method, but if it's writing directly into the TF Tensor then we pay the JNI cost every single time.

There should not be any difference in performance between writing to an `NdArray` or a `Tensor`, as both are backed by a `ByteBuffer` (currently, JNI is only involved in returning a reference to the `ByteBuffer` of the tensor, and then everything can be done in Java space).
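As an illustration of that point, here is a sketch of what "everything done in Java space" could look like once the tensor's direct buffer has been returned; `nativeTensorBuffer` below is just a placeholder parameter, not an existing API:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Sketch only: once JNI has handed back the tensor's direct ByteBuffer,
// bulk writes happen entirely on the Java side with no further JNI calls.
class DirectWriteSketch {

  static void fill(ByteBuffer nativeTensorBuffer, int[] values) {
    IntBuffer ints = nativeTensorBuffer.order(ByteOrder.nativeOrder()).asIntBuffer();
    ints.put(values); // single bulk copy into the tensor's memory
  }
}
```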
When does the ByteBuffer get handed to the TF C API?
Currently? When the tensor is allocated by the native layer, its buffer is mapped to a `ByteBuffer` (see here) and returned to the Java runtime.
So it is just by design that a user cannot write directly to the tensor memory right now, something we'll change with the new Tensor NIO interface.
BTW, @EronWright, who wrote that code a few years ago, has joined the SIG recently to offer his help on this.
Ah, cool, I didn't realize that. Does that interfere with any of the automatic memory management it's doing under the hood (e.g. CUDA unified memory, or whatever magic happens with TPUs)?
My blind guess is that a tensor in GPU memory is always copied to local memory before being mapped to a `ByteBuffer`, but Eron probably knows better.
If that's the case, then yes, there would be a performance impact caused by the copy, so we should make sure to map that memory only when required (which is the case right now... I think).
I wouldn't be surprised if a direct ByteBuffer could manipulate a GPU's mapped memory. Not an expert on that.
About the whole concept of an NDArray that is backed by a direct byte buffer, we might find inspiration in the V8TypedArray in the Java bindings for V8 (article).
Regarding the copying of buffer data that TF Java performs today, I believe the rationale is that TF assumes that it owns the memory. There's some reference counting there but I haven't looked at it recently. Meanwhile, I am very keen on introducing Netty-style reference counting to TF Java, and intend to write a proposal to that effect.
> It has been reported that in some cases, supporting compile-time type safety can become a nightmare when
> testing a network using different data types. Actually, this can require a lot of search & replace on each
> experiment (e.g. `FLOAT` vs `DOUBLE`).
If we expanded the set of input and output methods (i.e. so Tensor had methods that accepted and returned all Java types and did the expected conversions) we could use a generic form of this, store the required enum or DataType implementation in a field and then make all tensor construction either accept that field or infer the type from other tensors. That would make it a single line change to change data types.
Yeah, I'm wondering if the Tensor should accept all Java types and handle the conversion... right now, it does expose all input/output methods but fails at runtime if the type is not compatible with the tensor data type, like here.
There might be other issues as well; think of `UInt32`... how do you pass such a value in a 32-bit `Integer`? It sounds like we might need to do some conversion in the end...
The problem with accepting all types is that when we have value types and can more easily express uint32, it'll be irritating to deal with. But the type system will be quite different then, and we're unlikely to upgrade to that Java version for a while after release (plus it's not ready yet).
Thinking more about this, maybe there should be a static factory on Tensor that accepts a DataType and an NDArray or other mass type and performs the appropriate conversions. That way the only thing that needs to change is the data type which can be a variable stored in a single place. That's pretty close to what exists in the current version, but allowing it to make type conversions rather than just checking for type equality.
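Here is a self-contained sketch (Java 8) of that "factory that performs conversions" idea; every name below is a placeholder rather than the prototype's real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleFunction;

// Sketch only: a "data type" value knows how to convert a number into its
// storage representation, so the dtype becomes a plain variable in user code.
final class ConvertingFactorySketch {

  static final class DataType<T> {
    final String name;
    final DoubleFunction<T> convert;

    DataType(String name, DoubleFunction<T> convert) {
      this.name = name;
      this.convert = convert;
    }
  }

  static final DataType<Float> FLOAT = new DataType<>("FLOAT", d -> (float) d);
  static final DataType<Double> DOUBLE = new DataType<>("DOUBLE", d -> d);

  // Hypothetical factory: converts the provided values to the requested dtype
  // instead of requiring an exact type match.
  static <T> List<T> tensorOf(DataType<T> dtype, double[] data) {
    List<T> out = new ArrayList<>();
    for (double d : data) {
      out.add(dtype.convert.apply(d));
    }
    return out;
  }
}
```

Switching an experiment from `FLOAT` to `DOUBLE` would then only mean changing the dtype constant passed to the factory.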
I still don't have a solution for supporting those kinds of DataTypes... again, letting the user pick the representation of their choice for initializing the data, instead of forcing them to a specific one, might help a bit (e.g. a user could decide to use ints, floats, doubles or even BigInteger/BigDecimal to feed a TFloat tensor, instead of being forced to use floats only). We'll then need a converter to map the provided values to the physical buffer...
That's something that wasn't foreseen, though, and we should work on it if we all think that's the right approach.
> Note also that an advantage of using `NdArray` is that the type and the shape of the tensor is explicit and can be easily
> retrieved by calling `t.dataType()` and `t.shape()` respectively, while with standard Java arrays those need
> to be inferred and discovered by the TF Java client using costly reflections and navigating through the array.
It's not a reflective operation that takes time, it's the physical traversal of the Java multidimensional array: the sub-arrays could be stored anywhere on the heap, and a 4-d array could be made up of hundreds of thousands of separate arrays.
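A quick illustration of that traversal cost; the dimensions are arbitrary:

```java
// Reading a 32x32x32x32 int[][][][] back means chasing references through
// 1 + 32 + 32*32 + 32*32*32 = 33,825 separate array objects on the heap.
class NestedArrayTraversal {

  public static void main(String[] args) {
    int[][][][] a = new int[32][32][32][32];
    long sum = 0;
    for (int[][][] b : a)
      for (int[][] c : b)
        for (int[] d : c)
          for (int v : d)
            sum += v;
    System.out.println(sum); // 0, but every nested array had to be visited
  }
}
```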
Ok, I'll change the text to avoid the confusion.
> Of course, the standard Java arrays give a better view of where a given row is located, by embedding it
> through a series of brackets. The actual proposal focuses on simplicity and performance more than readability.
> *Please, feel free to suggest any better solution.*
I'm not too fond of this implicit row-wise construction, as I got completely lost trying to figure out where I was in the Tensor at any given call; plus it means the TensorBuilder is carrying around implicit state saying where it wants to write next.
At the expense of readability, I think it would be preferable to use something like
```java
TInt32.ofShape(3,2,2)
    .set(0,0,new NDArray<>(1,2))
    .set(0,1,new NDArray<>(3,4))
    .set(1,0,new NDArray<>(5,6))
    .set(1,1,new NDArray<>(7,8))
    .set(2,0,new NDArray<>(9,10))
    .set(2,1,new NDArray<>(11,12));
```
Replacing `new NDArray<>()` with whatever the appropriate constructor is. To make the varargs work we might have to permute the order of arguments (i.e. `new NDArray<>(), int...`). It could also be a `TensorBuilder` and require a `done()` method to construct; I've no strong feelings either way.
Hehe, yes, me neither to be honest; I just dropped the basic idea here to let others suggest something better :)
The example you gave is pretty much already supported by the NdArray API, except that the coordinates are provided after the value (because it's a vararg that can conflict with an int scalar value). For example, you can set a scalar value like this:
```java
TInt32 t = ...;
t.set(100, 0, 0).set(200, 1, 0);
```

We could add something similar for vectors, which will accept a vector, something like

```java
TInt32.ofShape(3, 2, 2)
    .set(vector(1, 2), 0, 0)
    .set(vector(3, 4), 0, 1)
    .set(vector(5, 6), 1, 0)
    ...
```

where `vector` instantiates a rank-1 array... or simply takes a primitive or auto-boxed array. What do you think? I understand that specifying the coordinates for each vector is more explicit but also more verbose. I wouldn't mind, though, personally.
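For what it's worth, here is a toy sketch of such a `vector` helper; the `IntVector` type below is just a stand-in for whatever rank-1 NdArray type the API would actually expose:

```java
import java.util.Arrays;

// Sketch only: a helper that builds a rank-1 value from either varargs or an
// existing primitive int[].
final class VectorHelperSketch {

  static final class IntVector {
    final int[] values;
    IntVector(int[] values) { this.values = values; }
    @Override public String toString() { return Arrays.toString(values); }
  }

  static IntVector vector(int... values) {
    return new IntVector(values);
  }

  public static void main(String[] args) {
    System.out.println(vector(1, 2));              // from varargs
    System.out.println(vector(new int[] {3, 4}));  // from an existing int[]
  }
}
```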
Yeah, I think that's better. I'm in favour of explicitness when constructing values.
I partially do in the

It strikes me that
In the current proposal, a

Thinking of it, with that new proposal, if we opt for concrete classes per type (
@Craigacp: here is a little sample of what the migration would look like from the actual Tensor API to the new one. Note that this code does not compile yet and also assumes that we will find a way to avoid those. It's not the best example though, as there is not much happening here, but well, at least there is one.
WRT Scala type classes, there was some discussion of typing tensors in Java, Scala and Kotlin at the JVMLS after Erik Meijer's talk, and I think the consensus was that the Java type system isn't powerful enough. That said, my type theory isn't too strong, and John Rose was talking about something like that on Twitter during it (https://twitter.com/JohnRose00/status/1156308406585532416?s=19). We're looking at whether the Panama memory access API plus var handles will be enough to express tensor slicing, but that's not too useful for the SIG as it's targeting Java 8.
@jxtps, do you have some opinions/suggestions on this topic? I think it's pretty close to what you were discussing before on the SIG mailing list.
Hey, sorry, I have been neck-deep in some other projects; I will try to take a look later this week.
I think that @Craigacp has a good point that the Java type system isn't powerful enough (and in some cases may even be counter-productive). To @karllessard's point about NDArray vs Tensor, I prefer to keep the tensor (view) vs buffer (storage) separation. 2 cents. May I suggest you take a look at tensorflow-scala, which has a fairly well-developed set of types, at least for comparison's sake.
@EronWright, as parts of OpenJDK Valhalla start to land in Java, the barrier between primitive types and references wrt things like generics will go away, so it would be nice if we could avoid surfacing too many of these issues until the JVM is ready to help us. I think we'll need to for performance reasons though, which is disappointing. One of the issues I've had when coding in Scala is that it's difficult to know when it's going to box your primitives, and if it does that then your performance wanders off a cliff. As the unification between primitives and references only exists in the Scala language and not in the class files, I had to resort to running
Nice work on the RFCs @karllessard et al! I have a couple of somewhat open-ended questions / comments (sorry I'm a little all over the place):
The Tensor type-safety features you're looking to create could maybe work in Scala, where there are typedefs, primitive specialization of generic classes, and multiple inheritance via traits. In Java, which is a great language in so many ways, you will be banging your head against the type system at every turn. Framework developer time and effort is a very limited resource. Tensor dtype safety is not that big of a deal in practice. Let's not create an enormous type edifice that we will be struggling to design and maintain when doing:

```java
public class Tensor {
  final public DataType dtype;
  ...
  public IntNdArray asInts() {...} // Throws exception if dtype isn't compatible
  ...
}
```

would get the job done in a fraction of the time/effort and be easy to use. The only substantive loss I see in shedding the

A potential workaround for that is to default to auto-casting and issuing a warning. The user can then choose to either turn off auto-cast (making it an error that throws an exception), or accept auto-cast (silencing the warning).
Thanks @jxtps, there is a lot to reply to right now, but let me start with your latest post:

> There is no need to overload the operation per tensor type. The type of the operation (and in the case of

That's because in this new RFC, I dropped the primitive-based

In this new prototype, tensor types like

```java
Tensor<TInt32> t = TInt32.tensorOfShape(2, 2); // TInt32 extends from IntNdArray
t.data().at(0).write(new int[] { 100, 101 })   // possible because IntNdArray accepts writing an int[]
        .at(1).write(new int[] { 102, 103 });
```

In this new prototype, then, it is up to

If the trend is to remove the type-safety feature completely, we sure can, but we need to ask more broadly first because I think some users enjoy this aspect of TF Java. Nonetheless, the snippet you provided as an alternative is pretty close to what my new prototype is exposing, except that by keeping type-safety present, only one method is required (

To be continued...
I tried to simplify it too by moving the factory methods to the datatype class instead of

Also, we could have some shortcuts for rank-0 and rank-2 tensors, such as
That is a very valid point; there is also the
Suggestion that you close this PR if it is obsolete.
Thanks @EronWright for raising this; while a lot of it is still accurate with what has been merged lately, some of it must be updated. Instead of just closing it, it could also be interesting to make the changes and merge it into the TF community repo so it gains some visibility with all TF developers. I can take a look at it.