
Is it a bug or my fault? JavaTensorDataset with Java*TensorDataLoader cannot split TensorExample data into batches; get_batch just returns the whole dataset #1564

Open
mullerhai opened this issue Jan 2, 2025 · 1 comment

Comments

@mullerhai

val exampleTensorSeq = mnistTrain.map(x => new TensorExample(x._1.native))
val tensorExampleVector = new TensorExampleVector(exampleTensorSeq*)

val ds = new JavaTensorDataset() {
  override def get(index: Long): TensorExample =
    tensorExampleVector.get(index)

  // NOTE: returns the whole vector on every call, ignoring the requested `indices`
  override def get_batch(indices: SizeTArrayRef): TensorExampleVector =
    tensorExampleVector

  override def size = new SizeTOptional(tensorExampleVector.size)
}

val opts = new DataLoaderOptions(32)
opts.workers.put(5)
opts.enforce_ordering.put(true)
opts.drop_last.put(false)

val data_loader = new JavaSequentialTensorDataLoader(ds, new SS(ds.size.get), opts)
println(s"ds.size.get ${ds.size.get} data_loader option batchsize ${data_loader.options.batch_size()}")

for (epoch <- 1 to 20) {
  // var it: ExampleVectorIterator = data_loader.begin
  var it: TensorExampleVectorIterator = data_loader.begin
  var batchIndex = 0
  while (!it.equals(data_loader.end)) {
    Using.resource(new PointerScope()) { p =>
      println(s"try to get loop data")
      val batch = it.access
      // val es = new ExampleStack
      // val stack: Example = es.apply_batch(batch)
      val tes = new TensorExampleStack
      val stack: TensorExample = tes.apply_batch(batch)

      println(s"get batch epoch ${epoch} batchIndex ${batchIndex} batch batchsize ${batch.size} ds.size.get ${ds.size.get}")
Console log:

Using device: Device(CPU,-1)
ds.size.get 60000 data_loader option batchsize  32
try to get loop data 
get batch epoch 1 batchIndex 0 batchsize : 60000 ds.size.get 60000   
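
For reference, if the loader is supposed to deliver 32 examples per step, I guess get_batch has to slice out the requested indices rather than return the full vector. A rough, unverified sketch (it assumes SizeTArrayRef exposes size() and get(long) for element access):

  override def get_batch(indices: SizeTArrayRef): TensorExampleVector = {
    // pick only the requested examples; indices.size()/indices.get(i) are assumed accessors
    val picked = (0L until indices.size()).map(i => tensorExampleVector.get(indices.get(i))).toArray
    new TensorExampleVector(picked*)
  }

If the data loader really calls get_batch once per batch with 32 indices, batch.size should then be 32 instead of 60000.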
@mullerhai
Author

I have now tested most of the datasets and dataloaders in javacpp-pytorch with the MNIST dataset:

JavaDataset with Java**DataLoader has no bug and runs perfectly (a rough sketch of that combination is at the end of this comment).

JavaTensorDataset with Java*TensorDataLoader does not respect the batch size.

ChunkDataReader with ChunkDataset and ChunkDataLoader sometimes hangs, or does not respect the batch size.

ChunkTensorDataReader with ChunkTensorDataset and ChunkTensorDataLoader does not respect the batch size.

StreamDataset runs but ignores the batch size; StatefulDataset runs but does not respect the batch size either.

So either I am using these incorrectly, or it is a bug.
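
For contrast, a rough sketch of the Example-based combination that runs correctly for me (the names JavaDataset, Example, ExampleVector, ExampleStack, JavaSequentialDataLoader, SequentialSampler and the iterator's increment are written from memory, so treat them as assumptions; it also assumes mnistTrain yields (image, label) tensor pairs):

  val examples = mnistTrain.map(x => new Example(x._1.native, x._2.native))
  val exampleVector = new ExampleVector(examples*)

  val ds = new JavaDataset() {
    override def get(index: Long): Example = exampleVector.get(index)
    override def size = new SizeTOptional(exampleVector.size)
  }

  val opts = new DataLoaderOptions(32)
  val loader = new JavaSequentialDataLoader(ds, new SequentialSampler(ds.size.get), opts)
  var it = loader.begin
  while (!it.equals(loader.end)) {
    val stack: Example = new ExampleStack().apply_batch(it.access) // here each batch really has 32 examples
    it = it.increment // increment() assumed to advance the iterator
  }

With that combination the batch size option is respected, which is why I suspect the problem is specific to the tensor-only dataset/dataloader path.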
