Compatibility with Intel GPUs #3

@haesleinhuepf

Hi @maarzt,

I'm just testing on Windows 10 / Intel UHD 620 and found some issues in this benchmark:
https://github.com/maarzt/imglib2-trainable-segmentation/blob/clij-benchmark/src/test/java/net/imglib2/trainable_segmention/performance/GpuCpuComparisonBenchmark.java

It crashes on my GPU with this error:

2:570:35: error: access qualifier can only be used for pipe and image type
__kernel void separable_operation(OUTPUT_IMAGE_PARAMETER, INPUT_IMAGE_PARAMETER, __constant float* kernelValues)
                                  ^
2:518:33: note: expanded from macro 'OUTPUT_IMAGE_PARAMETER'
#define OUTPUT_IMAGE_PARAMETER  __write_only __global float* output
                                ^
2:570:59: error: access qualifier can only be used for pipe and image type
__kernel void separable_operation(OUTPUT_IMAGE_PARAMETER, INPUT_IMAGE_PARAMETER, __constant float* kernelValues)
                                                          ^
2:555:32: note: expanded from macro 'INPUT_IMAGE_PARAMETER'
#define INPUT_IMAGE_PARAMETER   __read_only __global float* input
                                ^
Error when trying to create kernel separable_operation
net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -45 -> CL_INVALID_PROGRAM_EXECUTABLE
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLErrorCode(BackendUtils.java:352)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$getKernelPeerPointer$19(ClearCLBackendJOCL.java:601)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:156)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.getKernelPeerPointer(ClearCLBackendJOCL.java:593)
	at net.haesleinhuepf.clij.clearcl.ClearCLCompiledProgram.createKernel(ClearCLCompiledProgram.java:137)
	at net.haesleinhuepf.clij.clearcl.ClearCLProgram.createKernel(ClearCLProgram.java:685)
	at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.getKernel(CLKernelExecutor.java:353)
	at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.enqueue(CLKernelExecutor.java:229)
	at net.haesleinhuepf.clij2.CLIJ2.lambda$executeSubsequently$0(CLIJ2.java:236)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij2.CLIJ2.executeSubsequently(CLIJ2.java:227)
	at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.lambda$execute$1(DefaultGpuApi.java:57)
	at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.handleOutOfMemoryException(DefaultGpuApi.java:65)
	at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.execute(DefaultGpuApi.java:56)
	at net.imglib2.trainable_segmention.gpu.api.GpuScope.execute(GpuScope.java:42)
	at net.imglib2.trainable_segmention.gpu.algorithms.GpuSeparableOperation.run(GpuSeparableOperation.java:76)
	at net.imglib2.trainable_segmention.gpu.algorithms.GpuKernelConvolution.convolve(GpuKernelConvolution.java:61)
	at net.imglib2.trainable_segmention.gpu.algorithms.GpuKernelConvolution.apply(GpuKernelConvolution.java:47)
	at net.imglib2.trainable_segmention.gpu.algorithms.GpuConcatenatedNeighborhoodOperation.apply(GpuConcatenatedNeighborhoodOperation.java:49)
	at net.imglib2.trainable_segmention.gpu.compute_cache.GpuGaussContent.load(GpuGaussContent.java:54)
	at net.imglib2.trainable_segmention.gpu.compute_cache.GpuComputeCache$CacheEntry.get(GpuComputeCache.java:90)
	at net.imglib2.trainable_segmention.gpu.compute_cache.GpuComputeCache.get(GpuComputeCache.java:51)
	at net.imglib2.trainable_segmention.gpu.GpuFeatureInput.gauss(GpuFeatureInput.java:41)
	at net.imglib2.trainable_segmention.pixel_feature.filter.gauss.SingleGaussianBlurFeature.apply(SingleGaussianBlurFeature.java:51)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.lambda$apply$2(FeatureJoiner.java:62)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.genericApply(FeatureJoiner.java:71)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.apply(FeatureJoiner.java:62)
	at net.imglib2.trainable_segmention.pixel_feature.filter.AbstractGroupFeatureOp.apply(AbstractGroupFeatureOp.java:57)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.lambda$apply$2(FeatureJoiner.java:62)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.genericApply(FeatureJoiner.java:71)
	at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.apply(FeatureJoiner.java:62)
	at net.imglib2.trainable_segmention.pixel_feature.calculator.FeatureCalculator.applyUseGpu(FeatureCalculator.java:152)
	at net.imglib2.trainable_segmention.classification.Segmenter.segmentGpu(Segmenter.java:129)
	at net.imglib2.trainable_segmention.classification.Segmenter.segment(Segmenter.java:109)
	at net.imglib2.trainable_segmention.performance.ParallelSegmentationTask.lambda$run$0(ParallelSegmentationTask.java:44)
	at net.imglib2.cache.img.LoadedCellCacheLoader.get(LoadedCellCacheLoader.java:91)
	at net.imglib2.cache.img.LoadedCellCacheLoader.get(LoadedCellCacheLoader.java:51)
	at net.imglib2.cache.ref.SoftRefLoaderCache.get(SoftRefLoaderCache.java:101)
	at net.imglib2.cache.util.LoaderCacheAsCacheAdapter.get(LoaderCacheAsCacheAdapter.java:30)
	at net.imglib2.cache.util.CacheAsUncheckedCacheAdapter.get(CacheAsUncheckedCacheAdapter.java:28)
	at net.imglib2.img.cell.LazyCellImg$LazyCells.get(LazyCellImg.java:104)
	at net.imglib2.img.list.AbstractLongListImg$LongListRandomAccess.get(AbstractLongListImg.java:274)
	at net.imglib2.img.cell.CellRandomAccess.getCell(CellRandomAccess.java:136)
	at net.imglib2.img.cell.CellRandomAccess.updatePosition(CellRandomAccess.java:474)
	at net.imglib2.img.cell.CellRandomAccess.<init>(CellRandomAccess.java:130)
	at net.imglib2.img.cell.AbstractCellImg.randomAccess(AbstractCellImg.java:104)
	at net.imglib2.trainable_segmention.Utils.lambda$populateCellImg$7(Utils.java:355)
	at preview.net.imglib2.parallel.DefaultTaskExecutor.lambda$forEach$0(DefaultTaskExecutor.java:101)
	at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

After I removed the __write_only and __read_only qualifiers from those macro definitions, the benchmark appears to hang at this point for quite some time (minutes):

net.imglib2.trainable_segmention.performance.GpuCpuComparisonBenchmark
[WARNING] 1 exceptions occurred during plugin discovery.
log4j:WARN No appenders could be found for logger (org.bushe.swing.event.EventService).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
GPU: Intel(R) UHD Graphics 620
Time copying: 107.304 ms

Edit: It's not "hanging" - it just takes a minute or so before it continues...
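
For reference, here is a minimal sketch of that workaround in code. The compiler message says access qualifiers are only valid on image and pipe types, while both macros expand to plain __global float* buffers. The class AccessQualifierStripper and its strip method are hypothetical names for illustration only; they are not part of CLIJ or of this repository, and the actual change was made by hand in the macro definitions.

```java
// Hypothetical helper, not part of CLIJ or imglib2-trainable-segmentation.
// It drops __read_only / __write_only only where they directly precede
// __global, i.e. where they qualify a buffer pointer instead of an image.
final class AccessQualifierStripper {

    private AccessQualifierStripper() {
        // static utility, no instances
    }

    /** Returns the OpenCL source with access qualifiers removed from __global parameters. */
    static String strip(String openClSource) {
        // The lookahead keeps "__global" in place and leaves image parameters
        // such as "__read_only image2d_t src" untouched.
        return openClSource.replaceAll("__(?:read|write)_only\\s+(?=__global\\b)", "");
    }

    public static void main(String[] args) {
        String output = "#define OUTPUT_IMAGE_PARAMETER  __write_only __global float* output";
        String input = "#define INPUT_IMAGE_PARAMETER   __read_only __global float* input";
        System.out.println(strip(output));
        System.out.println(strip(input));
    }
}
```

Whether the qualifiers are better removed in the macro definitions themselves or filtered at load time like this is a design choice; editing the macros is the simpler fix when the kernel source is under your control.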

Note, I had to make the OPENCL_DEVICE_NAME configurable in order to test this. It might make sense to make this available to end users as well:
https://github.com/haesleinhuepf/imglib2-trainable-segmentation/commit/d32ab1af07d0bd48ab9bcac28736f55231b3f91a
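
One way such a setting could be exposed to end users is a JVM system property with a fallback to the built-in default. The sketch below only illustrates that idea and does not necessarily match the linked commit: the class DeviceNameConfig and the property name "opencl.device.name" are assumptions made up for this example.

```java
// Hypothetical sketch: resolve the OpenCL device name from a JVM system
// property, falling back to a default when the property is unset or blank.
final class DeviceNameConfig {

    /** Assumed property name, chosen for this example only. */
    static final String PROPERTY = "opencl.device.name";

    private DeviceNameConfig() {
        // static utility, no instances
    }

    /** Returns the configured device name, or defaultName if none is configured. */
    static String resolve(String defaultName) {
        String configured = System.getProperty(PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            return defaultName;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // The default passed here is a placeholder, not the project's real default.
        System.out.println("Using device: " + resolve("default device"));
    }
}
```

With this in place a user could start the JVM with -Dopencl.device.name="Intel(R) UHD Graphics 620" to select the device reported in the log above.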

Let me know if I can help with further information.

Cheers,
Robert
