Thanks for your awesome work.
Your paper uses the LASO method as a baseline for comparing model performance on your image-object dataset. However, the LASO model takes a language description and a point cloud as input. In your experimental setting, what is the language input to LASO?