Adding temporal data to clocs_SecCas #66
Hi @pangsu0613, congratulations on the good work. We also spoke over email before. I am doing my master's thesis on 3D object detection using sensor fusion. Do you have any thoughts/suggestions regarding this? Since the KITTI object tracking dataset (http://www.cvlibs.net/datasets/kitti/eval_tracking.php) provides frames from previous time steps as sequences, I was thinking of training the 2D detector (Cascade R-CNN) and the 3D detector (SECOND) with the KITTI object tracking dataset. Then I will put the generated 2D detections in the CLOCs source code '/d2_detection_data' path, add an LSTM layer to the CLOCs network, and train CLOCs using the same dataset (while keeping the SECOND model in eval mode). I plan to do this as part of my thesis. Sorry for my ignorance, but could you give your feedback on whether this can be done?
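The temporal part of this plan can be sketched in isolation: run the fused detection feature of each frame through an LSTM cell and use the final hidden state as the temporally aggregated feature. The sketch below is a minimal NumPy LSTM step, not CLOCs code; the feature dimension, hidden size, weights, and inputs are all made-up stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the fused detection feature for one frame."""
    z = W @ x + U @ h + b          # all four gate pre-activations, shape (4*H,)
    H = h.shape[0]
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy run over T frames of D-dimensional fusion features (hypothetical sizes)
rng = np.random.default_rng(0)
T, D, H = 4, 8, 16
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
# h now summarizes the sequence of per-frame fusion features
```

In a real experiment one would use `torch.nn.LSTM` inside the fusion head instead of hand-rolled NumPy; this only illustrates the data flow.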
Hello @Anirbanbhk88, I think using multiple frames to augment the detections is a promising direction. For KITTI, the object detection dataset is not organized as continuous data sequences; every frame is relatively independent. KITTI does provide the previous 3 frames for each frame in the object detection dataset, but these preceding frames are not labeled. Using the KITTI tracking dataset is a good option; the only issue is that, as I remember, the KITTI tracking dataset is much smaller than the detection dataset. You could also have a look at the nuScenes, Argoverse and Waymo datasets; they are organized as data sequences (around 15-20 seconds each), and they are all well labeled. Regarding your implementation idea, I think it is good. For simplicity, maybe you could start from the pretrained model instead of training on the tracking dataset.
Hi @pangsu0613, thanks for your feedback. As per my observations, the KITTI tracking dataset has 21 sequences in the training set and 28 sequences in the test set, and each sequence has more than 100 frames. So overall there are around 8,008 frames in the training set (including all the sequences). Once I get some results with a public dataset, I plan to try with a private company dataset. I have another question based on your suggestion:
2) Which pretrained models for Cascade R-CNN and SECOND should I take, since I need to fine-tune them further?
Hi @pangsu0613, sorry for asking again, but could you please answer my previous questions?
Hello @Anirbanbhk88, sorry for the late response.
@pangsu0613 Hi, I did not quite get what you meant by 'But noted that there is a potential issue, there are some overlaps between detection dataset and tracking dataset.' What are the consequences of this?
Regarding the first point: the authors of KITTI collected a large amount of data, which I just refer to as 'raw data', but they only labeled part of it for the detection dataset and part of it for the tracking dataset. Both the detection dataset and the tracking dataset are subsets of the 'raw data', so some frames are identical in the detection and tracking datasets. In other words, compared to the detection dataset, the tracking dataset is not a 'brand new' dataset. The potential issue is that, if there are too many overlaps, it could result in overfitting.
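One way to quantify such overlap, once each frame in both datasets is keyed by its raw-data drive and frame index, is a plain set intersection. The identifiers below are made up for illustration; in practice the keys would come from KITTI's mapping files, not from these literals.

```python
# Hypothetical (drive, frame_index) keys; real keys would come from the
# KITTI devkit's detection-to-raw-data mapping, not these literals.
detection_frames = {("2011_09_26_drive_0001", 5),
                    ("2011_09_26_drive_0001", 10),
                    ("2011_09_26_drive_0002", 3)}
tracking_frames = {("2011_09_26_drive_0001", 10),
                   ("2011_09_26_drive_0009", 0)}

# Frames present in both datasets (potential overfitting source)
overlap = detection_frames & tracking_frames
# Tracking frames that are safe to add on top of the detection split
clean_tracking = tracking_frames - detection_frames
```

Training on `clean_tracking` only (or at least measuring `len(overlap)`) would tell you how much of the tracking data is genuinely new.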
Hi @pangsu0613, thanks for your earlier explanations. Since I am using the KITTI object tracking dataset to feed CLOCs with sequential data, I have generated 2D detections (in KITTI format) from the 2D detector Cascade R-CNN. I have two questions:
Hi @pangsu0613, thanks for the quick reply. Regarding question 1: which areas of the CLOCs code do I need to change to support another class (like Van)? Also, as per my understanding, the point cloud files in the velodyne_reduced folder of the default CLOCs implementation are the point clouds of the KITTI object detection dataset. Now that I am training with the KITTI tracking dataset, I have to replace those with the point clouds of the tracking dataset, am I right?
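On the velodyne_reduced point: those files are typically the raw velodyne scans cropped to the camera field of view, so regenerating them for tracking data means re-running that cropping with the tracking calibration. A hedged sketch of the reduction step, using a toy identity extrinsic and a made-up pinhole projection matrix rather than real KITTI calibration:

```python
import numpy as np

def reduce_to_fov(points, Tr_velo_to_cam, P2, img_w, img_h):
    """Keep lidar points whose projection falls inside the camera image."""
    n = points.shape[0]
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])  # homogeneous lidar coords
    cam = xyz1 @ Tr_velo_to_cam.T                        # lidar -> camera frame, (n, 4)
    uvw = cam @ P2.T                                     # camera -> image plane, (n, 3)
    in_front = cam[:, 2] > 0.1                           # drop points behind the camera
    u = uvw[:, 0] / np.clip(uvw[:, 2], 1e-6, None)
    v = uvw[:, 1] / np.clip(uvw[:, 2], 1e-6, None)
    keep = in_front & (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    return points[keep]

# Toy calibration (NOT real KITTI values): identity extrinsics, simple pinhole
Tr = np.eye(4)
P2 = np.array([[700.0,   0.0, 600.0, 0.0],
               [  0.0, 700.0, 180.0, 0.0],
               [  0.0,   0.0,   1.0, 0.0]])
pts = np.array([[ 0.0, 0.0, 10.0, 0.5],   # ahead, projects near image center
                [ 0.0, 0.0, -5.0, 0.5],   # behind the camera, dropped
                [50.0, 0.0, 10.0, 0.5]])  # projects far outside the image, dropped
reduced = reduce_to_fov(pts, Tr, P2, 1242, 375)
```

With real data you would load each `.bin` scan, apply the sequence's `Tr_velo_to_cam` (composed with `R0_rect`) and `P2` from the tracking calibration files, and write the filtered points back out.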
Hi @pangsu0613, I have one question.
Hello @Anirbanbhk88,
Thanks @pangsu0613 for the info
Hi @pangsu0613, when training SECOND for cyclist and pedestrian, which config files did you use? I see from the SECOND code (https://github.com/traveller59/second.pytorch/tree/v1.5.1) that they have a config for car and all.fhd.config (the config file for multi-class classification). I tried to build a config file for cyclist by referring to them, and my evaluation results came out quite low.
Hello @Anirbanbhk88, we provide config files for pedestrian and cyclist (pedestrian.fhd.config and cyclist.fhd.config) under CLOCs/second/configs. I would recommend referring to them for training SECOND.
Hi @pangsu0613,
Hello @Anirbanbhk88
@pangsu0613 Thanks for the clarifications. However, some 2D detection files contain only Cars. Now, when I am trying to train CLOCs for the Pedestrian class (by modifying line 393 in voxelnet.py), no labels are read from such detection files with only Car detections. So there is no IoU match, and hence the fusion network cannot give any valid output (it gives a tensor filled with sentinel values, somewhat like [[-9999999, -999999]).
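One way to sidestep the sentinel tensor is to make the 2D-detection loader return an explicitly empty array when the target class is absent, and branch on that before building the fusion input. The sketch below is not the CLOCs loader; it only assumes the standard KITTI label column layout with the detection score in the last field.

```python
import numpy as np

def load_2d_dets(lines, target_class="Pedestrian"):
    """Parse KITTI-format 2D detection lines for one class.

    Returns an (N, 5) array of [x1, y1, x2, y2, score]; N may be 0 when the
    file contains no boxes of target_class (e.g. a Cars-only file).
    """
    boxes = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == target_class:
            # KITTI columns: type trunc occ alpha x1 y1 x2 y2 ... score(last)
            boxes.append([float(fields[4]), float(fields[5]),
                          float(fields[6]), float(fields[7]),
                          float(fields[-1])])
    if not boxes:
        # Explicit empty case: the caller can skip fusion for this frame
        # instead of feeding a -999999 sentinel tensor downstream.
        return np.zeros((0, 5), dtype=np.float32)
    return np.asarray(boxes, dtype=np.float32)

# Toy detection file contents (made-up boxes, KITTI-style columns)
sample = [
    "Car -1 -1 -10 100 120 220 260 -1 -1 -1 -1000 -1000 -1000 -10 0.91",
    "Pedestrian -1 -1 -10 300 140 340 250 -1 -1 -1 -1000 -1000 -1000 -10 0.77",
]
dets = load_2d_dets(sample)                # one pedestrian box
empty = load_2d_dets(sample, "Cyclist")    # no cyclists -> shape (0, 5)
```

The training loop can then do `if dets.shape[0] == 0: continue` (or fall back to the 3D-only score) rather than letting an all-sentinel IoU tensor reach the loss.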
Hi @pangsu0613, one small question: is the CLOCs code set up to train with just batch_size=1? Even if I increase batch_size to 8, it returns just one IoU sparse tensor from voxelnet.py.
It seems the code base only supports batch_size == 1, because in the voxelnet forward function only one image's detection results are considered.
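If batch support were wanted, the per-example IoU construction could simply be looped over the batch instead of assuming a single example, producing one IoU matrix per image. A minimal NumPy sketch (not the actual voxelnet.py code; the box coordinates are toy data):

```python
import numpy as np

def iou_2d(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in [x1, y1, x2, y2]."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])   # top-left of intersection
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])   # bottom-right of intersection
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def build_iou_tensors(batch_2d, batch_proj3d):
    """One IoU matrix per example: loop over the batch instead of batch==1."""
    return [iou_2d(d2, d3) for d2, d3 in zip(batch_2d, batch_proj3d)]

# Toy batch of 2: per-example 2D detections and projected 3D detections
b2d = [np.array([[0.0, 0.0, 10.0, 10.0]]),
       np.array([[0.0, 0.0, 4.0, 4.0]])]
b3d = [np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]]),
       np.array([[2.0, 2.0, 6.0, 6.0]])]
ious = build_iou_tensors(b2d, b3d)  # list of (N_i, M_i) matrices
```

Each example keeps a different number of detections, so a Python list (or a padded/sparse tensor) per batch is the natural container; the rest of the forward pass would then index the fusion input by example.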
Hi @pangsu0613, I was trying some experiments passing image and point cloud data from previous time frames to clocs_SecCas, and modifying CLOCs by adding some recurrent layers. I want to check whether this leads to improved detection accuracy. Will this be possible in the current architecture, and do you have any suggestions about which dataset I can use for this kind of experiment?