Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

WWW 2024 in Singapore

Post!

The Web Conference 2024 was held at the Resorts World Convention Centre, 8 Sentosa Gateway, Singapore 098269. Resorts World Sentosa (RWS), Asia’s premium lifestyle destination resort, is located on Singapore’s resort island of Sentosa.

Portfolio

Publications

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Published in Proceedings of the 18th European Conference on Computer Vision (ECCV), 2024

This work introduces a new training method that enhances general-purpose vision-language understanding and image-oriented question answering through visual self-questioning.

Recommended citation: Sun, G., Qin, C., Wang, J., Chen, Z., Xu, R., & Tao, Z. (2024). SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant. ECCV.
Download Paper

Aligning Out-of-Distribution Web Images and Caption Semantics via Evidential Learning

Published in Proceedings of the ACM on Web Conference (WWW), 2024

This work efficiently improves the robustness and performance of pre-trained vision-language networks on in-distribution (ID) and out-of-distribution (OOD) cases in image-text retrieval tasks through evidential learning.

Recommended citation: Guohao Sun, Yue Bai, Xueying Yang, Yi Fang, Yun Fu, and Zhiqiang Tao. 2024. Aligning Out-of-Distribution Web Images and Caption Semantics via Evidential Learning. WWW.
Download Paper

Prototypical Transformer as Unified Motion Learners

Published in Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

This work refines feature representations via prototype-feature association.

Recommended citation: Han, C., Lu, Y., Sun, G., Liang, J., Cao, Z., Wang, Q., Guan, Q., Dianat, S.A., Rao, R.M., Geng, T., Tao, Z., & Liu, D. (2024). Prototypical Transformer as Unified Motion Learners. ICML.
Download Paper

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

This work introduces T-MASS, where text is modeled as a stochastic embedding, facilitating joint learning of the text mass and video points.

Recommended citation: Wang, J., Sun, G., Wang, P., Liu, D., Dianat, S.A., Rabbani, M., Rao, R.M., & Tao, Z. (2024). Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. CVPR.
Download Paper

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.