Lingyu Zhu

Hey, I am a doctoral student in the Computer Vision Group at Tampere University, where I am working with Prof. Esa Rahtu on multi-modal machine learning and audio-visual learning.

I completed my master's thesis on data engineering and signal processing at Tampere University of Technology with Prof. Heikki Huttunen in 2017. Prior to my doctoral studies, I worked as a student researcher at Nokia Technologies until 2019, where I was advised by Dr. Tinghuai Wang and Prof. Joni Kamarainen.

Contact info:
Address: TC 307, Korkeakoulunkatu 7, FI-33720, Tampere, Finland
Email: firstname.lastname(at)tuni(dot)fi

Research

My research interests are in computer vision, with a focus on multi-modal machine learning, audio-visual learning, self-supervision, and semantic segmentation. My recent publications are listed below.

V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu, Esa Rahtu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
project | paper

In this paper, we propose a new lightweight yet efficient three-stream framework, V-SlowFast, that operates on the Visual frame, the Slow spectrogram, and the Fast spectrogram for visual sound separation. The Slow spectrogram captures coarse temporal resolution, while the Fast spectrogram captures fine-grained temporal resolution.

Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu, Esa Rahtu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
project | paper

In this paper, we introduce a two-stage visual sound source separation architecture, called Appearance and Motion network (AMnet), in which the stages specialise in appearance and motion cues, respectively. We propose an Audio-Motion Embedding (AME) framework to learn the motions of sounds in a self-supervised manner. Furthermore, we design a new Audio-Motion Transformer (AMT) module to facilitate the fusion of audio and motion cues.

Leveraging Category Information for Single-Frame Visual Sound Source Separation
Lingyu Zhu, Esa Rahtu
European Workshop on Visual Information Processing (EUVIP), 2021 (BEST PAPER AWARD)
project | paper

We study simple yet efficient models for visual sound separation using only a single video frame. Furthermore, our models are able to exploit information about the sound source category in the separation process. To this end, we propose two models, where we assume that either i) the category labels are available at training time, or ii) we know whether the training sample pairs are from the same category or from different categories.

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu, Esa Rahtu
Asian Conference on Computer Vision (ACCV), 2020 (Oral)
project | paper

We present a system for efficiently refining sound separation based on the appearance and motion information of all sound sources, and for localizing sound sources at the pixel level.

Cross-Granularity Attention Network for Semantic Segmentation
Lingyu Zhu*, Tinghuai Wang*, Emre Aksu, Joni-Kristian Kamarainen
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019
project | paper

By integrating the cross-granularity contour enhancement into the cross-granularity categorical attention, we achieve better semantic coherence and boundary delineation.

Portrait Instance Segmentation for Mobile Devices
Lingyu Zhu*, Tinghuai Wang*, Emre Aksu, Joni-Kristian Kamarainen
2019 IEEE International Conference on Multimedia and Expo (ICME), 2019
project | paper

We propose a non-parametric affinity model that enables efficient instance segmentation on mobile devices.

Predicting Gene Expression Levels from Histone Modification Signals with Convolutional Recurrent Neural Networks
Lingyu Zhu, Juha Kesseli, Matti Nykter, Heikki Huttunen
EMBEC & NBC, 2017
project | paper

This paper studies how well a Convolutional Recurrent Neural Network performs at predicting gene expression levels from histone modification signals.

