Professor Raouf Hamzaoui

Job: Professor in Media Technology

Faculty: Technology

School/department: School of Engineering and Sustainable Development

Research group(s): Centre for Electronic and Communications Engineering (CECE)

Address: De Montfort University, The Gateway, Leicester, LE1 9BH

T: +44 (0)116 207 8096

E: rhamzaoui@dmu.ac.uk

W: www.tech.dmu.ac.uk/~hamzaoui/


Personal profile

Raouf Hamzaoui received the MSc degree in mathematics from the University of Montreal, Canada, in 1993, the Dr. rer. nat. degree from the University of Freiburg, Germany, in 1997, and the Habilitation degree in computer science from the University of Konstanz, Germany, in 2004. He was an Assistant Professor with the Department of Computer Science of the University of Leipzig, Germany, and with the Department of Computer and Information Science of the University of Konstanz. In September 2006, he joined DMU, where he is a Professor in Media Technology and Head of the Signal Processing and Communications Systems Group in the Institute of Engineering Sciences. Raouf Hamzaoui is an IEEE Senior Member and a member of the Editorial Board of the IEEE Transactions on Multimedia. He has published more than 80 research papers in books, journals, and conference proceedings. His research has been funded by the EU, the DFG, the Royal Society, and industry, and has received best paper awards (ICME 2002, PV’07, CONTENT 2010, MESM’2012).

Research group affiliations

Institute of Engineering Sciences (IES)

Context, Intelligence and Interaction Research Group (CIIRG)

Publications and outputs 

  • Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
    Vafeiadis, Anastasios; Fanioudakis, Eleftherios; Potamitis, Ilyas; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD in which we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions. (A minimal code sketch of this architecture appears after this list.)
  • Comparing CNN and Human Crafted Features for Human Activity Recognition
    Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of these methods is their ability to generate features automatically. This ability greatly simplifies the task of feature extraction, which usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite the advantage of this approach, very little work has been undertaken on analyzing the quality of extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for the recognition of simple activities, applying this approach to signals from both inertial and audio sensors. The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold-standard Human Crafted Features (HCF); (ii) a comprehensive analysis of how architecture and model parameters affect the separation of target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved a 93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with 3 convolutional layers and a kernel size of 32, and a 90.5% F-Score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor and vehicle), using 2D CNNs with 2 convolutional layers and a 2x2 kernel size.
  • Image-based Text Classification using 2D Convolutional Neural Networks
    Merdivan, Erinç; Vafeiadis, Anastasios; Kalatzis, Dimitrios; Hanke, Sten; Kropf, Johannes; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf; Geist, Matthieu. We propose a new approach to text classification in which we consider the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to extract semantically meaningful features from images of text without using optical character recognition and the sequential processing pipelines that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results for a Chinese text classification task and achieved promising results for seven English text classification tasks. Furthermore, our approach outperformed memory networks without match types when using out-of-vocabulary entities from Task 4 of the bAbI dialog dataset. (A sketch of the text-to-image step appears after this list.)
  • SUR-Net: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Learning
    Fan, Chunling; Lin, Hanhe; Hosu, Vlad; Zhang, Yun; Jiang, Qingshan; Hamzaoui, Raouf; Saupe, Dietmar. The Satisfied User Ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the probability distribution of the Just Noticeable Difference (JND) level, the smallest distortion level that can be perceived by a subject. We propose the first deep learning approach to predict such SUR curves. Instead of directly regressing the SUR curve itself for a given reference image, our model is trained on pairs of images, original and compressed. Relying on a Siamese Convolutional Neural Network (CNN), feature pooling, a fully connected regression head, and transfer learning, we achieved good prediction performance. Experiments on the MCL-JCI dataset showed a mean Bhattacharyya distance between the predicted and the original JND distributions of only 0.072. (A sketch of the Bhattacharyya distance used for evaluation appears after this list.)
  • Picture-level just noticeable difference for symmetrically and asymmetrically compressed stereoscopic images: Subjective quality assessment study and datasets
    Fan, Chunling; Zhang, Yun; Zhang, Huan; Hamzaoui, Raouf; Jiang, Qingshan. The Picture-level Just Noticeable Difference (PJND) for a given image and compression scheme reflects the smallest distortion level that can be perceived by an observer with respect to a reference image. Previous work has focused on the PJND of 2D images and videos. In this paper, we study the PJND of symmetrically and asymmetrically compressed stereoscopic images for JPEG2000 and H.265 intra coding. We conduct interactive subjective quality assessment tests to determine the PJND point using both a pristine image and a distorted image as a reference. We find that the PJND points are highly dependent on the image content. In asymmetric compression, there exists a perceptual threshold in the quality difference between the left and right views due to the binocular masking effect. We generate two PJND-based stereo image datasets (one for symmetric compression and one for asymmetric compression) and make them publicly available.
  • Interactive subjective study on picture-level just noticeable difference of compressed stereoscopic images
    Fan, Chunling; Zhang, Yun; Hamzaoui, Raouf; Jiang, Qingshan. The Just Noticeable Difference (JND) reveals the minimum distortion that the Human Visual System (HVS) can perceive. Traditional studies on JND mainly focus on background luminance adaptation and contrast masking. However, the HVS does not perceive visual content based on individual pixels or blocks, but on the entire image. In this work, we conduct an interactive subjective visual quality study on the Picture-level JND (PJND) of compressed stereo images. The study, which involves 48 subjects and 10 stereoscopic images compressed with H.265 intra coding and JPEG2000, includes two parts. In the first part, we determine the minimum distortion that the HVS can perceive against a pristine stereo image. In the second part, we explore the minimum distortion that each subject perceives against a distorted stereo image. Modeling the distribution of the PJND samples as Gaussian, we obtain their complementary cumulative distribution functions, which are known as Satisfied User Ratio (SUR) functions. Statistical analysis demonstrates that the SUR is highly dependent on the image content: the HVS is more sensitive to distortion in images with more texture details. The compressed stereoscopic images and the PJND samples are collected in a dataset called SIAT-JSSI, which we release to the public. (A sketch of deriving an SUR curve from PJND samples appears after this list.)
  • Audio Content Analysis for Unobtrusive Event Detection in Smart Homes
    Vafeiadis, Anastasios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. Environmental sound signals are multi-source, heterogeneous, and time-varying. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNN) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier, achieved an F1-Score of 90.2% and a recognition accuracy of 91.7%. The second one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-Score of 92.7% and a recognition accuracy of 96%.
  • Model-based encoding parameter optimization for 3D point cloud compression
    Liu, Qi; Yuan, Hui; Hou, Junhui; Liu, Hao; Hamzaoui, Raouf. Rate-distortion optimal 3D point cloud compression is very challenging due to the irregular structure of 3D point clouds. For a popular 3D point cloud codec that uses octrees for geometry compression and JPEG for color compression, we first find analytical models that describe the relationship between the encoding parameters and the bitrate and distortion, respectively. We then use our models to formulate the rate-distortion optimization problem as a constrained convex optimization problem and apply an interior point method to solve it. Experimental results for six 3D point clouds show that our technique gives similar results to exhaustive search at only about 1.57% of its computational cost. (A sketch of this model-based optimization idea appears after this list.)
  • Energy-based decision engine for household human activity recognition
    Vafeiadis, Anastasios; Vafeiadis, Thanasis; Zikos, Stelios; Krinidis, Stelios; Votis, Konstantinos; Giakoumis, Dimitrios; Ioannidis, Dimosthenis; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. We propose a framework for energy-based human activity recognition in a household environment. We apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Our decision engine achieved 99.1% accuracy on real-world data collected in the kitchens of two smart homes. (A sketch of the rule-based inference step appears after this list.)
  • Acoustic scene classification: from a hybrid classifier to deep learning
    Vafeiadis, Anastasios; Kalatzis, Dimitrios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. This report describes our contribution to the 2017 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. We investigated two approaches for the acoustic scene classification task. First, we used a combination of features in the time and frequency domains and a hybrid Support Vector Machine - Hidden Markov Model (SVM-HMM) classifier to achieve an average accuracy over 4 folds of 80.9% on the development dataset and 61.0% on the evaluation dataset. Second, by exploiting data augmentation techniques and using the whole segment (as opposed to splitting it into sub-sequences) as an input, the accuracy of our CNN system was boosted to 95.9% on the development dataset. However, because the CNN used a small number of kernels and failed to capture the global information of the audio signals, it achieved an accuracy of only 49.5% on the evaluation dataset. Our two approaches outperformed the DCASE baseline method, which uses log-mel band energies for feature extraction and a Multi-Layer Perceptron (MLP) to achieve an average accuracy over 4 folds of 74.8%.
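
Code sketches

The sketches below illustrate, in Python, a few of the techniques described in the abstracts above. They are minimal illustrations written for this page, not the authors' implementations; all layer sizes, thresholds, and model coefficients are hypothetical.

For "Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection": a sketch of a CRNN that takes a log-STFT spectrogram, max-pools along the frequency axis only, runs a recurrent layer over time, outputs per-frame speech probabilities through a sigmoid, and smooths them with a convolutional moving-average filter, assuming PyTorch is available.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CRNNSad(nn.Module):
        def __init__(self, n_freq=257, hidden=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(4, 1)),   # pool along frequency only
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(4, 1)),
            )
            self.gru = nn.GRU(64 * (n_freq // 16), hidden,
                              batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, 1)    # per-frame speech score

        def forward(self, x):                       # x: (batch, 1, n_freq, n_frames)
            z = self.conv(x)                        # (batch, 64, n_freq // 16, n_frames)
            b, c, f, t = z.shape
            z = z.permute(0, 3, 1, 2).reshape(b, t, c * f)
            z, _ = self.gru(z)
            p = torch.sigmoid(self.head(z)).squeeze(-1)   # (batch, n_frames) in [0, 1]
            # moving-average filter to suppress isolated misclassified spikes
            smooth = torch.ones(1, 1, 9, device=p.device) / 9.0
            return F.conv1d(p.unsqueeze(1), smooth, padding=4).squeeze(1)

    probs = CRNNSad()(torch.randn(2, 1, 257, 100))  # two dummy spectrograms
    print(probs.shape)                              # torch.Size([2, 100])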
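
For "Image-based Text Classification using 2D Convolutional Neural Networks": a sketch of the core text-to-image step, rendering a sentence as a grayscale bitmap so that a 2D CNN can classify it from the visual patterns of words, with no tokenisation or OCR stage, assuming Pillow and NumPy are available.

    import numpy as np
    from PIL import Image, ImageDraw

    def text_to_image(sentence, size=(256, 32)):
        img = Image.new("L", size, color=255)               # white grayscale canvas
        ImageDraw.Draw(img).text((2, 8), sentence, fill=0)  # default bitmap font
        return np.asarray(img, dtype=np.float32) / 255.0    # CNN-ready array

    x = text_to_image("where is the nearest restaurant ?")
    print(x.shape)   # (32, 256); stack N of these into a (N, 1, 32, 256) batch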
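
For "SUR-Net": a sketch of the Bhattacharyya distance used to compare a predicted JND distribution with the ground truth; the two histograms here are made-up stand-ins.

    import numpy as np

    def bhattacharyya(p, q):
        """Bhattacharyya distance between two discrete distributions."""
        p, q = p / p.sum(), q / q.sum()
        return -np.log(np.sum(np.sqrt(p * q)))

    predicted = np.array([0.10, 0.30, 0.40, 0.20])
    original = np.array([0.15, 0.25, 0.40, 0.20])
    print(bhattacharyya(predicted, original))  # 0.0 would mean identical distributions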
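
For "Interactive subjective study on picture-level just noticeable difference of compressed stereoscopic images": a sketch of turning PJND samples into an SUR curve by fitting a Gaussian and taking its complementary cumulative distribution function; the samples and distortion scale are invented for illustration, assuming SciPy is available.

    import numpy as np
    from scipy.stats import norm

    pjnd = np.array([18, 22, 25, 27, 30, 31, 33, 36], float)  # hypothetical PJND samples
    mu, sigma = pjnd.mean(), pjnd.std(ddof=1)                 # Gaussian fit

    levels = np.arange(0, 52)                    # candidate distortion levels (e.g. QP)
    sur = norm.sf(levels, loc=mu, scale=sigma)   # P(JND > level): satisfied user ratio

    # largest distortion level at which at least 75% of users perceive no difference
    print(levels[sur >= 0.75].max())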
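
For "Model-based encoding parameter optimization for 3D point cloud compression": a sketch of the overall idea, minimising a fitted distortion model subject to a bitrate budget with a generic constrained solver (the paper fits real models to the codec and applies an interior point method); the model coefficients here are invented.

    import numpy as np
    from scipy.optimize import minimize

    # hypothetical fitted models in octree depth d = x[0] and JPEG quality q = x[1]
    rate = lambda x: 0.4 * 2.0 ** x[0] + 0.05 * x[1]     # bitrate model R(d, q)
    dist = lambda x: 50.0 / x[0] ** 2 + 80.0 / x[1]      # distortion model D(d, q)

    budget = 6.0   # made-up rate budget
    res = minimize(dist, x0=[3.0, 40.0],
                   bounds=[(1, 10), (1, 100)],
                   constraints=[{"type": "ineq", "fun": lambda x: budget - rate(x)}])
    print(res.x)   # continuous optimum; round to the nearest valid encoder settings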
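
For "Energy-based decision engine for household human activity recognition": a sketch of the rule-based step, inferring appliance states from power readings and mapping state combinations to activities; the thresholds and rules are invented, not the paper's.

    ON_THRESHOLDS_W = {"kettle": 1000.0, "oven": 800.0, "fridge": 40.0}

    def appliance_states(power_w):
        """Map a dict of power readings (watts) to on/off states."""
        return {a: power_w.get(a, 0.0) >= t for a, t in ON_THRESHOLDS_W.items()}

    def detect_activity(states):
        """Rule-based scenarios over inferred appliance states."""
        if states["kettle"]:
            return "preparing a hot drink"
        if states["oven"]:
            return "cooking"
        return "no kitchen activity detected"

    reading = {"kettle": 1800.0, "oven": 3.0, "fridge": 65.0}
    print(detect_activity(appliance_states(reading)))   # preparing a hot drink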

A full listing of Raouf Hamzaoui's publications and outputs is available online.

Key research outputs

  • Ahmad, S., Hamzaoui, R., Al-Akaidi, M., Adaptive unicast video streaming with rateless codes and feedback, IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, pp. 275-285, Feb. 2010.
  • Röder, M., Cardinal, J., Hamzaoui, R., Efficient rate-distortion optimized media streaming for tree-structured packet dependencies, IEEE Transactions on Multimedia, vol. 9, pp. 1259-1272, Oct. 2007.  
  • Röder, M., Hamzaoui, R., Fast tree-trellis list Viterbi decoding, IEEE Transactions on Communications, vol. 54, pp. 453-461, March 2006.
  • Röder, M., Cardinal, J., Hamzaoui, R., Branch and bound algorithms for rate-distortion optimized media streaming, IEEE Transactions on Multimedia, vol. 8, pp. 170-178, Feb. 2006.
  • Stankovic, V., Hamzaoui, R., Xiong, Z., Real-time error protection of embedded codes for packet erasure and fading channels, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 1064-1072, Aug. 2004.
  • Stankovic, V., Hamzaoui, R., Saupe, D., Fast algorithm for rate-based optimal error protection of embedded codes, IEEE Transactions on Communications, vol. 51, pp. 1788-1795, Nov. 2003.
  • Hamzaoui, R., Saupe, D., Combining fractal image compression and vector quantization, IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 197-208, 2000.
  • Hamzaoui, R., Fast iterative methods for fractal image compression, Journal of Mathematical Imaging and Vision, vol. 11, no. 2, pp. 147-159, 1999.


Research interests/expertise

  • Image and Video Compression
  • Multimedia Communication
  • Error Control Systems
  • Image and Signal Processing
  • Pattern Recognition
  • Algorithms

Areas of teaching

Signal Processing

Image Processing

Data Communication

Media Technology

Qualifications

Master’s in Mathematics (Faculty of Sciences of Tunis), 1986

MSc in Mathematics (University of Montreal), 1993

Dr. rer. nat. (University of Freiburg), 1997

Habilitation in Computer Science (University of Konstanz), 2004

Courses taught

Digital Signal Processing

Mobile Communication

Communication Networks

Signal Processing

Multimedia Communication

Digital Image Processing

Mobile Wireless Communication

Research Methods

Pattern Recognition

Error Correcting Codes

Membership of professional associations and societies

IEEE Senior Member

IEEE Signal Processing Society

IEEE Multimedia Communications Technical Committee 

Current research students

Mohamed Al-Ibaisi, part-time PhD student since January 2017

Thaeer Kobbaey, full-time PhD student since April 2014

Professional esteem indicators

Editorial Board Member IEEE Transactions on Multimedia (since 2017)

Technical Program Committee Co-Chair, IEEE MMSP 2017, London-Luton, Oct. 2017.

Editorial Board Member IEEE Transactions on Circuits and Systems for Video Technology (2010-2016)
