
Professor Raouf Hamzaoui

Job: Professor in Media Technology

Faculty: Technology

School/department: School of Engineering and Sustainable Development

Research group(s): Centre for Electronic and Communications Engineering (CECE)

Address: De Montfort University, The Gateway, Leicester, LE1 9BH

T: +44 (0)116 207 8096

E: rhamzaoui@dmu.ac.uk

W: http://www.tech.dmu.ac.uk/~hamzaoui/

 

Personal profile

Raouf Hamzaoui received the MSc degree in mathematics from the University of Montreal, Canada, in 1993, the Dr. rer. nat. degree from the University of Freiburg, Germany, in 1997, and the Habilitation degree in computer science from the University of Konstanz, Germany, in 2004. He was an Assistant Professor with the Department of Computer Science of the University of Leipzig, Germany, and with the Department of Computer and Information Science of the University of Konstanz. In September 2006, he joined DMU, where he is a Professor in Media Technology and Head of the Signal Processing and Communications Systems Group in the Institute of Engineering Sciences. Raouf Hamzaoui is an IEEE Senior Member and a member of the Editorial Board of the IEEE Transactions on Multimedia. He has published more than 90 research papers in books, journals, and conferences. His research has been funded by the EU, DFG, Royal Society, and industry, and has received best paper awards (ICME 2002, PV’07, CONTENT 2010, MESM’2012, UIC-2019).

Research group affiliations

Institute of Engineering Sciences (IES)

 

Publications and outputs 

  • Satisfied user ratio prediction with support vector regression for compressed stereo images
    Fan, Chunling; Zhang, Yun; Hamzaoui, Raouf; Ziou, Djemel; Jiang, Qingshan. We propose the first method to predict the Satisfied User Ratio (SUR) for compressed stereo images. The method consists of two main steps. First, considering binocular vision properties, we extract three types of features from stereo images: image quality features, monocular visual features, and binocular visual features. Then, we train a Support Vector Regression (SVR) model to learn a mapping function from the feature space to the SUR values. Experimental results on the SIAT-JSSI dataset show excellent prediction accuracy, with a mean absolute SUR error of only 0.08 for H.265 intra coding and only 0.13 for JPEG2000 compression.
  • Subjective assessment of global picture-wise just noticeable difference
    Lin, Hanhe; Jenadeleh, Mohsen; Chen, Guangan; Reips, Ulf-Dietrich; Hamzaoui, Raouf; Saupe, Dietmar. The picture-wise just noticeable difference (PJND) for a given image and a compression scheme is a statistical quantity giving the smallest distortion that a subject can perceive when the image is compressed with the compression scheme. The PJND is determined with subjective assessment tests for a sample of subjects. We introduce and apply two methods of adjustment where the subject interactively selects the distortion level at the PJND using either a slider or keystrokes. We compare the results and times required to those of the adaptive binary search type approach, in which image pairs with distortions that bracket the PJND are displayed and the difference in distortion levels is reduced until the PJND is identified. For the three methods, two images are compared using the flicker test, in which the displayed images alternate at a frequency of 8 Hz. Unlike previous work, our goal is a global one, determining the PJND not only for the original pristine image but also for a sequence of compressed versions. Results for the MCL-JCI dataset show that the PJND measurements based on adjustment are comparable with those of the traditional approach using binary search, yet significantly faster. Moreover, we conducted a crowdsourcing study with side-by-side comparisons and forced choice, which suggests that the flicker test is more sensitive than a side-by-side comparison.
  • Coarse to fine rate control for region-based 3D point cloud compression
    Liu, Qi; Yuan, Hui; Hamzaoui, Raouf; Su, Honglei. We modify the video-based point cloud compression standard (V-PCC) by mapping the patches to seven regions and encoding the geometry and color video sequences of each region. We then propose a coarse to fine rate control algorithm for this scheme. The algorithm consists of two major steps. First, we allocate the target bitrate between the geometry and color information. Then, we optimize in turn the geometry and color quantization steps for the video sequences of each region using analytical models for the rate and distortion. Experimental results for eight point clouds showed that the average percent bitrate error of our algorithm is only 3.7%, and its perceptual reconstruction quality is better than that of V-PCC.
  • SUR-FeatNet: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Feature Learning
    Lin, Hanhe; Hosu, Vlad; Fan, Chunling; Zhang, Yun; Mu, Yuchen; Hamzaoui, Raouf; Saupe, Dietmar. The satisfied user ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the complementary cumulative distribution function of the just noticeable difference (JND), the smallest distortion level that can be perceived by a subject when a reference image is compared to a distorted one. A sequence of JNDs can be defined with a suitable successive choice of reference images. We propose the first deep learning approach to predict SUR curves. We show how to apply maximum likelihood estimation and the Anderson-Darling test to select a suitable parametric model for the distribution function. We then use deep feature learning to predict samples of the SUR curve and apply the method of least squares to fit the parametric model to the predicted samples. Our deep learning approach relies on a siamese convolutional neural network, transfer learning, and deep feature learning, using pairs consisting of a reference image and a compressed image for training. Experiments on the MCL-JCI dataset showed state-of-the-art performance. For example, the mean Bhattacharyya distances between the predicted and ground truth first, second, and third JND distributions were 0.0810, 0.0702, and 0.0522, respectively, and the corresponding average absolute differences of the peak signal-to-noise ratio at a median of the first JND distribution were 0.58, 0.69, and 0.58 dB. Further experiments on the JND-Pano dataset showed that the method transfers well to high resolution panoramic images viewed on head-mounted displays.
  • Feature learning for human activity recognition using convolutional neural networks: A case study for inertial measurement unit and audio data
    Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (1) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (2) the pre-trained model is then employed as a feature extractor, and its use is evaluated with a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio-based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset, and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset, and 35.24% on the Extrasensory dataset.
  • Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression
    Liu, Qi; Yuan, Hui; Hou, Junhui; Hamzaoui, Raouf; Su, Honglei. In video-based 3D point cloud compression, the quality of the reconstructed 3D point cloud depends on both the geometry and color distortions. Finding an optimal allocation of the total bitrate between the geometry coder and the color coder is a challenging task due to the large number of possible solutions. To solve this bit allocation problem, we first propose analytical distortion and rate models for the geometry and color information. Using these models, we formulate the joint bit allocation problem as a constrained convex optimization problem and solve it with an interior point method. Experimental results show that the rate distortion performance of the proposed solution is close to that obtained with exhaustive search but at only 0.66% of its time complexity.
  • Highly Efficient Multiview Depth Coding Based on Histogram Projection and Allowable Depth Distortion
    Zhang, Yun; Zhu, Linwei; Hamzaoui, Raouf; Kwong, Sam; Ho, Yo-Sung. Mismatches between the precisions of representing the disparity, depth value and rendering position in 3D video systems cause redundancies in depth map representations. In this paper, we propose a highly efficient multiview depth coding scheme based on Depth Histogram Projection (DHP) and Allowable Depth Distortion (ADD) in view synthesis. Firstly, DHP exploits the sparse representation of depth maps generated from stereo matching to reduce the residual error from INTER and INTRA predictions in depth coding. We provide a mathematical foundation for DHP-based lossless depth coding by theoretically analyzing its rate-distortion cost. Then, due to the mismatch between depth value and rendering position, there is a many-to-one mapping relationship between them in view synthesis, which induces the ADD model. Based on this ADD model and DHP, depth coding with lossless view synthesis quality is proposed to further improve the compression performance of depth coding while maintaining the same synthesized video quality. Experimental results reveal that the proposed DHP-based depth coding can achieve an average bit rate saving of 20.66% to 19.52% for lossless coding on Multiview High Efficiency Video Coding (MV-HEVC) with different groups of pictures. In addition, our depth coding based on DHP and ADD achieves an average depth bit rate reduction of 46.69%, 34.12% and 28.68% for lossless view synthesis quality when the rendering precision varies from integer, half to quarter pixels, respectively. We obtain similar gains for lossless depth coding on the 3D-HEVC, HEVC Intra coding and JPEG2000 platforms.
  • Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
    Vafeiadis, Anastasios; Fanioudakis, Eleftherios; Potamitis, Ilyas; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions.
  • Comparing CNN and Human Crafted Features for Human Activity Recognition
    Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf. Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of using these methods resides in their ability to generate features automatically. This ability greatly simplifies the task of feature extraction, which usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite the advantage of this approach, very little work has been undertaken on analyzing the quality of extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for recognition of simple activities, applying this approach to signals from both inertial and audio sensors. The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold-standard Human Crafted Features (HCF); (ii) a comprehensive analysis of how architecture and model parameters affect the separation of target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved a 93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with three convolutional layers and a kernel size of 32, and a 90.5% F-Score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor and vehicle), using 2D CNNs with two convolutional layers and a 2x2 kernel size.
  • Image-based Text Classification using 2D Convolutional Neural Networks
    Merdivan, Erinç; Vafeiadis, Anastasios; Kalatzis, Dimitrios; Hanke, Sten; Kropf, Johannes; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf; Geist, Matthieu. We propose a new approach to text classification in which we consider the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to get semantically meaningful features from images with text without using optical character recognition and sequential processing pipelines, techniques that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results for a Chinese text classification task and achieved promising results for seven English text classification tasks. Furthermore, our approach outperformed the memory networks without match types when using out of vocabulary entities from Task 4 of the bAbI dialog dataset.
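Several of the abstracts above (e.g. SUR-FeatNet) define the Satisfied User Ratio curve as the complementary cumulative distribution function of the just noticeable difference over a sample of subjects. A minimal sketch of that definition follows; the JND values and distortion levels here are hypothetical illustrations, not data from the papers, which estimate the distribution parametrically and predict it with deep learning.

```python
def empirical_sur(jnd_samples, distortion_levels):
    """Empirical Satisfied User Ratio: the fraction of subjects whose
    JND exceeds each distortion level, i.e. the complementary CDF of
    the per-subject JND values."""
    n = len(jnd_samples)
    return [sum(1 for j in jnd_samples if j > d) / n
            for d in distortion_levels]

# Hypothetical per-subject JND values (e.g. compression-step indices)
jnds = [12, 15, 15, 18, 20, 22, 25, 30]
levels = [10, 15, 20, 25, 30]
print(empirical_sur(jnds, levels))  # → [1.0, 0.625, 0.375, 0.125, 0.0]
```

As expected for a complementary CDF, the curve is non-increasing in the distortion level.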
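The joint bit allocation abstract above formulates the split of a total bitrate between geometry and color as a constrained convex optimization problem. The following toy sketch illustrates the idea under assumed exponential rate-distortion models and a simple bisection solver; the paper's actual analytical models and interior point method are not reproduced here, and all parameter values are illustrative.

```python
import math

def allocate_rate(r_total, a_g=10.0, b_g=0.30, a_c=6.0, b_c=0.20):
    """Split a total bitrate between geometry and color by equalizing
    marginal distortions, assuming illustrative exponential models
    D_g(R) = a_g*exp(-b_g*R) and D_c(R) = a_c*exp(-b_c*R).
    At the optimum, |D_g'(R_g)| = |D_c'(R_c)| with R_g + R_c = r_total."""
    def marginal_gap(r_g):
        # Difference of marginal distortion magnitudes; decreasing in r_g.
        return (a_g * b_g * math.exp(-b_g * r_g)
                - a_c * b_c * math.exp(-b_c * (r_total - r_g)))
    lo, hi = 0.0, r_total
    for _ in range(100):  # bisection on the geometry rate
        mid = 0.5 * (lo + hi)
        if marginal_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    r_g = 0.5 * (lo + hi)
    return r_g, r_total - r_g
```

For any convex, decreasing rate-distortion curves, the equal-marginal-distortion condition characterizes the optimal split; the total distortion at the returned allocation is never worse than an equal split of the budget.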
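The speech activity detection abstract mentions using a convolutional operation as a moving average filter to remove misclassified spikes from frame-level decisions. A hedged sketch of that post-processing idea, with an illustrative window size and threshold rather than the challenge system's actual settings:

```python
def smooth_decisions(probs, win=5, threshold=0.5):
    """Post-process frame-level speech probabilities with a centered
    moving-average filter, then threshold. Isolated spikes much shorter
    than the window are averaged away and suppressed."""
    half = win // 2
    smoothed = []
    for i in range(len(probs)):
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return [1 if p >= threshold else 0 for p in smoothed]

# A single spurious positive at index 4 is removed by the 5-frame average
print(smooth_decisions([0, 0, 0, 0, 1, 0, 0, 0, 0]))
# → [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Longer runs of positive frames survive the filter, which is the desired behavior: genuine speech segments span many frames, while classifier glitches are typically one or two frames wide.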

A full listing of Raouf Hamzaoui's publications and outputs is available via the university repository.

Key research outputs

  • Ahmad, S., Hamzaoui, R., Al-Akaidi, M., Adaptive unicast video streaming with rateless codes and feedback, IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, pp. 275-285, Feb. 2010.
  • Röder, M., Cardinal, J., Hamzaoui, R., Efficient rate-distortion optimized media streaming for tree-structured packet dependencies, IEEE Transactions on Multimedia, vol. 9, pp. 1259-1272, Oct. 2007.  
  • Röder, M., Hamzaoui, R., Fast tree-trellis list Viterbi decoding, IEEE Transactions on Communications, vol. 54, pp. 453-461, March 2006.
  • Röder, M., Cardinal, J., Hamzaoui, R., Branch and bound algorithms for rate-distortion optimized media streaming, IEEE Transactions on Multimedia, vol. 8, pp. 170-178, Feb. 2006.
  • Stankovic, V., Hamzaoui, R., Xiong, Z., Real-time error protection of embedded codes for packet erasure and fading channels, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 1064-1072, Aug. 2004.
  • Stankovic, V., Hamzaoui, R., Saupe, D., Fast algorithm for rate-based optimal error protection of embedded codes, IEEE Transactions on Communications, vol. 51, pp. 1788-1795, Nov. 2003.
  • Hamzaoui, R., Saupe, D., Combining fractal image compression and vector quantization, IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 197-208, 2000.
  • Hamzaoui, R., Fast iterative methods for fractal image compression, Journal of Mathematical Imaging and Vision, vol. 11, no. 2, pp. 147-159, 1999.

 

Research interests/expertise

  • Image and Video Compression
  • Multimedia Communication
  • Error Control Systems
  • Image and Signal Processing
  • Pattern Recognition
  • Algorithms

Areas of teaching

Signal Processing

Image Processing

Data Communication

Media Technology

Qualifications

Master’s in Mathematics (Faculty of Sciences of Tunis), 1986

MSc in Mathematics (University of Montreal), 1993

Dr. rer. nat. (University of Freiburg), 1997

Habilitation in Computer Science (University of Konstanz), 2004

Courses taught

Digital Signal Processing

Mobile Communication 

Communication Networks

Signal Processing

Multimedia Communication

Digital Image Processing

Mobile Wireless Communication

Research Methods

Pattern Recognition

Error Correcting Codes

Membership of professional associations and societies

IEEE Senior Member

IEEE Signal Processing Society

IEEE Multimedia Communications Technical Committee 

Current research students

Sergun Ozmen, PT PhD student since July 2019

Mohamed Al-Ibaisi, PT PhD student since January 2017

 

Professional esteem indicators

Guest Editor IEEE Open Journal of Circuits and Systems, Special Section on IEEE ICME 2020.

Guest Editor IEEE Transactions on Multimedia, Special Issue on Hybrid Human-Artificial Intelligence for Multimedia Computing.

Editorial Board Member IEEE Transactions on Multimedia (since 2017)

Editorial Board Member IEEE Transactions on Circuits and Systems for Video Technology (2010-2016)

Area Chair for Multimedia Communications, Networking and Mobility, IEEE ICME 2021, Shenzhen, July 2021

Workshops Co-Chair, IEEE ICME 2020, London, July 2020.

Technical Program Committee Co-Chair, IEEE MMSP 2017, London-Luton, Oct. 2017.

