Granted STSMs

The following Short Term Scientific Missions (STSMs) were granted under the COST Action IC1105 3D-ConTourNet.

Cristian Perra, Coding and transmission of 3D video content
Ricardo Monteiro, Compatible frame formats for light field video coding
Philippe Becquet, Impact Enhancement, Systematic Dissemination and Outreach for 3D content capture, analysis and delivery
Milad Mehrfam, Systematic comparison between 3D scanning method of Faro free hand style high resolution 3D scanner and Kinect 2
Daniel Moreno González, Adaptation of 3D Reconstruction based on printing materials
Elijs Dima, Camera synchronization in multidimensional capture arrays
Krzysztof Wegner, Processing and displaying of natural content on Super-multiview / Light-field displays
Pradip Paudyal, Creation of a Light Field (LF) database for LF image quality assessment
Tomasz Grajek, Compression of Light Field images with the help of 3D HEVC
Krzysztof Wegner, Compression of Light Field images with the help of 3D HEVC
Dragana Djordjevic, Modeling of the effects of 3D visual discomfort to human emotional state using data mining techniques
Pradip Paudyal, Watermarking in Light Field Images
Mattia Bonomi, Emotions extraction from videos using video magnification algorithm and cardiac activity measurements
Evgeny Belyaev, Real-time coding of 3D light fields based on 3D wavelets
Marcus Bednara, Sensor fusion for 3D content generation on embedded systems
Miguel Barreda Ángeles, Cognitive Biases in the Assessment of 3D Image Quality of Experience
Tao Wei, Evaluating the performance of skeleton fusion of Multiple Kinects for 360 degree motion tracking
Cristian Perra, Robust coding of holoscopic video for heterogeneous networks
Janko Calic, Quality of Experience in 3D panoramic photography
Aleksander Väljamäe, Studying the emotional modulation of 3D sound-induced circular vection
Maciej Kurc, Time-of-Flight camera calibration and data processing
Aleksander Väljamäe, Studying the emotional modulation of 3D sound-based enhancement of circular vection
Luis Lucas, Multiview video codec design
Antoine Dricot, Subjective evaluation of Super Multi-View (SMV) compressed contents on high-end 3D displays
Hana Khamfroush, Network Coded Cooperation for Time-Sensitive Content
Janko Calic, QoE aspects of 2D to 3D conversion of photos
Mitra Damghanian, Evaluation of the plenoptic camera using a model based approach
Sebastian Schwarz, Time-of-Flight depth sensing for 3D content generation
Emil Dumic, Objective 3D metrics research
Pedro Correia, Advanced MDC techniques for 3D and multiview video coding
Miguel Barreda Ángeles, Exploring the effects of visual discomfort on 3DTV viewers’ emotional experience
Caroline Conti, Application of High-Efficiency Video Coding on Light-Field 3D Video Content
Emilie Bosc, 3D Quality of Experience: a focus on visual discomfort
Klemen Peternel, Contextual aspects of Quality of Experience (QoE)
Nelson Francisco, Video codec design and assessment collaboration




Cristian Perra, Coding and transmission of 3D video content
April 18th – April 29th 2016
Home: University of Cagliari, Cagliari, Italy
Host: Eindhoven University of Technology, Eindhoven, Netherlands

Recent advances in 3D video acquisition, processing and display can enable a realistic depth illusion. However, network delivery services using this type of visual information are still far from acceptable levels of maturity and quality. The aim of this STSM was to study and define a 3D video coding and transmission process and a software demonstrator to be used as a baseline for a subsequent evaluation of the trade-off between coding parameters and objective quality metrics, subjective quality metrics, quality of service, and quality of experience.

In the work developed during the STSM, standard 3D coding tools were exploited for the development of the research activities, with particular attention to the definition of the main coding parameters affecting the transmitted bitstream in terms of bitrate and network packet format. The work consisted of both theoretical and experimental analysis of software 3D encoders.

A standard reference data set has been selected for the experimental phase. The dataset was encoded with different coding parameters and the results are reported in the following. The work developed in this STSM will be used as a basis for joint publications between the University of Cagliari and the Eindhoven University of Technology in areas such as video coding, video transmission, subjective and objective quality evaluation, and quality of experience.



Ricardo Monteiro, Compatible frame formats for light field video coding
March 28th – April 22nd 2016
Home: Instituto de Telecomunicações / ISCTE-IUL – University Institute of Lisbon, Portugal
Host: Multimedia Telecommunications Group, Poznan University of Technology, Poland

The purpose of the STSM was to extend the work done by the applicant on compatible frame formats for light field image coding to light field video. The main advantage of this approach is that coding efficiency can be improved, relative to straightforwardly encoding the light field content with a video coding standard, by only changing the representation.

By applying a pre- and post-processing step, before the encoder and after the decoder, respectively, each light field image is decomposed into a set of segments that are interpreted by the coding standard as individual frames. Therefore, the coding standard produces a compliant bitstream, with only some metadata needing to be transmitted for a correct reconstruction. The proposed frame-compatible formats consist in either decomposing the light field image into stripes of micro-images, or in a decomposition based on a checkered pattern.
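The stripe decomposition described above can be sketched in a few lines of numpy. This is only an illustration of the idea, not the STSM software; the function and parameter names are hypothetical. The lenslet image is cut into horizontal stripes, each one micro-image tall, and each stripe is handed to the encoder as an individual "frame" of a pseudo-video sequence.

```python
import numpy as np

def stripes_of_microimages(lenslet, mi_size):
    """Cut a lenslet (light field) image into horizontal stripes, each one
    micro-image tall; a codec can then treat every stripe as an individual
    'frame' of a pseudo-video sequence."""
    h = lenslet.shape[0]
    assert h % mi_size == 0, "height must be a multiple of the micro-image size"
    return [lenslet[r:r + mi_size] for r in range(0, h, mi_size)]

# Toy example: a 12x8 'lenslet' image with 4-pixel-tall micro-images
lf = np.arange(12 * 8).reshape(12, 8)
frames = stripes_of_microimages(lf, 4)
print(len(frames), frames[0].shape)  # 3 (4, 8)
```

The inverse (post-processing) step after the decoder is simply stacking the decoded stripes back in order.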

From the framework developed for light field image coding using HEVC, it is known that inter-frame prediction tools are very efficient at exploiting the redundancy between segments. However, 3D-HEVC was the standard chosen for this extension to light field video, because it exploits the redundancy of a multiview signal on three fronts: spatial, using intra coding tools; temporal, using inter-frame coding tools; and viewpoint, using inter-view coding tools.

Each front offers many prediction tools, and this type of content differs substantially from a standard multiview signal. Because of this, a referential-change step has been implemented, to be optionally applied before decomposing the content. In this step the (x, y, t) axes that form the referential for the light field video can be converted to any other combination of these axes, changing the way the micro-images are organized.

Regarding the 3D-HEVC configuration, it is possible to define a large number of prediction structures within the time or view domain. In the time domain, the standard hierarchical configuration was chosen; in the view domain, three 2D approaches are proposed: a rectangular grid, a rectangular-and-diagonal grid, and a hierarchical grid.

The results using several combinations of the proposed approaches are going to be compared with non-compliant state-of-the-art approaches for light field images and video. This comparison will be compiled in a joint scientific paper.


Philippe Becquet, Impact Enhancement, Systematic Dissemination and Outreach for 3D content capture, analysis and delivery
April 15th – April 30th 2016
Home: Carinthia University of Applied Sciences, Villach (AT)
Host: Polytechnic University of Madrid – ETSIT, Madrid (ES)

3D technologies, in the broad sense, are now very present in our daily lives, and the computer methods associated with them have given rise to an astonishing improvement of product quality in the fields of animation movies, video games, art and architecture design, toy conception, etc. The science of 3D scanning and 3D printing brings even more possibilities for designing, optimizing, creating, experimenting, educating, performing reverse engineering, and much more. Though quite recent in the industrial world, this new technology is spreading rapidly and will soon be present in every institute and company related to engineering and technology, art, architecture, and 3D animation, and even in private homes. It is therefore essential to act as pioneers of such new techniques, to acquire this knowledge early, and to be able to share it as our domain of expertise.

Once this technology is mastered, one must turn to the industrial world and seek potential future partners with whom to develop and jointly grow, exploiting the new possibilities that go along with 3D technologies, as well as to receive financial and industrial support for further development. In this regard, the step of Outreach is essential in the process of research and innovation, as is the preparation of dissemination materials to support this action.

The main purpose of the STSM was the preparation of the Outreach process, via the creation of dissemination materials to be later presented to professionals from industry, with the aim of increasing awareness and possible collaborations and building mutually beneficial, long-lasting relationships with industrial partners. As a part of this process, we generated a database of industrial companies and institutes that we should reach out to as end users of the research results, in order to simplify the actual process of Outreach with potential future partners.

The more detailed structure of the work done during the STSM, as well as the results, is as follows:

Investigation of the targeted sectors of application in 3D printing and 3D scanning to facilitate the search for potential partners:
● Research on broad applications of 3D technologies
● Classification in eight categories, potentially overlapping, which conveniently divide the whole area (Aerospace & Engineering, Art & Sculpture, Entertainment, Science & Education, Jewellery & Fashion, Food, Medical & Dental, Transport)
● Non-exhaustive list of examples in each application area in order to easily identify the strategic plan of action for each targeted industry

Learning of Outreach methods and strategies in order to better structure the process and to transmit the overall learned knowledge
● Determination of the main goals of Outreach (or Outreaching)
● Outreach methods or material differ depending on the target group
● The relationships built during Outreach aim to be mutually beneficial and long-lasting, with eventual cross-promotion of content
● Outreach audience must be built long before starting the actual campaign
● We can use several tools available on the Internet to determine the domain of expertise and competence of a company, their popularity on social media if needed, their already built network, their needs, their involvement in scientific papers if applicable, etc.

Research of potential industrial partners from neighboring European countries, later extended to a few companies outside the EU:
● Examination of the public website of the Austrian Chamber of Commerce as a start, with industries and businesses sortable by sectors (using keywords)
● Selection of companies from related sectors (3D printing/scanning) as well as other businesses potentially interested in our technology (3D design, architecture, mechanical engineering, etc.)
● Generation of a database with relevant information (name, location, activity, contact…)
● Extension to other European countries, and a few cases outside the European Union

Research on the state of the art technologies in the field of 3D Data Acquisition, Content Creation and Delivery
● Recording devices (Cameras, Kinect, etc.)
● Post-acquisition treatment and analysis (existing software, methods, etc.)
● Current and future applications (overlapping with part 1)

Structuring of the dissemination materials, or teaching content, that will be used, on the one hand, to provide easier access to and awareness of our expertise in 3D technologies and, on the other hand, for the further training of other experts and associates. This contains much of the information gathered in part 4:
● Introduction of 3D technologies, their development, future expectations, applications
● Materials and devices (hardware), 3D printing & 3D scanning setups, comparison between different equipment (limitations, advantages, costs)
● Data treatment (software) from/to the devices
● Examples of application (optimization, reverse engineering, manufacturing, quality control, information technology, videogames, monitoring & surveillance, etc.)

This STSM has been highly successful on many fronts: it properly performed the planned and required actions, increased the sharing of knowledge, strengthened the collaborative research work between the two institutes, Carinthia University of Applied Sciences and the Polytechnic University of Madrid, built good relationships between several research team members and developed a wider scientific and professional network. This cooperation therefore has several positive impacts and brings benefits to the Action 3D-ConTourNet, finalizing the investigations carried out during the overall duration of the program and preparing for future joint projects and cooperation. As an additional result of this STSM, at least one joint paper will be discussed and submitted to a scientific journal and/or conference.

Such collaborations between European institutes are essential for the advancement of science and the strengthening of international relationships, building tight connections and partnerships as well as common goals and interests.

Milad Mehrfam, Systematic comparison between 3D scanning method of Faro free hand style high resolution 3D scanner and Kinect 2
April 15th – April 30th 2016
Home: Carinthia University of Applied Sciences – FH Kärnten, Villach (AT)
Host: Polytechnic University of Madrid – ETSIT, Madrid (ES)

In recent years, the 3D printing industry has had a considerable impact on both scientific research and technology. This technology is widely used for rapid prototyping, material testing, development of biomaterials, etc. Progress in the 3D printing industry has led to simultaneous development in 3D scanning methods and devices as well. The idea was not only to be able to design objects on computers and get the product from 3D printers, but also the reverse: to get the data of an existing object into our computers.

The second process, 3D scanning, gives us the possibility to easily obtain reconstructable data in our computers from an object which already exists. Therefore we have the possibility to change, optimize or develop its shape and subsequently make a prototype of the new design with 3D printers. This is the idea behind 3D ConTourNet.

Currently there are various types of products for the purpose of 3D scanning. The problem that most industries and research groups face is choosing the optimum device for their budget. For this reason, gaining more insight into the different features and limitations of each system, including efficiency, capabilities, user friendliness, compatibility, etc., is necessary. The aim of this study is to make a comparison between various 3D scanners, so that the ideal option within a given budget can be chosen while still obtaining the required quality.

This study was done on two devices: the Faro free hand style high resolution 3D scanner and Kinect 2. Three steps were performed, as follows:
· Thorough assessment of the Faro 3D scanner, identifying its weak and strong features.
· Precise inspection of the Kinect 2 3D scanner for its weak and strong points.
· Comparison of the features of the two mentioned devices.
The following is a summary of the tests and assessments of this research:
1. Comparison of user friendliness in both hardware and software between the two devices.
● Comparison of the difficulty of operation with each device (ease of operation).
● Comparison of scan mistakes while working with each device.
● Comparison of the complexity of using the related software to process the images.
● Processing the result from point cloud to mesh.
2. Cost effectiveness was taken into consideration.
● Proportionality of the price of device with the expected operation.
● Comparison of the accessories required.
● Maintenance, service and repair requirements.
3. Resolution of each system was inspected.
● Assessment of resolution and the scanning speed for each device.
● The effect of shadow and light on the object was tested.
● Sensitivity of each device to motion.
4. Precision of each system was compared.
● Comparison between the precision of each device at the same scanning speed.
● Factors reducing precision, e.g. poor operator movement, etc.
● Precision-distance ratio of each device.
5. Scanning speed of two systems was compared.

Along with all the practical assessments, a literature review of similar past work was carried out. The detailed results of this study are going to be published as a paper in the near future. Some new ideas were also developed about a new generation of 3D scanners, software developments and motion capture. Further collaboration between the two institutes was another achievement of this mission.

Daniel Moreno González, Adaptation of 3D Reconstruction based on printing materials
April 15th – April 30th 2016
Home: Universidad Politécnica de Madrid (UPM) – ETSIT, Madrid (ES)
Host: Carinthia University of Applied Sciences, Villach (AT)

3D reconstruction and printing are widely used technologies nowadays because of their multiple applications. Their use in several fields, such as medicine, has led to very important advances in recent years.

Printing and scanning, separately, have evolved to provide increasingly higher quality. The challenge, however, is finding a balanced system based on scanning and printing working together, as cheaply as possible.

The purpose of this STSM was to design a cheap 3D scanning system (working with three ASUS Xtion PRO Live cameras), to test this system and to print the results with different 3D printers. Moreover, we compared the scanning results obtained with various 3D scanning systems (Kinect, Kinect 2, FARO Freestyle…), and printing results using different materials (filament and resin).

During the STSM, the following tasks and activities were performed:
● Designing and testing a 3D scanning system based on multiple cameras with depth sensors:
○ Different shapes, sizes and distances tested.
○ Diverse lighting and environment conditions.
○ Interference of light with the infrared patterns emitted by the cameras.

● Learning different software to manipulate and prepare scanned objects for printing:
○ Point-cloud-to-mesh transformation.
○ Cleaning and smoothing tasks.

● Learning software to print with resin and filament materials.

● Comparing the quality of different printers in light of the time-performance-usability relation.

● Comparing original printed shapes with scanned and printed shapes.

● Preparation and printing of quadcopter pieces according to aerodynamic factors.
○ Testing resin and filament printers.

As a result of this STSM, at least one joint paper will be considered for submission to a conference or journal. Since knowledge and new ideas were exchanged, a future collaboration was also discussed.

Elijs Dima, Camera synchronization in multidimensional capture arrays
April 13th – April 25th 2016
Home: Department of Information and Communication Systems, Mid Sweden University, Sundsvall (SE)
Host: Chair of Multimedia Telecommunications and Microelectronics, Poznan University of Technology, Poznan (PL)

The intent of this STSM was to investigate the problem of multiple-camera synchronization, and other issues relating to content capture, using the knowledge base of Poznan University of Technology’s researchers.

When capturing multiview video of a scene with object motion, the camera shutter synchronization influences the accuracy of the captured motion data and determines a “maximum speed limit” at every point in the scene. The projections of objects exceeding this speed limit will cross pixel boundaries on the camera sensors, resulting in de-coherence of the objects’ apparent positions across the multiple camera perspectives, due to the delay between the cameras’ shutter trigger events.

No explicit evaluations have been made on how synchronization delay affects the capture environment limits. It has been commonly accepted that better synchronization accuracy is more desirable, but there exists no strict classification for acceptable synchronization delay thresholds.

To investigate synchronization delay, and to examine how researchers at Poznan University of Technology handled their 10-camera system synchronization, the following actions were undertaken:
– Examination of Poznan’s camera systems.
– Documentation of Poznan’s 10-camera systems’ data processing chain.
– STSM presentations to Poznan’s researchers.
– Individual discussions on camera capabilities, scene parameters and properties, capture requirements.
– Creation of a multi-camera delay simulation to calculate and visualize the effect of delay in a camera system on the maximum supported speed at specific scene points.

When using cameras with hardware shutter synchronization support, the camera sync delay is several orders of magnitude lower than the camera shutter speed setting. Thus, with hardware synchronization, the effect of synchronization delay on motion capture and the speed limit in the scene is negligible. Software synchronization cannot reach equivalent precision: the best reported accuracy under optimal conditions is cited as half the time interval between two successive frames (20 ms for standard 25-frames-per-second video capture). This synchronization delay is significantly higher and has a noticeable effect on the permissible speed limit in the scene.

By introducing random camera delay in the [0 ms, 20 ms] interval for a 10-camera arc-based layout (consistent with the multi-camera system used at Poznan University of Technology), the speed limit in a circular scene target area decreased by a factor of 2.5 to 6, depending on the distance to the camera and the other cameras’ view overlaps. This decrease showed that synchronization delay places strict limitations on the capture capabilities of a multi-camera system. By specifying a certain speed of motion that the multi-camera system must support throughout the scene / target area, the maximum permissible camera-to-camera delay can be calculated. From this calculation, camera synchronization algorithms can be categorized as satisfactory or not, and thereby assessed for suitability in multi-camera capture systems.
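The relation between synchronization delay and permissible scene speed can be sketched with a simple pinhole-camera model. This is an illustrative approximation, not the simulation built during the STSM, and all parameter values below are hypothetical: an object is "too fast" once its projection moves by more than one pixel during the camera-to-camera delay.

```python
def max_scene_speed(delay_s, distance_m, focal_length_m, pixel_pitch_m):
    """Largest object speed (m/s) whose projection shifts by less than one
    pixel during the camera-to-camera synchronization delay.
    Pinhole model: image shift (in pixels) = v * delay * f / (Z * pitch)."""
    if delay_s == 0:
        return float("inf")  # perfectly synchronized cameras impose no limit
    return distance_m * pixel_pitch_m / (focal_length_m * delay_s)

# Hypothetical setup: 20 ms software-sync delay, object 5 m from the camera,
# 8 mm lens, 5 um pixel pitch
v = max_scene_speed(20e-3, 5.0, 8e-3, 5e-6)
print(round(v, 3))  # 0.156 m/s
```

Inverting the same formula gives the maximum permissible camera-to-camera delay for a required scene speed, which is the categorization described in the paragraph above.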


Krzysztof Wegner, Processing and displaying of natural content on Super-multiview / Light-field displays
August 23rd – August 29th 2015
Home: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)
Host: Holografika Kft., Budapest, Hungary

The purpose of the STSM was to develop procedures and methodology for displaying camera-captured natural content on super-multiview / light-field displays.

Currently produced light field displays can display images with continuous motion parallax, but in order to accomplish this they require many (up to hundreds of) views of the displayed scene. Those views are fed to a multiview-to-light-field conversion algorithm which interpolates and rearranges light rays into light-field slices, which are then projected by a set of projectors onto a custom-made holographic screen.

Currently, most of the content that can be displayed on such displays comes from computer graphics. This results from the difficulty of recording a scene with a sufficiently high number of views. Capturing the high number of views necessary for a light-field display is extremely difficult. For example, for a display with a 45-degree viewing angle, about 90 views are needed, meaning that one view must be recorded every 0.5 degrees. It is very difficult to construct an acquisition system composed of 90 cameras arranged around a scene every 0.5 degrees, not only because of technical issues (such as camera synchronization and the necessary bandwidth for the video data) but simply because of the size of the camera body itself.

In order to overcome this problem, another solution has been proposed. Instead of recording the scene with an ultra-dense camera array, we recorded the scene with only a limited number of cameras and created the additional necessary views synthetically, based on the captured views.

During the STSM, two multiview sequences, Poznan Blocks and Poznan Services, captured at Poznan University of Technology by a sparse set of high-resolution cameras, were converted to the dense set of views required by a super-multiview display in order to provide a high-quality 3D experience with reasonable depth range and smooth motion parallax. The recorded videos were processed first to create a multiview-plus-depth representation of the recordings, and then to generate the 91 views of the recorded scene required by the used light-field display.

Based on the conducted experiments, the following conclusions have been drawn.

  • A set of views rendered from an estimated 3D model of a captured natural moving scene is of sufficient quality to provide a good 3D impression when displayed on novel SMV displays.
  • Noise and distortion introduced into the light-field images should have negative disparity, so that they appear at the back of the displayed scene, where they are less disturbing.

In future collaboration, we are planning to test more multiview sequences that will be recorded at Poznan University of Technology. We plan to further exchange knowledge about the best possible configurations of a sparse multiview camera system and its application to recording light-field images.



Pradip Paudyal, Creation of a Light Field (LF) database for LF image quality assessment
September 15th – November 15th 2015
Home: Università degli Studi Roma Tre – Engineering Department, Rome (IT)
Host: Mid Sweden University, Sundsvall (SE)

A Light Field (LF) camera can be considered a relay system: the main lens creates a main image in the air, and this main image is then re-mapped to the sensor by a micro-lens array, thus providing multiple views of the scene in a single shot. By exploiting this methodology, camera synchronization issues are reduced. The multiple views of a single scene can be used for many applications: post-shot refocusing, stereo acquisition, extended depth of field estimation, etc. However, LF imaging requires high computational power.
The LF cameras Lytro and Raytrix allow the end user to exploit such a technology. The rapidly improving LF technology and consumer interest in it are pushing the need for quality evaluation of such content. In this scenario, an LF dataset is needed for training, testing and benchmarking LF image processing algorithms and for quality evaluation purposes.
In this STSM, a LF image database has been created. In more detail:

  • A state-of-the-art review of similar image databases has been performed. The outcomes of this analysis show the need for a novel, well-defined, general-purpose LF image database.
  • A database design framework (image content/scene selection criteria, number of reference images/source sequences (SRCs), etc.) has been defined.
  • Based on the defined framework, the images have been captured using a Lytro Illum LF camera.
  • The captured LF images have been analyzed using key image quality attributes: spatial information, colorfulness, and texture. The results show that the considered LF images cover a wide range of attributes.
  • Future work plan has been defined for further collaboration between home and host institutions aiming to conduct a research towards the subjective quality assessment of processed LF images.
  • Finally, the processed LF images with annotated subjective scores will be available for the research community.
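The quality attributes used in the content analysis above have standard definitions that can be sketched in a few lines of numpy. This is an illustrative sketch, not the analysis code used in the STSM: spatial information follows the ITU-T P.910 style (standard deviation of the Sobel-filtered luminance), and colorfulness follows the Hasler-Süsstrunk metric.

```python
import numpy as np

def _sobel_magnitude(luma):
    """Sobel gradient magnitude on interior pixels (numpy-only)."""
    a = luma.astype(float)
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]
          - a[:-2, :-2] - 2 * a[1:-1, :-2] - a[2:, :-2])
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]
          - a[:-2, :-2] - 2 * a[:-2, 1:-1] - a[:-2, 2:])
    return np.hypot(gx, gy)

def spatial_information(luma):
    """SI in the spirit of ITU-T P.910: std-dev of Sobel-filtered luminance."""
    return _sobel_magnitude(luma).std()

def colorfulness(rgb):
    """Hasler-Suesstrunk colorfulness metric on an RGB image."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    rg, yb = r - g, 0.5 * (r + g) - b
    return np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

# A flat gray image has zero SI and zero colorfulness
gray = np.full((64, 64, 3), 128, dtype=np.uint8)
print(spatial_information(gray[..., 0]), colorfulness(gray))  # 0.0 0.0
```

Plotting SI against colorfulness for every source image is a common way to check that a database covers a wide attribute range, as reported above.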


Tomasz Grajek, Compression of Light Field images with the help of 3D HEVC
Period: September 20th – September 26th, 2015
Home: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)
Host: The Polytechnic Institute of Leiria, Leiria (PT)

The purpose of the STSM was to perform experiments related to the compression of 3D holoscopic images in the form of light field (LF) images with currently available compression technology. In particular, a comparison of the performance of state-of-the-art methods with the novel 3D video compression technology, namely 3D-HEVC, was the main goal of this STSM.

As 3D-HEVC was designed to cope with multiview images, light field images in multiview representation have been used. 3D-HEVC has been used in its most efficient mode, HEVC_EXT 2. For the purpose of comparison, several state-of-the-art approaches, including the so-called locally linear embedding-based prediction, have been used.

During the experiments, the following configurations have been tested:

Configurations applied for multiview representation

  • Concatenation of all views into a single “big” image
  • Simulcast (independent compression of each view)
  • Inter-view prediction from the center of the grid
  • Inter-view prediction along the grid
  • Inter-view prediction from the center and along the grid
  • Inter-view prediction from the center, along the grid and with all cross-grid prediction directions
  • Locally linear embedding-based prediction applied to the concatenated “big” image.

Configurations applied for microimages representation

  • As is: the light field image in microimage representation is a single image, so the easiest approach is to compress it with HEVC.
  • Locally linear embedding-based prediction.

The research started with a small light field image in the form of a grid of 3×3 views. It was encoded multiple times with all the above-mentioned configurations. Comparison between the configurations was done using the Bjontegaard metric (BD-rate). After the experiments with a grid of 3×3 views, the experiments were repeated for a grid of 5×5 views.
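The Bjontegaard metric used for these comparisons can be sketched as a minimal numpy implementation of BD-rate: a cubic fit of log-rate over PSNR, averaged over the overlapping quality range. The toy rate-distortion points below are invented for illustration; they are not the STSM results.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate difference (%) between two
    rate-distortion curves, interpolated with a cubic fit of log10(rate)
    over the overlapping PSNR range."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative = test codec saves bitrate

# Toy curves: the test codec needs half the rate at every quality point,
# so the BD-rate should come out close to -50%
r_a = [1000, 2000, 4000, 8000]
r_t = [500, 1000, 2000, 4000]
psnr = [32.0, 35.0, 38.0, 41.0]
print(round(bd_rate(r_a, psnr, r_t, psnr), 1))  # -50.0
```

With four rate points and a cubic fit the interpolation is exact, which is why four-point RD curves are the usual input to BD-rate comparisons.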

3D-HEVC provides about 85% bitrate reduction compared with the simulcast scenario, regardless of the size of the grid used. Interestingly, compression of a light field in microimage representation provides about 34% bitrate reduction in comparison to transmitting all the views in simulcast or as one big image. Specialized tools implemented on top of HEVC, such as locally linear embedding-based prediction, can provide 30% bitrate reduction for the microimage representation and about 3% for the multiview representation.

Based on the conducted experiments, the following conclusions have been drawn.

  • 3D-HEVC provides a clear advantage when compressing light field images in multiview representation.
  • Multiview representation is more efficient for the analyzed image.
  • With the increase of the number of views, compression gains of 3D-HEVC over simulcast increase as well.

In future collaboration, further exploitation of the potential of 3D-HEVC for light field image compression is planned, especially in the context of the JPEG Pleno and MPEG FTV activities. Moreover, further experiments with a greater number of light field images are expected. A joint paper presenting the obtained results is currently in preparation, and joint contributions to the next meeting in February 2016 were initiated.


Krzysztof Wegner, Compression of Light Field images with the help of 3D HEVC
Period: September 20th – September 26th, 2015
Home: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)
Host: The Polytechnic Institute of Leiria, Leiria (PT)

Nowadays, light field sequences are commonly captured by a specialized camera equipped with a microlens array, and many techniques have been developed especially for this type of image. On the other hand, a light field image can be converted into a set of images from adjacent viewpoints, resulting in a 2D rectangular array of views. Such a multiview image can be compressed very effectively by many state-of-the-art compression technologies, e.g. 3D-HEVC.

3D-HEVC is an extension of the MPEG-H Part 2 (HEVC) standard, designed for efficient compression of a small number of linearly arranged views. Efficient inter-view motion prediction with advanced residual prediction allows for almost 40% bitrate reduction in comparison to simulcast (independent) transmission of those views.

In the course of this STSM, the reference implementation of the 3D-HEVC encoder was modified to relax some built-in limitations, such as the number of views that can be encoded simultaneously and the number of Picture Parameter Sets that can be used.

During the experiments, nine different prediction schemes (configurations) were examined for 7×7-view and 9×9-view light field images.

3D-HEVC provides about 92% bitrate reduction compared to the simulcast scenario, regardless of the particular prediction structure used, in both the 7×7 and 9×9 cases. Changing the prediction structure can improve compression performance by 10.49%.

In order to compare 3D-HEVC with other methods, we used the "PlaneAndToy" light field image acquired by a camera with microlenses. It was converted into a grid of 28×28 views, which was further divided into 16 groups of 7×7 views. Each group, containing 49 views, was encoded independently into a single bitstream by the 3D-HEVC encoder.
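The grouping described above can be sketched as a simple tiling of the view grid (the view indices here are hypothetical placeholders for the decoded views):

```python
import numpy as np

# Hypothetical 28x28 grid of view indices (0..783) standing in for
# the views extracted from the "PlaneAndToy" light field.
views = np.arange(28 * 28).reshape(28, 28)

# Split into 4x4 = 16 non-overlapping groups of 7x7 views; each group
# of 49 views is encoded independently into one 3D-HEVC bitstream.
groups = [views[r:r + 7, c:c + 7]
          for r in range(0, 28, 7)
          for c in range(0, 28, 7)]

assert len(groups) == 16
assert all(g.shape == (7, 7) for g in groups)
```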

Even in that case, 3D-HEVC can provide about 60-65% bitrate reduction compared to the simulcast scenario. Changing the prediction structure can provide 12% bitrate savings over the most straightforward prediction structure.

In future collaboration, further exploitation of the potential of 3D-HEVC for light field image compression is planned, especially in the context of the JPEG Pleno and MPEG FTV activities. Further experiments with a greater number of light field images are also expected. A joint paper presenting the obtained results is currently in preparation, and joint contributions to the next meeting in February 2016 were initiated.


Dragana Djordjevic, Modeling of the effects of 3D visual discomfort to human emotional state using data mining techniques

Period: 4th – 13th June 2015
Home: R&D Institute RT-RK, Faculty of Technical Sciences, University of Novi Sad, Novi Sad (RS)
Host: Polytech Nantes/Université de Nantes IRCCYN/IVC, Nantes (FR)

The quality assessment of any multimedia content involves two different types of factors: quality of service (QoS) and quality of experience (QoE). One of the main ideas of this STSM focuses on QoE in the field of 3D video content, because this kind of content can cause visual unpleasantness and discomfort for the viewer even when the quality of service is satisfactory. To avoid such visual discomfort it is necessary to adapt the 3D video content to the viewer's preferences and needs, but such fine-tuning is only possible if the physical, physiological and emotional state of the viewer is monitored under realistic conditions. Since emotional states play an important role in the consumption of 3D video content, they must be taken into account in the design of such content.

Realistic human behavior can only be captured through subjective testing combined with objective measurement of human body parameters. For an experiment collecting objective measurements of the human state to be usable for QoE assessment of 3D video content, the data must be collected from a number of participants of various ages, genders, etc. On the other hand, such testing and experiments can produce a lot of noisy raw data that must be preprocessed before scientific use. This preprocessing and the construction of a useful data set for further manipulation is the first step of data mining.

The main purpose of this STSM is to apply existing knowledge of different data mining / machine learning techniques to 3D video QoE assessment, and to exploit them for mapping visual discomfort and emotional state to different types of 3D content. The final goal is to investigate which data mining technique is the best choice for classifying visual discomfort based on objective measures of the physical, physiological and emotional state of a human, or even for classifying content based on this kind of data. The resulting classification models could then be used in software for recommending or filtering content, first in the TV industry and consumer electronics, and later perhaps in automotive infotainment software.

The work of the STSM was divided into several steps. Some steps were performed during the visit to the host institution, because they required joint work, and the others at the home institution.
First, all necessary discussions about the dataset and experimental setup were carried out. The psycho-physiological data that were considered are: heart rate (HR), electro-dermal activity (EDA), facial electromyography (EMG) and brain activity (BA). The brain activity measure was discarded because the data were too noisy to be useful for data mining. The facial EMG was divided into two measures: the difference of potential between the two electrodes placed on the left lower eyebrow, and the difference of potential between the two electrodes placed on the left zygomaticus major. The heart rate was measured using the HR electrode, and the electro-dermal activity or GSR (galvanic skin response) using the difference between two electrodes labeled EDA. Analysis of the raw data showed that the sample rate of each psycho-physiological measure was very high for every participant (2500 samples per second), so the dataset was reduced to a sample rate of 1 Hz, i.e. one sample per second. The reduction was performed using a low-pass filter. After all the information had been collected, prior-art papers and studies were used to gain a deeper understanding of how the physiological and physical condition of the human body responds to environmental changes and other effects.
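The reduction step can be sketched as follows; the report does not specify the filter, so a one-second moving average (which serves as both the low-pass filter and the decimator) is assumed:

```python
import numpy as np

def downsample_to_1hz(signal, fs=2500):
    """Reduce a raw trace to 1 sample/s: a one-second moving average
    acts as a crude low-pass (anti-aliasing) filter before decimation."""
    n = len(signal) // fs                  # whole seconds available
    return signal[:n * fs].reshape(n, fs).mean(axis=1)

# Hypothetical 10 s of raw EDA-like data sampled at 2500 Hz.
raw = np.random.randn(10 * 2500)
reduced = downsample_to_1hz(raw)
assert reduced.shape == (10,)              # one sample per second
```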
The second step was to perform nonlinear regression, in the same way that multiple linear regression had been used, to discover whether the visual discomfort of 3D content affects the viewer's emotional arousal. The results showed, as expected, that nonlinear regression can indicate that visual discomfort and emotional arousal are correlated.
The third and hardest step was to extract and recognize useful features from the raw data. The mathematical analysis produced a set of parameters for every participant, every video and every objective measure (heart rate, electro-dermal activity, facial electromyography): mean, median, standard deviation, skewness and kurtosis. All of the results were collected with the aim of finding a pattern for every measure and universal features across participants. These results are used for model selection in the final stage.
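The per-signal feature set can be sketched with plain NumPy (the sine input below is a stand-in for a real physiological trace):

```python
import numpy as np

def extract_features(x):
    """Summary features computed per participant, per video and per
    objective measure: mean, median, std, skewness, excess kurtosis."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    return {
        "mean": mu,
        "median": float(np.median(x)),
        "std": sigma,
        "skewness": (z ** 3).mean(),
        "kurtosis": (z ** 4).mean() - 3.0,
    }

# Stand-in trace for one (participant, video, measure) combination.
feats = extract_features(np.sin(np.linspace(0, 2 * np.pi, 1000)))
```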
Finally, the last stage is the selection of data mining techniques for classification, the model setup, and the collection of the output results. This stage is still in progress, due to the short time spent at the host institution, and will be completed at the home institution. The models used for classification and pattern recognition are artificial neural networks, fuzzy logic and support vector machines.


Pradip Paudyal, Watermarking in Light Field Images

Period: 13th – 24th July 2015
Home: Università degli Studi Roma Tre – Engineering Department, Rome (IT)
Host: ISCTE – University Institute of Lisbon, Lisbon (PT)

One of the most promising technologies developed in the imaging field in the last decade is the plenoptic camera. The most appealing feature of this equipment is that even a single snapshot can provide photos where focus, exposure, and even depth of field can be adjusted after the picture is taken.  Among the several challenges posed by the new technology, copyright protection and data security must be considered for allowing a trustful delivering of these data. To this aim, novel watermark techniques need to be devised.

In this STSM, the robustness of a recently proposed watermarking scheme versus state-of-the-art light field compression techniques has been studied. To this aim, a test set based on original and compressed plenoptic images has been built.

The experimental results show that the performance of the proposed watermarking scheme, evaluated by computing the correlation between the original and the extracted watermark, is promising when low compression rates are considered. This result is common to all the considered compression techniques: JPEG, JPEG 2000, HEVC Intra and HEVC SS. In particular, the watermarking scheme is more robust against JPEG 2000 than against the other methods.
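The report does not give the exact correlation metric, so a normalized correlation between watermark bit sequences is assumed in this sketch:

```python
import numpy as np

def watermark_correlation(original, extracted):
    """Normalized correlation between two watermark bit sequences,
    with bits mapped {0,1} -> {-1,+1}; 1.0 means perfect extraction."""
    o = 2.0 * np.asarray(original) - 1.0
    e = 2.0 * np.asarray(extracted) - 1.0
    return float((o * e).mean())

rng = np.random.default_rng(0)
w = rng.integers(0, 2, 1024)                # hypothetical 1024-bit mark
assert watermark_correlation(w, w) == 1.0   # undistorted extraction
```

In such a setup, the score would be computed between the embedded watermark and the one extracted from each compressed image, and plotted against the compression rate.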

As a result of this STSM, the basis for an extended joint cooperation on this topic has been set.


Mattia Bonomi, Emotions extraction from videos using video magnification algorithm and cardiac activity measurements

Period: 16th – 27th February 2015
Home: University of Trento, Trento (IT)
Host: Barcelona Media, Barcelona (ES)

The main purpose of this STSM was to investigate the best techniques to infer people's emotional reactions to multimedia content by means of webcam-based cardiac activity measures such as heart rate (HR) and heart rate variability (HRV). This STSM is the first part of a longer research project aiming at analysing the effectiveness of non-invasive techniques for extracting accurate physiological signals.

The activities carried out are as follows:

(1)     We exploited specific computer vision algorithms to estimate people's physiological signals in a non-invasive way, using just a common camera without any extra equipment.

(2)     With the complete electrocardiogram obtained in this way, we validated the accuracy of the camera-based measure by comparing it to ground-truth physiological measurements obtained using common medical sensors.

(3)     To understand the link between physiological measurements and human emotions, we reviewed papers describing how and when HRV can be a useful measure of human psychological state.

In this STSM we designed specific experiments to collect a complete dataset: the need was to have video recordings of people's faces while they performed stress-inducing tests. At the same time, we recorded cardiac activity using common electrodes. Two experiments were considered: one based on showing images taken from the GAPED dataset, and the other based on the Montreal Imaging Stress Task. Our aim was to stress participants in order to determine whether the HRV changes measured by the sensors were also captured by the camera-based techniques.

After acquisition, we analysed the data with different algorithms, among them colour-based amplification, motion-based amplification and phase-based motion amplification. We defined environmental conditions and algorithm parameters that yield better results.
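A common way to turn such a camera trace into a heart-rate estimate is to pick the dominant spectral peak inside the physiological band; the band limits and the synthetic input below are assumptions for illustration, not details from the report:

```python
import numpy as np

def estimate_hr(trace, fps):
    """Heart rate in bpm: dominant spectral peak of the (e.g. green
    channel) trace within an assumed 0.7-3 Hz physiological band."""
    spectrum = np.abs(np.fft.rfft(trace - np.mean(trace)))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# Synthetic 72 bpm (1.2 Hz) pulse sampled at 30 fps for 20 s.
t = np.arange(0, 20, 1.0 / 30)
hr = estimate_hr(np.sin(2 * np.pi * 1.2 * t), fps=30)
```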

As a result, we have collected and evaluated a first set of signals and opened the path for more complete and extended experiments analysing the effectiveness of non-invasive techniques for extracting physiological signals such as the heart rate.


Evgeny Belyaev, Real-time coding of 3D light fields based on 3D wavelets

Period: March 23rd – 29th, 2015
Home: Tampere University of Technology, Tampere (FI)
Host: Holografika Kft., Budapest (HU)

1. Purpose of the STSM
The purpose of the STSM was to discuss possible modifications of a multi-view video coding scheme based on the three-dimensional discrete wavelet transform (3-D DWT) for real-time 3D light field encoding, storage, transmission and decoding.

2. Description of the work carried out during the STSM
The following work has been carried out during the STSM:

  • Presentation of results in the field of low-complexity multi-view video coding based on 3D wavelets by Dr. Evgeny Belyaev to the Holografika Kft. research team.
  • Discussion of the state-of-the-art approaches for 3D light field coding used by Holografika Kft., based on the JPEG and H.264 image/video coding standards.
  • Rate-quality-complexity comparison of 3D light field coding based on the 3-D DWT and on JPEG/H.264.
  • Discussion of possible modifications of the 3-D DWT video coding algorithm to add some light-field-specific tools, such as view bitstream scalability and partial frame decoding.

3. Description of the main results obtained
During STSM the following results have been obtained:

  • It was shown that the 3-D DWT coding algorithm provides up to 4 times faster encoding than existing software implementations of the H.264 standard (the x264 codec). Therefore, it can be used instead of H.264 to reduce the number of computers needed for real-time 3D light field encoding, or to increase the video frame resolution or frame rate.
  • It was shown that the introduction of an inter-field wavelet transform (4-D DWT) can improve the compression performance of the 3-D DWT codec by 10-25%.
  • It was shown that the 3-D DWT coding algorithm provides decoding speed similar to that of existing software implementations of the H.264 standard (the x264 codec). Therefore, the current version of the 3-D DWT codec has no significant advantage for 3D light field playback.
  • It was found that for content playback on holographic displays it is enough to decode only a small part of a frame at each projector. It was suggested to modify the 3-D DWT algorithm to make such partial frame decoding possible. It was estimated that partial frame decoding can increase the decoding speed by 2-4 times without significant degradation of compression performance, which means that after this modification the 3-D DWT codec will be preferable to H.264 for 3D light field playback.
  • It was found that the 3-D DWT codec has approximately 0.5 s of algorithmic delay caused by frame accumulation at the encoder side; therefore, it cannot be used for holographic conferencing. Taking this into account, the 3-D DWT codec should be modified so that the temporal transform is performed without this delay.
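One level of the 3-D transform at the core of the codec can be sketched with a separable Haar DWT applied along the two spatial axes and the view/time axis (the Haar wavelet and the volume dimensions are assumptions for illustration, not the codec's actual filter bank):

```python
import numpy as np

def haar_1d(x, axis):
    """One orthonormal Haar level along `axis`: pairwise averages
    (low band) and differences (high band), scaled by 1/sqrt(2)."""
    a = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    b = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (a + b) / np.sqrt(2.0), (a - b) / np.sqrt(2.0)

def dwt3(volume):
    """One 3-D DWT level: transform each axis in turn, yielding
    the 8 subbands LLL ... HHH."""
    bands = [volume]
    for axis in (0, 1, 2):
        bands = [part for band in bands for part in haar_1d(band, axis)]
    return bands

# Hypothetical stack of 8 views of 16x16 pixels.
subbands = dwt3(np.random.rand(8, 16, 16))
assert len(subbands) == 8 and subbands[0].shape == (4, 8, 8)
```

Because the transform is orthonormal, the subbands carry exactly the energy of the input volume, which is what makes low-complexity embedded coding of the coefficients feasible.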

4. Future collaboration with the host institution (if applicable)
In future collaboration, we plan to modify the 3-D DWT codec using the STSM results (introduction of the inter-field transform, partial frame decoding and a low-delay temporal transform), to implement an end-to-end system for 3D light field encoding/decoding based on the 3-D DWT codec for Holografika Kft., and to publish the results.


Marcus Bednara, Sensor fusion for 3D content generation on embedded systems

Period: March 9th – 13th 2015
Home: Fraunhofer Institute for Integrated Circuits, Erlangen (DE)
Host: Mid Sweden University, Sundsvall (SE)

The main subject of this STSM was to investigate methods for efficiently executing a sensor fusion algorithm (called time-of-flight resolution upscaling) on an embedded system. This algorithm virtually increases the spatial resolution of a time-of-flight (TOF) range sensor by evaluating texture data from a traditional color image sensor. Originally, the algorithm was optimized for maximum image quality, without embedded-systems or real-time aspects in mind. The upscaling method is based on a global least-squares optimization of a weighted sum of error energy terms. This leads to a large system of linear equations that has to be solved for each input image. The number of equations depends on the resolution of the color sensor and is usually in the range of several million. The matrix of the equation system is sparse but non-square and has a complicated structure, so its factorization (using the QR decomposition method) is computationally intensive and thus currently not applicable even on high-performance embedded systems.

During the STSM, we have developed an advanced approach that is based on a gradient computation before building the system of equations. For the cost of computing the partial derivatives of the error energy sum, we obtain a system of linear equations with the following benefits compared to the original solution:

  • The number of equations is significantly smaller (approximately 2.5 times fewer).

  • The equations matrix has a much simpler structure and consists only of a small number of diagonals, which allows for a very efficient exploitation of the matrix structure during factorization.

  • Presumably, a faster factorization method than QR-decomposition can be used (e.g. Cholesky method). This depends on the conditioning of the matrix which is subject to future work.
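The advantage anticipated in the last bullet can be illustrated on a toy system: a symmetric positive-definite matrix with only a few diagonals admits a Cholesky factorization whose factor (and triangular solves) stays banded, unlike a general QR factorization of the original tall sparse system. The matrix below is a hypothetical stand-in, not the actual fusion system:

```python
import numpy as np

# Toy stand-in for the gradient-derived system: a symmetric
# positive-definite matrix with only three diagonals.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# Cholesky: A = L L^T, then two triangular solves. For a banded A the
# factor L keeps the band structure, so factorization and solves are
# far cheaper than a QR decomposition of a non-square sparse system.
L = np.linalg.cholesky(A)
x = np.linalg.solve(L.T, np.linalg.solve(L, b))

assert np.allclose(A @ x, b)
```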

Moreover, we have discussed several aspects of parallelization of the algorithm which is compulsory for achieving the performance required for a real time implementation. Parallelization can be achieved by several strategies:

  • The input image can be subdivided into approximately equal-sized tiles which are processed simultaneously.

  • In an adaptive approach, the tile size may be varied depending on the local TOF variance. This reduces visible artifacts but requires load balancing between the processor cores and leads to unpredictable timing behavior.

  • The third parallelization strategy requires no tessellation of the input image but relies on the inherent parallelism of linear algebra operations such as vector or matrix-vector arithmetic. Since we operate on large vectors and matrices, this arithmetic can be efficiently distributed among the available processor cores while minimizing inter-core communication.

Besides the pure software solution, we discussed some basic ideas on how reconfigurable logic (an FPGA) can be used to improve performance. At the very least, an FPGA can be used efficiently in the preprocessing, i.e., filtering the sensor input data and computing the weights of the error energy terms. This subject requires further analysis.


Miguel Barreda Ángeles, Cognitive Biases in the Assessment of 3D Image Quality of Experience

Period: May 31st – June 13th, 2015
Home: Barcelona Media, Barcelona (ES)
Host: Polytech Nantes – Université de Nantes (FR)

The goal of this STSM was to initiate an exploration of how well-known cognitive biases that affect everyday decisions may also shape evaluations of image quality in the context of 3D media. This goal comprised three specific objectives:

(1)   To conduct a review of the relevant literature on cognitive psychology, and, more specifically, on prospect and decision theory, as well as in 3D QoE, in order to determine cognitive biases that are likely to affect the assessment of the quality of 3D contents. This objective also includes reviewing whether the existence of those biases has been previously detected in research on QoE, and, if so, how their analysis has been tackled.

(2)   To design and implement one or more experiments for analyzing the effects of cognitive biases on the evaluations of quality.

(3)   To set up the basis for future collaborations between the home and host institutions to investigate the effects of cognitive biases in 3D QoE assessment, which eventually will lead to key improvements in existing methods in QoE research.

In parallel to these objectives, the STSM also served for providing support for the analysis of changes in viewers’ physiological reactions related to emotions and QoE when consuming 3D media contents, conducted by other members of the 3DConTourNet Action.

The work carried out within the STSM allowed obtaining a detailed picture of the state of the art on the effects of cognitive biases in 3D QoE research. The results show that, although this matter has not been systematically analyzed, there is plenty of evidence of many of the cognitive biases described in the psychology literature affecting QoE perceptions. However, some biases, such as the narrowing of attention related to emotional content, have not been explored in the context of 3D viewing. Consequently, during the STSM period a design for an experiment exploring this issue was elaborated, as well as possible new analyses of data from previous experiments regarding this question. Furthermore, this review work showed the relevance of the concept of subjective enjoyment (which can be shaped by the cognitive biases analyzed) to the overall perceived QoE. In order to clarify the relationship between 3D quality, cognitive biases, and enjoyment, tasks related to the measurement of enjoyment and 3D QoE were described and will be proposed for inclusion in the experiments carried out within the work of the Task Force on 3D and Emotions.


Tao Wei, Evaluating the performance of skeleton fusion of Multiple Kinects for 360 degree motion tracking

Period: February 1st – 15th, 2015
Home: Software Research Institute, Athlone Institute of Technology (IE)
Host: Informatics and Telematics Institute, Centre for Research and Technology Hellas (GR)

A multiple-Kinect fusion prototype supporting 360-degree skeleton tracking was developed at Athlone Institute of Technology (AIT). It can estimate the motion and direction of a user moving freely through 360 degrees. Initial tests show that the fusion results are stable and smooth. However, to evaluate the performance of the prototype, it is necessary to compare the fusion results against a gold-standard optical motion capture system, e.g. Vicon.
The aim of the present STSM was to evaluate the performance of the multiple-Kinect fusion prototype developed at AIT using facilities provided by the Centre for Research and Technology Hellas (CERTH).
During the STSM, the following tasks and activities were performed:

  1. Knowledge exchange between the researchers of the host institute and the beneficiary in areas related to multiple Kinects fusion and comparison.
  2. Recording different types of data (motionless, simple movements and dance motions) with the Vicon system and the multiple-Kinect fusion system at the same time.
  3. Acquiring and understanding the Vicon data.

After calibration between the Vicon and Kinect coordinate systems, preliminary results showed that the Vicon data can be transformed into Kinect joint coordinates. As a gold standard, the recorded Vicon data are stable enough for the future evaluation tasks. Moreover, repairing the missed markers of the Vicon system and synchronizing and comparing Vicon and Kinect data were also discussed and identified as future work.
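The calibration between the two coordinate systems amounts to estimating a rigid transform from corresponding joint positions; the report does not name the method, so an SVD-based (Kabsch) solution is sketched here with synthetic points:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ≈ Q_i
    (Kabsch algorithm). P, Q: (N, 3) corresponding points."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper rotation (reflection) if one appears.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

# Hypothetical check: recover a known rotation + translation that maps
# Vicon joint positions into Kinect coordinates.
rng = np.random.default_rng(1)
vicon = rng.random((10, 3))
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
kinect = vicon @ R_true.T + np.array([0.1, -0.2, 1.5])
R, t = rigid_transform(vicon, kinect)
assert np.allclose(vicon @ R.T + t, kinect)
```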
As a result of this STSM, at least one joint paper is expected to be submitted to a multimedia-related conference or journal. Since knowledge and new ideas were exchanged, collaboration on a Horizon 2020 proposal submission was also discussed.


Cristian Perra, Robust coding of holoscopic video for heterogeneous networks

Period: February 16th – 27th, 2015
Home: Department of Electrical and Electronic Engineering, University of Cagliari, Italy (IT)
Host: Instituto de Telecomunicações, Leiria (PT)

Recent advances in holoscopic imaging acquisition and display enable a realistic depth illusion without the need for 3D glasses. However, multimedia delivery services using this type of visual information are still far from acceptable levels of maturity and quality. The huge amount of data comprising holoscopic video signals, and the computational methods required to extract useful information from them (e.g. views, focus planes, depth), require efficient coding combined with specific quality assurance in order to guarantee that relevant light field information is not severely distorted when delivered over noisy channels and/or lossy networks.

The main activities carried out during the STSM are as follows: a) analysis of the Lytro light field camera; b) dataset definition and light field reconstruction; c) definition of experimental scenarios for studying coding performance and robust coding of holoscopic signals for transmission over heterogeneous networks. The first two points are the basis for the development of the third.

The Lytro Illum is a new professional light field camera for creating pictures that can be refocused after shooting. It is the second generation, after the first consumer-oriented Lytro light field camera, and is able to produce almost 4 times as much detail as the original thanks to a new, larger sensor. At the heart of the Lytro Illum is the Lytro light field sensor, composed of a micro-lens array made up of thousands of tiny lenses designed to record and measure light from multiple directions. Developed by Ren Ng, the founder of Lytro, light field technology allows images to be refocused after they are taken, as well as altering perspective and generating 3D images from photos.

A dataset of images has been defined and several pictures have been captured with the Lytro Illum camera. Light field reconstruction techniques have been explored in order to define appropriate methodologies and reference software for further experimentation.

Finally, some experimental scenarios for studying robust coding of holoscopic signals for transmission over heterogeneous networks have been defined and proposed for future implementation of experimental setups.


Janko Calic, Quality of Experience in 3D panoramic photography

Period: July 17th – 31st, 2014
Home: I-Lab, CVSSP, University of Surrey (UK)
Host: Faculty of Technical Sciences, Novi Sad (Serbia)

The STSM is a continuation of the coordinated research activities between the University of Surrey and the Faculty of Technical Sciences, focusing on the quality of experience of 3D photographic content. The host institution representative, Dr Dragan Kukolj, is a key contributor to 3D multimedia QoE research within the Qualinet COST Action (IC1003), while Dr Janko Calic is the leader of the Joint Task Force between the 3DConTourNet (IC1105) and Qualinet COST Actions.

The STSM focused on 3D photography and panoramic photography, whose proliferation has brought a plethora of interesting applications and devices but has not been thoroughly studied. Consumer devices can capture and create these types of 3D content, but their quality of experience has not been addressed. Thus, joint research between the two institutions and the two COST Actions has addressed this problem.

The STSM focused on developing a metric for estimating deformations of the visual content induced either by the capturing device or by the image processing that generates the photographic panoramas.

In order to support this study, a small reference dataset was created during this STSM. Each set comprises:

  • A hi-res single shot of the scene
  • A series of overlapping images (50%) covering the scene
  • A panorama generated from the series of overlapping images above by automated image stitching
  • A panorama of the scene generated by continuous swipe panoramic capture

In addition to the research coordination between the two institutions and COST Actions, this STSM has investigated a number of potential ideas for future project proposals. Two areas of joint research have been identified: a framework for personalisation, contextualisation, optimal delivery and applications of HDR image and video technology; and the design of an entire digital ecosystem of user-generated 3D content.


Aleksander Väljamäe, Studying the emotional modulation of 3D sound-induced circular vection

Period: June 16th – June 30th, 2014
Home: DEP lab, Department of Behavioural Sciences and Learning, Linkoping University (SE)
Host: Perception and Cognition research group, Barcelona Media, Barcelona (ES)

The present STSM is a direct continuation of the previously granted STSM entitled "Studying the emotional modulation of 3D sound-based enhancement of circular vection", carried out during the period of June 16th to June 30th, 2014. That STSM led to the design and execution of the first audio-visual experiment. The current STSM is concerned solely with the auditory aspects of 3D content, namely with spatial sound quality.

The aim of the present STSM is to continue experiments exploring the emotional modulation of 3D sound-induced circular vection (illusory self-motion) through subjective, behavioral and peripheral physiology measures such as self-reports, vection onset times, electrodermal activity (EDA), heart rate (HR) and facial electromyography (EMG).

Unlike the first audio-visual experiment designed in the previous STSM, here we use only auditory rotating scenes of an ecological nature. We use acoustic landmark-type sounds with different emotional valence (e.g. a positive or negative human voice). Each created soundscape has 3 naturalistic sound objects. The sounds were pre-screened before the main experiment to select the most extreme valence conditions. The main source of the emotional sounds was the International Affective Digitized Sounds (IADS) system by Bradley and Lang (2007); however, we also used similar sounds from the FreeSound database.

Experiments in the first STSM used abstract visual content with embedded emotional content (IAPS images), which revealed a possible source of confusion for the participants: in such an experimental design, attention has to be shifted between the vection-inducing stimuli and the emotional images. The new paradigm still assesses 3D content quality using self-motion; however, the vection-inducing stimuli and the emotional content are now joined together.

This STSM can be broken down into several specific objectives:

  • To continue the multisensory experiment that was designed in the previous STSM in June 2014;
  • To analyze the requirements, to design and to implement an auditorily induced vection experiment (either loudspeaker or binaural-technology based);
  • To continue the collaboration on topics related to 3DTV QoE in the frame of the COST Action IC1105, by putting together the experience in research on cognitive and emotional processing of media content of the host institution (Barcelona Media – Innovation Center) and the extensive experience in 3D sound and emotion processing of the home institution (Decision, Emotion and Perception lab, Department of Behavioural Sciences and Learning, Linkoping University).

We also involved an external collaborator, Dr. Takeharu Seno (Faculty of Design, Institute for Advanced Study, Kyushu University, Japan), in the discussions of these experiments, given his experiments on vection-emotion links. Dr. Valjamae also visited Dr. Seno for a three-week period during September 2014. One joint experiment on vection and emotion was executed in Japan, and the journal publication is now under review.


Maciej Kurc, Time-of-Flight camera calibration and data processing

Period: November 3rd – December 14th, 2014
Home: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)
Host: Department of Signal Processing, Tampere University of Technology, Finland (FI)

The goal of the short term scientific mission was to investigate possibilities for calibrating time-of-flight (ToF) range-sensing cameras in order to achieve a full and correct 3D scene representation. The most challenging tasks covered calibration of the distance measurement non-linearity and fusion of ToF distance data with visual data from video camera(s) in order to add color information to the 3D scene representation. Most attention was paid to a video + ToF acquisition system consisting of a single video camera and a ToF camera mounted on a rigid camera rig.

During the STSM period several aspects of the topic were investigated:
– Estimation and correction of the ToF measurement non-linearity.
– ToF camera extrinsic parameter estimation via the Iterative Closest Point (ICP) method.
– Estimation of relative ToF and color camera positions for a color + depth camera system.

The proposed non-linearity estimation procedure consists of two steps.
During the first step, the intrinsic parameters of the ToF camera are estimated using the well-known Zhang's algorithm and a calibration board. These parameters are necessary for determining the camera-to-board relation which, in turn, serves as ground-truth distance information. During the second step, multiple images of the calibration pattern, placed in front of the camera at different distances, are taken. Knowledge of the camera's intrinsic parameters makes it possible to estimate the precise position of each checkerboard corner in 3D space.
Finally, the computed Z coordinates (distances) of the checkerboard corners are compared to their measured counterparts. This yields the relation between the measured distance and the distance computed from the camera model. Once the relation is known, it can be inverted and applied as a transfer function to the measured distance.
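
As an illustration, the inversion step can be sketched as a polynomial fit; the distortion model, distance range and polynomial degree below are synthetic assumptions, not the actual calibration data:

```python
import numpy as np

def fit_tof_correction(measured, ground_truth, degree=5):
    """Fit a polynomial transfer function mapping measured ToF distances
    to the ground-truth distances computed from the calibration board."""
    return np.poly1d(np.polyfit(measured, ground_truth, degree))

# Synthetic example (hypothetical numbers): the sensor over-reports
# distance with a smooth non-linearity.
true_z = np.linspace(0.5, 4.0, 50)                 # metres, from board poses
measured_z = 1.02 * true_z + 0.03 * true_z ** 2    # simulated sensor output
transfer = fit_tof_correction(measured_z, true_z)
corrected = transfer(measured_z)                   # apply as transfer function
max_residual = np.max(np.abs(corrected - true_z))
```

In practice the ground-truth distances would come from the board poses recovered with Zhang's calibration, and the fitted transfer function would be applied per pixel to the raw ToF depth map.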

Data provided by a ToF camera can be seen as a 3D point cloud. Point clouds that come from different ToF cameras should resemble the same scene, even if the cameras differ in their characteristics. The relative positions of two such cameras can be found by aligning their point clouds, which can be done with the Iterative Closest Point (ICP) method.
However, for the ICP method to work, a good starting point (an initial rotation and translation of one point cloud with respect to the other) is required. The idea is to derive this starting point by using feature points found in the ToF infrared image. By establishing correspondences between feature points in different images, it is possible to identify and match the corresponding points in the 3D scene. At least four feature point pairs are required to estimate the relative rotation and translation between the cameras' coordinate systems.
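
A minimal sketch of the closed-form alignment that such matched feature points make possible (the Kabsch algorithm); the point set and transform below are synthetic stand-ins for the matched ToF features:

```python
import numpy as np

def rigid_transform_3d(src, dst):
    """Closed-form least-squares rotation/translation aligning matched
    3D point sets (Kabsch algorithm); usable as the ICP starting point."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

# Four or more matched feature points seen by both ToF cameras.
rng = np.random.default_rng(0)
pts_a = rng.uniform(-1, 1, (6, 3))
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.2, -0.1, 0.5])
pts_b = pts_a @ R_true.T + t_true
R, t = rigid_transform_3d(pts_a, pts_b)
```

The recovered rotation and translation can then seed ICP, which refines the alignment using all points in the clouds rather than only the matched features.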

When considering a single video camera plus single ToF camera system, a problem arises when the relative position of the video camera is to be found for two different system placements. The main idea of the solution is to match feature points between the video camera images while guessing their distances, with the ToF camera serving as a guide for guessing the distance of each feature point pair. Each feature point pair is represented by a single feature point in the 3D space of the reference video camera position. The X and Y coordinates are derived from the feature point coordinates in the image and the known video camera intrinsic parameters. The Z coordinates are drawn randomly from the ToF camera's depth map, using its histogram as a probability distribution function. The feature points are then projected onto the image of the second video camera position and the RMS reprojection error is computed. The relative camera rig rotation is known (from the relative ToF camera parameters) and the relative translation is initialized to a guessed value provided by the user. The reprojection error is then minimized using the Levenberg-Marquardt optimization algorithm. As a result, the feature points' Z coordinates and the video camera's relative translation are found. These parameters are defined up to a scaling factor.
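
A hedged sketch of this optimization step under simplifying assumptions (identity relative rotation, invented intrinsics and synthetic correspondences); it estimates the feature depths and the camera translation by minimizing reprojection error with Levenberg-Marquardt:

```python
import numpy as np
from scipy.optimize import least_squares

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])      # assumed pinhole intrinsics
Kinv = np.linalg.inv(K)
R = np.eye(3)                        # relative rig rotation, assumed identity here

def project(X):
    """Pinhole projection of Nx3 camera-space points to pixel coordinates."""
    x = (K @ X.T).T
    return x[:, :2] / x[:, 2:3]

def residuals(params, uv1, uv2):
    """params = [z_1..z_n, tx, ty, tz]: reprojection error in camera 2."""
    z, t = params[:-3], params[-3:]
    rays = (Kinv @ np.c_[uv1, np.ones(len(uv1))].T).T
    X1 = rays * z[:, None]           # back-project with the guessed depths
    return (project(X1 @ R.T + t) - uv2).ravel()

# Synthetic ground truth to exercise the optimizer.
rng = np.random.default_rng(1)
z_true = rng.uniform(1.5, 3.0, 8)
uv1 = rng.uniform(100, 500, (8, 2))
X1 = (Kinv @ np.c_[uv1, np.ones(8)].T).T * z_true[:, None]
t_true = np.array([0.25, 0.0, 0.05])
uv2 = project(X1 @ R.T + t_true)

x0 = np.r_[np.full(8, 2.0), 0.2, 0.0, 0.0]   # depth guesses + user-provided t
sol = least_squares(residuals, x0, method='lm', args=(uv1, uv2))
```

As noted above, the solution is defined only up to a scaling factor: scaling all depths and the translation by a common factor leaves the reprojection error unchanged.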

During the relatively short STSM period not all possible research topics were addressed; for example, the problem of extrinsic parameter estimation in a video + ToF acquisition system was solved only partially. Many ideas arose while investigating the problem.
Therefore, further collaboration between Poznan University of Technology and Tampere University of Technology is to be expected.


Aleksander Väljamäe, Studying the emotional modulation of 3D sound-based enhancement of circular vection

Period: December 1st – December 15th, 2014
Home: DEP lab, Department of Behavioural Sciences and Learning, Linkoping University (SE)
Host: Perception and Cognition research group, Barcelona Media, Barcelona (ES)

The use of a clearly defined and measurable scenario when studying the perceptual, emotional and cognitive cues contributing to the overall experience of presented multisensory 3DTV content makes it possible to assess the individual contributions of these cues. One such scenario is illusory self-motion (also referred to as vection), where one experiences a sensation of locomotion relative to a "stable" surrounding environment (e.g. a presented rotating visual scene). For example, Dr. Väljamäe's previous work on rotational multisensory vection has shown that spatial sound enhances visually induced vection. Importantly, the quality of the spatial sound and its impact depended on the qualities of the visual stimuli; for example, a smaller field of view increases the influence of 3D sound.

The driving hypothesis of this STSM was inspired by two recent studies by Prof. Takeharu Seno and his team that showed an interaction between the experienced visually induced vection sensation and emotional experience. Given that auditory processing has strong links to emotional processing, we want to examine: a) how the emotional state of users can influence audio-visually induced vection in a horizontal plane; and b) whether emotions can modulate the previously observed 3D sound-based enhancement of visually induced rotation (testing different resolutions of spatial sound).

The aim of the present STSM was to prepare and start a series of experiments exploring the emotional modulation of 3D sound-based enhancement of circular vection through subjective, behavioral and peripheral physiology measures. This objective can be broken down into more specific objectives:

(a) To analyze the requirements for implementing a series of experiments studying interaction effects between the audio-visual quality of vection-inducing 3D scenes and viewers' emotional state;

(b) To design and implement the first experiment that allows exploring this research question;

(c) To establish the basis for future collaboration on topics related to 3DTV QoE and the possible emotional modulation of 3D sound-based enhancement of vection within COST Action IC1105, by bringing together the host institution's experience in research on cognitive and emotional processing of media content (Barcelona Media, Dr. Pereda's group) and the home institution's extensive experience in 3D sound and emotion processing (Decision, Emotion and Perception lab, Department of Behavioural Sciences and Learning, Linkoping University).

During the STSM, we pre-tested and refined the experimental methodology, specifically the visual and auditory stimuli. Pre-tests showed that there are differences between negative, positive and neutral images embedded into the rotating visual stimuli. The experiment was programmed in PsychoPy and linked with a Biopac data acquisition system for recording physiological signals. We are currently collecting data (over 20 participants) and designing the second experiment, which involves emotional spatial sound. We also involved an external collaborator, Prof. Takeharu Seno (Faculty of Design, Institute for Advanced Study, Kyushu University, Japan), in the discussions on the experimental design, given his previous experiments addressing vection-emotion links.


Luis Lucas, Multiview video codec design

Period: May 17th – June 17th, 2014
Home: Instituto de Telecomunicações, Leiria (PT)
Host: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)

In the months before the mission, a depth map coding algorithm was proposed by the applicant. During this period, contacts were made with the research team of the Multimedia Telecommunications Group at Poznan University of Technology, aiming at joint collaborative work to improve and evaluate the algorithm's coding performance. In this context, the main purpose of this one-month STSM was to strengthen the existing collaborative work on depth map coding for efficient compression of 3D/multiview video content.

The proposed algorithm, named Predictive Depth Coding (PDC), was specifically developed to efficiently exploit the characteristics of depth maps, which are mostly composed of smooth areas delimited by sharp edges. At its core, PDC combines a sophisticated intra prediction framework and a straightforward residue coding method with an optimized flexible block partitioning scheme. The performance of PDC was evaluated based on the quality of the views synthesized using the encoded depth maps and the original texture views. The results show higher rate-distortion efficiency for PDC than for the current state-of-the-art depth map coding solution used by the 3D extension of the High Efficiency Video Coding standard (3D-HEVC). Furthermore, for depth map coding only, an average reduction of 25% in computational complexity was observed relative to the 3D-HEVC standard.
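
The flavour of such a partitioning scheme can be illustrated with a toy quadtree that splits depth-map blocks only where a sharp edge is present; the thresholds and data are invented for illustration, and this is not the actual PDC partitioner:

```python
import numpy as np

def partition(block, top, left, min_size=4, max_range=8):
    """Recursively split a depth-map block while it contains a sharp
    edge (large value range); smooth blocks are kept whole. A toy
    stand-in for flexible block partitioning, not the real codec."""
    h, w = block.shape
    if h <= min_size or block.max() - block.min() <= max_range:
        return [(top, left, h, w)]
    h2, w2 = h // 2, w // 2
    leaves = []
    for dy, dx in [(0, 0), (0, w2), (h2, 0), (h2, w2)]:
        leaves += partition(block[dy:dy + h2, dx:dx + w2],
                            top + dy, left + dx, min_size, max_range)
    return leaves

# Synthetic depth map: two smooth regions separated by a diagonal edge.
idx = np.arange(32)
depth = np.where(idx[None, :] > idx[:, None], 200, 50).astype(np.uint8)
leaves = partition(depth, 0, 0)
```

Smooth regions end up as large blocks that are cheap to predict, while the splitting concentrates around the depth discontinuity.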

During this STSM, a scientific paper about PDC was finished and submitted in collaboration with M.Sc.Eng. Krzysztof Wegner from the host institution. A contribution document for the next MPEG meeting, in Sapporo, Japan, in July 2014, was also initiated.

A seminar about the PDC algorithm was given to the research members of Poznan University of Technology who were not familiar with it. Several technical discussions about potential improvements to the PDC algorithm were also conducted with M.Sc.Eng. Krzysztof Wegner, who has deep knowledge of depth map coding techniques and of the 3D-HEVC standard. From these discussions it was decided to investigate the view synthesis optimization (VSO) method for the PDC algorithm, and several implementations of this technique were initiated. Since the VSO method was originally proposed for the 3D-HEVC standard by researchers of the host institution, the applicant had the opportunity to learn a great deal from the technical discussions on VSO and to better understand its impact on the PDC algorithm. The research on VSO in the PDC algorithm is still being conducted and preliminary results are being evaluated.

The use of PDC techniques in the 3D-HEVC standard was also discussed during this STSM. From these discussions additional future work was defined, specifically the implementation of two new techniques in the 3D-HEVC standard: a constrained depth modeling mode and an adaptive use of intra directional modes, both originally proposed for the PDC algorithm.

Overall, this STSM was important to strengthen the collaborative research work previously initiated between the home and host institutions, as well as to plan future joint research.


Antoine Dricot, Subjective evaluation of Super Multi-View (SMV) compressed contents on high-end 3D displays

Period: May 26th – June 13th, 2014
Home: Orange Labs Guyancourt (FR)
Host: Holografika, Budapest (HU)

Efficient video compression of SMV (Super Multi-View) content is a key factor for enabling future 3D video services. An in-depth understanding of the interactions between video compression and display is of prime interest. Orange Labs has deep expertise in video compression, as a major actor in the recent HEVC standardization phase. Holografika provides high-end displays able to present SMV content with high immersion. Consequently, two main goals are foreseen for this STSM:
1. Assess the impact of compression on perceived quality for light field video content.
2. Understand the visual artefacts related to the compression of lightfield video content and lightfield display systems.

Experiments and goals:

1.1. Select efficient coding configurations:
Different encoding configurations lead to very different bitrate/quality trade-offs. Before proceeding with extensive experiments and quality evaluations, it is necessary to select the most efficient coding configuration(s) from a subjective point of view. Basic coding configurations will be tested, varying the GOP structure and/or the number of synthesized views.

1.2. Estimate the range of bitrates acceptable for future light field video content services:
Visualizing SMV content compressed at different bitrates (with the configurations obtained in 1.1) on Holografika's displays will make it possible to determine a range of bitrates that provides "acceptable" quality on light field displays.

1.3. Assess the “level of relevance” of PSNR for lightfield video contents:
For 3D, depth-based rendering and synthesized views make PSNR less relevant than for 2D. The goal is to evaluate how relevant PSNR remains for SMV content when compared to subjective evaluation results.
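
For reference, PSNR is computed from the mean squared error against the reference view; a minimal sketch with toy 4x4 images:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference view and a
    decoded/synthesized view (identical images give infinite PSNR)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4), dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 16          # one corrupted pixel out of 16
value = psnr(ref, noisy)  # about 36.09 dB
```

For synthesized views the question raised above is precisely whether this number still tracks what viewers perceive.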

1.4. Assess the impact of view synthesis on the general quality:
For typical 3D content, the overall quality is not an average of the texture and synthesized view qualities. It is useful to understand how much synthesized views affect the overall quality evaluation (both subjectively and objectively).

1.5. Understand the JNDs (Just Noticeable Difference) and transparency level:
It is of interest to define which bitrate changes are perceptible, and whether PSNR can still be used to reflect certain levels of coding improvement.
In addition, defining a "transparency point", i.e. the lowest bitrate that provides perceived quality similar to the uncompressed sequence, is relevant for future video compression studies.

2. List artefacts specific to compressed lightfield content and displays, refine the methodology for lightfield content subjective quality evaluation:
The goal is to list possible new compression artefacts that affect the specific aspects of visualization of lightfield content like the motion parallax, the perception of depth, etc.

These experiments will give an indication of the feasibility of transmitting light field content over future networks, and of the bitrate reduction that has to be targeted for the next generation of SMV codecs.


Hana Khamfroush, Network Coded Cooperation for Time-Sensitive Content

Period: May 1st – July 31st, 2014
Home: Porto University – IT LAB (PT)
Host: Aalborg University, Department of Electronic Systems (DK)

This STSM was carried out in the Future Vision and Mobile Devices group at the Department of Electronic Systems of Aalborg University (AAU), Denmark, hosted by Prof. Frank Fitzek and Prof. Daniel Lucani. This research group has been very active in the area of multimedia streaming and communication, particularly using network coding techniques. The AAU team has also developed a flexible C++ network coding library for designing and implementing network coded cooperative algorithms and protocols for multimedia broadcast/multicast services (MBMS). These techniques could help to decrease the cost of multisession transmissions, one of the main scenarios for video streaming and 3D video in the future.

This STSM aims at extending the research work carried out by the beneficiary, Hana Khamfroush, in the last years of her PhD at the Instituto de Telecomunicações, Portugal. Her research focuses on the optimal use of network coding for cooperative communications in multicast services, specifically addressing practical heuristics that use Random Linear Network Coding (RLNC) to decrease the cost of packet transmission in such wireless scenarios. The goal of the STSM is to study and develop network coding approaches for delivering time-sensitive content by exploiting cooperation in mobile environments. More specifically, we will focus on using cooperative network coding schemes to increase the QoS of 3D stereoscopic video streaming.
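
The RLNC principle can be sketched in a few lines over GF(2): the sender transmits random XOR combinations of the source packets, and a receiver can decode once it has collected a full-rank set of combinations, regardless of which particular transmissions were lost. The packet contents and sizes below are toy values; a real implementation (e.g. KODO) uses larger fields and optimized code paths:

```python
import numpy as np

rng = np.random.default_rng(42)

def rlnc_encode(packets, n_coded):
    """RLNC over GF(2): each coded packet is a random XOR combination
    of the k source packets; the coefficients travel with the packet."""
    k = len(packets)
    coeffs = rng.integers(0, 2, size=(n_coded, k), dtype=np.uint8)
    coded = (coeffs @ packets) % 2
    return coeffs, coded

def rlnc_decode(coeffs, coded):
    """Gaussian elimination over GF(2); returns the k source packets
    once the received coefficients reach full rank, else None."""
    k = coeffs.shape[1]
    A = np.concatenate([coeffs, coded], axis=1).astype(np.uint8)
    row = 0
    for col in range(k):
        pivot = next((r for r in range(row, len(A)) if A[r, col]), None)
        if pivot is None:
            return None                      # rank deficient: keep listening
        A[[row, pivot]] = A[[pivot, row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]
        row += 1
    return A[:k, k:]

packets = np.array([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]], dtype=np.uint8)
decoded, n_received = None, 3
while decoded is None:                       # request packets until decodable
    coeffs, coded = rlnc_encode(packets, n_received)
    decoded = rlnc_decode(coeffs, coded)
    n_received += 1
```

The cooperative schemes studied in the STSM exploit exactly this property: any sufficiently large set of coded packets, gathered from any mix of neighbours, suffices to recover the content.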

The proposed STSM shall have the following specific goals:

I) Strengthen the collaboration between the beneficiary and the researchers of the host institution, in order to exchange knowledge in the area of network coded cooperative communication for multiple-user streaming scenarios.

II) Implement some of the heuristics proposed by the beneficiary for the aforementioned multicast scenarios on AAU's wireless test-bed, consisting of 60 Raspberry Pi nodes available at the host institute. To implement network coding, the beneficiary is expected to use the KODO C++ network coding library developed by AAU researchers, to speed up the implementation process.

III) Write a joint research paper with the AAU team, including the results of the performance analysis of the proposed heuristics in a real environment.


Janko Calic, QoE aspects of 2D to 3D conversion of photos

Period: April 5th – 20th, 2014
Home: I-Lab, CVSSP, University of Surrey (UK)
Host: Alinari Photographic Archive, Florence (IT)

This STSM addressed challenges of using the photographic data in 3D multimedia systems, and is a continuation of the coordinated research activities between the Centre for Vision, Speech and Signal Processing at the University of Surrey and Fratelli Alinari. Founded in Florence in 1852, Fratelli Alinari is the oldest firm in the world working in the field of photography, the image and communication. In addition to the coordinated research efforts, this STSM coordinated efforts of the Joint Task Force between 3DConTourNet and Qualinet COST actions.

This STSM addressed challenges of utilising 2D photos in 3D multimedia systems. The adopted methodology was user-centric, where users were chosen from a range of stakeholders involved in capture, creation and exploitation of visual media, both 2D and 3D.

Initially, a survey of existing technologies for 2D-to-3D conversion was conducted. During the visit, the focus of the investigation widened to the broader role of photography in 3D multimedia.

Following the technology review, a user requirement study into practices and applications of photography in 3D multimedia systems was conducted, implemented as a series of informal interviews with Alinari staff and clients. These interviews shifted the final focus of the research to the exploitation of photography, especially archived heritage photography, in the creation of new 3D content. After the analysis of stakeholder requirements, the initially narrow focus on the objective evaluation of QoE of 2D-to-3D conversion systems was widened to the challenges and opportunities of photographic data in 3D multimedia systems.

A reference dataset was created, focusing on synthesis of photo-realistic 3D models from 2D photographs.

Finally, a number of potential ideas for future project proposals were developed. These range from hybrid synthesis of historical and present-time photos, aimed at the FET scheme of Horizon 2020, and 3D cultural heritage aimed at the Reflective 7 call of the Horizon 2020 programme, to knowledge transfer ideas closer to the market.


Mitra Damghanian, Evaluation of the plenoptic camera using a model based approach

Period: November 15th – December 15th, 2013
Home: Department of Information and Communication Systems, Mid Sweden University (SE)
Host: Raytrix GmbH, Kiel (DE)

Plenoptic cameras are bringing new possibilities to image capture with the aid of computational photography techniques and the increase in computational power. Evaluating complex capturing systems such as plenoptic cameras is challenging, considering that traditional measurements used to quantify, for example, the spatial resolution of a captured image are ill suited to evaluating the spatial resolution rendered by a computational camera. This STSM focuses on the evaluation of plenoptic cameras using a model based approach.

Performance analysis of plenoptic cameras is necessary to illustrate the capabilities and limitations of the available plenoptic camera implementations. Among the different methodologies, model based approaches with the desired accuracy and acceptable complexity are sought after, as they provide a reliable and straightforward means of analyzing plenoptic camera performance. Model based approaches will ease, and in some cases remove, the need for practical measurements, which are an elaborate and costly way of evaluating plenoptic camera performance in many applications.

The aim of this STSM is to foster collaboration and strengthen existing networks, as well as to exchange knowledge and expertise between MIUN (Sweden) and Raytrix GmbH (Germany). The participating institutions have been very active in the field of plenoptic cameras: Raytrix as the pioneering developer and producer, and MIUN in developing knowledge and providing research and education material in this field.

On the technical side, camera evaluation with respect to high-level performance parameters such as spatial resolution and depth resolution is conducted using both model based and empirical approaches. The R29 plenoptic camera is chosen as the state-of-the-art product, and both the measurements and the model based results will examine the resolution terms as a function of the x, y and z parameters in object space. Comparing the empirical and the model based results will help tune the model as well as define its validity range.

This STSM provides the opportunity to perform a detailed study of state-of-the-art plenoptic camera technology and offers the chance to apply theoretical knowledge to a very interesting practical case. The main outcome will be better insight into important plenoptic camera characteristics such as resolution terms, depth of field, and efficient usage of the captured plenoptic data in the reconstruction stage. Possible publication(s) are also expected as academic results of this STSM.



Sebastian Schwarz, Time-of-Flight depth sensing for 3D content generation

Period: November 3rd – December 13th, 2013
Home: Department of Information and Communication Systems, Mid Sweden University (SE)
Host: Institute of Signal Processing, Tampere University of Technology (FI)

Many image processing applications in manufacturing, security and surveillance, product quality control, robotic navigation, or three-dimensional (3D) media entertainment rely on accurate scenery depth data. Acquiring this depth information is a fundamental task in computer vision, yet complex and error-prone. Dedicated range sensors, such as the Time-of-Flight camera (ToF), can simplify the scene depth capture process and overcome short-comings of traditional solutions.
Stereo analysis is a common approach to scene depth extraction. Feature and area matching between two camera views allows the reconstruction of depth information based on the camera geometry. However, if parts of the scene are occluded in one view, or areas have low or repetitive texture, stereo matching produces erroneous results. Other depth capture solutions, such as structured lighting (e.g. the Microsoft Kinect), can provide reliable scene depth in such cases, but have a limited depth range and suffer from strong inaccuracies at object boundaries. Modern ToF cameras overcome most of these shortcomings: they capture scene depth in real time, independently of texture structure and occlusions. However, they deliver only limited spatial resolution and suffer from sensor noise.
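
For comparison, the triangulation relation underlying stereo depth extraction is simple; the focal length and baseline below are assumed example values:

```python
# For a rectified stereo pair, depth follows from the camera geometry:
#   Z = f * B / d
# with focal length f (pixels), baseline B (metres), disparity d (pixels).

def depth_from_disparity(d_px, focal_px=800.0, baseline_m=0.1):
    """Triangulated depth for a rectified stereo pair; a disparity of
    zero means no correspondence was found (occlusion, flat texture)."""
    if d_px <= 0:
        return None          # matching failed: no reliable depth here
    return focal_px * baseline_m / d_px

z = depth_from_disparity(40.0)   # 2.0 metres for these example values
```

The failure case (no match, hence no depth) is exactly the gap that ToF sensing fills, since its measurement does not depend on texture.
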
In this STSM we combine the ToF expertise within 3D-ConTourNet for a collaborative depth sensing evaluation over a wide range of modern ToF sensors. The participating institutions are highly active in the field of 3D content creation, in particular ToF capture. On the technical side, we bring together the different ToF equipment available in Tampere and Sundsvall and analyze individual characteristics on a common test bed. We hope to gain new insights into important ToF characteristics such as sensor noise, depth precision and operating range, and into how the different camera models are affected. To our knowledge there has been no such analysis yet, and an evaluation over a wide range of state-of-the-art ToF cameras would be an important guideline for future 3D content creation. Furthermore, we aim to create a series of ToF test sequences and make them publicly available. Such sequences are very important for comparing inter-institute research and could become a popular tool for evaluating ToF-related research, e.g. noise filtering, depth upscaling or sensor fusion.
The STSM strengthens the 3D-ConTourNet collaboration focused on 3D content generation (WG1). We hope to extend our joint efforts further to other 3D-ConTourNet working groups, such as 3D media coding (WG2), 3D quality of experience evaluation (WG4) and next generation 3D multimedia (WG6), e.g. plenoptic imaging.
Concluding, we see the highest benefit of this STSM in the personal exchange between host and guest: working together and sharing our different expertise, experiencing new approaches, discussing different ideas, and searching for novel solutions in a less familiar research environment. This can only benefit the creativity of our research.



Emil Dumic, Objective 3D metrics research

Period: October 1st – October 31st, 2013
Home: University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb (HR)
Host: Instituto de Telecomunicações, Universidade de Coimbra, Coimbra (PT)

The STSM was conducted over a period of one month and was hosted by the Instituto de Telecomunicações, Universidade de Coimbra, Coimbra, Portugal. During the mission, ideas were exchanged between researchers of the home and host institutions, providing the basis for future collaboration on objective and subjective metrics research. Equipment of the host institution was also presented to the guest researcher.
The main objectives of the short term mission focused on the study and development of full-reference, reduced-reference and no-reference objective 3DTV quality metrics, as well as subjective assessment on existing databases. Techniques based on multiresolution analysis were also tested. Although several good image and 2D video quality measures have been developed, true 3D video quality measures have been far less researched, especially for larger combinations of different degradation types. Different degradation types, including frame freezing, compression artifacts and depth map losses, were tested using several objective measures and several databases. New no-reference measures were proposed for specific degradation types (frame freezing, depth map losses), and they were also incorporated into other objective metrics.
During the STSM, the following tasks and activities were performed:
I. Comparison of objective quality measures on the 3D Nantes (NAMA3DS1-COSPAD1) database;
II. Experiments with no-reference objective quality measures for frame freeze degradations;
III. Comparison of objective quality measures and their correlations with the subjective assessment values of impaired 3D video sequences, obtained from the University of Coimbra;
IV. Identification and description of research problems to be explored under future collaboration efforts;
V. Planning the publication of the STSM's results in written form.
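
Task III, comparing objective measures with subjective scores, typically reports the Pearson (PLCC) and Spearman (SROCC) correlation coefficients; a sketch with invented MOS and metric values standing in for the Coimbra data:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: mean opinion scores (MOS) for impaired sequences
# and the corresponding objective metric values (e.g. PSNR in dB).
mos = np.array([4.5, 4.1, 3.6, 3.0, 2.4, 1.8, 1.2])
metric = np.array([42.0, 40.5, 38.0, 35.5, 33.0, 30.0, 26.0])

plcc, _ = pearsonr(metric, mos)     # linear correlation
srocc, _ = spearmanr(metric, mos)   # rank-order (monotonicity) correlation
```

A good objective measure yields PLCC and SROCC close to 1 across all tested degradation types, which is exactly what the 3D case makes difficult.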



Pedro Correia, Advanced MDC techniques for 3D and multiview video coding

Period: June 23rd – July 7th, 2013
Home: Instituto de Telecomunicacoes, Coimbra (PT)
Host: Signal Processing Laboratory-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro (BR)

This STSM was carried out in the Signal Processing Laboratory-COPPE at the Federal University of Rio de Janeiro, Brazil, hosted by Professor Eduardo da Silva. This laboratory has been quite active in the signal processing area, particularly in image and video coding, and has developed several highly efficient image and video compression techniques based on novel algorithms. Such algorithms present an alternative to transform-based encoders, achieving rate-distortion performance competitive with that of hybrid video codecs, including the H.264/AVC standard. These techniques can be useful for future compression standards, namely in 3D and multiview video coding.

This STSM extended the research work carried out by the beneficiary, Pedro Correia, in the last few years at the Instituto de Telecomunicacoes, Portugal. His research focuses on Multiple Description Coding (MDC) for advanced video coding, specifically addressing MDC coding efficiency, error resilience and rate control methods. New 3D-MDC methods for resilient video transmission over future media networks were investigated, to be developed in the near future.

The present STSM addressed the following main areas:

(i) dissemination and knowledge exchange between the beneficiary and researchers of the host institution in fields related to advanced MDC techniques and new paradigms in video coding algorithms. This is also intended to increase mutual awareness of recent developments and results achieved by the researchers on both sides.

(ii) technical meetings with researchers of the host institution, with the aim of defining possible approaches to extending current MDC techniques to 3D and multiview video coding. This objective includes the possible definition of a research direction for multiple description scalar quantisation (MDSQ) applied to the forthcoming HEVC multiview extension.

(iii) investigation and definition of viable approaches to designing MDC rate adaptation schemes for 3D and multiview coded video subject to perceptually driven factors. This includes multiple description splitting schemes and unbalanced MDC operating directly on 3D video streams coded with the HEVC extension.
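
A minimal illustration of a description splitting scheme of the kind mentioned in (iii): odd/even frame splitting into two descriptions, with trivial concealment when one description is lost. This is a toy sketch, not the investigated 3D-MDC method:

```python
import numpy as np

def mdc_split(frames):
    """Toy two-description splitting: even-indexed frames go into one
    description, odd-indexed frames into the other."""
    return frames[0::2], frames[1::2]

def mdc_reconstruct(desc_even, desc_odd=None, n_frames=None):
    """If one description is lost, conceal the missing frames by
    repeating the nearest received frame (zero-order hold)."""
    if desc_odd is not None:
        out = np.empty((len(desc_even) + len(desc_odd),) + desc_even.shape[1:])
        out[0::2], out[1::2] = desc_even, desc_odd
        return out
    return np.repeat(desc_even, 2, axis=0)[:n_frames]

frames = np.arange(8).reshape(8, 1)          # stand-in for 8 video frames
even, odd = mdc_split(frames)
full = mdc_reconstruct(even, odd)            # both descriptions received
degraded = mdc_reconstruct(even, n_frames=8) # odd description lost
```

The appeal of MDC is visible even in this toy: losing one description degrades quality gracefully instead of interrupting the stream.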



Miguel Barreda Ángeles, Exploring the effects of visual discomfort on 3DTV viewers’ emotional experience

Period: June 17th – July 1st, 2013
Home: Barcelona Media – Innovation Center, Barcelona (ES)
Host: Ecole Polytechnique de l’Université Nantes, Nantes (FR)

The general aim of this STSM was to initiate an exploration of how visual discomfort can affect viewers’ emotional experience while watching 3DTV. Much of the research on 3DTV Quality of Experience has dealt with the effects of image features on users’ visual discomfort. Nevertheless, the relationships between visual discomfort and other high-level factors, such as the viewers’ emotional reaction, have not yet been systematically explored. Since emotions play a key role in media reception processes, and especially in the consumption of entertainment content, this question deserves to be addressed. Thus, the specific objectives of the STSM were: (a) to analyze the technical and experimental requirements needed to implement an experiment on the effects of visual discomfort on viewers’ emotions; (b) to design and implement an experimental design allowing exploration of the interaction between comfort and emotional factors; and (c) to establish the basis of future collaborations on this research topic between the home and host institutions.

The STSM was conducted over a period of two weeks in June 2013 and was hosted by the Image and Video Communication Research Group at the Institut de Recherche en Communications et Cybernétique (University of Nantes, France). The work carried out during the STSM focused on the following tasks. First, a review of the literature on ways to elicit and measure emotions during television watching was conducted. Then, we specified an experimental design in which participants would watch a series of contents while self-reported and psychophysiological measures of emotions are registered. The proposed experiment is a mixed within-subjects design with two independent variables: content type (calm/arousing) and visual discomfort level (low/middle/high). Short arousing sequences were selected and extracted from commercial stereoscopic movies, while calm sequences were obtained from a publicly available database. Different types of distortions (parallax increase, Gaussian blur, video coding) were introduced into the contents to manipulate the levels of visual discomfort experienced by users. Finally, methods to measure emotional reactions were selected, including self-reported questionnaires (SAM), peripheral psychophysiological measures (electrodermal activity, heart rate and facial electromyography) and brain activity measures (EEG, ERP). Available applications to record and analyze these measurements were explored, obtained and adapted to the particular needs of the proposed experiment.

As a result of this work, we now have a first experimental design, a set of materials, and a measurement and analysis protocol with supporting tools, which will allow the experimental sessions with participants to be conducted in the coming months. Further collaboration between the host and home institutions will include the analysis of the resulting data, and may also allow different aspects of this research topic to be explored. Furthermore, the possibility of starting to build an emotional stereoscopic content database is also being considered.



Caroline Conti, Application of High-Efficiency Video Coding on Light-Field 3D Video Content

Period: June 3rd – June 28th, 2013
Home: ISCTE – University Institute of Lisbon, Lisbon (PT)
Host: Holografika, Budapest (HU)

One of the main challenges for the 3D light-field imaging approach in providing 3D content with adequate resolution lies in the massive amount of visual information involved. For instance, as opposed to the 2 views transmitted in current 3DTV systems, Holografika’s displays require the equivalent of 100+ views as input. Consequently, adequate coding tools are essential for efficient transmission and storage of this large amount of data. In this context, the main purpose of this STSM was to exchange experience and knowledge of different 3D light-field video representations and compression schemes (namely, 3D holoscopic and Holografika’s 3D light-field content), as well as to study and evaluate alternative coding tools based on High Efficiency Video Coding (H.265/HEVC) for efficient compression of Holografika’s 3D light-field video content.
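
To put the data volume in perspective, a rough back-of-the-envelope calculation is instructive. The numbers below (100 views at 1920×1080, 8-bit 4:2:0, 30 fps per view) are illustrative assumptions, not the actual input format of Holografika’s displays:

```python
# Raw data rate for a hypothetical 100-view light-field input.
# Illustrative parameters only: 1080p, 8-bit 4:2:0, 30 fps per view.
views = 100
width, height = 1920, 1080
bytes_per_pixel = 1.5          # 4:2:0 chroma subsampling, 8 bits per sample
fps = 30

bytes_per_frame = width * height * bytes_per_pixel
raw_gbit_per_s = views * bytes_per_frame * fps * 8 / 1e9
print(f"{raw_gbit_per_s:.1f} Gbit/s uncompressed")  # ~74.6 Gbit/s
```

Even under these modest assumptions the uncompressed rate is tens of gigabits per second, roughly 50× that of a two-view 3DTV signal, which is why dedicated compression tools are indispensable.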

During her stay at Holografika Kft, Caroline Conti had the opportunity to become acquainted with the light-field technology used by Holografika and to better understand the requirements for efficient compression of this type of 3D video content. She presented the state of the art in 3D video coding technologies to the host researchers and engineers, and a discussion was held with Tibor Balogh and the research and development group to propose possible coding solutions to be studied and evaluated. From this discussion, it was decided to use the multiview video (MVV) representation plus depth/disparity information and to exploit multiview geometry, notably the geometric relation between the disparity vectors in different views under a parallel camera setup.

Two coding schemes were proposed and implemented on top of the HEVC reference software. In the first scheme, the motion estimation process is replaced by a direct 3D geometry-based disparity vector calculation, in which a disparity vector is derived from the depth map for each 4×4 texture block. Then, when coding an enhancement view, all the prediction block (PB) partition patterns available in an HEVC Inter coded frame are enabled. The goal was to assess how close this geometric disparity vector calculation comes, in terms of rate-distortion performance, to conventional motion estimation. In the second scheme, disparity information is coded and transmitted only for the Intra coded base view; for this, disparity vectors are included in the prediction information of the Intra coded frames. For the Inter coded views, the disparity information is derived from the disparity map of the base view using multiview geometry. An early Skip mode is therefore used for blocks where a valid disparity vector can be derived. On the other hand, for the missing blocks (i.e., blocks lying in occluded areas, for which no disparity vector can be calculated), all prediction modes available in an HEVC Inter coded frame are allowed. Preliminary results showed that the second coding scheme is more advantageous, as it exhibits less residual information, notably in occluded areas. Moreover, alternatives to improve the performance of the second coding scheme were also discussed and identified as future work.
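
The per-block derivation can be sketched as follows. This is a simplified illustration, not the actual HEVC integration: for a parallel camera setup the disparity is purely horizontal and follows d = f·B/Z, and blocks with no valid depth (here marked by zero depth) correspond to the occluded areas for which the full set of Inter prediction modes remains enabled. The function name and the zero-depth convention are assumptions for this sketch:

```python
import numpy as np

def block_disparity(depth_map, f, baseline, block=4):
    """Derive one horizontal disparity vector per 4x4 texture block.

    Illustrative sketch only. Assumes a parallel camera setup, so
    disparity d = f * B / Z (f: focal length in pixels, B: baseline,
    Z: depth). Zero depth samples are treated as occluded/missing.
    """
    h, w = depth_map.shape
    disp = np.zeros((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            z = depth_map[by*block:(by+1)*block, bx*block:(bx+1)*block]
            valid = z[z > 0]            # keep only samples with known depth
            if valid.size == 0:
                # no valid vector: leave the block to the regular
                # Inter prediction modes (occluded area)
                disp[by, bx] = np.nan
            else:
                disp[by, bx] = f * baseline / valid.mean()
    return disp
```

In the second scheme described above, a finite entry in this map would trigger the early Skip decision, while a NaN entry would fall back to the full HEVC mode decision.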

Overall, the work done within the scope of this STSM was valuable for both the Applicant and the Host: knowledge and new ideas were exchanged, and future collaborations were discussed.



Emilie Bosc, 3D Quality of Experience: a focus on visual discomfort

Period: June 2nd – June 8th, 2013
Home: Ecole Polytechnique de l’université Nantes, Nantes (FR)
Host: Università degli Studi Roma Tre – Engineering Department, Rome (IT)

The success of 3D video applications, namely 3D Television (3DTV) and Free-viewpoint Video (FVV), depends on the ability of 3D systems to provide high-quality content. The 3D video community is therefore investing considerable effort in developing reliable assessment tools. A known issue is the occurrence of visual discomfort in stereoscopic sequences. This short-term mission aims at investigating the factors causing visual discomfort in stereoscopic video sequences. The mission is hosted by the Telecommunication Lab, COMLAB, University of Roma Tre.

We plan to focus on binocular-rivalry-related factors in order to improve objective assessment tools for stereoscopic content. In recent months, we have already started exploring the assessment of visual discomfort in stereoscopic content by considering factors such as disparity, planar motion, and in-depth motion in stereoscopic sequences. Considering additional factors contributing to visual discomfort, such as those related to binocular rivalry, is expected to improve objective assessment tools for stereoscopic sequences.

To do this, typical binocular rivalry conditions must first be determined so that they can be modeled correctly; we will rely on existing subjective databases for this determination. Second, experiments should be set up before the face-to-face meeting, which aims to facilitate the scientific discussions and the analysis of our results. In other words, we plan to include a binocular rivalry term in an existing visual discomfort model (which already includes disparity, planar motion, and in-depth motion).
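
As an illustration of what adding a rivalry term to such a model could look like, consider a simple linear pooling of the factors. The feature set mirrors the one mentioned above, but the function, normalisation, and weights below are invented placeholders, not the actual model used by the home institution:

```python
def discomfort_score(disparity, planar_motion, in_depth_motion, rivalry,
                     weights=(0.4, 0.2, 0.3, 0.1)):
    """Toy linear pooling of visual-discomfort factors.

    The first three features correspond to those already in the
    existing model (disparity, planar motion, in-depth motion);
    `rivalry` is the added binocular-rivalry term. All features are
    assumed normalised to [0, 1]; the weights are arbitrary
    placeholders that would in practice be fitted against subjective
    scores from the existing databases.
    """
    features = (disparity, planar_motion, in_depth_motion, rivalry)
    return sum(w * f for w, f in zip(weights, features))
```

The point of the sketch is structural: once rivalry is expressed as a measurable feature, extending the model reduces to adding one term and refitting the weights on subjective data.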

The expected benefits of our collaboration are a better understanding of the human visual system in stereoscopic viewing and new proposals for objective quality assessment tools for stereoscopic content.



Klemen Peternel, Contextual aspects of Quality of Experience (QoE)

Period: March 1st – March 29th, 2013
Home: Faculty of Electrical Engineering, University of Ljubljana, Ljubljana (SI)
Host: Tampere University of Technology, Tampere (FI)

Klemen Peternel spent one month (1.3.2013 – 29.3.2013) at Tampere University of Technology (Finland) on a short-term scientific mission under the supervision of Dr. Satu Jumisko-Pyykkö. The aim of this STSM was to develop the contextual data sets needed to give a full description of context from a human perspective. The work covers the modeling of context, the methodological development of methods to combine sensor data with subjective data, and experimentation on the topic. While this is still early-stage research, they continued to build a solid foundation for future work through the following steps:

1. Research on related existing work, focusing on user context within various scenarios of content consumption and well-being (e-Health).
2. Identification of relevant data types based on the most studied components of context.
3. Implementation of the defined data sources.
4. Definition of experimental plan.

During the one month of this STSM they investigated and prepared tools that will allow them to continue their research by performing experiments with participants. First results will be available in autumn 2013, followed by analysis of the data obtained in the experiments. They have additionally identified a broad spectrum of opportunities for future collaboration, including a return visit from Dr. Jumisko-Pyykkö. Klemen Peternel's overall experience of the visit was overwhelmingly positive and met all his prior expectations. Dr. Jumisko-Pyykkö was a great host and made him feel very welcome. After this exchange, Klemen Peternel expressed that he understands even better how important it is to participate in programs like the STSM.



Nelson Francisco, Video codec design and assessment collaboration

Period: January 22nd – February 19th, 2013
Home: Instituto de Telecomunicações, Leiria (PT)
Host: Multimedia Telecommunications Group, Poznan University of Technology, Poznan (PL)

The main objectives of this STSM were not only to introduce some of the compression techniques developed at the home institution, the Instituto de Telecomunicações (Portugal), to the host institution's researchers, but also to learn from their expertise in areas such as multiview video compression, acquisition, and quality assessment.

In the last few years, researchers from the Instituto de Telecomunicações have developed several non-standard, pattern-matching-based image and video coding algorithms able to outperform the compression efficiency of state-of-the-art transform-based codecs, including H.264/AVC. This STSM was an opportunity to disseminate and discuss those innovative techniques and to obtain feedback and contributions from a highly skilled research staff, which has been collaborating with the MPEG group in the development of the latest video compression standards.

For that purpose, two seminars were conducted during this STSM, focused on the Multidimensional Multiscale Parser (MMP) algorithm and its wide range of applications in data compression. These seminars were open not only to the host institution's researchers but also to students from the Poznan University of Technology. Several technical meetings with researchers from the host institution complemented these activities, contributing innovative proposals that further increased the performance of some of the MMP-based video codecs.

This STSM was also an opportunity to learn 3D video signal quality assessment and acquisition techniques. This included an introduction to the operation and setup of the multi-camera array developed at the host institution, which has contributed several 3D test sequences to the research community.

As a result of this STSM, a new collaboration was started between the two institutions, aiming at the development of a new depth map encoding algorithm for multiview video coding applications. This collaboration is expected to combine the host institution's existing knowledge of the 3D-HTM reference software with the new still-image depth map encoding approach developed at the home institution. This technique, based on flexible hierarchical prediction and quantization, has already been evaluated for still images, with promising results. Its integration into the 3D-HTM reference software will allow its performance to be further evaluated for multiview video compression. The results of this collaboration will be considered either for submission as a 3D-HEVC standardization proposal or for publication in a peer-reviewed journal or at a high-impact conference.