When researchers attend an international scientific conference, they often miss interesting presentations held in parallel and don’t always have time to read all of the interesting papers. The main motivations behind our webcasting company, Klewel, were to go beyond text-based conference proceedings to offer a user-friendly way to browse and replay presentations, access slides, and search for specific information within a recording. We wanted to automate slide exchange and provide continued access to conference presentations. As the project got under way, the team came together, and eventually an innovative webcasting solution saw the light of day. (For more information on Klewel and its product offering, see the related description below.)
But how exactly did we move from research to technology?
Creating the Right Environment
Everything started in 2002 at the Idiap Research Institute, when Hervé Bourlard, the Idiap director, attracted several major projects funded by the Swiss National Science Foundation and the European Commission. As a result, many young and enthusiastic multimedia developers and researchers (including three of us: Jean-Marc Odobez, Alessandro Vinciarelli, and Maël Guillemot) joined Idiap and formed a highly multidisciplinary research environment.
As part of a 12-year project funded by the Swiss National Science Foundation, Bourlard defined two main lines of focus:
- develop computer systems capable of understanding humans, their behavior, and their interactions through the multimodal analysis of signals captured with multiple sensors; and
- develop systems and infrastructure for effective access to large repositories of data resulting from human interactions and activities (such as meetings and presentations).
The project, called IM2 (Interactive Multimodal Information Management), was supported through the Swiss National Centre of Competence in Research.
An important outcome of IM2 was the Smart Meeting Room (SMR), a facility that could host, record, index, and store meetings involving up to four people (Ref 1). In particular, the SMR allowed Idiap to collect numerous meeting recordings that, through fully automatic processes, were first semantically structured and then made browsable and searchable via advanced computer interfaces. Given that meetings take up a large fraction of many people’s work time, and given that meeting content must be analyzed to capitalize on enterprise knowledge, the European Commission further supported related research activities through projects such as those that developed the Multi-Modal Meeting Manager and Augmented Multi-party Interaction.
It is in this scientific environment that our attention was directed toward oral presentations in which one speaker addresses the audience with the help of projected slides. We observed that such presentations were not only a natural form of communication and an efficient way to convey information but also sufficiently structured and organized, making them suitable for effective computer processing. These simple observations were the first step toward the development of Klewel’s core technologies.
The next steps required multidisciplinary research, real-world prototyping and testing, and collaboration. We also had to find ways to understand client needs and then align those needs with research and development activities.
Klewel
Klewel offers a comprehensive webcasting solution (referred to as Triskel), including an audio-visual capture station (see Figure A) linked to an online hosting, editing, publishing, and browsing platform. The latter automatically processes recorded audio and video signals: text is automatically recognized and indexed from slide images, transcripts are extracted from audio files, and the slides are synchronized with the presentation.
The platform offers access to recorded content ubiquitously on desktop computers, mobile phones, and tablets. Video owners can hide slides and trim the beginning and end of a recording in a user-friendly and reversible way. As soon as editing actions are performed, users get a preview on the desktop version and the resulting mobile-specific videos are automatically regenerated. Triskel’s user-friendly interface lets customers independently manage the flow from the capture to the online distribution, without needing a degree in media editing.
Following an agile development methodology, the Triskel product is continuously enhanced and tailored according to client needs. Triskel’s strength lies in its fast online distribution mechanism, which makes recorded content automatically available only a few hours after the recorded event. Clients can also purchase Klewel’s capture station and record their events independently, while relying on Klewel’s Web services for automatic processing and content publishing. Klewel’s target clients include audio-video agencies, companies organizing private and public conferences, international organizations and associations, universities interested in webcasting their lectures, and research labs or training agencies specialized in blended learning.
Multidisciplinary Research
The three of us who joined Idiap eventually became Klewel co-founders, but originally, we were working individually on various technologies. Once those technologies were integrated into a single system, they gave rise to the Presentation Acquisition System (PAS), the earliest prototype of Klewel’s commercial solution.
Guillemot was investigating approaches for segmenting videos into semantically coherent segments, possibly organized hierarchically, that appeared to be ideal for structuring the sequence of slides, videos, and animation effects used in presentations (Ref 2–4). In parallel, Odobez was developing an approach to automatically extract text from pictures, which proved to be particularly effective on projected slide images (Ref 5). Finally, Vinciarelli was working on technologies for the indexing and retrieval of noisy texts, that is, digital texts that include errors resulting from automatic transcription processes such as automatic speech recognition (Ref 6). This component was a natural fit for indexing and searching both the text extracted from the slides and the automatically transcribed speech.
Figure 1. The conceptual scheme of the Presentation Acquisition System highlights the importance of multidisciplinary research. PAS combines Maël Guillemot’s research into segmenting videos, Jean-Marc Odobez’s approach to automatically extracting text from pictures, and Alessandro Vinciarelli’s study of technologies for indexing and retrieving noisy texts.
Figure 1 shows how we integrated these three technologies into the PAS (Ref 7). The idea was simple yet effective: first, segment the video stream that a computer sends to the projector into intervals corresponding to individual slides using video-shot-change technology (Ref 2). Then, convert each slide’s video segment into a digital (noisy) text using an image/text character-recognition system (Ref 5). Finally, index the text resulting from the previous steps, making it retrievable using noisy-text categorization and information retrieval techniques (Ref 6,8). In this way, there was no need to ask speakers for their slides, and slides (or anything else projected through the beamer) could become searchable access points for the audio-visual presentation content.
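The three PAS steps described above (segment, recognize, index) can be illustrated with a minimal Python sketch. It is not Klewel's implementation: plain frame differencing with a fixed threshold stands in for the video-shot-change technology, the character-recognition step is a caller-supplied function (here a hypothetical `fake_ocr`), and the index is a simple inverted index from words to slide start times. All names, the threshold, and the toy frames are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[int]  # flattened grayscale pixels, a stand-in for a real video frame


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def segment_slides(frames: List[Frame], threshold: float = 20.0) -> List[Tuple[int, int]]:
    """Split the frame sequence into (start, end) index intervals, one per slide.

    A new interval starts whenever two consecutive frames differ by more than
    `threshold` (a hypothetical value; a real system would tune it).
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i - 1))
            start = i
    segments.append((start, len(frames) - 1))
    return segments


def build_index(
    frames: List[Frame],
    ocr: Callable[[Frame], str],
    fps: float = 1.0,
    threshold: float = 20.0,
) -> Dict[str, List[float]]:
    """Map each recognized word to the start times (in seconds) of the slides containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, _end in segment_slides(frames, threshold):
        text = ocr(frames[start])  # possibly noisy text for the slide's first frame
        for word in set(text.lower().split()):
            index[word].append(start / fps)
    return dict(index)


# Toy data: two "slides" shown for two and three frames respectively.
slide_a = [0] * 16
slide_b = [200] * 16
frames = [slide_a, slide_a, slide_b, slide_b, slide_b]


def fake_ocr(frame: Frame) -> str:
    """Hypothetical recognizer used only for this example."""
    return "Smart Meeting Room" if frame[0] == 0 else "Noisy text retrieval"


index = build_index(frames, fake_ocr)
print(segment_slides(frames))  # [(0, 1), (2, 4)]
print(index["noisy"])          # [2.0]
```

Looking up a word returns the times at which a slide containing it first appears, which is what lets a slide act as an access point into the recording. Handling recognition errors, the focus of the noisy-text work cited above, is deliberately left out of this sketch.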
Figure 2. The early search interface for PAS. Early prototyping and testing in real-usage contexts is crucial when it comes to improving core technologies.
Klewel thus strongly relied on collaboration with relevant multidisciplinary research partners in developing and improving its core technologies, based on the latest research outcomes. Yet just as important was early prototyping and testing in real usage contexts; connecting with potential clients and promoting the prototype were crucial to Klewel’s development.
Real-World Prototyping and Testing
From the beginning, even before the idea of creating a company took shape, the three co-founders seized every opportunity to test the early prototype (PAS) in real situations and different contexts (see Figure 2). The team recorded and experimented with different events, both small and large, such as Idiap weekly internal talks and locally organized conferences, including Machine Learning for Multimodal Interaction (MLMI 2004). Then came the opportunity to webcast the ACM UIST 2006 in Switzerland. ACM CHI 2007 was the next recorded conference, held in San Jose, California. The early system was used in four parallel sessions over the four conference days.
After CHI 2007, we three co-founders decided to create the company Klewel, with the administrative support of The Ark Swiss Foundation. Then, in 2009, following experiments with the first research prototype, Klewel developed its standalone player and presentation browser, showing recorded content along with synchronized slides. By clicking on a slide, users could jump to the corresponding recording instance and search the recording slides and transcript for a particular word. This initial solution started as a Master’s degree project conducted by Vincent Bozzo, who has been leading Klewel’s technical development.
Following this first commercial solution, Triskel v1.0 was deployed. It consists of a comprehensive webcasting solution, including an audio-visual capture station communicating with an automatic back-end processing system and online editing and publishing platform.
Figure 3. Klewel’s public conference webcast portal. Its webcasting solution, Triskel, initially centered on each recorded webcast, without the ability to search across different talks or webcasts sharing similar topics. Klewel’s agile development team continues to improve Triskel based on expressed client needs.
Collaboration
Active collaboration with relevant research partners let us exploit cutting-edge research to improve Triskel’s core technologies (including its optical character recognition; recommendation and speech recognition algorithms; and indexing, search, and innovative visualization techniques). Relevant research projects are particularly important for small companies that cannot afford to have their own research and development team.
Klewel strongly contributed to the inEvent European project — a project that developed a plethora of visual and audio analytics algorithms callable via Web services. In the context of inEvent, Klewel experimented with new visualization paradigms to navigate across recordings. The evaluation of these innovative paradigms was conducted by HCI experts from the project consortium.
In the context of a national research subproject (IM2 Automatic Recommendation of Lectures and Snippets), Klewel benefited from the expertise of one Idiap research team led by Andrei Popescu-Belis. The team helped develop a recommender system able to suggest complete talks as well as talk snippets based on content-based similarity.
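To make "content-based similarity" concrete, here is a minimal Python sketch of one common way to rank talks against a given talk: cosine similarity between term-frequency vectors built from each talk's text (for example, its slide text or transcript). This is an illustrative assumption, not the method developed by the Idiap team; the function names and the toy talk texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies for a talk's text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(talks: Dict[str, str], query_id: str, top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the `top_k` talks most similar to `query_id`, best first."""
    query = term_vector(talks[query_id])
    scored = [
        (talk_id, cosine(query, term_vector(text)))
        for talk_id, text in talks.items()
        if talk_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus: three talks described by a few words each.
talks = {
    "t1": "speech recognition for meetings",
    "t2": "meetings speech analysis",
    "t3": "graph visualization of webcasts",
}
print(recommend(talks, "t1", top_k=1)[0][0])  # t2
```

The same scoring could be applied to shorter units of text to suggest snippets rather than complete talks; a production system would typically also weight terms (for instance with TF-IDF) and filter stop words.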
Addressing Client Needs
As its solution was adopted by more clients, Klewel invested in further development to enhance the solution and add new features. Klewel’s agile development team continues to improve Triskel’s online platform based on expressed client needs. An automatic talk recommendation service and a fully scalable search engine were integrated to let users explore transcribed audio content, textual slide representation, and metadata.
Today, we have a cutting-edge cloud-based solution that is complete, simple to use, and built incrementally. Development continues based on constructive partnership with our clients, to best serve their evolving needs.
Figure 4. The interface of the Klewel product.
Triskel’s webcasting solution initially centered on each recorded webcast, without the ability to search across different talks or webcasts sharing similar topics. Although this model served the needs of those hosting private recordings and webcasts, a more sophisticated solution was needed to satisfy clients hosting public events and seeking to improve visibility and knowledge reuse. A new Web-based portal is now available online (www.conference-webcast.com), where users can search for a keyword within a talk or across talks of the same or different public webcasts. This interface is responsive and accessible from any device (see Figure 4), letting users play back the video along with the slides in sync, and browse quickly through the slides of a lecture in the same way as they would through the chapters of a book.
Users can limit the scope of the search to the talks of a specific private webcast or extend it to the pool of public webcasts. The latest online portal also offers recommendation based on content similarity to increase the visibility and findability of webcasted recordings and increase the return on investment. Social media features, such as sharing and quick commenting, are also offered to encourage user interactions around recordings of interest.
Aligning R&D with Client Needs
Aligning our development plans with client needs also constituted an important success factor. Consequently, several features are planned as part of our research and development roadmap.
First, we plan to adopt search engine optimization (SEO) techniques particularly tailored to multimedia content to improve the visibility and findability by external search engines of the public recordings hosted by Klewel.
Second, in the context of a new research project building on inEvent findings, we plan to enhance, further evaluate, and integrate a new graph-based visualization approach for browsing, recording, and exploring relations between different recordings. This new project will also focus on engaging users, who will be able to create and share their own knowledge spaces and express their own views of a topic, using simple drag-and-drop actions to connect talks along different dimensions, such as content similarity and presentation styles.
Third, we plan to develop a personalized interface for conference organizers and clients who use Triskel, letting them visualize and measure the return on their investment. Statistical visual information will be displayed, showing how users interacted over time with recorded content according to default or user-defined criteria. This interface will go beyond external analytics services by discovering the most popular (or controversial) talks as well as talk snippets, and by matching user profiles with talks or webcasts of interest. This should help set the trend for future conferences in the same domain and target the relevant audience. Examining user logs as well as relying on the expertise of Idiap researchers in sentiment-based analysis will serve as the core analytics component behind this interface.
Finally, we’re investing in developing a lighter capture station and in enabling users to record their webcasts using personal equipment that the system can support (such as built-in microphones and webcams).
Klewel has continued to grow step by step, without big investments and through auto-financing, thanks to research projects and services offered to our client base, which continues to expand. Klewel’s “success formula” has relied on several main ingredients, including collaboration with relevant multidisciplinary research partners, early prototyping and testing in real usage contexts, finding ways to connect with potential clients and sell the prototype, and aligning the development roadmap to client needs.
References
- D. Moore, “The IDIAP Smart Meeting Room,” Idiap Communication Report, Idiap-Com-07-2002, 2002.
- J.-M. Odobez, D. Gatica-Perez, and M. Guillemot, “Spectral Structuring of Home Videos,” Proc. Int’l Conf. Image and Video Retrieval (CIVR), 2003.
- M. Guillemot et al., “A Hierarchical Keyframe User Interface for Browsing Video over the Internet,” Proc. 9th Int’l Conf. Human-Computer Interaction (INTERACT), 2003.
- P. Wellner, M. Flynn, and M. Guillemot, “Browsing Recorded Meetings with Ferret,” Idiap Research Report, Idiap-RR-32-2004, 2004.
- D. Chen and J.-M. Odobez, “Video Text Recognition Using Sequential Monte Carlo and Error Voting Methods,” Pattern Recognition Letters, vol. 26, no. 9, 2005, pp. 1386–1403.
- A. Vinciarelli, “Noisy Text Categorization,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 12, 2005, pp. 1882–1895.
- A. Vinciarelli and J.-M. Odobez, “Application of Information Retrieval Technologies to Presentation Slides,” IEEE Trans. Multimedia, vol. 8, no. 5, 2006, pp. 981–995.
- A. Vinciarelli, “Application of Information Retrieval Techniques to Single Writer Documents,” Pattern Recognition Letters, vol. 26, nos. 14–15, 2005, pp. 2262–2271.
This article was written by Maël Guillemot, Jean-Marc Odobez, Alessandro Vinciarelli, and Sandy Ingram, and published in the media impact section of the magazine IEEE MultiMedia, vol. 22, no. 4, pp. 94–99, Oct.–Dec. 2015.
Maël Guillemot is the managing director and a co-founder of Klewel.
Jean-Marc Odobez is head of the Perception and Activity Understanding group at the Idiap Research Institute and is a co-founder of Klewel.
Alessandro Vinciarelli is an associate professor in the School of Computing Science and is an associate academic of the Institute of Neuroscience and Psychology at the University of Glasgow. He is also a co-founder of Klewel.
Sandy Ingram is the R&D lead engineer at Klewel.
CITATION
Maël Guillemot, Jean-Marc Odobez, Alessandro Vinciarelli, Sandy Ingram, “Klewel Webcast: From Research to Growing Company,” IEEE MultiMedia, vol. 22, no. 4, pp. 94–99, Oct.–Dec. 2015, doi:10.1109/MMUL.2015.80