As in any science, it is important to define the main terms in order to establish a common basis for discussion. For this purpose, a list of the terms central to my research is given below. The definitions are taken from “The Handbook of Multimodal-Multisensor Interfaces” by Cohen and from the scientific paper “Multimodal Interfaces of Human-Computer Interaction” by Karpov and Yusupov.

Multimedia Output: System output involving two or more types of information received as feedback by a user during human-computer interaction. It may involve different types of technical media within one modality, such as vision (still images, virtual reality, video images), or it may involve multimodal output such as visual, auditory, and tactile feedback to the user.
Multimodal Input: User input and processing of two or more modalities, such as speech, pen, touch and multi-touch, gestures, gaze, head and body movements, and virtual keyboard. These input modalities may coexist on an interface but be used either simultaneously or alternately. The input may involve recognition-based technologies (e.g., speech, gesture), simpler discrete input (e.g., keyboard, touch), or sensor-based information (e.g., acceleration, pressure). Some modalities may be capable of expressing semantically rich information and creating new content (e.g., speech, writing, keyboard), while others are limited to making discrete selections and controlling the system display (e.g., touching a URL to open it, a pinching gesture to shrink a visual display). These interfaces aim to support the processing of naturally occurring human communication and activity patterns. They are far more flexible and expressively powerful than past keyboard-and-mouse interfaces, which are limited to discrete input.
Multimodal Interface: Supports multimodal input and may also include sensor-based controls. In many cases it may also support either multimodal or multimedia output. (Cohen)
Multimodal-Multisensor Interface: Combines one or more user input modalities with sensor information (e.g., location, acceleration, proximity, tilt). Sensor-based cues may be used to interpret a user’s physical state, health status, mental status, current context, engagement in activities, and many other types of information. (Cohen)
Multimodal Output: System output from two or more modalities, such as a visual display combined with auditory or haptic feedback, which is provided as feedback to the user. This output is processed by separate human sensory systems and brain areas. (Cohen)
Modality: A method of information exchange during human-computer interaction. (Karpov, Yusupov)
Sensor Input: Aims to transparently facilitate user-system interaction and adaptation to users’ needs. (Cohen)
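To make the distinction between input modalities and sensor input concrete, the following minimal sketch illustrates how a multimodal-multisensor interface might fuse a recognition-based modality (speech) with a passive sensor cue (location) to interpret a user's intent in context. All names here (SpeechInput, LocationSensor, interpret) are hypothetical illustrations, not part of any real framework.

```python
from dataclasses import dataclass

# Hypothetical sketch: one active input modality (speech) combined
# with one sensor cue (location), as in a multimodal-multisensor
# interface. The sensor contributes context, not explicit input.

@dataclass
class SpeechInput:
    text: str   # recognised utterance from the speech modality

@dataclass
class LocationSensor:
    room: str   # coarse location cue from a sensor

def interpret(speech: SpeechInput, location: LocationSensor) -> str:
    """Fuse the speech modality with the location sensor cue."""
    command = speech.text.lower()
    if "light" in command and "on" in command:
        # The sensor cue disambiguates *which* light the user means,
        # without the user stating it explicitly.
        return f"turn_on_light:{location.room}"
    return "unknown_command"

result = interpret(SpeechInput("Turn the light on"), LocationSensor("kitchen"))
print(result)
```

The point of the sketch is the division of labour from the definitions above: the modality carries the semantically rich content ("turn the light on"), while the sensor transparently supplies context ("in the kitchen") that adapts the system's response to the user's situation.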