Inclusive Design Glossary

The following definitions and explanations of general terms, access technologies, and media affordances provide an overview and a shared context in which to think about and design specific solutions, initiatives, and inclusive undertakings.

These are not recommendations, but instead aim to serve as informal working definitions. Some terms, like Zoom, are not defined, as their definition is obvious; instead an explanation offering some additional nuance is provided to help the reader understand how these approaches work together. Terms are grouped semantically and are not explicitly ordered alphabetically.

We keep this page updated and iterate upon it as our work reveals new insights, thoughts, and use cases. Please feel free to use, cite, and communicate with us as needed.


Words like affordance, accessibility, and othering come up a lot in our work. To ensure consistency and clarity in these terms, we are defining each as follows:


Disability

Disability is the result of a variance that may be physical, cognitive, mental, sensory, emotional, developmental, or some combination of these. When discussing disability, two models are most often discussed: the medical model and the social model of disability.

Medical Model of Disability

In the medical model of disability, if someone has a disability, it is treated as something to be fixed or healed. When the perceived impairment can’t be fixed, society treats the individual as being broken, as having the impairment, as being different or “the other.” This puts the burden on the individual because the individual is viewed as the problem.

Social Model of Disability

The social, or environmental, model of disability considers the environment disabling, rather than the individual as being the problem. This model places the responsibility on society, instead of the individual, to design and create environments and systems where disabled people can participate fully.

Inclusive Design

Inclusive design is a design process that places at its heart the fact that all individuals interact with the world differently. Individual people, with their own lived experience, prior knowledge, and variances, will interact with what we make and put into the world, so we relax our assumptions about the abilities of the user and design with compassion, flexibility, and inclusion at the heart of our practice.


Accessibility

Accessibility is the ability of all people to participate in an environment and use a service or product regardless of their disability. Accessibility is intended to mitigate the results of disability within environments not originally designed with disabled people’s needs in mind.


Othering

Othering is any act that makes a person, or group of people, feel essentially different. Othering is often not the intention of a design; however, if it is the outcome, the impact must be addressed regardless of the original intention.

For Example

  • A museum has stairs at the front entrance. In order for a wheelchair user to enter the building, they have to use a back or side entrance, which creates a segregated experience.

General Terms


Affordance

Affordance refers to the interaction between a person and a physical or digital interface. An affordance is an attribute indicating how something is used and what actions can be taken. In the context of inclusive design and accessibility, an affordance is a feature, function, or digital asset that facilitates access to the experience.

For Example
  • A touch object or set of touch experiences that provides an understanding of an artifact in a case is a tactile affordance for an otherwise purely visual experience.


Surfacing

Surfacing refers to how an affordance is provided to the user. Understanding the distinction between an affordance and its surfacing mechanism facilitates greater access by prompting examination of how affordances are surfaced using multimodal tactics, including but not limited to those defined in this glossary, such as captions, braille, on-screen ASL, etc.

For Example
  • The daughter assets of all linear media include captions, American Sign Language, and audio description.
  • Including captions for a linear media presentation would be a visual affordance. Placing those captions along the lower sixth of the presentation screen would be where and how the visual affordance is surfaced.


Multisensory

Having or involving several physiological senses. Multisensory design acknowledges that people experience and learn in multiple different ways and employs more than one of the five senses to relay information, create a sense of place, engender emotion, etc.

For Example
  • An installation has visual, audio, and smell all included in the experience.


Modality

A particular manner in which something is experienced or expressed. Interpretive, instructional, and directional information can be delivered through a variety of modalities that can be visual or auditory, passive or active, physical or digital, and so on.


Multimodal

Having or involving several modes of doing or experiencing something. Multimodal design utilizes multisensory affordances and design tactics to surface interpretive, instructional, and directional information via audio, tactile, visual, and other modalities.

Multisensory design tactics are not intrinsically accessible and inclusive. Only when multisensory design is conceived and executed through an inclusive design lens, making it truly multimodal, are inclusion and accessibility achieved. The result of such consideration is that any design intent or affordance within the environment is redundantly mapped to multiple modalities. If an offering has vibration, sound, and visuals, that is multisensory, but not multimodal or inclusively designed. If that same offering ensures that each sound is also perceivable visually and through vibration, that each visual is also perceivable through vibration and sound, and so forth, then it is multimodal and inclusively designed.

Accessible Zone

The accessible zone is a volume of space in which all primary content must be displayed so as not to disadvantage those who do not have access to the viewing angles afforded by having a nominal height.

This zone especially benefits those who are of small stature or are in a seated position.


Semantics

The meaning and interpretation of words, signs, and sentence structure.

Semantic Structure

Semantic structure refers to the organization and categorization of meaning and information. In inclusive design, it refers to how that structure is developed and mapped to digital and physical environments to account for the various ways people receive and perceive information. The development of semantic structures through an inclusive design lens ensures that there are clear and usable systems for content and information delivery as well as navigation across digital and physical spaces. This allows all visitors to focus their cognitive energy on engaging in the content or experience, rather than focusing on accessing and navigating the content or experience.

In both digital and physical environments, semantic structures facilitate inclusion and access by ensuring clearly defined information hierarchies and architectures that prioritize and categorize types of information through graphic treatment, layout, and placement. When implemented consistently, these semantic structures also facilitate inclusive interaction and interface design.

In digital environments, inclusively designed semantic structures are also critical to ensuring navigation pathways that are usable by screen readers, refreshable braille displays, and other forms of access technology.
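A concrete digital example: screen readers let users navigate by heading outline, so heading levels should never skip downward (an h1 followed directly by an h3 breaks the outline). The sketch below, with an invented function name, assumes heading levels have already been extracted as integers:

```python
def heading_levels_ok(levels):
    """Check that heading levels never skip downward.

    `levels` is a sequence like [1, 2, 2, 3, 2]; jumping from 1
    straight to 3 breaks the outline that screen reader users
    navigate by. The name and input shape are illustrative.
    """
    prev = 0
    for level in levels:
        if level > prev + 1:  # e.g., an h3 directly under an h1
            return False
        prev = level
    return True
```

A linter of this shape is one small piece of validating an inclusively designed semantic structure; it says nothing about whether the headings themselves are meaningful.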


Static Media

Static media is analogue or digital media with no time component. Static media includes images, photographs, paintings, non-moving projections, and more.

Visual Description

Visual description is a textual representation of static media or objects. It is most commonly intended for those who may not be able to see the media or objects, but it benefits all people. There are two general types of visual descriptions:

Short Description

A short description provides a very brief and general sense of the content of an image or object. A short description should not exceed 60 words.

Surface: Short descriptions are most commonly used on the web or other electronic formats and are mapped to alt text fields.

Note: Alt text is a commonly misused term, derived from “alternative text,” that refers to a short visual description.
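The 60-word ceiling above can be enforced mechanically. A minimal sketch, with a hypothetical function name, assuming a plain whitespace-separated word count:

```python
def check_short_description(text: str, max_words: int = 60) -> bool:
    """Return True if `text` fits the short-description word budget.

    The 60-word ceiling follows the guideline above; the function
    name and signature are illustrative, not a published API.
    """
    return len(text.split()) <= max_words
```

Such a check belongs in a content pipeline as a prompt to trim, not as a hard gate; a 61-word description is not automatically worse than a 59-word one.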

Long Description

A long description should comprehensively describe the most important elements of an image augmented by a few prioritized details. The length can range from one to multiple paragraphs depending on the complexity of the image and the context. 

Surface: Long descriptions must be surfaced accessibly, providing maximum user customization, via a bespoke solution. They should also be surfaced visually so that all visitors can access them.

Audio Description

Audio description is the narrated visual description of time-based media. It is an additional audio track, often presented as an additional language track, and is in the same language as the source media.

Tactile Affordances

Touch is an effective modality to convey spatial information when used correctly. Explanations of tactile alternatives, manipulatives, and reproductions are below.

Tactile Alternative

Tactile alternatives are touchable representations, but not full reproductions, of a piece of source media.

For Example
  • An embossed version of a graph, a simplified architectural model as a tactile diagram, a touchable piece of fabric from a garment that cannot be touched due to conservation/preservation reasons, and much more.

Tactile Manipulative

A tactile manipulative is an object that is typically held or placed on a flat surface for examination.

These are often components of a touch tour or other similar service offering.

Tactile Reproduction

A tactile reproduction is a tactile alternative that significantly prioritizes accuracy, fidelity, and the comprehensive presentation of the nuances of an object.

Guided Tactile Description

Guided tactile description includes elements of a visual description and assumes that the consumer of the description is touching the thing that is being described. The description will include information on how something feels and tactile landmarks to facilitate wayfinding.


Sonification

Sonification is the use of non-speech audio to convey information, such as the values in a dataset or spatial relationships.

Sonification is a reasonably well-established field with a body of academic research that is ever-evolving.

For Example
  • An audible tone whose pitch is mapped to the Y-values on a graph, facilitating greater understanding for someone who is unable to see the image.
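The bullet above can be sketched in code as a linear map from data values to an audible frequency range. The function name and the 220–880 Hz range are illustrative choices, not a standard:

```python
def y_to_pitch(y, y_min, y_max, f_low=220.0, f_high=880.0):
    """Linearly map a data value to a frequency in Hz.

    220-880 Hz (two octaves up from A3) is an arbitrary but
    comfortable listening range; adjust for your audience.
    """
    if y_max == y_min:
        return (f_low + f_high) / 2  # degenerate range: pick the middle
    t = (y - y_min) / (y_max - y_min)  # normalize to 0..1
    return f_low + t * (f_high - f_low)

# Sweep a simple series: higher Y-values sound higher pitched.
tones = [y_to_pitch(y, 0, 10) for y in [0, 5, 10]]  # → [220.0, 550.0, 880.0]
```

Feeding these frequencies to a tone generator as the user scrubs along the X-axis yields the kind of audible graph described above.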

Time-Based Media (Non Navigable)

Time-based media refers to any media that has a non-zero duration and is presented without user controls, e.g., animated GIFs, audio, video, animations, and more.


Captions

Captions are the real-time textual representation of spoken language and any non-speech sound in a piece of media or live setting. Captions are displayed in the same language as the language of the media and are not subtitles.

Closed Captions

Closed captions are not burned into a video. They are stored digitally as text, either in a separate file from the source media or as part of a container format.

Closed captions are preferred because they can be read by screen readers—think of a deafblind individual accessing captions via braille.
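As an illustration of captions stored as text, the sketch below serializes caption cues into WebVTT, one common text-based caption format. The helper name and the representation of cues as (start, end, text) tuples in seconds are assumptions of this sketch:

```python
def to_webvtt(cues):
    """Serialize (start, end, text) cues to a minimal WebVTT string.

    WebVTT timestamps are HH:MM:SS.mmm; cue times here are assumed
    to be given in seconds. Settings and styling are omitted.
    """
    def ts(seconds):
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{ts(start)} --> {ts(end)}")
        lines.append(text)
        lines.append("")  # blank line terminates each cue
    return "\n".join(lines)

vtt = to_webvtt([(0.0, 2.5, "[door creaks]"), (2.5, 5.0, "Welcome back.")])
```

Because the result is plain text rather than pixels, it is exactly the kind of caption data a screen reader or refreshable braille display can consume.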

Open Captions

Open captions are captions that are burned into a video. They are not stored as text, but as pixels that comprise the frames of the media.

They are not preferred in most situations (see Closed Captions), but they are sometimes necessary—think of an unsophisticated video playback device in a public setting that doesn’t support closed captions.


Subtitles

Subtitles are used when the language of the source media needs to be translated for the visitor.

In addition to being in a translated, and therefore different, language than the source media, subtitles do not indicate non-speech audio such as sounds, music, and more.

For Example
  • If a video is in Spanish, English subtitles will appear on screen to offer a translation for English speakers.


Transcripts

Transcripts are the static textual representation of the audio of a piece of media.

A transcript for time-based media is a compilation of all the captions of that media. Because transcripts are static text, they can be consumed at a rate independent of the source media.
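Since a transcript is the compilation of a media item’s captions, it can be derived mechanically from timed caption cues by dropping the timing. A minimal sketch, assuming cues are (start, end, text) tuples:

```python
def transcript_from_cues(cues):
    """Flatten timed caption cues into a static transcript.

    Each cue is assumed to be a (start, end, text) tuple; the timing
    is dropped because a transcript is consumed independently of
    playback. The name and input shape are illustrative.
    """
    return "\n".join(text for _start, _end, text in cues)
```

In practice a transcript would also merge continuation lines and keep speaker labels, which this sketch does not attempt.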

Enhanced Transcripts

An enhanced transcript is a transcript that also includes the audio description.

Enhanced transcripts are especially important for deafblind people who need access to both the transcript and the audio description.

Audio Ducking

Audio description is often delivered via audio ducking, a simple acoustic treatment easily achievable in virtually all editing workflows. Audio ducking is when the source audio is lowered but not eliminated, and the audio description track is played at the source media volume. The audio of the source media is said to duck underneath the narration of the visual description, hence the name.
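The ducking behavior described above can be sketched as a per-sample mix: attenuate the source wherever the description track is active. Real workflows ramp the gain smoothly; this hypothetical helper switches instantly for clarity:

```python
def duck(source, description, duck_gain=0.2):
    """Mix a description track over source audio with ducking.

    Both tracks are equal-length lists of float samples in -1..1.
    Wherever the description is active (non-silent), the source is
    attenuated to `duck_gain` of its level; elsewhere it passes
    through unchanged. duck_gain=0.2 is an illustrative value.
    """
    mixed = []
    for s, d in zip(source, description):
        gain = duck_gain if abs(d) > 1e-4 else 1.0  # duck under narration
        mixed.append(s * gain + d)
    return mixed

# Source plays at full level, then ducks when narration begins.
mixed = duck([0.5, 0.5], [0.0, 0.5])
```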

Sign Languages

Sign languages use the visual-manual modality to convey meaning. Sign languages are expressed through manual articulations in combination with non-manual elements.

Sign languages are fully formed languages with their own grammar, syntax, idiomatic expressions, and all other elements that make up a language. Sign languages are not tied to spoken languages and have their own history of linguistic development.

Note on D/deaf, or the “big D, little d” discussion: The letter “D” is capitalized when referring to people who identify culturally as part of the Deaf community and not capitalized when referring to people who don’t identify with cultural Deafness. When referring to both groups of people, it is common to write D/deaf.

American Sign Language (ASL)

The primary sign language for the signing Deaf community in the United States and English-speaking parts of Canada is American Sign Language (ASL).

American Sign Language (ASL) has its roots in French Sign Language and is much closer to it than to British Sign Language (BSL).

Black American Sign Language (BASL)

Black American Sign Language (BASL) is a dialect of ASL and is used by some Deaf Black Americans in the United States.

BASL evolved in the 1800s and 1900s during the segregation of deaf residential schools, which created a different process of language socialization and development.


Protactile

Protactile is a form of language used by D/deafblind folks and is oriented towards communication based in touch on the body.

Protactile was created by the DeafBlind community, who adapted American Sign Language from a mostly visual language into a primarily tactile one.


Interpretation

Interpretation is the process of translating one language into another.

Sign Language Interpretation

Sign language interpretation is the use of a signed language to convey auditory information. It uses a range of visual communication to convey meaning between hearing and D/deaf and hard of hearing people.

Deaf Interpretation and Translation

Deaf interpreters and translators (also referred to as Sign Talent) are Deaf specialists trained in interpreting and translating spoken or written text into signed languages and other visual communication for D/deaf, hard of hearing, and D/deafblind folks.

Deaf interpreters and translators have a distinct set of formative cultural, linguistic, and lived experiences allowing for a more nuanced comprehension and interaction. These experiences, coupled with professional expertise, enable a level of linguistic and cultural bridging that is not often possible when hearing signed language interpreters work alone.

Navigable Media

Navigable media refers to media, static or time-based, that can be interacted with or navigated in any way.

For Example
  • A video playing on a screen is time-based media, but a video with transport controls, e.g., play/pause, rewind, and fast-forward, is navigable media, as are video games, digital interactives, websites, mobile apps, and much more.

Access Technology

Access technology is an additional affordance, mode of operation, or other accommodation to a system that enhances the experience to ensure inclusivity and accessibility.

The below-defined concepts of screen reader, zoom, and high-contrast mode are all access technologies and/or technological implementations.

Screen Reader

A screen reader is an application, often running with elevated privileges, on a platform. It keeps track of the user’s focus, or point of regard, and announces, via synthetic speech, what the user is currently interacting with, what can be done from this point, and how to perform any desired actions.

Synthetic speech is achieved by software called a text to speech (TTS) engine. Screen readers can also drive a braille display, a tactile interface that presents braille on a typically single-line display of 10 to 80 characters (20 to 40 characters is most common). Those who are deafblind rely primarily or solely on a braille display for all aspects of communication. Thinking about braille support, either via a provided braille display or by following standards so that a third-party display can be used, is therefore necessary to ensure we do not exclude this often-ignored population.

Text to Speech

Text to Speech (TTS) refers to the mechanism by which any digital system uses synthetic speech to inform the user of something.

This should not be confused with speech to text, in which the user speaks and a computer system transcribes that speech into text.

Speed Control

Speed control allows a user to set the speed of the TTS.

Different users of TTS listen at different words-per-minute (WPM) rates. By providing control over the speed of the TTS output, agency is returned to the visitor over how fast they choose to consume content. This has many implications, from helping those with cognitive disabilities, to accommodating those who are first learning how to use a screen reader, to serving power users who wish to progress rapidly through the content.
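One practical consequence of WPM rates is that the same text takes very different amounts of time at different speeds. A minimal sketch, with an invented helper name and illustrative rate values:

```python
def estimated_seconds(text, wpm=180):
    """Estimate TTS playback time for `text` at a given speaking rate.

    180 WPM is a common default speaking rate; experienced screen
    reader users often listen at 300+ WPM. All numbers here are
    illustrative, and real engines vary with punctuation and pauses.
    """
    words = len(text.split())
    return words / wpm * 60
```

Doubling the rate roughly halves the listening time, which is why speed control matters so much to power users.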

Volume Control

Volume control allows a user to set their own preferred volume.

Much like controlling speed, controlling volume is helpful. Volume control is obviously beneficial for ensuring that a continuum of visitors with various hearing abilities are able to successfully consume content, but it also helps accommodate unforeseen noise in the environment, such as a group of young children visiting a gallery at the same time as someone who depends on speech to navigate the space and content. Additionally, different headphones and earbuds have various impedance ratings, which means that the same output voltage results in different volumes; a wide volume range therefore helps accommodate the maximum number of devices. Some visitors may also be sensitive to volume and prefer to lower it from a preset nominal level.

Focus Highlight

A visual highlight or focus rectangle provides additional visual affordance with the highlight indicating either the content being explored or the interface elements being used. The highlight/rectangle has a transparent interior so as not to obscure the specific content and an opaque outline. A soft glow around the exterior of the rectangle can also be used to provide additional visual feedback.

This highlight is critical when using the interface via an external device such as a keypad, mobile app, keyboard, or other affordance, and is useful for a sighted companion as it shows where a screen reader’s point of regard is in the user interface (UI).


Zoom

Zoom refers to the ability to magnify both text and images (the entire interface).

This needs to be thought about ahead of time and is critical for those with low vision but also for anyone who may have forgotten their reading glasses that day, is standing farther from the screen than originally assumed, or for a variety of other reasons. When zoom is engaged on a system, the implication is that a gesture or other affordance exists for panning. This is because if a fixed height and width interface needs to display information at 200% or greater, then both vertical and horizontal panning becomes necessary to explore all the information. A common practice is to reserve single-finger gestures for screen-reading functionality and two-finger gestures for things like pan for a zoom mode. Lastly, visible focus-tracking is critical for zoom on digital interactives. This means that as the user advances across UI elements, the view auto-scrolls to ensure their point of regard is always in view. This is another reason the aforementioned focus highlight is important.
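The auto-scroll behavior described above amounts to centering the visible window on the focused point and clamping it to the zoomed content’s bounds. A minimal sketch under those assumptions, with invented names and a top-left pan-offset convention:

```python
def clamp_pan(focus_x, focus_y, view_w, view_h, zoom):
    """Return a pan offset that keeps the focused point in view.

    The content is the zoomed interface (view_w * zoom wide); the
    returned offset is the top-left of the visible window within it,
    centered on the focus point and clamped so the window never
    leaves the content. Focus coordinates are in zoomed-content space.
    """
    content_w, content_h = view_w * zoom, view_h * zoom
    # Center the viewport on the focused point...
    pan_x = focus_x - view_w / 2
    pan_y = focus_y - view_h / 2
    # ...then clamp so the window stays inside the content bounds.
    pan_x = max(0, min(pan_x, content_w - view_w))
    pan_y = max(0, min(pan_y, content_h - view_h))
    return pan_x, pan_y
```

Calling this whenever focus moves keeps the screen reader’s point of regard visible, which is the focus-tracking requirement described above.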

Brightness Control

Brightness control allows a user to change the brightness of a screen.

A control for adjusting brightness is helpful in many situations: not only does it assist those with light sensitivity, but it is also helpful for those who rely on high contrast.

Invert Colors

If the interface can be placed in a monochromatic mode (either black/white or, more commonly, grayscale), this inversion of colors drastically assists with contrast issues, readability of text, and exploration of graphics. This will also help those with various forms of color-blindness, but it is not a solution in and of itself for those populations. Never using color as the sole way of conveying information is the conceptual prerequisite that allows this affordance to be even more helpful for visitors. If nothing relies solely on color (e.g., text and iconography are also used to convey meaning), then allowing the interface colors to be switched into monochromatic mode is a powerful win (consider dark mode on most modern devices to illustrate this point).
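For illustration, a grayscale mode and a channel-wise inversion can be sketched as follows. The Rec. 709 luma weights are one common choice, and this sketch omits the gamma handling a production pipeline would include:

```python
def to_grayscale(r, g, b):
    """Collapse an sRGB color to a single luminance value (0-255).

    Uses the common Rec. 709 luma weights; gamma correction is
    deliberately omitted to keep the sketch short.
    """
    return round(0.2126 * r + 0.7152 * g + 0.0722 * b)

def invert(r, g, b):
    """Invert a color channel-wise, as in a simple invert-colors mode."""
    return (255 - r, 255 - g, 255 - b)
```

Note how grayscale conversion is lossy: two different hues with the same luminance collapse to the same gray, which is exactly why information must never be carried by color alone.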

Key-Based Input

When an interface can be used by a keyboard or other physical inputs, it not only helps those who cannot or do not prefer to use a touchscreen, but this extensibility also lays the groundwork for supporting switch users in the future.

Height Control

Height control allows a user to change the height of a screen or other presentation media.

Adjustable height provides much-needed flexibility and makes the environment more usable for those of small stature, children, those who are taller than nominal levels, etc.


Tilt

Tilt allows a user to adjust the angle of a screen or other presentation media.

The angle of view is important when considering visitors at all different positions, seated and otherwise. By allowing for even a small range of motion in the vertical and horizontal direction, orthogonal to the visitor’s face, the display of visual information can be made much easier to consume by a variety of visitors. Often, adjusting height can present a logistical challenge, whereas tilt control is easier to achieve and can resolve many similar challenges. Both are preferred, but if height is not adjustable, tilt is a great fallback.