affective caricature

This “affective caricature” approach combines three ideas: musical symbols, affective notation, and caricature humor. First, rather than inventing new symbols, I adopted existing symbols from Western music notation, including notes, dynamics, and breath marks. Second, I played around with visual primitives such as position, size, shape, and color to elicit a strong emotional response. Third, the idea of “caricature humor” encouraged me to exaggerate the visual elements so that they appeal to the emotions more immediately.

This example conveys auditory salience with visually striking objects. For instance, very high or very low pitches are brought out by using an exponential (inverse-log) frequency scale on the vertical axis, so that if a woman lets out a shrill, high-pitched scream in her laughter, it stands out by its positive vertical offset. Salience in the perceived intensity of laugh pulses (syllables) and of breathing noises is expressed by the exaggerated size of the red note-heads and blue breath-marks, respectively. Anomalies in the quality of the laugh pulses can be conveyed by unexpected coloring of the note-heads. (For instance, the deviations of a pulse's first, second, and third formants from the statistical norm can be mapped to the R, G, and B components, respectively, of the note-head color.) Finally, the possibly exaggerated intensity contour of each pulse can be conveyed by the black enveloped region inside its red note-head, as illustrated above.
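
A minimal sketch of how the formant-to-color mapping might look in Python. The norm values, the cap at three standard deviations, and the function name are all illustrative assumptions rather than measurements or the method used for the drawing: a pulse at the norm comes out black, and a pulse whose first formant is far from the norm comes out pure red.

```python
def formant_color(formants, means, stds, max_z=3.0):
    """Map (F1, F2, F3) deviations from a norm to an (R, G, B) tuple of 0-255 ints.

    Each channel grows with the absolute z-score of one formant and
    saturates at max_z standard deviations (an assumed cap).
    """
    channels = []
    for value, mean, std in zip(formants, means, stds):
        z = abs(value - mean) / std
        channels.append(round(255 * min(z, max_z) / max_z))
    return tuple(channels)


# Hypothetical norm values in Hz, for illustration only.
NORM_MEANS = (500.0, 1500.0, 2500.0)
NORM_STDS = (100.0, 200.0, 300.0)

typical = formant_color((500.0, 1500.0, 2500.0), NORM_MEANS, NORM_STDS)
odd_f1 = formant_color((800.0, 1500.0, 2500.0), NORM_MEANS, NORM_STDS)
```

Using the absolute deviation means the color shows only how unusual a pulse is, not in which direction; a signed mapping centered on mid-gray would be another reasonable choice.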

This transcription system is clearly a waste of energy if our goal is to represent the laughter signal accurately. However, if we take a perceptual (or perceptually salient) perspective, aiming to communicate visually what is acoustically striking about a laugh, this may be an interesting and fun direction to take, one that complements the traditional approach using waveforms and spectrograms.


descriptive graph

In designing this “descriptive graph”, I tried to follow the “Principles of Graphical Excellence” stated by Edward Tufte in The Visual Display of Quantitative Information:

  • Graphical excellence is the well-designed presentation of interesting data — a matter of substance, statistics, and of design
  • Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
  • Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
  • Graphical excellence is nearly always multivariate.
  • And graphical excellence requires telling the truth about the data.

I combined various types of graphs (a box-plot, intensity graph in the form of a sparkline, and dot plots with regression curves) with phonetic information.

The sound file used as an inspiration for this diagram is from Professor Wallace Chafe’s website: laughter sample (Note: some features like the phonetic symbols have been “made up” and are not accurate to what is actually present in the audio).

The vertical box-plot on the left summarizes the pitch distribution of the laughter, and the horizontal intensity plot along the bottom shows the intensity of the sound over time. The red dots denote sound “particles”, that is, the likely pitch based on the energy level taken from a spectrogram, and their regression curves show a probable f0 contour. Finally, across the chart in green is a readable phonetic summary, including the closest matching phonetic label for each laugh pulse and an arrow diagram denoting inhale/exhale, voiced/voiceless, and the intensity envelope for each laugh pulse (as used in the “prescriptive mechanism” sketch).
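
As a rough illustration of what the box-plot summarizes, here is a small Python sketch that computes a five-number summary of pitch estimates. The sample values and the function name are made up for the example, and the choice of the "inclusive" quartile method is an assumption; a plotting library may use a slightly different convention.

```python
import statistics


def pitch_summary(pitches_hz):
    """Return the five numbers a box-plot draws: min, Q1, median, Q3, max."""
    ordered = sorted(pitches_hz)
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical pitch estimates in Hz, one per laugh pulse.
sample_pitches = [250, 200, 400, 220, 300]
summary = pitch_summary(sample_pitches)
```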

prescriptive mechanism

This sketch conveys how to imitate a particular laugh instance:

The sound file used as an inspiration for this diagram is from Professor Wallace Chafe’s website: laughter sample (Note: some features, like the glottal stop, have been added to the sketch, but are not found in the audio).

Laughter includes not only sounds from the vocal tract, but also breathing noise, glottal stops, and occasionally even sniffs and snorts. This system is good for conveying how different components of laughter are produced by separating out the three major sources (approximately nose, mouth, throat).

Additionally, the notation distinguishes an inhale (left-pointing arrows) from an exhale (right-pointing arrows), and voiced (sharp arrow-head) from unvoiced (round arrow-head). The intensity envelope for each pulse unit is conveyed as the thickness of the arrow-body. This part of the design is re-used in my second transcription system, described in the next sub-section.
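
To make the encoding concrete, here is a small Python sketch that stores one laugh pulse and renders it as a text arrow. The ASCII characters are a hypothetical stand-in for the hand-drawn arrows, and the class name, field names, and three thickness levels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pulse:
    source: str            # "nose", "mouth", or "throat"
    inhale: bool           # True = inhale (left-pointing), False = exhale
    voiced: bool           # True = sharp arrow-head, False = round arrow-head
    envelope: List[float]  # intensity samples in [0, 1], in time order


def render(pulse):
    """Draw a pulse as an ASCII arrow whose body thickness follows the envelope."""
    def thickness(level):
        if level < 1 / 3:
            return "-"   # thin
        if level < 2 / 3:
            return "="   # medium
        return "#"       # thick

    body = "".join(thickness(level) for level in pulse.envelope)
    if pulse.inhale:
        head = "<" if pulse.voiced else "("
        return head + body
    head = ">" if pulse.voiced else ")"
    return body + head


voiced_exhale = render(Pulse("mouth", inhale=False, voiced=True, envelope=[0.2, 0.9, 0.5]))
breathy_inhale = render(Pulse("nose", inhale=True, voiced=False, envelope=[0.1, 0.1]))
```

Here `voiced_exhale` comes out as `-#=>` and `breathy_inhale` as `(--`. A full transcription would be one row of such arrows per source.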

Unfortunately, it would be difficult to automatically generate this type of notation based on existing techniques, and it would require coming up with a machine learning algorithm for source separation as well as classification into the three possible sources. Moreover, the transcription as shown does not contain any pitch or spectral information beyond what can be inferred from the source.

my first post!

I just created this blog to post sketches, sounds, and ideas related to human laughter.

Your comments, especially constructive criticisms, are welcome!!