Text to Speech Using the Web Speech API in JavaScript

The Web Speech API lets web apps work with voice data; it covers both speech recognition and speech synthesis. In this tutorial, we will build a simple webpage that uses the speech synthesis part of the API to implement text to speech. You can check the browser compatibility for the Web Speech API here.

Prerequisites

To follow along with this tutorial, you should have:

A basic understanding of HTML and JavaScript.
A code editor. I'll be using Visual Studio Code.
A browser to view the webpage, preferably Google Chrome or Mozilla Firefox.

Project directory

Create a new directory for the project and create two new files called index.html and textToSpeech.js.

project-directory/
|-index.html
|-textToSpeech.js

The HTML page

In the HTML file, let's set up:

An empty select menu, which we will fill with the list of available voices using JavaScript.
Range sliders for volume, pitch, and rate.
A textarea to type in.
Control buttons for the speech.

I've used Bootstrap 5 to style the webpage. If you are new to Bootstrap, check out their documentation to get a better understanding.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css" rel="stylesheet" />
    <title>Text to Speech</title>
  </head>
  <body class="container mt-5 bg-dark">
    <h1 class="text-light">Text to Speech</h1>
    <p class="lead text-light mt-4">Select Voice</p>

    <!-- Select Menu for Voice -->
    <select id="voices" class="form-select bg-secondary text-light"></select>

    <!-- Range Sliders for Volume, Rate & Pitch -->
    <div class="d-flex mt-4 text-light">
      <div>
        <p class="lead">Volume</p>
        <input type="range" min="0" max="1" value="1" step="0.1" id="volume" />
        <span id="volume-label" class="ms-2">1</span>
      </div>
      <div class="mx-5">
        <p class="lead">Rate</p>
        <input type="range" min="0.1" max="10" value="1" id="rate" step="0.1" />
        <span id="rate-label" class="ms-2">1</span>
      </div>
      <div>
        <p class="lead">Pitch</p>
        <input type="range" min="0" max="2" value="1" step="0.1" id="pitch" />
        <span id="pitch-label" class="ms-2">1</span>
      </div>
    </div>

    <!-- Text Area for the User to Type -->
    <textarea class="form-control bg-dark text-light mt-5" cols="30" rows="10" placeholder="Type here..."></textarea>

    <!-- Control Buttons -->
    <div class="mb-5">
      <button id="start" class="btn btn-success mt-5 me-3">Start</button>
      <button id="pause" class="btn btn-warning mt-5 me-3">Pause</button>
      <button id="resume" class="btn btn-info mt-5 me-3">Resume</button>
      <button id="cancel" class="btn btn-danger mt-5 me-3">Cancel</button>
    </div>
    <script src="./textToSpeech.js"></script>
  </body>
</html>

The JavaScript file

Let's create an instance of the SpeechSynthesisUtterance class. We'll configure this instance with various properties.

let speech = new SpeechSynthesisUtterance();
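Not every browser implements speech synthesis, so you may want to check for support before creating the utterance. A minimal sketch, assuming you pass in the global object (`window` in a browser); the helper name `supportsSpeechSynthesis` is hypothetical and not part of the API:

```javascript
// Hypothetical helper: true when the given global object exposes both
// pieces of the Web Speech API this tutorial relies on.
function supportsSpeechSynthesis(globalObj) {
  return (
    globalObj != null &&
    "speechSynthesis" in globalObj &&
    typeof globalObj.SpeechSynthesisUtterance === "function"
  );
}

// Usage in the browser:
// if (supportsSpeechSynthesis(window)) {
//   const speech = new SpeechSynthesisUtterance();
// } else {
//   alert("Sorry, your browser does not support text to speech.");
// }
```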

Properties

Now, let's configure this SpeechSynthesisUtterance instance. It has six properties that we can tweak:

Language:

The lang property gets and sets the language of the utterance. If it is unset, the lang attribute of the <html> element is used, or the user agent's default if that attribute is unset too.

It accepts a DOMString representing a BCP 47 language tag.

speech.lang = "en";
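Each SpeechSynthesisVoice also has a `lang` property holding a BCP 47 tag such as `en-US`. If you want to offer only the voices that match `speech.lang`, a minimal sketch could filter on the tag. The helper `voicesForLang` is hypothetical, and it assumes the browser reports `voice.lang` as a hyphenated BCP 47 tag:

```javascript
// Hypothetical helper: keep the voices whose BCP 47 tag equals `tag`
// or starts with `tag` followed by a subtag (so "en" matches "en-US" and "en-GB").
function voicesForLang(voices, tag) {
  const wanted = tag.toLowerCase();
  return voices.filter((voice) => {
    const lang = (voice.lang || "").toLowerCase();
    return lang === wanted || lang.startsWith(wanted + "-");
  });
}

// Usage in the browser:
// const englishVoices = voicesForLang(window.speechSynthesis.getVoices(), speech.lang);
```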

Text:

The text property gets and sets the text that will be synthesized when the utterance is spoken, provided as plain text. In our case, we set it when the Start button is clicked, so that it holds whatever the user has typed.

Let's add a click listener to the Start button. When it is clicked, we read the value of the textarea and assign it to this property.

You can learn more about event listeners here.

document.querySelector("#start").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
});

Volume:

The volume property gets and sets the volume of the utterance. It is a float that represents the volume value, between 0 (lowest) and 1 (highest). The default value is 1 if this property is unset.

Let's add an input event listener to the volume range slider and update the volume property whenever the slider's value changes. We've already set the min, max, and default values of the slider in the HTML.

Let's also update the <span> next to the slider that displays the current volume.

document.querySelector("#volume").addEventListener("input", () => {
  // Get volume Value from the input
  const volume = document.querySelector("#volume").value;

  // Set volume property of the SpeechSynthesisUtterance instance
  speech.volume = volume;

  // Update the volume label
  document.querySelector("#volume-label").innerHTML = volume;
});
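The value of a range input is always a string; the browser converts it to a number when you assign it to `speech.volume`. If you prefer to be explicit and keep the value inside the documented range (0 to 1 for volume), a minimal sketch could parse and clamp it first. The `toRangeValue` helper is hypothetical:

```javascript
// Hypothetical helper: parse a slider's string value and clamp it to [min, max].
// Returns `fallback` when the string is not a finite number.
function toRangeValue(raw, min, max, fallback) {
  const value = Number.parseFloat(raw);
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Usage in the browser:
// speech.volume = toRangeValue(document.querySelector("#volume").value, 0, 1, 1);
```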

Rate:

The rate property gets and sets the rate of the utterance. It is a float representing the rate value which can range between 0.1 (lowest) and 10 (highest). The default value is 1 if this property is unset.

Let's add an input event listener to the rate range slider and update the rate property whenever the slider's value changes. We've already set the min, max, and default values of the slider in the HTML.

Let's also update the <span> next to the slider that displays the current rate.

document.querySelector("#rate").addEventListener("input", () => {
  // Get rate Value from the input
  const rate = document.querySelector("#rate").value;

  // Set rate property of the SpeechSynthesisUtterance instance
  speech.rate = rate;

  // Update the rate label
  document.querySelector("#rate-label").innerHTML = rate;
});

Pitch:

The pitch property gets and sets the pitch of the utterance. It is a float representing the pitch value that can range between 0 (lowest) and 2 (highest). The default pitch is 1 if this property is unset.

Let's add an input event listener to the pitch range slider and update the pitch property whenever the slider's value changes. We've already set the min, max, and default values of the slider in the HTML.

Let's also update the <span> next to the slider that displays the current pitch.

document.querySelector("#pitch").addEventListener("input", () => {
  // Get pitch Value from the input
  const pitch = document.querySelector("#pitch").value;

  // Set pitch property of the SpeechSynthesisUtterance instance
  speech.pitch = pitch;

  // Update the pitch label
  document.querySelector("#pitch-label").innerHTML = pitch;
});
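The volume, rate, and pitch listeners differ only in the property name and the element ids. One way to remove that repetition is a small factory. The sketch below is hypothetical (`makeSliderHandler` is not part of the Web Speech API), and it takes the input, the label, and the utterance as arguments so it can be tried outside a browser:

```javascript
// Hypothetical factory: returns an "input" handler that copies the slider's
// value onto utterance[property] and mirrors it in the label element.
function makeSliderHandler(input, label, utterance, property) {
  return () => {
    utterance[property] = Number(input.value);
    label.textContent = input.value;
  };
}

// Usage in the browser, one loop for all three sliders:
// for (const property of ["volume", "rate", "pitch"]) {
//   const input = document.querySelector(`#${property}`);
//   const label = document.querySelector(`#${property}-label`);
//   input.addEventListener("input", makeSliderHandler(input, label, speech, property));
// }
```

The loop relies on the HTML using matching ids (`volume` and `volume-label`, and so on), which the page above already does.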

Voice:

The voice property gets and sets the voice that will be used to speak the utterance. This should be set to one of the SpeechSynthesisVoice objects. If it is not set, the most suitable default voice available for the utterance's language setting will be used.

To set the voice of the utterance, we first need the list of available voices, which we get from window.speechSynthesis. The voices are not necessarily available as soon as the page loads: browsers such as Chrome load them asynchronously and fire a voiceschanged event on window.speechSynthesis when the list is ready. We can assign a function to its onvoiceschanged handler that runs once the voices are loaded.

window.speechSynthesis.onvoiceschanged = () => {
  // On Voices Loaded
};
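Browsers differ here: in some, getVoices() returns a non-empty list right away, and historically not every browser fired voiceschanged. A common defensive pattern is to run the handler once immediately and also register it for the event. A minimal sketch; `whenVoicesReady` is a hypothetical helper that takes the synthesizer as an argument:

```javascript
// Hypothetical helper: call onReady(voices) as soon as the voice list is
// non-empty, and again whenever the browser reports a change.
// `synth` is expected to be window.speechSynthesis (or an object with the same shape).
function whenVoicesReady(synth, onReady) {
  const handle = () => {
    const voices = synth.getVoices();
    if (voices.length > 0) onReady(voices);
  };
  handle(); // the voices may already be loaded
  synth.onvoiceschanged = handle; // or they may load (or change) later
}

// Usage in the browser:
// whenVoicesReady(window.speechSynthesis, (voices) => { /* fill the select menu */ });
```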

We can get the list of voices using window.speechSynthesis.getVoices(). It'll return an array of SpeechSynthesisVoice objects that are available. Let's store the list in a global array and update the select menu on the web page with the list of available voices.

let voices = []; // global array

window.speechSynthesis.onvoiceschanged = () => {
  // Get List of Voices
  voices = window.speechSynthesis.getVoices();

  // Initially set the First Voice in the Array.
  speech.voice = voices[0];

  // Set the Voice Select List. (Set the Index as the value, which we'll use later when the user updates the Voice using the Select Menu.)
  let voiceSelect = document.querySelector("#voices");
  voices.forEach((voice, i) => (voiceSelect.options[i] = new Option(voice.name, i)));
};
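The code above always starts with voices[0]. Each SpeechSynthesisVoice also has a boolean `default` property that marks the user agent's default voice, so an alternative is to start with that one. A minimal sketch; `defaultVoiceIndex` is a hypothetical helper:

```javascript
// Hypothetical helper: index of the voice flagged as the browser default,
// 0 when none is flagged, or -1 when the list is empty.
function defaultVoiceIndex(voices) {
  if (voices.length === 0) return -1;
  const index = voices.findIndex((voice) => voice.default);
  return index === -1 ? 0 : index;
}

// Usage inside the voiceschanged handler:
// const start = defaultVoiceIndex(voices);
// if (start !== -1) {
//   speech.voice = voices[start];
//   voiceSelect.value = String(start); // keep the select menu in sync
// }
```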

Now that we have filled the voice menu, let's add a change event listener to it that updates the voice of the SpeechSynthesisUtterance instance. When the user picks a voice, we use the index number (which is set as the value of each option) to look up the voice in the global array of voices.

document.querySelector("#voices").addEventListener("change", () => {
  speech.voice = voices[document.querySelector("#voices").value];
});

Controls

Let's add controls to the SpeechSynthesis instance.

Start:

We should pass the SpeechSynthesisUtterance instance to the window.speechSynthesis.speak() method when the start button is clicked. This will start converting the text to speech. The text property must be set before calling this method.

NOTE: speak() adds the utterance to a queue. If you start another text to speech while one is already playing, the new one waits in the queue and is spoken after the current one finishes.

document.querySelector("#start").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
  window.speechSynthesis.speak(speech);
});
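If you would rather have a new click on Start replace whatever is playing instead of waiting in the queue, one option is to call cancel() first when the synthesizer is busy. SpeechSynthesis exposes read-only `speaking` and `pending` flags for this. A minimal sketch; `speakNow` is hypothetical and takes the synthesizer as an argument:

```javascript
// Hypothetical helper: stop anything that is playing or queued, then speak.
// `synth` is expected to be window.speechSynthesis.
function speakNow(synth, utterance) {
  if (synth.speaking || synth.pending) {
    synth.cancel(); // clears the queue, including the current utterance
  }
  synth.speak(utterance);
}

// Usage in the Start button's click listener:
// speech.text = document.querySelector("textarea").value;
// speakNow(window.speechSynthesis, speech);
```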

Pause:

We can pause the utterance that is currently being spoken using window.speechSynthesis.pause(). Let's select the Pause button and add a click event listener that calls this method.

document.querySelector("#pause").addEventListener("click", () => {
  window.speechSynthesis.pause();
});
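SpeechSynthesis also has a read-only `paused` flag. If you would rather use a single button for both pausing and resuming, a sketch could branch on it; `togglePause` is a hypothetical helper:

```javascript
// Hypothetical helper: resume when paused, otherwise pause.
// Returns the action taken so the caller can update the button text.
// `synth` is expected to be window.speechSynthesis.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  synth.pause();
  return "paused";
}

// Usage in the browser:
// button.textContent = togglePause(window.speechSynthesis) === "paused" ? "Resume" : "Pause";
```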

Resume:

We can resume a paused utterance using window.speechSynthesis.resume(). Let's select the Resume button and add a click event listener that calls this method.

document.querySelector("#resume").addEventListener("click", () => {
  window.speechSynthesis.resume();
});

Cancel:

We can cancel the utterance that is currently being spoken using window.speechSynthesis.cancel(). This also removes any utterances still waiting in the queue. Let's select the Cancel button and add a click event listener that calls this method.

document.querySelector("#cancel").addEventListener("click", () => {
  window.speechSynthesis.cancel();
});
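To keep users from pressing a control that has no effect, you could enable each button only when it applies, based on the `speaking`, `pending`, and `paused` flags. Per the specification, `speaking` stays true while speech is paused. These flags can change asynchronously, so you would refresh the buttons on the utterance's `start` and `end` events as well as after each click. A sketch; `controlStates` is a hypothetical helper:

```javascript
// Hypothetical helper: which control buttons make sense right now.
// `synth` is expected to be window.speechSynthesis.
function controlStates(synth) {
  const busy = synth.speaking || synth.pending;
  return {
    start: true, // new text can always be queued
    pause: busy && !synth.paused,
    resume: synth.paused,
    cancel: busy,
  };
}

// Usage in the browser (button ids match the HTML above):
// const update = () => {
//   const states = controlStates(window.speechSynthesis);
//   for (const [id, enabled] of Object.entries(states)) {
//     document.querySelector(`#${id}`).disabled = !enabled;
//   }
// };
// speech.addEventListener("start", update);
// speech.addEventListener("end", update);
```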

The final version of textToSpeech.js:

// Initialize new SpeechSynthesisUtterance object
let speech = new SpeechSynthesisUtterance();

// Set Speech Language
speech.lang = "en";

let voices = []; // global array of available voices

window.speechSynthesis.onvoiceschanged = () => {
  // Get List of Voices
  voices = window.speechSynthesis.getVoices();

  // Initially set the First Voice in the Array.
  speech.voice = voices[0];

  // Set the Voice Select List. (Set the Index as the value, which we'll use later when the user updates the Voice using the Select Menu.)
  let voiceSelect = document.querySelector("#voices");
  voices.forEach((voice, i) => (voiceSelect.options[i] = new Option(voice.name, i)));
};

document.querySelector("#rate").addEventListener("input", () => {
  // Get rate Value from the input
  const rate = document.querySelector("#rate").value;

  // Set rate property of the SpeechSynthesisUtterance instance
  speech.rate = rate;

  // Update the rate label
  document.querySelector("#rate-label").innerHTML = rate;
});

document.querySelector("#volume").addEventListener("input", () => {
  // Get volume Value from the input
  const volume = document.querySelector("#volume").value;

  // Set volume property of the SpeechSynthesisUtterance instance
  speech.volume = volume;

  // Update the volume label
  document.querySelector("#volume-label").innerHTML = volume;
});

document.querySelector("#pitch").addEventListener("input", () => {
  // Get pitch Value from the input
  const pitch = document.querySelector("#pitch").value;

  // Set pitch property of the SpeechSynthesisUtterance instance
  speech.pitch = pitch;

  // Update the pitch label
  document.querySelector("#pitch-label").innerHTML = pitch;
});

document.querySelector("#voices").addEventListener("change", () => {
  // On Voice change, use the value of the select menu (which is the index of the voice in the global voice array)
  speech.voice = voices[document.querySelector("#voices").value];
});

document.querySelector("#start").addEventListener("click", () => {
  // Set the text property with the value of the textarea
  speech.text = document.querySelector("textarea").value;

  // Start Speaking
  window.speechSynthesis.speak(speech);
});

document.querySelector("#pause").addEventListener("click", () => {
  // Pause the speechSynthesis instance
  window.speechSynthesis.pause();
});

document.querySelector("#resume").addEventListener("click", () => {
  // Resume the paused speechSynthesis instance
  window.speechSynthesis.resume();
});

document.querySelector("#cancel").addEventListener("click", () => {
  // Cancel the speechSynthesis instance
  window.speechSynthesis.cancel();
});

Result

You can take a look at the project that's been deployed using GitHub Pages here.

You can also check out the final code in this GitHub Repository.

Let's recap

We created an HTML page with a select menu for the voices, range sliders for volume, rate, and pitch, a text area, and control buttons.
We created a new JavaScript file and linked it to the HTML file.
We created a new SpeechSynthesisUtterance object.
We tweaked the six properties of the SpeechSynthesisUtterance instance. They are Pitch, Volume, Text, Voice, Rate, and Language.
We added click listeners to the Start, Pause, Resume, and Cancel control buttons to control speech synthesis.

Congratulations, you did it! 🥳

Happy Coding!
