
A Simple Way to Talk To Your Website


Technology and markets go hand in hand today. They are so closely linked that any whiff of a tech breakthrough sends social media into a frenzy. Writers fill page after page as if the breakthrough were already here, shares swing bull or bear, and newspapers print headlines in capital letters.

The pitfall of such journalistic overreaction is that we miss out on many simple tools, many small but innovative ideas that surround our networking space. An eye on the future makes us blind to the opportunity in the present moment.

Web development companies are urging their digital marketing leads to write more about how blockchains might transform the world or how Artificial Intelligence is the next big thing. They are missing something innovative and promising: the ability to make your website interactive.

Imagine talking to your website to set your favourite pick as the background. A site that answers you might sound like Artificial Intelligence, but you don’t need to go that far to speak to your website. The tool is already there, sleeping in your browser, and you may not even be aware of it. We are talking about the Web Speech API, which Google ships in Chrome. Let us first look at some essential elements of speech before we delve deeper into the Web Speech API.

Some Fundamentals of Speech

Speaking is easy, even a child can talk, but speech recognition is no child’s play. The mind and its relation to the brain are much more complicated than was once assumed. Hence computers, though exceptional in some respects, are nowhere near the human brain in perception. Computers need a lot of help to listen to words, because recognising speech is no walk in the park.

Speech is a complex phenomenon to study, and it gets stranger the deeper we go. It is not merely an assortment of words strung together. Every utterance is made up of packets of sound called phones. For example, when we say the word “MAT” we utter the phones ‘m’, ‘a’ and ‘t’. But the way we actually produce a sound and the way our mind conceives it are entirely different. Do you remember those instances when you reacted even before a sentence was complete? You could do so because your mind unconsciously perceives some fundamental blocks of sound; these elements are called phonemes.

In addition, there are diverse elements of linguistics to consider: for example, syntax, which describes the grammatical structure of a language, and semantics, the meaning of words and how they combine into the overall meaning of a sentence.

How Computers Listen to You

Speech recognition is an interdisciplinary science that combines subtle concepts from linguistics, signal processing, natural language processing and more. For simplicity’s sake, we will consider the following approaches to understanding how computers interpret speech:

1. Pattern Matching

You might remember the computerised voice of your gas booking line asking you to choose by pressing 1 or 2 on your mobile keypad to book a new gas cylinder. It is done using this technique, where the computer is trained to differentiate about ten sound patterns. Sounds such as “one”, “zero” and “ten” are what this pattern matching exercise detects. The computer compares what it hears with blocks of sound already stored in memory and maps the match to a further action. That’s why you hear “Sorry, we didn’t get you” when you say “zero” a little casually.
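The matching step can be sketched as a nearest-template lookup. This is a minimal sketch, assuming each stored word has already been reduced to a short numeric feature vector; the `templates` values, the `matchDigit` name and the rejection threshold are invented for illustration and are not taken from any real booking system.

```javascript
// Hypothetical stored templates: one small feature vector per known word.
const templates = {
  zero: [0.9, 0.1, 0.3],
  one: [0.2, 0.8, 0.5],
  two: [0.4, 0.4, 0.9],
};

// Euclidean distance between two equal-length vectors.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Return the closest stored word, or null when nothing is close enough
// (the "Sorry, we didn't get you" case). The threshold is an assumption.
function matchDigit(features, threshold = 0.5) {
  let best = null;
  let bestDistance = Infinity;
  for (const [word, template] of Object.entries(templates)) {
    const d = distance(features, template);
    if (d < bestDistance) {
      best = word;
      bestDistance = d;
    }
  }
  return bestDistance <= threshold ? best : null;
}
```

A sloppy utterance produces a vector far from every template, so the lookup returns `null` instead of guessing.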

2. Analysis of Features and Patterns

A typical speech recognition tool can handle a large vocabulary of sounds. You may wonder how it does so. The moment you speak into your mic, an A/D (analogue-to-digital) converter turns the vibrations into digital data. A spectrogram then plots that data as a graph, using a signal processing technique called the FFT (Fast Fourier Transform). The waveform is then broken into overlapping blocks called acoustic frames, spaced 1/50th or 1/25th of a second apart. The speech is split into possible words, which are compared with a phonetic dictionary to pinpoint the word spoken.
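The framing step can be illustrated with a few lines of JavaScript. This is only a sketch of splitting a digitised signal into overlapping frames; the 16 kHz sample rate, the 50% overlap and the `frameSignal` name are assumptions made for the example, and the spectrogram and dictionary lookup are left out.

```javascript
// Split a digitised signal into overlapping frames.
// frameLength and hop are in samples; hop < frameLength gives overlap.
function frameSignal(samples, frameLength, hop) {
  const frames = [];
  for (let start = 0; start + frameLength <= samples.length; start += hop) {
    frames.push(samples.slice(start, start + frameLength));
  }
  return frames;
}

// Example: at an assumed 16 kHz sample rate, 1/50 s is 320 samples.
const sampleRate = 16000;
const frameLength = sampleRate / 50; // 320 samples per frame
const hop = frameLength / 2;         // 50% overlap (an assumption)
const oneSecond = new Array(sampleRate).fill(0); // one second of silence
const frames = frameSignal(oneSecond, frameLength, hop);
```

Each frame would then be passed to a Fourier transform to obtain one column of the spectrogram.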

3. Statistical Method

Each person utters a word in a unique way, and even the same person may pronounce the same word differently the next time. Hence, a system that has to pick out the essential elements from a large pool has to deal with the problem of variability. Modern speech recognition tools use statistical language models to deal with it.

Models such as the Hidden Markov Model (HMM) make probabilistic guesses, guided by the rules of grammar, to arrive at the most likely word. They refine their accuracy by building on even the smallest sound that gets captured. In English, the word “example” is preceded by a fairly small set of words such as ‘for’, ‘bad’ and ‘good’. If the recognition process stalls at, say, “It is a ___ example” and a slight sound like ‘g’ was identified, the system fills in the blank with ‘good’.
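The “good example” guess can be sketched as choosing the most probable candidate that agrees with the sound fragment that was heard. This is a simple scoring sketch, not a Hidden Markov Model; the probabilities in `precedesExample` are made up and `guessWord` is a hypothetical helper.

```javascript
// Made-up probabilities of a word appearing right before "example".
const precedesExample = { for: 0.5, good: 0.3, bad: 0.2 };

// Pick the most likely word that starts with the sound fragment heard.
// Returns null when no candidate is compatible with the fragment.
function guessWord(heardPrefix, wordProbabilities) {
  let best = null;
  let bestProbability = 0;
  for (const [word, probability] of Object.entries(wordProbabilities)) {
    if (word.startsWith(heardPrefix) && probability > bestProbability) {
      best = word;
      bestProbability = probability;
    }
  }
  return best;
}
```

With the fragment `"g"` only ‘good’ remains compatible, so it wins even though ‘for’ is more frequent overall in this toy table.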

4. Artificial Neural Networks

They are simplified models of the human brain that are capable of learning from examples. If an ANN is trained with enough samples, it can correlate new input with previously seen patterns to arrive at the right word. A fully trained neural network can therefore take speech recognition to a different level.
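Learning from examples can be shown on a very small scale with a single artificial neuron (a perceptron). The sketch below is a toy, assuming two binary inputs and the logical AND function as training data; real speech recognisers use far larger networks, and the names `trainPerceptron` and `predict` are hypothetical.

```javascript
// Output 1 when the weighted sum of the inputs is positive, else 0.
function predict(model, inputs) {
  const sum = inputs.reduce((acc, x, i) => acc + x * model.weights[i], model.bias);
  return sum > 0 ? 1 : 0;
}

// Classic perceptron rule: nudge weights by the error on each example.
function trainPerceptron(examples, epochs = 20, learningRate = 1) {
  const model = { weights: [0, 0], bias: 0 };
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { inputs, target } of examples) {
      const error = target - predict(model, inputs);
      model.weights = model.weights.map((w, i) => w + learningRate * error * inputs[i]);
      model.bias += learningRate * error;
    }
  }
  return model;
}

// Training samples for logical AND.
const andExamples = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 0 },
  { inputs: [1, 0], target: 0 },
  { inputs: [1, 1], target: 1 },
];
const andModel = trainPerceptron(andExamples);
```

After training, the neuron reproduces the pattern it was shown without that rule ever being written into the code.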

Here’s How You Can Modify Your Site

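We will use the Web Speech API, which was developed by a W3C community group in 2012. Many browsers do not support it for one reason or another. Chrome has integrated speech recognition, exposed under the prefixed name webkitSpeechRecognition used in the code below, and that’s why you can voice search on Google; Firefox’s support for the recognition part has been more limited.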

The Web Speech API will be our interface; it already handles closely linked aspects of speech such as grammar and vocabulary.
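Because support differs between browsers, it helps to check for the interface before using it. The following is a minimal sketch; `getSpeechRecognition` is a hypothetical helper name, while `SpeechRecognition` and `webkitSpeechRecognition` are the standard and prefixed constructor names browsers may expose.

```javascript
// Return the speech recognition constructor exposed by the given global
// object, or null if it offers none.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// In a browser you would pass `window`; elsewhere fall back to an empty
// object so the lookup simply reports "unsupported".
const Recognition = getSpeechRecognition(
  typeof window !== "undefined" ? window : {}
);
const speechSupported = Recognition !== null;
```

A page could use `speechSupported` to hide the microphone icon when recognition is unavailable.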

Your tool will look like the one above. All you need to do is run the code below. The CSS gives your recognition feature its colours and layout. Only a simple model is presented here; you can express your creativity by altering the CSS.

<!-- CSS Styles -->
<style>
  /* Centre the widget on a light blue page */
  html, body {
    display: flex;
    align-items: center;
    justify-content: center;
    background-color: lightblue;
  }
  /* Wrapper that holds the text box and the microphone icon */
  .record {
    position: relative;
    width: 246px;
    display: inline-block;
  }
  .record input {
    border: 0;
    width: 240px;
    display: inline-block;
    height: 30px;
  }
  /* Microphone icon pinned to the right edge of the text box */
  .record img {
    float: right;
    width: 25px;
    height: 25px;
    border: none;
    position: absolute;
    right: 7px;
    top: 3px;
  }
  .container {
    display: inline-block;
    text-align: center;
  }
  h1 {
    font-family: constantia;
  }
</style>

The next set of code calls the API to do the actual speech recognition for you. It includes the necessary HTML and JavaScript.

<!DOCTYPE html>
<title>Voice Recognition: Habr</title>
<!-- Search Form -->
<div class="container">
  <h1>Voice Recognition in HTML</h1>
  <div class="record">
    <form id="speak-form" method="get" action="https://www.google.com/search">
      <input type="text" name="q" id="transcript" placeholder="Speak" />
      <img onclick="startRecording()" src="http://icons.iconarchive.com/icons/designbolts/free-multimedia/1024/Studio-Mic-icon.png" />
    </form>
  </div>
</div>
<!-- HTML5 Speech Recognition API -->
<script>
  function startRecording() {
    // webkitSpeechRecognition is Chrome's prefixed constructor; other browsers may lack it.
    if (window.hasOwnProperty('webkitSpeechRecognition')) {
      var recognition = new webkitSpeechRecognition();
      recognition.continuous = false;      // stop after one utterance
      recognition.interimResults = false;  // deliver only final results
      recognition.lang = "en-US";
      recognition.onresult = function(e) {
        // Copy the top transcript into the search box.
        document.getElementById('transcript').value = e.results[0][0].transcript;
      };
      // The source listing breaks off inside this handler; stopping here and
      // calling start() below are assumptions about the missing lines.
      recognition.onerror = function(e) {
        recognition.stop();
      };
      recognition.start();
    }
  }
</script>
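Once a transcript is available, it can drive the page itself, as in the background idea from the introduction. The sketch below assumes a small, hypothetical list of accepted colour names; `colourFromTranscript` is an illustrative helper and not part of the Web Speech API.

```javascript
// Hypothetical list of colour names the page is willing to accept.
const knownColours = ["red", "green", "blue", "yellow", "pink", "lightblue"];

// Find the first known colour mentioned in a transcript, or null.
function colourFromTranscript(transcript) {
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // drop punctuation such as "blue."
    .split(/\s+/);
  return words.find((word) => knownColours.includes(word)) || null;
}

// In a browser, the result could be applied inside recognition.onresult:
//   const colour = colourFromTranscript(e.results[0][0].transcript);
//   if (colour) document.body.style.backgroundColor = colour;
```

Restricting the command to a known list keeps a misheard word from being applied as a style value.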

The simple tool described above can open a new window of opportunity for many sites that are struggling to be interactive and unique. Web development should implement such simple and scalable techniques first, and intelligent web development should find the right balance between sound web design and such simple, integrative tools. The Web Speech API can further be used in mobile app development to make smartphone apps smarter. So check out this feature now and have a good time chatting with your website.