Making A Speech Recognition System That Understands Malayalam Words

Speech is the most effective mode of communication used by humans. Automatic speech recognition can be defined as a technology that enables a system to recognize input speech signals and interpret their meaning, after which the system can generate control signals.

1.1 AIM
The aim of this project is to realize an Automatic Speech Recognition system in hardware that can understand a limited set of Malayalam words spoken into a microphone. The system works well in a room environment (approximately 20 dB SNR, i.e. signal power about 100 times the noise power). For the system to function properly, there must be distinct pauses between the words, i.e. the words must be isolated. Due to memory constraints in the handheld device, the vocabulary supported by the system is limited; in other words, it is a limited vocabulary, speaker independent, isolated word recognition system.

The first phase of this work is to simulate, in Python, a system that recognizes a limited set of Malayalam numerals, from one to six. To increase recognition accuracy, a pre-processing technique called voice activity detection, which detects the start and end points of a word, needs to be implemented. In the second phase, the system is implemented in hardware on a Raspberry Pi.
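
As an illustration of what voice activity detection does, the following is a minimal sketch of an energy-based endpoint detector. It is not necessarily the method used in this project: the threshold rule, the function names (`frame_energies`, `detect_endpoints`, `make_test_signal`), the 8 kHz sampling rate, the 20 ms frame length and the synthetic test signal are all assumptions made for the example. The sketch assumes the recording begins with a short stretch of background noise, estimates the noise energy from it, and marks as speech every frame whose energy is well above that level.

```python
import math
import random


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=160, noise_frames=5, factor=4.0):
    """Return (start_sample, end_sample) of the detected word, or None.

    Assumption: the first `noise_frames` frames contain only background
    noise, so their average energy can serve as the noise floor.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise_floor + 1e-12  # small offset avoids a zero threshold
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    # First active frame starts the word; last active frame ends it.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def make_test_signal(rate=8000):
    """Synthetic stand-in for a spoken word: noise, a 200 Hz tone, noise."""
    rng = random.Random(0)
    silence, word = rate // 2, 3 * rate // 10  # 0.5 s and 0.3 s
    signal = [rng.uniform(-0.01, 0.01) for _ in range(2 * silence + word)]
    for n in range(word):
        signal[silence + n] += 0.5 * math.sin(2 * math.pi * 200 * n / rate)
    return signal


endpoints = detect_endpoints(make_test_signal())
print(endpoints)
```

In the synthetic signal the tone occupies samples 4000 to 6400, which is the region the detector is expected to return. Real speech would likely need additional care, for example a minimum word duration or a zero-crossing check for weak consonants, which this sketch omits.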

Nowadays, innovation in scientific research focuses increasingly on the interaction between humans and technology, and automatic speech recognition is a driving force in this process. Speech recognition technology is changing the way information is accessed, tasks are accomplished and business is done. Automatic speech recognition (ASR) is the ability of a machine to convert spoken language into recognized words.

ASR can be classified in several ways: speaker dependent or independent, discrete or continuous, and small or large vocabulary [1].

2.1.1 Speaker Independent (SI) or Speaker Dependent (SD) systems
A speaker independent system can recognize a variety of speakers without any speaker-specific training. A speaker dependent system, by contrast, can only recognize the speech of users it has been trained to understand. Here the speaker needs to train the system before use by speaking each of the commands to be recognized several times.

2.1.2 Discrete ASR or Continuous ASR
Discrete ASR recognizes isolated utterances. Although the user must speak unnaturally, leaving distinct pauses between each word, discrete ASR is useful in many applications, such as command and control or under high noise conditions. In Continuous ASR, the user can speak naturally, with normal conversational pauses, but it is more difficult for the system to detect the word boundaries.

2.1.3 Small vocabulary or large vocabulary system
In small vocabulary ASR, all the words in the vocabulary are trained at least once, whereas large vocabulary systems recognize sounds rather than whole words and are capable of recognizing words that were never in the training set.
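
To make the whole-word idea concrete, here is a minimal sketch of small vocabulary recognition by template matching. The text above does not say which matching method is used; dynamic time warping (DTW) is swapped in here purely for illustration. The function names (`dtw_distance`, `recognize`), the one-dimensional feature sequences and the template values are all hypothetical toy data, not measurements from real speech. Each vocabulary word has one stored template, and an utterance is assigned to the word whose template is closest after time alignment.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(features, templates):
    """Return the vocabulary word whose template is nearest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Toy templates (hypothetical values), one per trained word.
templates = {
    "onnu": [0, 1, 3, 1, 0],          # "one"
    "randu": [0, 2, 2, 2, 4, 1, 0],   # "two"
}

# The same contour as "onnu", spoken more slowly.
utterance = [0, 1, 1, 3, 3, 1, 0]
print(recognize(utterance, templates))
```

Because DTW allows frames to be repeated, the slower utterance aligns with the "onnu" template at zero cost, which is why whole-word templates can tolerate differences in speaking rate. A word with no stored template cannot be recognized at all, which is the limitation that sub-word modelling in large vocabulary systems addresses.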


