Last updated on 14/01/2024
Interacting with a PC by voice is a convenient and natural way to obtain information quickly. The task is fairly simple when only a finite set of questions about a topic is expected, because the problem reduces to identifying which question of the set was asked. Once the question is identified, the proper answer can be provided.
The following Python script allows you to interact by voice with a PC. Its goal is to identify the question asked from a set of possible questions and to provide the corresponding answer. This artificial intelligence system is therefore not open-ended; rather, it focuses on providing the appropriate answer in the most direct and effective way.
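This Python script uses the default microphone to listen to a question. The voice is then converted to a string using SpeechRecognition, a Python library that here calls the Google Web Speech API. The answer is spoken using gTTS, a Python library to interface with Google Translate’s text-to-speech API.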
The string is split into a list of words. Generic words, which are not expected to convey much specific information, are removed from the list. The resulting simplified list is then compared with the lists of keywords, each of which represents one of the possible questions.
The candidate question is identified as the one whose keywords best overlap with the simplified list. The index of this question is returned, which can be used to access the information needed to deliver a proper answer.
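The matching step described above can be sketched in isolation, without a microphone or files. This is a minimal sketch and not the script itself: the function name identify_question and its parameters are assumptions introduced for illustration, and lowercasing the text is an added choice so that capitalized words still match.

```python
def identify_question(text, cold_words, keyword_lists):
    """Return the index of the question whose keywords best overlap the text."""
    # Split the text into lowercase words and drop the non-informative ones
    tokens = [w for w in text.lower().split() if w not in cold_words]
    # Score each question by the number of remaining words found in its keywords
    scores = [sum(w in keywords for w in tokens) for keywords in keyword_lists]
    # The highest score wins; in case of a tie the first question is returned
    return scores.index(max(scores))


cold_words = ["the", "of", "or", "in", "is", "and"]
keyword_lists = [["apple", "apples"], ["banana", "bananas"], ["pear", "pears"]]
# Only "bananas" matches a keyword, so the second question (index 1) is selected
print(identify_question("How much is the price of bananas", cold_words, keyword_lists))
```

Counting shared words keeps the logic easy to inspect, at the cost of ignoring word order and synonyms that are not listed as keywords.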
How information is organized
The script uses three files, which must be saved in the same directory as the script:
1) cold_words.txt
2) questions_keywords.txt
3) questions_clear.txt
Any word that is not informative should be saved in cold_words.txt. Examples of such “non-informative” words are “if”, “or”, “but”, “the”, etc.
The files questions_keywords.txt and questions_clear.txt strictly depend on each other: the keywords related to the question in the k-th line of questions_clear.txt must be in the k-th line of questions_keywords.txt.
There is no upper limit to the number of entries k: as many questions as needed or desired can be added to the files.
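Because the two files must stay aligned line by line, a small consistency check can help when questions are added. The following is a sketch under stated assumptions: the helper name load_question_files is hypothetical, and skipping blank lines is a choice made here that the script itself does not make.

```python
def load_question_files(keywords_path, clear_path):
    """Read both files and check that they have the same number of entries."""
    with open(keywords_path, "rt") as f:
        # One list of keywords per non-empty line
        keywords = [line.split() for line in f if line.strip()]
    with open(clear_path, "rt") as f:
        # One complete question per non-empty line
        questions = [line.strip() for line in f if line.strip()]
    if len(keywords) != len(questions):
        raise ValueError(
            f"{len(keywords)} keyword lines but {len(questions)} question lines"
        )
    return keywords, questions
```

It could be called as load_question_files('questions_keywords.txt', 'questions_clear.txt') before the matching step, so that a mismatch is reported at start-up rather than as a wrong answer.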
How to access data for answering
The variable question_asked holds the index of the identified question and should be used to retrieve and deliver the corresponding information [this feature is not included in the script, because it should be implemented as needed].
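One possible way to use this index is to keep the answers in a list ordered like the questions. This is only a sketch: the list answers, its placeholder texts, and the function get_answer are hypothetical and not part of the script.

```python
# Hypothetical placeholder answers, one per question, in the same order as
# the lines of questions_clear.txt
answers = [
    "Placeholder answer about apples.",
    "Placeholder answer about bananas.",
    "Placeholder answer about pears.",
]


def get_answer(question_asked, answers):
    """Return the answer for the identified question, or a fallback text."""
    if 0 <= question_asked < len(answers):
        return answers[question_asked]
    return "Sorry, I have no answer for this question."


print(get_answer(1, answers))
```

The same index could equally select a row of a table or a record of a database, depending on where the information is stored.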
Example of possible cold_words.txt content:
the of or in is and because but course if why
Example of questions_keywords.txt (one line per question):
apple apples
banana bananas
pear pears
Example of questions_clear.txt (one line per question, in the same order):
how much are apples?
how much are bananas?
how much are pears?
Python script
import speech_recognition as sr # for speech-to-text
from gtts import gTTS # for text-to-speech
from io import BytesIO # for text-to-speech
import pygame # for text-to-speech
import os # for reading files
pygame.init()
pygame.mixer.init()
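# Function to speak out the answer:
def speak(text, language='en'):
    mp3_fo = BytesIO()
    tts = gTTS(text, lang=language)
    tts.write_to_fp(mp3_fo)
    mp3_fo.seek(0)  # rewind the buffer so that it is read from the start
    pygame.mixer.music.load(mp3_fo, 'mp3')
    pygame.mixer.music.play()
    # play() returns immediately: wait until playback ends, otherwise the
    # script may exit before the answer is heard
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)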
rec = sr.Recognizer()
with sr.Microphone() as source:
    print("I am getting used to ambient noise, just a second, please.")
    rec.adjust_for_ambient_noise(source, duration=1)
    print("Ok, I'm ready. Please ask your question.")
    recorded_audio = rec.listen(source)
    print("Got it.")
try:
    text = rec.recognize_google(recorded_audio, language="en-US",
                                show_all=False, with_confidence=False)
    # print("Decoded Text : {}".format(text))
except Exception as exc:
    print("Sorry, I encountered an error: ", exc)
    raise SystemExit(1)  # without recognized text nothing can be matched
# print("This is what I heard:", text)
# Load the file containing "cold words", i.e. words that presumably do not
# convey useful information to detect the question asked:
with open('cold_words.txt', 'rt') as f:
    cold_words = f.readlines()
# Create a list of "cold words":
cold_words_list = []
for line in cold_words:
    cold_words_list += line.split()
# Split the spoken text into a list of single lowercase words, so that the
# comparison with the (lowercase) word lists is not case-sensitive:
text_tokens = text.lower().split()
# Remove from "text" all the "cold words":
clean_text_tokens = [w for w in text_tokens if w not in cold_words_list]
# Load the file with keywords useful to identify questions:
with open('questions_keywords.txt', 'rt') as f:
    questions = f.readlines()
# For each entry in the keywords file, count how many keywords
# are in the spoken text:
Q = []
count = []
for i in range(0, len(questions)):
    Q.append(questions[i].split())
    count.append(sum(el in Q[i] for el in clean_text_tokens))
# The index of the presumably asked question is the one with the highest score
# (if several questions tie, the first one is selected):
question_asked = count.index(max(count))
# Read the corresponding complete question:
with open('questions_clear.txt', 'rt') as f:
    questions_clear = f.readlines()
# String to be spoken (strip() removes the newline at the end of the line):
say = "You are probably asking: " \
    + questions_clear[question_asked].strip() \
    + " My answer to this question is very interesting, as shown in " \
    + "the dataset number " + str(question_asked + 1)  # +1 because
# we count from 1, not from 0, as Python does
print(say)
speak(say)
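Note that count.index(max(count)) returns 0 when no keyword matches at all, so the first question is selected even for an unrelated sentence. A possible guard is sketched below; the function name best_match is hypothetical and returning None for "no match" is an assumption about how the caller wants to handle this case.

```python
def best_match(count):
    """Return the index of the highest score, or None if nothing matched."""
    # An empty list or an all-zero list means that no keyword was recognized
    if not count or max(count) == 0:
        return None
    # Otherwise keep the behavior of the script: first index of the maximum
    return count.index(max(count))


print(best_match([0, 2, 1]))  # prints 1
print(best_match([0, 0, 0]))  # prints None
```

When the result is None, the script could ask the user to repeat the question instead of announcing the first entry.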