
Talking to your PC. Asking (somewhat) specific questions and getting (really) specific answers. A Python script.

Last updated on 14/01/2024

Interacting with a PC by voice is a quick and natural way to obtain information. It is also easy to implement when only a finite number of questions about a topic are expected, because the problem reduces to identifying which question, out of a known set, was asked. Once the question is identified, the proper answer can be provided.
The following Python script lets you interact by voice with a PC. Its goal is to identify the question posed from a set of possible questions, and to provide the corresponding answer. This artificial intelligence system is therefore not open-ended: it focuses instead on providing the appropriate answer in the most direct and effective way.

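This Python script uses the default microphone to listen to a question. The recorded voice is then converted to a string with the SpeechRecognition library (imported as speech_recognition), through its recognizer for the Google Web Speech API. gTTS, a Python library that interfaces with Google Translate’s text-to-speech API, is used only at the end, to speak the reply.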
The string is decomposed into a list of words. Generic words, which are not expected to convey much specific information, are removed from the list. The resulting simplified list is then compared with several lists of keywords, each of which represents one of the possible questions.
The candidate question is the one whose keywords best overlap with the simplified list. Its index is stored and can be used to access the information needed to deliver a proper answer.
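
The matching step described above can be sketched in isolation, without microphone or speech libraries. This is only a minimal illustration: the function name identify_question and the sample sentence are assumptions made for this example, and the spoken text is replaced by a plain string.

```python
def identify_question(text, cold_words, keyword_lists):
    """Return the index of the keyword list that best overlaps with text."""
    # Decompose the sentence into lowercase words
    tokens = text.lower().split()
    # Drop the generic ("cold") words
    clean_tokens = [t for t in tokens if t not in cold_words]
    # Score each candidate question by the number of spoken words
    # that appear among its keywords
    scores = [sum(t in keywords for t in clean_tokens)
              for keywords in keyword_lists]
    # The first question with the highest score wins
    return scores.index(max(scores))


cold_words = ["the", "of", "or", "in", "is", "and"]
keyword_lists = [["apple", "apples"], ["banana", "bananas"], ["pear", "pears"]]

# Only "bananas" matches a keyword, so the second question (index 1) is chosen
print(identify_question("how much is the price of bananas",
                        cold_words, keyword_lists))  # prints 1
```

Note that a sentence matching no keyword at all still yields index 0, because every score is then zero and the first entry is returned.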

How information is organized

The script uses three files, which must be saved in the same directory as the script:
1) cold_words.txt
2) questions_keywords.txt
3) questions_clear.txt
Any word that is not informative should be saved in cold_words.txt. Examples of such “non-informative” words are “if”, “or”, “but”, “the”, etc.
The files questions_keywords.txt and questions_clear.txt strictly depend on each other: the keywords for the question on the k-th line of questions_clear.txt must be on the k-th line of questions_keywords.txt.
There is no upper limit to the number of entries k: as many questions as needed or desired can be added to the files.
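
Since the two question files must stay aligned line by line, it may help to check this when loading them. The following is a possible sketch, not part of the original script: the function name load_question_files is hypothetical, and it assumes that neither file contains blank lines between entries.

```python
def load_question_files(keywords_path="questions_keywords.txt",
                        clear_path="questions_clear.txt"):
    """Pair line k of the clear questions with line k of the keywords."""
    with open(keywords_path, "rt") as f:
        # One list of keywords per line
        keyword_lines = [line.split() for line in f.read().splitlines()]
    with open(clear_path, "rt") as f:
        # One complete question per line, without the trailing newline
        clear_lines = f.read().splitlines()
    # The k-th lines belong together, so the line counts must agree
    if len(keyword_lines) != len(clear_lines):
        raise ValueError(
            "{} has {} lines but {} has {} lines".format(
                keywords_path, len(keyword_lines),
                clear_path, len(clear_lines)))
    return list(zip(clear_lines, keyword_lines))
```

Calling load_question_files() in the script's directory returns a list of (question, keywords) pairs, or raises ValueError when a question was added to one file but not to the other.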

How to access data for answering

The variable question_asked holds the zero-based index of the identified question (0 for the first line of the question files) and should be used to retrieve and deliver the corresponding set of information [this feature is not included in the script, because it should be implemented as needed].
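
One simple way to implement the missing feature is to keep the answers in a list ordered like the question files and to index it with question_asked. The sketch below is only an illustration: the list answers, its placeholder strings and the function get_answer are all assumptions, not part of the script.

```python
# Hypothetical answers, in the same order as the lines of questions_clear.txt
# (placeholder strings, to be replaced with real information)
answers = [
    "placeholder answer about apples",
    "placeholder answer about bananas",
    "placeholder answer about pears",
]


def get_answer(question_asked, answers):
    """Return the answer stored at the zero-based index question_asked."""
    if 0 <= question_asked < len(answers):
        return answers[question_asked]
    # Index outside the list: the answers and the questions are out of sync
    return "Sorry, I have no answer for this question."


print(get_answer(1, answers))  # prints: placeholder answer about bananas
```

The same index could just as well select a row in a spreadsheet or a record in a database, depending on where the information is stored.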

Example of possible cold_words.txt content:

the of or in is and
because but
course if why

Example of questions_keywords.txt

apple apples
banana bananas
pear pears

Example of questions_clear.txt

how much are apples?
how much are bananas?
how much are pears?

Python script

import speech_recognition as sr # for speech-to-text
from gtts import gTTS # for text-to-speech
from io import BytesIO # for text-to-speech
import pygame # for text-to-speech

pygame.init()
pygame.mixer.init()

# Function to speak out the answer:
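def speak(text, language='en'):
    mp3_fo = BytesIO()
    tts = gTTS(text, lang=language)
    tts.write_to_fp(mp3_fo)
    mp3_fo.seek(0)  # rewind the buffer so that it is read from the start
    pygame.mixer.music.load(mp3_fo, 'mp3')
    pygame.mixer.music.play()
    # play() returns immediately: wait until playback has finished,
    # otherwise the script may end before anything is heard
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)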

rec = sr.Recognizer()

with sr.Microphone() as source:
    print("I am getting used to ambient noise, just a second, please. ")
    rec.adjust_for_ambient_noise(source, duration=1)
    print("Ok, I'm ready. Please ask your question.")
    recorded_audio = rec.listen(source)
    print("Got it. ")

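try:
    text = rec.recognize_google(recorded_audio, language="en-US",
                                show_all=False, with_confidence=False)
except sr.UnknownValueError:
    # The speech could not be understood: there is no text to analyze
    raise SystemExit("Sorry, I could not understand the question.")
except Exception as exc:
    # Any other failure (e.g. no network): stop here, "text" is undefined
    raise SystemExit("Sorry, I encountered an error: {}".format(exc))

# print("This is what I heard:", text)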

# Load the file containing "cold words", i.e. words that presumably do not
# convey useful information to detect the question asked:
with open('cold_words.txt', 'rt') as f:
    cold_words = f.readlines()

# Create a list of "cold words":
cold_words_list = []
for line in cold_words:
    cold_words_list += line.split()

# Split the spoken text into a list of single words, in lowercase so that
# they can match the (lowercase) entries of the text files:
text_tokens = text.lower().split()

# Remove from "text" all the "cold words":
clean_text_tokens = [i for i in text_tokens if i not in cold_words_list]

# Load the file with keywords useful to identify questions:
with open('questions_keywords.txt', 'rt') as f:
    questions = f.readlines()

# For each entry in the keywords file, check how many hotwords 
# are in the spoken text:
Q = []
count = []
for i in range(0,len(questions)):
    Q.append(questions[i].split())
    count.append(sum(el in Q[i] for el in clean_text_tokens))

# The index of the presumably asked question is the one with the highest score:
question_asked = count.index(max(count)) 

# Read the corresponding complete question:
f = open('questions_clear.txt', 'rt')
questions_clear=f.readlines()
f.close()

# String to be spoken (strip() removes the newline read from the file):
say = ("You are probably asking: "
       + questions_clear[question_asked].strip()
       + " My answer to this question is very interesting, as shown in"
       + " the dataset number " + str(question_asked + 1))
# +1 because we count from 1, not from 0, as Python does

print(say)

speak(say)
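
A possible refinement: in the script, count.index(max(count)) returns 0 when none of the keywords was heard, so the first question is selected even for unrelated speech. The sketch below shows one way to detect this case. The function best_match and its parameter min_score are hypothetical names introduced for this example.

```python
def best_match(count, min_score=1):
    """Return the index of the highest score, or None if it is too low."""
    # No candidate questions, or no question reached the minimum score
    if not count or max(count) < min_score:
        return None
    return count.index(max(count))


print(best_match([0, 2, 1]))  # prints 1
print(best_match([0, 0, 0]))  # prints None
```

With this helper, the script could assign question_asked = best_match(count) and ask the user to repeat the question whenever the result is None.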