A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud Speech-to-Text (STT) over WebSockets.
This Airport Self-Service Kiosk demo demonstrates how microphone streaming to GCP works from a web application.
It makes use of the following GCP resources:
- Dialogflow & Knowledge Bases
- Speech-to-Text
- Text-to-Speech
- Translate API
- (optionally) App Engine Flex
In this demo, you can record your voice; the kiosk will display answers on a screen and synthesize the speech.
A working demo can be found here: http://selfservicedesk.appspot.com/
I wrote a series of extensive blog articles on how to set up your streaming project. Want to learn exactly how this code works? Start here:
- Blog 1: Introduction to the GCP conversational AI components, and integrating your own voice AI in a web app.
- Blog 2: Building a client-side web application which streams audio from a browser microphone to a server.
- Blog 3: Building a web server which receives a browser microphone stream and uses Dialogflow or the Speech-to-Text API to retrieve text results.
- Blog 4: Getting audio data from text (Text-to-Speech) and playing it in your browser.
There's also a presentation and a video that accompany the tutorial.
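To give a flavor of what Blogs 2 and 3 cover, here's a minimal client-side sketch of streaming microphone audio over a WebSocket. It's an illustration only: the `audio-chunk` event name and the server URL are assumptions, and the real client in this repo does more work (such as downsampling to 16 kHz LINEAR16).

```typescript
// Minimal sketch: capture the browser microphone and stream raw audio
// chunks to a server over socket.io. Event name and URL are illustrative.
import { io } from 'socket.io-client';

const socket = io('https://localhost:8080'); // assumed server endpoint

async function startStreaming(): Promise<void> {
  // getUserMedia() requires HTTPS (or localhost) in modern browsers.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);

  // ScriptProcessorNode is deprecated in favor of AudioWorklet,
  // but it keeps this sketch short.
  const processor = audioContext.createScriptProcessor(2048, 1, 1);
  source.connect(processor);
  processor.connect(audioContext.destination);

  processor.onaudioprocess = (event: AudioProcessingEvent) => {
    // 32-bit float PCM for the current buffer; a production client would
    // downsample/convert this before sending it to Dialogflow or STT.
    const chunk = event.inputBuffer.getChannelData(0);
    socket.emit('audio-chunk', chunk.buffer); // illustrative event name
  };
}

startStreaming().catch(console.error);
```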
To run this demo on your own machine:

- Install Node.js and npm: `apt-get install nodejs -y && apt-get install npm`
- Install the Angular CLI: `sudo npm install -g @angular/cli`
- Clone this repository: `git clone https://github.com/dialogflow/selfservicekiosk-audio-streaming.git selfservicekiosk`
- Set the PROJECT_ID variable: `export PROJECT_ID=[gcp-project-id]`
- Set the project: `gcloud config set project $PROJECT_ID`
- Download the service account key.
- Assign the key to the GOOGLE_APPLICATION_CREDENTIALS environment variable:
  - Linux/Mac: `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json`
  - Windows: `set GOOGLE_APPLICATION_CREDENTIALS=c:\path\to\service_account.json`
- Log in: `gcloud auth login`
- Open server/env.txt, change the environment variables, and rename the file to server/.env
- Enable the required APIs:
 
    gcloud services enable \
      appengineflex.googleapis.com \
      containerregistry.googleapis.com \
      cloudbuild.googleapis.com \
      cloudtrace.googleapis.com \
      dialogflow.googleapis.com \
      logging.googleapis.com \
      monitoring.googleapis.com \
      sourcerepo.googleapis.com \
      speech.googleapis.com \
      mediatranslation.googleapis.com \
      texttospeech.googleapis.com \
      translate.googleapis.com
- Build the client-side Angular app: `cd client && sudo npm install && npm run-script build`
- Start the server TypeScript app, which is exposed on port 8080: `cd ../server && sudo npm install && npm run-script watch`
- Browse to http://localhost:8080
 
To set up the Dialogflow agent:

- Create a Dialogflow agent at: http://console.dialogflow.com
- Zip the contents of the dialogflow folder from this repo.
- Click Settings > Import, and upload the Dialogflow agent zip you just created.
- Caution: Knowledge connector settings are currently not included when exporting, importing, or restoring agents. Make sure you have enabled beta features in the agent settings.
- Select Knowledge from the left menu.
- Create a Knowledge Base: Airports.
- Add the following Knowledge Base FAQs, as text/html documents:
 
  - https://www.panynj.gov/port-authority/en/help-center/faq/airports-faq-help-center.html
  - https://www.schiphol.nl/en/before-you-take-off/
  - https://www.flysfo.com/faqs
 
- As a response, it requires the following custom payload:

      {
        "knowledgebase": true,
        "QUESTION": "$Knowledge.Question[1]",
        "ANSWER": "$Knowledge.Answer[1]"
      }

- To make the Text-to-Speech version of the answer work, add the following text/SSML response:

      $Knowledge.Answer[1]
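For context, here is roughly how a server can turn that `$Knowledge.Answer[1]` text into audio; a minimal sketch using the @google-cloud/text-to-speech client, where the voice selection and encoding are arbitrary illustrative choices, not necessarily what this repo uses:

```typescript
// Minimal sketch: synthesize a Knowledge Base answer with Text-to-Speech.
// Voice and encoding are illustrative choices.
import { TextToSpeechClient } from '@google-cloud/text-to-speech';

const ttsClient = new TextToSpeechClient();

async function synthesizeAnswer(answer: string): Promise<Uint8Array> {
  const [response] = await ttsClient.synthesizeSpeech({
    input: { ssml: `<speak>${answer}</speak>` },
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
    audioConfig: { audioEncoding: 'LINEAR16' },
  });
  // audioContent holds the raw audio bytes to ship back to the browser.
  return response.audioContent as Uint8Array;
}
```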
This demo makes heavy use of WebSockets, and the getUserMedia() microphone HTML5 API requires the page to run over HTTPS. Therefore, I deploy this demo with a custom runtime, so I can include my own Dockerfile.
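The custom-runtime pattern is driven by app.yaml; a rough sketch of what that looks like (the env_variables shown are placeholders, not the repo's exact settings):

```yaml
# Sketch of an App Engine Flex custom-runtime config.
# 'runtime: custom' tells Flex to build the Dockerfile in this directory.
runtime: custom
env: flex

env_variables:
  PROJECT_ID: 'your-gcp-project-id' # placeholder value
```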
- Edit the app.yaml to tweak the environment variables, and set the correct Project ID.
- Deploy with: `gcloud app deploy`
- Browse to the deployed app: `gcloud app browse`
The self-service kiosk is a full end-to-end application. To showcase smaller building blocks, I've created 6 small demos. Here's how you can get these running:
- Install the required libraries by running the following command from the examples folder: `npm install`
- Start the simpleserver Node.js app: `npm --EXAMPLE=1 --PORT=8080 --PROJECT_ID=[your-gcp-project-id] run start`
To switch between the various examples, set the EXAMPLE variable to one of the following:
- Example 1: Dialogflow Speech Intent Detection
- Example 2: Dialogflow Speech Detection through streaming
- Example 3: Dialogflow Speech Intent Detection with Text to Speech output
- Example 4: Speech to Text Transcribe Recognize Call
- Example 5: Speech to Text Transcribe Streaming Recognize
- Example 6: Text to Speech in a browser
 
- Browse to http://localhost:8080. Open the inspector to preview the Dialogflow results object.
 
The code required for these examples can be found in simpleserver.js, which contains the different Dialogflow & STT calls; example1.html - example6.html show the client-side implementations.
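As a taste of the server side, closest to Example 5's streaming flow, here's a minimal sketch that pipes incoming socket audio chunks into the Speech-to-Text streaming API; the socket event names are illustrative, not the repo's exact ones:

```typescript
// Minimal sketch: receive browser audio chunks over socket.io and feed
// them into Speech-to-Text streaming recognition.
import { SpeechClient } from '@google-cloud/speech';
import { Server } from 'socket.io';

const speechClient = new SpeechClient();
const io = new Server(8080);

io.on('connection', (socket) => {
  // One recognize stream per connected client.
  const recognizeStream = speechClient
    .streamingRecognize({
      config: {
        encoding: 'LINEAR16',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
      },
      interimResults: true, // emit partial transcripts while the user speaks
    })
    .on('data', (data) => {
      const transcript = data.results?.[0]?.alternatives?.[0]?.transcript;
      socket.emit('transcript', transcript); // illustrative event name
    })
    .on('error', console.error);

  socket.on('audio-chunk', (chunk: ArrayBuffer) => {
    // Raw audio bytes are wrapped into streaming requests by the client lib.
    recognizeStream.write(Buffer.from(chunk));
  });

  socket.on('disconnect', () => recognizeStream.end());
});
```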
License: Apache 2.0
This is not an official Google product.