Overview
This document describes how to create a Text-to-Speech module as a cloud function. Although the example uses Google Cloud Functions, other cloud providers can be used. The cloud function acts as an adapter for a text-to-speech service (e.g., PlayHT) and implements a predefined interface to integrate with the Add-on Mart as a Text-to-Speech module.
Supported cloud functions
The supported cloud function platforms are Oracle Cloud Functions, AWS Lambda, Azure Functions, and Google Cloud Functions.
Oracle cloud function example
Oracle cloud function specific example
How to create a Google Cloud Function
Google cloud function example
Google cloud function specific example
Authentication options for Google Cloud Function
- IP address restriction. Allow the function to be triggered without authentication, but only from a specific list of IP addresses.
  - To implement IP-based access control, use Google Cloud Armor in conjunction with an HTTP(S) Load Balancer.
- Bearer Token Authentication.
  - Deploy the function with the --no-allow-unauthenticated flag.
  - Assign appropriate roles (e.g., Cloud Functions Invoker) to the service account.
  - Generate a key for the service account and share it with the PortaOne team so that the dispatcher application can generate a Bearer Token.
  - Refer to Secure your Cloud Run function for detailed Google documentation.
- Static Token Authentication. Used in the example code (may be deprecated in the future).
  - Use a predefined token shared between the caller and the Cloud Function.
  - Store the token securely as an environment variable in the Cloud Function.
  - Validate each incoming request by comparing the Authorization header with the static token.
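A minimal Go sketch of the static token check, assuming the caller sends the token as "Authorization: Bearer <token>" (as in the request examples) and that the expected value is read from the STATIC_AUTH_TOKEN environment variable listed in the deployment section. The names validStaticToken and requireStaticToken are hypothetical. The comparison uses crypto/subtle so that its duration does not depend on how many characters match.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"os"
	"strings"
)

// validStaticToken reports whether authHeader has the form "Bearer <token>"
// and <token> equals expected. The comparison is constant-time.
func validStaticToken(authHeader, expected string) bool {
	if expected == "" {
		// Misconfiguration: never accept requests when no token is configured.
		return false
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	token := strings.TrimPrefix(authHeader, prefix)
	return subtle.ConstantTimeCompare([]byte(token), []byte(expected)) == 1
}

// requireStaticToken is a hypothetical middleware that rejects requests whose
// Authorization header does not carry the STATIC_AUTH_TOKEN value.
func requireStaticToken(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !validStaticToken(r.Header.Get("Authorization"), os.Getenv("STATIC_AUTH_TOKEN")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next(w, r)
	}
}
```

Wrapping each endpoint handler with requireStaticToken keeps the check in one place instead of repeating it in every handler.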
Code for the Google Cloud Function
Below is an example implementation of the required endpoints for the Text-to-Speech adapter in Go. This code serves as a reference and may be adapted to other implementations or platforms; however, it must strictly adhere to the predefined interface to ensure compatibility.
Link to the repository: https://gitlab.portaone.com:8949/read-only/playht-adapter
General Guidelines
- Project Structure: Organize the project according to the cloud function provider and the chosen programming language. Refer to Write Cloud Run functions for detailed Google documentation.
- HTTP Invocation: Ensure the function is invoked via an HTTP(S) request.
- Error Handling: Return an appropriate error code along with a clear, descriptive error message. Use standard HTTP status codes (e.g., 4xx for client errors, 5xx for server errors).
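A possible Go helper for error responses. It assumes that failures reuse the "success" and "error" fields that appear in every example response, with success set to false; errorResponse, errorBody and writeError are hypothetical names, not taken from the reference repository.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// errorResponse mirrors the "success"/"error" fields seen in the example
// responses. Using it for failures is an assumption of this sketch.
type errorResponse struct {
	Success bool   `json:"success"`
	Error   string `json:"error"`
}

// errorBody builds the JSON payload for a failed request.
func errorBody(msg string) []byte {
	// Marshalling a struct holding a bool and a string cannot fail.
	b, _ := json.Marshal(errorResponse{Success: false, Error: msg})
	return b
}

// writeError sends the JSON error body with a standard HTTP status code,
// e.g. http.StatusBadRequest (4xx) or http.StatusBadGateway (5xx).
func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	w.Write(errorBody(msg))
}
```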
Required Endpoints
The following endpoints must be implemented to meet the requirements for the T2S subsystem:
- /getLanguages - Returns a list of supported languages.
- /getVoices - Returns a list of available voices for the specified language.
- /synthesizeSpeech - Synthesizes speech from the given text using the specified voice.
Configure the function to serve all endpoints under a single URL. For example, in Go, use a switch statement on the request path to dispatch each request to the appropriate handler. See the provided example code.
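A minimal routing sketch for the single-URL approach. The handler stubs are hypothetical placeholders; adapter matches the --entry-point value used in the deployment example, and registering it with the Functions Framework for Go (for example, with functions.HTTP in an init function) is omitted to keep the sketch self-contained. Switching on path.Base means both /getVoices and /playhtAdapter/getVoices reach the same handler, which hedges against uncertainty about how the path arrives at the function.

```go
package main

import (
	"net/http"
	"path"
)

// Hypothetical handler stubs; real handlers would call the external TTS API.
func handleGetLanguages(w http.ResponseWriter, r *http.Request)     {}
func handleGetVoices(w http.ResponseWriter, r *http.Request)        {}
func handleSynthesizeSpeech(w http.ResponseWriter, r *http.Request) {}

// handlerFor maps the last path segment to an endpoint handler.
// It returns nil for unknown paths.
func handlerFor(urlPath string) http.HandlerFunc {
	switch path.Base(urlPath) {
	case "getLanguages":
		return handleGetLanguages
	case "getVoices":
		return handleGetVoices
	case "synthesizeSpeech":
		return handleSynthesizeSpeech
	default:
		return nil
	}
}

// adapter is the single HTTP entry point. Only POST requests are accepted.
func adapter(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	h := handlerFor(r.URL.Path)
	if h == nil {
		http.Error(w, "unknown endpoint: "+r.URL.Path, http.StatusNotFound)
		return
	}
	h(w, r)
}
```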
Request handlers
The function acts as an adapter to an external Text-to-Speech service: it implements the PortaOne interfaces and forwards requests to that service. The function must accept input as JSON in the body of an HTTP POST request and return output in JSON format. See the example implementation of the request handlers.
If the external service does not provide an API for listing supported languages, maintain a static list locally. Validate incoming payloads to ensure all required fields are present, and respond with 400 Bad Request for missing or malformed data.
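A sketch of payload validation for /getVoices. The JSON field names are taken from the request examples at the end of this document; the nested struct layout (AuthInfo, ConfigurationInfo) and the helper parseGetVoicesRequest are assumptions rather than the reference implementation. A caller would map any returned error to 400 Bad Request.

```go
package main

import (
	"encoding/json"
	"errors"
)

// AuthInfo and ConfigurationInfo follow the JSON shape in the request examples.
type AuthInfo struct {
	APIKey string `json:"api_key"`
}

type ConfigurationInfo struct {
	AuthInfo AuthInfo `json:"auth_info"`
}

// GetVoicesListRequest is the /getVoices request body.
type GetVoicesListRequest struct {
	ConfigurationInfo ConfigurationInfo `json:"configuration_info"`
	LanguageCode      string            `json:"language_code"`
}

// parseGetVoicesRequest decodes the body and checks that required fields are set.
func parseGetVoicesRequest(body []byte) (*GetVoicesListRequest, error) {
	var req GetVoicesListRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, errors.New("malformed JSON: " + err.Error())
	}
	if req.ConfigurationInfo.AuthInfo.APIKey == "" {
		return nil, errors.New("configuration_info.auth_info.api_key is required")
	}
	if req.LanguageCode == "" {
		return nil, errors.New("language_code is required")
	}
	return &req, nil
}
```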
Data structures
To meet the requirements of the PortaOne interface and integrate with external Text-to-Speech services, the function must adhere to predefined data structures across all endpoints. Define request and response structures for every endpoint to ensure compatibility. For example, /getVoices should receive a GetVoicesListRequest and return a GetVoicesListResponse.
Translate data from external APIs into the internal structures required by PortaOne. For example, convert voice data from an external API into the GetVoicesListResponse format.
Refer to the example data structure definitions.
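As an illustration of the translation step, the sketch below converts a list of external voices into a GetVoicesListResponse whose entries follow the string format shown in the /getVoices request example. The playHTVoice type and its fields are hypothetical; check the provider's API reference for the actual voice schema.

```go
package main

import "fmt"

// playHTVoice is a hypothetical, trimmed-down view of one voice entry
// returned by the external API.
type playHTVoice struct {
	ID           string `json:"id"`
	Name         string `json:"name"`
	Language     string `json:"language"`
	LanguageCode string `json:"language_code"`
}

// GetVoicesListResponse matches the /getVoices response in the request examples.
type GetVoicesListResponse struct {
	Success bool     `json:"success"`
	Error   string   `json:"error"`
	Voices  []string `json:"voices"`
}

// toGetVoicesListResponse keeps the voices for languageCode and formats each
// one like the entries in the example response.
func toGetVoicesListResponse(voices []playHTVoice, languageCode string) GetVoicesListResponse {
	// Start with an empty (non-nil) slice so the JSON output is [] rather than null.
	resp := GetVoicesListResponse{Success: true, Voices: []string{}}
	for _, v := range voices {
		if v.LanguageCode != languageCode {
			continue
		}
		resp.Voices = append(resp.Voices, fmt.Sprintf(
			"ID: %s, Name: %s, Language: %s, LanguageCode: %s",
			v.ID, v.Name, v.Language, v.LanguageCode))
	}
	return resp
}
```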
Logging
The log format is not strictly defined, but each log message should include at least the following information:
- timestamp when this particular log message is printed;
- log level (info, warn, error, debug);
- "trace_id" - extracted from the "x-portaone-trace-id" request header.
The main purpose of "trace_id" is to group all log entries related to a single user-triggered action, which makes log analysis easier. Write all logs to stdout in JSON format.
Deployment example
To deploy the sample function, ensure the following environment variables are configured:
Mandatory environment variables:
- PLAYHT_API_URL - The base URL for the PlayHT API
- STATIC_AUTH_TOKEN - A static token used for authentication
Optional environment variables:
- API_TIMEOUT - The timeout for API calls (default: 30 seconds)
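A sketch of reading these variables at startup. The Config type and the loadConfig helper are hypothetical, and the format of API_TIMEOUT is not specified here, so the sketch assumes Go duration syntax (for example 45s); adjust it if the reference code expects a plain number of seconds. Injecting getenv keeps the function testable.

```go
package main

import (
	"errors"
	"os"
	"time"
)

// Config holds the adapter settings read from the environment.
type Config struct {
	PlayHTAPIURL    string
	StaticAuthToken string
	APITimeout      time.Duration
}

const defaultAPITimeout = 30 * time.Second

// loadConfig validates the mandatory variables and applies the optional timeout.
// Assumption: API_TIMEOUT uses Go duration syntax such as "45s" or "1m".
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		PlayHTAPIURL:    getenv("PLAYHT_API_URL"),
		StaticAuthToken: getenv("STATIC_AUTH_TOKEN"),
		APITimeout:      defaultAPITimeout,
	}
	if cfg.PlayHTAPIURL == "" {
		return Config{}, errors.New("PLAYHT_API_URL is required")
	}
	if cfg.StaticAuthToken == "" {
		return Config{}, errors.New("STATIC_AUTH_TOKEN is required")
	}
	if raw := getenv("API_TIMEOUT"); raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil || d <= 0 {
			return Config{}, errors.New("API_TIMEOUT must be a positive duration such as 45s")
		}
		cfg.APITimeout = d
	}
	return cfg, nil
}

// loadConfigFromEnv reads the real process environment.
func loadConfigFromEnv() (Config, error) { return loadConfig(os.Getenv) }
```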
An example of a function deployment command:
gcloud functions deploy playhtAdapter \
  --gen2 \
  --region=europe-west1 \
  --runtime=go122 \
  --entry-point=adapter \
  --trigger-http \
  --allow-unauthenticated \
  --set-env-vars PLAYHT_API_URL=https://api.play.ht/api/v2,STATIC_AUTH_TOKEN=******************
Text-to-Speech interface OpenAPI description
Request Examples
Note that the example cloud function expects the PlayHT credentials in the api_key field, in the following format: API_KEY::USER_ID
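A small Go helper for splitting that value, assuming the two parts are separated by the first "::" and that neither part may be empty. The name splitPlayHTCredentials is hypothetical.

```go
package main

import (
	"errors"
	"strings"
)

// splitPlayHTCredentials splits "API_KEY::USER_ID" at the first "::".
func splitPlayHTCredentials(apiKey string) (string, string, error) {
	key, userID, found := strings.Cut(apiKey, "::")
	if !found || key == "" || userID == "" {
		return "", "", errors.New(`api_key must have the form "API_KEY::USER_ID"`)
	}
	return key, userID, nil
}
```

An error from this helper indicates a malformed request, so it maps naturally to 400 Bad Request.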
curl -s -X POST "https://<your-function-endpoint>/getLanguages" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  -d '{ "configuration_info": { "auth_info": { "api_key": "<API_KEY::USER_ID>" } } }' | jq
{
  "success": true,
  "error": "",
  "languages": [ "en-US", "en-CA", "en-IN", "en-IE", "en-GB", "en-AU", "en-ZA", "en-FI", "en-FR", "en-IT", "en-MX", "en-NZ" ]
}
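The response above can be modelled with a struct like the one below. This is a sketch: GetLanguagesResponse is a hypothetical name that follows the JSON shape of the example, and staticLanguages (a shortened list copied from the example output) illustrates the locally maintained list for services that have no languages API.

```go
package main

import "encoding/json"

// GetLanguagesResponse matches the /getLanguages response shape.
type GetLanguagesResponse struct {
	Success   bool     `json:"success"`
	Error     string   `json:"error"`
	Languages []string `json:"languages"`
}

// staticLanguages is a locally maintained list of supported language codes.
var staticLanguages = []string{"en-US", "en-CA", "en-GB", "en-AU", "en-NZ"}

// getLanguagesBody serializes a successful /getLanguages response.
func getLanguagesBody(langs []string) ([]byte, error) {
	return json.Marshal(GetLanguagesResponse{Success: true, Languages: langs})
}
```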
curl -s -X POST "https://<your-function-endpoint>/getVoices" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  -d '{ "configuration_info": { "auth_info": { "api_key": "API_KEY::USER_ID" } }, "language_code": "en-NZ" }' | jq
{
  "success": true,
  "error": "",
  "voices": [
    "ID: s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json, Name: Ruby, Language: English (NZ), LanguageCode: en-NZ"
  ]
}
curl -s -X POST "https://europe-west1-playht-adapter.cloudfunctions.net/playhtAdapter/synthesizeSpeech" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  -d '{ "configuration_info": { "auth_info": { "api_key": "API_KEY::USER_ID" } }, "input": "Test of text to speech Google Cloud Function", "voice": "s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json" }' | jq
{
  "success": true,
  "error": "",
  "audio_content": "<base64-encoded string>"
}
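A sketch of the /synthesizeSpeech request and response types, with JSON field names taken from the example above. The type and helper names are hypothetical, and the use of standard padded base64 for audio_content is an assumption, since the example only states that the audio is base64-encoded.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
)

// SynthesizeSpeechRequest mirrors the /synthesizeSpeech request body.
type SynthesizeSpeechRequest struct {
	ConfigurationInfo struct {
		AuthInfo struct {
			APIKey string `json:"api_key"`
		} `json:"auth_info"`
	} `json:"configuration_info"`
	Input string `json:"input"` // text to synthesize
	Voice string `json:"voice"` // the ID part of a /getVoices entry
}

// SynthesizeSpeechResponse mirrors the response; audio_content is base64 text.
type SynthesizeSpeechResponse struct {
	Success      bool   `json:"success"`
	Error        string `json:"error"`
	AudioContent string `json:"audio_content"`
}

// parseSynthesizeSpeechRequest decodes the body and checks required fields;
// an error here maps to 400 Bad Request.
func parseSynthesizeSpeechRequest(body []byte) (*SynthesizeSpeechRequest, error) {
	var req SynthesizeSpeechRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, errors.New("malformed JSON: " + err.Error())
	}
	switch {
	case req.ConfigurationInfo.AuthInfo.APIKey == "":
		return nil, errors.New("configuration_info.auth_info.api_key is required")
	case req.Input == "":
		return nil, errors.New("input is required")
	case req.Voice == "":
		return nil, errors.New("voice is required")
	}
	return &req, nil
}

// newSynthesizeSpeechResponse wraps raw audio bytes returned by the provider.
func newSynthesizeSpeechResponse(audio []byte) SynthesizeSpeechResponse {
	return SynthesizeSpeechResponse{
		Success:      true,
		AudioContent: base64.StdEncoding.EncodeToString(audio),
	}
}
```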