feat: add TTS Live Client #306

SandraRodgers · 2024-07-03T16:41:09Z

Introduce live speak functionality to existing speak client. Maintain existing interface for batch.

Summary by CodeRabbit

New Features
- Added detailed documentation for using the Deepgram Text-to-Speech API, including REST and WebSocket methods.
- Introduced live text-to-speech functionality, allowing real-time audio synthesis.
- Added new classes for managing text-to-speech requests and live connections, enhancing user interaction.
Bug Fixes
- Simplified export structure for the SpeakRestClient class.
Documentation
- Updated README with practical examples for both synchronous and asynchronous TTS implementations.
- Expanded documentation with event handling details for live text-to-speech synthesis.

src/packages/SpeakLiveClient.ts

src/packages/AbstractLiveClient.ts

examples/node-speak-live/index.js

src/packages/AbstractLiveClient.ts

src/packages/index.ts

src/packages/SpeakClient.ts

lukeocodes

Do we want to release this on alpha/beta or straight to main while the API is still in development?

README.md

jpvajda · 2024-09-06T17:01:02Z

@lukeocodes Things we need to change in this PR:

Rename of reset to clear
Clear message now has a response type.
We want to remove the container field

cc @dvonthenen

dvonthenen · 2024-09-06T21:28:41Z

> @lukeocodes Things we need to change in this PR:
>
> Rename of reset to clear
> Clear message now has a response type.
> We want to remove the container field
>
> cc @dvonthenen

Yep, confirmed those should be the only changes between now and towards the end of July. 👍

coderabbitai · 2024-09-15T01:11:23Z

Walkthrough

The changes enhance the text-to-speech (TTS) functionality using the Deepgram API by adding documentation for both REST API and WebSocket methods in README.md. A live TTS implementation is introduced in index.js, alongside an enumeration for live TTS events in LiveTTSEvents.ts. The SpeakClient and SpeakLiveClient classes are added to facilitate interactions with TTS services. Additionally, export statements are updated to include these new components, ensuring their availability throughout the application.

Changes

| Files | Change Summary |
| --- | --- |
| `README.md` | Added sections for REST API and WebSocket methods for Deepgram TTS with example usage. |
| `examples/node-speak-live/index.js` | Introduced live TTS functionality with event handling and audio data processing. |
| `src/lib/enums/LiveTTSEvents.ts` | Added enumeration for live TTS events, including socket and message events. |
| `src/lib/enums/index.ts` | Updated to export the new `LiveTTSEvents` enumeration. |
| `src/packages/SpeakClient.ts` | Added `SpeakClient` class with methods for REST and live TTS interactions. |
| `src/packages/SpeakLiveClient.ts` | Introduced `SpeakLiveClient` class for managing WebSocket connections for live TTS. |
| `src/packages/SpeakRestClient.ts` | Removed alias export for `SpeakRestClient`, simplifying the export structure. |
| `src/packages/index.ts` | Added exports for `SpeakClient` and `SpeakLiveClient` to the module's public API. |

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 54c9085 and e3a6e69.

Files selected for processing (8)

README.md (1 hunks)
examples/node-speak-live/index.js (1 hunks)
src/lib/enums/LiveTTSEvents.ts (1 hunks)
src/lib/enums/index.ts (1 hunks)
src/packages/SpeakClient.ts (1 hunks)
src/packages/SpeakLiveClient.ts (1 hunks)
src/packages/SpeakRestClient.ts (0 hunks)
src/packages/index.ts (1 hunks)

Files not reviewed due to no reviewable changes (1)

src/packages/SpeakRestClient.ts

Files skipped from review as they are similar to previous changes (7)

README.md
examples/node-speak-live/index.js
src/lib/enums/LiveTTSEvents.ts
src/lib/enums/index.ts
src/packages/SpeakClient.ts
src/packages/SpeakLiveClient.ts
src/packages/index.ts

coderabbitai

Actionable comments posted: 0

Outside diff range and nitpick comments (3)

examples/node-speak-live/index.js (1)

4-60: Excellent implementation of the live TTS functionality!

The code segment establishes a connection to the Deepgram service, sends text for synthesis, handles events, and writes the audio data to a file. The use of environment variables for the API key is a good security practice.

Consider adding a configuration option to specify the output file name instead of hardcoding it as output.mp3. This would make the code more flexible and allow users to customize the output file name.
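
One way to act on this suggestion is sketched below. The names `LiveExampleOptions`, `resolveOutputFile`, and `concatChunks` are hypothetical and do not appear in the PR; the only detail taken from the review is the hardcoded default `output.mp3`.

```typescript
// Hypothetical options for the example script; not part of the SDK.
interface LiveExampleOptions {
  outputFile?: string;
}

// Default mirrors the file name hardcoded in the example under review.
const DEFAULT_OUTPUT_FILE = "output.mp3";

// Use the caller's file name when one is given, otherwise fall back to the default.
function resolveOutputFile(options: LiveExampleOptions = {}): string {
  const name = options.outputFile ? options.outputFile.trim() : "";
  return name.length > 0 ? name : DEFAULT_OUTPUT_FILE;
}

// Join buffered audio chunks into one contiguous byte array before writing.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.length;
  }
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
}
```

The example script could then pass `resolveOutputFile(options)` to its file-writing call instead of a string literal, keeping the current behavior when no option is supplied.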

README.md (2)

385-386: LGTM! Consider providing more context for the text variable.

The code segment correctly demonstrates the usage of the deepgram.speak.request method for text-to-speech synthesis using the REST API. The selected model aura-asteria-en is appropriate for English text-to-speech synthesis.

Consider defining or explaining the text variable in the surrounding context to provide clarity for users.

391-409: LGTM! Consider providing more context for the text variable.

The code segment correctly demonstrates the usage of the deepgram.speak.live method for live text-to-speech synthesis using WebSocket. The selected model aura-asteria-en is appropriate for English text-to-speech synthesis.

The event handling for Open and Close events, as well as the usage of sendText and flush methods, are implemented correctly.

Consider defining or explaining the text variable in the surrounding context to provide clarity for users.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ecf09df and 54c9085.

Files selected for processing (8)

README.md (1 hunks)
examples/node-speak-live/index.js (1 hunks)
src/lib/enums/LiveTTSEvents.ts (1 hunks)
src/lib/enums/index.ts (1 hunks)
src/packages/SpeakClient.ts (1 hunks)
src/packages/SpeakLiveClient.ts (1 hunks)
src/packages/SpeakRestClient.ts (0 hunks)
src/packages/index.ts (1 hunks)

Files not reviewed due to no reviewable changes (1)

src/packages/SpeakRestClient.ts

Additional comments not posted (26)

src/lib/enums/index.ts (1)

3-3: LGTM!

The export statement correctly re-exports the LiveTTSEvents enumeration, expanding the module's public API. This change allows other parts of the codebase to utilize the LiveTTSEvents enumeration directly, potentially enhancing the functionality related to text-to-speech events.

The existing exports for LiveConnectionState and LiveTranscriptionEvents remain unchanged, ensuring that the core functionality is preserved.

src/packages/index.ts (2)

10-10: LGTM!

The export statement for SpeakClient has been added as suggested in the past review comments.

11-11: LGTM!

The export statement for SpeakLiveClient has been added as suggested in the past review comments.

src/lib/enums/LiveTTSEvents.ts (8)

16-16: LGTM!

The enum member Open is well-named and the comment clearly explains its purpose.

17-17: LGTM!

The enum member Close is well-named and the comment clearly explains its purpose.

18-18: LGTM!

The enum member Error is well-named and the comment clearly explains its purpose.

23-23: LGTM!

The enum member Metadata is well-named and the comment clearly explains its purpose.

24-24: LGTM!

The enum member Flushed is well-named and the comment clearly explains its purpose.

25-25: LGTM!

The enum member Warning is well-named and the comment clearly explains its purpose.

30-30: LGTM!

The enum member Audio is well-named and the comment clearly explains its purpose.

35-35: LGTM!

The enum member Unhandled is well-named and the comment clearly explains its purpose.
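
For orientation, the eight members reviewed above could be declared roughly as follows. This is a sketch only: the member names come from the review, while the string values, the grouping comments, and the `toLiveTTSEvent` helper are assumptions rather than the contents of `LiveTTSEvents.ts`.

```typescript
// Member names are from the review; string values and grouping are assumed.
enum LiveTTSEvents {
  // Socket lifecycle events
  Open = "Open",
  Close = "Close",
  Error = "Error",
  // JSON messages sent by the server
  Metadata = "Metadata",
  Flushed = "Flushed",
  Warning = "Warning",
  // Binary audio payloads
  Audio = "Audio",
  // Anything the client does not recognise
  Unhandled = "Unhandled",
}

const KNOWN_EVENTS: string[] = [
  LiveTTSEvents.Open,
  LiveTTSEvents.Close,
  LiveTTSEvents.Error,
  LiveTTSEvents.Metadata,
  LiveTTSEvents.Flushed,
  LiveTTSEvents.Warning,
  LiveTTSEvents.Audio,
  LiveTTSEvents.Unhandled,
];

// Hypothetical helper: map an arbitrary string to a known event,
// falling back to Unhandled for anything unrecognised.
function toLiveTTSEvent(type: string): LiveTTSEvents {
  return KNOWN_EVENTS.indexOf(type) >= 0 ? (type as LiveTTSEvents) : LiveTTSEvents.Unhandled;
}
```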

src/packages/SpeakClient.ts (2)

14-35: The class structure and design look good!

The SpeakClient class provides a clean and modular interface for interacting with the "speak" namespace. It encapsulates the functionality to interact with both REST and live APIs, which enhances the usability of the codebase.

20-24: Maintain backward compatibility with the existing dot notation setup.

The current implementation of the request method might break the existing interface that allows deepgram.speak.request(...etc). To maintain backward compatibility, consider updating the method as follows:
-  public request(source: TextSource, options?: SpeakSchema, endpoint = ":version/speak") {
-    const client = new SpeakRestClient(this.options);
-
-    return client.request(source, options, endpoint);
-  }
+  public request(
+    source: TextSource,
+    options?: SpeakSchema,
+    endpoint = ":version/speak"
+  ) {
+    const client = new SpeakRestClient(this.options);
+
+    return client.request(source, options, endpoint);
+  }
This change should maintain the existing interface while still allowing the new functionality of deepgram.speak.live(...etc).
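
The delegation pattern discussed here can be illustrated with a small stand-alone sketch. The stub classes below are placeholders, not the SDK's `SpeakRestClient` or `SpeakLiveClient`, and their signatures are simplified assumptions; the point is only that a namespace client can expose both `request` and `live` while callers keep using `speak.request(...)`.

```typescript
// Simplified stand-in for the REST client; the real class performs an HTTP request.
class SpeakRestClientStub {
  constructor(public readonly options: { key: string }) {}

  request(text: string, model: string): string {
    return `REST ${model}: ${text}`;
  }
}

// Simplified stand-in for the live client; the real class opens a WebSocket.
class SpeakLiveClientStub {
  constructor(
    public readonly options: { key: string },
    public readonly model: string
  ) {}
}

// Namespace client that delegates to the two specialised clients.
class SpeakClientSketch {
  constructor(public readonly options: { key: string }) {}

  // Batch entry point: same name and call shape as before the live client existed.
  request(text: string, model: string): string {
    return new SpeakRestClientStub(this.options).request(text, model);
  }

  // New entry point for live synthesis.
  live(model: string): SpeakLiveClientStub {
    return new SpeakLiveClientStub(this.options, model);
  }
}
```

With this shape, existing calls such as `speak.request(text, model)` are unaffected by the addition of `speak.live(model)`.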

examples/node-speak-live/index.js (4)

1-3: LGTM!

The required modules are imported correctly.

48-59: LGTM!

The writeFile function is implemented correctly. It writes the buffered audio data to a file, handles errors, and resets the buffer after writing. Good job!

62-62: LGTM!

Invoking the live function at the end of the file is the correct way to start the live TTS functionality.

1-62: Skipping past review comments.

The provided past review comments are not applicable to the current code. The code does not use a fetch override or WebSocket URL override, and it correctly passes the model option to the deepgram.speak.live function.

src/packages/SpeakLiveClient.ts (9)

26-34: LGTM!

The constructor correctly initializes the SpeakLiveClient class with the provided options and establishes the WebSocket connection by calling the connect method of the parent class.

44-62: LGTM!

The setupConnection method correctly sets up event handlers for various WebSocket events and emits appropriate events based on the connection state and incoming messages.

68-78: LGTM!

The handleTextMessage method correctly handles text messages received from the WebSocket connection and emits appropriate events based on the message type. It also emits an Unhandled event for unknown message types, which is a good practice for handling unexpected cases.
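
A minimal sketch of this dispatch pattern follows. It assumes the server's JSON messages carry a `type` field and that the recognised types match the message events named in the `LiveTTSEvents` review (Metadata, Flushed, Warning); the `Emit` callback stands in for the client's event emitter and is hypothetical.

```typescript
// Stand-in for the client's event emitter.
type Emit = (event: string, payload: unknown) => void;

// Message types assumed to be recognised; anything else is reported as Unhandled.
const KNOWN_TEXT_TYPES: string[] = ["Metadata", "Flushed", "Warning"];

// Emit the event matching the message type, or Unhandled as a catch-all.
function handleTextMessage(message: { type?: string }, emit: Emit): void {
  const type = message.type;
  if (type !== undefined && KNOWN_TEXT_TYPES.indexOf(type) >= 0) {
    emit(type, message);
  } else {
    emit("Unhandled", message);
  }
}
```

The catch-all branch means a new server message type surfaces as an event the application can log, rather than being dropped silently.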

84-86: LGTM!

The handleBinaryMessage method correctly handles binary messages received from the WebSocket connection and emits an Audio event with the received data.

93-100: LGTM!

The sendText method correctly sends a JSON-formatted message with the Speak type and the provided text to the server for text-to-speech conversion.

105-111: LGTM!

The flush method correctly sends a JSON-formatted message with the Flush type to request the server to flush the current buffer and return generated audio.

116-122: LGTM!

The clear method correctly sends a JSON-formatted message with the Clear type to request the server to clear the current buffer.

127-133: LGTM!

The requestClose method correctly sends a JSON-formatted message with the Close type to request the server to close the connection.
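
The four outgoing messages described above (Speak, Flush, Clear, Close) can be sketched together as follows. The payload field names `type` and `text` are assumptions inferred from the review wording, and `Sender` is a hypothetical minimal transport so the sketch runs without a real WebSocket.

```typescript
// Assumed payload shapes: a "type" discriminator, plus "text" for Speak.
type SpeakControlMessage =
  | { type: "Speak"; text: string }
  | { type: "Flush" }
  | { type: "Clear" }
  | { type: "Close" };

// Minimal transport interface; a WebSocket satisfies it via its send method.
interface Sender {
  send(data: string): void;
}

class LiveSpeakCommands {
  constructor(private readonly socket: Sender) {}

  // Serialise every control message the same way.
  private sendJson(message: SpeakControlMessage): void {
    this.socket.send(JSON.stringify(message));
  }

  // Queue text for synthesis.
  sendText(text: string): void {
    this.sendJson({ type: "Speak", text });
  }

  // Ask the server to return audio for the text buffered so far.
  flush(): void {
    this.sendJson({ type: "Flush" });
  }

  // Ask the server to discard the current buffer.
  clear(): void {
    this.sendJson({ type: "Clear" });
  }

  // Ask the server to close the connection.
  requestClose(): void {
    this.sendJson({ type: "Close" });
  }
}
```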

139-162: LGTM!

The handleMessage method correctly handles incoming messages from the WebSocket connection. It distinguishes between string and binary data and calls the appropriate handler methods. It also emits an Error event for unknown data types or JSON parsing errors, which is a good practice for error handling.
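
The routing described here might look like the sketch below. It is not the PR's implementation: `routeMessage` returns a description of what would be emitted instead of emitting it, the `"Text"` label stands for "hand off to the text handler", and the error message strings are invented for illustration.

```typescript
// What the router decided: which event to raise and with what payload.
type Routed = { event: string; payload: unknown };

function routeMessage(data: unknown): Routed {
  // String frames are expected to be JSON control messages.
  if (typeof data === "string") {
    try {
      return { event: "Text", payload: JSON.parse(data) };
    } catch (error) {
      return { event: "Error", payload: { message: "Unable to parse message as JSON", error } };
    }
  }

  // Binary frames (ArrayBuffer or typed-array views) carry audio.
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    return { event: "Audio", payload: data };
  }

  // Anything else is reported instead of being ignored.
  return { event: "Error", payload: { message: "Received unknown data type" } };
}
```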

jpvajda

LGTM!

lukeocodes requested changes Jul 3, 2024

SandraRodgers force-pushed the sr/add-tts-live-client branch from ce206d1 to 9134fb0 July 3, 2024 21:27

naomi-lgbt reviewed Jul 3, 2024

examples/node-speak-live/index.js (outdated, resolved)

examples/node-speak-live/index.js (outdated, resolved)

lukeocodes requested changes Jul 4, 2024

src/packages/AbstractLiveClient.ts (outdated, resolved)

src/packages/index.ts (resolved)

src/packages/SpeakClient.ts (outdated, resolved)

SandraRodgers requested review from lukeocodes and naomi-lgbt July 8, 2024 18:11

SandraRodgers marked this pull request as ready for review July 8, 2024 18:12

SandraRodgers changed the title from "feat: add SpeakLiveClient and LiveTTSEvents" to "feat: add TTS Live Client" Jul 8, 2024

lukeocodes previously approved these changes Jul 8, 2024

README.md (resolved)

SandraRodgers dismissed lukeocodes’s stale review via 34356ad July 8, 2024 19:41

SandraRodgers requested a review from lukeocodes July 8, 2024 19:41

coderabbitai bot reviewed Sep 15, 2024

jpvajda approved these changes Sep 16, 2024

SandraRodgers and others added 8 commits September 16, 2024 10:22

feat: add SpeakLiveClient and LiveTTSEvents

9404732

feat: update AbstractLiveClient to handle binary data

4b6dd31

feat: add SpeakClient and example

bb9b257

feat: finishing touches TTS live client

bcec108

chore: update readme for TTS websocket

1061bcd

chore: update example and readme to be consistent with live STT

c57bcb6

feat: rename reset method to clear, respond with "Clear" payload

8a5bd73

feat: remove container from websocket speak options

e3a6e69

naomi-lgbt force-pushed the sr/add-tts-live-client branch from b6669ea to e3a6e69 September 16, 2024 17:24

naomi-lgbt merged commit 2a03f9a into main Sep 18, 2024
4 checks passed

naomi-lgbt deleted the sr/add-tts-live-client branch September 18, 2024 19:16
