feat: add TTS Live Client #306

Merged
naomi-lgbt merged 8 commits into main from sr/add-tts-live-client
Sep 18, 2024

Conversation

SandraRodgers
Contributor

@SandraRodgers SandraRodgers commented Jul 3, 2024

Introduce live speak functionality to existing speak client. Maintain existing interface for batch.

Summary by CodeRabbit

  • New Features

    • Added detailed documentation for using the Deepgram Text-to-Speech API, including REST and WebSocket methods.
    • Introduced live text-to-speech functionality, allowing real-time audio synthesis.
    • Added new classes for managing text-to-speech requests and live connections, enhancing user interaction.
  • Bug Fixes

    • Simplified export structure for the SpeakRestClient class.
  • Documentation

    • Updated README with practical examples for both synchronous and asynchronous TTS implementations.
    • Expanded documentation with event handling details for live text-to-speech synthesis.

@SandraRodgers SandraRodgers marked this pull request as ready for review July 8, 2024 18:12
@SandraRodgers SandraRodgers changed the title feat: add SpeakLiveClient and LiveTTSEvents feat: add TTS Live Client Jul 8, 2024
lukeocodes
lukeocodes previously approved these changes Jul 8, 2024
Contributor

@lukeocodes lukeocodes left a comment

Do we want to release this on alpha/beta or straight to main while the API is still in development?

@jpvajda
Contributor

jpvajda commented Sep 6, 2024

@lukeocodes Things we need to change in this PR:

  • Rename of reset to clear
  • Clear message now has a response type.
  • We want to remove the container field

cc @dvonthenen

@dvonthenen
Contributor

@lukeocodes Things we need to change in this PR:

  • Rename of reset to clear
  • Clear message now has a response type.
  • We want to remove the container field

cc @dvonthenen

Yep, confirmed those should be the only changes between now and towards the end of July. 👍

coderabbitai bot commented Sep 15, 2024

Walkthrough

The changes enhance the text-to-speech (TTS) functionality using the Deepgram API by adding documentation for both REST API and WebSocket methods in README.md. A live TTS implementation is introduced in index.js, alongside an enumeration for live TTS events in LiveTTSEvents.ts. The SpeakClient and SpeakLiveClient classes are added to facilitate interactions with TTS services. Additionally, export statements are updated to include these new components, ensuring their availability throughout the application.

Changes

| Files | Change Summary |
| --- | --- |
| README.md | Added sections for REST API and WebSocket methods for Deepgram TTS with example usage. |
| examples/node-speak-live/index.js | Introduced live TTS functionality with event handling and audio data processing. |
| src/lib/enums/LiveTTSEvents.ts | Added enumeration for live TTS events, including socket and message events. |
| src/lib/enums/index.ts | Updated to export the new LiveTTSEvents enumeration. |
| src/packages/SpeakClient.ts | Added SpeakClient class with methods for REST and live TTS interactions. |
| src/packages/SpeakLiveClient.ts | Introduced SpeakLiveClient class for managing WebSocket connections for live TTS. |
| src/packages/SpeakRestClient.ts | Removed alias export for SpeakRestClient, simplifying the export structure. |
| src/packages/index.ts | Added exports for SpeakClient and SpeakLiveClient to the module's public API. |

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 54c9085 and e3a6e69.

Files selected for processing (8)
  • README.md (1 hunks)
  • examples/node-speak-live/index.js (1 hunks)
  • src/lib/enums/LiveTTSEvents.ts (1 hunks)
  • src/lib/enums/index.ts (1 hunks)
  • src/packages/SpeakClient.ts (1 hunks)
  • src/packages/SpeakLiveClient.ts (1 hunks)
  • src/packages/SpeakRestClient.ts (0 hunks)
  • src/packages/index.ts (1 hunks)
Files not reviewed due to no reviewable changes (1)
  • src/packages/SpeakRestClient.ts
Files skipped from review as they are similar to previous changes (7)
  • README.md
  • examples/node-speak-live/index.js
  • src/lib/enums/LiveTTSEvents.ts
  • src/lib/enums/index.ts
  • src/packages/SpeakClient.ts
  • src/packages/SpeakLiveClient.ts
  • src/packages/index.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (3)
examples/node-speak-live/index.js (1)

4-60: Excellent implementation of the live TTS functionality!

The code segment establishes a connection to the Deepgram service, sends text for synthesis, handles events, and writes the audio data to a file. The use of environment variables for the API key is a good security practice.

Consider adding a configuration option to specify the output file name instead of hardcoding it as output.mp3. This would make the code more flexible and allow users to customize the output file name.
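
One way to act on this suggestion is to resolve the output path from configuration and fall back to the current default. The sketch below is only an illustration: the helper `resolveOutputPath` and the variable name `TTS_OUTPUT_FILE` are assumptions and do not appear in the PR.

```typescript
// Hypothetical helper, not part of the PR: picks the output file name from
// an environment-like object and falls back to the hardcoded default.
type Env = Record<string, string | undefined>;

function resolveOutputPath(env: Env, fallback: string = "output.mp3"): string {
  // TTS_OUTPUT_FILE is an assumed variable name chosen for this sketch.
  const candidate = (env["TTS_OUTPUT_FILE"] || "").trim();
  // Empty or whitespace-only values are treated as "not set".
  return candidate.length > 0 ? candidate : fallback;
}
```

In the Node example this could be called as `resolveOutputPath(process.env)`, so existing users still get `output.mp3` unless they opt in to a different name.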

README.md (2)

385-386: LGTM! Consider providing more context for the text variable.

The code segment correctly demonstrates the usage of the deepgram.speak.request method for text-to-speech synthesis using the REST API. The selected model aura-asteria-en is appropriate for English text-to-speech synthesis.

Consider defining or explaining the text variable in the surrounding context to provide clarity for users.


391-409: LGTM! Consider providing more context for the text variable.

The code segment correctly demonstrates the usage of the deepgram.speak.live method for live text-to-speech synthesis using WebSocket. The selected model aura-asteria-en is appropriate for English text-to-speech synthesis.

The event handling for Open and Close events, as well as the usage of sendText and flush methods, are implemented correctly.

Consider defining or explaining the text variable in the surrounding context to provide clarity for users.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ecf09df and 54c9085.

Files selected for processing (8)
  • README.md (1 hunks)
  • examples/node-speak-live/index.js (1 hunks)
  • src/lib/enums/LiveTTSEvents.ts (1 hunks)
  • src/lib/enums/index.ts (1 hunks)
  • src/packages/SpeakClient.ts (1 hunks)
  • src/packages/SpeakLiveClient.ts (1 hunks)
  • src/packages/SpeakRestClient.ts (0 hunks)
  • src/packages/index.ts (1 hunks)
Files not reviewed due to no reviewable changes (1)
  • src/packages/SpeakRestClient.ts
Additional comments not posted (26)
src/lib/enums/index.ts (1)

3-3: LGTM!

The export statement correctly re-exports the LiveTTSEvents enumeration, expanding the module's public API. This change allows other parts of the codebase to utilize the LiveTTSEvents enumeration directly, potentially enhancing the functionality related to text-to-speech events.

The existing exports for LiveConnectionState and LiveTranscriptionEvents remain unchanged, ensuring that the core functionality is preserved.

src/packages/index.ts (2)

10-10: LGTM!

The export statement for SpeakClient has been added as suggested in the past review comments.


11-11: LGTM!

The export statement for SpeakLiveClient has been added as suggested in the past review comments.

src/lib/enums/LiveTTSEvents.ts (8)

16-16: LGTM!

The enum member Open is well-named and the comment clearly explains its purpose.


17-17: LGTM!

The enum member Close is well-named and the comment clearly explains its purpose.


18-18: LGTM!

The enum member Error is well-named and the comment clearly explains its purpose.


23-23: LGTM!

The enum member Metadata is well-named and the comment clearly explains its purpose.


24-24: LGTM!

The enum member Flushed is well-named and the comment clearly explains its purpose.


25-25: LGTM!

The enum member Warning is well-named and the comment clearly explains its purpose.


30-30: LGTM!

The enum member Audio is well-named and the comment clearly explains its purpose.


35-35: LGTM!

The enum member Unhandled is well-named and the comment clearly explains its purpose.
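
For readers without the diff open, the enumeration reviewed above might look roughly like the sketch below. The member names are taken from the review comments; the string values and the `isLiveTTSEvent` helper are assumptions made for illustration and may differ from the merged file.

```typescript
// Member names come from the review; the string values are assumed.
enum LiveTTSEvents {
  // Socket lifecycle events
  Open = "Open",
  Close = "Close",
  Error = "Error",
  // Text messages sent by the server
  Metadata = "Metadata",
  Flushed = "Flushed",
  Warning = "Warning",
  // Binary audio payloads
  Audio = "Audio",
  // Catch-all for message types the client does not recognize
  Unhandled = "Unhandled",
}

// Hypothetical helper: checks whether a raw string names a known event.
function isLiveTTSEvent(name: string): name is LiveTTSEvents {
  const values = Object.keys(LiveTTSEvents).map((key) =>
    String(LiveTTSEvents[key as keyof typeof LiveTTSEvents])
  );
  return values.indexOf(name) !== -1;
}
```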

src/packages/SpeakClient.ts (2)

14-35: The class structure and design look good!

The SpeakClient class provides a clean and modular interface for interacting with the "speak" namespace. It encapsulates the functionality to interact with both REST and live APIs, which enhances the usability of the codebase.


20-24: Maintain backward compatibility with the existing dot notation setup.

The current implementation of the request method might break the existing interface that allows deepgram.speak.request(...etc). To maintain backward compatibility, consider updating the method as follows:

-  public request(source: TextSource, options?: SpeakSchema, endpoint = ":version/speak") {
-    const client = new SpeakRestClient(this.options);
-
-    return client.request(source, options, endpoint);
-  }
+  public request(
+    source: TextSource,
+    options?: SpeakSchema,
+    endpoint = ":version/speak"
+  ) {
+    const client = new SpeakRestClient(this.options);
+
+    return client.request(source, options, endpoint);
+  }

This change should maintain the existing interface while still allowing the new functionality of deepgram.speak.live(...etc).
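
The compatibility goal described here, keeping `deepgram.speak.request(...)` while adding `deepgram.speak.live(...)`, can be pictured as a thin facade that delegates to two clients. The sketch below uses stand-in classes with simplified signatures; it is an assumption-laden illustration and not the SDK's actual implementation.

```typescript
// Stand-ins for the real REST and live clients; bodies are hypothetical.
class RestClientStub {
  request(text: string, model: string): string {
    return "rest:" + model + ":" + text;
  }
}

class LiveClientStub {
  constructor(public readonly model: string) {}
}

// Facade exposing both entry points under one "speak" namespace.
class SpeakFacadeSketch {
  // Existing batch entry point keeps its dot-notation call shape.
  request(text: string, model: string): string {
    return new RestClientStub().request(text, model);
  }

  // New live entry point returns a separate connection object.
  live(model: string): LiveClientStub {
    return new LiveClientStub(model);
  }
}

// Mirrors the call shape `deepgram.speak.request(...)` / `deepgram.speak.live(...)`.
const deepgramSketch = { speak: new SpeakFacadeSketch() };
```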

examples/node-speak-live/index.js (4)

1-3: LGTM!

The required modules are imported correctly.


48-59: LGTM!

The writeFile function is implemented correctly. It writes the buffered audio data to a file, handles errors, and resets the buffer after writing. Good job!
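
The buffer-then-write pattern praised here can be sketched independently of the file system. In the illustration below, `AudioAccumulator` and its injected `write` callback are hypothetical names; the real example writes to disk with Node APIs, which this sketch deliberately leaves to the caller.

```typescript
// Hypothetical accumulator: collects audio chunks, then hands one merged
// buffer to a caller-supplied writer and resets itself.
class AudioAccumulator {
  private chunks: Uint8Array[] = [];

  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  size(): number {
    let total = 0;
    for (const chunk of this.chunks) total += chunk.length;
    return total;
  }

  // Returns the number of bytes written. The buffer is cleared only after
  // the writer returns, so a throwing writer leaves the data in place.
  flushTo(write: (data: Uint8Array) => void): number {
    const total = this.size();
    if (total === 0) return 0;
    const merged = new Uint8Array(total);
    let offset = 0;
    for (const chunk of this.chunks) {
      merged.set(chunk, offset);
      offset += chunk.length;
    }
    write(merged);
    this.chunks = [];
    return total;
  }
}
```

Resetting after a successful write matches the behavior described in the review: each flushed utterance starts from an empty buffer.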


62-62: LGTM!

Invoking the live function at the end of the file is the correct way to start the live TTS functionality.


1-62: Skipping past review comments.

The provided past review comments are not applicable to the current code. The code does not use a fetch override or WebSocket URL override, and it correctly passes the model option to the deepgram.speak.live function.

src/packages/SpeakLiveClient.ts (9)

26-34: LGTM!

The constructor correctly initializes the SpeakLiveClient class with the provided options and establishes the WebSocket connection by calling the connect method of the parent class.


44-62: LGTM!

The setupConnection method correctly sets up event handlers for various WebSocket events and emits appropriate events based on the connection state and incoming messages.


68-78: LGTM!

The handleTextMessage method correctly handles text messages received from the WebSocket connection and emits appropriate events based on the message type. It also emits an Unhandled event for unknown message types, which is a good practice for handling unexpected cases.
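
The dispatch described above can be sketched as a switch on the message type with a fallback branch. This is a minimal illustration assuming the server message carries a `type` field; `dispatchTextMessage` and the `emit` callback are hypothetical names, not the SDK's.

```typescript
// Hypothetical emitter signature used only for this sketch.
type EmitFn = (event: string, payload: unknown) => void;

// Routes a parsed text message to an event named after its type, and
// falls back to "Unhandled" for anything unrecognized.
function dispatchTextMessage(data: { type?: string }, emit: EmitFn): void {
  switch (data.type) {
    case "Metadata":
    case "Flushed":
    case "Warning":
      emit(data.type, data);
      break;
    default:
      // Unknown or missing type: surface it instead of dropping it silently.
      emit("Unhandled", data);
  }
}
```

Emitting `Unhandled` rather than ignoring unknown messages lets callers notice new server message types without the client throwing.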


84-86: LGTM!

The handleBinaryMessage method correctly handles binary messages received from the WebSocket connection and emits an Audio event with the received data.


93-100: LGTM!

The sendText method correctly sends a JSON-formatted message with the Speak type and the provided text to the server for text-to-speech conversion.


105-111: LGTM!

The flush method correctly sends a JSON-formatted message with the Flush type to request the server to flush the current buffer and return generated audio.


116-122: LGTM!

The clear method correctly sends a JSON-formatted message with the Clear type to request the server to clear the current buffer.


127-133: LGTM!

The requestClose method correctly sends a JSON-formatted message with the Close type to request the server to close the connection.
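
The four control messages reviewed above (`Speak`, `Flush`, `Clear`, `Close`) share one shape: a JSON object with a `type` field, plus `text` for `Speak`. The sketch below models that with a union type and an injected `send` function; `LiveControlSketch` and `encodeControl` are illustrative names, and the real client sends over its WebSocket instead.

```typescript
// Message types are taken from the review; the union itself is a sketch.
type ControlMessage =
  | { type: "Speak"; text: string }
  | { type: "Flush" }
  | { type: "Clear" }
  | { type: "Close" };

function encodeControl(message: ControlMessage): string {
  return JSON.stringify(message);
}

// Hypothetical wrapper: `send` stands in for the WebSocket send call.
class LiveControlSketch {
  constructor(private readonly send: (payload: string) => void) {}

  sendText(text: string): void {
    this.send(encodeControl({ type: "Speak", text: text }));
  }

  flush(): void {
    this.send(encodeControl({ type: "Flush" }));
  }

  clear(): void {
    this.send(encodeControl({ type: "Clear" }));
  }

  requestClose(): void {
    this.send(encodeControl({ type: "Close" }));
  }
}
```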


139-162: LGTM!

The handleMessage method correctly handles incoming messages from the WebSocket connection. It distinguishes between string and binary data and calls the appropriate handler methods. It also emits an Error event for unknown data types or JSON parsing errors, which is a good practice for error handling.
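
A rough sketch of the routing described above: strings are parsed as JSON and passed to a text handler, binary payloads go to an audio handler, and anything else (or a parse failure) is reported as an error. The `IncomingHandlers` interface, `routeIncoming`, and the error strings are assumptions for illustration only.

```typescript
// Hypothetical handler set; the real client emits events instead.
interface IncomingHandlers {
  onText(message: unknown): void;
  onBinary(data: ArrayBuffer | Uint8Array): void;
  onError(reason: string): void;
}

function routeIncoming(data: unknown, handlers: IncomingHandlers): void {
  if (typeof data === "string") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(data);
    } catch (_err) {
      // Malformed JSON is reported rather than thrown to the caller.
      handlers.onError("Unable to parse message as JSON");
      return;
    }
    // Called outside the try block so handler errors are not mislabeled
    // as parse failures.
    handlers.onText(parsed);
  } else if (data instanceof ArrayBuffer || data instanceof Uint8Array) {
    handlers.onBinary(data);
  } else {
    handlers.onError("Received unknown data type");
  }
}
```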

Contributor

@jpvajda jpvajda left a comment

LGTM!

@naomi-lgbt naomi-lgbt merged commit 2a03f9a into main Sep 18, 2024
4 checks passed
@naomi-lgbt naomi-lgbt deleted the sr/add-tts-live-client branch September 18, 2024 19:16
5 participants