feat: add TTS Live Client #306
ce206d1 to 9134fb0
Do we want to release this on alpha/beta or straight to main while the API is still in development?
@lukeocodes Things we need to change in this PR:
cc @dvonthenen
Yep, confirmed those should be the only changes between now and towards the end of July. 👍
Walkthrough
The changes enhance the text-to-speech (TTS) functionality using the Deepgram API by adding documentation for both REST API and WebSocket methods in …
Changes
Actionable comments posted: 0
Outside diff range and nitpick comments (3)
examples/node-speak-live/index.js (1)
4-60: Excellent implementation of the live TTS functionality! The code segment establishes a connection to the Deepgram service, sends text for synthesis, handles events, and writes the audio data to a file. The use of environment variables for the API key is a good security practice.
Consider adding a configuration option to specify the output file name instead of hardcoding it as `output.mp3`. This would make the code more flexible and allow users to customize the output file name.
README.md (2)
385-386: LGTM! Consider providing more context for the `text` variable.
The code segment correctly demonstrates the usage of the `deepgram.speak.request` method for text-to-speech synthesis using the REST API. The selected model `aura-asteria-en` is appropriate for English text-to-speech synthesis.
Consider defining or explaining the `text` variable in the surrounding context to provide clarity for users.
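To illustrate the context this comment asks for, here is a minimal sketch that defines `text` up front and spells out the HTTP request that a call like `deepgram.speak.request({ text }, { model: "aura-asteria-en" })` is assumed to send. `buildSpeakRequest` and `SpeakHttpRequest` are hypothetical, not SDK exports. The base URL `https://api.deepgram.com`, the `v1` version substituted into `:version/speak`, the `model` query parameter, and the `Authorization: Token <key>` header are assumptions about the REST API, not something this PR shows.

```typescript
// Hypothetical helper (not part of the SDK): builds the HTTP request a REST
// TTS call is assumed to send. ":version/speak" is assumed to become "v1/speak".
interface TextSource {
  text: string;
}

interface SpeakHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSpeakRequest(
  source: TextSource,
  model: string,
  apiKey: string,
  baseUrl: string = "https://api.deepgram.com",
  version: string = "v1",
): SpeakHttpRequest {
  return {
    // Options such as the model are assumed to travel as query-string parameters.
    url: `${baseUrl}/${version}/speak?model=${encodeURIComponent(model)}`,
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The text to synthesize is sent as the JSON body.
    body: JSON.stringify(source),
  };
}

// Defining `text` explicitly is the context the review asks the README to add.
const text = "Hello, how can I help you today?";
const req = buildSpeakRequest({ text }, "aura-asteria-en", "YOUR_API_KEY");
console.log(req.method, req.url);
```

Building the request as plain data keeps the sketch testable offline; the SDK method would perform the actual network call.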
391-409: LGTM! Consider providing more context for the `text` variable.
The code segment correctly demonstrates the usage of the `deepgram.speak.live` method for live text-to-speech synthesis using WebSocket. The selected model `aura-asteria-en` is appropriate for English text-to-speech synthesis. The event handling for the `Open` and `Close` events, as well as the use of the `sendText` and `flush` methods, is implemented correctly.
Consider defining or explaining the `text` variable in the surrounding context to provide clarity for users.
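The live flow described here (wait for `Open`, call `sendText`, then `flush`, and react to `Close`) can be sketched against a minimal stand-in interface. `LiveSpeakConnection`, `speakWhenOpen`, and `FakeLiveConnection` are hypothetical, and the plain strings `"Open"`/`"Close"` are assumed event names; with the SDK you would pass the object returned by `deepgram.speak.live(...)` and use its event enum instead.

```typescript
type ConnListener = (...args: unknown[]) => void;

// Minimal stand-in (hypothetical) for the live client surface the README example uses.
interface LiveSpeakConnection {
  on(event: string, listener: ConnListener): void;
  sendText(text: string): void;
  flush(): void;
}

// Send the text only after the socket is open, then flush so the server
// synthesizes whatever is buffered.
function speakWhenOpen(conn: LiveSpeakConnection, text: string, onClosed: () => void): void {
  conn.on("Open", () => {
    conn.sendText(text);
    conn.flush();
  });
  conn.on("Close", () => onClosed());
}

// Hypothetical fake that records calls instead of opening a WebSocket.
class FakeLiveConnection implements LiveSpeakConnection {
  private handlers: Record<string, ConnListener[]> = {};
  calls: string[] = [];

  on(event: string, listener: ConnListener): void {
    (this.handlers[event] = this.handlers[event] ?? []).push(listener);
  }
  trigger(event: string): void {
    for (const h of this.handlers[event] ?? []) h();
  }
  sendText(text: string): void {
    this.calls.push(`sendText:${text}`);
  }
  flush(): void {
    this.calls.push("flush");
  }
}

const liveText = "Hello from the live TTS client.";
const fakeConn = new FakeLiveConnection();
let wasClosed = false;
speakWhenOpen(fakeConn, liveText, () => {
  wasClosed = true;
});
fakeConn.trigger("Open");
fakeConn.trigger("Close");
console.log(fakeConn.calls, wasClosed);
```

Registering the send inside the `Open` handler avoids writing to a socket that is still connecting.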
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- README.md (1 hunks)
- examples/node-speak-live/index.js (1 hunks)
- src/lib/enums/LiveTTSEvents.ts (1 hunks)
- src/lib/enums/index.ts (1 hunks)
- src/packages/SpeakClient.ts (1 hunks)
- src/packages/SpeakLiveClient.ts (1 hunks)
- src/packages/SpeakRestClient.ts (0 hunks)
- src/packages/index.ts (1 hunks)
Files not reviewed due to no reviewable changes (1)
- src/packages/SpeakRestClient.ts
Additional comments not posted (26)
src/lib/enums/index.ts (1)
3-3: LGTM! The export statement correctly re-exports the `LiveTTSEvents` enumeration, expanding the module's public API. This change allows other parts of the codebase to use the `LiveTTSEvents` enumeration directly, potentially enhancing the functionality related to text-to-speech events. The existing exports for `LiveConnectionState` and `LiveTranscriptionEvents` remain unchanged, ensuring that the core functionality is preserved.
src/packages/index.ts (2)
10-10: LGTM! The export statement for `SpeakClient` has been added as suggested in the past review comments.
11-11: LGTM! The export statement for `SpeakLiveClient` has been added as suggested in the past review comments.
src/lib/enums/LiveTTSEvents.ts (8)
16-16: LGTM! The enum member `Open` is well-named and the comment clearly explains its purpose.
17-17: LGTM! The enum member `Close` is well-named and the comment clearly explains its purpose.
18-18: LGTM! The enum member `Error` is well-named and the comment clearly explains its purpose.
23-23: LGTM! The enum member `Metadata` is well-named and the comment clearly explains its purpose.
24-24: LGTM! The enum member `Flushed` is well-named and the comment clearly explains its purpose.
25-25: LGTM! The enum member `Warning` is well-named and the comment clearly explains its purpose.
30-30: LGTM! The enum member `Audio` is well-named and the comment clearly explains its purpose.
35-35: LGTM! The enum member `Unhandled` is well-named and the comment clearly explains its purpose.
src/packages/SpeakClient.ts (2)
14-35: The class structure and design look good! The `SpeakClient` class provides a clean and modular interface for interacting with the "speak" namespace. It encapsulates the functionality to interact with both REST and live APIs, which enhances the usability of the codebase.
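The facade idea described here, one object that hands REST calls to a REST client and live sessions to a live client, can be sketched with stubs. `SpeakClientSketch`, `StubRestClient`, `StubLiveClient`, and the option shapes are hypothetical placeholders, not the SDK's `SpeakClient`, `SpeakRestClient`, or `SpeakLiveClient`; only the `request(source, options, endpoint = ":version/speak")` signature mirrors the code quoted in this review, and giving `live` the same default endpoint is an assumption.

```typescript
// Hypothetical shapes for illustration; the SDK's real types are richer.
interface ClientOptions {
  key: string;
}
interface SpeakTextSource {
  text: string;
}
interface SpeakOptions {
  model?: string;
}

// Stub for the REST side: returns a description instead of calling the API.
class StubRestClient {
  constructor(readonly options: ClientOptions) {}

  request(source: SpeakTextSource, options: SpeakOptions = {}, endpoint = ":version/speak") {
    return { transport: "rest", endpoint, text: source.text, model: options.model };
  }
}

// Stub for the live side: keeps the settings a WebSocket session would use.
class StubLiveClient {
  constructor(
    readonly options: ClientOptions,
    readonly speakOptions: SpeakOptions,
    readonly endpoint: string,
  ) {}
}

// One namespace object exposing both styles, so existing `speak.request(...)`
// calls keep working while `speak.live(...)` is added next to them.
class SpeakClientSketch {
  constructor(private readonly options: ClientOptions) {}

  request(source: SpeakTextSource, options?: SpeakOptions, endpoint = ":version/speak") {
    const client = new StubRestClient(this.options);
    return client.request(source, options, endpoint);
  }

  live(options: SpeakOptions = {}, endpoint = ":version/speak") {
    return new StubLiveClient(this.options, options, endpoint);
  }
}

const speak = new SpeakClientSketch({ key: "YOUR_API_KEY" });
const restResult = speak.request({ text: "Hi" }, { model: "aura-asteria-en" });
const liveSession = speak.live({ model: "aura-asteria-en" });
console.log(restResult.transport, liveSession.endpoint);
```

Creating a fresh client per call keeps the facade stateless; a real implementation might reuse connections instead.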
20-24: Maintain backward compatibility with the existing dot notation setup.
The current implementation of the `request` method might break the existing interface that allows `deepgram.speak.request(...etc)`. To maintain backward compatibility, consider updating the method as follows:
```diff
- public request(source: TextSource, options?: SpeakSchema, endpoint = ":version/speak") {
-   const client = new SpeakRestClient(this.options);
-
-   return client.request(source, options, endpoint);
- }
+ public request(
+   source: TextSource,
+   options?: SpeakSchema,
+   endpoint = ":version/speak"
+ ) {
+   const client = new SpeakRestClient(this.options);
+
+   return client.request(source, options, endpoint);
+ }
```
This change should maintain the existing interface while still allowing the new functionality of `deepgram.speak.live(...etc)`.
examples/node-speak-live/index.js (4)
1-3: LGTM! The required modules are imported correctly.
48-59: LGTM! The `writeFile` function is implemented correctly. It writes the buffered audio data to a file, handles errors, and resets the buffer after writing. Good job!
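Combining the earlier nitpick (make `output.mp3` configurable) with the buffer, write, then reset behaviour praised here, a sketch might look like this. `resolveOutputPath`, the `OUTPUT_FILE` variable name, `AudioChunkBuffer`, and the injected `write` callback are all hypothetical; in Node you could pass `process.env` and `fs.writeFileSync`.

```typescript
// The writer is injected so the sketch runs without touching the filesystem;
// in Node you could pass fs.writeFileSync.
type WriteFn = (path: string, data: Uint8Array) => void;

// Hypothetical precedence: explicit argument, then OUTPUT_FILE, then the old default.
function resolveOutputPath(
  env: Record<string, string | undefined>,
  cliArg?: string,
  fallback: string = "output.mp3",
): string {
  return cliArg ?? env["OUTPUT_FILE"] ?? fallback;
}

// Collects audio chunks (e.g. from Audio events) and writes them in one go.
class AudioChunkBuffer {
  private chunks: Uint8Array[] = [];

  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Concatenates the chunks in arrival order, writes them, then resets the buffer.
  writeTo(path: string, write: WriteFn): number {
    let total = 0;
    for (const c of this.chunks) total += c.length;
    const merged = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      merged.set(c, offset);
      offset += c.length;
    }
    write(path, merged);
    this.chunks = [];
    return total;
  }
}

const writes: Array<{ path: string; data: Uint8Array }> = [];
const recordWrite: WriteFn = (path, data) => {
  writes.push({ path, data });
};

const audio = new AudioChunkBuffer();
audio.push(new Uint8Array([1, 2]));
audio.push(new Uint8Array([3]));
const outPath = resolveOutputPath({ OUTPUT_FILE: "greeting.mp3" });
const bytesWritten = audio.writeTo(outPath, recordWrite);
console.log(outPath, bytesWritten);
```

Keeping `output.mp3` as the fallback preserves the example's current behaviour when nothing is configured.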
62-62: LGTM! Invoking the `live` function at the end of the file is the correct way to start the live TTS functionality.
1-62: Skipping past review comments. The provided past review comments are not applicable to the current code. The code does not use a fetch override or WebSocket URL override, and it correctly passes the `model` option to the `deepgram.speak.live` function.
src/packages/SpeakLiveClient.ts (9)
26-34: LGTM! The constructor correctly initializes the `SpeakLiveClient` class with the provided options and establishes the WebSocket connection by calling the `connect` method of the parent class.
44-62: LGTM! The `setupConnection` method correctly sets up event handlers for various WebSocket events and emits appropriate events based on the connection state and incoming messages.
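The wiring this comment describes, socket callbacks translated into client-level events, can be sketched without any dependencies. `TinyEmitter`, `SocketLike`, and `wireSocket` are hypothetical; the reviewed class builds on the SDK's own abstract live client and emitter, and the string event names are assumed.

```typescript
type EmitterListener = (...args: unknown[]) => void;

// Tiny emitter so the sketch has no runtime dependencies.
class TinyEmitter {
  private listeners: Record<string, EmitterListener[]> = {};

  on(event: string, listener: EmitterListener): void {
    (this.listeners[event] = this.listeners[event] ?? []).push(listener);
  }
  emit(event: string, ...args: unknown[]): void {
    for (const l of this.listeners[event] ?? []) l(...args);
  }
}

// The subset of a WebSocket the client relies on (assumed shape).
interface SocketLike {
  onopen: (() => void) | null;
  onclose: ((ev: { code: number }) => void) | null;
  onerror: ((err: unknown) => void) | null;
  onmessage: ((ev: { data: unknown }) => void) | null;
}

// Translate raw socket callbacks into client-level events; message payloads
// go to `onData`, a stand-in for a message handler such as handleMessage.
function wireSocket(socket: SocketLike, emitter: TinyEmitter, onData: (data: unknown) => void): void {
  socket.onopen = () => emitter.emit("Open");
  socket.onclose = (ev) => emitter.emit("Close", ev);
  socket.onerror = (err) => emitter.emit("Error", err);
  socket.onmessage = (ev) => onData(ev.data);
}

const fakeSocket: SocketLike = { onopen: null, onclose: null, onerror: null, onmessage: null };
const emitter = new TinyEmitter();
const seenEvents: string[] = [];
const payloads: unknown[] = [];
emitter.on("Open", () => seenEvents.push("Open"));
emitter.on("Close", () => seenEvents.push("Close"));
wireSocket(fakeSocket, emitter, (data) => payloads.push(data));

// Simulate what a real socket would do over a short session.
fakeSocket.onopen?.();
fakeSocket.onmessage?.({ data: "hello" });
fakeSocket.onclose?.({ code: 1000 });
console.log(seenEvents, payloads);
```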
68-78: LGTM! The `handleTextMessage` method correctly handles text messages received from the WebSocket connection and emits appropriate events based on the message type. It also emits an `Unhandled` event for unknown message types, which is a good practice for handling unexpected cases.
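Type-based dispatch of this kind can be sketched with the enum members listed above for `LiveTTSEvents.ts`. The string values given to the enum members, the assumption that the server's message `type` strings are `"Metadata"`, `"Flushed"`, and `"Warning"`, and the free function `dispatchTextMessage` (standing in for the method) are all hypothetical.

```typescript
// Mirrors the members of LiveTTSEvents reviewed above; the string values
// are an assumption (member name used as the value).
enum LiveTTSEvents {
  Open = "Open",
  Close = "Close",
  Error = "Error",
  Metadata = "Metadata",
  Flushed = "Flushed",
  Warning = "Warning",
  Audio = "Audio",
  Unhandled = "Unhandled",
}

type EmitFn = (event: LiveTTSEvents, payload: unknown) => void;

// Route a parsed server message by its `type`; unknown types are not dropped
// but surfaced as Unhandled so callers can log or inspect them.
function dispatchTextMessage(message: { type?: string }, emit: EmitFn): void {
  switch (message.type) {
    case LiveTTSEvents.Metadata:
      emit(LiveTTSEvents.Metadata, message);
      break;
    case LiveTTSEvents.Flushed:
      emit(LiveTTSEvents.Flushed, message);
      break;
    case LiveTTSEvents.Warning:
      emit(LiveTTSEvents.Warning, message);
      break;
    default:
      emit(LiveTTSEvents.Unhandled, message);
  }
}

const dispatchLog: string[] = [];
const recordEvent: EmitFn = (event) => {
  dispatchLog.push(event);
};
dispatchTextMessage({ type: "Flushed" }, recordEvent);
dispatchTextMessage({ type: "Metadata" }, recordEvent);
dispatchTextMessage({ type: "SomethingNew" }, recordEvent);
console.log(dispatchLog);
```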
84-86: LGTM! The `handleBinaryMessage` method correctly handles binary messages received from the WebSocket connection and emits an `Audio` event with the received data.
93-100: LGTM! The `sendText` method correctly sends a JSON-formatted message with the `Speak` type and the provided text to the server for text-to-speech conversion.
105-111: LGTM! The `flush` method correctly sends a JSON-formatted message with the `Flush` type to request the server to flush the current buffer and return generated audio.
116-122: LGTM! The `clear` method correctly sends a JSON-formatted message with the `Clear` type to request the server to clear the current buffer.
127-133: LGTM! The `requestClose` method correctly sends a JSON-formatted message with the `Close` type to request the server to close the connection.
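The four outbound control messages described in these comments share one shape: a JSON object with a `type` field. A sketch follows; `SpeakControls` and its `send` callback (a stand-in for the socket) are hypothetical, and the field name `text` on the `Speak` message is an assumption based on the description of `sendText`.

```typescript
// Outbound message shapes as described in the review (field names assumed).
type ControlMessage =
  | { type: "Speak"; text: string }
  | { type: "Flush" }
  | { type: "Clear" }
  | { type: "Close" };

// Hypothetical wrapper: every method serializes one message and hands it to `send`.
class SpeakControls {
  constructor(private send: (payload: string) => void) {}

  private post(msg: ControlMessage): void {
    this.send(JSON.stringify(msg));
  }
  sendText(text: string): void {
    this.post({ type: "Speak", text });
  }
  flush(): void {
    this.post({ type: "Flush" });
  }
  clear(): void {
    this.post({ type: "Clear" });
  }
  requestClose(): void {
    this.post({ type: "Close" });
  }
}

const sentPayloads: string[] = [];
const controls = new SpeakControls((p) => sentPayloads.push(p));
controls.sendText("Hello");
controls.flush();
controls.clear();
controls.requestClose();
console.log(sentPayloads);
```

The discriminated union lets the compiler reject a `Speak` message without `text` or a misspelled `type`.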
139-162: LGTM! The `handleMessage` method correctly handles incoming messages from the WebSocket connection. It distinguishes between string and binary data and calls the appropriate handler methods. It also emits an `Error` event for unknown data types or JSON parsing errors, which is a good practice for error handling.
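The routing described here can be sketched as follows: strings are parsed as JSON and passed to the text handler, binary payloads go to the audio handler (where an `Audio` event would be emitted), and anything else, including malformed JSON, goes to the error handler. `routeMessage` and `MessageHandlers` are hypothetical, and treating `ArrayBuffer` and `Uint8Array` (which includes Node's `Buffer`) as the binary cases is an assumption about what the socket delivers.

```typescript
interface MessageHandlers {
  onText(data: { type?: string }): void;
  onAudio(data: Uint8Array): void;
  onError(err: unknown): void;
}

// Route one WebSocket payload to the matching handler.
function routeMessage(data: unknown, h: MessageHandlers): void {
  if (typeof data === "string") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(data);
    } catch (err) {
      h.onError(err); // malformed JSON surfaces as an error
      return;
    }
    // Called outside the try so handler bugs are not reported as parse errors.
    h.onText(parsed as { type?: string });
  } else if (data instanceof ArrayBuffer) {
    h.onAudio(new Uint8Array(data));
  } else if (data instanceof Uint8Array) {
    h.onAudio(data);
  } else {
    h.onError(new TypeError("Unknown message data type"));
  }
}

const routed: string[] = [];
const handlers: MessageHandlers = {
  onText: (d) => routed.push(`text:${d.type}`),
  onAudio: (d) => routed.push(`audio:${d.length}`),
  onError: () => routed.push("error"),
};
routeMessage('{"type":"Flushed"}', handlers);
routeMessage(new Uint8Array([1, 2, 3]).buffer, handlers);
routeMessage("not json", handlers);
routeMessage(42, handlers);
console.log(routed);
```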
LGTM!
b6669ea to e3a6e69
Introduce live speak functionality to the existing `speak` client. Maintain the existing interface for batch.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation