Create Cassandra db schema on session initialization #5922

Draft
wants to merge 1 commit into
base: main

@akstron akstron commented Sep 2, 2024

…ution for initializing database.

Which problem is this PR solving?

Resolves #5797

Description of the changes

  • The PR includes the following changes:
    1. Embedding the schema template files into the binary.
    2. Creating the database schema during initialization, once a session to the database is established.

How was this change tested?

  • Currently not tested. Will test this shortly.

Checklist

…ution for initializing database.

Signed-off-by: Alok Kumar Singh <[email protected]>
@@ -131,6 +137,262 @@ func (f *Factory) configureFromOptions(o *Options) {
}
}

const (
Member

please move the changes to a separate file, like schema.go, there's no reason for this to be in factory.go

REPLICATION_FACTOR = `REPLICATION_FACTOR`
VERSION = `VERSION`
COMPACTION_WINDOW = `COMPACTION_WINDOW`
)
Member

The functionality you're adding is only needed in jaeger-v2, which uses YAML config files. So there will be no env variables - you can simply use the struct below. But you should add mapstructure tags to the fields, as well as validation instructions (example: cmd/jaeger/internal/extension/jaegerquery/config.go)

datacenter string
trace_ttl int
dependencies_ttl int
keyspace string
Member

this should already be defined in the main config

Author

Are you talking about this config: jaeger/cmd/jaeger/config-cassandra.yaml? How would I know which config key-value pairs are already defined?

Author

@akstron akstron Sep 11, 2024

Hi @yurishkuro, I need some help here. Is jaeger/cmd/jaeger/internal/all-in-one.yaml the example of the config file that would be used in v2? If yes, under what section should I expect the Cassandra-specific configs to go? I couldn't find any example of this in the current code base.

Just to get the complete picture, would the schema configs I am adding look something like this:

schema:
    datacenter: <datacenterName>
    keyspace: <keyspace>
    .......

Member

cmd/jaeger/config-cassandra.yaml

)

// Parameters required for initializing the db
type StorageConfigParams struct {
Member

maybe SchemaConfig

var replication_factor, compaction_window_size int
var replication, compaction_window_unit string

mode := os.Getenv(MODE)
Member

do we actually need the "mode"?

Author

In v1, I suppose the mode env variable is used for simple testing scenarios, which is why I added it here as well. Do you suggest removing it?

Member

yes, if we can

//go:embed schema/v002.cql.tmpl
//go:embed schema/v003.cql.tmpl
//go:embed schema/v004.cql.tmpl
var schemaFile embed.FS
Member

why do we need to embed all versions? we keep them mostly for historical reasons, but only the latest is used.

var schemaFile embed.FS

func handleTemplateReplacements(data []byte, params *StorageConfigParams) []byte {
templateKeysValuePairs := map[string]string{
Member

It's better to rewrite the schema with Go template syntax and use that, because then the data binding can be directly against the config struct (better type safety).

return result
}

func constructQueriesFromTemplateFiles(session cassandra.Session, params *StorageConfigParams) ([]cassandra.Query, error) {
Member

Is cassandra.Session not able to execute multiple queries at once?

@@ -143,12 +405,21 @@ func (f *Factory) Initialize(metricsFactory metrics.Factory, logger *zap.Logger)
}
f.primarySession = primarySession

// After creating a session, execute commands to initialize the setup if not already present
if err := f.InitializeDB(primarySession); err != nil {
Member

what happens if the schema already exists?
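
One way to make this safe is to write every statement as `CREATE ... IF NOT EXISTS`, so running initialization against an existing schema is a no-op. The sketch below shows the idea with a toy in-memory `fakeSession`; that type and `applySchema` are hypothetical and only mimic the "already exists" behavior, they are not the Jaeger or driver API.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeSession is a toy stand-in for a database session. It only understands
// "CREATE <kind> [IF NOT EXISTS] <name> ..." and remembers created names.
type fakeSession struct {
	objects map[string]bool
}

// Exec records the created object, failing on duplicates unless the
// statement says IF NOT EXISTS.
func (s *fakeSession) Exec(stmt string) error {
	fields := strings.Fields(stmt)
	if len(fields) < 3 || !strings.EqualFold(fields[0], "CREATE") {
		return fmt.Errorf("unsupported statement: %q", stmt)
	}
	ifNotExists := len(fields) >= 6 &&
		strings.EqualFold(strings.Join(fields[2:5], " "), "IF NOT EXISTS")
	name := fields[2]
	if ifNotExists {
		name = fields[5]
	}
	if s.objects[name] {
		if ifNotExists {
			return nil // already present, nothing to do
		}
		return fmt.Errorf("%s already exists", name)
	}
	s.objects[name] = true
	return nil
}

// applySchema runs the statements in order and stops at the first failure.
func applySchema(s *fakeSession, stmts []string) error {
	for _, stmt := range stmts {
		if err := s.Exec(stmt); err != nil {
			return fmt.Errorf("executing %q: %w", stmt, err)
		}
	}
	return nil
}

func main() {
	session := &fakeSession{objects: map[string]bool{}}
	stmts := []string{
		"CREATE KEYSPACE IF NOT EXISTS jaeger WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}",
		"CREATE TABLE IF NOT EXISTS jaeger.traces (trace_id blob PRIMARY KEY)",
	}
	// Running twice simulates a restart against an already initialized database.
	for run := 1; run <= 2; run++ {
		if err := applySchema(session, stmts); err != nil {
			fmt.Println("run", run, "failed:", err)
			return
		}
	}
	fmt.Println("schema applied twice without error")
}
```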

)

// Parameters required for initializing the db
type StorageConfigParams struct {
Contributor

@yurishkuro @akstron I'm working on a proposal to redefine the configs for Cassandra in #5928 (comment). What are your thoughts on adding the configurations here into a mapstructure:"schema" grouping?

Member

Do you think schema and data are different?

Contributor

@mahadzaryab1 mahadzaryab1 Sep 7, 2024

Sorry, I should've mentioned. I was suggesting consolidating data and schema into one under schema.

Member

Ok makes sense

Author

Looks good to me.

Development

Successfully merging this pull request may close these issues.

Create database scheme in Cassandra automatically
3 participants