RFC: [WIP] Change persistence folder structure #211

Open
1 task done
janaka opened this issue Feb 4, 2024 · 1 comment
Labels: enhancement, proposal

janaka commented Feb 4, 2024

Situation

Data persistence on disk isn't consistently separated by scope of ownership.

Current filesystem structure:

  • /index/PERSONAL/{user_id}/ - index files for Ask Your Docs feature

  • /index/SHARED/{org_id}/{space_id}/ - index files for Spaces

  • /sqlite/PERSONAL/{user_id}/usage.db - retrieval and LLM request and response data (chat history etc.) for all interactions

  • /sqlite/SHARED/system.db - system data and metadata (orgs, users, user_groups, spaces, and space_groups)

  • /upload/PERSONAL/{user_id}/ - the Ask Your Docs feature is hard-coded to the MANUAL_UPLOAD datasource. Its uploaded documents are persisted here.

  • /upload/SHARED/{org_id}/{space_id} - file uploads for any spaces with datasource = MANUAL_UPLOAD are persisted here.

Database table to file mapping:

  • usage.db : settings (user scoped), history_{feature_name}, history_thread_{feature_name}
  • system.db : orgs, org_members, users, settings (non-user scoped), space_groups, space_group_members, spaces, space_access, user_groups, user_group_members

Tables with joins:

  • orgs <> org_members
  • org_members <> users
  • spaces <> space_access <> users
  • spaces <> space_group_members
  • users <> user_group_members

This structure isn't ideal now that Orgs have been added, and given upcoming features such as public chatbots and restructuring the Ask Your Docs functionality as a personal org.

  • The semantics of space type (PERSONAL or SHARED) don't hold any longer for determining persistence location
  • Using FeatureType to pass a user_id context around, and then using that to decide on a persistence location, is also not great.

Goals and Requirements

  • Reduce the risk of org-owned content data (e.g. confidential docs that are indexed, and the chat history against them) leaking across org boundaries
  • Make it easier to migrate an entire org from a multi-tenant instance to a dedicated instance.
  • Usage data (chat history etc): always strictly scoped to an org and user, hence personal.
  • Org System data (user_groups, spaces metadata etc.) - can be shared across multiple users but always strictly scoped to an org
  • Global System data - users and org_members are the only system-wide shared data i.e. accessible across orgs.
  • Presentation layer and domain concepts (such as features and space types) should not be directly coupled to the persistence layer system logic

Proposal

Structure folders based on a name for the persistence system followed by one or more keys that identify the unique owner of the data. The filename describes the data.

Pattern:

/{persistence_system_name}/{owner_scope_key_1}/.../{owner_scope_key_n}/{filename}
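
A minimal sketch of how the pattern could be turned into a path builder. The function name `persistence_path` and its parameters are hypothetical and not taken from the codebase; the segment validation is an added assumption that fits the goal of keeping data inside its owner's folder.

```python
from pathlib import PurePosixPath


def persistence_path(system_name: str, *owner_scope_keys: str, filename: str = "") -> PurePosixPath:
    """Build /{system_name}/{key_1}/.../{key_n}/{filename} (hypothetical helper)."""
    parts = [system_name, *owner_scope_keys]
    if filename:
        parts.append(filename)
    for part in parts:
        # Reject empty segments and anything that could escape the owner's folder.
        if not part or part in {".", ".."} or "/" in part or "\\" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return PurePosixPath("/", *parts)


# Org system database for org 1234
org_db = persistence_path("sqlite", "orgs", "1234", filename="system.db")
# Index folder for space 5678 in org 1234 (no filename, so a folder path)
index_dir = persistence_path("index", "orgs", "1234", "5678")
```

Keeping the owner keys as separate arguments means a caller can never build a path for the wrong owner by string concatenation alone.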

Concrete changes:

  • /index/SHARED/{org_id}/{space_id}/ --> /index/orgs/{org_id}/{space_id}/

  • /index/PERSONAL/{user_id}/ --> Same as above. Ask Your Docs will instead be provided by giving every user a personal org. A space that isn't shared with any other users is private.

  • /index/THREAD/{org_id}/{space_id}/ --> /index/personal/{user_id}/{space_id}

  • /sqlite/PERSONAL/{user_id}/usage.db --> /sqlite/personal/{user_id}/usage.db - authenticated user usage

  • /sqlite/SHARED/system.db --> /sqlite/global/system.db - global system

  • new --> /sqlite/orgs/{org_id}/system.db - org system

  • settings org and user scope - both (?) should be stored in the same settings table in /sqlite/orgs/{org_id}/system.db

  • settings global scope - /sqlite/global/system.db

  • /upload/SHARED/{org_id}/{space_id} --> /upload/orgs/{org_id}/{space_id}

  • /upload/PERSONAL/{user_id}/ --> same as above because of the changes to Ask Your Docs.

  • /upload/THREAD/{org_id}/{space_id} --> /upload/personal/{user_id}/{space_id}
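
With users in the global system.db and spaces in an org-level system.db, a join such as spaces <> space_access <> users would span two SQLite files. One possible way to handle that is SQLite's `ATTACH DATABASE`. The sketch below is only an illustration: the table columns, the `global_sys` schema alias, and the helper names are assumptions, not the project's schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def connect_org_with_global(org_db: Path, global_db: Path) -> sqlite3.Connection:
    """Open the org database and attach the global one under the alias global_sys."""
    conn = sqlite3.connect(org_db)
    conn.execute("ATTACH DATABASE ? AS global_sys", (str(global_db),))
    return conn


def demo() -> list:
    """Create throwaway databases in the proposed layout and join across them."""
    with tempfile.TemporaryDirectory() as root:
        global_db = Path(root, "sqlite", "global", "system.db")
        org_db = Path(root, "sqlite", "orgs", "1000", "system.db")
        global_db.parent.mkdir(parents=True)
        org_db.parent.mkdir(parents=True)

        # Global system data: users (columns are hypothetical).
        g = sqlite3.connect(global_db)
        g.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
        g.execute("INSERT INTO users VALUES (1, 'alice')")
        g.commit()
        g.close()

        # Org system data: spaces and space_access (columns are hypothetical).
        conn = connect_org_with_global(org_db, global_db)
        conn.execute("CREATE TABLE spaces (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE space_access (space_id INTEGER, user_id INTEGER)")
        conn.execute("INSERT INTO spaces VALUES (10, 'handbook')")
        conn.execute("INSERT INTO space_access VALUES (10, 1)")
        rows = conn.execute(
            """
            SELECT s.name, u.username
            FROM spaces AS s
            JOIN space_access AS a ON a.space_id = s.id
            JOIN global_sys.users AS u ON u.id = a.user_id
            """
        ).fetchall()
        conn.close()
        return rows
```

Because only one org database is opened per connection, a query written this way cannot reach another org's spaces, which lines up with the isolation goal above.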

New use cases:

  • public chat bot --> /sqlite/personal/anon-{user_id}/usage.db - anonymous user usage. user_id is a generated guid.
  • experiment projects (ideally we just want to be able to prefix the root folder. this needs more thought)
    • usage:
    • index:
    • settings:
    • spaces: /sqlite/orgs/{org_id}/experiments/system.db
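
A small sketch of the anonymous-user case for the public chat bot, assuming the generated guid is a UUID4 and that the `anon-` prefix is part of the folder name as written above. The helper names are hypothetical.

```python
import uuid
from pathlib import PurePosixPath

ANON_PREFIX = "anon-"


def new_anonymous_user_id() -> str:
    """Return a folder-safe id such as 'anon-<uuid4>' for an unauthenticated user."""
    return f"{ANON_PREFIX}{uuid.uuid4()}"


def is_anonymous(user_id: str) -> bool:
    """True when the id was produced by new_anonymous_user_id()."""
    return user_id.startswith(ANON_PREFIX)


def usage_db_path(user_id: str) -> PurePosixPath:
    """Usage database location for an authenticated or anonymous user."""
    return PurePosixPath("/sqlite/personal") / user_id / "usage.db"
```

Authenticated and anonymous users share the same path function here, so the only difference between them on disk is the prefix on the folder name.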

janaka self-assigned this Feb 4, 2024
janaka added the enhancement and proposal labels Feb 4, 2024

janaka commented Feb 4, 2024

Partially implementing this as part of #207, since that work involves org-scoped data. It is partial because migrating existing data in deployed systems to the new structure will not be handled. A new DataScope enum has been introduced with backwards-compatible mappings where needed.
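
The DataScope enum itself isn't shown in this thread. As a rough illustration only, an enum with backwards-compatible mappings could look like the sketch below. The member names, the `_LEGACY_FOLDER` table, and `scope_root` are all assumptions, not the code that was committed.

```python
from enum import Enum
from pathlib import PurePosixPath


class DataScope(Enum):
    """Owner scope of persisted data (member names and values are assumptions)."""

    GLOBAL = "global"
    ORG = "orgs"
    PERSONAL = "personal"


# Hypothetical backwards-compatible folder names for data that has not been
# migrated in deployed systems. Combinations not listed have no legacy folder.
_LEGACY_FOLDER = {
    ("sqlite", DataScope.GLOBAL): "SHARED",
    ("sqlite", DataScope.PERSONAL): "PERSONAL",
    ("index", DataScope.ORG): "SHARED",
}


def scope_root(system_name: str, scope: DataScope, *, legacy_layout: bool = False) -> PurePosixPath:
    """Return the root folder for a persistence system and owner scope."""
    folder = scope.value
    if legacy_layout:
        # Fall back to the new folder name when no legacy folder ever existed.
        folder = _LEGACY_FOLDER.get((system_name, scope), folder)
    return PurePosixPath("/", system_name, folder)
```

New org-scoped data such as `/sqlite/orgs/...` has no legacy equivalent, so it resolves to the new layout even when `legacy_layout` is set.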

janaka added a commit that referenced this issue Feb 26, 2024
Personas will use the new file structure for org scoped data.

A step towards the new file structure proposed in RFC #211
janaka added a commit that referenced this issue Feb 26, 2024
* refactor: store layer implementing DataScope.
  - Personas will use the new file structure for org scoped data.
  - A step towards the new file structure proposed in RFC #211
* refactor: personas data entity to assistants
* add DAL functions for personas
* build: update Ruff package
* fix: remember and set selected Assistant for "new chat"
* adjust agent run logic
* refactor(chat ui): remove dates
* fix the data location refactor.
* improve the assistant edit UI
  - populate the system and user prompt fields with stater.
* refactor: use the LLM setting collection defined on the assistant.
* fix(UI): only show files/knowledge for shared ask
* chat ui: tweak upload file drop zone
* fix(UI): hide ML Engineering section for non-admins