reason=pgvecto.rs: IPC connection is closed unexpectedly for 100M dataset #592

Open
agandra30 opened this issue Sep 16, 2024 · 3 comments

Comments

@agandra30

agandra30 commented Sep 16, 2024

Hi folks,

I am trying to validate whether we could leverage pgvecto.rs for our use cases at scale.
I tried to create an HNSW index (MetricType.L2: 'L2') for the 100M dataset. The parameters look like this:

create_index_after_load=True
max_parallel_workers=16
quantization_type='trivial'
index=<IndexType.HNSW: 'HNSW'>
m=16
ef_search=100
ef_construction=300

I ran it 3 times and it failed each time:

dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'PgVectoRS-100mHNSWpgvectorrsr1v1-100mHNSWpgvectorrsr1v1'} failed to run, reason=pgvecto.rs: IPC connection is closed unexpectedly.

NOTE: The DB is up and the table is present. Is this a problem with the plugin? Or should we switch back to the older pgvector plugin instead of pgvecto.rs? Any recommendations?

You are now connected to database "mydatabase" as user "postgres"
mydatabase=# \dx;
                List of installed extensions
  Name   | Version |   Schema   | Description
---------+---------+------------+----------------------------------------------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vectors | 0.0.0   | vectors    | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM

@VoVAllen
Member

It's possible with 100M vectors, but you will need 1.5x-2x as much memory as the total vector size.
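
For reference, the 1.5x-2x rule of thumb above can be turned into a rough number. This is only a back-of-the-envelope sketch: it assumes 4-byte float32 components (not stated in the thread), takes the vector count and dimension from the error message, and uses illustrative function names that are not part of pgvecto.rs.

```python
# Rough memory estimate for the dataset in this issue.
# Assumption (not stated in the thread): each component is a 4-byte float32.

NUM_VECTORS = 100_000_000   # 'size' from the error message
DIM = 768                   # 'dim' from the error message
BYTES_PER_COMPONENT = 4     # assumed float32
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dim, bytes_per_component=BYTES_PER_COMPONENT):
    """Size of the raw vector data alone, without any index overhead."""
    return num_vectors * dim * bytes_per_component


def memory_range_bytes(raw_bytes, low=1.5, high=2.0):
    """Apply the 1.5x-2x rule of thumb to the raw vector size."""
    return raw_bytes * low, raw_bytes * high


raw = raw_vector_bytes(NUM_VECTORS, DIM)
low, high = memory_range_bytes(raw)
print(f"raw vectors: {raw / GIB:.1f} GiB")
print(f"1.5x-2x rule of thumb: {low / GIB:.1f} to {high / GIB:.1f} GiB")
```

Under these assumptions the raw vectors come to about 286 GiB, so the rule of thumb suggests roughly 429 to 572 GiB of memory for the index build.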

@agandra30
Author

agandra30 commented Sep 17, 2024

@VoVAllen thanks for the reply. Regarding memory: my configuration is Postgres 16.4 running on a bare-metal Ubuntu machine that has close to 1 TiB of memory.

The client machine, where I run the scripts that connect to the DB and run the validations, is also an Ubuntu machine and has approx. 500 GiB of memory.

Would you still recommend more memory?

Postgres server (16.4):

# free -mh

               total        used        free      shared  buff/cache   available
Mem:           1.0Ti        12Gi       598Gi       152Mi       396Gi       989Gi
Swap:             0B          0B          0B

Client:

# free -mh
               total        used        free      shared  buff/cache   available
Mem:           503Gi       9.1Gi       295Gi        40Mi       198Gi       490Gi
Swap:             0B          0B          0B

Also, are there any Postgres settings that you recommend?

  1. Since pgvecto.rs is failing, do you recommend the older pgvector extension instead?
  2. It's a single-node configuration. Are there any tunables you recommend?

@agandra30
Author

agandra30 commented Sep 17, 2024

@VoVAllen I saw somewhere that a fix for this was included in 0.2.0, so I also tried running with the latest release, 0.3.0. Still the same issue. Any pointers would be highly appreciated.

Also, I installed the plugin from source rather than using the Docker image, assuming the installation method shouldn't matter. Correct me if I'm wrong.

Also, our Postgres server is running on bare metal (AFAIK Postgres has a monolithic architecture and does not necessarily benefit from running on k8s). Does pgvecto.rs have any such limitations or requirements?

We are asking because we are really looking at large datasets at scale.


@agandra30 agandra30 changed the title reason=pgvecto.rs: IPC connection is closed unexpectedly. reason=pgvecto.rs: IPC connection is closed unexpectedly for 100M dataset Sep 17, 2024