Building a vector database in 2GB for 36 million Wikipedia passages
Wikipedia neural search running on a laptop (2GB small model and 10GB large model): https://speech-kws.ozonetel.com/wiki Thanks to @CohereAI for releasing the Wikipedia embedding dataset. I saw that for 36 million passages the embeddings would take around 120 GB. So if I had to host the embeddings and enable neural search on this dataset, I would have to do LLM ops and run vector DB clusters. A quick search on Pinecone shows that hosting this dataset would cost around $1000 on a storage-optimized setup and around $3000 on a performance-optimized one.
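For context, the ~120 GB figure follows from a simple back-of-envelope calculation. The sketch below assumes 768-dimensional float32 vectors (the dimensionality I believe Cohere's multilingual Wikipedia embeddings use); the exact numbers are assumptions, but they land in the same ballpark.

```python
# Back-of-envelope estimate of raw embedding storage for 36M passages.
# Assumption: 768-dim float32 vectors, as in Cohere's released dataset.
num_passages = 36_000_000
dims = 768
bytes_per_float = 4  # float32

total_bytes = num_passages * dims * bytes_per_float
print(f"~{total_bytes / 1e9:.0f} GB")  # ~111 GB, roughly the ~120 GB quoted above
```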