Reducing the size of the paraphrase-multilingual-mpnet-base-v2 model
Reducing 768-dimensional float vectors to 540-bit vectors
Embeddings have been the talk of the town, and everyone is trying to store them in vector databases.
There is also a growing realization that sometimes we do not need a vector database at all, and a simple wrapper on top of numpy is enough.
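For a corpus that fits in memory, such a wrapper can indeed be tiny: brute-force cosine search over normalized embeddings is just a matrix multiply. A minimal sketch (the names and shapes here are illustrative, not from any particular library):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 10) -> np.ndarray:
    """Brute-force cosine search; `query` (d,) and `corpus` (n, d) are L2-normalized."""
    scores = corpus @ query          # cosine similarity reduces to a dot product
    return np.argsort(-scores)[:k]   # indices of the k most similar sentences
```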
People are also talking about reducing dimensions, because most models produce oversized embeddings and no one knows whether all those dimensions are really needed.
Come to think of it, these are exactly the problems our KeSieve approach solves.
First, because it produces bit vectors, embedding sizes come down by a large factor, so the embeddings for all your text data can fit in RAM.
Second, using a unique approach, we can also reduce the number of dimensions, which is what we did when training our Wikipedia embeddings.
To build this, we started with the paraphrase-multilingual-mpnet-base-v2 model. We generated 1 million embeddings with it, then ran those embeddings through KeSieve to get 540-dimensional bit vectors. The base model outputs 768 float dimensions, so at 64 bits per float one sentence embedding takes 768 × 64 = 49,152 bits; as a bit vector it takes just 540 bits, a reduction of roughly 90× (still about 45× if the floats are stored as float32). Even against a quantized baseline, say 8-bit values at 768 × 8 = 6,144 bits, the bit vectors are more than 11× smaller. This is why we were able to fit 36 million sentences from Wikipedia in 2GB of RAM.
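KeSieve itself is not shown here, so the binarizer in the sketch below is only a stand-in: a classic sign-of-random-projection hash (LSH) that happens to also produce 540-bit codes. It is not our method; it just illustrates the 768-float-to-540-bit shape change, the bit packing, and Hamming-distance search over the packed codes:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

N_BITS = 540
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

# 768-dim float32 embeddings for the corpus (24,576 bits per sentence).
corpus = ["Wikipedia sentence one.", "Wikipedia sentence two."]
emb = model.encode(corpus, normalize_embeddings=True)    # shape (n, 768)

# Stand-in binarizer: sign of 540 random projections. Generic LSH, NOT KeSieve.
rng = np.random.default_rng(0)
planes = rng.standard_normal((emb.shape[1], N_BITS)).astype(np.float32)
codes = np.packbits(emb @ planes > 0, axis=1)            # (n, 68) uint8: 540 bits, padded to 544

# Hamming-distance search over packed codes via an 8-bit popcount table.
POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

def hamming_top_k(query_code: np.ndarray, db_codes: np.ndarray, k: int = 10) -> np.ndarray:
    dist = POPCOUNT[np.bitwise_xor(db_codes, query_code)].sum(axis=1)
    return np.argsort(dist)[:k]
```

Packed this way, each sentence occupies 68 bytes instead of 3,072 bytes of float32, which is the compression that makes an in-RAM index of tens of millions of sentences practical.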
And the results are really good: search quality is equivalent to the base model. In almost all cases, 9 of our top 10 results match the base model's top 10.
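One way to reproduce that comparison, assuming the float embeddings and packed codes from the previous sketch, is to count the per-query overlap between the two top-10 lists:

```python
import numpy as np

POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

def top10_overlap(corpus_f, corpus_b, queries_f, queries_b):
    """Mean size of top-10(float cosine) ∩ top-10(Hamming) across queries."""
    overlaps = []
    for qf, qb in zip(queries_f, queries_b):
        ref = np.argsort(-(corpus_f @ qf))[:10]                               # float baseline
        got = np.argsort(POPCOUNT[np.bitwise_xor(corpus_b, qb)].sum(1))[:10]  # bit vectors
        overlaps.append(len(set(ref) & set(got)))
    return float(np.mean(overlaps))   # ~9.0 corresponds to "9 of the top 10"
```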
This approach works for any embeddings: we can generate much smaller alternatives to any base model, whether from Cohere, OpenAI, or others.