I recently played the online game called context. I was intrigued by how it worked, so I built a tiny clone that can be found here.
To build the clone, I used an AI model to generate an embedding for every word and stored the results in a JSON file. I could then load that file in JavaScript and compute the cosine similarity between a randomly chosen word and the rest of the vocabulary.
I grabbed a random list of words from here.
The code to generate the embeddings is really small.
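import json

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Read one word per line, skipping blank lines (e.g. a trailing newline)
with open('./vocabulary.txt', 'r') as file:
    words = [line.strip() for line in file if line.strip()]

# Encode every word into an embedding vector
embeddings = model.encode(words)
print(len(words), embeddings.shape)

# Pair each word with its vector as a plain list so it can be serialized
vocab = list(zip(words, embeddings.tolist()))
with open('./embeddings.json', 'w') as file:
    json.dump(vocab, file, indent=4)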
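On the JavaScript side, the file written by this script parses into an array of `[word, vector]` pairs, because `json.dump` serializes each tuple as a JSON array. Here is a minimal sketch of turning that into a lookup table. The helper name `buildVocabulary` and the tiny inline sample are illustrative assumptions, not code from the actual clone:

```javascript
// Hypothetical helper: converts the parsed contents of embeddings.json
// (an array of [word, vector] pairs) into a Map keyed by word.
function buildVocabulary(pairs) {
  const vocabulary = new Map();
  for (const [word, vector] of pairs) {
    vocabulary.set(word, vector);
  }
  return vocabulary;
}

// Tiny made-up sample with 3-dimensional vectors; the real file holds
// one much longer vector per word.
const sampleJson = '[["cat", [0.1, 0.2, 0.3]], ["dog", [0.2, 0.1, 0.3]]]';
const vocabulary = buildVocabulary(JSON.parse(sampleJson));
console.log(vocabulary.size, vocabulary.get("cat").length);
```

A `Map` makes the "is this word in the vocabulary?" check a single `vocabulary.has(word)` call.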
I decided to write my own cosine similarity function, which needs two helpers. The first one computes the magnitude of a vector:
function magnitude(x) {
  let sumOfSquares = 0;
  for (let i = 0; i < x.length; i++) {
    sumOfSquares += x[i] * x[i];
  }
  return Math.sqrt(sumOfSquares);
}
And a function for the dot product:
function dotProduct(a, b) {
  // Named `sum` so the local variable does not shadow the function name
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
Now, with the two parts, I can create a cosine similarity function of my own:
function cosineSimilarity(a, b) {
  const dotProd = dotProduct(a, b);
  const magA = magnitude(a);
  const magB = magnitude(b);
  // A zero vector has no direction, so treat its similarity as 0
  if (magA === 0 || magB === 0) {
    return 0;
  } else {
    return dotProd / (magA * magB);
  }
}
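To connect this back to the game, here is a rough sketch of how the similarity scores could be turned into a ranking against the secret word. Turning scores into ranks is my assumption about how such a clone might work, and the helper name `rankVocabulary` and the 2-dimensional sample vectors are made up for the example:

```javascript
// Same math as the helpers above, repeated so the sketch runs on its own.
function cosineSimilarity(a, b) {
  let dot = 0, sumA = 0, sumB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sumA += a[i] * a[i];
    sumB += b[i] * b[i];
  }
  const denominator = Math.sqrt(sumA) * Math.sqrt(sumB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical helper: returns a Map from word to rank. Rank 1 is normally
// the secret word itself; larger ranks mean less similar words.
function rankVocabulary(pairs, secretWord) {
  const secret = pairs.find(([word]) => word === secretWord);
  if (!secret) throw new Error(`Unknown secret word: ${secretWord}`);
  const secretVector = secret[1];

  const scored = pairs.map(([word, vector]) => ({
    word,
    score: cosineSimilarity(secretVector, vector),
  }));
  scored.sort((x, y) => y.score - x.score); // most similar first

  const ranks = new Map();
  scored.forEach(({ word }, index) => ranks.set(word, index + 1));
  return ranks;
}

// Made-up 2-dimensional vectors, only to exercise the function.
const pairs = [
  ["cat", [1, 0]],
  ["kitten", [0.9, 0.1]],
  ["car", [0, 1]],
];
const ranks = rankVocabulary(pairs, "cat");
console.log(ranks.get("cat"), ranks.get("kitten"), ranks.get("car"));
```

Computing all ranks once per secret word means each guess only needs a `Map` lookup afterwards.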
Finally, to serve the small web app, I went with a tiny server written in plain Node.js with no dependencies. I avoided dependencies because I might revisit this project at some point, and I do not want to fight them in three years.
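For illustration, a dependency-free server along these lines can be written with only Node's built-in `http`, `fs`, and `path` modules. This is a sketch rather than the actual server from the project: the file names `index.html` and `embeddings.json`, the route table, and the port mentioned in the comment are all assumptions.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Assumed file names; the post only mentions one HTML file and the JSON file.
const ROUTES = new Map([
  ["/", { file: "index.html", type: "text/html; charset=utf-8" }],
  ["/embeddings.json", { file: "embeddings.json", type: "application/json" }],
]);

// Maps a request URL to a known file, ignoring any query string.
function resolveRoute(url) {
  const { pathname } = new URL(url, "http://localhost");
  return ROUTES.get(pathname) || null;
}

function handleRequest(req, res) {
  const route = resolveRoute(req.url);
  if (!route) {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
    return;
  }
  // Only files listed in ROUTES are read, so the URL never becomes a disk path.
  fs.readFile(path.join(__dirname, route.file), (err, data) => {
    if (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end("Could not read file");
      return;
    }
    res.writeHead(200, { "Content-Type": route.type });
    res.end(data);
  });
}

const server = http.createServer(handleRequest);
// Call server.listen(3000) to start serving; the port is an arbitrary choice.
```

Using a fixed route table instead of mapping URLs straight to file paths keeps the server small and avoids accidentally exposing other files.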
For the frontend, I stuck to a single HTML file with the JavaScript inline. It checks whether the current word is valid and re-orders the words every time a new one is entered.
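The check-and-reorder step could look roughly like the sketch below. It assumes each vocabulary word already has a rank relative to the secret word (1 being the closest) and that the vocabulary is lowercase. The names `submitGuess` and `sampleRanks` are hypothetical.

```javascript
// Hypothetical helper: `ranks` maps every vocabulary word to its rank
// (1 = closest to the secret word); `guesses` is the list shown on screen.
function submitGuess(ranks, guesses, input) {
  // Assumes the vocabulary is stored in lowercase.
  const word = input.trim().toLowerCase();
  if (!ranks.has(word)) {
    return { ok: false, reason: "unknown word", guesses };
  }
  if (guesses.some((guess) => guess.word === word)) {
    return { ok: false, reason: "already guessed", guesses };
  }
  // Copy, append, then re-sort so the closest guesses come first.
  const updated = [...guesses, { word, rank: ranks.get(word) }];
  updated.sort((a, b) => a.rank - b.rank);
  return { ok: true, guesses: updated };
}

// Made-up ranks, only to exercise the function.
const sampleRanks = new Map([["cat", 1], ["kitten", 2], ["car", 3]]);
let guesses = [];
guesses = submitGuess(sampleRanks, guesses, "car").guesses;
guesses = submitGuess(sampleRanks, guesses, " Kitten ").guesses;
const rejected = submitGuess(sampleRanks, guesses, "spaceship");
console.log(guesses.map((guess) => guess.word), rejected.ok);
```

Returning a new array instead of mutating the old one keeps the rendering code simple: it can redraw the list from whatever `guesses` currently holds.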
For a weekend project, I really like how everything turned out. For now, I think it is a solid version 0.1, and maybe I could add some small features to make it more comfortable to use.