Rework ngram generation. Greatly improve performance of indexer. Commit horrendous SQL sins.

2024-05-04 21:10:46 +09:30
commit bdb4064acc
parent 9f0e7e6b29
5 changed files with 155 additions and 57 deletions
--- a/13
+++ b/13
@@ -1,6 +1,9 @@
-[ ] Refactor website table to generic document table (maybe using URN instead of URL?)
-[ ] Define tokens table FKed to document table
-[ ] Refactor index.py to tokenize input into tokens table
-[ ] Define N-Grams table 
-[ ] Add N-Gram generation to index.py
+[x] Refactor website table to generic document table (maybe using URN instead of URL?)
+[x] Define tokens table FKed to document table
+[x] Refactor index.py to tokenize input into tokens table
+[x] Define N-Grams table 
+[x] Add N-Gram generation to index.py
+[x] Add clustered index to document_ngrams table model
+[x] Add clustered index to document_tokens table model
+[ ] Add ddl command to create partition tables