Rework ngram generation. Greatly improve performance of indexer. Commit horrendous sql sins

rmgr 2024-05-04 21:10:46 +09:30
parent 9f0e7e6b29
commit bdb4064acc
5 changed files with 155 additions and 57 deletions

todo (13 changes)

@@ -1,6 +1,9 @@
-[ ] Refactor website table to generic document table (maybe using URN instead of URL?)
-[ ] Define tokens table FKed to document table
-[ ] Refactor index.py to tokenize input into tokens table
-[ ] Define N-Grams table
-[ ] Add N-Gram generation to index.py
+[x] Refactor website table to generic document table (maybe using URN instead of URL?)
+[x] Define tokens table FKed to document table
+[x] Refactor index.py to tokenize input into tokens table
+[x] Define N-Grams table
+[x] Add N-Gram generation to index.py
+[x] Add clustered index to document_ngrams table model
+[x] Add clustered index to document_tokens table model
+[ ] Add ddl command to create partition tables
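
The todo items above mention tokenizing input and generating N-grams in index.py, but this view does not show that code. The following is only a minimal sketch of the general technique, assuming whitespace tokenization and N-grams of 1 to 3 tokens. The names tokenize, generate_ngrams and max_n are hypothetical and are not taken from the commit.

```python
from typing import Iterator, Sequence


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace; the real indexer may differ.
    return text.lower().split()


def generate_ngrams(
    tokens: Sequence[str], max_n: int = 3
) -> Iterator[tuple[int, int, str]]:
    """Yield (size, position, gram) for every n-gram of size 1..max_n.

    The position is the index of the gram's first token, which a
    document_ngrams-style table could store alongside the gram itself.
    """
    for size in range(1, max_n + 1):
        # The last valid start index is len(tokens) - size.
        for start in range(len(tokens) - size + 1):
            yield size, start, " ".join(tokens[start:start + size])
```

Yielding rows lazily would let a caller batch them into bulk inserts rather than writing one row at a time, although whether the commit does this is not visible here.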