Rework n-gram generation. Greatly improve indexer performance. Commit horrendous SQL sins
Parent: 9f0e7e6b29
Commit: bdb4064acc
5 changed files with 155 additions and 57 deletions
File: todo (13 lines changed)
@@ -1,6 +1,9 @@
-[ ] Refactor website table to generic document table (maybe using URN instead of URL?)
-[ ] Define tokens table FKed to document table
-[ ] Refactor index.py to tokenize input into tokens table
-[ ] Define N-Grams table
-[ ] Add N-Gram generation to index.py
+[x] Refactor website table to generic document table (maybe using URN instead of URL?)
+[x] Define tokens table FKed to document table
+[x] Refactor index.py to tokenize input into tokens table
+[x] Define N-Grams table
+[x] Add N-Gram generation to index.py
+[x] Add clustered index to document_ngrams table model
+[x] Add clustered index to document_tokens table model
+[ ] Add ddl command to create partition tables