search-engine/todo

[x] Refactor website table to generic document table (maybe using URN instead of URL?)
[x] Define tokens table FKed to document table
[x] Refactor index.py to tokenize input into tokens table
[x] Define N-Grams table
[x] Add N-Gram generation to index.py
[x] Add clustered index to document_ngrams table model
[x] Add clustered index to document_tokens table model
[ ] Add DDL command to create partition tables
[x] Investigate whether robots.txt is as aggressive as I'm making it out to be
[x] Instead of starting from a random page on the site, go to the root, find the sitemap, and crawl that
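
Note on the robots.txt and sitemap items: a minimal sketch of how both checks might look using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, the "my-crawler" user agent, and the load_rules helper are all made up for illustration; this is not necessarily how the crawler in this repo does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Sitemap URLs declared in robots.txt; site_maps() returns None if there are none.
sitemaps = rules.site_maps() or []

# Check individual URLs against the rules before fetching them.
allowed = rules.can_fetch("my-crawler", "https://example.com/docs/page.html")
blocked = rules.can_fetch("my-crawler", "https://example.com/private/secret.html")

print(sitemaps, allowed, blocked)
```

Against a live site, RobotFileParser.set_url() followed by read() would fetch the file instead of parsing an inline string. site_maps() requires Python 3.8 or later.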