Search
The classes in this module
Visit the interactive demo to learn how to use these classes.
Documentation
-
class search.EdXSearchEngine(dataset_path, index_path, create=False)
A search engine over edX courses.
-
__init__(dataset_path, index_path, create=False)
Creates a new search engine that searches over edX courses.
Parameters:
- {string} (dataset_path) – the path to the edX course listings file.
- {string} (index_path) – the path to a folder where the search engine index will be stored. The folder doesn’t have to exist, but its parent folder does.
- {bool} (create) – If True, recreates the index from scratch. If False, loads the existing index.
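The rule for index_path (the folder may be missing, but its parent must exist) can be checked before constructing an engine. The following is a minimal sketch using only the standard library; `check_index_path` is a hypothetical helper and is not part of the search module.

```python
import tempfile
from pathlib import Path


def check_index_path(index_path):
    """Hypothetical helper (not part of the search module).

    Returns True if index_path is usable as an index folder under the
    documented rule: the folder itself may be missing, but its parent
    folder must already exist.
    """
    path = Path(index_path)
    if path.exists():
        # An existing path is only usable if it is a folder.
        return path.is_dir()
    return path.parent.is_dir()


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ok_missing_folder = check_index_path(Path(tmp) / "edx_index")
    ok_missing_parent = check_index_path(Path(tmp) / "nope" / "edx_index")
```

Here `ok_missing_folder` is True because only the final folder is missing, while `ok_missing_parent` is False because the intermediate folder does not exist.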
-
-
class search.HarvardXSearchEngine(dataset_path, index_path, create=False)
A search engine over the HarvardX/DART database.
-
__init__(dataset_path, index_path, create=False)
Creates a new HarvardX search engine. Searches over the HarvardX/DART database of all courses and course materials used in HarvardX, including videos, quizzes, etc.
TODO: consider renaming to DART, probably
Parameters:
- {string} (dataset_path) – the path to the HarvardX course catalog CSV file.
- {string} (index_path) – the path to a folder where the search engine index will be stored. The folder doesn’t have to exist, but its parent folder does.
- {bool} (create) – If True, recreates the index from scratch. If False, loads the existing index.
-
-
class search.PrebuiltSearchEngine(search_fields, index_path)
A search engine for when you are given a prebuilt model file and can use it directly without having to build anything.
-
class search.Result(dict_data, score)
Encodes a search result: a wrapper around a result dict and its relevance score (higher is better).
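To illustrate the "higher is better" convention, here is a small sketch of such a wrapper and of best-first ordering. `ResultSketch` and `rank` are illustrative stand-ins written for this example; the attribute names and behavior of the real search.Result may differ.

```python
class ResultSketch:
    """Illustrative stand-in for search.Result; the real class may differ."""

    def __init__(self, dict_data, score):
        self.dict_data = dict_data  # the raw result fields
        self.score = score          # relevance score, higher is better

    def __repr__(self):
        return "ResultSketch(%r, score=%r)" % (self.dict_data, self.score)


def rank(results):
    """Order results best-first: a higher score means more relevant."""
    return sorted(results, key=lambda r: r.score, reverse=True)


# Made-up example data, used only to show the ordering.
hits = [
    ResultSketch({"title": "Intro to Biology"}, 1.2),
    ResultSketch({"title": "Intro to Computer Science"}, 3.4),
]
ranked_titles = [r.dict_data["title"] for r in rank(hits)]
```

With these made-up scores, the computer science course is ranked first.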
-
class search.SearchEngine(create, search_fields, index_path)
An abstract class for search engines. A batteries-included search engine that can operate on any given dataset. Uses the Whoosh library to index and run searches on the dataset. Has built-in support for query rewriting.
-
__init__(create, search_fields, index_path)
Creates a new search engine.
Parameters:
- {bool} (create) – If True, recreates the index from scratch. If False, loads the existing index.
- {str[]} (search_fields) – An array of names of fields in the index that the search engine will search against.
- {str} (index_path) – A relative path to a folder where the Whoosh index should be stored.
-
create_index()
Creates and returns a brand-new index. This calls get_empty_index() behind the scenes. Subclasses must implement this method.
-
get_empty_index(path, schema)
Makes an empty index file, creating the directory where it needs to be stored if necessary. Returns the index.
This is called within create_index(). TODO: this breakdown is still confusing
-
get_num_documents()
Returns the number of documents in this search engine’s corpus, that is, the size of the search engine.
-
load_index()
Used when the index has already been created. Loads the existing index and returns it.
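The division of labor between create_index(), get_empty_index() and load_index() can be sketched as follows. This is an assumption-laden illustration, not the module's implementation: a JSON file stands in for the Whoosh index so that the sketch runs with the standard library only, and `SketchEngine`, `TinyCourseEngine`, `_index_file` and `_save` are hypothetical names.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class SketchEngine(ABC):
    """Stand-in for the SearchEngine life cycle (JSON file, not Whoosh)."""

    def __init__(self, create, search_fields, index_path):
        self.search_fields = search_fields
        self.index_path = index_path
        # create=True rebuilds from scratch; create=False loads what exists.
        self.index = self.create_index() if create else self.load_index()

    @abstractmethod
    def create_index(self):
        """Subclasses build and return a brand-new index here."""

    @staticmethod
    def _index_file(path):
        return os.path.join(path, "index.json")

    def _save(self, path, index):
        with open(self._index_file(path), "w") as f:
            json.dump(index, f)

    def get_empty_index(self, path, schema):
        # Make the index folder if needed, then write an empty index.
        os.makedirs(path, exist_ok=True)
        index = {"schema": schema, "docs": []}
        self._save(path, index)
        return index

    def load_index(self):
        with open(self._index_file(self.index_path)) as f:
            return json.load(f)

    def get_num_documents(self):
        return len(self.index["docs"])


class TinyCourseEngine(SketchEngine):
    """Hypothetical subclass with two hard-coded documents."""

    def create_index(self):
        index = self.get_empty_index(self.index_path, ["title"])
        index["docs"] = [{"title": "Algebra"}, {"title": "Poetry"}]
        self._save(self.index_path, index)
        return index


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "idx")
    built = TinyCourseEngine(True, ["title"], path)    # creates the folder
    loaded = TinyCourseEngine(False, ["title"], path)  # reuses the index
    counts = (built.get_num_documents(), loaded.get_num_documents())
```

In this sketch the constructor decides between building and loading, the subclass supplies the documents, and get_empty_index() only handles folder creation and the empty index. Both engines report two documents.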
-
-
class search.UdacitySearchEngine(dataset_path, index_path, create=False)
A search engine over Udacity courses.
-
__init__(dataset_path, index_path, create=False)
Creates a new Udacity search engine.
Parameters:
- {string} (dataset_path) – the path to the Udacity API JSON file.
- {string} (index_path) – the path to a folder where the search engine index will be stored. The folder doesn’t have to exist, but its parent folder does.
- {bool} (create) – If True, recreates the index from scratch. If False, loads the existing index.
-
-
search.pack_byte()
S.pack(v1, v2, ...) -> string
Return a string containing values v1, v2, ... packed according to this Struct’s format. See struct.__doc__ for more on format strings.
-
search.unpack_byte()
S.unpack(str) -> (v1, v2, ...)
Return tuple containing values unpacked according to this Struct’s format. Requires len(str) == self.size. See struct.__doc__ for more on format strings.