Search
The classes in this module
Visit the interactive demo to learn how to use these classes.
Documentation
-
class search.EdXSearchEngine(dataset_path, index_path, create=False)
edX
-
__init__(dataset_path, index_path, create=False)
Creates a new search engine that searches over edX courses.
Parameters:
- dataset_path (string) – the path to the edX course listings file.
- index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
- create (bool) – If True, recreates an index from scratch. If False, loads the existing index.
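The index_path rule (the folder itself may be missing, but its parent must exist) can be checked before constructing an engine. The helper below is a minimal sketch and is not part of the search module; check_index_path is a hypothetical name chosen for illustration.

```python
import os
import tempfile


def check_index_path(index_path):
    """Return True if index_path is usable as an index folder.

    Hypothetical helper: the folder may be missing, but its parent must exist.
    """
    parent = os.path.dirname(os.path.abspath(index_path))
    return os.path.isdir(parent)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "index")               # parent exists
    bad_path = os.path.join(tmp, "missing", "index")   # parent does not exist
    results = (check_index_path(ok_path), check_index_path(bad_path))

print(results)
```

Running a check like this first gives a clearer error than a failure deep inside index creation.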
-
-
class search.GenericSearchEngine
An abstract class for any search engine, whether that’s an external API you’ve already built or a Whoosh-based search engine you can make from scratch via searchbetter.
This class encapsulates some useful functionality like query rewriting that can benefit any search engine, even one not made using SearchBetter tools.
Extending this class is easy - you just need to provide a search function and a few other details, and we’ll build in functionality from there.
-
process_raw_results(raw_results)
After rewriting, we’ll pass the full list of results in here for you to clean up. This could include sorting, removing duplicates, etc. (What you can do, and how you do it, really depends on what kind of objects your search engine returns.)
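As one illustration of such a cleanup step, the sketch below removes duplicates and sorts by score. It assumes each result is a dict with "id" and "score" keys; those key names are hypothetical, and a real engine's result objects may look quite different.

```python
def process_raw_results(raw_results):
    """Drop duplicate results and sort the rest by descending score.

    Assumes each result is a dict with hypothetical "id" and "score" keys.
    """
    best = {}
    for result in raw_results:
        key = result["id"]
        # Keep only the highest-scoring copy of each document.
        if key not in best or result["score"] > best[key]["score"]:
            best[key] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


raw = [
    {"id": "a", "score": 1.0},
    {"id": "b", "score": 3.0},
    {"id": "a", "score": 2.0},
]
cleaned = process_raw_results(raw)
```

Deduplication matters here because several rewritten queries can return the same document.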
-
search(term)
Runs a plain-English search and returns results.
Parameters: term (str) – a query like you’d type into Google.
Returns: a list of dicts, each of which encodes a search result.
-
set_rewriter(rewriter)
Sets a new query rewriter (from this_package.rewriter) as the default rewriter for this search engine.
-
single_search(term)
Runs the search engine on a single term (no rewriting or anything), returning a list of objects.
Subclasses must implement!
Parameters: term (str) – a word or phrase to search for.
Returns: a list of objects that were found. Can be anything: dicts, strings, custom objects, whatever.
Return type: list(object)
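The subclassing pattern described for this class might look roughly like the sketch below. It does not import the real module: GenericSearchEngineSketch is a stand-in written only for illustration, and the rewriter's rewrite(term) method, ListSearchEngine, and SynonymRewriter are all assumptions rather than documented API. Results here are plain strings for brevity.

```python
class GenericSearchEngineSketch:
    """Stand-in for search.GenericSearchEngine; NOT the library's code."""

    def __init__(self):
        self.rewriter = None

    def set_rewriter(self, rewriter):
        self.rewriter = rewriter

    def search(self, term):
        # Assumption: a rewriter expands one term into several related terms.
        terms = [term]
        if self.rewriter is not None:
            terms = self.rewriter.rewrite(term)
        raw_results = []
        for t in terms:
            raw_results.extend(self.single_search(t))
        return self.process_raw_results(raw_results)

    def process_raw_results(self, raw_results):
        return raw_results

    def single_search(self, term):
        raise NotImplementedError("Subclasses must implement!")


class ListSearchEngine(GenericSearchEngineSketch):
    """Hypothetical subclass: substring search over a list of titles."""

    def __init__(self, titles):
        super().__init__()
        self.titles = titles

    def single_search(self, term):
        return [t for t in self.titles if term.lower() in t.lower()]


class SynonymRewriter:
    """Hypothetical rewriter with an assumed rewrite(term) method."""

    def rewrite(self, term):
        return [term, "programming"] if term == "coding" else [term]


engine = ListSearchEngine(["Intro to Programming", "Coding Basics", "Art History"])
plain = engine.search("coding")
engine.set_rewriter(SynonymRewriter())
rewritten = engine.search("coding")
```

The subclass supplies only single_search; in this sketch the base class handles rewriting and result cleanup around it.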
-
-
class search.HarvardXSearchEngine(dataset_path, index_path, create=False)
HX
-
__init__(dataset_path, index_path, create=False)
Creates a new HarvardX search engine. Searches over the HarvardX/DART database of all courses and course materials used in HarvardX. This includes videos, quizzes, etc.
TODO: consider renaming to DART.
Parameters:
- dataset_path (string) – the path to the HarvardX course catalog CSV file.
- index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
- create (bool) – If True, recreates an index from scratch. If False, loads the existing index.
-
-
class search.PrebuiltSearchEngine(search_fields, index_path)
A search engine designed for when you’re just given a model file and can use that directly without having to build anything.
-
class search.UdacitySearchEngine(dataset_path, index_path, create=False)
Udacity
-
__init__(dataset_path, index_path, create=False)
Creates a new Udacity search engine.
Parameters:
- dataset_path (string) – the path to the Udacity API JSON file.
- index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
- create (bool) – If True, recreates an index from scratch. If False, loads the existing index.
-
-
class search.WhooshResult(dict_data, score)
Encodes a search result from a Whoosh-based search engine. Basically a wrapper around a result dict and its relevance score (higher is better).
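To make the "dict plus score" idea concrete, here is a minimal stand-in with the same constructor arguments. ResultSketch and its attribute names are assumptions made for illustration; the real WhooshResult may expose its data differently.

```python
class ResultSketch:
    """Stand-in for a WhooshResult-like wrapper; attribute names are assumed."""

    def __init__(self, dict_data, score):
        self.dict_data = dict_data
        self.score = score

    def __repr__(self):
        return "ResultSketch({!r}, score={})".format(self.dict_data, self.score)


hits = [
    ResultSketch({"title": "Art History"}, 1.2),
    ResultSketch({"title": "Coding Basics"}, 4.5),
]
# Higher scores are better, so sort in descending order.
ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)
top_title = ranked[0].dict_data["title"]
```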
-
class search.WhooshSearchEngine(create, search_fields, index_path)
An abstract class for custom, Whoosh-based search engines.
A batteries-included search engine that can operate on any given dataset. Uses the Whoosh library to index and run searches on the dataset. Has built-in support for query rewriting.
-
__init__(create, search_fields, index_path)
Creates a new search engine.
Parameters:
- create (bool) – If True, recreates an index from scratch. If False, loads the existing index.
- search_fields (str[]) – An array of names of fields in the index that our search engine will search against.
- index_path (str) – A relative path to a folder where the Whoosh index should be stored.
-
create_index()
Creates and returns a brand-new index. This will call get_empty_index() behind the scenes. Subclasses must implement!
-
get_empty_index(path, schema)
Makes an empty index file, making the directory where it needs to be stored if necessary. Returns the index.
This is called within create_index(). TODO: this breakdown is still confusing.
-
get_num_documents()
Returns the number of documents in this search engine’s corpus. That is, this is the size of the search engine.
-
load_index()
Used when the index is already created. This just loads it and returns it for you.
-
single_search(term)
Helper function for search() that just returns search results for a single, non-rewritten search term. Returns a list of results, each of which is a Result object. The makeup of the result objects varies from search engine to search engine.
Overridden from GenericSearchEngine.
-
-
search.pack_byte()
S.pack(v1, v2, ...) -> string
Return a string containing values v1, v2, ... packed according to this Struct’s format. See struct.__doc__ for more on format strings.
-
search.unpack_byte()
S.unpack(str) -> (v1, v2, ...)
Return tuple containing values unpacked according to this Struct’s format. Requires len(str) == self.size. See struct.__doc__ for more on format strings.