SearchBetter

Search

This module contains the search engine classes.

Visit the interactive demo to learn how to use these classes.

Usage

First, pip install searchbetter.

Then, in your Python code:

from searchbetter import search
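
As a minimal sketch of a first session, the function below builds a Udacity index and runs one query, using only the constructor and search() signatures documented on this page. The function name and the paths in the commented call are hypothetical placeholders; the import sits inside the function so the snippet still loads where searchbetter is not installed.

```python
def run_first_search(dataset_path, index_path, term):
    """Build a Udacity index from scratch and run one query against it."""
    # Imported lazily so this file still loads where searchbetter is absent.
    from searchbetter import search

    # create=True rebuilds the index; pass create=False on later runs to
    # load the existing index instead of rebuilding it.
    engine = search.UdacitySearchEngine(dataset_path, index_path, create=True)
    return engine.search(term)


# Hypothetical call; both paths are placeholders, not files shipped with the package:
# results = run_first_search("data/udacity.json", "indexes/udacity", "machine learning")
```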

Documentation

class search.EdXSearchEngine(dataset_path, index_path, create=False)

edX

__init__(dataset_path, index_path, create=False)

Creates a new search engine that searches over edX courses.

Parameters:
  • dataset_path (string) – the path to the edX course listings file.
  • index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
  • create (bool) – If True, recreates an index from scratch. If False, loads the existing index.

count_words()

Returns the number of words in the underlying edX dataset.

create_index()

Creates a new index to search the dataset. You only need to call this once; once the index is created, you can just load it again instead of creating it afresh all the time.

Returns the index object.

class search.HarvardXSearchEngine(dataset_path, index_path, create=False)

HX

__init__(dataset_path, index_path, create=False)

Creates a new HarvardX search engine. Searches over the HarvardX/DART database of all courses and course materials used in HarvardX. This includes videos, quizzes, etc.

TODO: consider renaming this class to DART.

Parameters:
  • dataset_path (string) – the path to the HarvardX course catalog CSV file.
  • index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
  • create (bool) – If True, recreates an index from scratch. If False, loads the existing index.

create_index()

Creates a new index to search the dataset. You only need to call this once; once the index is created, you can just load it again instead of creating it afresh all the time.

Returns the index object.

class search.PrebuiltSearchEngine(search_fields, index_path)

A search engine designed for when you’re just given a model file and can use that directly without having to build anything.

class search.Result(dict_data, score)

Encodes a search result. Basically a wrapper around a result dict and its relevance score (higher is better).

get_dict()

Returns the underlying dict data.
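
To illustrate the idea of a result wrapper, here is a minimal stand-in. ResultSketch and best_first are hypothetical names, and the score attribute is an assumption; only the constructor arguments and get_dict() come from the documentation above.

```python
class ResultSketch:
    """Hypothetical stand-in for search.Result: a dict plus a relevance score."""

    def __init__(self, dict_data, score):
        self.dict_data = dict_data
        self.score = score  # assumed attribute name; higher is better

    def get_dict(self):
        """Return the underlying dict data."""
        return self.dict_data


def best_first(results):
    """Order results so that the highest score comes first."""
    return sorted(results, key=lambda r: r.score, reverse=True)


hits = [
    ResultSketch({"title": "Intro to Statistics"}, 1.2),
    ResultSketch({"title": "Machine Learning"}, 3.4),
]
titles = [r.get_dict()["title"] for r in best_first(hits)]
```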

class search.SearchEngine(create, search_fields, index_path)

An abstract base class for search engines: a batteries-included engine that can operate on any given dataset. It uses the Whoosh library to index and search the dataset, and it has built-in support for query rewriting.

__init__(create, search_fields, index_path)

Creates a new search engine.

Parameters:
  • create (bool) – If True, recreates an index from scratch. If False, loads the existing index.
  • search_fields (str[]) – An array of the names of the index fields that the search engine will search against.
  • index_path (str) – A relative path to the folder where the Whoosh index should be stored.

create_index()

Creates and returns a brand-new index. This will call get_empty_index() behind the scenes. Subclasses must implement this method.
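
The create flag and the subclass requirement can be sketched as follows. This is an illustration under assumptions, not the library's code: EngineSketch and TinyEngine are hypothetical, plain dicts stand in for Whoosh indexes, and whether the real __init__ dispatches exactly this way is not confirmed by this page.

```python
class EngineSketch:
    """Hypothetical illustration of the create-or-load pattern; not the real class."""

    def __init__(self, create, search_fields, index_path):
        self.search_fields = search_fields
        self.index_path = index_path
        # Rebuild from scratch when asked, otherwise reuse what is on disk.
        self.index = self.create_index() if create else self.load_index()

    def create_index(self):
        raise NotImplementedError("Subclasses must implement create_index()")

    def load_index(self):
        # Stand-in: a real engine would open the stored Whoosh index here.
        return {"loaded_from": self.index_path}


class TinyEngine(EngineSketch):
    def create_index(self):
        # Stand-in for building an index from a dataset.
        return {"built_at": self.index_path}


fresh = TinyEngine(True, ["title"], "indexes/tiny")
reused = TinyEngine(False, ["title"], "indexes/tiny")
```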

get_empty_index(path, schema)

Makes an empty index file, making the directory where it needs to be stored if necessary. Returns the index.
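
The directory handling can be sketched with the standard library alone. The helper name ensure_index_dir is hypothetical, and the choice of os.mkdir is an assumption made to match the documented rule that the index folder may be missing while its parent must exist; the real method also creates the Whoosh index, which is omitted here.

```python
import os
import tempfile


def ensure_index_dir(path):
    """Create the index folder if it is missing; the parent must already exist."""
    if not os.path.isdir(path):
        # os.mkdir (unlike os.makedirs) raises FileNotFoundError when the
        # parent folder is missing, matching the documented requirement.
        os.mkdir(path)
    return path


with tempfile.TemporaryDirectory() as parent:
    # The folder itself does not exist yet, but its parent does.
    created = ensure_index_dir(os.path.join(parent, "index"))
    created_ok = os.path.isdir(created)

    # Here the parent ("missing") does not exist, so creation fails.
    try:
        ensure_index_dir(os.path.join(parent, "missing", "index"))
        parent_missing_raised = False
    except FileNotFoundError:
        parent_missing_raised = True
```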

This is called within create_index(). TODO: this breakdown is still confusing.

get_num_documents()

Returns the number of documents in this search engine’s corpus. That is, this is the size of the search engine.

load_index()

Used when the index is already created. This just loads it and returns it for you.

search(term)

Runs a plain-English search and returns results.

Parameters:
  • term (string) – a query like you’d type into Google.

Returns: a list of dicts, each of which encodes a search result.

set_rewriter(rewriter)

Sets a new query rewriter (from the package’s rewriter module) as the default rewriter for this search engine.
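
The effect of attaching a rewriter can be illustrated with a toy model. Everything here is hypothetical: the rewriter interface is not described on this page, so SynonymRewriter, its rewrite() method, and RewritingEngine are assumptions made only to show how rewriting a query can widen what a search matches.

```python
class SynonymRewriter:
    """Hypothetical rewriter: maps a term to itself plus known related terms."""

    def __init__(self, related):
        self.related = related

    def rewrite(self, term):
        return [term] + self.related.get(term, [])


class RewritingEngine:
    """Hypothetical engine over an in-memory list of titles."""

    def __init__(self, titles):
        self.titles = titles
        self.rewriter = None

    def set_rewriter(self, rewriter):
        self.rewriter = rewriter

    def search(self, term):
        # Without a rewriter, only the literal term is matched.
        terms = self.rewriter.rewrite(term) if self.rewriter else [term]
        return [t for t in self.titles
                if any(q.lower() in t.lower() for q in terms)]


engine = RewritingEngine(
    ["Intro to Machine Learning", "Deep Learning", "Statistics 101"]
)
before = engine.search("ML")
engine.set_rewriter(SynonymRewriter({"ML": ["machine learning"]}))
after = engine.search("ML")
```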

class search.UdacitySearchEngine(dataset_path, index_path, create=False)

Udacity

__init__(dataset_path, index_path, create=False)

Creates a new Udacity search engine.

Parameters:
  • dataset_path (string) – the path to the Udacity API JSON file.
  • index_path (string) – the path to a folder where you’d like to store the search engine index. The given folder doesn’t have to exist, but its parent folder does.
  • create (bool) – If True, recreates an index from scratch. If False, loads the existing index.

count_words()

Returns the number of words in the underlying Udacity dataset.
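
One way such a count could work is sketched below. This is an assumption, not the library's implementation: the record layout, the field names, and the whitespace-based definition of a word are all hypothetical.

```python
def count_words(records, fields):
    """Count whitespace-separated words across the given fields of each record."""
    total = 0
    for record in records:
        for field in fields:
            # A missing field contributes an empty string, that is, zero words.
            total += len(str(record.get(field, "")).split())
    return total


# Hypothetical course records, not the real Udacity schema.
courses = [
    {"title": "Intro to Statistics", "summary": "Learn to analyze data"},
    {"title": "Machine Learning"},
]
total_words = count_words(courses, ["title", "summary"])
```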

create_index()

Creates a new index to search the Udacity dataset. You only need to call this once; once the index is created, you can just load it again instead of creating it afresh all the time.

search.pack_byte()

S.pack(v1, v2, ...) -> string

Return a string containing values v1, v2, ... packed according to this Struct’s format. See struct.__doc__ for more on format strings.

search.unpack_byte()

S.unpack(str) -> (v1, v2, ...)

Return tuple containing values unpacked according to this Struct’s format. Requires len(str) == self.size. See struct.__doc__ for more on format strings.
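
The two docstrings above are those of the standard library's struct.Struct pack and unpack methods. As a sketch of how such helpers behave, assuming a one-byte format string ("!B", which this page does not confirm), note that on Python 3 pack returns bytes rather than a string:

```python
import struct

# Assumption: a single unsigned byte in network byte order ("!B").
byte_struct = struct.Struct("!B")
pack_byte = byte_struct.pack
unpack_byte = byte_struct.unpack

packed = pack_byte(65)          # one byte holding the value 65
(value,) = unpack_byte(packed)  # unpack always returns a tuple
```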

© Copyright 2017, Neel Mehta. Revision 8e1c19f8.