The README contains instructions for how to run the app locally on your own machine.

The application code is in the app.py file, which we’ve reproduced in full below:

The file imports the following libraries:

dotenv for reading environment variables from the .env file
flask for the web application setup
json for working with JSON
os also for getting environment variables
pandas for working with the dataset
pinecone for working with the Pinecone SDK
requests for making API requests to download our dataset
sentence_transformers for our embedding model

The initialize_pinecone method gets our API key from the .env file and uses it to initialize Pinecone.

The delete_existing_pinecone_index method searches our Pinecone instance for indexes with the same name as the one we’re using (“question-answering-chatbot”). If an existing index is found, we delete it.

The create_pinecone_index method creates a new index using the name we chose (“question-answering-chatbot”), the “cosine” proximity metric, and only one shard.

The download_data method downloads the dataset of Quora question-answer pairs if needed. If the file already exists in the tmp directory, then we just use that file.

The read_tsv_file method reads the TSV file using the pandas library and inserts each row into a data frame. We also remove any duplicate questions found in the dataset.

The create_and_apply_model method uses the sentence_transformers library to work with the Average Word Embeddings Model. We then create a vector embedding for each question by encoding it using our model. The vector embeddings are then inserted into the Pinecone index.

The home page serves the index.html template file along with the JS and CSS assets, and the API endpoint provides the search functionality for querying the Pinecone index.

The query_pinecone method takes the user’s input, converts it into a vector embedding, and then queries the Pinecone index to find similar questions. This method is called when the /api/search endpoint is hit, which occurs any time the user submits a new search query.
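
The "download only if needed" behavior described for download_data can be sketched roughly as follows. This is a minimal illustration and not the app's actual code: the names download_if_missing and fetch_with_urllib, the fetch parameter, and any URL passed in are hypothetical, and the standard library's urllib stands in for requests so the sketch has no third-party dependencies.

```python
import os
import urllib.request
from typing import Callable


def fetch_with_urllib(url: str) -> bytes:
    """Default fetcher. Assumption: the real app uses requests instead."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_if_missing(
    url: str,
    dest_path: str,
    fetch: Callable[[str], bytes] = fetch_with_urllib,
) -> bool:
    """Download url to dest_path unless the file already exists.

    Returns True if a download happened, False if the existing file was reused.
    """
    if os.path.exists(dest_path):
        return False  # the file is already in place, so we just use it

    # Create the target directory (for example tmp/) if it is not there yet.
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)

    data = fetch(url)
    with open(dest_path, "wb") as f:
        f.write(data)
    return True
```

Calling the function twice with the same destination only triggers one download. Passing the fetcher in as a parameter is a design choice made here so the caching logic can be exercised without network access.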
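
To make the read_tsv_file step more concrete, here is a small sketch of reading a TSV file with pandas and dropping duplicate questions. The function name load_unique_questions and the column name question1 are assumptions made for illustration; the dataset's real column names may differ.

```python
import pandas as pd


def load_unique_questions(path: str, question_column: str = "question1") -> pd.DataFrame:
    """Read a tab-separated file and keep the first occurrence of each question.

    Assumption: the question text lives in a column named `question_column`.
    """
    # sep="\t" tells pandas the file is tab-separated rather than comma-separated.
    df = pd.read_csv(path, sep="\t")

    # Drop rows whose question text repeats an earlier row, then renumber the rows.
    df = df.drop_duplicates(subset=[question_column]).reset_index(drop=True)
    return df
```

Deduplicating before creating embeddings means each distinct question is encoded and stored only once.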
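
The flow that query_pinecone follows (embed the input, then find the closest stored vectors under the cosine metric) can be illustrated without Pinecone at all. The sketch below is an in-memory stand-in, not the Pinecone SDK and not the sentence_transformers model: the toy word vectors, embed, and query_index are all hypothetical names invented for this example. It averages per-word vectors, which mirrors the idea behind an average word embeddings model in a very simplified way.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical three-dimensional word vectors; a real model has far more words and dimensions.
TOY_WORD_VECTORS: Dict[str, Vector] = {
    "python": [1.0, 0.0, 0.0],
    "learn": [0.8, 0.2, 0.0],
    "cook": [0.0, 1.0, 0.0],
    "pasta": [0.0, 0.9, 0.1],
    "travel": [0.0, 0.0, 1.0],
}


def embed(text: str) -> Vector:
    """Average the vectors of the known words in the text."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    vectors = [TOY_WORD_VECTORS[w] for w in words if w in TOY_WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors, or 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_index(index: Dict[str, Vector], query_vector: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, highest cosine similarity first."""
    scored = [(item_id, cosine_similarity(query_vector, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With a small index built from a few questions, embedding a new search phrase and passing it to query_index returns the stored questions ranked by similarity, which is the same shape of result the app gets back from its Pinecone query.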
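
The two routes described above could look something like the following in Flask. This is a sketch under several assumptions rather than the app's real code: the HTTP method (POST), the searchTerm field name, the response shape, and the placeholder body of query_pinecone are all made up for illustration.

```python
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)


def query_pinecone(search_term: str) -> list:
    """Placeholder for the real method, which embeds the input and queries Pinecone."""
    return [{"question": f"Placeholder match for: {search_term}", "score": 1.0}]


@app.route("/")
def index():
    # Serves the page; Flask looks for index.html in the templates/ directory.
    return render_template("index.html")


@app.route("/api/search", methods=["POST"])
def search():
    # silent=True returns None instead of raising when the body is not valid JSON.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}

    search_term = str(payload.get("searchTerm", "")).strip()
    if not search_term:
        return jsonify({"error": "searchTerm is required"}), 400

    return jsonify({"results": query_pinecone(search_term)})
```

Each time the front end submits a search, it would send a request to /api/search and render whatever results come back.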