The README contains instructions for how to run the app locally on your own machine.

app.py file, which we’ve reproduced in full below:

app.py file so that we understand it.

- dotenv for reading environment variables from the .env file
- flask for the web application setup
- json for working with JSON
- os also for getting environment variables
- pandas for working with the dataset
- pinecone for working with the Pinecone SDK
- re for working with regular expressions (RegEx)
- requests for making API requests to download our dataset
- statistics for some handy stats methods
- sentence_transformers for our embedding model
- swifter for working with the pandas dataframe

The initialize_pinecone method gets our API key from the .env file and uses it to initialize Pinecone.

The delete_existing_pinecone_index method searches our Pinecone instance for indexes with the same name as the one we’re using (“article-recommendation-service”). If an existing index is found, we delete it.

The create_pinecone_index method creates a new index using the name we chose (“article-recommendation-service”), the “cosine” proximity metric, and only one shard.

The create_model method uses the sentence_transformers library to work with the Average Word Embeddings Model. We’ll encode our vector embeddings using this model later.

The process_file method reads the CSV file and then calls the prepare_data and upload_items methods on it. Those two methods are described next.

The prepare_data method adjusts the dataset by renaming the first “id” column and dropping the “date” column. It then grabs the first four lines of each article and combines them with the article title to create a new field that serves as the data to encode. We could create vector embeddings based on the entire body of the article, but using only four lines speeds up the encoding process.

The upload_items method creates a vector embedding for each article by encoding it using our model. The vector embeddings are then inserted into the Pinecone index.

The map_titles and map_publications methods create dictionaries of the titles and publication names to make it easier to find articles by their IDs later.

index.html template file along with the JS and CSS assets, and the API endpoint provides the search functionality for querying the Pinecone index.

The query_pinecone method takes the user’s reading history input, converts it into a vector embedding, and then queries the Pinecone index to find similar articles. This method is called when the /api/search endpoint is hit, which occurs any time the user submits a new search query.
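
To make the query flow described above more concrete, here is a minimal, self-contained sketch of the same idea: embed the reading history, then rank stored articles by cosine similarity. This is not the app's actual query_pinecone code, and it does not call the Pinecone SDK or sentence_transformers. The embed function is a toy word-count embedding standing in for the real model, INDEX is an in-memory dictionary standing in for the Pinecone index, and the names ARTICLES, embed, cosine, and query_index are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in data: article ID -> text to encode
# (in the app this would be the title plus the first lines of the article).
ARTICLES = {
    "a1": "stock markets rally as tech shares climb",
    "a2": "local team wins championship after overtime thriller",
    "a3": "investors watch tech earnings and market trends",
}


def embed(text):
    """Toy embedding (assumption): sparse word counts instead of a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# In-memory stand-in for the vector index: article ID -> embedding.
INDEX = {article_id: embed(text) for article_id, text in ARTICLES.items()}


def query_index(reading_history, top_k=2):
    """Embed the reading history and return the top_k (id, score) pairs."""
    query_vector = embed(" ".join(reading_history))
    scored = [(article_id, cosine(query_vector, vector))
              for article_id, vector in INDEX.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = query_index(["tech shares and market news"])
for article_id, score in results:
    print(article_id, round(score, 3))
```

In the real app, the ranking step is delegated to the Pinecone index, which was created with the “cosine” metric, so the application code only needs to produce the query embedding and interpret the returned matches.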