import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
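# Load the dataset; it must contain "title" and "label" columns
data = pd.read_csv("news.csv")
x = np.array(data["title"])
y = np.array(data["label"])
# Convert each headline into a bag-of-words count vector
cv = CountVectorizer()
x = cv.fit_transform(x)
# Hold out 20% of the headlines for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Train a multinomial Naive Bayes classifier on the word counts
model = MultinomialNB()
model.fit(xtrain, ytrain)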
news_headline = "Atlantis discovered under the Atlantic Ocean!"
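# Vectorize the headline with the already-fitted vocabulary
# (a separate name avoids overwriting the `data` DataFrame)
headline_features = cv.transform([news_headline])
print(news_headline)
print(model.predict(headline_features))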
# Another headline to try: assign it, then rerun the transform and predict steps above
news_headline = "Kathy Hochul: Who is New York's first female governor?"
If you add
print(model.score(xtest, ytest))
to your script, you'll see that the accuracy score is about 80%. However, when I tested 40 news headlines from last week, I got only 50% to 60% accuracy. That's because news headlines, their vocabulary, and their topics change all the time, so a model trained on older headlines meets many words it has never seen.
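One rough way to see this vocabulary drift is to measure how many words in recent headlines never appeared in the training headlines. The sketch below is only an illustration, not part of the original script: `tokenize`, `oov_rate`, and the toy headlines are hypothetical names and data, and the regular expression is assumed to mirror `CountVectorizer`'s default tokenization (lowercased tokens of two or more word characters).

```python
import re

# Assumed to match CountVectorizer's defaults: lowercase input,
# tokens made of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Split a headline into lowercase word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def oov_rate(headlines, vocabulary):
    """Return the fraction of tokens in `headlines` missing from `vocabulary`."""
    tokens = [tok for headline in headlines for tok in tokenize(headline)]
    if not tokens:
        return 0.0
    unseen = sum(1 for tok in tokens if tok not in vocabulary)
    return unseen / len(tokens)


# Hypothetical toy data standing in for the training set and last week's news.
training_headlines = [
    "Governor signs new budget bill",
    "Scientists discover new planet",
]
recent_headlines = [
    "Governor signs climate bill",
    "Atlantis discovered under ocean",
]

vocabulary = {tok for headline in training_headlines for tok in tokenize(headline)}
print(oov_rate(recent_headlines, vocabulary))  # 5 of 8 tokens are unseen -> 0.625
```

With the real model, you could pass `cv.vocabulary_` (the fitted vectorizer's term-to-index dictionary) as `vocabulary` and a list of recent headlines as `headlines`. A high rate suggests the classifier is ignoring much of each new headline, which is consistent with the drop in accuracy described above.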