jatbar was a legendary indie restaurant review site for the Bay Area, operated by Jason and Terry from 2004 to 2009.
Back then, Yelp was not yet in full bloom, and as a high school student I had a hard time finding good restaurants in the Bay Area. jatbar was my go-to source for good eats. It was through jatbar that I was introduced to the carnitas at La Bamba, the falafel at Falafel’s Drive In, and the pumpkin seed pesto chicken at Yung Le’s Fusion.
Unfortunately, the website was shut down circa 2009.
I wondered whether Jason and Terry had been secretly writing reviews on Yelp since then, and decided to check using author identification (also known as authorship attribution). We’ll use basic machine learning techniques to train models that predict whether a review was written by Jason or Terry.
The entire code for this is available at https://github.com/mrorii/findjatbar
Here’s the entire workflow of the task:
1. Retrieve Jason and Terry’s reviews for the 1,035 restaurants that are on jatbar. These reviews will be considered as positive examples for training.
2. Retrieve Yelp reviews for the same 1,035 restaurants. We can probably assume that Jason and Terry wouldn’t have written reviews on Yelp for the restaurants they have covered on jatbar, so we’ll consider these reviews as negative examples for training.
3. Using reviews from steps 1 and 2, learn a model using some binary classifier.
4. Retrieve reviews from Yelp for restaurants that are not included in the 1,035 that are on jatbar, and test to see if the model predicts any of the reviews as written by Jason or Terry.
Here’s a schematic showing the workflow:
For the model, I use a simple logistic regression classifier from the excellent scikit-learn library. I chose L1 regularization, as it drives most of the feature weights to zero, making the model more compact and more easily interpretable.[1]
For features, I use n-grams of the reviews, considering unigrams, bigrams, and trigrams. I only include feature instances that occurred in at least 3 reviews in the training set.
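The model and features described above might be set up roughly as follows. This is a minimal sketch, not the code from the repository: the use of `CountVectorizer` with its default tokenization and raw counts, the `liblinear` solver, `C=1.0`, the helper name `build_author_classifier`, and the toy reviews are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_author_classifier(min_df=3, C=1.0):
    """Word 1- to 3-grams seen in at least `min_df` reviews, fed to an
    L1-regularized logistic regression (hypothetical helper)."""
    vectorizer = CountVectorizer(ngram_range=(1, 3), min_df=min_df)
    classifier = LogisticRegression(
        penalty="l1", solver="liblinear", C=C, random_state=0
    )
    return make_pipeline(vectorizer, classifier)


# Made-up toy data: 1 = written by the target author, 0 = some other Yelp review.
reviews = [
    "the burrito was a bargain and i dont recall a better one",
    "i dont think the price is steep for a burrito this good",
    "a bargain burrito and i dont mind the wait",
    "delicious food and great service five stars",
    "great service and delicious tacos five stars",
    "delicious and great five stars from me",
]
labels = [1, 1, 1, 0, 0, 0]

model = build_author_classifier().fit(reviews, labels)

# Classify two unseen (also made-up) reviews.
predictions = model.predict([
    "i dont think the burrito was worth it",
    "delicious and great five stars",
])
```

With real data, `C` would normally be tuned on held-out reviews, since it controls how many weights the L1 penalty drives to zero.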
Let’s look at the results. We consider 2 tasks:
- Classifying whether a review is written by Jason
- Classifying whether a review is written by Terry
Classifying Jason’s reviews
We get excellent classification results for Jason, with an F1 score of 0.966 on the test set. This is surprising, considering the simplicity of both the method and features. The confusion matrix for the test set looks like the following:
| | Predicted Neg | Predicted Pos |
|---|---|---|
| Actual Neg | 3977 | 3 |
| Actual Pos | 4 | 87 |
Out of the 3980 Yelp reviews, only 3 reviews are incorrectly classified as being written by Jason, while 87 of the 91 reviews that were written by Jason were correctly classified as being positive.
Just for reference, here is a plot of the precision-recall curve:
That’s a pretty darn good-looking PR curve.
Let’s now look at the weights of the resulting model. The following shows lists of the 30 features that had the largest and smallest weights:
| Rank | Largest weights | Weight | Smallest weights | Weight |
|---|---|---|---|---|
| 1 | word review | 6.657 | however | -2.784 |
| 2 | terry | 6.516 | star | -2.572 |
| 3 | score | 4.526 | delicious | -2.474 |
| 4 | overdue | 4.500 | actually | -2.209 |
| 5 | spicey | 4.236 | great | -2.080 |
| 6 | ingrediants | 3.695 | the | -1.888 |
| 7 | descent | 3.470 | stars | -1.850 |
| 8 | serves | 3.377 | two | -1.846 |
| 9 | damm | 3.227 | take out | -1.817 |
| 10 | rated | 3.045 | a few | -1.749 |
| 11 | ive had | 2.983 | soon | -1.706 |
| 12 | flakey | 2.979 | really | -1.679 |
| 13 | picture | 2.974 | then | -1.672 |
| 14 | shrimps | 2.970 | and | -1.533 |
| 15 | steep for | 2.935 | as well | -1.529 |
| 16 | beef is | 2.896 | pho | -1.491 |
| 17 | its my | 2.893 | quality | -1.477 |
| 18 | roll and | 2.889 | service | -1.475 |
| 19 | dont | 2.848 | yum | -1.417 |
| 20 | wasnt | 2.844 | here | -1.416 |
| 21 | isnt | 2.728 | party | -1.400 |
| 22 | bargain | 2.703 | it is | -1.398 |
| 23 | recall | 2.674 | though | -1.374 |
| 24 | weak | 2.640 | first | -1.325 |
| 25 | jatbar | 2.590 | is not | -1.316 |
| 26 | nine | 2.552 | spot | -1.303 |
| 27 | and prices | 2.549 | at all | -1.297 |
| 28 | rate | 2.475 | decent | -1.277 |
| 29 | firm | 2.472 | few | -1.270 |
| 30 | visit | 2.439 | perhaps | -1.267 |
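A ranking like the one above can be produced by sorting the features by their learned weight. The sketch below uses a hypothetical helper, `top_features`, and only six of the weights listed for Jason. With scikit-learn, the names would typically come from the vectorizer's `get_feature_names_out()` and the weights from the classifier's `coef_[0]`, but that wiring is an assumption about how the lists were generated.

```python
def top_features(feature_names, weights, k=30):
    """Return the k largest and k smallest (most negative) weighted features."""
    ranked = sorted(
        zip(feature_names, weights), key=lambda pair: pair[1], reverse=True
    )
    largest = ranked[:k]
    smallest = ranked[::-1][:k]  # most negative weight first
    return largest, smallest


# A handful of weights copied from the lists for Jason's model.
jason_weights = {
    "word review": 6.657,
    "terry": 6.516,
    "score": 4.526,
    "however": -2.784,
    "star": -2.572,
    "delicious": -2.474,
}

largest, smallest = top_features(
    list(jason_weights), list(jason_weights.values()), k=2
)
for name, weight in largest + smallest:
    print(f"{name:<12} {weight:+.3f}")
```

Because the L1 penalty sets most weights to exactly zero, the features that survive into such a ranking are a small fraction of the full n-gram vocabulary.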
Other than obvious keywords such as “terry” and “jatbar”, we see that Jason cares about prices (“steep for”, “bargain”, and “and prices”). As expected, terms such as “beef is” and “shrimps” have large positive weights.[2] We also see that, compared to the average Yelp user, Jason makes less use of mundane terms such as “delicious” and “yum” to describe food.
Classifying Terry’s reviews
The F1 score for Terry’s reviews was 0.917. The confusion matrix for the test set looks like the following:
| | Predicted Neg | Predicted Pos |
|---|---|---|
| Actual Neg | 3999 | 3 |
| Actual Pos | 8 | 61 |
We see that it is a bit more difficult to classify Terry’s reviews, but the numbers are still good.
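As a sanity check, precision, recall, and F1 can be recomputed directly from the two confusion matrices. The sketch below assumes that the positive class is "written by the author" and uses a hypothetical helper, `precision_recall_f1`. For Terry it reproduces the reported 0.917. For Jason the printed matrix gives about 0.961, slightly below the 0.966 quoted earlier; the post does not say where the difference comes from.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 for the positive class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts read off the confusion matrices (rows = actual, columns = predicted).
jason = precision_recall_f1(true_pos=87, false_pos=3, false_neg=4)
terry = precision_recall_f1(true_pos=61, false_pos=3, false_neg=8)

for name, (precision, recall, f1) in [("Jason", jason), ("Terry", terry)]:
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Both classifiers are slightly more precise than they are sensitive: each produces only 3 false positives, while missing 4 of Jason's and 8 of Terry's reviews.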
Here is a plot of the precision-recall curve for Terry:
The following shows lists of the 30 features that had the largest and smallest weights for Terry:
| Rank | Largest weights | Weight | Smallest weights | Weight |
|---|---|---|---|---|
| 1 | grays | 6.457 | it's | -3.908 |
| 2 | is coming | 6.186 | love | -3.026 |
| 3 | jason | 5.223 | ive | -2.391 |
| 4 | i will make | 4.576 | stars | -2.372 |
| 5 | nyc | 4.047 | friend | -2.229 |
| 6 | the cheese | 3.542 | loved | -2.051 |
| 7 | fresh tasting | 3.379 | dont | -2.032 |
| 8 | jatbar | 3.371 | although | -1.882 |
| 9 | appeal to | 3.309 | yelp | -1.831 |
| 10 | offerings | 3.256 | cooked | -1.789 |
| 11 | california rolls | 2.866 | the | -1.728 |
| 12 | the cheese sandwich | 2.853 | didnt | -1.658 |
| 13 | there so | 2.826 | chicken | -1.647 |
| 14 | to sample | 2.782 | that's | -1.589 |
| 15 | gave a | 2.775 | night | -1.559 |
| 16 | prices were | 2.742 | star | -1.517 |
| 17 | vegetables | 2.728 | like it | -1.500 |
| 18 | its | 2.687 | i'm | -1.490 |
| 19 | wishlist | 2.580 | staff | -1.487 |
| 20 | late but | 2.525 | reasonable | -1.471 |
| 21 | arrived quickly | 2.465 | okay | -1.451 |
| 22 | among the | 2.457 | parking | -1.408 |
| 23 | vegetarian i | 2.417 | heavy | -1.393 |
| 24 | sampled | 2.371 | use | -1.377 |
| 25 | the vegetarian | 2.363 | come | -1.359 |
| 26 | the grill | 2.299 | and | -1.342 |
| 27 | refreshing | 2.275 | options | -1.302 |
| 28 | prepared to | 2.265 | decor | -1.292 |
| 29 | me away | 2.252 | wasnt | -1.283 |
| 30 | blow me | 2.198 | great | -1.250 |
We see that “vegetables” and “vegetarian” appear often. It turns out that Terry is a semi-vegetarian: his self-introduction on jatbar’s About page mentions “Representing Northern California and vegetarians”. Similarly, “cheese” appears often, which also matches his self-introduction (“Main staple of diet is the veggie burrito and sliced cheese pizza”). The word “grays” refers to Gray’s Papaya, a hot dog restaurant in West Manhattan famous for its inexpensive, high-quality hot dogs.
Across the two models, words like “service”, “decor”, “quality”, and “staff” have large negative weights. This suggests that Jason and Terry, compared to the average Yelp user, don’t care about the “fluff” and (probably) care more about the food itself.
So, were Jason and Terry on Yelp?
Finally, let’s see if Jason and Terry are actually on Yelp. We take our best-performing model on the test set and apply it to Yelp reviews for restaurants that are not covered by jatbar. If a review is classified as positive, we suspect that it may have been written by Jason or Terry.
Ideally, we would want to run the classifier on reviews for all of the restaurants in the Bay Area, but that is not realistic considering the number of reviews on Yelp. Thus, I decided to limit the search space by only looking at reviews of restaurants located near Santa Clara County that are related to “Burrito”, considering how much Jason loves burritos. (The full list of cities considered is available here)
Out of the 10,807 reviews that satisfied these conditions, our model predicted that 6 of them were written by Jason. Let’s look at each of them below:
taqueria nite in the east side of san jose should sound good/authentic/tasty….veenie had that huge/tasty nachos….i had three tacos with a side of rice/sour creme….i requested hard shell but they served it to me on the soft side of thangs….even tho we dined in~the only cool part was the drive thru~ive been to alot of moms/pops taquerias in my life time but has never seen one like this with a drive thru…i jus dont see myself stumbling into this place on a drunken stooper @ 2am in the morn
Jason would never write “nite”. Next review.
didnt like the extremly salty chips and u dont even get that many plus they tasted kinda soggyish and stale i didnt like it the burrito wasnt that big it was about average ive had bigger burritos than that plus was close to 9$ for the burrito and chips what a rip off.. cashier wasnt very friendly either it was more like ok heres your burrito here u go by (since i ordered it online) dang burrito didnt even fill me up i coulda gotten a bigger burrito from one of those tacos trucks for alot cheaper and it woulda filled me up
Jason would never write “u”. Next review.
UGH! or should I say YUCK. Had the cheese Enchaladas and the cheese wasnt even melted. Still in the grated form. The sauce was weird tasting, and had unsual green peppers inside, i dont recall seeing that on the menu when I ordered. I didnt bother to finish it. The refried beans looked brick red, like it could of been used for building an adobe house. The service was good, just wish the food followed suit. I have been here many, many times before and seems like its starting or already has declined. It wasnt just me, the wife had the flautas, she didnt care for ithem either. Probably wont be going back for quite awhile until I hear form someone else that they are good again.
I can kind of imagine Jason writing this.
Finally an authentic mexican restaurant in the bay area!!! I have search high and low and have finally found one. Its sad to say but I need to avoid most of the yelp reviews of what people consider “authentic”. Chipotle and Chevys are not authentic and for the most part the yelp recommendations usually taste along those lines. But this place is the real deal. I ordered a chile relleno and enchilada combo. It didnt taste like the bomb but is a huge step above what the other mexican places around the bay taste like. I also like the cabbage salad (?) that came with it. It sorta reminded me of the cabbage they serve you when you order papusa. And I didnt even have to ask for some additional tortillas to eat my meal with, they just brought it out. My friend order the pasole and it came in a huge bowl. I almost wished I ate meat so I coulda tried it because it looked so good. His came with a bunch of tostadas and an extra plate full of frest lemons, cabbage, and onions so he could add it to his pasole. Im sure the decor and look of the place will automatically make people pass this place up. But if you want some real mexican food you should give this place a try.
Jason loves eating meat. This guy is definitely not him.
Ordered a Carne Asada Super Burrito. I thought Carne Asada was grilled steak? This tasted like burnt ground beef or Crunchy burnt meat of some kind. By far THE WORST burrito Ive had in years! Oh, the Horchata had no taste…Its supposed to be sweet. This tasted just like how it looked. One last thing, chips. Most places give you some chips with your burrito. Not this place. Carne Asada Super burrito + Horchata-Chips=$10+ You do the math…Sorry Taqueria Azteca. No Mas!
I can see Jason writing a sarcastic review like this one.
Let me start off by saying two of my friends have been eating there almost every day! and they love it so much they got it tatted on their arms. if you do go look at the picture by the counter where you order. I go there whenever I am hungry and dont have to work. so maybe 3 times a week. Its easy to get to know Henry he is one of the funniest and friendliest guys you will ever meet. If you have never been GO! you will soon be apart of the jalisco family and if you havent met henry you havent had the true jalisco experience.
This many words, and not a single thing about the food. This is definitely not Jason.
Jason could plausibly have written 2 of these 6 reviews (the third and the fifth). However, a quick Google search on those 2 reviews shows pretty easily that the Yelp users who wrote them do not match Jason’s profile. The results were similar for Terry.
It looks like Jason and Terry have truly fallen off the radar. Let’s hope that the real jatbar comes back soon.
Image courtesy of Eugene Kim.
1. However, we also need to keep in mind that L1 regularization does NOT necessarily identify every feature that is truly indicative of the label. For example, if 2 features are highly correlated, one of the corresponding weights may be driven to zero, even if both features are predictive.
2. Jason loves eating meat (and seafood). For example, jatbar once held a carnitas shootout to decide which taqueria in the Bay Area serves the best carnitas.