Scrapy - Rotten Tomatoes

Scheme 1: Rotten Tomatoes API

The Rotten Tomatoes API provides access to Rotten Tomatoes’ ratings and reviews, allowing approved companies and individuals to enrich their applications and widgets with Rotten Tomatoes data.  

Accessible data:

  • Critic and Audience Scores. Tomatometer and Audience scores for movies.
  • Critic Reviews. A sampling of critic reviews for each movie.

Inaccessible data:

  • Detailed movie metadata. 
  • Posters and images.  
  • TV scores. 

However, to use this scheme, you have to apply for a Rotten Tomatoes API key. The business proposal form can be accessed at the following link: https://www.rottentomatoes.com/help_desk/proposalform
After you receive the key, you can follow the instructions posted by Mike to connect and download the required data.

Scheme 2: Python Scrapy

This scheme uses Python scripts with the Beautiful Soup module to parse the DOM tree and gather information programmatically from the HTML, rather than using an API to access the data.

For static pages

[Figure: static page]

As the page structure shown above indicates, the information we need is stored in a table. The data does not change over time.

We can simply use requests and BeautifulSoup to collect the data.

 

[Figure: code for static pages]
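
The static-page approach can be sketched roughly as follows. This is a minimal illustration, not the original code: the function names, the User-Agent header, and the sample HTML are all assumptions, and the real page's table markup may need a more specific selector.

```python
import requests
from bs4 import BeautifulSoup


def parse_table_rows(html):
    """Return every table row in the HTML as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells alike, with surrounding whitespace removed
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def scrape_static_page(url):
    """Download a static page and extract its table rows."""
    # Assumption: a browser-like User-Agent; some sites reject the default one
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return parse_table_rows(response.text)


# Offline demonstration with made-up HTML (not real Rotten Tomatoes markup)
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Title</th><th>Score</th></tr>
  <tr><td>1</td><td>Example Movie</td><td>99%</td></tr>
</table>
"""
sample_rows = parse_table_rows(SAMPLE_HTML)
```

Keeping the parsing step separate from the download makes it easy to check the parser against saved HTML before sending any requests to the site.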

For dynamic pages

These pages present dynamic data that is loaded from the server separately from the HTML. As a result, the data might change every time you refresh the page.

[Figure: dynamic page]

We need to use requests and json rather than BeautifulSoup to collect the data. To find the address the page loads its data from, open the browser's developer tools, go to “Network” > “XHR” > “Headers”, and find the “Request URL”.

[Figure: Request URL]

[Figure: code for dynamic pages]
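
The dynamic-page approach can be sketched in a similar way. Again this is only an illustration under stated assumptions: the function names are hypothetical, and the "results", "title", and "score" keys in the sample response are invented, so the real keys must be read from the response shown for the Request URL in the browser.

```python
import json

import requests


def fetch_json(request_url, params=None):
    """Call the Request URL found in the Network tab and decode its JSON body."""
    # Assumption: a browser-like User-Agent; some sites reject the default one
    response = requests.get(
        request_url,
        params=params,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    response.raise_for_status()
    return json.loads(response.text)


def extract_fields(payload, list_key, fields):
    """Keep only the chosen fields from each item in payload[list_key]."""
    return [
        {name: item.get(name) for name in fields}
        for item in payload.get(list_key, [])
    ]


# Offline demonstration with a made-up response body (hypothetical keys)
SAMPLE_RESPONSE = '{"results": [{"title": "Example Movie", "score": 99, "id": 1}]}'
sample_items = extract_fields(json.loads(SAMPLE_RESPONSE), "results", ["title", "score"])
```

Because the server already returns structured data, there is no HTML to parse here; the work is mostly finding the right URL and picking the needed keys out of the decoded dictionary.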


© 2020 by Cynthia Luo 棋馨
