Scraping websites is a challenging but rewarding process. When you finally nail the exact syntax needed to scrape and then parse the HTML you're after, it can be a relief and a joy at the same time. My CLI gem, Easy_Vegan, scrapes on two different tiers. The first tier is somewhat static, in that it always scrapes the recipe index page from a lovely blog titled “Minimalist Baker”. The second tier of scraping is done more dynamically, on each of the specific recipe pages. The scraping of all these individual recipe pages is automated with the help of Ruby's enumerable methods.
A diagram summarizing my object-oriented gem is below. As you can see, my gem contains three classes and is loosely modeled after the student scraper lab. My gem does go one step further than the student scraper by dynamically generating the URLs of the pages to be scraped.
My methodology in completing this project was calculated, and I approached it in specific steps. In order to familiarize myself with the gem-building process, I watched Avi's “Daily Deal” video twice. In this video he works outside-in. I also watched his subsequent video, in which he programs inside-out. Between the two approaches, I decided that outside-in would be my preferred game plan. Before diving in and trying to create my own working environment from scratch, I watched yet another video on YouTube titled “Publishing your first Ruby Gem”. After creating a rubygems.org account, practicing my git commands, and creating the appropriate repo on GitHub, I finally began to code.
I auto-generated my working environment using Bundler, and modeled my file requirements and environment after the Daily Deal video. The goal of my easy_vegan gem was to let the user choose from dynamically generated recipe categories and browse the available recipes. From there, the gem scrapes a second tier of data, so that the user can easily view several attributes of a recipe.
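The choose-a-category flow could be sketched roughly as below. This is a hypothetical stand-in, not the gem's actual CLI class, and the hard-coded categories would really come from the tier-one scrape:

```ruby
# Hypothetical sketch of the CLI flow: the user picks a numbered category,
# then browses that category's recipes.
class CLI
  def initialize(categories)
    # categories is a hash of category name => list of recipe names
    @categories = categories
  end

  # Build the numbered menu the user chooses from.
  def menu
    @categories.keys.each_with_index.map { |name, i| "#{i + 1}. #{name}" }
  end

  # Translate the user's 1-based choice into that category's recipes.
  def recipes_for(choice)
    key = @categories.keys[choice - 1]
    @categories[key]
  end
end

cli = CLI.new("Breakfast" => ["Granola"], "Dessert" => ["Banana Bread"])
cli.menu           # => ["1. Breakfast", "2. Dessert"]
cli.recipes_for(2) # => ["Banana Bread"]
```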
The function of the gem is simple. Eventually I would like to add more features, such as searching by keyword or opening the webpage of a recipe via a simple open command. I faced a few unexpected challenges in building this gem. The first relates to defining methods as instance methods or as class methods. By default, I was defining them all as class methods, because whenever I attempted to invoke an instance method, I would hit several errors. It wasn't until my debugging session that I realized the fundamental flaw in creating only class methods. Most importantly, my attr_accessors were instance methods, and therefore could only be invoked on, and operate on, specific instances of my class. This was a huge blockade in my ability to scrape the website and assign the scraped info to attributes of a Recipe object.
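The pitfall boils down to this: attr_accessor defines instance methods, so they exist only on objects built with Recipe.new, never on the Recipe class itself. A minimal illustration (this Recipe is a stand-in, not the gem's actual code):

```ruby
class Recipe
  attr_accessor :title # defines #title and #title= as *instance* methods

  # A class method: defined with self., and invoked on Recipe itself.
  def self.describe
    "Recipe objects have a title attribute"
  end
end

recipe = Recipe.new
recipe.title = "Vegan Chili"   # the accessor works on an instance...
Recipe.respond_to?(:title)     # => false: ...but not on the class itself
Recipe.describe                # class methods, by contrast, need no instance
```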
My solution was to refactor several methods into instance methods. This changed not only the method definitions, but also the way in which the objects were operated on. Using enumerables like .each_with_index, I iterated over each instance of Recipe to perform the required operations. My gem still needs to be refactored and cleaned up a bit. I will update this post as soon as my gem is complete!
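The refactor might look like the following sketch: each scraped row becomes a Recipe instance whose attributes are assigned through the instance-method accessors, with .each_with_index supplying a 1-based id. The row hashes and attribute names here are illustrative assumptions, not the gem's actual data:

```ruby
# Hypothetical Recipe mirroring the refactor described above.
class Recipe
  attr_accessor :id, :name, :url # instance-level accessors

  # Build one Recipe instance per scraped index row.
  def self.create_from_index(rows)
    rows.each_with_index.map do |row, i|
      recipe = Recipe.new
      recipe.id   = i + 1 # the 1-based number a CLI menu would display
      recipe.name = row[:name]
      recipe.url  = row[:url]
      recipe
    end
  end
end

rows = [
  { name: "Vegan Chili",  url: "/recipes/vegan-chili" },
  { name: "Banana Bread", url: "/recipes/banana-bread" },
]
recipes = Recipe.create_from_index(rows)
```

Because the accessors are instance methods, each assignment operates on one specific Recipe object, which is exactly what the all-class-method version could not do.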