In order to harvest or scrape data from the World Wide Web, you have to learn how to download files from the web. The sample code below, a simplified method, illustrates a program that downloads the default page of bookmarktutoring.com.
require 'net/http'
url = 'http://www.bookmarktutoring.com/default.html'
uri = URI.parse(url)
response = Net::HTTP.get_response(uri)
puts response.body
In Ruby, downloading a file from the Internet can be done with net/http, a library that is included with Ruby. It lets your program send HTTP requests to remote servers and read the responses they return.
The code downloads the main page of BookMarkTutoring.com and displays its contents on the screen. Note that the main page is requested by naming the site's default (index) page, default.html, in the URL.
On the first line of the code, net/http is loaded into the program with require. Next, the address of the page to read is stored in a variable called url. The address is a string, so it must be enclosed in quote marks. Single or double quotes both work here.
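The parse method of URI is then invoked. This command splits the URL into its parts, such as the scheme (http), the host name of the server, and the path of the file, and returns them as a URI object. Net::HTTP uses these parts to find the server where the web page resides and to ask it for the file.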
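To see what parsing produces, the short sketch below prints the individual parts of the same URL. It needs no network connection. The accessor names (scheme, host, port, path) belong to Ruby's URI library. Printing them is only an illustration and is not part of the original program.

```ruby
require 'uri'

url = 'http://www.bookmarktutoring.com/default.html'
uri = URI.parse(url)

puts uri.scheme  # protocol part of the URL: http
puts uri.host    # server name: www.bookmarktutoring.com
puts uri.port    # default port for http: 80
puts uri.path    # file requested on the server: /default.html
```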
The method "get_response" is then called with the parsed URI. It sends the request to the server and returns the server's reply, which is stored in the variable called "response."
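A server does not always return the page you asked for, so it can help to look at the status of the response before using it. The sketch below is one possible way to do that. The helper names describe_response and check_page are made up for this example and are not part of net/http. The calls is_a?(Net::HTTPSuccess), code, and message are the library's own.

```ruby
require 'net/http'

# Hypothetical helper (not from the original lesson): summarizes a response
# object such as the one returned by Net::HTTP.get_response.
def describe_response(response)
  if response.is_a?(Net::HTTPSuccess)
    "OK (#{response.code})"
  else
    "Failed (#{response.code} #{response.message})"
  end
end

# Hypothetical helper: downloads the page at the given address and
# reports the status of the server's reply.
def check_page(url)
  response = Net::HTTP.get_response(URI.parse(url))
  describe_response(response)
end
```

Calling puts check_page('http://www.bookmarktutoring.com/default.html') would then print a one-line summary of the server's reply. The exact text depends on how the server answers at the time.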
The body method is then invoked on the response object. The body holds the content of the returned page, here its HTML source, and puts prints it to the screen.
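Since the goal is to download files, you may want to keep the body instead of only printing it. The sketch below shows one simple way, assuming the body is passed in as a string. The helper name save_page and the file name page.html are hypothetical. File.write is part of Ruby itself.

```ruby
# Hypothetical helper (not from the original lesson): writes the page
# content to a local file and returns the number of bytes written.
def save_page(body, filename)
  File.write(filename, body)
end
```

With the program above, save_page(response.body, 'page.html') would store the downloaded page in the current folder.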