Oxaric’s Blog

A compendium of amazing things…

Ruby Image Grabber

Posted by oxaric on December 7, 2008




Yesterday I coded an image grabber in Ruby. It takes a URL, fetches the HTML page, and downloads every image referenced in it. It has one extra option: a minimum file size, so pictures smaller than that are not downloaded. It is not a web crawler and will not follow links to grab other images, but I plan to build an image web crawler on top of this code and hope to have it up soon.
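The core step, finding every image the HTML page references, can be sketched roughly like this. It is not the code in grabimages.rb. It is a minimal illustration that assumes images appear as src attributes of img tags and uses a regular expression instead of a real HTML parser. The name image_sources is hypothetical.

```ruby
require 'uri'

# Hypothetical sketch, not the original grabimages.rb.
# Matches the src attribute of <img> tags, quoted or unquoted, case-insensitively.
IMG_SRC = /<img\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)/i

# Returns absolute URLs for every <img src> found in the HTML, duplicates removed.
def image_sources(html, page_url)
  html.scan(IMG_SRC).flatten.filter_map do |src|
    begin
      URI.join(page_url, src).to_s # resolve relative paths against the page URL
    rescue URI::Error
      nil # skip references that are not valid URIs
    end
  end.uniq
end

html = '<p><img src="img/a.png"><IMG alt="x" SRC=\'/b.jpg\'><img src="img/a.png"></p>'
puts image_sources(html, 'http://example.com/dir/page.html')
```

Relative references are resolved against the page's own URL, so "img/a.png" on http://example.com/dir/page.html becomes http://example.com/dir/img/a.png.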


Note that it cannot download images served through a PHP script. On some forums, for example, images are displayed by a PHP script and referenced with something like “show.php?image_file_name.jpg”. Because this program does not follow links to other pages, it cannot download these PHP-referenced images.


However, if a gallery has thumbnails that link directly to the larger images, this program will grab both the thumbnails and the larger images.
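For the gallery case, a grabber has to look at links as well as img tags, because the full-size picture usually sits in the href of the link that wraps the thumbnail. The sketch below assumes galleries are marked up that way and keeps only links whose path ends in one of the image extensions listed in this post. linked_images and GALLERY_EXTS are hypothetical names.

```ruby
require 'uri'

# Image extensions from the list in this post (hypothetical constant name).
GALLERY_EXTS = %w[.jpg .jpeg .png .bmp .gif .tif .tiff].freeze
# Matches the href attribute of <a> tags, quoted or unquoted.
HREF = /<a\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)/i

# Returns absolute URLs of links whose path ends in an image extension,
# such as the full-size picture behind a gallery thumbnail.
def linked_images(html, page_url)
  html.scan(HREF).flatten.filter_map do |href|
    begin
      uri = URI.join(page_url, href)
    rescue URI::Error
      next nil # skip malformed links
    end
    ext = File.extname(uri.path.to_s).downcase # ignores ?query and #fragment
    uri.to_s if GALLERY_EXTS.include?(ext)
  end.uniq
end

gallery = '<a href="big/photo1.JPG"><img src="thumbs/photo1.jpg"></a>' \
          '<a href="about.html">About</a>'
puts linked_images(gallery, 'http://example.com/gallery/')
```

Running this alongside an img-tag scan is what yields both pictures: the thumbnail from the img tag and the larger image from the link.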


A neat feature is that the program handles special HTML character codes (entities), so it should have no problem with encoded URLs or image names in foreign languages.
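The post does not say exactly how grabimages.rb decodes these references. As an illustration, the sketch below assumes two kinds of encoding: HTML character entities inside attribute values (named like &amp;, decimal like &#26481;, hex like &#x4EAC;) and %XX escapes in URL paths, a common way non-English file names appear in URLs. decode_entities and percent_decode are hypothetical names, and the entity table covers only a few common named entities.

```ruby
# Hypothetical sketch; only a handful of named entities are handled.
NAMED_ENTITIES = {
  'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'"
}.freeze

# Decodes named (&amp;), decimal (&#26481;) and hex (&#x4EAC;) entities.
def decode_entities(text)
  text.gsub(/&(#[xX]\h+|#\d+|[a-zA-Z]+);/) do
    ref = Regexp.last_match(1)
    if ref.start_with?('#x', '#X')
      [ref[2..].to_i(16)].pack('U') # hex code point to UTF-8 character
    elsif ref.start_with?('#')
      [ref[1..].to_i].pack('U')     # decimal code point to UTF-8 character
    else
      NAMED_ENTITIES.fetch(ref, "&#{ref};") # unknown names are left as-is
    end
  end
end

# Decodes %XX escapes byte by byte, then treats the result as UTF-8.
def percent_decode(text)
  text.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }.force_encoding('UTF-8')
end

puts decode_entities('photos/a&amp;b.jpg')    # photos/a&b.jpg
puts percent_decode('%E6%9D%B1%E4%BA%AC.png') # 東京.png
```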


Normally I’d post the source code here, but the program contains special ASCII characters and the code formatting on this blog doesn’t display them correctly. I think it’s worth your time to download the program and give it a shot. ;)


Click to directly download grabimages.rb


More Information:


It grabs these image types:
.jpg
.jpeg
.png
.bmp
.gif
.tif
.tiff
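One plausible way to apply this list is to check the extension of the URL’s path, case-insensitively and ignoring any query string. That is an assumption, not something the post spells out, and image_url? is a hypothetical name. Under this check a forum reference such as show.php?image_file_name.jpg is rejected, since its path ends in .php, which is consistent with the limitation described earlier.

```ruby
require 'uri'

# The image types listed in this post.
IMAGE_TYPES = %w[.jpg .jpeg .png .bmp .gif .tif .tiff].freeze

# True when the URL's path (not its ?query or #fragment) ends in a listed
# extension; the comparison is case-insensitive (an assumption).
def image_url?(url)
  path = URI.parse(url).path.to_s
  IMAGE_TYPES.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false
end

%w[
  http://example.com/pics/photo.JPEG?size=large
  http://example.com/scan.tiff
  http://example.com/show.php?image_file_name.jpg
].each { |u| puts "#{u} -> #{image_url?(u)}" }
```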



Usage: ruby grabimages.rb [URL] [Download Path] [Option: Minimum Picture Size, in kB]
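Reading these arguments could look something like the sketch below. It is hypothetical: the post does not say whether 1 kB means 1,000 or 1,024 bytes, so this version assumes 1,024, and it treats a missing third argument as no minimum. Options, parse_args and big_enough? are illustrative names.

```ruby
# Hypothetical argument handling; 1 kB is assumed to be 1024 bytes.
USAGE = 'Usage: ruby grabimages.rb [URL] [Download Path] [Option: Minimum Picture Size, in kB]'
Options = Struct.new(:url, :dir, :min_bytes)

def parse_args(argv)
  raise ArgumentError, USAGE unless argv.size.between?(2, 3)

  min_kb = argv[2] ? Float(argv[2]) : 0 # Float() raises on non-numeric input
  raise ArgumentError, USAGE if min_kb.negative?

  Options.new(argv[0], argv[1], (min_kb * 1024).to_i)
end

# An image is kept only if it is at least the requested size.
def big_enough?(byte_size, opts)
  byte_size >= opts.min_bytes
end

opts = parse_args(['www.yahoo.com', 'download/', '50'])
puts opts.min_bytes # 51200
```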


Usage Example:

~/test> ruby grabimages.rb www.yahoo.com download/
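The example passes www.yahoo.com without http://, so the script presumably adds a scheme before fetching the page. The sketch below assumes plain http:// is the default, and also shows one way to name each saved file inside the download directory (the last segment of the image URL’s path). Both functions are hypothetical, not taken from grabimages.rb.

```ruby
require 'uri'

# Assumes http:// when the argument has no scheme, e.g. "www.yahoo.com".
def normalize_url(arg)
  url = arg.match?(%r{\A[a-z][a-z0-9+.-]*://}i) ? arg : "http://#{arg}"
  uri = URI.parse(url)
  uri.path = '/' if uri.path.empty? # request "/" when no path is given
  uri.to_s
end

# Local file name: the last path segment of the image URL, inside dir.
def save_path(image_url, dir)
  File.join(dir, File.basename(URI.parse(image_url).path))
end

puts normalize_url('www.yahoo.com')                                # http://www.yahoo.com/
puts save_path('http://example.com/img/logo.png?v=2', 'download/') # download/logo.png
```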

