<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Scraping the web for fun and profit</title>
	<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/</link>
	<description>top secret/secure computing information</description>
	<pubDate>Sat, 05 Jul 2008 20:30:40 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: Marcin</title>
		<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2281</link>
		<dc:creator>Marcin</dc:creator>
		<pubDate>Wed, 07 Nov 2007 16:15:51 +0000</pubDate>
		<guid>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2281</guid>
		<description>This was posted by dre, not me. We have a couple of people blogging on our team :)

* &lt;strong&gt;Marcin&lt;/strong&gt; (me)
* &lt;strong&gt;Casey&lt;/strong&gt;, OS X guru, and
* &lt;strong&gt;dre&lt;/strong&gt;, an everything guru</description>
		<content:encoded><![CDATA[<p>This was posted by dre, not me. We have a couple of people blogging on our team :)</p>
<p>* <strong>Marcin</strong> (me)<br />
* <strong>Casey</strong>, OS X guru, and<br />
* <strong>dre</strong>, an everything guru</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Travis</title>
		<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2276</link>
		<dc:creator>Travis</dc:creator>
		<pubDate>Wed, 07 Nov 2007 13:20:25 +0000</pubDate>
		<guid>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2276</guid>
		<description>nice article,

I am reading Webbots, Spiders, and Screen Scrapers as well. Although it may not be the best way to scour the web, it's certainly an interesting read. Languages like Python, Ruby, and Perl may or may not be more powerful than the PHP/cURL combination. Either way, Marcin, I like your mention of all the available parsing options and, of course, Google Alerts.

http://travisaltman.com</description>
		<content:encoded><![CDATA[<p>nice article,</p>
<p>I am reading Webbots, Spiders, and Screen Scrapers as well. Although it may not be the best way to scour the web, it's certainly an interesting read. Languages like Python, Ruby, and Perl may or may not be more powerful than the PHP/cURL combination. Either way, Marcin, I like your mention of all the available parsing options and, of course, Google Alerts.</p>
<p><a href="http://travisaltman.com">http://travisaltman.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Casey</title>
		<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2058</link>
		<dc:creator>Casey</dc:creator>
		<pubDate>Tue, 23 Oct 2007 20:18:16 +0000</pubDate>
		<guid>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2058</guid>
		<description>Simplicity and it's faster from what I have seen.  This will grab all links on a page that are in list items inside of an ordered list.  So simple and easy to write.

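require 'rubygems'
require 'scrapi'

# Collect the href of every link inside a list item of an ordered list.
links = Scraper.define do
  process "ol&#62;li&#62;a[href]", "urls[]" =&#62; "@href"
  result :urls
end

# Fetch the page and print each extracted URL.
puts links.scrape(URI.parse("http://ezinearticles.com/?cat=Business"))</description>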
		<content:encoded><![CDATA[<p>Simplicity and it&#8217;s faster from what I have seen.  This will grab all links on a page that are in list items inside of an ordered list.  So simple and easy to write.</p>
<pre><code>require 'rubygems'
require 'scrapi'

# Collect the href of every link inside a list item of an ordered list.
links = Scraper.define do
  process "ol&gt;li&gt;a[href]", "urls[]" =&gt; "@href"
  result :urls
end

# Fetch the page and print each extracted URL.
puts links.scrape(URI.parse("http://ezinearticles.com/?cat=Business"))</code></pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: dre</title>
		<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2056</link>
		<dc:creator>dre</dc:creator>
		<pubDate>Tue, 23 Oct 2007 19:00:59 +0000</pubDate>
		<guid>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2056</guid>
		<description>Why would you rather use CSS locators than XPath?  I've heard of people wanting to use the DOM selectors - for example in the case of Selenium tests being run under Internet Explorer (apparently the XPath ones are very slow), but never CSS locators.

html &#62; body &#62; div#wrap &#62; div#content &#62; div.post &#62; div#commentsection &#62; form#commentform &#62; p &#62; textarea#comment

or

id('comment')

right?</description>
		<content:encoded><![CDATA[<p>Why would you rather use CSS locators than XPath?  I&#8217;ve heard of people wanting to use the DOM selectors - for example in the case of Selenium tests being run under Internet Explorer (apparently the XPath ones are very slow), but never CSS locators.</p>
<p>html &gt; body &gt; div#wrap &gt; div#content &gt; div.post &gt; div#commentsection &gt; form#commentform &gt; p &gt; textarea#comment</p>
<p>or</p>
<p>id('comment')</p>
<p>right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Casey</title>
		<link>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2042</link>
		<dc:creator>Casey</dc:creator>
		<pubDate>Tue, 23 Oct 2007 00:45:46 +0000</pubDate>
		<guid>http://www.tssci-security.com/archives/2007/10/18/scraping-the-web-for-fun-and-profit/#comment-2042</guid>
		<description>I use scrubyt and scrapi for scraping.  I prefer scrapi, though, because I'd rather use CSS selectors than XPaths.</description>
		<content:encoded><![CDATA[<p>I use scrubyt and scrapi for scraping.  I prefer scrapi, though, because I&#8217;d rather use CSS selectors than XPaths.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

