Build Link List from Safari Window

The previous post reminded me of one of the oldest scripts I have and still use regularly. I often have to gather information found on different webpages and email it to somebody. I collect all the relevant webpages in tabs in one Safari window and then run this script. It looks at all the tabs in the frontmost Safari window, builds a plain text list out of the titles and links, and places that list on the clipboard, ready to paste into an email or elsewhere. The result will look like this:

Apple
<http://www.apple.com/>

Google
<http://www.google.com/>

Scripting OS X | #! is not a curse word
<https://scriptingosx.com/>

The script is fairly straightforward:

global linkText

on run
	set linkText to ""

	tell application "Safari"
		activate
		set w to window 1

		set n to 0
		try -- this will fail for the downloads window
			set n to count tabs of w
		end try

		if n > 1 then
			repeat with t in every tab of w
				my appendLineWithDoc(t)
			end repeat
		else if n = 1 then
			my appendLineWithDoc(document of w)
		end if
	end tell

	set the clipboard to linkText

	return linkText
end run

on appendLineWithDoc(theDoc)
	tell application "Safari"
		tell theDoc
			try
				set linkText to linkText & name
				set linkText to linkText & return & "<" & URL & ">" & return & return
			end try
		end tell
	end tell
end appendLineWithDoc

You can save this script in your ~/Library/Scripts/Applications/Safari folder and enable the Script menu. The script will then be shown in the menu only while Safari is the frontmost application.

Speak Instapaper Posts — Part 2

In the last part we built a useful workflow that opens a given number of unread articles from your Instapaper feed. But we stopped short of the goal: converting the text of the articles to speech files.

If you look into the library of Automator actions there is one with the promising name “Get Text from Webpage.” However, this will extract all the text, usually including all the menus, ads and all the other detritus that clutters webpages these days. The latest version of Safari ((Safari 5, as I write this)) has a feature called “Reader,” which removes all this clutter and allows the user to focus on just the text. Unfortunately, the “Reader” feature in Safari is not scriptable.

But before Safari had “Reader” there was the Readability bookmarklet from Arc90 Lab, which does very much the same thing. Since Safari’s AppleScript dictionary allows us to execute arbitrary JavaScript against a webpage, we can use that to extract the relevant text from the article. That saves us from having to recreate the logic of the Readability scriptlet in AppleScript.

Do the following with the workflow we built in Part 1:

  • duplicate the Workflow file and name the copy: Speak Instapaper Articles to iTunes
  • remove the last action “New Safari Documents” from the workflow ((There is a bug in Safari’s AppleScript implementation where document references from freshly created web documents go stale once the page is loaded. This also affects the “New Safari Documents” action. We will work around this bug in our AppleScript.))
  • add a new empty “Run AppleScript” action at the end of the workflow and enter the following code:
on run {input, parameters}
	
	-- uses the 'Readability' javascript from
	-- http://lab.arc90.com/experiments/readability/
	
	set readabilityScript to "javascript:(function(){readConvertLinksToFootnotes=false;readStyle='style-newspaper';readSize='size-medium';readMargin='margin-medium';_readability_script=document.createElement('script');_readability_script.type='text/javascript';_readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());document.documentElement.appendChild(_readability_script);_readability_css=document.createElement('link');_readability_css.rel='stylesheet';_readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';_readability_css.type='text/css';_readability_css.media='all';document.documentElement.appendChild(_readability_css);_readability_print_css=document.createElement('link');_readability_print_css.rel='stylesheet';_readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';_readability_print_css.media='print';_readability_print_css.type='text/css';document.getElementsByTagName('head')[0].appendChild(_readability_print_css);})();"
	
	set output to {}
	tell application "Safari"
		repeat with x in input
			set theURL to contents of x
			make new document with properties {URL:theURL}
			delay 0.5
			
			repeat until ( (do JavaScript "document.readyState;" in document of window 1) is equal to "complete")
				delay 0.5
			end repeat
			set d to document of window 1
			
			do JavaScript readabilityScript in d
			delay 3
			repeat until ( (do JavaScript "document.readyState;" in d) is equal to "complete")
				delay 1
			end repeat
			
			set thetext to text of d
			-- remove first three and last four paragraphs since these are Readability links
			set AppleScript's text item delimiters to return
			set thetext to (paragraphs 4 through -5 of thetext) as text
			close d
			
			set output to output & {thetext}
		end repeat
	end tell
	
	return output
end run

Let’s slowly go through this code:

  • first we set up a variable to store the Readability JavaScript code.
  • then we initialize a list output to store the results.
  • then we loop through the items that were passed into the action in the input variable. In this case the items are the URLs of the Instapaper posts.
  • set theURL to contents of x
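    this de-references the iterator variable. In a repeat with … in loop, x is a reference to the list item rather than the item's value, which can make comparisons and coercions behave unexpectedly, so getting the contents first is usually a wise thing to do.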
  • make new document with properties {URL:theURL}
    delay 0.5

    we tell Safari to open a new document with the given URL and pause for half a second to let Safari start loading.

  • repeat until ( (do JavaScript "document.readyState;" in document of window 1) is equal to "complete")
        delay 0.5
    end repeat
    set d to document of window 1

    We have to wait until the page is completely loaded before we can apply the Readability script against the page. Unfortunately Safari does not expose the state of the page (loading or complete) to AppleScript. This is, however, exposed to the JavaScript DOM within the page, and we can access DOM information from AppleScript with the do JavaScript command. So we poll the document.readyState attribute in JavaScript until it reports complete. Then we remember a reference to this document in a variable. ((Safari has a bug where an AppleScript reference to a document will change while it is loading, resulting in broken references. All this is a clumsy but effective workaround.))

  • now we can execute the Readability script against the page:
    do JavaScript readabilityScript in d
    delay 3
    repeat until ( (do JavaScript "document.readyState;" in d) is equal to "complete")
    	delay 1
    end repeat

    We use the same DOM trick to wait until Safari is done.

  • Now the text property of the document contains the cleaned-up text of the article. We can extract that, remove some extra lines that Readability inserts, close the Safari window and append the text as its own element to the output list.
    set thetext to text of d
    -- remove first three and last four paragraphs since these are Readability links
    set AppleScript's text item delimiters to return
    set thetext to (paragraphs 4 through -5 of thetext) as text
    close d
    			
    set output to output & {thetext}
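
One weakness of the polling loops above is that they never give up: if a page stalls, the workflow waits forever. As a minimal sketch, assuming a time limit is acceptable, the wait could be wrapped in a handler that returns false after a given number of seconds. The handler name waitForFrontDocument and its maxSeconds parameter are made up for this illustration and are not part of the workflow above. Like the first loop, it polls document of window 1 directly rather than a stored reference, to stay clear of the stale-reference problem.

```applescript
-- Hypothetical helper (not part of the original workflow):
-- polls the front Safari document until document.readyState is "complete".
-- Returns true when the page has loaded, false once maxSeconds have passed.
on waitForFrontDocument(maxSeconds)
	set waited to 0
	repeat while waited < maxSeconds
		try
			tell application "Safari"
				set state to do JavaScript "document.readyState;" in document of window 1
			end tell
			if state is "complete" then return true
		end try -- errors while the page is not ready yet are treated as "still loading"
		delay 0.5
		set waited to waited + 0.5
	end repeat
	return false
end waitForFrontDocument
```

When called from inside the tell application "Safari" block, the handler needs the my keyword, as in my waitForFrontDocument(30). A URL whose page does not load in time could then simply be skipped instead of blocking the whole workflow.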

This would be a good time to save the workflow, and do a test run. You can show the results of the workflow in Automator to see if the text is extracted properly. Readability is not perfect and does not work on all pages, but the success rate is quite high.
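
On pages where Readability does not work, the extracted text may be very short, and paragraphs 4 through -5 can then fail or return unexpected text, because it assumes at least eight paragraphs. The following is a sketch of a more defensive trim, under the assumption that returning the text unchanged is an acceptable fallback. The handler trimParagraphs and its parameters are invented for this example. It also restores AppleScript's text item delimiters afterwards, which the script above leaves set to return.

```applescript
-- Hypothetical helper (not part of the original workflow):
-- drops the first headCount and the last tailCount paragraphs of theText.
-- If the text is too short to trim, it is returned unchanged.
on trimParagraphs(theText, headCount, tailCount)
	set paraList to paragraphs of theText
	set n to count of paraList
	if n <= headCount + tailCount then return theText
	
	-- join the remaining paragraphs with returns, then restore the delimiters
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return
	set trimmed to (items (headCount + 1) through (n - tailCount) of paraList) as text
	set AppleScript's text item delimiters to savedDelimiters
	return trimmed
end trimParagraphs
```

In the workflow script this would stand in for the two lines that set the delimiters and trim the text, called as my trimParagraphs(thetext, 3, 4) from within the Safari tell block.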

The remaining work of converting the text into audio is very straightforward. Add the following workflow actions:

  • Text to Audio File
  • Import File into iTunes
  • Add Songs to Playlist ((You will want to create a specific playlist for these files in iTunes.))

And then you are done. You can also download the complete Workflow.