Automator or other script experts?
#1
I have a folder of PDFs that I'd like to search for text. (Each PDF has searchable text.) I want to copy or enter the text and then have each PDF searched for that exact string (not the separate words). I want the results to show me which PDFs contain it, as a list if possible, though opening the PDFs or leaving them open is fine too.

The string is a phrase, usually. Maybe just an acronym. But whatever, it's a few words appearing in a certain order.

I see that Automator includes a thing for searching PDF "from Spotlight metadata" but don't know anything about how to glue together what makes the actions work.

[Edit] I'm using macOS' Preview to read the PDFs.
#2
It's occurred to me a simple script could be more trouble than it's worth. I still have to open and search each PDF that would have the string, because what I need from the PDF is usually a paragraph following the search string, if it's present.

For example, I have a definition to look for. Out of, say, 10 PDFs, one or two should have it. I open each PDF and search within it. When I find it, I copy the word's or phrase's definition, paste it into something else, and move on to the next PDF to see if it's also in that one, etc.
#3
Is Spotlight search in Finder not adequate? It's very basic, but its metadata is what you'd be using in a script anyway, and it already has a decent UI.
#4
Thanks rj, yes that'll do it but now I'm further realizing it's not enough.

For example, if I search for "Energy management", one or two of the docs will have that as a definition (along with however they define energy management).

But every document has those words, so the search returns all the PDFs, most of which are irrelevant because they merely use the words "energy" and "management" together as part of a sentence or whatever (or maybe not even together). The Spotlight results would need to show me which page(s) the text is on and how it looks; otherwise I'm back to just opening them one by one and running the same search each time.

Like now :)

*****************

The good news is that I'm being paid well for this insanely mundane and tedious task, and by the hour, but I thought I'd still try to speed it up!
#5
Sounds like a job for BBEdit.
#6
Assuming you have consistently formatted documents, this returns paths hit by your search term and the text from the following paragraph.

set searchTerm to (display dialog "Input the search string:" default answer "energy management")'s text returned
set itsPathandpostPara to {}
set found to (do shell script "fgrep -irA1 " & searchTerm's quoted form & space & (choose folder with prompt "Locate the folder containing the files to be searched.")'s POSIX path's quoted form)
set text item delimiters to "-"
repeat with focus from 2 to (count found's paragraphs) by 3
    set itsPathandpostPara's end to {found's paragraph focus's text item 1, found's paragraph focus's text item 2} & return
end repeat
set text item delimiters to return
set itsPathandpostPara to itsPathandpostPara as text
set text item delimiters to ""
itsPathandpostPara
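
For anyone following along, here is a rough sketch of what the grep call inside the script produces and why the loop steps through the output three lines at a time. It uses a plain-text file rather than a PDF, and the temp folder, file name, and sample definitions are all made up for illustration.

```shell
# Hypothetical demo folder with one plain-text file (not a PDF).
demo_dir=$(mktemp -d)
printf 'Energy management\nDefinition one.\nfiller\nfiller\nenergy management\nDefinition two.\n' > "$demo_dir/a.txt"

# -F: fixed string (what fgrep does), -i: ignore case,
# -r: recurse into the folder, -A1: also print one line after each match.
found=$(grep -F -i -r -A1 'energy management' "$demo_dir")

# Each hit is printed as "path:matching line", then "path-following line",
# and separate hits are divided by a line containing only "--".
printf '%s\n' "$found"

# Lines 2, 5, 8, ... are the "following" lines, which is what the
# AppleScript loop picks out with "from 2 ... by 3".
following=$(printf '%s\n' "$found" | awk 'NR % 3 == 2')
printf '%s\n' "$following"
```

Because the script then splits those lines on "-", a folder or file name that itself contains a hyphen would throw off the path and text pieces.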
#7
Free consulting! This place rocks.
#8
Marc Anthony wrote:
Assuming you have consistently formatted documents, this returns paths hit by your search term and the text from the following paragraph.

set searchTerm to (display dialog "Input the search string:" default answer "energy management")'s text returned
set itsPathandpostPara to {}
set found to (do shell script "fgrep -irA1 " & searchTerm's quoted form & space & (choose folder with prompt "Locate the folder containing the files to be searched.")'s POSIX path's quoted form)
set text item delimiters to "-"
repeat with focus from 2 to (count found's paragraphs) by 3
    set itsPathandpostPara's end to {found's paragraph focus's text item 1, found's paragraph focus's text item 2} & return
end repeat
set text item delimiters to return
set itsPathandpostPara to itsPathandpostPara as text
set text item delimiters to ""
itsPathandpostPara

Thanks Marc! The docs aren't consistently formatted, unfortunately. I did run it from within Script Editor (didn't even realize that was still around) and it stopped at the shell script with a "non-zero status" error. I appreciate it though.
#9
You're welcome. That error means it was pointed at a folder that didn't contain at least one file with the search term.
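
A small sketch of that behavior, using a made-up temp folder and a plain-text file: grep exits with status 1 when nothing matches, and do shell script treats any non-zero exit as an error. One further possibility, offered only as an assumption about these particular files: PDF text is often stored in compressed form, so grep can report no match on the raw file even when Preview finds the phrase.

```shell
# Hypothetical demo folder whose only file does not contain the phrase.
demo_dir=$(mktemp -d)
printf 'Nothing relevant here.\n' > "$demo_dir/notes.txt"

# grep exits with status 1 when no line matches; this is the
# "non-zero status" that AppleScript's do shell script reports.
no_match_status=0
grep -F -i -r -A1 'energy management' "$demo_dir" || no_match_status=$?
echo "exit status with no match: $no_match_status"

# Appending "|| true" makes the command succeed with empty output,
# so a caller can check for an empty result instead of catching an error.
safe_output=$(grep -F -i -r -A1 'energy management' "$demo_dir" || true)
safe_status=$?
echo "exit status with '|| true': $safe_status"
```

The same "|| true" suffix could be appended to the command string passed to do shell script, at the cost of also hiding real grep errors such as an unreadable folder.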
#10
Marc Anthony wrote:
You're welcome. That error means it was pointed at a folder that didn't contain at least one file with the search term.

That's my problem. "Energy Management" was indeed included on a page with other definition terms. Here's a partial screenshot of what the particular PDF looks like after using Preview's search field:



I realize that alone doesn't prove I pointed the script at the folder.