
Search within PDF files using grep

Save the following snippet as an executable script named greppdf in a directory that is on your PATH:

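#!/bin/bash
# greppdf: search the text of every PDF file in the current directory, page by page.
# All arguments are passed to grep, which always runs case-insensitively (-i).

shopt -s nullglob   # if there is no *.pdf file, skip the loop instead of using the literal pattern

for PDF in *.pdf
do
    # Page count, taken from the "Pages:" line of pdfinfo's output
    NB_PAGES=$(pdfinfo "$PDF" | awk '/^Pages:/ {print $2}')

    for (( PAGE=1; PAGE<=NB_PAGES; PAGE++ ))
    do
        # Extract one page as text on stdout, grep it, and prefix each match with file and page
        pdftotext -f "$PAGE" -l "$PAGE" "$PDF" - | grep -i "$@" | while IFS= read -r line; do
            printf '%s:%s:%s\n' "$PDF" "$PAGE" "$line"
        done
    done
done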
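If you keep personal scripts in ~/bin, installing greppdf could look like the sketch below. The install_script helper and the ~/bin default are assumptions made for illustration; any directory listed in your PATH works just as well.

```shell
# Hypothetical helper: copy a script into a bin directory (default assumed: ~/bin)
# and make it executable.
install_script() {
    local src="$1" bin_dir="${2:-$HOME/bin}"
    mkdir -p "$bin_dir" || return 1
    cp "$src" "$bin_dir/" || return 1
    chmod +x "$bin_dir/$(basename "$src")"

    # Print a hint if the target directory is not on PATH yet
    case ":$PATH:" in
        *":$bin_dir:"*) ;;
        *) echo "note: add $bin_dir to PATH, e.g. export PATH=\"$bin_dir:\$PATH\" in ~/.bashrc" ;;
    esac
}

# Usage: install_script greppdf
```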

You can now search all the PDF files in the current directory with the following command (regular grep options can be passed as well):

greppdf "programming"

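For every match, this prints the file name and the page number where the “programming” string was found, followed by the matching line (file.pdf:page:line). The search is case-insensitive, because the script always passes -i to grep.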
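Since every result line has the form file:page:text, the output is easy to post-process with standard tools. As one possible example, the hypothetical matching_pages helper below lists each page that contains at least one match, once, sorted by file name and then by page number. It assumes that file names do not contain a colon.

```shell
# Hypothetical helper: read greppdf output ("file.pdf:page:text") on stdin
# and print each distinct file:page pair, ordered by file then numerically by page.
matching_pages() {
    cut -d: -f1,2 | sort -t: -k1,1 -k2,2n -u
}

# Usage: greppdf "programming" | matching_pages
```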

Requirements: the poppler-utils package, which provides the pdfinfo and pdftotext commands, must be installed on your system.
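Before running greppdf, you can check that both tools are reachable. The check_poppler function is a hypothetical helper, and the install command in the usage comment assumes a Debian or Ubuntu system.

```shell
# Hypothetical helper: report whether the poppler-utils tools used by greppdf are on PATH.
# Returns 0 if both are found, 1 otherwise (missing tools are listed on stderr).
check_poppler() {
    local tool status=0
    for tool in pdfinfo pdftotext; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (Debian/Ubuntu): check_poppler || sudo apt-get install poppler-utils
```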
