In this article we will see how to make the contents of a PDF searchable from the command line.
We will take a sample PDF (its URL appears in the wget step below), extract its text, and search that text.
Here are the steps:
- Open a terminal in Linux.
- Use the wget command to download the file. It is saved in the current directory as dc-best-practices-google.pdf.
wget "https://static.googleusercontent.com/media/www.google.com/en//corporate/datacenter/dc-best-practices-google.pdf"
- Use the pdftotext command to convert the file to text. With no output name given, it writes dc-best-practices-google.txt next to the PDF.
pdftotext dc-best-practices-google.pdf
- Optionally, open dc-best-practices-google.txt in any editor to check the extracted text.
vim dc-best-practices-google.txt
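- Use the grep command to search for the phrase "Green Data Center". The -F option treats the phrase as a fixed string rather than a regular expression, and -C2 prints two lines of context around each match.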
grep -F -C2 "Green Data Center" dc-best-practices-google.txt
- If the phrase is found, grep prints each matching line together with the two lines before and after it, which confirms that the PDF content has been made searchable.
- To create your PDF search engine, use this link
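The steps above can be wrapped in two small shell helpers so they are easy to repeat for other files. This is only a minimal sketch: the function names pdf_to_txt and search_txt are hypothetical, it assumes pdftotext (from poppler-utils) and grep are installed, and it adds the -i (ignore case) and -n (line numbers) options to grep, which the original command does not use.

```shell
#!/bin/sh
# Sketch only: pdf_to_txt and search_txt are hypothetical helper names.

# Convert a PDF to a text file next to it (foo.pdf -> foo.txt)
# and print the name of the text file on success.
pdf_to_txt() {
    pdf="$1"
    txt="${pdf%.pdf}.txt"
    # pdftotext accepts the output file as an optional second argument.
    pdftotext "$pdf" "$txt" && printf '%s\n' "$txt"
}

# Search a text file for a fixed phrase.
# -F: fixed string, -i: ignore case (added here), -n: line numbers (added here),
# -C2: two lines of context around each match.
search_txt() {
    phrase="$1"
    txt="$2"
    grep -F -i -n -C2 -- "$phrase" "$txt"
}

# Example usage, commented out so that sourcing this file has no side effects:
# wget "https://static.googleusercontent.com/media/www.google.com/en//corporate/datacenter/dc-best-practices-google.pdf"
# txt=$(pdf_to_txt dc-best-practices-google.pdf)
# search_txt "Green Data Center" "$txt"
```

With -n, grep marks matching lines as "number:text" and context lines as "number-text". search_txt returns a non-zero exit status when the phrase is not found, so it can be used directly in an if statement.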