In this article we will see how to make the contents of a PDF searchable from the command line.
As an example, we will take the dc-best-practices-google.pdf document hosted by Google, extract its text and search it.
Here are the steps:
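- Open a terminal in Linux.
- Use the wget command to download the file. wget saves it as dc-best-practices-google.pdf, the file name taken from the URL.

```shell
wget "https://static.googleusercontent.com/media/www.google.com/en//corporate/datacenter/dc-best-practices-google.pdf"
```

- Use the pdftotext command (part of poppler-utils on most distributions) to convert the PDF to plain text. When no output file is given, it writes the text to dc-best-practices-google.txt.

```shell
pdftotext dc-best-practices-google.pdf
```

- Open dc-best-practices-google.txt in any editor to check the extracted text, for example with vim:

```shell
vim dc-best-practices-google.txt
```

- Use the grep command to search for "Green Data Center". The -F option treats the pattern as a fixed string rather than a regular expression, and -C2 prints two lines of context before and after each match.

```shell
grep -F -C2 "Green Data Center" dc-best-practices-google.txt
```

- The matching lines and their context confirm that the text of the PDF is now searchable. Keep in mind that grep is case-sensitive by default (add -i to ignore case) and matches line by line, so a phrase that pdftotext split across two lines will not be found.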
- To create your PDF search engine, use this link
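The conversion and search steps above can be combined into a small reusable function. This is a minimal sketch, assuming bash and GNU grep; the name search_pdf is hypothetical and not part of any tool. Passing - as the output file makes pdftotext write the extracted text to standard output, so no intermediate .txt file is created, and grep's --label option together with -H prefixes each output line with the PDF's name, which helps when several files are searched at once.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): search_pdf PATTERN FILE.pdf [FILE.pdf ...]
# Assumes pdftotext (poppler-utils) and GNU grep are installed.
search_pdf() {
  local pattern="$1"
  shift
  local pdf
  local status=1  # follow grep's convention: 1 means no match in any file
  for pdf in "$@"; do
    # "-" as the output file sends the extracted text to stdout.
    # -F: fixed string, -i: ignore case, -C2: two lines of context,
    # -H with --label: print the PDF name in front of every output line.
    if pdftotext "$pdf" - | grep -F -i -C2 -H --label="$pdf" -- "$pattern"; then
      status=0
    fi
  done
  return "$status"
}

# Example: search_pdf "green data center" dc-best-practices-google.pdf
```

Running it on the downloaded file prints the same kind of context as the grep step, with each line prefixed by dc-best-practices-google.pdf, and the function returns a non-zero status when none of the given PDFs contains the phrase, so it can be used in scripts.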