Wednesday Jan 08, 2014

Extract PDF titles from Windows Search

The reason I tried SQL against Windows Search was that I had a bunch of PDF files whose
contents I couldn't guess from their filenames.

Oracle docs are among them! Below is one example: the PDF, MOBI and EPUB versions are all named "E17693".

Here's another.

Oracle Database Data Access Components Documentation

Oracle Data Provider for .NET Developer's Guide
PDF

I put these two PDF files and one MOBI file in one folder.
Here's the query, run from Cygwin, that finds the titles of all files in the current folder.

/cygdrive/c/tmp/docs$ wssql \
> "SELECT System.ItemName,System.Title    \
>    FROM           \
>  SystemIndex        \
>    WHERE          \
>  System.ItemPathDisplay like '$(cygpath -w $PWD)%'  \
> "
Query=SELECT System.ItemName,System.Title       FROM            SystemIndex           WHERE           System.ItemPathDisplay like 'C:\tmp\docs%'
docs;NULL;
E17693-13.mobi;NULL;
e17693.pdf;Oracle Data Mining Application Developer's Guide;
e17732.pdf;Oracle Data Provider for .NET Developer's Guide;
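To rerun the query from any folder, it can be wrapped in a small bash function. This is a minimal sketch: the name title_query is hypothetical, and the function only prints the SQL, so you still need the wssql tool above to run it. It assumes Windows Search follows the usual SQL convention of doubling a single quote inside a string literal. That way a folder name with an apostrophe does not break the query. It does not escape LIKE wildcards such as % or _ in the folder name, which at worst widens the match.

```shell
# Hypothetical helper: print the Windows Search SQL that lists the name and
# title of every indexed item whose path starts with the Windows path in $1.
title_query() {
  local q="'"
  local dir=$1
  # Assumption: Windows Search SQL escapes ' inside a literal by doubling it.
  dir=${dir//$q/$q$q}
  # %% prints a literal %, the LIKE wildcard matching the rest of the path.
  printf "SELECT System.ItemName,System.Title FROM SystemIndex WHERE System.ItemPathDisplay LIKE '%s%%'\n" "$dir"
}
```

Usage from Cygwin would be `wssql "$(title_query "$(cygpath -w "$PWD")")"`. Quoting "$PWD" also keeps folder paths that contain spaces in one piece.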

Now I know which PDF is which. Next, I could make the SQL more elaborate, pipe the
output to 'sed', or modify the C# source to generate a rename script.
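Here is a minimal sketch of the rename-script idea as a bash function. The name make_rename_script is hypothetical, and the function assumes each result line has the name;title; layout shown above. It skips the echoed Query= line and any row whose title is NULL. It also strips a trailing carriage return, in case wssql writes Windows line endings, which is an assumption. Characters that Windows does not allow in file names are replaced with _, and the original extension is kept. The function only prints mv commands, so you can review the result before anything is renamed.

```shell
# Hypothetical helper: read wssql result lines of the form "name;title;"
# on stdin and print one "mv" command per file that has an indexed title.
make_rename_script() {
  # Characters Windows does not allow in file names: \ / : * ? " < > |
  local bad='[\\/:*?"<>|]'
  local line name title ext
  while IFS= read -r line || [[ -n $line ]]; do
    # Assumption: wssql may emit Windows (CRLF) line endings; drop the CR.
    line=${line%$'\r'}
    # Skip the echoed query and anything without a ';' separator.
    [[ $line == Query=* || $line != *\;* ]] && continue
    name=${line%%;*}          # text before the first ';'
    title=${line#*;}          # text after the first ';' ...
    title=${title%;}          # ... minus the trailing ';'
    # No usable title (e.g. the folder itself or the MOBI file): skip.
    [[ -z $title || $title == NULL ]] && continue
    title=${title//$bad/_}
    # Keep the original extension, if the name has one.
    if [[ $name == *.* ]]; then ext=.${name##*.}; else ext=; fi
    # %q quotes both names so spaces and apostrophes survive in the script.
    printf 'mv -n -- %q %q\n' "$name" "$title$ext"
  done
}
```

Pipe the output of the wssql command above into it, redirect the result to rename.sh, check the file, then run `bash rename.sh`. For the output shown above, it would print two mv lines, one per PDF. The `-n` option makes mv refuse to overwrite an existing file.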

The above query didn't show a title for the MOBI file.
This is probably because I don't have an appropriate IFilter installed,
and I don't even know whether such an IFilter exists!
I will write about MOBI and EPUB next.

About

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
