Sunday Mar 16, 2008

Shell script: Matching the first occurrence of a string or a pattern in a file and exit

I needed to solve the problem above and, lazy as ever, googled for a solution, but I couldn't find what I was looking for (though there were solutions based on while loops). If you need a one-line shell command that matches the first occurrence of a string (or a pattern) in the input and then exits, the following may be useful.

Input file: input.txt

Start
Line: 1
Line: 2
Line: 3
Line: 4
Line: 5
End

To print 1 (the second field of the second line, Line: 1) from the file above and exit, use the grep and awk combination below, although it may not be the most efficient solution.

cat input.txt | grep "Line:" | awk '{ print $2; exit }'
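For comparison, the same "stop at the first match" idea can be sketched in Python. This is only an illustration, not part of the original one-liner: the function name first_match_field and its parameters are made up for this example, and the sample data is the input.txt content from above.

```python
import io

def first_match_field(lines, needle="Line:", field=1):
    """Return one whitespace-separated field of the first line containing needle.

    Returns None when no line matches, and "" when the matching line
    has too few fields (similar to awk printing an empty $2).
    """
    for line in lines:  # lines are consumed lazily, so reading stops at the first match
        if needle in line:
            parts = line.split()
            return parts[field] if len(parts) > field else ""
    return None

# Same content as input.txt above, held in memory so the sketch is self-contained
sample = io.StringIO("Start\nLine: 1\nLine: 2\nLine: 3\nLine: 4\nLine: 5\nEnd\n")
print(first_match_field(sample))  # prints 1
```

Passing an open file object instead of the StringIO works the same way, because the loop only needs an iterable of lines.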

Wednesday Sep 19, 2007

OpenOffice Parser: Extracting text from OpenOffice documents

With OpenDocument formats gaining widespread acceptance, the lack of a simple text extractor for OpenOffice documents was my main motivation for developing this one. The code below extracts text from OpenOffice documents (odt, odp, etc.). I have used the JDOM XML APIs for easier processing of the OpenOffice XML. I hope this makes life a bit easier.

/*
 * OpenOfficeParser.java
 *
 * Created on September 12, 2007, 4:24 PM
 *
 * To change this template, choose Tools | Template Manager
 * and open the template in the editor.
 */

/**
 *
 * @author prasanna
 */

import java.io.InputStream;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Text;
import org.jdom.input.SAXBuilder;
import java.util.zip.ZipFile;
import java.util.zip.ZipEntry;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

public class OpenOfficeParser {
   
    StringBuffer TextBuffer;
   
    /** Creates a new instance of OpenOfficeParser */
   
    public OpenOfficeParser() {}
   
    //Process text elements recursively
    public void processElement(Object o) {
        
        if (o instanceof Element) {
           
            Element e = (Element) o;
            String elementName = e.getQualifiedName();
           
            if (elementName.startsWith("text")) {
               
                if (elementName.equals("text:tab")) // add tab for text:tab
                    TextBuffer.append("\t");
                else if (elementName.equals("text:s"))  // add space for text:s
                    TextBuffer.append(" ");
                else {
                    List children = e.getContent();
                    Iterator iterator = children.iterator();
                   
                    while (iterator.hasNext()) {
                       
                        Object child = iterator.next();
                        //If Child is a Text Node, then append the text
                        if (child instanceof Text) { 
                            Text t = (Text) child;
                            TextBuffer.append(t.getValue());
                        }
                        else
                            processElement(child); // Recursively process the child element
                    }                   
                }
                if (elementName.equals("text:p"))
                    TextBuffer.append("\n");
            }
            else {
                List non_text_list = e.getContent();
                Iterator it = non_text_list.iterator();
                while (it.hasNext()) {
                    Object non_text_child = it.next();
                    processElement(non_text_child);                   
                }
            }               
        }
    }
   
    public String getText(String fileName) throws Exception {
        TextBuffer = new StringBuffer();
       
        //Open the OpenOffice document (a zip archive) and look up content.xml
        ZipFile zipFile = new ZipFile(fileName);
        try {
            ZipEntry entry = zipFile.getEntry("content.xml");

            if (entry != null) {
                SAXBuilder sax = new SAXBuilder();
                Document doc = sax.build(zipFile.getInputStream(entry));
                Element rootElement = doc.getRootElement();
                processElement(rootElement);
            }
        } finally {
            //Always release the file handle, even if parsing fails
            zipFile.close();
        }
        System.out.println("The text extracted from the OpenOffice document = " + TextBuffer.toString());
        return TextBuffer.toString();       
    }     
   
   
    public static void main(String args[]) throws Exception
    {
        new OpenOfficeParser().getText("OpenDocumentFile.odt");
    }
}
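The same traversal can be sketched with Python's standard library alone (zipfile plus xml.etree.ElementTree), for readers who do not have JDOM at hand. This is a rough port under stated assumptions, not the author's code: the names extract_text and _walk are invented here, the namespace URI is the standard OpenDocument text namespace, and, like the Java version, it emits a single space for text:s regardless of any repeat count on the element.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "text:" prefix in OpenDocument content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def _walk(element, parts):
    """Collect text from text:* elements, recursing through everything else."""
    in_text_ns = element.tag.startswith("{" + TEXT_NS + "}")
    local = element.tag.rsplit("}", 1)[-1]
    if in_text_ns and local == "tab":
        parts.append("\t")
        return
    if in_text_ns and local == "s":
        parts.append(" ")
        return
    if in_text_ns and element.text:
        parts.append(element.text)
    for child in element:
        _walk(child, parts)
        # Text following a child belongs to the parent element's content
        if in_text_ns and child.tail:
            parts.append(child.tail)
    if in_text_ns and local == "p":
        parts.append("\n")  # one line per paragraph

def extract_text(source):
    """source is a path or a binary file-like object holding an OpenDocument file."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    parts = []
    _walk(root, parts)
    return "".join(parts)

# Tiny in-memory stand-in for a real .odt file, so the sketch runs on its own
content = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Hello<text:s/>world<text:tab/>!</text:p>'
    '<text:p>Second <text:span>line</text:span></text:p>'
    '</office:text></office:body></office:document-content>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("content.xml", content)

print(repr(extract_text(buf)))  # prints 'Hello world\t!\nSecond line\n'
```

For a real document, extract_text("OpenDocumentFile.odt") would be the equivalent of the main method above.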

Sunday Sep 09, 2007

Division of two numbers without using division operator

I had been trying to find an efficient solution to this problem for some time and came up with this.

The logic is simple: left shift the divisor (multiply it by 2) until it exceeds the dividend, then step back one shift, which gives the largest multiple of the form divisor × 2^k that does not exceed the dividend. Then repeat the routine with the difference between the dividend and that multiple, using the original divisor again, until the remaining dividend is less than the divisor or the difference is zero. It's similar to the way binary search is used to find an element in a sorted list. Confused? Go through the recursive procedure below, written in Python.

#Division of two numbers without using division operator

dividend = int(input("Enter the dividend:"))
divisor  = int(input("Enter the divisor:"))

#The shifting loop below never terminates for a zero or negative divisor
if divisor <= 0 or dividend < 0:
    raise SystemExit("Expected a non-negative dividend and a positive divisor")

tempdivisor = divisor
remainder = 0

def division (dividend, divisor):

    global remainder

    quotient = 1

    if divisor == dividend:
        remainder = 0
        return 1
    elif dividend < divisor:
        remainder = dividend
        return 0

    while divisor <= dividend:

        #Here divisor <= dividend, therefore left shift (multiply by 2) divisor and quotient
        divisor = divisor << 1
        quotient = quotient << 1

    #We have reached the point where divisor > dividend, therefore divide divisor and quotient by two
    divisor = divisor >> 1
    quotient = quotient >> 1

    #Call division recursively for the difference to get the exact quotient
    quotient = quotient + division(dividend - divisor, tempdivisor)

    return quotient

print("%s / %s: quotient = %s" % (dividend, tempdivisor, division(dividend, divisor)))
print("%s / %s: remainder = %s" % (dividend, tempdivisor, remainder))
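The same shift-and-subtract idea can also be written without the global variable and without recursion. The following is a minimal sketch, assuming a non-negative dividend and a positive divisor; the function name divide is invented for this example, and it returns the quotient and remainder together as a tuple.

```python
def divide(dividend, divisor):
    """Return (quotient, remainder) using only shifts, addition and subtraction."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("this sketch assumes dividend >= 0 and divisor > 0")
    quotient = 0
    while dividend >= divisor:
        shifted, multiple = divisor, 1
        # Keep doubling while the doubled value still fits into what is left
        while (shifted << 1) <= dividend:
            shifted <<= 1
            multiple <<= 1
        # shifted is now the largest divisor * 2^k that does not exceed dividend
        dividend -= shifted
        quotient += multiple
    return quotient, dividend

print(divide(100, 7))  # prints (14, 2)
```

For 100 / 7 the loop subtracts 56, then 28, then 14, adding 8, 4 and 2 to the quotient, which leaves a remainder of 2.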

Monday Aug 27, 2007

Java Regular Expressions: Validating HTTP GET URIs, fetching GET Parameters and values

I was wondering whether there is a way to check the validity of an HTTP GET URI using Java regular expressions and, if it is valid, to fetch all the GET parameters and their values. Fortunately, after some time hacking around with Java REs, I discovered an easy way to do it, though I am not sure it is efficient.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class URIMatcher {
    public static void main (String args[]) {

        String query = "Param1=1&Param2=23&Param3=3335&Param4=hello&Param5=&Param6=world";

        // Accepts one or more name=value pairs separated by '&'
        Pattern ValidURI = Pattern.compile("(?:([a-zA-Z0-9]+)=([^=&]*)&)*([a-zA-Z0-9]+)=([^=&]*)");
        // Matches a single name=value pair plus any '&' separators that follow it
        Pattern getValues = Pattern.compile("([a-zA-Z0-9]+)=([^=&]*)&*");
        Matcher ValidURIMatch = ValidURI.matcher(query);
        Matcher getParams = getValues.matcher(query);

        if (ValidURIMatch.matches()) {
            while (getParams.find())
                System.out.println("Name = " + getParams.group(1) + " Value = " + getParams.group(2));
        } else {
            System.out.println("URI is not valid");
        }
    }
}

The first pattern accepts only a valid URI query string (inputs such as Param1=& or Param1=hello&Param2 are invalid and are rejected). From a valid one, the second pattern fetches all the GET parameters and their values; for the example above the output is:

Name = Param1 Value = 1
Name = Param2 Value = 23
Name = Param3 Value = 3335
Name = Param4 Value = hello
Name = Param5 Value =
Name = Param6 Value = world
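The two-pattern approach carries over to other regex engines more or less unchanged. Here is a minimal sketch using Python's re module, with the same two expressions; the names VALID_QUERY, PAIR and parse_query are made up for this example. For real-world query strings (percent-encoding, repeated keys and so on) the standard library's urllib.parse.parse_qsl is probably the safer choice.

```python
import re

# Whole-string check: one or more name=value pairs separated by '&'
VALID_QUERY = re.compile(r"(?:[a-zA-Z0-9]+=[^=&]*&)*[a-zA-Z0-9]+=[^=&]*")
# A single name=value pair
PAIR = re.compile(r"([a-zA-Z0-9]+)=([^=&]*)")

def parse_query(query):
    """Return a list of (name, value) tuples, or None if the query is not valid."""
    if VALID_QUERY.fullmatch(query) is None:
        return None
    return PAIR.findall(query)

query = "Param1=1&Param2=23&Param3=3335&Param4=hello&Param5=&Param6=world"
for name, value in parse_query(query):
    print("Name = " + name + " Value = " + value)

print(parse_query("Param1=hello&Param2"))  # prints None
```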

Regular Expressions are really powerful indeed!
About

prasanna
