
Innovative ideas for every utility

Tracking users by post-processing the access.log

Anthony Shorten
Senior Principal Product Manager

Most log analysers do not produce all the statistics that a site might want to track. One important usage statistic is the number of active users (i.e. users that are actively using the system, not idle users), so that performance events can be correlated with the level of usage on the system.

It is possible to post-process the access.log files using a set of UNIX scripts to summarize the information, as the basic data such as userid, date and time are contained in the file. The key to processing this data is to:

  • Filter out the superfluous data and isolate the userid, date and time from the access.log records. Remember that a record is written to the access.log when the user actively requests a resource on the web server; this is the definition of an "active" transaction record.
  • Summarize that subset by the desired time frame (say, each minute of the day). This cuts the information down to the basics needed for analysis. After the summarization you should end up with a unique record for each user in each minute of the day for the dates you require.

Note: The code below is not optimized and is provided as is. I am not a great script programmer and I am sure some of you out there would improve on it.

The script below is an example of such a set of commands to summarize that information:



#!/usr/bin/ksh
# Find all the access.log files in the log directory
# 1) We read each record from the access.log files (cat)
# 2) We filter out the console (grep)
# 3) We get the date/time and userid from the access.log record (cut)
# 4) We substring the date/time to chop off the seconds and the leading "[" character, and move the fields around (awk)
# 5) We filter out any records containing "-" (grep)
# 6) We remove duplicates within that minute (sort)
# 7) We comma-separate the date/time and userid fields (sed)
# 8) We change the ":" within the date/time field to separate date and time (sed) - only the first one, which should be at the end of the date
#
# At the end of pass 1 you should have a CSV with a unique user entry per date
# and time to the minute level. This implies the user was active for that
# minute on that date.
ls -1 $SPLEBASE/logs/system/*access.log | while read _file
do
   echo `date` : Started Parsing ${_file}
   echo `date` : Pass 1 Started - Creating user list by minute
   export _outfile=${_file}_userlist.csv
   cat ${_file} | grep -v console | cut -f3,4 -d " " | awk '{print substr($2,2,17),$1}' | grep -v "-" | sort -u | sed -e "s% %,%g" | sed -e "s%:%,%" >${_outfile}
   echo `date` : Pass 1 Ended - Check ${_outfile} for results
   echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished



The above set of commands creates a comma-delimited file containing the date, time (to the minute only) and userid.
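To see what pass 1 does to a single record, the pipeline can be traced on one hypothetical access.log line (the IP address, userid and request shown are made up for illustration; real records follow the same common log format, with the userid in field 3 and the timestamp in field 4):

```shell
# Hypothetical access.log record in common log format.
echo '10.1.1.1 - user01 [12/Jul/2010:09:15:23 +1000] "GET /app HTTP/1.1" 200 512' |
  cut -f3,4 -d " " |                  # -> user01 [12/Jul/2010:09:15:23
  awk '{print substr($2,2,17),$1}' |  # -> 12/Jul/2010:09:15 user01
  sed -e "s% %,%g" |                  # -> 12/Jul/2010:09:15,user01
  sed -e "s%:%,%"                     # -> 12/Jul/2010,09:15,user01
```

The final line, 12/Jul/2010,09:15,user01, is exactly the shape of one row in the *_userlist.csv output.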

Note: Additionally, coupling the above information with the SC_USER table will yield the name of each user. It can also be used to determine who did not log on (as you have the users who were active, the missing users are the ones that did NOT log in during the collection period).
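One way to do that coupling on the command line is with the standard join utility, assuming the SC_USER table has been exported as a userid,name CSV. The file names and sample rows below are hypothetical; join only requires both inputs to be sorted on the join field:

```shell
# Hypothetical extract of the SC_USER table (userid,name)
printf '%s\n' 'user01,Alice Smith' 'user02,Bob Jones' >sc_user.csv
# Hypothetical pass 1 output (date,time,userid)
printf '%s\n' '12/Jul/2010,09:15,user01' '12/Jul/2010,09:16,user02' >demo_userlist.csv

# join needs both files sorted on the join field:
# field 3 of the user list, field 1 of the SC_USER extract.
sort -t, -k3,3 demo_userlist.csv >userlist.sorted
sort -t, -k1,1 sc_user.csv >users.sorted
join -t, -1 3 -2 1 userlist.sorted users.sorted
# Each output line is: userid,date,time,name
```

Users present in the SC_USER extract but absent from the join output are the ones who did not log in during the collection period.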

This information may be loaded into a database or post-processed by another script to count the number of active users in each minute of the day. The script below takes the files produced by the last process and creates active user counts:



#!/usr/bin/ksh
# Find all the *_userlist.csv files in the current directory
# 1) Sort each file to remove duplicates (just in case the files were
#    combined); otherwise you might count a user twice.
# 2) For each date/time combination work out how many users are active.
# 3) Output a date, time, number CSV file called *_summary.csv
ls -1 *userlist.csv | while read _file
do
   echo `date` : Started Parsing ${_file}
   export _tmpfile=$$.tmp
   # De-duplicate the input once so each user is counted at most once per minute
   sort -u ${_file} >${_tmpfile}.dedup
   # Build the list of unique date/time keys
   cut -f1,2 -d "," ${_tmpfile}.dedup | sort -u >${_tmpfile}
   echo `date` : Pass 2 Started - Creating user summary
   # Set the file name for the output
   export _outfile=`basename ${_file} _userlist.csv`
   export _outfile=${_outfile}_summary.csv
   # Delete the file if it exists as you are going to append to it
   rm -f ${_outfile} 2>/dev/null
   # Loop through the keys and count the number of active users per minute.
   # Count against the de-duplicated copy, and anchor the pattern so one
   # key cannot partially match another line.
   cat ${_tmpfile} | while read _key
   do
      export _numrows=`grep -c "^${_key}," ${_tmpfile}.dedup`
      echo ${_key},${_numrows} >>${_outfile}
   done
   rm -f ${_tmpfile} ${_tmpfile}.dedup 2>/dev/null
   echo `date` : Pass 2 Ended - Check ${_outfile} for results
   echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished



This will produce a comma-delimited file containing the date, time (to the minute) and the count of active users, which can be loaded into Excel to produce relevant graphs.
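As an aside, the per-key grep loop in pass 2 can also be replaced by a single pass over the data with uniq -c, which may be noticeably faster on large files. The sample rows below are hypothetical stand-ins for a *_userlist.csv file:

```shell
# Count unique users per date/time key in one pass over pass 1 output.
# sort -u removes duplicate user rows; cut keeps only the date/time key,
# and uniq -c counts how many users shared each key.
printf '%s\n' \
  '12/Jul/2010,09:15,user01' \
  '12/Jul/2010,09:15,user02' \
  '12/Jul/2010,09:16,user01' |
  sort -u | cut -f1,2 -d "," | uniq -c |
  awk '{print $2 "," $1}'
# Expected output:
# 12/Jul/2010,09:15,2
# 12/Jul/2010,09:16,1
```

Note that uniq -c only counts adjacent duplicates, which is why the sort must come before it; cut preserves the sorted order of the keys.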
