Tracking users by post-processing the access.log

Most log analysers do not produce all the statistics that a site might want to track. One important usage statistic is the number of active users (i.e. users that are actively using the system, not idle users), which can be used to correlate performance events with the level of usage on the system.
It is possible to post-process the access.log files using a set of UNIX scripts to summarize the information, as the basics such as userid, date and time are contained in the file. The key to processing this data is to:

  • Filter out the superfluous data and isolate the userid, date and time from the access.log records. Remember, a record is written to the access.log when the user is actively requesting a resource on the web server; this is our definition of an "active" transaction record.

  • Summarize that subset by the desired time frame (say, each minute of the day). This cuts the information down to the basics needed for analysis. After the summarization you should end up with a unique record for each user in each minute of the day for the dates you require.
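To make the extraction concrete, here is a sketch of the first pass applied to a single access.log record; the IP address, userid and timestamp are made up for illustration, and the record is assumed to be in the common log format with the userid in the third space-delimited field:

```shell
# Hypothetical access.log record (common log format, userid in field 3):
line='10.1.1.1 - jsmith [01/Jul/2010:09:15:32 +1000] "GET /app HTTP/1.1" 200 1234'
# Extract userid and date/time, trim to the minute, and comma-separate:
echo "$line" | cut -f3,4 -d " " \
             | awk '{print substr($2,2,17),$1}' \
             | sed -e "s% %,%g" | sed -e "s%:%,%"
# Produces: 01/Jul/2010,09:15,jsmith
```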

Note: The code below is not optimized and is provided as-is. I am not a great script programmer and I am sure some of you out there could improve on it.

The script below is an example of such a set of commands to summarize that information:

# Find all the access.log files in the log directory
# 1) We read each record from the access.log files (cat)
# 2) We filter out the console (grep)
# 3) We get the date time and userid from the access log record (cut)
# 4) We substring the date/time to chop off the seconds, the first "[" char and move the fields around (awk)
# 5) We filter out any records containing "-" (grep)
# 6) We remove duplicates within that minute (sort)
# 7) We comma-separate the date/time and userid fields (sed)
# 8) Change the ":" within the date/time field to separate date and time (sed) - only the first one, which should be at the end of the date
# At the end of pass 1 you should have a CSV that has a unique user entry for a date and time to the minute level. This implies that the user was active for that minute on that date.
ls -1 $SPLEBASE/logs/system/*access.log | while read _file
do
 echo `date` : Started Parsing ${_file}
 echo `date` : Pass 1 Started - Creating user list by minute
 export _outfile=${_file}_userlist.csv
 cat ${_file} | grep -v console | cut -f3,4 -d " " | awk '{print substr($2,2,17),$1}' | grep -v "-" | sort -u | sed -e "s% %,%g" | sed -e "s%:%,%" >${_outfile}
 echo `date` : Pass 1 Ended - Check ${_outfile} for results
 echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished

The above set of commands creates a comma-delimited file containing the date, time (to the minute only) and userid.
Note: Additionally, the above information, when coupled with information from the SC_USER table, will yield the name of each user. It can also be used to determine who did not log on (as you have the users who were active, the missing users are the ones that DID NOT log in during the collection period).
This information may be loaded into a database or post-processed by another script to count the number of users in each minute of the day. The script below is such a script, which takes the files produced by the last process and creates active user counts:
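As a sketch of the "who did not log in" idea: assuming you have exported the userids from SC_USER into a sorted file (all_users.txt is a made-up name here), comm can list the users that never appear in the active list:

```shell
# Hypothetical filenames: all_users.txt is a sorted userid export from SC_USER,
# access.log_userlist.csv is a file produced by Pass 1 above.
cut -f3 -d "," access.log_userlist.csv | sort -u > active_users.txt
# Lines only in all_users.txt are users that did NOT log in during the period
comm -23 all_users.txt active_users.txt
```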

# Find all the *_userlist.csv files in the current directory
# 1) Sort the file to remove duplicates (just in case the files were combined)
# Otherwise you might count a user twice.
# 2) For each date/time combination work out how many users are active.
# 3) Output a date, time, number CSV file called _summary.csv
ls -1 *_userlist.csv | while read _file
do
 echo `date` : Started Parsing ${_file}
 export _tmpfile=$$.tmp
 cat ${_file} | cut -f1,2 -d "," | sort -u >${_tmpfile}
 echo `date` : Pass 2 Started - Creating user summary
 # Set the file name for the output
 export _outfile=`basename ${_file} _userlist.csv`
 export _outfile=${_outfile}_summary.csv
 # Delete the file if it exists as you are going to append to it
 rm -f ${_outfile} 2>/dev/null
 # Loop through the unique date/time keys and count the users for each
 cat ${_tmpfile} | while read _key
 do
  export _numrows=`grep -c "${_key}" ${_file}`
  echo ${_key},${_numrows} >>${_outfile}
 done
 rm -f ${_tmpfile} 2>/dev/null
 echo `date` : Pass 2 Ended - Check ${_outfile} for results
 echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished

This will produce a comma-delimited file containing the date, time (to the minute) and the count of active users, which can be loaded into Excel to produce relevant graphs.
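As an aside, the per-minute counts can also be produced in a single awk pass over a user list file, avoiding the grep-per-key loop in Pass 2; the filename here is illustrative:

```shell
# Count unique users per date/time key in one pass (input is date,time,userid)
sort -u access.log_userlist.csv \
  | awk -F, '{count[$1","$2]++} END {for (k in count) print k","count[k]}' \
  | sort
```

For large log files this scales much better, since the original inner loop re-reads the whole user list file once per minute of the day.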



Anthony Shorten
Hi, I am Anthony Shorten, the Principal Product Manager for the Oracle Utilities Application Framework. I have been working for over 20 years in the IT business and am the author of many technical whitepapers, manuals and training materials. I am one of the product managers working on strategy and designs for the next generation of the technology used for the Utilities and Tax markets. This blog is provided to announce new features, document tips and techniques and also outline features of the Oracle Utilities Application Framework based products. These products include Oracle Utilities Customer Care and Billing, Oracle Utilities Meter Data Management, Oracle Utilities Mobile Workforce Management and Oracle Public Service Revenue Management. I am the product manager for the Management Pack for these products.

