The `development` branch holds the new version with a database implementation.
This analyzer uses IIS logs to create statistics as raw text files and as CSV files. Besides the mentioned text reports, the analyzer also creates a directory called `graphs`, which stores a collection of charts created with matplotlib.
- Python > 3.5 (not tested with Python 2.X)
- matplotlib >= 3.1.1
- Memory >= size of the log files (a future version will probably migrate to a real database to remove this constraint)
- Execute `main.py` once to create the necessary directories
- Copy your log files into the `input` folder
- Execute `main.py` again (the full workflow is sketched below)
- The raw text statistics are now inside the `reports` directory, the CSV files inside `csvs`, and the `output` directory contains a single file that is the combination of all your log files
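In short, the whole workflow looks like this on the command line (the Python executable name may differ on your system):

```
python main.py   # first run creates the necessary directories
# copy your log files into the input directory, then:
python main.py   # second run parses the logs and writes all reports
```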
The following statistics are created by the analyzer in raw text and CSV:
Report name | Information | CSV name | Chart |
---|---|---|---|
Browser | A list of all browsers communicating with the IIS and the number of HTTP requests each browser has made | HitsPerBrowser | Yes |
HitsPerDay | A list of all weekdays and the HTTP requests made on that day (whole period) | HitsPerDay | Yes |
HitsPerEndpoint | A list of all endpoints targeted by clients and the number of requests | HitsPerEndpoint | Yes (Top 10) |
HitsPerHour | All hours and their HTTP request counts (whole period) | HitsPerHour | Yes |
HitsPerMonth | A list of all months of the period and their HTTP requests during that time | HitsPerMonth | Yes |
HTTPCode206 | A list of months in which HTTP 206 codes occurred, the count of that code, and which endpoint caused it | HTTP206HitsPerMonth | No |
HTTPCodeHits | A list of the returned HTTP status codes and their counts | HitsPerHTTPCode | Yes |
IpHits | HTTP requests per IP address | HitsPerIP | No |
OS | A list of all operating systems hitting the IIS and their hit counts | HitsPerOS | Yes |
UsersPerMonth | A list of usages¹ per month | UsagesPerMonth | No |
¹ A usage is defined as the time an IP address communicates with the IIS; once a new IP address communicates, another usage is counted.
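To give an idea of what these reports contain, a per-weekday hit count like HitsPerDay can be derived from the parsed entries in a few lines. This is a minimal sketch and not the analyzer's actual implementation; it assumes `entries` is a list of `Logentry` objects as defined further below:

```python
from collections import Counter
from datetime import datetime

def hits_per_weekday(entries):
    """Count HTTP requests per weekday over the whole period."""
    return Counter(
        datetime.strptime(entry.date, "%Y-%m-%d").strftime("%A")
        for entry in entries
    )

# e.g. Counter({'Monday': 1520, 'Tuesday': 1498, ...})
```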
The directory `html_report` contains a `report.html`, which includes most of the charts from the `graphs` directory together with a basic explanation. The images are embedded as Base64 strings inside the HTML file, which means the file can be moved without losing content.
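The embedding technique itself is simple. The following is a minimal sketch (not the analyzer's actual code) of how a PNG chart can be turned into a self-contained `<img>` tag; the file name is only an example:

```python
import base64

def embed_png(path):
    """Return an <img> tag with the PNG embedded as a Base64 data URI."""
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return '<img src="data:image/png;base64,{}">'.format(encoded)

# html_snippet = embed_png("graphs/HitsPerDay.png")
```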
The script supports the following formats out of the box (an example line is shown after the list):
- date, time, s_ip, cs_method, cs_uri_stem, cs_uri_query, s_port, cs_username, c_ip, user_agent, referer, sc_status, sc_substatus, sc_win32_status, cs_bytes, time_taken
- date, time, s_ip, cs_method, cs_uri_stem, cs_uri_query, s_port, cs_username, c_ip, user_agent, referer, sc_status, sc_substatus, sc_win32_status, time_taken
- date, time, s_ip, cs_method, cs_uri_stem, cs_uri_query, s_port, cs_username, c_ip, user_agent, sc_status, sc_substatus, sc_win32_status, time_taken
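For reference, a line in the first (16-field) format looks like this; all values are invented for illustration:

```
2019-07-01 12:34:56 10.0.0.1 GET /index.html - 80 - 203.0.113.7 Mozilla/5.0 - 200 0 0 312 15
```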
The analyzer uses a model to load a log entry. The model looks like this and is located in `lib/models.py`:
```python
class Logentry:
    def __init__(self, date, time, s_ip, cs_method, cs_uri_stem, cs_uri_query, s_port, cs_username,
                 c_ip, user_agent, referer, sc_status, sc_substatus, sc_win32_status, cs_bytes, time_taken):
        self.date = date
        self.time = time
        self.s_ip = s_ip
        self.cs_method = cs_method
        self.cs_uri_stem = cs_uri_stem
        self.cs_uri_query = cs_uri_query
        self.s_port = s_port
        self.cs_username = cs_username
        self.c_ip = c_ip
        self.user_agent = user_agent
        self.referer = referer
        self.sc_status = sc_status
        self.sc_substatus = sc_substatus
        self.sc_win32_status = sc_win32_status
        self.cs_bytes = cs_bytes
        self.time_taken = time_taken
```
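For illustration, a single entry could also be constructed by hand; every value below is invented:

```python
# Assumes Logentry from lib/models.py is imported.
entry = Logentry(
    "2019-07-01", "12:34:56", "10.0.0.1", "GET", "/index.html", "-",
    "80", "-", "203.0.113.7", "Mozilla/5.0", "-", "200", "0", "0",
    "312", "15",
)
print(entry.cs_uri_stem, entry.sc_status)  # -> /index.html 200
```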
The entries are loaded with the following function in `lib/helpers.py`:
```python
def read_file(self, filepath):
    print(":: Loading File {}".format(filepath))
    entries = []
    with open(filepath, "r", encoding=self.encoding) as log_file:
        for line in log_file:
            # strip the trailing newline so the last field parses cleanly
            data = line.strip().split(" ")
            if len(data) == 16:  # full format: all fields present
                entries.append(
                    Logentry(data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7], data[8],
                             data[9], data[10], data[11], data[12], data[13], data[14], data[15]))
            elif len(data) == 15:  # format without cs_bytes
                entries.append(
                    Logentry(data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7], data[8],
                             data[9], data[10], data[11], data[12], data[13], 0, data[14]))
            elif len(data) == 14:  # format without referer and cs_bytes
                entries.append(
                    Logentry(data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7], data[8],
                             data[9], 0, data[10], data[11], data[12], 0, data[13]))
    return entries
```
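Since `read_file` takes `self`, it lives on a helper object defined in `lib/helpers.py`; that class is not shown above, so the minimal wrapper below is an assumption made purely for this usage sketch:

```python
# Hypothetical wrapper; the real helper class in lib/helpers.py may look different.
class LogReader:
    def __init__(self, encoding="utf-8"):
        self.encoding = encoding  # read_file expects self.encoding to exist

LogReader.read_file = read_file  # attach the function shown above as a method

reader = LogReader()
entries = reader.read_file("input/u_example.log")  # hypothetical file name
print("Loaded {} entries".format(len(entries)))
```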
If you have a log file with a different format that uses the same or fewer fields than defined in the model, you can adjust the `read_file` method to load your log file correctly. If your log uses fields that are not defined in the model, you have to adjust the model first and then the `read_file` method.
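As an example, supporting a hypothetical 13-field format that omits cs_uri_query, referer, and cs_bytes would only need one more branch in the `if`/`elif` chain of `read_file`; the missing fields are filled with 0, just like in the existing branches:

```python
# Hypothetical extra branch inside read_file; the 13-field format is an assumption.
elif len(data) == 13:  # no cs_uri_query, referer, or cs_bytes
    entries.append(
        Logentry(data[0], data[1], data[2], data[3], data[4], 0, data[5], data[6], data[7],
                 data[8], 0, data[9], data[10], data[11], 0, data[12]))
```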