Cycle Detection Tutorial
========================

This tutorial is for cycle detection in time series data using autocorrelation.
A set of candidate lags is provided. The lag with the highest autocorrelation
corresponds to a cycle.
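
To make the idea concrete, here is a minimal numpy sketch of the technique. It
is an illustration only, not the project's Spark implementation, and the
function names and numbers in it are hypothetical.

import numpy as np

def autocorrelation(x, lag):
    # correlation between the series and a copy of itself shifted by lag
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def detect_cycle(x, candidate_lags):
    # the candidate lag with the highest autocorrelation is taken as the cycle
    return max(candidate_lags, key=lambda lag: autocorrelation(x, lag))

# hourly readings over 15 days with a 24 hour cycle plus noise
hours = np.arange(24 * 15)
series = 50 + 20 * np.sin(2 * np.pi * hours / 24) + np.random.normal(0, 2, hours.size)
print(detect_cycle(series, [6, 12, 24, 48]))   # expect 24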

Environment
===========
Paths etc. shown here correspond to my environment. Please change them as needed
for your environment.

Build
=====
Follow the instructions in spark_dependency.txt

Python dependency
=================
The shell script commands for data generation run Python scripts. Before you run
the data generation commands, do the following (see the example after this list)
1. Check out the project avenir
2. Copy the avenir/python/lib directory to ../lib, relative to the location of
your cpu_usage.py file
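
For example, assuming avenir is the GitHub project at
https://github.com/pranab/avenir (an assumption; use whatever location you
actually clone it from), the two steps look like this

git clone https://github.com/pranab/avenir.git
cd <directory_containing_cpu_usage.py>
cp -r <path_to_avenir>/python/lib ../lib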

Create input data
=================
./and_spark.sh crInput <num_of_days> <reading_interval> <num_servers> <output_file>

where
num_of_days = number of days, e.g. 15
reading_interval = reading interval in seconds, e.g. 300
num_servers = number of servers, e.g. 4
output_file = output file; we will use c.txt from now on

Copy the output to the input path for the NumericalAttrStats and TemporalAggregator Spark jobs
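
As a rough illustration of this kind of input, the sketch below writes per
server readings at a fixed interval with a daily cycle. The field layout
(server id, timestamp, usage) and the file name are assumptions for
illustration, not necessarily the actual output format of cpu_usage.py.

import math, random, time

num_days, interval_sec, num_servers = 15, 300, 4   # matches the example arguments above
end = int(time.time())
start = end - num_days * 86400
with open("c.txt", "w") as f:
    for s in range(num_servers):
        sid = "server%d" % s
        for ts in range(start, end, interval_sec):
            # one cycle per day: usage peaks once every 24 hours, plus noise
            phase = 2 * math.pi * (ts % 86400) / 86400
            usage = 50 + 30 * math.sin(phase) + random.gauss(0, 5)
            f.write("%s,%d,%d\n" % (sid, ts, int(usage)))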

Run Spark job for stats
=======================
./cyd.sh numStat

Copy and consolidate stats file
===============================
./and_spark.sh crStatsFile

Aggregate to hourly
===================
If the sampling interval is in minutes or seconds, aggregate to hourly averages
./cyd.sh tempAggr
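
For intuition, hourly averaging amounts to truncating each timestamp to its
hour and averaging the readings per server and hour. Below is a minimal sketch
under the same assumed record layout as above; the real work is done by the
TemporalAggregator Spark job.

from collections import defaultdict

sums = defaultdict(lambda: [0.0, 0])            # (server, hour) -> [sum, count]
with open("c.txt") as f:
    for line in f:
        sid, ts, usage = line.strip().split(",")
        hour = int(ts) // 3600 * 3600           # truncate epoch seconds to the hour
        cell = sums[(sid, hour)]
        cell[0] += float(usage)
        cell[1] += 1

for (sid, hour), (total, count) in sorted(sums.items()):
    print("%s,%d,%.2f" % (sid, hour, total / count))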

Copy and consolidate aggregate output
=====================================
./cyd.sh crAucInput

Run Spark job for autocorrelation
=================================
./cyd.sh autoCor

Configuration
=============
Configuration is in cyd.conf. Make changes as necessary.
