Abstract
In statistical process control (SPC), models are built from baseline data, that is, observations collected during successful production. Often the baseline data have to be extracted from a long stream of historical data that includes observations from both successful and unsuccessful production. Baseline periods must be identified correctly to ensure that the SPC models are valid and, subsequently, that the on-line monitoring based on these models is effective.
This paper proposes a new method to identify baseline periods in a long historical dataset. The method identifies baseline periods in which the quality is good, the quality variable has a stable distribution, and the time intervals are sufficiently long. The proposed method is tested on a real dataset from a melting process and yields a baseline that the process engineers consider reasonable and convincing. Simulation experiments also show that the proposed method is robust to the distribution of the quality variable: it consistently identifies correct baseline periods across different distributions. In contrast, two existing methods of change-point identification are very sensitive to distributional assumptions.