
mars RaqForum 25 No.
0 View • 4 Hours ago

7.1 Output yield and production route

compilation of industrial mathematics algorithms(39)

In industrial settings, the output yields of certain units often exhibit significant fluctuations, and some of these outputs require special attention. These outputs of interest are often associated with specific production routes. For example, in a catalytic cracking unit, a high gasoline yield indicates a gasoline route, while a high diesel yield indicates a diesel route. The goal is to classify historical yield data into a specified number of clusters, so that these clusters correspond to the respective production routes. Subsequently, when modeling (e.g., building a linear model under mass conservation constraints), data from the same production route are selected as input data to improve model accuracy.

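The yield data of the different outputs for a given input are represented by the matrix X:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

where n is the number of outputs and m is the number of daily records, so x_{ij} is the yield of the j-th output on the i-th day. Each row represents the yields of the different outputs for a particular day's production data, and each column represents the daily yields of a specific output.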

Our aim is to classify the yield data, X. Since the historical data carry no pre-existing labels, unsupervised clustering is necessary. Because the number of production routes is specified in advance and fixes the number of clusters k, k-means clustering satisfies our requirements.

Classifying the yield data, X, into k clusters is performed as follows:

1. Randomly select k data points as the initial cluster centroids C:

$$C = [C_1, C_2, \dots, C_k]$$

where C_t is the centroid of the t-th cluster in C, chosen randomly from the rows of X.
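As a minimal sketch of step 1 in Python (a cross-check, not the SPL routine of this article), `random.sample` draws k distinct rows of X as starting centroids. The function name `init_centroids`, the `seed` parameter, and the sample yields are assumptions made for illustration.

```python
import random

def init_centroids(X, k, seed=None):
    """Pick k distinct rows of X at random as the initial centroids C."""
    rng = random.Random(seed)
    # Copy each chosen row so later centroid updates never modify X itself.
    return [list(row) for row in rng.sample(X, k)]

# Hypothetical yield rows: one row per day, one column per output.
X = [[0.50, 0.30], [0.52, 0.28], [0.30, 0.50], [0.31, 0.49]]
C = init_centroids(X, k=2, seed=0)
```

Fixing the seed makes a run repeatable, which helps when comparing clusterings of the same historical data.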

2. Calculate the distance from each data point to each of the k centroids. Assign each data point to the cluster whose centroid is closest:

$$xc_i = \operatorname{pmin}_{t}\big(\operatorname{dis}(X_i, C_t)\big), \quad t \in \{1, 2, \dots, k\}$$

where xc_i is the cluster label for the i-th row X_i of X, C_t is the centroid of the t-th cluster, dis() is the distance calculation method, and pmin() returns the position t of the smallest distance.

Yields from the same production route are similar. If each row in X is viewed as a point in a high-dimensional space, points from the same production route should cluster together, while points from different production routes should be relatively far apart. Therefore, we choose Euclidean distance as the distance metric.
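The assignment step with Euclidean distance can be sketched in Python as follows. This is an illustrative cross-check, not the article's SPL code; `assign_clusters` and the sample data are assumed names and values, and labels are 1-based to match the position returned by pmin.

```python
import math

def assign_clusters(X, C):
    """Return, for each row of X, the 1-based index of the nearest centroid in C."""
    labels = []
    for row in X:
        # Euclidean distance from this row to every centroid.
        distances = [math.dist(row, c) for c in C]
        labels.append(distances.index(min(distances)) + 1)
    return labels

# Hypothetical yields: two gasoline-heavy days and two diesel-heavy days.
X = [[0.50, 0.30], [0.52, 0.28], [0.30, 0.50], [0.31, 0.49]]
C = [[0.50, 0.30], [0.30, 0.50]]
labels = assign_clusters(X, C)
print(labels)  # [1, 1, 2, 2]
```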

3. Update the centroids.

Compute the mean of each output's yield within each cluster and use these means as the new centroids, CN:

$$CN_t = \big[\operatorname{avg}(Xt_1), \operatorname{avg}(Xt_2), \dots, \operatorname{avg}(Xt_n)\big], \quad t \in \{1, 2, \dots, k\}$$

where CN_t is the new centroid of the t-th cluster, and Xt_j denotes the yields of the j-th output over the data points in X that belong to the t-th cluster.
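A minimal Python sketch of the centroid update is shown below. The name `update_centroids` is hypothetical, and raising an error for an empty cluster is an assumption, since the text does not say how that case should be handled.

```python
def update_centroids(X, labels, k):
    """Return the column-wise mean of the rows assigned to each cluster 1..k."""
    new_centroids = []
    for t in range(1, k + 1):
        members = [row for row, label in zip(X, labels) if label == t]
        if not members:
            # Assumption: treat an empty cluster as an error.
            raise ValueError(f"cluster {t} is empty")
        # zip(*members) iterates over columns, i.e. over outputs j = 1..n.
        new_centroids.append([sum(col) / len(members) for col in zip(*members)])
    return new_centroids

X = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
CN = update_centroids(X, labels=[1, 1, 2], k=2)
print(CN)  # [[2.0, 3.0], [10.0, 20.0]]
```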

4. Repeat steps 2 and 3 until the centroids no longer move or the maximum number of iterations N is reached.

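That is, iteration stops when the sum of the distances between the new centroids and the centroids of the previous iteration falls below a very small threshold ε, or when the iteration count reaches N:

$$\sum_{t=1}^{k} \operatorname{dis}(CN_t, CO_t) < \varepsilon \quad \text{or} \quad iter = N$$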

where CO_t is the centroid of the t-th cluster from the previous iteration, ε is a very small number, iter is the number of iterations, and dis() is the distance calculation method.
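The stopping rule can be written as a small Python predicate. This is a sketch under assumptions: `should_stop` is a hypothetical name, the default ε of 1e-4 is borrowed from the SPL routine in this article, and `>=` is used instead of `==` for the iteration cap as a defensive choice.

```python
import math

def should_stop(CN, CO, iteration, max_iter, eps=1e-4):
    """True when total centroid movement is below eps or the iteration cap is reached."""
    shift = sum(math.dist(new, old) for new, old in zip(CN, CO))
    return shift < eps or iteration >= max_iter

CO = [[0.0, 0.0], [1.0, 1.0]]
CN = [[3.0, 4.0], [1.0, 1.0]]  # the first centroid moved by a distance of 5.0
print(should_stop(CN, CO, iteration=3, max_iter=300))    # False
print(should_stop(CO, CO, iteration=3, max_iter=300))    # True
print(should_stop(CN, CO, iteration=300, max_iter=300))  # True
```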

Upon completion of the iterations, each point in X is assigned to the cluster associated with the nearest centroid, C_t.

$$xc_i = \operatorname{pmin}_{t}\big(\operatorname{dis}(X_i, C_t)\big), \quad t \in \{1, 2, \dots, k\}$$

SPL routine:

	A	B	C	D
1	[[1,2,3,4],[2,3,1,2],[1,1,1,-1],[1,0,-2,-6]]			/X
2	=k=2		/Number of clusters
3	=iter=300		/Maximum number of iterations N
4	=center=null		/Initial centroids C
5	=it=0
6	=func(A7,A1,A2,A3,A4)
7	func
8		if !D7	=D7=A7.sort(rand()).to(k)	/Random centroids
9			=it+=1
10			return func(A7,A7,B7,C7,D7)
11		else	=A7.((d=~,D7.(dis(~,d))))
12			=C11.(~.pmin())
13			=A7.group(C12(#))
14			=C13.((cent=mmean(~,1).~,if(ifa(cent),cent,[cent])))
15			=C14.sum(dis(~,D7(#)))	/Sum of distances between new and old centroids
16			1E-4	/ε
17			if C15<C16||it==C7
18				=C14
19				=C12
20				return [D18,D19]
21			else	=D7=C14
22				=it+=1
23				return func(A7,A7,B7,C7,D7)
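
For readers who want to cross-check the routine outside SPL, the four steps can be sketched in plain Python on the same sample data as cell A1. This is an illustrative loop, not a translation of the grid above: the names `nearest` and `kmeans`, the `seed` parameter, and the choice to let an empty cluster keep its previous centroid are all assumptions.

```python
import math
import random

def nearest(row, centroids):
    """1-based index of the centroid closest to row (Euclidean distance)."""
    distances = [math.dist(row, c) for c in centroids]
    return distances.index(min(distances)) + 1

def kmeans(X, k, max_iter=300, eps=1e-4, seed=None):
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(X, k)]  # step 1: random rows
    for _ in range(max_iter):
        labels = [nearest(row, centroids) for row in X]  # step 2: assign
        new_centroids = []
        for t in range(1, k + 1):                        # step 3: update
            members = [row for row, lab in zip(X, labels) if lab == t]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[t - 1])
        shift = sum(math.dist(n, o) for n, o in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < eps:                                  # step 4: converged
            break
    labels = [nearest(row, centroids) for row in X]      # final assignment
    return centroids, labels

X = [[1, 2, 3, 4], [2, 3, 1, 2], [1, 1, 1, -1], [1, 0, -2, -6]]
centroids, labels = kmeans(X, k=2, seed=0)
print(centroids, labels)
```

Because the initial centroids are random, the cluster numbering (and on less separable data, the clusters themselves) can differ between runs and from the SPL result.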