SAP Video about the avato Industrie 4.0 project at Steinbeis

“Faster” and “better” are the keywords around which the Industry 4.0 project at Steinbeis Papier revolves. With avato consulting as a strategic partner, these goals were achieved. What does it take? 25,000 sensors and a solution tailored by avato that delivers data from production every second and, combined with data from MES, ERP and other systems on the SAP HANA platform, evaluates it in near real time with the help of machine learning. Production is not the only area that benefits: the new platform is also used in purchasing, materials management and controlling, and further areas are planned.

Click here to watch the video for more information about the project.

UPDATE: avato customer Steinbeis also wins EMEA Regional SAP Quality Award

After winning the SAP Quality Award 2019 in the Innovation category in Germany, avato customer Steinbeis Papier GmbH has now also won the EMEA Regional Award for its Industry 4.0 project.

Industry 4.0, IoT, Big Data, AI – just buzzwords or ingredients for successful digitalization in medium-sized businesses?

A focus on relevant business results and an excellent team, combined with the intelligent use of modern technologies, led to convincing results for the medium-sized manufacturer of sustainably produced recycled paper. The independent international jury of the EMEA SAP Quality Awards therefore awarded gold in the Innovation category to the Industry 4.0@Steinbeis project, which was carried out with avato’s support. Out of all the winning projects of the 15 SAP market units, it was selected as “the best of the best SAP implementation projects” in the Innovation category.

If you want to learn more about the project, Steinbeis Papier and the services of avato consulting, please read on…

UPDATE: EMEA Regional SAP Quality Award Gold in the category Innovation for Industry 4.0 @ Steinbeis

In mid-2017, Steinbeis Papier GmbH chose avato consulting ag as a strategic partner for its digitization initiative, titled Industry 4.0 @ Steinbeis Papier. In a short preparation phase, the goals were defined and possible application scenarios were evaluated and prioritized for implementation in a roadmap.

The implementation started at the beginning of 2018. The technical platform was set up, more than 25,000 sensors were integrated and the prioritized application scenarios were implemented in several release cycles.

The project led to impressive results for the medium-sized manufacturer of sustainably produced recycled paper. These convinced not only the Steinbeis management, but also the jury of the SAP Quality Awards 2019. The project was initially awarded gold in the Innovation category by SAP Germany in mid-December. The jury was particularly impressed by how the joint team of Steinbeis and avato significantly improved production and maintenance processes within a tight time and budget frame through intelligent and innovative use of modern technologies and with a data-driven, agile approach.

UPDATE: At the end of April 2020, the project was also awarded the Regional SAP Quality Gold Award in the Innovation category. In a multi-stage selection process, an independent jury selected it as “the best of the best SAP implementation projects” in the Innovation category from among all local SAP Quality Award winners of the 15 SAP market units in Europe, the Middle East and Africa (EMEA region).

In the first project phase, the focus was on optimizing production processes and the value chain. In particular, the early automated detection of unusual events and procedures in the production facilities and the production process delivers significant optimizations in yields and quality, but also helps to prevent expensive machine and plant failures.

Currently, the integrated data from production and business management applications, together with the analysis and machine learning tools, is being used to implement various application scenarios in materials management, purchasing and controlling.

The solution used at Steinbeis is based on the avato Smart Data Framework. The SAP HANA in-memory database is used to collect the data from the various production and quality systems and to analyze it in conjunction with information from the MES and SAP system. Furthermore, various modern machine learning algorithms and IT tools are used to quickly and cost-efficiently process the large amounts of data into practically usable information.

If you would like to know more about the procedure, the tools used and the experiences, please contact us or browse through our blog.



Start free, grow at low cost


SAP HANA Projects do not have to start with a major investment

SAP HANA, Express Edition (HANA XE) is a slim version of the SAP HANA platform, which, with a few exceptions, offers all important functionalities in a compact and ready-to-use format. With this multimodal in-memory database platform, a wide variety of data can be processed efficiently.

Among the most important modules and functions are:

  • Relational DB Engine (OLTP + OLAP)
  • Graph Engine
  • Text processing module
  • Geo-spatial module
  • Document Store
  • Time Series
  • In-database Predictive / Machine Learning

The different engines can be applied simultaneously and directly to your data. The data does not have to be duplicated and managed multiple times between different distributed software components, which is often the case with other data architectures. This means that even complicated use cases can be tackled quickly and efficiently with a relatively simple architecture.

Furthermore, the platform offers several built-in machine learning, AI and business algorithms. Since these are implemented very close to the data, query response times in fractions of a second – even on mass data – are possible.

The software license enables both non-productive and productive use cases. This means that a HANA XE can be used not only to create prototypes but also to deploy productive applications.


Commercial Aspects

HANA XE comes with all important HANA features free of charge for up to 32 GB of RAM. This allows SAP HANA projects to start quickly and at low cost, without lengthy license procurement. PoC and pilot projects in particular, which are almost always recommended in the data analytics and machine learning area, benefit from fast and uncomplicated access to the very powerful SAP HANA technology.

If the free 32 GB of RAM is no longer sufficient, the license can easily be extended up to 128 GB with license upgrades from the SAP Store.

Due to its efficient data compression, SAP HANA requires significantly less memory than conventional databases. Depending on the data and the chosen data model, compression factors of 5 to 7 are common. Compared to uncompressed CSV data, a factor of up to 15 can even be achieved.
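As a rough back-of-the-envelope illustration of what these compression factors mean for the 32 GB and 128 GB license tiers, the following sketch multiplies memory size by compression factor. It is only an upper bound under a simplifying assumption: all licensed memory is treated as holding compressed data, and the working memory that queries need is ignored. The function name is hypothetical.

```python
def raw_data_capacity_gb(ram_gb, compression_factor):
    """Upper bound on the uncompressed data volume that fits into `ram_gb`.

    Simplifying assumption: all licensed memory holds compressed data;
    the working memory needed for query processing is ignored.
    """
    return ram_gb * compression_factor


# Compression factors 5 and 7 are the "common" range quoted in the text.
for ram_gb in (32, 128):
    low = raw_data_capacity_gb(ram_gb, 5)
    high = raw_data_capacity_gb(ram_gb, 7)
    print(f"{ram_gb} GB RAM: roughly {low} to {high} GB of uncompressed data")
```

In practice the usable volume will be lower, since part of the memory must stay free for calculations.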


“In the wild” – Examples from the field

In our avato Smart Data and SAP consulting practice, the SAP HANA XE platform has established itself as an all-purpose tool. With this toolbox, avato has been able to implement even complex use cases on low budgets. Examples are:

  • Master data generation and optimization with ML methods; analysis and processing of SAP transaction data from 10+ years
  • Digital Assistants for SAP ERP by replicating master and transaction data from SAP ERP into a SAP HANA XE for super-fast analysis and processing using ML algorithms, and delivering the results to SAP ERP via digital assistant with or without user interaction
  • Advanced production controlling applications: material and document flows (e.g. batches) are converted into graphs. Key performance indicators can then be calculated easily using graph analysis methods. Even more complex analyses can be performed by applying graph algorithms to the data
  • Advanced production analysis applications: historical and near real-time production data from complex chemical manufacturing processes are represented as graphs together with plant and asset information. Even complex questions can be answered easily using the graph
  • Real-time reporting for several business areas (including procurement, controlling, production). The use of Virtual Data Models (VDM) and HANA XSA modelling, as well as analyzing the data in SAP Analytics Cloud (SAC), allows powerful reporting solutions in a short time and with low effort
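To make the graph-based examples above more concrete, here is a minimal sketch of how batch flows can be turned into a graph and queried by traversal. It uses plain Python as a stand-in for the HANA graph engine, and all batch names and quantities are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical material movements: (source batch, target batch, quantity in kg).
MOVEMENTS = [
    ("PULP-001", "REEL-100", 800.0),
    ("PULP-002", "REEL-100", 200.0),
    ("REEL-100", "ROLL-500", 600.0),
    ("REEL-100", "ROLL-501", 400.0),
    ("PULP-003", "REEL-101", 500.0),
    ("REEL-101", "ROLL-502", 500.0),
]


def build_graph(movements):
    """Return forward and backward adjacency lists of the batch graph."""
    downstream = defaultdict(list)
    upstream = defaultdict(list)
    for source, target, _quantity in movements:
        downstream[source].append(target)
        upstream[target].append(source)
    return downstream, upstream


def reachable(start, adjacency):
    """Breadth-first search: all batches reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


downstream, upstream = build_graph(MOVEMENTS)
affected = reachable("PULP-001", downstream)  # where was this raw batch used?
origins = reachable("ROLL-502", upstream)     # what went into this final roll?
print(sorted(affected))
print(sorted(origins))
```

Once the flows are stored as edges, questions such as "which finished rolls are affected by a faulty raw material batch" reduce to a single traversal instead of a chain of table joins.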


We are happy to support you in solving your tasks with SAP HANA XE in a smart way.

SAP HANA XE – Start free, grow at low cost

Do you have any questions? Simply email to:

Date: March 2020
Author: Andor Németh
© 2020 avato consulting ag
All Rights Reserved.

Production Monitoring 4.0 in the Paper Industry

Initial Situation:

How many people are needed to operate a paper machine optimally? Thanks to the high degree of automation, a very small production team can run the machine. Over the last 10 years, some paper manufacturers have increased the production volume per employee by a factor of 10! At the same time, paper production remains a complex dynamic process with many possible settings and influencing factors in a complex production plant. Given the large and still growing number of sensors, fully manual monitoring of the production process by only a few people is impossible in practice. As a result, problems in the system or in operating settings often go undetected. The consequences are unplanned downtime and lower quality of the end product. In many cases, only time-consuming ex-post analyses are possible. Process control systems do offer alarm functionality, but their checks are rule-based and use static limits that do not take operating modes, grades or changes in settings into account. As a result, end users are flooded with alarms, which is why these alarm functions are usually used only to a very limited extent.

Smart Data Approach:

Fully automated, dynamic monitoring of thousands of process signals, with alarms raised when unusual patterns appear in the sensor data, allows problems in production to be identified early. With this new insight derived from data, downtime can be prevented and product quality improved. In the Smart Data alarming system, the normal behaviour of the machine is continuously derived from historical data, taking grades and operating modes into account. Dependent alarms are grouped and prioritized according to importance. In addition to sensor data, monitoring can also be applied flexibly to other data, such as quality parameters or calculated indicators like raw material consumption. The resulting alarms are presented in a user-friendly interface, where end users can investigate and process them further with extended analysis functions.
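The idea of grade-dependent limits derived from historical data can be sketched as follows. This is a deliberately simplified illustration, not the avato implementation: it assumes that "normal" means within k standard deviations of the historical mean for the same grade, and all names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_limits(history, k=3.0):
    """Derive (low, high) limits per grade from historical (grade, value) pairs."""
    by_grade = defaultdict(list)
    for grade, value in history:
        by_grade[grade].append(value)
    limits = {}
    for grade, values in by_grade.items():
        if len(values) < 2:
            continue  # not enough history to estimate a spread
        centre, spread = mean(values), stdev(values)
        limits[grade] = (centre - k * spread, centre + k * spread)
    return limits


def is_unusual(grade, value, limits):
    """True if the value lies outside the learned band for its grade."""
    if grade not in limits:
        return False  # no baseline yet, so no alarm
    low, high = limits[grade]
    return not (low <= value <= high)


# Hypothetical sensor readings (for example a temperature) for two paper grades.
history = (
    [("A", v) for v in (80.0, 81.0, 79.5, 80.5, 80.2)]
    + [("B", v) for v in (95.0, 96.0, 94.5, 95.5, 95.2)]
)
limits = learn_limits(history)
print(is_unusual("A", 95.0, limits))  # far outside the band learned for grade A
print(is_unusual("B", 95.0, limits))  # inside the band learned for grade B
```

The same reading is an alarm for one grade and normal for the other, which a single static limit cannot express. A production system would additionally refresh the limits over a sliding time window.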


Benefits:

  • Increase of OEE – potentially saving several hundred thousand euros per year
  • Prevention of Downtimes
  • Improved quality of the final product
  • Predictive maintenance


Features:

  • Real-time Monitoring
  • Dynamic calculation of threshold values
  • Consideration of grades and production modes
  • Prioritization of alarms
  • Automated monitoring of raw material and energy consumption

Production Monitoring 4.0 in the Paper Industry – Reduced downtime, improved quality, predictive maintenance

Do you have any questions? Simply email to:

Date: January 2020
Author: Leon Müller
© 2020 avato consulting ag
All Rights Reserved.

XGBOOST: Differences between gbtree and gblinear


The XGBoost framework has become a very powerful and very popular tool in machine learning. The library contains a variety of algorithms, each of which usually comes with its own set of hyperparameters. This makes it possible to combine many different variants and flavors of these algorithms within one package. We can model various classification, regression or ranking tasks by using trees and linear functions, by applying different regularization schemes and by adjusting many other aspects of the individual algorithms.

These options are governed by hyperparameters. They can be separated into two classes: parameters that set a model’s characteristics and parameters that adjust a model’s behavior. An example of the first type is the model’s objective, where we set the type of the prediction variable. Obviously, a binary classification task will have a different output than a numerical prediction. The second set of parameters manages the training process. For example, the learning rate, usually called eta, shrinks the contribution of each boosting step and thus helps to prevent overfitting.

This article focuses on two specific parameters that appear to be entangled and might cause some confusion: the objective and the booster.

For more information on the full set of parameters see the official XGBoost documentation.


Short overview of two XGBoost parameters: booster and objective

  • The booster parameter sets the type of learner. Usually this is either a tree or a linear function. In the case of trees, the model consists of an ensemble of trees. For the linear booster, it is a weighted sum of linear functions.
  • The objective determines the learning task and thus the type of the target variable. The available options include regression, logistic regression, binary and multi-class classification, and ranking. This option makes it possible to apply XGBoost models to several different types of use cases. The default value is “reg:squarederror” (previously called “reg:linear”, which was confusing and was therefore renamed).

It should be noted that the objective is independent of the booster. Decision trees can not only perform classification tasks but can also predict continuous variables, with a certain granularity, within the input range covered by the training data.

Thus, the objective is always determined by the modeling task at hand, while the two common booster choices can be valid for the same problem.
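The independence of the two parameters can be illustrated by simply enumerating parameter sets. The sketch below only builds dictionaries and does not train anything; the parameter names and values are the ones used in the XGBoost documentation, while the variable names are chosen here for illustration.

```python
from itertools import product

# Type of learner: an ensemble of trees or a sum of linear functions.
BOOSTERS = ["gbtree", "gblinear"]

# Learning task: numerical prediction or binary classification.
OBJECTIVES = ["reg:squarederror", "binary:logistic"]

# Every booster can be paired with every objective.
param_sets = [
    {"booster": booster, "objective": objective}
    for booster, objective in product(BOOSTERS, OBJECTIVES)
]

for params in param_sets:
    print(params)
```

Each of these dictionaries has the shape of a parameter set that would be handed to an XGBoost training call. The objective follows from the task, and the booster remains a free modeling choice.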


Visualizing different boosters

To illustrate the differences between the two main XGBoost booster variants, a simple example is given in which the linear and the tree booster are used for a regression task. The analysis is done in R with the “xgboost” library.

In this example, a continuous target variable will be predicted. Thus, the correct objective is “reg:squarederror”. The two main booster options, gbtree and gblinear, will be compared.

The dataset is designed to be simple. The input parameter x is a continuous variable ranging from 0 to 10. No noise is added, to keep the task easy. The target variable is generated deterministically from the input parameter through a non-linear relation.


The training data is chosen to be a subset of the full dataset, by selecting two subranges, [1:4] and [6:9]. This is illustrated in the figure below (yellow data points). This makes it possible to test how well the model behaves on unseen data.

With this training data, two XGBoost models are generated, m1_gbtree with the gbtree booster and m2_gblinear with the gblinear booster. The trained models are then used to generate predictions for the whole data set.

Model         RMSE (full data)   MAE (full data)   RMSE (train data)   MAE (train data)
m1_gbtree     4.91               2.35              0.05                0.03
m2_gblinear   7.74               6.39              4.50                3.89
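For readers less familiar with the two metrics, the following sketch shows how RMSE and MAE are computed and why they can differ considerably. The sample values are made up for illustration and are not the article's data.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: large deviations are weighted more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: every deviation counts in proportion to its size."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Three perfect predictions and one that is off by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

A model that is accurate in most places but far off in a few regions therefore shows an RMSE that is clearly larger than its MAE.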

The predictions for the full dataset are shown in the plot above, together with the full dataset itself. The first model, which uses trees, predicts the data well in those regions where it was supplied with training data. However, in the outer regions (x<1 and x>9) as well as in the central region (4<x<6), discrepancies arise. The tree-based model replicates the prediction of the closest known data point, thus generating horizontal lines. This is always the case when trees are used for continuous predictions: no formula is learned that would allow for inter- or extrapolation.
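The horizontal lines can be reproduced with a few lines of code. The sketch below is not XGBoost itself but a minimal gradient boosting loop with depth-1 regression trees (stumps) and squared error, written in plain Python. The target function y = x² is a hypothetical stand-in, since the article's original formula is not reproduced here, and all function names are chosen for illustration.

```python
def target(x):
    # Hypothetical stand-in for the article's non-linear target.
    return x * x


def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes the squared error."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=100, eta=0.3):
    """Each round fits a stump to the residuals and adds it, shrunk by eta."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, eta * left_mean, eta * right_mean))
        preds = [p + (eta * left_mean if x < threshold else eta * right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps


def predict(model, x):
    base, stumps = model
    return base + sum(lv if x < thr else rv for thr, lv, rv in stumps)


# Training data only on the two subranges [1, 4] and [6, 9].
train_x = [1 + 0.1 * i for i in range(31)] + [6 + 0.1 * i for i in range(31)]
model = boost(train_x, [target(x) for x in train_x])

for x in (0.0, 0.9, 4.2, 4.8, 9.1, 10.0):
    print(x, round(predict(model, x), 2))
```

Every split threshold lies between two neighbouring training points, so all inputs that fall into the same untrained region pass through the same leaves and receive exactly the same prediction.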

The second model uses a linear function for each learner in the gradient boosting process. The weighted combination of these learners is still a linear function. This explains the model’s behavior: The predictions follow a linear curve rather than the non-linear behavior of the data.
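This can be written out in one line. The notation is introduced here for illustration: M is the number of boosting rounds, η the learning rate, and w_m and b_m the slope and intercept of the m-th linear learner.

```latex
\hat{y}(x) \;=\; \sum_{m=1}^{M} \eta \,\bigl(w_m x + b_m\bigr)
         \;=\; \Bigl(\eta \sum_{m=1}^{M} w_m\Bigr)\, x \;+\; \eta \sum_{m=1}^{M} b_m
         \;=\; W x + B
```

However many rounds are added, the result collapses into a single slope W and a single intercept B.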

When looking at the metrics for the full dataset, the tree-based model shows a lower RMSE (4.9 versus 7.7) and MAE (2.4 versus 6.4) than the linear model. It should be noted that the other hyperparameters of the models were not tuned, so these numbers do not necessarily reflect the optimum. Nevertheless, they show how poorly both models perform on the full dataset. The metrics that consider only the training data reflect the differences in the modeling: the tree-based model represents the training data well, while the linear model does not. This is because the dependency of the target variable on the input variable is non-linear.

Can the models be improved if a non-linear variable is supplied? As a test, each model is trained on modified input data, which consists of the original input variable and a new variable x_int = x².

The new variable contains the interaction term, which causes the non-linear behavior in the first place.

Additionally, a simple linear regression model is added for comparison, with and without interaction term.

Model             RMSE (full data)   MAE (full data)   RMSE (train data)
m1_gbtree         4.91               2.35              0.05
m6_gbtree_int     4.91               2.35              0.05
m2_gblinear       7.74               6.39              4.50
m3_gblinear_int   0.00               0.00              0.00
m4_lin_reg        7.74               6.39              4.50
m5_lin_reg_int    0.00               0.00              0.00

From this we can learn a few things:

Firstly, the interaction term significantly improves the linear models. They show perfect agreement with the full dataset. Here, the regression function exactly models the true relation between input and target variables. In addition, the trained function extrapolates well to unseen data.
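The effect of the additional variable on a linear model can be reproduced with an ordinary least squares fit. The sketch below solves the normal equations in plain Python. It assumes a hypothetical target y = x², since the article's original formula is not reproduced here, and all function names are chosen for illustration.

```python
def solve(a, b):
    """Solve a·w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def target(x):
    # Hypothetical stand-in for the article's non-linear target.
    return x * x


def features(x, with_square):
    return [1.0, x, x * x] if with_square else [1.0, x]


# Train on [1, 4] and [6, 9], evaluate on the full range [0, 10].
train_x = [1 + 0.1 * i for i in range(31)] + [6 + 0.1 * i for i in range(31)]
full_x = [0.1 * i for i in range(101)]


def rmse_on_full(with_square):
    rows = [features(x, with_square) for x in train_x]
    w = fit_least_squares(rows, [target(x) for x in train_x])
    errors = [sum(wi * fi for wi, fi in zip(w, features(x, with_square))) - target(x)
              for x in full_x]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


rmse_plain = rmse_on_full(with_square=False)
rmse_square = rmse_on_full(with_square=True)
print(rmse_plain, rmse_square)
```

With only x as input, the best straight line still misses the curve. Once x² is available, the model can represent the relation exactly, including in the regions that were never part of the training data.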

Secondly, the tree-based model did not improve when the interaction term was included. This can be explained by considering again how trees work for a regression task. A tree splits the input space of the training data into fine categories, which are represented by its leaves. The prediction value for each leaf is learned from the target variable, thus the target variable is discretized. Adding more input variables refines the splitting of the input space. In this example, the original input variable x is sufficient to generate a good splitting of the input space and no further information is gained by adding the new input variable.
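One way to see why nothing is gained: on the non-negative range used here, x² is a strictly increasing function of x, so it orders the data points in exactly the same way. Every threshold split on x² therefore separates the points into the same two groups as some threshold split on x. The following sketch checks this for the training ranges of the example; the function name is chosen for illustration.

```python
def partitions(values):
    """All distinct groupings of point indices that a single threshold split can produce."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = set()
    for cut in range(1, len(order)):
        if values[order[cut]] == values[order[cut - 1]]:
            continue  # no threshold can separate equal values
        result.add(frozenset(order[:cut]))
    return result


# Training inputs on [1, 4] and [6, 9], and the derived variable x squared.
train_x = [1 + 0.1 * i for i in range(31)] + [6 + 0.1 * i for i in range(31)]
x_squared = [x * x for x in train_x]

same_splits = partitions(train_x) == partitions(x_squared)
print(same_splits, len(partitions(train_x)))
```

This argument relies on x being non-negative. If the input range included negative values, x² would no longer be monotonic and could offer the tree genuinely new splits.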

Finally, the linear booster of the XGBoost family shows the same behavior as a standard linear regression, with and without the interaction term. This might not come as a surprise, since both models optimize the loss function of a linear regression, that is, they minimize the squared error. Both models should converge to the optimal result, which should be identical (though maybe not in every last digit). This comparison is of course only valid when using the objective “reg:squarederror” for the XGBoost model.



In this article, the two main boosters of the XGBoost family, gblinear and gbtree, were tested with non-linear and non-continuous data. Both boosters showed conceptual limits regarding their ability to extrapolate or to handle non-linearity. Tree-based models can represent all types of non-linear data well, since no formula describing the relation between target and input variables is needed. This is an enormous advantage if these relations and interactions are unknown. Linear models, on the other hand, cannot learn relations other than purely linear ones. If the additional interactions can be supplied, however, linear models become quite powerful.

The second aspect concerned the fact that the training data does not always cover the full data range of the use case. Here, the model needs to inter- or extrapolate from known data points to the new regions. In the case of trees, no formula is available that would make it possible to navigate these areas and provide meaningful prediction values. In contrast, this is the main advantage of linear regression models, provided that the same assumptions apply to the new data, in other words, that the new data behaves in the same way.


Do you have any questions? Simply email to:

Date: December 2019
Author: Verena Baussenwein
© 2019 avato consulting ag
All Rights Reserved.