Number of documents found: 262

A Bootstrap Comparison of Robust Regression Estimators
Kalina, Jan; Janáček, Patrik
2022 - English
The ordinary least squares estimator in linear regression is well known to be highly vulnerable to the presence of outliers in the data, and available robust statistical estimators represent preferable alternatives. It has been repeatedly recommended to use least squares together with a robust estimator, where the latter is understood as a diagnostic tool for the former. In other words, only if the robust estimator yields a very different result should the user investigate the dataset more closely and search for explanations. For this purpose, a hypothesis test of equality of the means of two alternative linear regression estimators is proposed here, based on the nonparametric bootstrap. The performance of the test is presented on three real economic datasets with small samples. Robust estimates turn out not to be significantly different from non-robust estimates in the selected datasets. Still, robust estimation is beneficial in these datasets, and the experiments illustrate one possible way of exploiting the bootstrap methodology in regression modeling. The bootstrap test could easily be extended to nonlinear regression models.
Keywords: linear regression; robust estimation; nonparametric bootstrap; bootstrap hypothesis testing
Fulltext is available at external website.
Ústav informatiky, 2022
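
The record above does not spell out the test construction; the following is a minimal sketch of a nonparametric (pairs) bootstrap comparison of least squares with a robust fit, assuming statsmodels' Huber M-estimator as the robust alternative. The paper may use a different robust estimator and a different test statistic.

import numpy as np
import statsmodels.api as sm

def bootstrap_difference(y, X, n_boot=2000, seed=1):
    """Pairs bootstrap of the difference between OLS and Huber M-estimates."""
    rng = np.random.default_rng(seed)
    X = sm.add_constant(X)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)      # resample observations with replacement
        yb, Xb = y[idx], X[idx]
        ols = sm.OLS(yb, Xb).fit().params
        rob = sm.RLM(yb, Xb, M=sm.robust.norms.HuberT()).fit().params
        diffs.append(ols - rob)
    return np.asarray(diffs)

# Illustration on synthetic data: percentile intervals for the coefficient
# differences that cover zero indicate no significant disagreement between
# the two estimators, in the spirit of the test described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=40)
d = bootstrap_difference(y, X)
low, high = np.percentile(d, [2.5, 97.5], axis=0)
print(list(zip(np.round(low, 3), np.round(high, 3))))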

Recent Trends in Machine Learning with a Focus on Applications in Finance
Kalina, Jan; Neoral, Aleš
2022 - English
Machine learning methods are penetrating applications in the analysis of financial data, particularly supervised learning tasks such as regression or classification. Other approaches, such as reinforcement learning or automated machine learning, are not yet so well known in the context of finance. In this paper, we discuss the advantages of automated data analysis, which is beneficial especially if a larger number of datasets must be analyzed under time pressure. Important types of learning include reinforcement learning, automated machine learning, and metalearning. This paper gives an overview of their principles and recalls some of their inspiring applications. We include a discussion of the importance of the concept of information and of the search for the most relevant information in the field of mathematical finance. We come to the conclusion that a statistical interpretation of the results of automated machine learning remains crucial for a proper understanding of the knowledge acquired by the analysis of the given (financial) data.
Keywords: statistical learning; automated machine learning; metalearning; financial data analysis; stock market investing
Fulltext is available at external website.
Ústav informatiky, 2022

The 2020 Election In The United States: Beta Regression Versus Regression Quantiles
Kalina, Jan
2021 - English
The results of the 2020 presidential election in the United States deserve a detailed analysis by advanced statistical tools, as they differed considerably from the majority of available forecasts as well as from published opinion polls. We perform regression modeling to explain the election results by means of three demographic predictors for the individual 50 states: weekly attendance at religious services, percentage of African American population, and population density. We compare the performance of beta regression with linear regression; beta regression performs only slightly better in terms of predicting the response. Because the United States population is very heterogeneous and the regression models are heteroscedastic, we focus on regression quantiles in the linear regression model. In particular, we develop an original quintile regression map; this graphical visualization allows an interesting interpretation of the effect of the demographic predictors on the election outcome at the level of individual states.
Keywords: election results; electoral demography; quantile regression; heteroscedasticity; outliers
Fulltext is available at external website.
Ústav informatiky, 2021
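
For reference, the regression quantiles compared with beta regression in the record above are the standard Koenker-Bassett linear regression quantiles; this is textbook background rather than a formula quoted from the paper. The quantile fit at level \tau solves

\[
\hat{\beta}(\tau) = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^\top \beta\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}[u < 0]\right),
\]

so that \tau = 0.5 gives the regression median, and a grid of levels such as the quintiles 0.2, 0.4, 0.6, 0.8 (as in the quintile regression map) describes the whole conditional distribution of the response.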

Application Of Implicitly Weighted Regression Quantiles: Analysis Of The 2018 Czech Presidential Election
Kalina, Jan; Vidnerová, Petra
2021 - English
Regression quantiles can be characterized as popular tools for complex modeling of a continuous response variable conditional on one or more given independent variables. Because they are, however, vulnerable to leverage points in the regression model, an alternative approach denoted as implicitly weighted regression quantiles has been proposed. The aim of the current work is to apply them to the results of the second round of the 2018 presidential election in the Czech Republic. The election results are modeled as a response to four demographic or economic predictors over the 77 Czech counties. The analysis represents the first application of the implicitly weighted regression quantiles to data with more than one regressor. The results reveal the implicitly weighted regression quantiles to be indeed more robust with respect to leverage points than standard regression quantiles. If, however, the model does not contain leverage points, both versions of the regression quantiles yield very similar results. Thus, the election dataset serves here as an illustration of the usefulness of the implicitly weighted regression quantiles.
Keywords: linear regression; quantile regression; robustness; outliers; election results
Fulltext is available at external website.
Ústav informatiky, 2021

Multifractal approaches in econometrics and fractal-inspired robust regression
Kalina, Jan
2021 - English
While mainstream economic theory is based on the concept of general economic equilibrium, economies throughout the world have recently been facing serious transformations and challenges. Thus, instead of converging to equilibrium, economies can be regarded as unstable, turbulent, or chaotic, with properties characteristic of fractal or multifractal processes. This paper starts with a discussion of recent data analysis tools inspired by fractal or multifractal concepts. We pay special attention to available data analysis tools based on reciprocal weights assigned to individual observations; these are inspired by an assumed fractal structure of multivariate data. As an extension, we consider here a novel version of the least weighted squares estimator of parameters for the linear regression model, which exploits reciprocal weights. Finally, we perform a statistical analysis of 31 datasets with economic motivation and compare the performance of the least weighted squares estimator with various weights. It turns out that the reciprocal weights, inspired by fractal theory, are not superior to other choices of weights. In fact, the best prediction results are obtained with trimmed linear weights.
Keywords: chaos in economics; fractal market hypothesis; reciprocal weights; robust regression; prediction
Available in digital repository of the ASCR
Ústav informatiky, 2021
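
The least weighted squares (LWS) estimator with reciprocal weights discussed in the record above can be sketched in its usual general form; the exact weight sequences compared in the paper are not reproduced here, so the reciprocal weights are indicated only schematically. Writing u_i(\beta) = y_i - x_i^\top \beta for the residuals and ordering their squares as u^2_{(1)}(\beta) \le \dots \le u^2_{(n)}(\beta), the estimator is

\[
\hat{\beta}_{\mathrm{LWS}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \, u^2_{(i)}(\beta),
\]

with nonincreasing nonnegative weights w_1 \ge \dots \ge w_n; reciprocal weights take w_i proportional to 1/i, so observations with the smallest residuals dominate, while trimmed weights assign w_i = 0 to a fixed proportion of the largest residuals.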

On kernel-based nonlinear regression estimation
Kalina, Jan; Vidnerová, Petra
2021 - English
This paper is devoted to two important kernel-based tools of nonlinear regression: the Nadaraya-Watson estimator, which can be characterized as a successful statistical method in various econometric applications, and regularization networks, which represent machine learning tools very rarely used in econometric modeling. The paper recalls both approaches and describes their common features as well as their differences. For the Nadaraya-Watson estimator, we explain its connection to the conditional expectation of the response variable. Our main contribution is a numerical analysis of suitable data with an economic motivation and a comparison of the two nonlinear regression tools. Our computations reveal that some implementations of the Nadaraya-Watson estimator in R software are unreliable and others are not prepared for routine usage. On the other hand, regression modeling by means of regularization networks is much simpler and also turns out to be more reliable in our examples. These examples also bring unique evidence of the need for a careful choice of the parameters of regularization networks.
Keywords: nonlinear regression; machine learning; kernel smoothing; regularization; regularization networks
Available in digital repository of the ASCR
Ústav informatiky, 2021
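
The Nadaraya-Watson estimator itself is standard: it estimates the conditional expectation E[Y | X = x] as the kernel-weighted average \hat{m}_h(x) = \sum_i K((x - x_i)/h)\, y_i \,/\, \sum_i K((x - x_i)/h). The following is a minimal self-contained NumPy implementation with a Gaussian kernel; the bandwidth and the synthetic data are illustrative, not those used in the paper.

import numpy as np

def nadaraya_watson(x_grid, x, y, bandwidth):
    """Kernel regression estimate of E[Y | X = x] with a Gaussian kernel."""
    u = (x_grid[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)            # K((x_grid - x_j) / h), up to a constant
    return (weights @ y) / weights.sum(axis=1)

# Illustrative use on synthetic data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 10, 101)
fit = nadaraya_watson(grid, x, y, bandwidth=0.5)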

Least Weighted Absolute Value Estimator with an Application to Investment Data
Vidnerová, Petra; Kalina, Jan
2020 - English
While linear regression represents the most fundamental model in current econometrics, the least squares (LS) estimator of its parameters is notoriously vulnerable to the presence of outlying measurements (outliers) in the data. The class of M-estimators, thoroughly investigated since the groundbreaking work by Huber in the 1960s, belongs to the classical robust estimation methodology (Jurečková et al., 2019). M-estimators are nevertheless not robust with respect to leverage points, which are defined as values outlying on the horizontal axis (i.e. outlying in one or more regressors). The least trimmed squares estimator therefore seems a more suitable highly robust method, i.e. one with a high breakdown point (Rousseeuw & Leroy, 1987). Its version with weights implicitly assigned to individual observations, denoted as the least weighted squares estimator, was proposed and investigated in Víšek (2011). A trimmed estimator based on the L1-norm is available as the least trimmed absolute value estimator (Hawkins & Olive, 1999), which has, however, not attracted the attention of practical econometricians. Moreover, to the best of our knowledge, its version with weights implicitly assigned to individual observations seems to be still lacking.
Keywords: robust regression; regression median; implicit weighting; computational aspects; nonparametric bootstrap
Fulltext is available at external website.
Ústav informatiky, 2020
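
To make the relationships among the estimators named in the record above explicit: with ordered squared residuals u^2_{(1)}(\beta) \le \dots \le u^2_{(n)}(\beta) and ordered absolute residuals |u|_{(1)}(\beta) \le \dots \le |u|_{(n)}(\beta), the least trimmed squares (LTS) and least trimmed absolute value (LTA) estimators minimize the h smallest terms, while the least weighted squares (LWS) estimator replaces trimming by implicit weights:

\[
\hat{\beta}_{\mathrm{LTS}} = \arg\min_{\beta} \sum_{i=1}^{h} u^2_{(i)}(\beta),
\qquad
\hat{\beta}_{\mathrm{LTA}} = \arg\min_{\beta} \sum_{i=1}^{h} |u|_{(i)}(\beta),
\qquad
\hat{\beta}_{\mathrm{LWS}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \, u^2_{(i)}(\beta).
\]

The least weighted absolute value estimator proposed in the paper is not defined in the abstract; its natural form, stated here only as an assumption, would be \hat{\beta}_{\mathrm{LWAV}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \, |u|_{(i)}(\beta) with nonincreasing weights w_1 \ge \dots \ge w_n \ge 0.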

On the Effect of Human Resources on Tourist Infrastructure: New Ideas on Heteroscedastic Modeling Using Regression Quantiles
Kalina, Jan; Janáček, Patrik
2020 - English
Tourism represents an important sector of the economy in many countries around the world. In this work, we are interested in the effect of the Human Resources and Labor Market pillar of the Travel and Tourism Competitiveness Index on tourist service infrastructure across 141 countries of the world. A regression analysis requires handling heteroscedasticity in these data, which is not an uncommon situation in available human capital studies. Our first task focuses on testing the significance of individual variables in the model. It is illustrated here that significance tests are influenced by heteroscedasticity, and this remains true also for tests based on regression quantiles or robust regression estimators that are resistant to possible contamination of the data by outliers. Only if a suitable model that takes heteroscedasticity into account is considered does the effect of the Human Resources and Labor Market pillar turn out to be significant. Further, we propose and present a new diagnostic tool denoted as a quintile plot, which allows an immediate interpretation of the heteroscedastic structure of the linear regression model for possibly contaminated data.
Keywords: tourism infrastructure; human resources; regression; robustness; regression quantiles
Fulltext is available at external website.
Ústav informatiky, 2020
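
The quintile plot introduced in the record above is the authors' own diagnostic and is not reproduced here; the fragment below only illustrates, with statsmodels and synthetic heteroscedastic data, how regression quantiles at the quintile levels can be fitted and how systematically diverging slopes across levels signal heteroscedasticity.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 141                                   # same order as the 141 countries in the study
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.3 * x)   # error variance grows with x
X = sm.add_constant(x)

# Under heteroscedasticity the estimated slopes differ across quantile levels.
for tau in (0.2, 0.4, 0.6, 0.8):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(f"tau = {tau:.1f}: intercept = {fit.params[0]:.3f}, slope = {fit.params[1]:.3f}")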

Regression for High-Dimensional Data: From Regularization to Deep Learning
Kalina, Jan; Vidnerová, Petra
2020 - English
Regression modeling is well known as a fundamental task in current econometrics. However, classical estimation tools for the linear regression model are not applicable to high-dimensional data. Although there is no agreement on a formal definition of high-dimensional data, they are usually understood either as data with the number of variables p exceeding (possibly greatly) the number of observations n, or as data with a large p on the order of (at least) thousands. In both situations, which appear in various fields including econometrics, the analysis of the data is difficult due to the so-called curse of dimensionality (cf. Kalina (2013) for a discussion). Compared to linear regression, nonlinear regression modeling with an unknown shape of the relationship between the response and the regressors requires even more intricate methods.
Keywords: regression; neural networks; robustness; high-dimensional data; regularization
Fulltext is available at external website.
Ústav informatiky, 2020
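
As a minimal illustration of the p > n setting described in the record above, the sketch below fits penalized linear models with scikit-learn on synthetic sparse data; the paper's own methods, including its neural network models, are not reproduced here.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 50, 500                          # many more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only five variables truly matter
y = X @ beta + rng.normal(scale=0.5, size=n)

# Unpenalized least squares is not uniquely determined when p > n; the L2 penalty
# (ridge) stabilizes the fit and the L1 penalty (lasso) additionally yields sparsity.
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("nonzero lasso coefficients:", int(np.count_nonzero(lasso.coef_)))
print("ridge coefficient norm:", float(np.linalg.norm(ridge.coef_)))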

Lexicalized Syntactic Analysis by Restarting Automata
Mráz, F.; Otto, F.; Pardubská, D.; Plátek, Martin
2019 - English
We study h-lexicalized two-way restarting automata that can rewrite at most i times per cycle for some i ≥ 1 (hRLWW(i)-automata). This model is considered useful for the study of lexical (syntactic) disambiguation, a concept from linguistics based on certain reduction patterns. We study lexical disambiguation through the formal notion of h-lexicalized syntactic analysis (hLSA). The hLSA is composed of a basic language and the corresponding h-proper language, which is obtained from the basic language by mapping all basic symbols to input symbols. We stress the sensitivity of hLSA by hRLWW(i)-automata to the size of their windows, the number of possible rewrites per cycle, and the degree of (non-)monotonicity. We introduce the concepts of contextually transparent languages (CTL) and contextually transparent lexicalized analyses based on very special reduction patterns, and we present two-dimensional hierarchies of their subclasses based on the size of windows and on the degree of synchronization. The bottoms of these hierarchies correspond to the context-free languages. CTL forms a proper subclass of the context-sensitive languages with syntactically natural properties.
Keywords: restarting automaton; h-lexicalization; lexical disambiguation
Fulltext is available at external website.
Ústav informatiky, 2019
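
The hRLWW(i) model from the record above is not reproduced here; the toy sketch below only illustrates the underlying analysis-by-reduction idea on the classical example language {a^n b^n}: in each cycle the word is scanned, one length-reducing rewrite (deleting the factor "ab" at the block boundary) is applied, and the computation restarts; the empty word is accepted in the tail.

def accepts_anbn(word: str) -> bool:
    """Toy analysis by reduction for {a^n b^n}: one deletion of 'ab' per cycle."""
    while word:
        boundary = word.find("ab")
        if boundary == -1:
            return False                  # no reduction applicable: reject
        prefix, suffix = word[:boundary], word[boundary + 2:]
        if prefix.strip("a") or suffix.strip("b"):
            return False                  # word does not have the shape a...ab...b
        word = prefix + suffix            # rewrite: delete 'ab', then restart
    return True                           # reduced to the empty word: accept

# Illustration
for w in ("", "ab", "aaabbb", "abab", "aab"):
    print(repr(w), accepts_anbn(w))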
