Robust Regression Estimators: A Comparison of Prediction Performance
Kalina, Jan; Peštová, Barbora
2017 - English
Regression represents an important methodology for solving numerous tasks of applied econometrics. This paper is devoted to robust estimators of parameters of a linear regression model, which are preferable whenever the data contain or are believed to contain outlying measurements (outliers). While various robust regression estimators are nowadays available in standard statistical packages, the question remains how to choose the most suitable regression method for a particular data set. This paper aims at comparing various regression methods on various data sets. First, the prediction performance of common robust regression estimators is compared on a set of 24 real data sets from public repositories. Further, the results are used as input for a metalearning study over 9 selected features of individual data sets. On the whole, the least trimmed squares estimator turns out to be superior to the least squares or M-estimators in the majority of the data sets, while the process of metalearning does not succeed in reliably predicting the most suitable estimator for a given data set.
Keywords:
robust estimation; linear regression; prediction; outliers; metalearning
Available at various institutes of the ASCR
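Illustration (not taken from the paper): the comparison described in the abstract above can be sketched on synthetic data. The sketch below contrasts least squares with an approximate least trimmed squares (LTS) fit obtained from random starts followed by concentration steps. The data, the trimming constant h = 75, the number of starts, and the helper names ols and lts are all assumptions made for this example; the paper's actual data sets and software are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients; X is assumed to contain an intercept column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def lts(X, y, h, n_starts=50, n_csteps=20, seed=0):
    """Approximate LTS: minimise the sum of the h smallest squared residuals.

    Heuristic search (random elemental starts + concentration steps),
    not an exact solver.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Start from a fit through p randomly chosen observations.
        subset = rng.choice(n, size=p, replace=False)
        beta = ols(X[subset], y[subset])
        for _ in range(n_csteps):
            # Concentration step: refit on the h best-fitting observations.
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]
            beta = ols(X[keep], y[keep])
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta

rng = np.random.default_rng(1)

def make_data(n):
    """Synthetic line y = 1 + 2x with Gaussian noise (hypothetical data)."""
    x = rng.uniform(0.0, 10.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
    return np.column_stack([np.ones(n), x]), y

X_train, y_train = make_data(100)
y_train[:20] += 30.0              # contaminate 20% of the training responses
X_test, y_test = make_data(200)   # clean data for judging prediction

beta_ols = ols(X_train, y_train)
beta_lts = lts(X_train, y_train, h=75)

mse_ols = np.mean((y_test - X_test @ beta_ols) ** 2)
mse_lts = np.mean((y_test - X_test @ beta_lts) ** 2)
print("OLS coefficients:", beta_ols, "test MSE:", mse_ols)
print("LTS coefficients:", beta_lts, "test MSE:", mse_lts)
```

On contaminated data of this kind the trimmed fit should predict clean observations far better than least squares; on real data sets the ranking can differ, which is the question the paper studies.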
Various Approaches to Szroeter’s Test for Regression Quantiles
Kalina, Jan; Peštová, Barbora
2017 - English
Regression quantiles represent an important tool for regression analysis and are popular in econometric applications, for example for the task of detecting heteroscedasticity in the data. Nevertheless, they need to be accompanied by diagnostic tools for verifying their assumptions. The paper is devoted to heteroscedasticity testing for regression quantiles, whose most important special case is commonly called the regression median. Szroeter’s test, which is one of the available heteroscedasticity tests for least squares, is modified here for the regression median in three different ways: (1) an asymptotic test based on the asymptotic representation for regression quantiles, (2) a permutation test based on residuals, and (3) an exact approximate test, which has a permutation character and represents an approximation to an exact test. All three approaches can be computed in a straightforward way, and their principles can also be extended to other heteroscedasticity tests. The theoretical results are expected to be extended to other regression quantiles and mainly to multivariate quantiles.
Keywords:
Heteroscedasticity; Regression median; Diagnostic tools; Asymptotics
Available on request at various institutes of the ASCR
Selection of Relevant Rules for Clinical Decision Support (Výběr relevantních pravidel pro podporu klinického rozhodování)
Kalina, Jan; Zvárová, Jana
2017 - Czech
Clinical decision support systems are important telemedicine tools that can help physicians in the decision process leading to the diagnosis, therapy or prognosis of patients. We proposed and implemented a prototype of a diagnostic decision support system, which has the form of an internet classification service. A specific property of this system is a sophisticated statistical component, which can also handle a large number of symptoms and signs: it optimizes the selection of those symptoms and signs that are most relevant for determining the diagnosis. The performance of the prototype was verified on an analysis of gene expression data from a cardiovascular genetic study. The paper discusses principles of multivariate statistical thinking and reveals the challenges of analyzing high-dimensional data in which the number of observed variables (symptoms and signs) largely exceeds the number of observations (patients).
Keywords:
decision support; multivariate statistics; rule extraction; classification analysis; dimensionality reduction
Available on request at various institutes of the ASCR
Epistemic Limits of (Super)intelligent Systems (Znalostní meze (super)inteligentních systémů)
Wiedermann, Jiří
2016 - Czech
Based on an epistemic approach to computations, we present a new perspective on intelligence. Computations are seen as processes generating knowledge over a given knowledge domain in accordance with the respective knowledge theory. In this context, intelligence is seen as the ability to gain information and transform it into knowledge that is then used for problem solving. The main result of the paper states that as long as the epistemic domain is finite and fixed, intelligent systems with so-called self-improving knowledge theories can be designed which sooner or later reach a state of knowledge about the underlying domain that cannot be qualitatively improved any further. The system thus reaches the limits of its own intelligence.
Keywords:
knowledge; intelligence; intelligent system; epistemic theory
Available at various institutes of the ASCR
Principles of Multivariate Statistical Thinking (Principy mnohorozměrného statistického uvažování)
Kalina, Jan
2016 - Czech
Available on request at various institutes of the ASCR
Some Robust Distances for Multivariate Data
Kalina, Jan; Peštová, Barbora
2016 - English
Numerous methods of multivariate statistics and data mining suffer from the presence of outlying measurements in the data. This paper presents new distance measures suitable for continuous data. First, we consider a Mahalanobis distance suitable for high-dimensional data with the number of variables (largely) exceeding the number of observations. We propose its doubly regularized version, which combines a regularization of the covariance matrix with replacing the means of multivariate data by their regularized counterparts. We formulate explicit expressions for some versions of the regularization of the means, which can be interpreted as a denoising (i.e. a robust version) of standard means. Further, we propose a robust cosine similarity measure, which is based on implicit weighting of individual observations. We derive properties of the newly proposed robust cosine similarity, including a proof of its high robustness in terms of the breakdown point.
Keywords:
multivariate data; distance measures; regularization; robustness; high dimension
Available on request at various institutes of the ASCR
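Illustration (not taken from the paper): when the number of variables exceeds the number of observations, the sample covariance matrix is singular and the standard Mahalanobis distance is undefined, which is why the abstract above turns to regularization. The sketch below assumes one simple form of double regularization: the covariance matrix is shrunk linearly towards the identity, and the vector of means is shrunk towards its grand mean. Both shrinkage targets, the parameters lam_cov and lam_mean, and the function name regularized_mahalanobis are hypothetical choices for this example and need not match the paper's definitions.

```python
import numpy as np

def regularized_mahalanobis(x, data, lam_cov=0.5, lam_mean=0.1):
    """Mahalanobis-type distance of x from the centre of `data`.

    Assumed regularization (illustrative only):
      covariance: (1 - lam_cov) * S + lam_cov * I
      mean:       (1 - lam_mean) * mean + lam_mean * (grand mean)
    """
    n, p = data.shape
    mean = data.mean(axis=0)
    # Shrink every coordinate mean towards the average of all coordinate means.
    mean_reg = (1.0 - lam_mean) * mean + lam_mean * mean.mean()
    S = np.cov(data, rowvar=False)
    # Shrinking towards the identity makes the matrix positive definite.
    S_reg = (1.0 - lam_cov) * S + lam_cov * np.eye(p)
    diff = x - mean_reg
    return float(np.sqrt(diff @ np.linalg.solve(S_reg, diff)))

rng = np.random.default_rng(0)
data = rng.normal(size=(10, 50))          # n = 10 observations, p = 50 variables

S = np.cov(data, rowvar=False)
rank_S = np.linalg.matrix_rank(S)         # at most n - 1, so S is singular
d = regularized_mahalanobis(data[0], data)
print("rank of sample covariance:", rank_S, "of", S.shape[0])
print("regularized distance of first observation:", d)
```

Solving the linear system with np.linalg.solve avoids forming the inverse explicitly; any positive lam_cov guarantees that the regularized matrix is invertible.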
An Overview of Machine Learning Methods (Přehled metod strojového učení)
Kalina, Jan
2016 - Czech
Available on request at various institutes of the ASCR
Principles of Statistical Thinking (Principy statistického uvažování)
Kalina, Jan
2016 - Czech
Available on request at various institutes of the ASCR
The thermal regime of ice pits of the Borec hill
Türkott, L.; Martinčíková, Eva; Potop, V.
2014 - English
The ecological stability of sites hosting stenoecious organisms is an important factor in maintaining them at a given location. The phonolite system of the Boreč hill forms a unique labyrinth of vents. Thermal anomalies occur in the fissure system throughout the year and create a specific microclimate. The flow direction is determined by the temperature gradient between the inside and the outside of the system. The lower part of the fissure system lies in the debris fields, the upper part at the top of the hill. During the winter, the phonolite rocks are cooled by air streaming in from the debris fields. The direction of the air flow changes in spring and summer: cold air is then exhaled from the vents in the lower parts of the system and creates ice pits with their typical vegetation.
Keywords:
ice pit; air temperature; Boreč hill; ventalore
Available on request at various institutes of the ASCR
Inconspicuous Appeal of Amorphous Computing Systems
Wiedermann, Jiří
2014 - English
Amorphous computing systems typically consist of myriads of tiny simple processors that are randomly distributed at fixed positions or move randomly in a confined volume. The processors are “embodied”, meaning that each of them has its own source of energy, a “body” equipped with various sensors and means of communication, and a computational control part. Initially, the processors have no identifiers and, for technological reasons and in the interest of maximal simplicity, their computational, communication, sensory and locomotion (if any) parts are reduced to an absolute minimum. The processors communicate wirelessly: in an airborne medium via short-range radio, acoustically or optically, and in a waterborne medium via molecular communication. In extreme cases the computational part of the processors can be simplified down to probabilistic finite state automata or even combinatorial circuits, and the system as a whole can still be made universally programmable. From the theoretical point of view, the structure and the properties of amorphous systems place them among the simplest (non-uniform) universal computational devices. From the practical viewpoint, once technology enables mass production of the required processors, a host of new applications so far inaccessible to classical approaches to computing will follow.
Keywords:
amorphous computing; computational universality; computational complexity
Available on request at various institutes of the ASCR
NRGL provides central access to information on grey literature produced in the Czech Republic in the fields of science, research and education. More information about grey literature and NRGL is available on the service website.
Send your suggestions and comments to nusl@techlib.cz