Tag Wiki 'Winsorizing'.
Winsorizing or winsorization is the transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers. It is named after the engineer-turned-biostatistician Charles P. Winsor (1895–1951). The effect is the same as clipping in signal processing.

The distribution of many statistics can be heavily influenced by outliers, values that lie far outside the bulk of the data. A typical strategy to account for these outlier values without eliminating them altogether is to reset them to a specified percentile (or an upper and lower percentile) of the data. For example, a 90% winsorization would see all data below the 5th percentile set to the 5th percentile, and all data above the 95th percentile set to the 95th percentile. Winsorized estimators are usually more robust to outliers than their more standard forms, although there are alternatives, such as trimming (see below), that will achieve a similar effect.
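
The percentile-reset idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `percentile` and `winsorize_by_percentile` are hypothetical, and the linear-interpolation percentile rule used here is only one of several common conventions, so the cutoffs it produces may differ from those of a count-based rule or of a particular library.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between neighbouring order
    statistics (one common convention, assumed here; others exist)."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def winsorize_by_percentile(values, lower_pct=5.0, upper_pct=95.0):
    """Clamp every value into the [lower_pct, upper_pct] percentile range.
    The number of values is unchanged; only the extremes are reset."""
    ordered = sorted(values)
    lo = percentile(ordered, lower_pct)
    hi = percentile(ordered, upper_pct)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical sample with one obvious outlier at the top.
sample = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 500.0]
clipped = winsorize_by_percentile(sample)  # a 90% winsorization
```

Values strictly inside the two cutoffs pass through unchanged, which is what distinguishes this from rescaling the whole data set.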


Example
Consider a simple data set consisting of:
{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41}
(N = 20, mean = 101.5)
The data below the 5th percentile lie between −40 and −5 inclusive, while the data above the 95th percentile lie between 101 and 1053 inclusive. Winsorization effectively resets the outlier values to the values of the data at the 5th and 95th percentiles. Accordingly, a 90% winsorization would result in the following data set:
{92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41}
(N = 20, mean = 55.65)

After winsorization the mean has dropped to nearly half its previous value, and is consequently more in line with the bulk of the data set from which it is calculated.
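
The figures in this example can be reproduced with a short count-based sketch. The function name `winsorize_by_count` is hypothetical, and the rule it applies (replace the k = floor(fraction × N) smallest and largest values with their nearest retained neighbours) is an assumption chosen because it matches the numbers quoted above; libraries may use other conventions.

```python
def winsorize_by_count(values, fraction):
    """Replace the k lowest and k highest values, k = floor(fraction * n),
    with the nearest value that is kept (count-based rule, an assumption)."""
    n = len(values)
    k = int(fraction * n + 1e-9)  # small epsilon guards against float round-off
    if 2 * k >= n:
        raise ValueError("fraction too large for this sample size")
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

winsorized = winsorize_by_count(data, 0.05)  # 5% per tail = 90% winsorization
original_mean = sum(data) / len(data)
winsorized_mean = sum(winsorized) / len(winsorized)
```

With N = 20 and 5% per tail, k is 1, so only −40 and 1053 are replaced (by −5 and 101 respectively) and N stays at 20.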


Explanation, and distinction from trimming/truncation
Winsorizing is not equivalent to simply excluding data, a simpler procedure called trimming or truncation; it is instead a method of censoring data.

In a trimmed estimator, the extreme values are discarded; in a winsorized estimator, the extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum).

Thus a winsorized mean is not the same as a truncated mean. For instance, the 10% trimmed mean is the average of the data lying between the 5th and 95th percentiles, while the 90% winsorized mean sets the bottom 5% to the 5th percentile, the top 5% to the 95th percentile, and then averages the data. Winsorizing thus does not change the total number of values in the data set, N. In the example given above, the trimmed mean would be obtained from the smaller (truncated) set:

{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41}
(N = 18, trimmed mean = 56.5)
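
For comparison, the trimmed mean of the same data can be sketched as follows. The function name `trimmed_mean` and its per-tail `fraction` parameter are hypothetical, and the count-based rule (drop the k = floor(fraction × N) smallest and largest values) is an assumption that matches the truncated set shown above.

```python
def trimmed_mean(values, fraction):
    """Discard the k lowest and k highest values, k = floor(fraction * n),
    then average what remains. Returns (mean, number of values kept)."""
    n = len(values)
    k = int(fraction * n + 1e-9)  # small epsilon guards against float round-off
    if 2 * k >= n:
        raise ValueError("fraction too large for this sample size")
    kept = sorted(values)[k:n - k]
    return sum(kept) / len(kept), len(kept)


data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

mean_10pct, n_kept = trimmed_mean(data, 0.05)  # 5% per tail = 10% trimmed mean
```

Unlike winsorizing, the sample size shrinks here: two of the twenty values are dropped rather than replaced.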

In this case, the winsorized mean can equivalently be expressed as a weighted average of the 5th percentile, the truncated mean, and the 95th percentile (for this 90% winsorized mean: 0.05 times the 5th percentile, 0.9 times the 10% trimmed mean, and 0.05 times the 95th percentile). However, in general, winsorized statistics need not be expressible in terms of the corresponding trimmed statistic.
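
The weighted-average form can be checked by hand. The notation below is an assumption introduced for illustration: $x_{(i)}$ is the $i$-th smallest value, $k$ is the number of values replaced in each tail, $\bar{x}_W$ is the winsorized mean and $\bar{x}_T$ the trimmed mean. Under a count-based rule with $n = 20$ and $k = 1$, the identity reproduces the figures from the example:

```latex
\begin{align*}
\bar{x}_W
  &= \frac{1}{n}\Bigl( k\,x_{(k+1)} + \sum_{i=k+1}^{n-k} x_{(i)} + k\,x_{(n-k)} \Bigr) \\
  &= \frac{k}{n}\,x_{(k+1)} + \frac{n-2k}{n}\,\bar{x}_T + \frac{k}{n}\,x_{(n-k)},
  \qquad \bar{x}_T = \frac{1}{n-2k}\sum_{i=k+1}^{n-k} x_{(i)} \\[4pt]
  &= 0.05\,(-5) + 0.9\,(56.5) + 0.05\,(101) \\
  &= -0.25 + 50.85 + 5.05 = 55.65
\end{align*}
```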

More formally, they are distinct because the order statistics are not independent.


Uses
Winsorization is used in the survey methodology context in order to "trim" extreme survey non-response weights. It is also used in the construction of some stock indexes when looking at the range of certain factors (for example growth and value) for particular stocks.


Coding methods
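Python can winsorize data using the SciPy library:

    import numpy as np
    from scipy.stats.mstats import winsorize

    data = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])
    # limits gives the fraction to winsorize at the (lower, upper) tail
    winsorize(data, limits=[0.05, 0.05])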

R can winsorize data using the DescTools package (Andri Signorell et al. (2021). DescTools: Tools for descriptive statistics. R package version 0.99.41):

    library(DescTools)
    a <- c(92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41)
    DescTools::Winsorize(a, probs = c(0.05, 0.95))

