

Statistical Fundamentals and the STATISTICA Software
Applied Mathematics for Engineering

Introduction
Science and engineering problems have many aspects. Understanding and solving such problems often involves certain quantitative aspects, in particular the acquisition and analysis of data. Treating these quantitative problems effectively involves the use of statistics. Statistics can be viewed as the prescription for making the quantitative learning process effective.

The Learning Process
An experiment is like a window through which we view nature. Our view is never perfect. The observations that we make are distorted. The imperfections that are included in observations are "noise". A statistically efficient design reveals the magnitude and characteristics of the noise. It increases the size and improves the clarity of the experimental window. Using a poor design is like seeing blurred shadows behind the window curtains or, even worse, like looking out the wrong window.
Learning is an iterative process.

The Aim
Introduction to the general kind of engineering problem and the statistical concepts and methods to be discussed.
Case Study introduces a specific example, including actual data.
Analysis shows how the data suggest and influence the method of analysis and gives the solution.
Many solutions are worked through in detail, with the results shown. The problems were solved using available computer programs (e.g., STATISTICA, SAS, SPSS, S-PLUS, MINITAB).

Definitions and Basic Concepts
Population and Sample
The sample is a group of n observations actually available. A population is a very large set of N observations (or data values) from which the sample of n observations can be imagined to have come.

Random Variable
A random variable is "the value of the next observation in an experiment." "A random variable is the soul of an observation" and the converse, "An observation is the birth of a random variable."

Experimental Errors
A guiding principle of statistics is that any quantitative result should be reported with an accompanying estimate of its error. Replicated observations of some physical, chemical, or biological characteristic that has the true value η will not be identical, although the analyst has tried to make the experimental conditions as identical as possible.
The relation between the true value η and the observed (measured) value y_i is y_i = η + e_i, where e_i is an error or disturbance.
Error, experimental error, and noise refer to the fluctuation or discrepancy in replicate observations from one experiment to another. In the statistical context, error does not imply fault, mistake, or blunder. It refers to variation that is often unavoidable, resulting from such factors as measurement fluctuations due to instrument condition, sampling imperfections, variations in ambient conditions, skill of personnel, and many other factors. Such variation always exists and, although in certain cases it may have been minimized, it should not be ignored entirely.
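The relation y_i = η + e_i can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original slides: it uses the 27 measurements and the known value η = 8.0 mg/L from the laboratory example in this section, and the function and variable names are illustrative assumptions.

```python
from statistics import mean, stdev

ETA = 8.0  # known true concentration of the specimens, mg/L

# The 27 measurements from the laboratory example, in order of observation (mg/L).
observed = [
    6.9, 7.8, 8.9, 5.2, 7.7, 9.6, 8.7, 6.7, 4.8,
    8.0, 10.1, 8.5, 6.5, 9.2, 7.4, 6.3, 5.6, 7.3,
    8.3, 7.2, 7.5, 6.1, 9.4, 5.4, 7.6, 8.1, 7.9,
]


def errors_about_true_value(values, eta):
    """Return e_i = y_i - eta for each observation, rounded to 0.1 mg/L."""
    return [round(y - eta, 1) for y in values]


errors = errors_about_true_value(observed, ETA)

print("sample size n =", len(observed))
print("first seven errors:", errors[:7])
print("mean observed value: %.2f mg/L" % mean(observed))
print("mean error: %.2f mg/L" % mean(errors))
print("standard deviation of the errors: %.2f mg/L" % stdev(errors))
```

The mean error summarizes any systematic offset from η, while the standard deviation of the errors summarizes the random scatter; both are worth reporting alongside the result.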
Example
A laboratory's measurement process was assessed by randomly inserting 27 specimens having a known concentration of η = 8.0 mg/L into the normal flow of work over a period of 2 weeks. This arrangement means that observed values are random and independent. The results in order of observation were 6.9, 7.8, 8.9, 5.2, 7.7, 9.6, 8.7, 6.7, 4.8, 8.0, 10.1, 8.5, 6.5, 9.2, 7.4, 6.3, 5.6, 7.3, 8.3, 7.2, 7.5, 6.1, 9.4, 5.4, 7.6, 8.1, and 7.9 mg/L.
The population is all specimens having a known concentration of 8.0 mg/L. The sample is the 27 observations (measurements). The sample size is n = 27. The random variable is the measured concentration in each specimen having a known concentration of 8.0 mg/L.
Experimental error has caused the observed values to vary about the true value of 8.0 mg/L. The errors are 6.9 - 8.0 = -1.1, 7.8 - 8.0 = -0.2, +0.9, -2.8, -0.3, +1.6, +0.7, and so on.

Plotting Data
The most effective statistical techniques for analyzing data are graphical methods. They are useful in the initial stage for checking the quality of the data, highlighting interesting features
of the data, and generally suggesting what statistical analyses should be done. Graphical methods are useful again after intermediate quantitative analyses have been completed, and again in the final stage for providing complete and readily understood summaries of the main findings of investigations.
The first step in data analysis should be to plot the data. Graphing data should be an interactive experimental process. Do not expect your first graph to reveal all interesting aspects of the data. Make a variety of graphs to view the data in different ways.
Plotting the data may:
1. reveal the answer so clearly that little more analysis is needed.
2. point out properties of the data that would invalidate a particular statistical analysis.
3. reveal that the sample contains unusual observations.
4. save time in subsequent analyses.
5. suggest an answer that you had not expected.
6. keep you from doing something foolish.
The time spent making some different plots almost always rewards the effort. Many top-notch statisticians like to plot data by hand, believing that the physical work of the hand stimulates the mind's eye. Whether you plot by hand or use one of the many available computer programs (Origin PRO, SigmaPlot, Grapher, etc.), the goal is to free your imagination by trying a variety of graphical forms. Keep in mind that some computer programs offer a restricted set of plots and thus could limit rather than expand the imagination.

Scatterplots and Statistical Plots
Scatterplots
It has been estimated that 75% of the graphs used in science are scatterplots. Simple scatterplots are often made before any other data analysis is considered. The insights gained may lead to more elegant and informative graphs, or suggest a promising model. Linear or nonlinear relations are easily seen.

Showing Statistical Variation and Precision
Measurements vary, and one important function of graphs is to show the variation. There are three very different ways of showing variation: a histogram, a box plot (or box-and-whisker plot), and error bars that represent statistics such as standard deviations, standard errors,
or confidence intervals. A histogram shows the shape of the frequency distribution and the range of values.

Plots of Residuals
Graphing residuals is an important method that has applications in all areas of data analysis and model building. Residuals are the differences between the observed values and the smooth curve constructed from a model of the data. If the model fits the data, the residuals represent the measurement error. Measurement error is usually assumed to be random. A lack of randomness in the residuals therefore indicates some weakness in the fitted model.

Plots of Residuals – Example
The visual impression in the top panel of the figure is that the curve fits the data fairly well, but that the vertical deviations of points from the fitted curve are smaller for low values of time than for longer times. The graph of residuals in the bottom panel shows that the opposite is true. The curve does not fit well at the shorter times, and in this region the residuals are large and predominantly positive.
This process of plotting residuals is called flattening the data. It shifts our attention from the fitted line to the discrepancies between prediction and observation. It is these discrepancies that contain the information needed to improve the model. Make it a habit to examine the residuals of a fitted model, including deviations from a simple mean. Check for normality by making a dot diagram or histogram. Plot the residuals against the predicted values, against the predictor variables, and as a function of the time order in which the measurements were made. Residuals that appear to be random and to have uniform variance are persuasive evidence that the model has no serious deficiencies. If the
residuals show a trend, it is evidence that the model is inadequate. If the residuals spread out, it suggests that a data transformation is probably needed.

Plots of Residuals – Another Example
The left figure is a calibration curve for measuring chloride using an ion chromatograph. There are three replicate measurements at each concentration level. The hidden variation of the replicates is revealed in the right figure, which has flattened
the data by looking at deviations from the average of the three values at each level. An important fact is revealed: the measurement error tends to increase as the concentration increases. This must be taken into account when fitting the calibration curve to the data.

A Note on Clarity and Style of Plot
Tufte: clarity, simplicity
Cleveland: clarity, precision, efficiency
Wainer: elegance, grace, impact
William Playfair (1786), a pioneer and innovator in the use of statistical graphics, desired to tell a story graphically as well as dramatically.

Should We Always Plot the Data?
Example. Five values: pH = 5, COD = 2300 mg/L, BOD = 1500 mg/L, TSS = 875 mg/L, TDS = 5700 mg/L.
These five values say it all, and better than the graph. Do not use an axe to hack your way through an open door.
Aside from being unnecessary, this chart has three major faults:
1. It confuses units: pH is not measured in mg/L.
2. Three-dimensional effects make it more difficult to read the numerical values.
3. Using a log scale makes the values seem nearly the same when they are much different. The 875 mg/L TSS and the 1500 mg/L BOD have bars that are nearly the same height.

Statistical Analysis Functions of STATISTICA
1. Basic Statistics and Tables
Includes descriptive statistics, correlation analysis, t-tests for independent or dependent samples, frequency tables, probability calculation, and other significance tests of differences (tests of two
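means or percentages), and so on. This is the most basic group of statistical analyses and also the most frequently used; simple analyses can generally be handled with it alone.
2. Multiple Regression
Stepwise regression, fixed nonlinear regression, residual analysis, prediction based on the regression model, and so on. For example, to investigate whether a person's IQ is related to eating fish and eating tofu, regression can be used.
3. ANOVA/MANOVA (analysis of variance)
One-way and multi-factor analysis of variance, analysis of covariance, repeated-measures analysis of variance, and so on. Analysis of variance can be used to test the significance of differences among the means of more than two samples, for example to compare which of several teaching methods improves learning results fastest, or to compare the mileage obtained with several grades of gasoline.
4. Nonparametrics/Distribution (nonparametric statistics)
Includes the chi-square test, the Kolmogorov-Smirnov test, the Wilcoxon matched-pairs signed-rank test, the Mann-Whitney test for two independent samples, the Cochran Q test for several related samples, and, for several independent samples, the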
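Kruskal-Wallis test, among others.
5. Distribution Fitting
Fits continuous distributions such as the normal and uniform distributions.
6. Advanced Linear/Nonlinear Models
Contains a range of linear and nonlinear modeling functions, such as Nonlinear Estimation, which includes general nonlinear models, stepwise logit analysis, maximum likelihood estimation, and so on.
7. Industrial Statistics & Six Sigma
Includes quality control, process analysis, design of experiments, and Six Sigma analysis.
8. Multivariate Exploratory Analysis
(1) Cluster Analysis: includes k-means clustering, two-way joining, and so on. Cluster analysis in essence looks for a statistic that objectively reflects how close or distant the elements are from one another, and then divides the elements into classes according to that statistic; it is the "like gathers with like" approach to statistical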
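analysis.
(2) Factor Analysis: initial factor models, rotated factor models, and so on. For example, students' grades in each subject are influenced by factors such as intelligence, computational ability, expressive ability, and flexibility. The grades themselves can be obtained through examinations or assessments, but the state of the factors that govern those grades cannot be measured directly; this is where factor analysis comes in.
(3) Canonical Analysis: canonical correlation analysis and analysis of the joint effects of canonical factors. Mainly used to study the correlation between two sets of variables.
(4) Multidimensional Scaling: estimation of multidimensional distances or similarities, and so on.
(5) Reliability/Item Analysis: includes tetrachoric correlation analysis, Cronbach's α coefficient, split-half reliability analysis, and so on. If one wants a means of transport that is dependable at any time, in any place, and for anyone, then testing the reliability of that means of transport is clearly necessary.
(6) Discriminant Analysis: stepwise discriminant analysis, classification statistics, and so on. The task of discriminant analysis is to build a good discriminant function from a set of samples whose classes are already known, so that misclassifications are minimized, and then to decide which population a given new sample comes from. In environmental monitoring, for example, the type of pollution affecting an area can be judged from comprehensive measurements of environmental pollution in that area.
9. Data Mining
Classification trees and related techniques, and neural networks.
10. Probability Calculator

The STATISTICA Graphical Interface

Basic Operating Procedure in STATISTICA
(1) Data entry is done mainly in the Spreadsheet window (the data editing window). Its structure is similar to an Excel worksheet. The default data sheet is a 10 × 10 grid of cells, and the number of variables (Variable) or observations (Case) can be changed. Note that because empty cells are counted with a default (missing-data) value, cases that are not needed should be deleted. Variables and cases are deleted with the Delete command on the Edit menu, and added with the Format menu's Variables and Cases commands.
STATISTICA can open files produced by programs such as Excel, dBASE, SPSS, and Lotus/Quattro worksheets, as well as text formats with the extensions txt, csv, htm, and rtf, and saves them in the STATISTICA data file format.
(2) Select the functional module, mainly through the commands on the Statistics menu.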
(3) Define the analysis method and select the independent and dependent variables for the analysis.
(4) Display the results. By default, STATISTICA sends analysis results, both tables and graphs, to a Workbook window; the alternative is Report output.

Report Output Options
Use the "Output Manager…" command on the "File" menu.

Report Window

Application Example 1: Descriptive Statistics
1. Descriptive statistics
Descriptive statistics are the foundation of statistics. Their task is to provide basic information about each statistical variable:
mean
minimum and maximum values
variation of measures, that is, the shape of the distribution
standard deviation
standard error
(1) Mean: definition
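The mean is the most commonly used descriptive statistic. It conveys the "central tendency" of a variable, and it is most informative when reported together with its confidence interval. The confidence interval is a range around the sample mean within which the "true" population mean is expected to lie at a level of certainty we are willing to accept.
Example: if the mean is 23 and the lower and upper limits of the confidence interval at p = 0.05 are 19 and 27, then there is a 95% probability that the population mean is greater than 19 and less than 27. If the p-level is set to a smaller value (that is, a higher confidence level), the interval becomes wider, which increases the certainty of the estimate.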
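A confidence interval for a mean is commonly computed as mean ± t · s / √n. The sketch below is only an illustration under stated assumptions: no raw data are given for the mean of 23 quoted above, so it reuses the 27 laboratory measurements from the example earlier in this document. The critical values 2.056 and 2.779 (two-sided 95% and 99%, 26 degrees of freedom) are taken from a standard t table and hard-coded, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# The 27 laboratory measurements from the earlier example (mg/L).
observed = [
    6.9, 7.8, 8.9, 5.2, 7.7, 9.6, 8.7, 6.7, 4.8,
    8.0, 10.1, 8.5, 6.5, 9.2, 7.4, 6.3, 5.6, 7.3,
    8.3, 7.2, 7.5, 6.1, 9.4, 5.4, 7.6, 8.1, 7.9,
]

# Two-sided critical values of Student's t for n - 1 = 26 degrees of freedom,
# copied from a t table (assumption: the data are roughly normal).
T_CRIT_95 = 2.056  # p = 0.05
T_CRIT_99 = 2.779  # p = 0.01


def confidence_interval(values, t_crit):
    """Return (lower, upper) limits of the confidence interval for the mean."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / sqrt(n)  # s / sqrt(n)
    half_width = t_crit * standard_error
    return center - half_width, center + half_width


lower95, upper95 = confidence_interval(observed, T_CRIT_95)
lower99, upper99 = confidence_interval(observed, T_CRIT_99)

print("mean = %.2f mg/L" % mean(observed))
print("95%% interval: %.2f to %.2f mg/L" % (lower95, upper95))
print("99%% interval: %.2f to %.2f mg/L" % (lower99, upper99))
```

Comparing the two intervals shows the trade-off described in the text: the smaller p-level (0.01) gives the wider interval.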
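This is like a familiar weather forecast: the smaller the p-level, the wider the confidence interval and the vaguer the forecast. Note that the confidence interval depends on the sample size and on the variation of the data values. The larger the sample, the more reliable the mean; the greater the variation in the data values, the less reliable the mean. In addition, the calculation of the confidence interval assumes that the variable is random and normally distributed in the population. If this assumption is not met, the estimate may not be valid unless the sample is large.

(2) Shape of the distribution
Another important aspect of describing a variable is the shape of its distribution, which expresses how frequently the values of the variable fall in different ranges; a histogram is used to depict these frequencies. Researchers are usually interested in how well the distribution matches the normal distribution, which is judged by comparing the histogram with a normal curve.
A histogram can be used to check the quality of the distribution. For example, a bimodal distribution (one with two peaks) may indicate that the sample is not homogeneous and may come from two different populations, one closer to normally distributed and one less so. In that case the two subsamples should be analyzed separately.

2. Correlation
Correlation is a measure of the relation between two or more variables; different correlation coefficients are used for different types of data. The most important kind is linear correlation, also called Pearson correlation (Pearson r), which is the most widely used type of correlation coefficient.
Pearson correlation assumes that the two variables are measured on at least interval scales. It expresses the extent to which the values of the two variables are proportional to each other, and the value expressing this is the correlation coefficient (r). Correlation coefficients range from -1.00 to +1.00: -1.00 represents a perfect negative correlation, +1.00 a perfect positive correlation, and 0.00 no correlation.
The Pearson correlation coefficient (r) does not depend on the specific units of measurement. For example, the correlation between height and weight is the same whether they are measured in inches and pounds or in centimeters and kilograms.
"Proportional" means linearly related, that is, the relation can be represented by a straight line sloping upward or downward. This line is called the regression line or least squares line, because the sum of the squared distances of all the points from the line is the smallest possible. In particular, the squared correlation coefficient (r²) is an important result that expresses the proportion of variation shared by the two variables.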
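The unit independence of Pearson r and the meaning of the least squares line can be checked numerically. The following is a minimal sketch with made-up height and weight values (they are hypothetical and not taken from any data set mentioned in this document); the helper names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept minimizing the sum of squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical measurements in inches and pounds.
height_in = [62.0, 64.0, 66.0, 68.0, 70.0, 72.0]
weight_lb = [120.0, 136.0, 139.0, 155.0, 160.0, 178.0]

# The same measurements converted to centimeters and kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
slope, intercept = least_squares_line(height_in, weight_lb)

print("r (inches, pounds) = %.4f" % r_imperial)
print("r (cm, kg)         = %.4f" % r_metric)
print("r squared          = %.4f" % r_imperial ** 2)
print("least squares line: weight = %.2f * height + %.2f" % (slope, intercept))
```

Changing units rescales the slope and intercept of the fitted line, but r and r² stay the same, which is why r is convenient for comparing relations measured on different scales.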
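3. t-test: t-test for independent samples
The t-test is the most important method for evaluating the difference in means between two samples. For example, the t-test can be used to test for a difference in outcomes between two groups of patients given different treatments. In theory, the t-test can be used even when the samples are very small, as long as each group is normally distributed. (How can this be judged? The normality assumption can be assessed from the distribution of the data shown in a histogram, or with a normality test.) The p-level reported with a t-test is the probability of obtaining a difference at least as large as the one observed if the null hypothesis (no difference between the two groups) were true.
To run a t-test for independent samples, one independent (grouping) variable (such as "GENDER" in the table below) and at least one dependent variable (such as the test score "WCC") are required. The means of the dependent variable are computed separately for each group defined by the grouping variable (such as "male" and "female") and then compared. If there are several dependent variables, a t-test is performed for each one separately.

Example 1: Descriptive Statistical Analysis
Problem description: the method is illustrated with the data file "Adstudy.sta" supplied with STATISTICA, which contains 25 variables and 50 cases. The hypothetical problem is a study of how men and women evaluate two advertisements; the responses for each advertisement are assumed to be random. Variable 1 is gender (Gender: male, female) and variable 2 is the advertisement (Advert: Coke, Pepsi). Respondents rated the advertisements on 23 different aspects (Measure01 to Measure23), answering each on a scale from 0 to 9.

Descriptive Statistics
Step 1: Start STATISTICA and open the data file "Adstudy.sta" in the "/Examples/Datasets" directory. The data file can also be opened from within a statistics module.
Step 2: Descriptive statistics. In the "Basic Statistics and Tables" dialog, select "Descriptive statistics" and run a descriptive statistical analysis on all variables.

Descriptive Statistics Results
By default, the results table contains, for the selected variables, the mean, the number of valid cases (valid N), the standard deviation,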
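minimum, and maximum.

Correlation Analysis
Step 3: Correlation analysis. Click "Cancel" to return to the "Basic Statistics and Tables" dialog, select "Correlation matrices", and click "OK"; alternatively, double-click the "Correlation matrices" option. The "Product-Moment and Partial Correlations" dialog then appears.
Click the "One variable list" button. One, several, or all variables can be chosen in the variable selection window; here, click "Select all" to select every variable.
Then click the "Summary" button to run the correlation analysis and display the table of results.

Table of Correlation Results
Highlighting significant correlations: by default, the table shows in a different color the correlation coefficients that are statistically significant at p < .05.
The user can set the level at which correlation coefficients are highlighted. The larger the absolute value of a correlation coefficient, the stronger the correlation between the variables.
A positive coefficient indicates a positive correlation; a negative coefficient indicates a negative correlation.

Correlation Analysis Results
To set the significance level, return to the "Product-Moment and Partial Correlations" dialog, click the "Options" tab, and change the p-level, for example to 0.001. Clicking "Summary" then produces a new table of correlation results in which every result that meets this significance level is highlighted, so the strongest correlations are easy to find.
In this example, Measure05 …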
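The descriptive statistics table and the highlighted correlation matrix from the walkthrough can be approximated outside STATISTICA. The sketch below is an assumption-laden illustration, not STATISTICA's own procedure: the data are synthetic 0 to 9 ratings generated here (they are not the contents of "Adstudy.sta"), the variable names are reused only as labels, and significance at p < .05 is judged by comparing the t statistic for r against the tabulated two-sided critical value 2.0106 for 48 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
N_CASES = 50
T_CRIT_05 = 2.0106  # two-sided 5% critical t for N_CASES - 2 = 48 df (table value)

# Synthetic ratings on a 0-9 scale. Measure02 is built to track Measure01;
# Measure03 is generated independently.
m1 = [random.randint(0, 9) for _ in range(N_CASES)]
m2 = [min(9, max(0, v + random.choice([-1, 0, 1]))) for v in m1]
m3 = [random.randint(0, 9) for _ in range(N_CASES)]
data = {"Measure01": m1, "Measure02": m2, "Measure03": m3}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_significant(r, n, t_crit):
    """True if r differs from zero at the level implied by t_crit."""
    if abs(r) >= 1.0:
        return True
    t_stat = abs(r) * sqrt((n - 2) / (1.0 - r * r))
    return t_stat > t_crit


# Descriptive statistics: mean, valid N, standard deviation, minimum, maximum.
for name, values in data.items():
    print("%s: mean=%.2f valid N=%d sd=%.2f min=%d max=%d" % (
        name, mean(values), len(values), stdev(values), min(values), max(values)))

# Correlation matrix; off-diagonal coefficients significant at p < .05 get a "*".
names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}
for a in names:
    cells = []
    for b in names:
        r = matrix[a][b]
        flag = "*" if a != b and is_significant(r, N_CASES, T_CRIT_05) else " "
        cells.append("%+.2f%s" % (r, flag))
    print(a, "  ".join(cells))
```

Replacing `T_CRIT_05` with the critical value for a stricter level (for example p = 0.001) would flag fewer coefficients, mirroring the change made on the "Options" tab.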