[Python] 如何證明兩組樣本有顯著性差異(t-test假設檢驗)

來源:互聯網
上載者:User

原文地址:http://blog.csdn.net/ariesjzj/article/details/7546930

 

現有兩組樣本資料,假如它們分別基於兩套不同的方法,或者測於不同的裝置,又或是出自兩個人之手,如何證明它們有或沒有顯著性差別呢?當然可以拿個Excel表把資料畫個圖,然後找一堆人來投票,看覺得差不多還是覺得差得多的人哪方票數高。但終歸這種做法有些主觀,不夠說明力。機率統計給出了一種更為客觀的基於統計的方法,這裡是一個Python的實現:

#!/usr/bin/python# Paired difference hypothesis testing - Student's t-test# Author:       Jin Zhuojun# History:      Create      Tue May  8 16:12:21 CST 2012import stringimport mathimport sysfrom scipy.stats import  timport matplotlib.pyplot as pltimport numpy as np############### Parameters ###############ver = 1verbose = 0alpha = 0.05def usage():    print """    usage: ./program data_file(one sample in one line)    """def main():    ##########    # Sample #    ##########    if (len(sys.argv) < 2):        usage()        sys.exit()    # f = open('./data.txt')    # f = open('./test.txt')    f = open(sys.argv[1])    try:        sample1_text = f.readline()        sample2_text = f.readline()    finally:        f.close()    if (verbose):        print("sample1 text: ", sample1_text)        print("sample2 text: ", sample2_text)    sample1_text_split = sample1_text.split()    sample2_text_split = sample2_text.split()    if (verbose):        print(sample1_text_split)        print(sample2_text_split)        print("len = ", len(sample1_text_split))    assert(len(sample1_text_split) == len(sample2_text_split))    sample_len = len(sample1_text_split)    sample1 = []    sample2 = []    for i in range(sample_len):        sample1.append(string.atof(sample1_text_split[i]))        sample2.append(string.atof(sample2_text_split[i]))    sample_diff = []    for i in range(sample_len):        sample_diff.append(sample1[i] - sample2[i])    if (verbose):        print("sample_diff = ", sample_diff)    ######################    # Hypothesis testing #    ######################    sample = sample_diff    numargs = t.numargs    [ df ] = [sample_len - 1,] * numargs    if (verbose):        print("df(degree of freedom, student's t distribution parameter) = ", df)    sample_mean = np.mean(sample)    sample_std = np.std(sample, dtype=np.float64, ddof=1)    if (verbose):        print("mean = %f, std = %f" % (sample_mean, sample_std))    abs_t = math.fabs( sample_mean / (sample_std / math.sqrt(sample_len)) )    if (verbose):        print("t = ", abs_t)    t_alpha_percentile = t.ppf(1 - alpha / 2, df)    if (verbose):        print("abs_t = ", abs_t)        print("t_alpha_percentile = ", t_alpha_percentile)    if (abs_t >= t_alpha_percentile):        print "REJECT the null hypothesis"    else:        print "ACCEPT the null hypothesis"    ########    # Plot #    ########    rv = t(df)    limit = np.minimum(rv.dist.b, 5)    x = np.linspace(-1 * limit, limit)    h = plt.plot(x, rv.pdf(x))    plt.xlabel('x')    plt.ylabel('t(x)')    plt.title('Difference significance test')    plt.grid(True)    plt.axvline(x = t_alpha_percentile, ymin = 0, ymax = 0.095,             linewidth=2, color='r')    plt.axvline(x = abs_t, ymin = 0, ymax = 0.6,             linewidth=2, color='g')    plt.annotate(r'(1 - $\alpha$ / 2) percentile', xy = (t_alpha_percentile, 0.05),            xytext=(t_alpha_percentile + 0.5, 0.09), arrowprops=dict(facecolor = 'black', shrink = 0.05),)    plt.annotate('t value', xy = (abs_t, 0.26),            xytext=(abs_t + 0.5, 0.30), arrowprops=dict(facecolor = 'black', shrink = 0.05),)    leg = plt.legend(('Student\'s t distribution', r'(1 - $\alpha$ / 2) percentile', 't value'),             'upper left', shadow = True)    frame = leg.get_frame()    frame.set_facecolor('0.80')    for i in leg.get_texts():        i.set_fontsize('small')    for l in leg.get_lines():        l.set_linewidth(1.5)    normalized_sample = [0] * sample_len     for i in range(0, sample_len):        normalized_sample[i] = (sample[i] - sample_mean) / (sample_std / math.sqrt(sample_len))    plt.plot(normalized_sample, [0] * len(normalized_sample), 'ro')    plt.show()if __name__ == "__main__":    main()

註:因為matplotlib顯示時座標比例老是不對,所以畫圖的時候把座標寫死了。

把上面的指令碼存為main.py,再如資料檔案為data.txt(每行表示一組樣本):

0.26    0.30    0.40    0.49    0.43    0.84    1.18    0.25    0.67
0.18    0.33    0.57    0.32    0.58    0.29    0.98    0.17    0.79

運行:

$ ./main.py data.txt

得到

這裡假設null hypothesis為兩組資料無顯著性差異,alternative hypothesis為有顯著性差異。,t value落在了拒絕域外,因此我們拒絕alternative hypothesis,接受null hypothesis,即兩組資料無顯著性差異。

 

相關文章

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.