Parsing SRT files with Python


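I have recently been watching Desperate_Housewives_-_Season_1. Qiyi (奇藝) only offers Chinese subtitles, which is a real drawback for those of us who want to practise English listening. I searched online and could not find a suitable tool for displaying external subtitles. Since I happen to be learning Python at the moment, I figured it is better to rely on myself than to ask others, and decided to write one.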

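Everything needs a plan. Mine is as follows:
1. Analyse the SRT file format;
2. Extract the timing information and the text to display. This is the most important part, and the best approach is to use Python's regular expressions (the re module) to pull out the relevant information;
3. Call pyosd to display the text, similar to the lyrics display in the QQ Music player.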
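As a rough sketch of step 2, and assuming Python 3, a timing line could be matched with a single regular expression. The names TIMING_RE and parse_timing are made up for this illustration and do not appear in the script further down.

```python
import re

# Matches a timing line such as "00:00:20,000 --> 00:00:24,400".
# Hours are allowed to be one or more digits; the rest is fixed width.
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def parse_timing(line):
    """Return (start_ms, end_ms) for a timing line, or None for any other line."""
    match = TIMING_RE.search(line)
    if match is None:
        return None
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start_ms = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end_ms = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start_ms, end_ms
```

Working in integer milliseconds keeps the fractional part of each timestamp, so the duration is simply end_ms - start_ms.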

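For a description of SRT, see http://en.wikipedia.org/wiki/SubRip. I also deal with external subtitles regularly at work, so I already have some familiarity with the format. To quote that article: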
The SubRip file format is "perhaps the most basic of all subtitle formats."[10] SubRip files are named with the extension .srt, and contain formatted plain text. The time format used is hours:minutes:seconds,milliseconds. The decimal separator used is the comma, since the program was written in France. The line break used is often the CR+LF pair. Subtitles are numbered sequentially, starting at 1.

    Subtitle number          // acts as the index, marking the subtitle's sequence number
    Start time --> End time        // start and end times; the duration can be computed from them
    Text of subtitle (one or more lines)     // the subtitle text
    Blank line[11][10]           // blank line
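The four-part layout above suggests splitting a file on blank lines and reading each block as index, timing line, and text. The following is only a minimal sketch under that assumption, written for Python 3; Cue and parse_srt are hypothetical names, and blocks that do not fit the layout are simply skipped.

```python
import re
from collections import namedtuple

# One subtitle entry: its number, the raw "start --> end" line, and its text.
Cue = namedtuple("Cue", ["index", "timing", "text"])


def parse_srt(content):
    """Split the text of an SRT file into a list of Cue tuples."""
    # Drop a leading byte order mark and normalise CR+LF / CR line breaks to LF.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue  # not a well-formed entry, skip it
        cues.append(Cue(int(lines[0]), lines[1].strip(), "\n".join(lines[2:])))
    return cues
```

Grouping lines into entries first means multi-line subtitle text stays together, which a purely line-by-line scan cannot guarantee.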

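Below is the code for the implementation. It is very rough and I am still revising it; only part of the functionality is implemented so far: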

    import re
    import pyosd


    class srtParsing():
        def __init__(self):
            # One shared OSD object, so that the timeout set from a timing
            # line applies to the subtitle text displayed after it.
            self.duration = 0
            self.osd = pyosd.osd()
            self.osd.set_pos(pyosd.POS_BOT)
            self.osd.set_colour("YELLOW")
            self.osd.set_align(1)
            # self.osd.set_shadow_offset(10)
            self.osd.set_vertical_offset(100)

        def srtGetIndex(self, line):
            # An index line consists of digits only.
            if re.match(r'^\d+\s*$', line):
                print(line)

        def srtToSeconds(self, stamp):
            # "hh:mm:ss,mmm" -> whole seconds (the milliseconds are ignored).
            hours, minutes, seconds = stamp.strip().split(':')
            seconds, millis = seconds.split(',')
            return int(hours) * 60 * 60 + int(minutes) * 60 + int(seconds)

        def srtGetTimeStamp(self, line):
            # Only lines of the form "start --> end" carry timing information.
            if '-->' not in line:
                return
            print(line)
            start, end = line.split('-->')
            time_start = self.srtToSeconds(start)
            print("start time: %d" % time_start)
            time_end = self.srtToSeconds(end)
            print("end time: %d" % time_end)
            self.duration = time_end - time_start
            print(self.duration)
            self.osd.set_timeout(self.duration)

        def srtGetSubInfo(self, line):
            # Only lines that start with a letter are treated as subtitle text.
            if re.search(r'^[a-zA-Z]', line):
                print(line)
                self.osd.display(line)
                self.osd.wait_until_no_display()


    if __name__ == "__main__":
        srt = srtParsing()
        with open("/home/workspace/subtitle/src/dh.srt") as f:
            for line in f:
                srt.srtGetTimeStamp(line)
                srt.srtGetSubInfo(line)

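The next step is timing control: the program needs to read the system time and compare it with the timestamps in the file, so that the display can be controlled precisely.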
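One possible shape for that timing control, sketched here for Python 3 and not taken from the script above, is to record the clock when playback starts and sleep until each start and end offset is reached. The function play and its show and hide callbacks are hypothetical placeholders for whatever display calls are used; the sketch also assumes that the video starts at the moment play() is called and is never paused or seeked.

```python
import time


def play(cues, show, hide, clock=time.monotonic, sleep=time.sleep):
    """Show each (start_s, end_s, text) cue at its offset from the call to play().

    cues must be sorted by start time; clock and sleep can be replaced
    with fakes for testing.
    """
    origin = clock()

    def wait_until(offset):
        # Compare against the clock every time, so that time spent in
        # show()/hide() does not accumulate as drift.
        remaining = origin + offset - clock()
        if remaining > 0:
            sleep(remaining)

    for start, end, text in cues:
        wait_until(start)
        show(text)
        wait_until(end)
        hide()
```

For a quick trial without any display library, play([(1.0, 2.5, "Hello")], print, lambda: print("(hide)")) writes the text to the terminal after about one second and the hide marker about one and a half seconds later.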
