A C# console program that collects the real URL, title, and release time of videos in the Insoya video area

Source: Internet
Author: User

Preparatory work

The reason for this project is this site: http://i.youku.com/kmsfan. It is a channel with news videos for the game Adventure Island (MapleStory). I used to post videos there, and although I no longer play the game, many players still visit my channel to watch them, so I feel a little embarrassed about leaving it idle. I think it is better to develop a tool that downloads and uploads the videos automatically, so that it does not take up my time. A few libraries need to be prepared first. They can all be found on NuGet: Newtonsoft.Json and HtmlAgilityPack are the only ones used so far. I did not handle YouTube videos; I only implemented parsing for Daum TV, because most videos in the Insoya video area are uploaded to Daum TV.

Structure analysis of Insoya video area

Before showing any code, we should first get a general understanding of how the site is structured. My idea is to parse the videos posted on the current day. Insoya is a Korean website, so please bear with the Korean text. This is the address of the video area: http://www.insoya.com/bbs/zboard.php?id=ucc

If you hover the mouse over the links, you will find that the video ID is auto-incrementing. For example:

http://www.insoya.com/bbs/zboard.php?id=ucc&no=58158

http://www.insoya.com/bbs/zboard.php?id=ucc&no=58157

http://www.insoya.com/bbs/zboard.php?id=ucc&no=58156

... (and so on) ...
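
Because the IDs increase by one, the content-page URLs of the most recent posts can be generated from the newest ID alone. The following is a minimal sketch, assuming the URL pattern shown in the links above. The class `InsoyaUrls`, the method `RecentPostUrls`, and the `count` parameter are hypothetical names introduced for illustration. Deleted posts may leave gaps, so an ID in the range is not guaranteed to exist.

```csharp
using System;
using System.Collections.Generic;

public static class InsoyaUrls
{
    // URL pattern taken from the example links above.
    private const string PostUrlPrefix =
        "http://www.insoya.com/bbs/zboard.php?id=ucc&no=";

    // Yields the URLs of up to `count` posts, walking backwards from `maxId`.
    // Stops early if the ID would drop to zero or below.
    public static IEnumerable<string> RecentPostUrls(int maxId, int count)
    {
        for (int id = maxId; id > maxId - count && id > 0; id--)
        {
            yield return PostUrlPrefix + id;
        }
    }
}
```

Calling `InsoyaUrls.RecentPostUrls(58158, 3)` yields the three example links listed above, newest first.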

If we want the title, URL, and release date of a video, we can get them from the content page of each post:

A few things to know

First, we need to get the real address of the video from Daum TV. This works much like Youku and is a bit tricky: the embedded video address usually only gives the video ID, so the real file address has to be resolved separately. I found the answer on StackOverflow, so thanks to the person who posted it. The following is the Daum TV API to call; the `vid` appended at the end is the ID of the video.

        public static string Daumapi = "http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?vid=";
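
As a small illustration of how the video ID is appended to this endpoint, here is a hedged sketch. The class `DaumApiUrl` and the method `ForVideo` are hypothetical names that are not part of the original program, and escaping the ID is a precaution added here rather than something the original code does.

```csharp
using System;

public static class DaumApiUrl
{
    // Endpoint quoted in the article; the video ID goes after "vid=".
    private const string ApiBase =
        "http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?vid=";

    // Builds the request URL for one video ID.
    public static string ForVideo(string vid)
    {
        if (string.IsNullOrWhiteSpace(vid))
        {
            throw new ArgumentException("Video ID must not be empty.", nameof(vid));
        }
        // Escape the ID so that unusual characters cannot break the query string.
        return ApiBase + Uri.EscapeDataString(vid.Trim());
    }
}
```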

Having the API alone is not enough. Because we obtain the video address from the Insoya website, we also need an HTML parsing tool: HtmlAgilityPack. For this we need to know how the site stores the relevant data (titles, URLs, and so on) in its HTML nodes. Here are some screenshots to help. The following image points to the URL. If the IDs are consecutive, then once we have the largest ID from the first page, can't we derive all the other IDs from it?

The following figure shows the content page. We can see that the time is stored in an element with this class. You may wonder why I point out the class; don't worry, you will see why shortly.

Code 1: Get the real address of the file

Because we start from the Insoya website, and Insoya embeds the video through an iframe, we do not read the address from the Daum TV page directly. My idea is to first work out the real Daum TV page address.

    public static string GetDownloadUrls(string videoUrl)
    {
        string vid = "";
        if (videoUrl == null)
        {
            throw new ArgumentNullException(nameof(videoUrl), "Video URL cannot be null");
        }
        // Check whether this is a standard Daum TV link; outputs the normalized link and the video ID.
        bool isDaumUrl = TryNormalizeDaumTvUrl(videoUrl, out videoUrl, out vid);
        if (!isDaumUrl)
        {
            throw new ArgumentException("It is not a Daum URL");
        }
        string vLink = "";
        try
        {
            // Get the JSON returned by the Daum API.
            var json = LoadJson(Daumapi + vid);
            // Get the real address of the file from the JSON (GetVideLink is not shown in this article).
            vLink = GetVideLink(json);
        }
        catch (Exception)
        {
            // Errors are swallowed here, so a failed lookup returns an empty string.
        }
        return vLink;
    }

GetDownloadUrls (get the download URL)

    /// <summary>
    /// Convert the video address obtained from insoya.com to the standard Daum TV video address
    /// </summary>
    /// <param name="url"></param>
    /// <param name="normalizeUrl"></param>
    /// <returns></returns>
    public static bool TryNormalizeDaumTvUrl(string url, out string normalizeUrl, out string vId)
    {
        url = url.Trim();
        string videoId = "";
        if (url.IndexOf("videofarm") != -1)
        {
            // Take the text between the first '=' and the first '&'.
            // This assumes the video ID is the first parameter and that another parameter follows it.
            int firstParam = url.IndexOf('=');
            int secondParam = url.IndexOf('&') - 1;
            videoId = url.Substring(firstParam + 1, secondParam - firstParam);
            url = "http://tvpot.daum.net/v/" + videoId;
        }
        vId = videoId;
        normalizeUrl = url;
        // Note: this always returns true, even when no video ID was found.
        return true;
    }

TryNormalizeDaumTvUrl (convert the Insoya embedded video address to the standard Daum TV video address)
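
The ID extraction above depends on the position of `=` and `&` in the iframe address. As an alternative, the parameter can be looked up by name. The following is a minimal sketch using only the base class library. The class `QueryString` and the method `GetParameter` are hypothetical helpers, not part of the original program, and the exact shape of the iframe URL is an assumption.

```csharp
using System;

public static class QueryString
{
    // Returns the decoded value of the first query parameter called `name`,
    // or null when the URL has no such parameter.
    public static string GetParameter(string url, string name)
    {
        if (url == null) throw new ArgumentNullException(nameof(url));

        int queryStart = url.IndexOf('?');
        if (queryStart < 0) return null;

        // Keep only the part between '?' and an optional '#'.
        string query = url.Substring(queryStart + 1);
        int fragmentStart = query.IndexOf('#');
        if (fragmentStart >= 0) query = query.Substring(0, fragmentStart);

        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip entries without a key or a value separator

            string key = pair.Substring(0, eq);
            if (string.Equals(key, name, StringComparison.OrdinalIgnoreCase))
            {
                return Uri.UnescapeDataString(pair.Substring(eq + 1));
            }
        }
        return null;
    }
}
```

With this helper the lookup no longer fails when `vid` is the only parameter or is not the first one; a `null` result can then be used to report that the address is not a usable Daum link.
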

    private static JObject LoadJson(string url)
    {
        string pageSource = HttpHelper.DownloadString(url);
        if (!IsVideoValid(pageSource))
        {
            throw new Exception("Video Not valid");
        }
        return JObject.Parse(pageSource);
    }

    public class HttpHelper
    {
        public static string DownloadString(string url)
        {
            using (var client = new WebClient())
            {
                client.Encoding = System.Text.Encoding.UTF8;
                return client.DownloadString(url);
            }
        }
    }

LoadJson (download the returned JSON as a string and parse it)

Code 2: Parsing HTML nodes and storing them

First we need to get the HTML node data and store it. My idea is to start from the main page, http://www.insoya.com/bbs/zboard.php?id=ucc, and take the largest ID found there, so that everything on the first page can be fetched. If you are not familiar with HtmlAgilityPack, please look it up yourself (for example on Baidu).

    // Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
    private static int GetMaxIdOfVideo()
    {
        HtmlWeb docWeb = new HtmlWeb();
        HtmlDocument doc = docWeb.Load("http://www.insoya.com/bbs/zboard.php?id=ucc&page=1&divpage=12");
        IList<int> b = new List<int>();
        // Class names are case-sensitive; check them against the page source.
        foreach (HtmlNode numbers in doc.DocumentNode.Descendants("td")
            .Where(d => d.Attributes.Contains("class")
                     && d.Attributes["class"].Value.Contains("eng w_num")))
        {
            int a;
            int.TryParse(numbers.InnerText, out a);
            if (a != 0)
            {
                b.Add(a);
            }
        }
        return b.Max();
    }

GetMaxIdOfVideo (get the ID of the latest video on the video page)
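
One edge case is worth keeping in mind here: if no numeric cell is found (for example because the page layout changed), `Max()` on an empty list throws an `InvalidOperationException`. The sketch below isolates the "parse the cell texts and take the largest" step so that it can be tried without HtmlAgilityPack. `PostIds` and `MaxId` are hypothetical names, and the sample cell texts in the usage note are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PostIds
{
    // Parses each cell text as an integer and returns the largest positive value,
    // or null when none of the cells contains a number.
    public static int? MaxId(IEnumerable<string> cellTexts)
    {
        var ids = new List<int>();
        foreach (string text in cellTexts)
        {
            // Non-numeric cells (such as pinned notices) simply fail to parse.
            if (int.TryParse(text?.Trim(), out int id) && id > 0)
            {
                ids.Add(id);
            }
        }
        return ids.Count > 0 ? ids.Max() : (int?)null;
    }
}
```

For instance, `PostIds.MaxId(new[] { "notice", " 58158 ", "58157" })` returns 58158, while a list with no numbers returns `null` instead of throwing.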

The next part is the most important one: grabbing the video title, the video URL, and so on. For this we need to create a model class and then return a list of models.

    public class VideoModel
    {
        public int VideoId { get; set; }
        public string VideoUrl { get; set; }
        public string Title { get; set; }
        public DateTime PubTime { get; set; }
    }

VideoModel (video entity model)

    // INSOYAUCC is the content-page URL prefix. It is not shown in this article;
    // presumably "http://www.insoya.com/bbs/zboard.php?id=ucc&no=".
    // lookBack is a placeholder: the number of posts to walk back was lost in the
    // source text ("Staticmax- -"), so it is passed in here rather than guessed.
    public static IList<VideoModel> GrabVideoInfo(int lookBack)
    {
        int max = GetMaxIdOfVideo();
        int staticMax = max;
        IList<VideoModel> models = new List<VideoModel>();
        do
        {
            VideoModel model = new VideoModel();
            HtmlWeb innerDocWeb = new HtmlWeb();
            HtmlDocument innerDoc = innerDocWeb.Load(INSOYAUCC + max);

            // Title
            foreach (HtmlNode title in innerDoc.DocumentNode.Descendants("a")
                .Where(d => d.Attributes.Contains("name")
                         && d.Attributes["name"].Value.Contains("pv9")))
            {
                model.Title = title.InnerText;
                break;
            }

            // Date
            foreach (HtmlNode dateNode in innerDoc.DocumentNode.Descendants("span")
                .Where(d => d.Attributes.Contains("class")
                         && d.Attributes["class"].Value == "eng"))
            {
                DateTime date;
                // DateTime is a struct, so test the return value instead of comparing with null.
                if (DateTime.TryParse(dateNode.InnerText, out date))
                {
                    model.PubTime = date;
                    break;
                }
            }

            // Video address
            foreach (HtmlNode frame in innerDoc.DocumentNode.Descendants("iframe")
                .Where(d => d.Attributes.Contains("title")
                         && d.Attributes["title"].Value.Contains("MAPLESTORY_UCC")))
            {
                string oldUrl = frame.Attributes["src"].Value;
                string newUrl = GetDownloadUrls(oldUrl);
                model.VideoUrl = newUrl;
                break;
            }

            model.VideoId = max;
            models.Add(model);
            Console.WriteLine("ID:" + max + " has been grabbed!");
            --max;
        } while (max >= staticMax - lookBack);
        return models;
    }

GrabVideoInfo (get the video models)
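
A detail about the date step: `DateTime` is a value type, so a parsed value can never be `null`, and a failed `DateTime.TryParse` leaves the output at `DateTime.MinValue`. The return value of `TryParse` is what tells whether parsing worked. The sketch below shows this in isolation. `PostDates` and `ParseOrNull` are hypothetical names, and the invariant culture is an assumption, since the date format used on the Insoya pages is not stated in this article.

```csharp
using System;
using System.Globalization;

public static class PostDates
{
    // Returns the parsed date, or null when the text is not a recognizable date.
    public static DateTime? ParseOrNull(string text)
    {
        DateTime parsed;
        bool ok = DateTime.TryParse(
            text,
            CultureInfo.InvariantCulture, // assumption: the page uses a culture-neutral format
            DateTimeStyles.None,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

A caller can then skip nodes for which `ParseOrNull` returns `null` and continue with the next matching element.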

With this in place, we can now grab the video models.

Work that has not been completed

Of course, this is only about 30% of the work. The tool still needs to upload the videos to Youku automatically and translate the Korean text into suitable Chinese. Uploading to Youku comes first, and if I get it working I will share it with everyone. Anyone interested can take a look at the Youku open platform.
