Reading Web Pages (3)

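WebClient still has its limits. Cookies are one example: you are left to manipulate the headers yourself.
WebClient has no CookieContainer, so the most you can do is write a raw Cookie header through WebClient.Headers, and nothing collects the cookies the server sends back for reuse.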

Under the hood, WebClient is implemented on top of WebRequest. The example here uses
HttpWebRequest, which itself derives from WebRequest.
To send cookies, assign the CookieContainer property of the HttpWebRequest.
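
As a minimal sketch of that one assignment (the class name CookieRequestSketch and the method CreateWithCookies are made up for illustration, not part of the original code), the helper below creates a request and attaches a container to it. Creating the request does not contact the server, so this can be tried offline.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
public static class CookieRequestSketch
{
    // Creates a request for url that sends and collects cookies through the
    // given container. Nothing goes over the network until GetResponse().
    public static HttpWebRequest CreateWithCookies( string url, CookieContainer cookies )
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create( url );

        // On the .NET Framework a fresh request has CookieContainer == null,
        // which means no cookies are sent and none are kept.
        request.CookieContainer = cookies;
        return request;
    }
}
```

Calling CookieRequestSketch.CreateWithCookies( "http://example.com/", new CookieContainer() ) gives back a request whose cookies are tracked by the container you passed in. The URL is a placeholder.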

So we can quickly write wgetInWebRequest():
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.Web;
using System.IO;
using System.Diagnostics;
using System.Collections;

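public class Network
{
    // Fetches url and returns the response body decoded with the given encoding.
    // Pass a CookieContainer to send its cookies and collect new ones; pass null for no cookie handling.
    public static string wgetInWebRequest( string url, CookieContainer cookies, Encoding encoding )
    {
        string responseData = "";

        try
        {
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create( url );

            // Attach the cookies
            webRequest.CookieContainer = cookies;

            // If you are behind a proxy:
            // webRequest.Proxy = new WebProxy( "your_proxy", 3128 );

            // Set a User-Agent to imitate a browser. Use the UserAgent property:
            // on the .NET Framework, Headers.Add( "User-Agent", ... ) is rejected as a restricted header.
            //webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";

            // Dispose the response and the reader so the connection is released
            using( WebResponse response = webRequest.GetResponse() )
            using( StreamReader responseReader = new StreamReader( response.GetResponseStream(), encoding ) )
            {
                // and read the response
                responseData = responseReader.ReadToEnd();
            }
        }
        catch( Exception ex )
        {
            Debug.WriteLine( ex.ToString() );
        }
        finally
        {
            Debug.WriteLine( responseData );
        }
        return responseData;
    }
}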
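You might be wondering what this is good for.
Web applications usually set a cookie after login to mark you as signed in. In other words, you first simulate the login with a CookieContainer attached to the request so that it picks up the cookies from the response, then assign that same container to HttpWebRequest.CookieContainer on later requests. From then on, every request you send to that web application carries that user's identity.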
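
The hand-off described above can be tried without a server. This is a sketch under assumptions: the class CookieScopeSketch, its method names, the cookie name "session" and the example.com URLs are all invented for illustration. It stores one cookie as if a login response had set it, then asks the container what Cookie header it would send to a given URL.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
public static class CookieScopeSketch
{
    // Builds a container holding one cookie, as if a response from siteRoot had set it.
    public static CookieContainer ContainerWithCookie( string siteRoot, string name, string value )
    {
        CookieContainer cookies = new CookieContainer();
        cookies.Add( new Uri( siteRoot ), new Cookie( name, value ) );
        return cookies;
    }

    // Returns the Cookie header the container would send to url,
    // or an empty string when no stored cookie matches that URL.
    public static string HeaderFor( CookieContainer cookies, string url )
    {
        return cookies.GetCookieHeader( new Uri( url ) );
    }
}
```

With a container built by ContainerWithCookie( "http://example.com/", "session", "abc123" ), HeaderFor returns "session=abc123" for http://example.com/default.aspx and an empty string for a different host such as http://example.org/. That is why the same container has to be reused for the follow-up requests: it is what carries the login over.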
Here is an example against an ASP.NET site:
    public static void Login( string user, string pwd )
    {
        string loginPage = wgetInWebRequest( "http://your_web_app/login.aspx", null, Encoding.Default );

        // ASP.NET embeds a hidden form field named __VIEWSTATE in the page; grab it first.
        // [^>]*? tolerates extra attributes such as id="__VIEWSTATE", and [^""]* stops at the closing quote.
        Regex rx = new Regex(@"<input\s+type=""hidden""\s+name=""__VIEWSTATE""[^>]*?\svalue=""(?<viewstate>[^""]*)""");
        string viewstate = "";

        try {
            // Find matches.
            MatchCollection matches = rx.Matches( loginPage );

            if( matches.Count == 1 )
            {
                // The value must be URL-encoded
                viewstate = HttpUtility.UrlEncode( matches[0].Groups["viewstate"].Value );

                // Container that collects the cookies
                CookieContainer cookies = new CookieContainer();

                // now post to the login form
                HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create( "http://your_web_app/login.aspx" );

                // Imitate a browser (use the UserAgent property, not Headers.Add)
                //webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";

                // POST
                webRequest.Method = "POST";
                webRequest.ContentType = "application/x-www-form-urlencoded";

                // Cookies from the response end up in the cookies variable
                webRequest.CookieContainer = cookies;

                // If you are behind a proxy:
                // webRequest.Proxy = new WebProxy( "your_proxy", 3128 );

                // The fields depend on what the login form contains; adjust as needed.
                // The user name and password are URL-encoded too, so characters like & or = cannot break the body.
                string postData = string.Format( "__VIEWSTATE={0}&user={1}&password={2}&Submit=Submit",
                    viewstate, HttpUtility.UrlEncode( user ), HttpUtility.UrlEncode( pwd ) );
                using( StreamWriter requestWriter = new StreamWriter( webRequest.GetRequestStream() ) )
                {
                    requestWriter.Write( postData );
                }

                // The response has arrived, but we only want the cookies, not the body
                webRequest.GetResponse().Close();

                // now we can send out cookie along with a request for the protected page
                string responseData = wgetInWebRequest( "http://your_web_app/default.aspx", cookies, Encoding.Default );
                Debug.WriteLine( responseData );
            }
            else
                Debug.WriteLine( "Internal error, expected exactly one __VIEWSTATE field but found " + matches.Count + "." );
        }
        catch( Exception ex ) {
            Debug.WriteLine( ex.ToString() );
        }
    }
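
The two string-handling steps in Login, pulling out __VIEWSTATE and building the POST body, can be checked on their own before any request is sent. The following is only a sketch: the class ViewStateSketch and its method names are invented here, the pattern assumes the hidden field is written with type, then name, then value, and it uses Uri.EscapeDataString in place of HttpUtility.UrlEncode so that it compiles without a reference to System.Web (it encodes a space as %20 rather than +).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ViewStateSketch
{
    // Matches the hidden __VIEWSTATE input; [^>]*? skips attributes such as id="...".
    static readonly Regex ViewStateRx = new Regex(
        @"<input\s+type=""hidden""\s+name=""__VIEWSTATE""[^>]*?\svalue=""(?<viewstate>[^""]*)""",
        RegexOptions.IgnoreCase );

    // Returns the raw (not yet URL-encoded) __VIEWSTATE value, or null when the page has none.
    public static string ExtractViewState( string html )
    {
        Match m = ViewStateRx.Match( html );
        return m.Success ? m.Groups["viewstate"].Value : null;
    }

    // Builds the form body with every value percent-encoded.
    // The field names follow the Login example and depend on the actual form.
    public static string BuildLoginBody( string viewState, string user, string pwd )
    {
        return string.Format( "__VIEWSTATE={0}&user={1}&password={2}&Submit=Submit",
            Uri.EscapeDataString( viewState ),
            Uri.EscapeDataString( user ),
            Uri.EscapeDataString( pwd ) );
    }
}
```

Feeding ExtractViewState a saved copy of the login page is a quick way to confirm the pattern fits the site before wiring it into a real login attempt.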

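That is roughly all there is to it, so I won't explain much more.
If you want to go further, study the HTTP protocol and use a sniffer or similar tool to watch the network packets and understand the full flow.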