Grammars
Parsing is based on the syntax rules the document obeys: the language or format it was written in. Every format you can parse must have a deterministic grammar consisting of vocabulary and syntax rules. This is called a context free grammar. Human languages are not such languages and therefore cannot be parsed with conventional parsing techniques.
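For example, a context free grammar for simple arithmetic expressions could look like the following sketch (the rule names and notation are illustrative, not taken from any particular standard). The vocabulary is the set of tokens (INTEGER, "+", "-"); the three rules are the syntax.

```
expression := term operation expression | term
operation  := "+" | "-"
term       := INTEGER
```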
Parser–lexer combination
Parsing can be separated into two sub processes: lexical analysis and syntax analysis.
Lexical analysis is the process of breaking the input into tokens. Tokens are the language vocabulary: the collection of valid building blocks. In human language it would consist of all the words that appear in the dictionary for that language.
Syntax analysis is the application of the language syntax rules.
Parsers usually divide the work between two components: the lexer (sometimes called tokenizer), which is responsible for breaking the input into valid tokens, and the parser, which is responsible for constructing the parse tree by analyzing the document structure according to the language syntax rules.
The lexer knows how to strip irrelevant characters like white spaces and line breaks.
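As a concrete illustration, here is a minimal lexer sketch in TypeScript for the arithmetic grammar above. The token type names and the character checks are assumptions made for this example; a real browser's tokenizers are considerably more involved.

```typescript
// Token types for the illustrative arithmetic grammar (assumed names).
type Token = { type: "INTEGER" | "PLUS" | "MINUS"; value: string };

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    // Strip irrelevant characters such as white space and line breaks.
    if (ch === " " || ch === "\t" || ch === "\n") { i++; continue; }
    if (ch === "+") { tokens.push({ type: "PLUS", value: ch }); i++; continue; }
    if (ch === "-") { tokens.push({ type: "MINUS", value: ch }); i++; continue; }
    if (/[0-9]/.test(ch)) {
      let j = i;
      while (j < input.length && /[0-9]/.test(input[j])) j++; // consume the whole integer
      tokens.push({ type: "INTEGER", value: input.slice(i, j) });
      i = j;
      continue;
    }
    throw new Error(`unexpected character '${ch}' at position ${i}`);
  }
  return tokens;
}

// tokenize("2 + 3 - 1") → INTEGER(2), PLUS, INTEGER(3), MINUS, INTEGER(1)
```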
Figure: from source document to parse trees
The parsing process is iterative. The parser will usually ask the lexer for a new token and try to match the token with one of the syntax rules. If a rule is matched, a node corresponding to the token will be added to the parse tree, and the parser will ask for another token.
If no rule matches, the parser will store the token internally and keep asking for tokens until a rule matching all the internally stored tokens is found. If no rule is found, the parser will raise an exception: the document was not valid and contained syntax errors.
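Continuing the lexer sketch above, here is a minimal version of that loop in TypeScript. The matchRule helper is hypothetical and recognizes only one simplified rule; real parsers drive this loop from parsing tables or recursive descent, but the buffer-and-match shape mirrors the description.

```typescript
type TreeNode = { rule: string; children: (TreeNode | Token)[] };

// Hypothetical helper: report which rule (if any) the buffered tokens
// complete. Here only the simplified rule INTEGER (+|-) INTEGER is known.
function matchRule(buf: Token[]): string | null {
  if (
    buf.length === 3 &&
    buf[0].type === "INTEGER" &&
    (buf[1].type === "PLUS" || buf[1].type === "MINUS") &&
    buf[2].type === "INTEGER"
  ) {
    return "expression";
  }
  return null;
}

function parse(tokens: Token[]): TreeNode {
  const root: TreeNode = { rule: "document", children: [] };
  const buffered: Token[] = []; // tokens that have not matched a rule yet
  for (const token of tokens) { // "ask the lexer for a new token"
    buffered.push(token);
    const rule = matchRule(buffered);
    if (rule) {
      // A rule matched: add a node to the parse tree and keep going.
      root.children.push({ rule, children: [...buffered] });
      buffered.length = 0;
    }
  }
  if (buffered.length > 0) {
    // No rule matches the stored tokens: the document has syntax errors.
    throw new SyntaxError("no rule matches the remaining tokens");
  }
  return root;
}

// parse(tokenize("2 + 3")) yields a tree with a single "expression" node.
```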