Date: 2025-04-15

Lexical analysis

This is the first step of data parsing. It all starts when a string of raw, unstructured data enters the parser. Then, a lexer, also known as a scanner or tokenizer, transforms that stream of data into a sequence of tokens.

This step is called lexical because the lexer builds tokens from meaningful lexical units, such as keywords, operators, and delimiters. At the same time, it discards lexically irrelevant information, such as comments and whitespace. Parentheses and other delimiters are kept as tokens because the next stage needs them to work out grouping.

Let’s say the input character stream (2 + 8)^3 enters the parser. The lexer will split this stream into the following tokens: (, 2, +, 8, ), ^, 3. The spaces are dropped, and every remaining symbol becomes a token. Basically, lexical analysis is the process of token generation.
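To make the tokenization step concrete, here is a minimal sketch of a lexer for simple arithmetic input such as the example above. The token names (NUMBER, OP, LPAREN, RPAREN, SKIP) and the tokenize function are assumptions made for illustration; real lexers typically recognize many more token types.

```python
import re

# Hypothetical token specification for simple arithmetic expressions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+\-*/^]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),  # whitespace: matched, then discarded
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Turn a raw character stream into a list of (type, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The values of the tokens, in order: ( 2 + 8 ) ^ 3
print([value for _, value in tokenize("(2 + 8)^3")])
```

Note that the parentheses survive as tokens, while the spaces match the SKIP pattern and are thrown away.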

Syntactic analysis

The next stage is checking whether the generated tokens make up an allowable, meaningful expression in accordance with pre-defined rules. Sounds tricky? Keep reading, and we’ll try to make it clearer.

Here we need to introduce one more concept from computer science: context-free grammar. It’s a set of rules that specify the syntax of a language. Simply put, a context-free grammar defines what counts as a valid sequence of tokens: which components can make up a valid expression and the order in which those components must go.

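The result of syntactic analysis is a parse tree. Its root and inner nodes correspond to grammar rules, and its leaf nodes correspond to tokens. The tree is a hierarchical structure that shows how the elements of the initial input group together and what role each one plays. For (2 + 8)^3, the tree shows that 2 + 8 is grouped first and the result is then raised to the power of 3.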
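As an illustration, the sketch below checks a token sequence against a small context-free grammar and builds a tree from it. The grammar, the parse function, and the nested-tuple tree shape are all assumptions chosen for this example, and the result is a simplified tree (it omits the parentheses once grouping is known) rather than a full parse tree. The technique used is recursive descent, which is only one of several ways to implement a parser.

```python
# Hypothetical grammar for arithmetic expressions:
#   expr  := term (("+" | "-") term)*
#   term  := power (("*" | "/") power)*
#   power := atom ("^" power)?
#   atom  := NUMBER | "(" expr ")"


def parse(tokens):
    """Parse a list of token strings into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = power()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, power())
        return node

    def power():
        base = atom()
        if peek() == "^":
            advance()
            return ("^", base, power())  # "^" groups to the right
        return base

    def atom():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["(", "2", "+", "8", ")", "^", "3"])
print(tree)  # ('^', ('+', ('num', 2), ('num', 8)), ('num', 3))
```

A token sequence that breaks the grammar, such as ["2", "+"], raises SyntaxError instead of producing a tree.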

Put simply, lexical analysis creates tokens, while syntactic analysis builds trees from those tokens.

Semantic analysis

Strictly speaking, parsing doesn’t include semantic analysis, but this stage usually follows parsing in compilers and interpreters. It’s performed by semantic analyzers, not by parsers themselves.

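Semantic analysis checks whether a syntactically valid parse tree also makes sense under the rules of the language, for example whether variables are declared before they are used and whether operand types match their operators. It doesn’t produce an executable program by itself. Translating source code (written in a high-level programming language) into object code (written in a low-level programming language) is the job of later compiler stages, such as code generation.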

At this stage, semantic analyzers identify errors that syntax analysis can’t catch and generate an annotated parse tree, that is, a parse tree decorated with extra information such as types. This stage is vital because parsing can’t detect all the errors in the source code.
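A small sketch can show the kind of error that passes parsing but fails semantic analysis. The tree shape used here (nested tuples with "num" and "name" nodes) and the find_semantic_errors function are hypothetical, and the only rule checked is that every name must be declared. Real semantic analyzers check much more, such as types and scopes.

```python
def find_semantic_errors(tree, declared_names):
    """Walk a nested-tuple tree and report names that were never declared.

    Assumed node shapes (hypothetical, for illustration only):
      ("num", value), ("name", identifier), (operator, left, right)
    """
    kind = tree[0]
    if kind == "num":
        return []
    if kind == "name":
        name = tree[1]
        if name in declared_names:
            return []
        return [f"undeclared name: {name}"]
    # Any other node is treated as a binary operator with two children.
    _, left, right = tree
    return (
        find_semantic_errors(left, declared_names)
        + find_semantic_errors(right, declared_names)
    )


# "x + 2" is syntactically valid, but x was never declared.
tree = ("+", ("name", "x"), ("num", 2))
print(find_semantic_errors(tree, declared_names={"y"}))  # ['undeclared name: x']
```

An empty list means the tree passed this particular check, so later stages can work with it.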

If the initial string of symbols is free of errors, semantic analysis gives the green light for raw and unstructured data to become a clearly readable piece of information.