Recently, an old friend of mine called me for help. He has worked as a journalist for many years and had just obtained the rights to republish many of his early columns. He wanted to put his work on the Web, but his columns were saved as plain text files, and he had neither the time nor the desire to learn HTML in order to convert them into Web pages. Since I was the only computer-savvy person in his phone book, he called me to see if I could help.
"Let me handle it," I said, "and call me back in an hour." Sure enough, when he called back an hour later, I had a solution ready for him. It took only a little PHP, and I was rewarded with his endless thanks and a case of red wine.
So what did I do in that hour? That is the subject of this article: I'll show you how to use PHP to quickly turn plain ASCII text into readable HTML markup.
First, let's look at an example of the kind of plain text file my friend wanted to convert:
Green for Mars!
John R. Doe
The idea of little green men from Mars, long a staple of science fiction, may soon turn out to be less fantasy and more fact.
Recent samples sent by the latest Mars exploration team indicate a high presence of chlorophyll in the atmosphere. Chlorophyll, you'll recall, is what makes plants green. It's quite likely, therefore, that organisms on Mars would have, through continued exposure to the green stuff, developed a greenish tinge on their outer exoskeleton.
An interview with Dr. Rushel Bunter, the head of ASDA's Mars Colonization Project blah blah...
What does this mean for you? Well, it means blah blah blah...
Track follow-ups to the story online at http://www.mars-connect.dom/. For pictures of the latest samples, log on to http://www.asdamcp.dom/galleries/220/
Fairly standard text: it has a title, a byline, and several paragraphs. To convert this document into HTML, you need to preserve the layout of the original text on the Web page using HTML line-break and paragraph tags. Special characters need to be converted into the corresponding HTML entities, and hyperlinks need to be made clickable.
The following PHP code (Listing A) handles all of these tasks:
Listing A
Let's take a look at how it works:
<?php
// Set source file name and path
$source = "Toi200686.txt";
// Read raw text as an array (one element per line)
$raw = file($source) or die("Cannot read file");
// Retrieve first and second lines (title and author)
$slug = array_shift($raw);
$byline = array_shift($raw);
// Join remaining data into a single string
$data = join('', $raw);
// Replace special characters with HTML entities
// and line breaks with <br />
$html = nl2br(htmlspecialchars($data));
// Replace multiple whitespace characters with a single space
$html = preg_replace('/\s\s+/', ' ', $html);
// Replace URLs with <a href...> elements
$html = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="$1$2" target="_blank">$1$2</a>', $html);
// Start building output page
// Add page header
$output = <<<HEADER
<html>
<head>
<style>
.slug {font-size: 15pt; font-weight: bold}
.byline {font-style: italic}
</style>
</head>
<body>
HEADER;
// Add page content
$output .= "<div class='slug'>$slug</div>";
$output .= "<div class='byline'>By $byline</div><p />";
$output .= "<div>$html</div>";
// Add page footer
$output .= <<<FOOTER
</body>
</html>
FOOTER;
// Display in browser
echo $output;
// and/or
// Write output to a new .html file
file_put_contents(basename($source, substr($source, strpos($source, '.'))) . ".html", $output) or die("Cannot write file");
?>
The first step is to read the plain ASCII file into a PHP array. This is easily done with the file() function, which turns each line of the file into an element of a numerically indexed array.
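A small sketch of what file() returns. It writes a throwaway sample file to the system temp directory (the generated file name is purely illustrative) and reads it back two ways. By default each element keeps its trailing newline; the listing relies on this, because those newlines are what nl2br() later turns into <br /> tags.

```php
<?php
// Write a throwaway three-line sample file (illustration only).
$path = tempnam(sys_get_temp_dir(), 'col');
file_put_contents($path, "Green for Mars!\nJohn R. Doe\nFirst paragraph.\n");

// Default: one array element per line, trailing "\n" kept.
$lines = file($path);

// With FILE_IGNORE_NEW_LINES the newline is stripped from each element.
$bare = file($path, FILE_IGNORE_NEW_LINES);

var_dump($lines, $bare);
unlink($path);
```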
Next, the title and byline (which I assume are the first two lines of the file) are removed from the array with array_shift() and stored in separate variables. The remaining array elements are then joined into a single string, which now holds the entire body of the article.
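The title-and-byline step can be tried on an in-memory array standing in for the file() result. The trim() calls are an addition that the listing does not make: they drop the trailing newline so the title and byline don't carry it along.

```php
<?php
// Stand-in for the array that file() returns (lines keep their "\n").
$raw = [
    "Green for Mars!\n",
    "John R. Doe\n",
    "First paragraph.\n",
    "\n",
    "Second paragraph.\n",
];

// array_shift() removes and returns the first element each time.
// trim() is an addition: it drops the trailing newline.
$slug   = trim(array_shift($raw));
$byline = trim(array_shift($raw));

// join() is an alias of implode(); an empty separator keeps the text as-is,
// including the newlines that nl2br() will convert later.
$data = join('', $raw);

echo "$slug / $byline\n$data";
```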
Special characters such as &, ", < and > are converted to the corresponding HTML entities by the htmlspecialchars() function. To preserve the article's original layout, line and paragraph breaks are converted to HTML <br /> elements by the nl2br() function. Runs of multiple spaces within the story are then collapsed into a single space by a simple regular-expression substitution.
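The escaping, line-break and whitespace steps can be seen on a made-up one-line input, applying the three calls in the same order as the listing:

```php
<?php
$data = "Tom & Jerry <b>show</b>\nNext    line";

// 1. Escape &, <, > and " so they display literally.
// 2. Insert <br /> before each newline (the newline itself stays).
$html = nl2br(htmlspecialchars($data));

// 3. Collapse runs of two or more whitespace characters into one space.
$html = preg_replace('/\s\s+/', ' ', $html);

echo $html, "\n";
// Tom &amp; Jerry &lt;b&gt;show&lt;/b&gt;<br />
// Next line
```

With Unix-style (\n) line endings, nl2br() places a <br /> before every newline, so a blank line becomes "<br />\n<br />\n" and the \s\s+ pattern never merges paragraph breaks; it only squeezes runs of spaces and tabs.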
URLs in the body of the article are detected with a regular expression and wrapped in <a> elements, so that they become clickable hyperlinks when the page is displayed in a Web browser.
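The listing's pattern, /\s(\w+:\/\/)(\S+)/, only finds URLs that follow a whitespace character, and \S+ runs until the next whitespace. A trailing full stop (as after http://www.mars-connect.dom/ in the sample) therefore ends up inside the link, and a URL at the end of a line can swallow the start of the <br /> tag that nl2br() added. The hypothetical linkify() below is one possible variant, not the article's code: it also accepts a URL at the very start of the text, stops at "<", and leaves trailing sentence punctuation outside the link.

```php
<?php
// Hypothetical helper, a variant of the listing's URL replacement.
// (^|\s)            URL at the start of the text or after whitespace
// (\w+://[^\s<]+?)  scheme and address, stopping before whitespace or "<"
// ([.,;:!?]*)       trailing sentence punctuation, kept outside the link
// (?=\s|<|$)        must be followed by whitespace, a tag, or the end
function linkify(string $html): string
{
    return preg_replace(
        '~(^|\s)(\w+://[^\s<]+?)([.,;:!?]*)(?=\s|<|$)~',
        '$1<a href="$2" target="_blank">$2</a>$3',
        $html
    );
}

$text = "Online at http://www.mars-connect.dom/. Pictures: http://www.asdamcp.dom/galleries/220/<br />\n";
echo linkify($text);
```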
The output page is then built from a standard HTML template, with CSS style rules formatting the title, byline, and body of the article. This script doesn't do it, but this is the place to customize the appearance of the final page: you can add graphics, colors, or other eye-catching elements to the template.
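One way to make the template easier to restyle is to move it into a function that takes the stylesheet as a parameter. build_page() is a hypothetical name, and the <title> element and the escaping of the title and byline are additions: the listing inserts $slug and $byline into the page unescaped.

```php
<?php
// Hypothetical helper: wraps an article in a page template whose
// stylesheet is passed in, so the look can be adapted to an existing site.
function build_page(string $slug, string $byline, string $bodyHtml, string $css): string
{
    // Unlike the listing, also escape the title and byline (and drop the
    // trailing newline that file() leaves on them).
    $slug   = htmlspecialchars(trim($slug));
    $byline = htmlspecialchars(trim($byline));

    return <<<HTML
<html>
<head>
<title>$slug</title>
<style>
$css
</style>
</head>
<body>
<div class="slug">$slug</div>
<div class="byline">By $byline</div>
<div>$bodyHtml</div>
</body>
</html>
HTML;
}

// Usage with the listing's two style rules.
$css  = ".slug {font-size: 15pt; font-weight: bold}\n.byline {font-style: italic}";
$page = build_page("Green for Mars!\n", "John R. Doe\n", "First paragraph.", $css);
echo $page;
```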
Once the HTML page has been built, it can be sent to the browser, saved as a static file with file_put_contents(), or both. When the page is saved, the extension is stripped from the original file name and a new file name (filename.html) is created for the Web page. You can then publish the page to a Web server, burn it to a CD, or edit it further.
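The listing derives the output name with basename($source, substr($source, strpos($source, '.'))), which cuts everything from the first dot onward and drops the directory: for a hypothetical columns/1998.03.txt it would write 1998.html to the current directory. Below is a sketch of an alternative using pathinfo(), which strips only the last extension, and keeping the new file next to its source (html_name() is a hypothetical helper).

```php
<?php
// Hypothetical helper: derives "name.html" from a source path.
// PATHINFO_FILENAME removes only the last extension, and the directory
// is kept, so "columns/1998.03.txt" becomes "columns/1998.03.html".
function html_name(string $source): string
{
    $dir  = pathinfo($source, PATHINFO_DIRNAME);   // "." for a bare file name
    $name = pathinfo($source, PATHINFO_FILENAME);
    return $dir . '/' . $name . '.html';
}

echo html_name("Toi200686.txt"), "\n";
echo html_name("columns/1998.03.txt"), "\n";
```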
Note: If you use this script to create and save HTML files on disk, make sure the script has write permission for the directory in which the files are saved.
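A sketch of checking that the target directory is writable before saving, so the failure message names the directory at fault. save_page() is a hypothetical helper; it reports the problem with error_log() and returns false instead of stopping the script with die().

```php
<?php
// Hypothetical helper: saves a page only if its directory is writable.
function save_page(string $target, string $html): bool
{
    $dir = dirname($target);
    if (!is_dir($dir) || !is_writable($dir)) {
        // error_log() goes to the server log (or stderr on the command line).
        error_log("Cannot write to directory: $dir");
        return false;
    }
    // file_put_contents() returns the byte count, or false on failure;
    // compare with false so an empty page does not count as an error.
    return file_put_contents($target, $html) !== false;
}

// Usage: save into the system temp directory.
$target = sys_get_temp_dir() . '/column-' . getmypid() . '.html';
var_dump(save_page($target, '<p>Green for Mars!</p>'));
```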
As you can see, if you have well-formed plain ASCII text files, PHP makes it fairly quick to convert them into usable Web pages. If you already have a Web site and plan to add new pages, it is fairly easy to adjust the page builder's template to match the site's existing look. Try it yourself!