Discussion:
[cowiki-dev] Complete reverse parser refusal
Daniel T. Gorski
2005-08-02 23:08:59 UTC
Hi,

please take the attached file (it is a legacy XML document from develnet.org)
and put it 1:1 into the "content" field of the "cowiki_node" table. Then try
to edit this document.

I only get the "strong end strong end em end insanity in tcell" debug output,
and nothing else. It looks like a crash or an exit. No errors are emitted,
neither in the script nor in the debug log.

Self-compiled PHP version here: PHP 5.0.4 (cgi) (built: Apr 26 2005 02:22:01)

The reverse parser is the latest from CVS HEAD; it worked better before your
commit two or three hours ago :)

regards dtg
Archie Campbell
2005-08-03 17:35:29 UTC
Should be better now, Dan. My commit was too hasty.

Regards,

Archie
Post by Daniel T. Gorski
------------------------------------------------------------------------
<h1>Welcome to the develnet.org projects repository</h1><p><tt>develnet.org</tt> is the host of projects written and mainly maintained by <link idref="150"></link>. You will find downloads of free projects here as well as information about commercial ones and how to acquire licenses and support. Besides, this platform is used as a personal homepage with a few tutorials and more or less interesting stuff. Enjoy.</p><p><strong>Currently running projects:</strong></p>
<list><ul><li><link idref="2"></link> - web collboration tool (GNU General Public License)</li><li><link idref="3"></link> - to be documented (distributed under a commercial license)</li></ul></list>
<hr/>
<h2>Latest confidential information leaked from the well secured laboratory:</h2>
<plugin name="doc.recent" limit="12" cutoff="30" title="Recently changed documents" style="float:right; margin: 0px 0px 5px 15px;"/>
<table><tr valign="top"><td colspan="1"> <em><strong>2 Feb 2005</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p>After the transfer of the coWiki project to the new maintainer Paul Hanchett, the new <link href="http://www.cowiki.org">site</link> is up and running. Latest versions and updates are available there. I'll successively close most coWiki related pages here on develnet.org.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>12 Jan 2005</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>03 Jan 2005</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p>The coWiki web collaboration software is looking for a new maintainer. <link idref="324">Read more ...</link></p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>01 Dec 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_internal.gif"/></td></tr></table>
<q><p>Ah, here we go again. A few hardware parts of the server that were hosting <tt>develnet.org</tt> decided to meet their maker the <link idref="2"></link> way: collaboratively.</p><p>The server needed a few days off for vacation and repair.</p><p>Hence some of the <tt>develnet.org</tt> services - like demo, snaps and the bug eating machine - may be not reachable. We will re-build these services successively, so please expect failures over the next few days. Sorry for the inconvenience.</p><p>The <link strref="Download / CVS">CVS system</link> is not affected but the mailing lists are.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>25 Nov 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p>Internationalization weeks at develnet.org! <link strref="Wagner Sartori Junior"></link> provided a Portuguese localization for the coWiki project. Thanks.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>10 Nov 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p><link idref="320"></link> added a Slovak localization to the coWiki project. Thanks a lot.</p><p><em>Attention:</em> If you need a version that is working with the latest PHP 5.0.2 release, please check out a version <em>before</em> October 10th 2004. Current coWiki HEAD is <em>not</em> stable. Stay tuned.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>16 Oct 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p>The <link href="http://cvs.develnet.org/cgi-bin/cvsweb.cgi/cowiki/">coWiki CVS repository</link> is in an unstable state. Please do not check out any source for production environments. We will inform you if we think that it <em>might</em> work - anyway it's a development version, bad suprises included.</p><p><em>Update:</em> If you need a version that is working with the latest PHP 5.0.2 release, please check out a version <em>before</em> October 10th 2004.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>19 Sep 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_vividsites.gif"/></td></tr></table>
<q><p>A teaser from the author of coWiki: the first beta <link href="http://demo.vividsites.de">demo version of vividsites 3.0.0</link> opened to the public. At the time of writing the demo installation might be especially interesting for German speaking people only as it ain't been localized yet. Anyway drop an eye on it.</p><p>vividsites is a HTML-based descriptive data management engine written in PHP4 for closed user groups (CUGs) that allows a programmer to adapt, create and control databases and its tables with only a few lines of descriptive XML markup.</p><p>It is modular and highly customizable - no additional programming is usually required for database schema changes, existing legacy databases or validation and plausibility rules. You are able to build an appropriate backend data management tool for your customers quickly - and it will look and work like exclusively created for them.</p><p>Additionally it provides roles/privileges based access, a workflow system (dual control) and a powerful search capabilities of any associated database tables by default. More about this product to come these days <link idref="3">here</link>.</p><p><table><tr valign="top"><td colspan="1"> <plugin name="embed" src="embed/news_vividsites.gif"/> </td><td colspan="1"> is closed source.</td></tr></table></p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>18 Sep 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_internal.gif"/></td></tr></table>
<q><p><tt>develnet.org</tt> has been down for maintenance for a few days due to some strange undefinable data loss in the coWiki database so we took the chance to restructure a few pages. The origin of this 'bug' is still not found. Sorry for your inconvenience.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>27 Jul 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p><link strref="Tomasz Dudzisz"></link> contributed two very stylish HTML templates to the coWiki project. The 'Africa' template is a warm and eye friendly template and 'Spitsbergen' provides a bluish arctic flair. You or your readers may change the preferred template in the 'Preferences' menu or - if you are an administrator - in the <tt>core.conf</tt> configuration file. Great job, thanks Tomasz!</p></q>
<table align="center"><tr valign="top"><td colspan="1"> <em>Africa V1.0</em> </td><td colspan="2"> <em>Spitsbergen V1.0</em></td></tr><tr valign="top"><td colspan="1"><plugin name="embed" src="embed/tpl_africa_tn.gif" alt="Africa" style="border: 1px #AAAAAA solid"/></td><td colspan="1"> </td><td colspan="1"><plugin name="embed" src="embed/tpl_spitsbergen_tn.gif" alt="Spitsbergen" style="border: 1px #AAAAAA solid"/></td></tr></table>
<q><p>These templates are available in CVS only - until the next stable coWiki release.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>8 Jul 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p>A Polish localization by <link strref="Tomasz Dudzisz"></link> has been added to the coWiki development branch.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>15 Feb 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_internal.gif"/></td></tr></table>
<q><p>The CVS server and its hardware has been set up and should work again as expected. If you run into troubles, please let us know at the <link strref="Mailing lists">developer mailing list</link>.</p><p>The web interface to our CVS at <uri strref="http://cvs.develnet.org/"/> is going to be fixed in the next few days. Its repair won't be announced separately.</p><p>Hang the mindless hax0rs and script kids.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>12 Feb 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_internal.gif"/></td></tr></table>
<q><p>The <tt>develnet.org</tt> CVS server has been hacked from <tt>*.dcenter.bezeqint.net</tt>. Unfortunately I detected defective hard drives on this machine while I was trying to clean up the mess. Hence the CVS service won't be availabe for a few days, sorry.</p></q>
<hr/>
<table><tr valign="top"><td colspan="1"> <em><strong>19 Jan 2004</strong></em> </td><td colspan="1"> <plugin name="embed" src="embed/news_cowiki.gif"/></td></tr></table>
<q><p><link strref="Hakan Küçükyilmaz"></link> has added Turkish localization to the latest coWiki development version. Neat.</p><p>FYI: The main coWiki core development is still frozen until the PHP5 developers know what they want. The one and only stable coWiki version is 0.3.3 with PHP beta1</p></q>
------------------------------------------------------------------------
Daniel T. Gorski
2005-08-04 18:22:11 UTC
On 03 Aug 18:35, Archie Campbell wrote:

Hi Archie,
Post by Archie Campbell
Post by Daniel T. Gorski
please take the attached file (it is a legacy XML from develnet.org) and
put it 1 to 1 in the "content" field of the "cowiki_node" table. Try to
edit this document then.
Should be better now, Dan. My commit was too hasty.
With your new changes and the _given legacy XML_ data, I now get (while
trying to edit):

--- start of browser output ---

Error on line 48 before

has added Turkish localization to the latest coWiki development version.
Neat.

FYI: The main coWiki core development is still frozen until the PHP5
developers know what they want. The one and only stable coWiki version is
0.3.3 with PHP beta1
Error on line 49 before

--- end of browser output ---

The generated HTML is:

--- start of source output ---

Error on line 48 before <q><p> has added Turkish localization to
the latest coWiki development version. Neat.</p><p>FYI:
The main coWiki core development is still frozen until the PHP5
developers know what they want. The one and only stable coWiki version
is 0.3.3 with PHP beta1</p></q>
Error on line 49 before

--- end of source output ---

Then, below this output, the edit "window"/box begins. Any ideas? Did you
try what I suggested, i.e. taking the legacy XML and working with it?

I am also wondering because Firefox displays double quotes (") at the
locations of <q> and </q>, although there are no quotes in the source. I am
just asking myself whether <q> is a valid HTML element and whether Firefox
adds these quotes automatically, since this doesn't happen with IE. But this
is not the "real" problem :)

regards dtg
Paul Hanchett
2005-08-04 18:45:26 UTC
Archie,

Can you or Daniel give me a "dumb manager" description of how the parser
works (is supposed to work)?

Paul
Archie Campbell
2005-08-04 21:12:53 UTC
Right. Seemingly the non-ASCII character data in "Hakan Küçükyilmaz"
causes an error in the PHP5 XML parser at the heart of the ReverseParser.
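One way to reproduce this kind of failure outside coWiki is sketched below. It assumes the legacy name is stored as ISO-8859-1 bytes with no encoding declaration (the thread does not say how it is stored), uses Python's expat binding as a stand-in for the PHP5 parser, and `try_parse` plus the fragment constants are hypothetical names. A parser that sees no declaration assumes UTF-8, so undeclared Latin-1 bytes for "ü" and "ç" are rejected, while the same text parses once it is declared or re-encoded as UTF-8.

```python
import xml.parsers.expat

# Hypothetical test fragments; ISO-8859-1 storage is an assumption.
TEXT = '<link strref="Hakan Küçükyilmaz"></link>'
LATIN1_UNDECLARED = TEXT.encode("iso-8859-1")
LATIN1_DECLARED = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_UNDECLARED
UTF8_UNDECLARED = TEXT.encode("utf-8")


def try_parse(data):
    """Feed raw bytes to a fresh expat parser; return "ok" or the error text."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
        return "ok"
    except xml.parsers.expat.ExpatError as err:
        return "error: " + xml.parsers.expat.ErrorString(err.code)


for label, data in [("latin-1, undeclared", LATIN1_UNDECLARED),
                    ("latin-1, declared", LATIN1_DECLARED),
                    ("utf-8, undeclared", UTF8_UNDECLARED)]:
    print(label, "->", try_parse(data))
```

If the stored bytes turn out to be UTF-8 already, the cause lies elsewhere and this check simply passes.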

Apparently the solution is for this part of our coWiki XML to read as
follows (to be strict XML):

<q><p><link><strref><![CDATA[Hakan Küçükyilmaz]]></strref></link> has
added ...etc.

This is fairly demanding. Also, will we be unable to benefit from the PHP5
XML manipulation functions as long as our pseudo-XML is not strict XML? That
is, will I be forced to write an XML parser for the ReverseParser?

Apparently so, or not at all. Voting for...

1 - an XML converter to run over legacy documents, plus a tweak to the
WikiParser so that it emits strict CDATA etc.

or

2 - an XML-parsing ReverseParser that eats our pseudo-XML without problems.

I'm not going to start right away, because it's late, but this matter is
pretty important.

Regards,

Archie
Archie Campbell
2005-08-07 14:37:38 UTC
OK. Obviously option 1 is superior to option 2, because a pseudo-XML
eater is required in both options, plus option 1 involves the production
of strict XML, which will no doubt become useful at a later date. For
example, it would allow the ReverseParser and HtmlTransformer to be
designed around the PHP5 XML parser, drastically reducing the amount of
work to be done.

1.

Therefore, I'm going to start on a class.WikiPseudoXmlConverter.php that
will take legacy wiki database node data, in the old pseudo-XML format
and output strict XML.

<link strref="Hakan Küçükyilmaz">Contributor</link>

will become...

<link><strref><![CDATA[Hakan
Küçükyilmaz]]></strref><![CDATA[Contributor]]></link>
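The attribute-to-element conversion above could be sketched roughly as follows. This is not class.WikiPseudoXmlConverter.php; it is a hypothetical Python illustration (`cdata`, `to_strict` and `convert` are invented names) that assumes the legacy content is well-formed apart from lacking a single root element. Each attribute becomes a child element whose value is a CDATA section, character data is wrapped in CDATA too, and a literal `]]>` is split across two sections so the output stays valid.

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' it contains."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def to_strict(elem):
    """Serialize one element: attributes become child elements holding
    CDATA, and the element's own text and its children's tails become CDATA."""
    parts = ["<" + elem.tag + ">"]
    for name, value in elem.attrib.items():
        parts.append("<" + name + ">" + cdata(value) + "</" + name + ">")
    if elem.text:
        parts.append(cdata(elem.text))
    for child in elem:
        parts.append(to_strict(child))
        if child.tail:
            parts.append(cdata(child.tail))
    parts.append("</" + elem.tag + ">")
    return "".join(parts)


def convert(legacy):
    """Convert a legacy node body, which may have several top-level elements."""
    # A temporary root makes the fragment parseable; it is not emitted.
    root = ET.fromstring("<root>" + legacy + "</root>")
    parts = [cdata(root.text)] if root.text else []
    for child in root:
        parts.append(to_strict(child))
        if child.tail:
            parts.append(cdata(child.tail))
    return "".join(parts)


print(convert('<link strref="Hakan Küçükyilmaz">Contributor</link>'))
# prints <link><strref><![CDATA[Hakan Küçükyilmaz]]></strref><![CDATA[Contributor]]></link>
```

Whitespace between elements also ends up in CDATA sections here; a real converter might prefer to keep it as plain text.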

2.

class.WikiParser.php will have to be altered to output strict XML.

3.

class.WikiReverseParser.php will be altered to reflect the changes. Code
recognising attributes will be changed to stateful recognition of
attribute-elements, as above.
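Stateful recognition of attribute-elements in an event-based parser might look roughly like this. It is a hypothetical Python sketch on top of expat, not the WikiReverseParser code; the `ATTRIBUTE_ELEMENTS` table and `read_links` are assumptions. An event-based parser sees `<strref>` as just another start tag, so the handlers must remember that they are inside a `<link>` and route the following character data to the link's `strref` value instead of its text.

```python
import xml.parsers.expat

# Hypothetical: child elements that stand in for attributes of a parent.
ATTRIBUTE_ELEMENTS = {"link": {"idref", "strref", "href"}}


def read_links(strict_xml):
    """Collect <link> data from strict XML, using parser state to tell
    attribute-elements apart from ordinary link text."""
    links = []
    stack = []       # names of currently open elements
    current = None   # dict for the <link> being read, if any
    target = None    # key in `current` that character data goes to

    def start(name, attrs):
        nonlocal current, target
        parent = stack[-1] if stack else None
        stack.append(name)
        if name == "link":
            current, target = {"text": ""}, "text"
        elif current is not None and name in ATTRIBUTE_ELEMENTS.get(parent, ()):
            current[name] = ""
            target = name

    def end(name):
        nonlocal current, target
        stack.pop()
        if name == "link":
            links.append(current)
            current, target = None, None
        elif current is not None and target == name:
            target = "text"  # back to collecting the link's own text

    def chars(data):
        # CDATA content also arrives here, possibly in several chunks.
        if current is not None and target is not None:
            current[target] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse("<root>" + strict_xml + "</root>", True)
    return links


print(read_links("<link><strref><![CDATA[Hakan Küçükyilmaz]]></strref>"
                 "<![CDATA[Contributor]]></link>"))
```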

4.

class.FrontHtmlTransformer.php will be refactored to use the built-in XML
parser.

5.

class.ParserTest.php will be changed to incorporate
class.WikiPseudoXmlConverter.php as $x. Tests become three-way:
a) Wiki to strict XML (WikiParser),
b) strict XML to Wiki (WikiReverseParser),
c) pseudo-XML (currently included as target, now useful as input) to
strict XML (WikiPseudoXmlConverter).

Comments are welcome. This is currently more important than issue #239.
Bear in mind that issue #239 is a lot simpler than the XML refactoring;
it can wait.

Patience will be required. I'm only going to commit *after* step 5 has
been reached and I've been able to throw the unit tests at the three
main files: Parser, ReverseParser and PseudoXmlConverter.

Regards,

Archie
Daniel T. Gorski
2005-08-09 04:18:04 UTC
Post by Archie Campbell
OK. Obviously option 1 is superior to option 2, because a pseudo-XML
eater is required in both options, plus option 1 involves the production
of strict XML, which will no doubt become useful at a later date.
Full ACK.
Post by Archie Campbell
1.
Therefore, I'm going to start on a class.WikiPseudoXmlConverter.php that
will take legacy wiki database node data, in the old pseudo-XML format
and output strict XML.
<link strref="Hakan Küçükyilmaz">Contributor</link>
will become...
<link><strref><![CDATA[Hakan
Küçükyilmaz]]></strref><![CDATA[Contributor]]></link>
Such a conversion can be made _once_ by the installer (and its so-called
"Migrator" classes). That means that you won't need to check on each
request (read and save) whether you are working with a legacy document or
with a (new) strict one.

This seems to be a proper solution to me - and future oriented.

BTW: When I started with coWiki, PHP 5 did not support DOM XML by default
as it does today. Hence rewriting the event-based XML reader/writer we
have now to use the meanwhile built-in DOM XML reader/writer would ease
much of the work and would be less error-prone.

<http://cowiki.tigris.org/issues/show_bug.cgi?id=41>
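For comparison with the event-based approach, a DOM-based reader can inspect a `<link>` node and its children directly, without tracking parser state. Below is a minimal sketch using Python's `xml.dom.minidom` as a stand-in for PHP5's DOM extension; `link_refs` and `_text_of` are hypothetical helpers that accept both the legacy attribute form and the strict attribute-element form, and only a link's direct text children are read.

```python
from xml.dom import minidom


def _text_of(node):
    """Concatenate the direct text and CDATA children of a DOM node."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE))


def link_refs(fragment):
    """Return (strref, text) pairs for every <link> in a node body."""
    doc = minidom.parseString("<root>" + fragment + "</root>")
    pairs = []
    for link in doc.getElementsByTagName("link"):
        # Legacy form: the reference is an attribute ("" when absent).
        ref = link.getAttribute("strref")
        # Strict form: the reference is a child element instead.
        for child in link.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "strref":
                ref = _text_of(child)
        pairs.append((ref, _text_of(link)))
    doc.unlink()  # break DOM reference cycles once the strings are copied
    return pairs


print(link_refs('<link strref="Hakan Küçükyilmaz">Contributor</link>'))
print(link_refs("<link><strref><![CDATA[Hakan Küçükyilmaz]]></strref>"
                "<![CDATA[Contributor]]></link>"))
```

Both calls yield the same pair, which is the kind of simplification a DOM-based rewrite could bring.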

regards dtg
Daniel T. Gorski
2005-08-11 04:31:07 UTC
Post by Daniel T. Gorski
Post by Archie Campbell
1.
Therefore, I'm going to start on a class.WikiPseudoXmlConverter.php that
will take legacy wiki database node data, in the old pseudo-XML format
and output strict XML.
<link strref="Hakan Küçükyilmaz">Contributor</link>
will become...
<link><strref><![CDATA[Hakan
Küçükyilmaz]]></strref><![CDATA[Contributor]]></link>
Such a conversion can be made _once_ by the installer (and its so called
"Migrator" classes). That means, that you won't need to check on each
request (read and save), whether you are working with a legacy document or
with a (new) strict one.
This seems to be a proper solution to me - and future oriented.
Of course it would only be safe with transactional persistence (InnoDB),
especially if a larger number of documents needs to be converted.
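A one-time, all-or-nothing migration could look roughly like the sketch below. SQLite stands in for InnoDB here, the `node_id` column and the `migrate_nodes` helper are hypothetical (only the `cowiki_node` table and its `content` field come from this thread), and `convert` is whatever legacy-to-strict converter the installer's Migrator would use. All updates run in one transaction, so a failure on any document leaves every node in its original legacy form.

```python
import sqlite3


def migrate_nodes(conn, convert):
    """Rewrite every cowiki_node.content with convert() in one transaction.

    Returns the number of converted rows. If convert() raises for any row,
    the connection's context manager rolls back all updates made so far.
    """
    with conn:  # commit on success, rollback on exception
        rows = conn.execute("SELECT node_id, content FROM cowiki_node").fetchall()
        for node_id, content in rows:
            conn.execute("UPDATE cowiki_node SET content = ? WHERE node_id = ?",
                         (convert(content), node_id))
    return len(rows)


# Demo with an in-memory database and a stand-in converter.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cowiki_node (node_id INTEGER PRIMARY KEY, content TEXT)")
demo.executemany("INSERT INTO cowiki_node (content) VALUES (?)",
                 [("<p>a</p>",), ("<p>b</p>",)])
demo.commit()
print(migrate_nodes(demo, str.upper))  # prints 2
```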

regards dtg
