Friday, May 18, 2007

Pictures from Paris

Below are some pictures I took this afternoon in Paris. The red ones with the number 30 on them show a depth ruler marked on the side of a river barge. The road sign stands next to crossing signs at the end of a sidewalk, not far from the metro station closest to the Novotel Paris Tour Eiffel hotel. The Eiffel Tower is a reminder of the location. The last one shows a bolt and some writing on one side of a bridge over the river Seine.




Appealing talks for me at XML Prague 2007

XML Prague 2007 will take place on June 16th and 17th.
In the published program, I found these talks of special interest:

Generative XPath
Oleg Paraschenko
Saint-Petersburg State University

The most convenient approach to navigate over XML trees is to use XPath queries. But there is no reason to limit ourselves to XML only. Indeed, it's useful to have XPath for navigating over arbitrary tree-like structures. There are a number of projects in which developers have tried to implement XPath over project-specific hierarchical data. Unfortunately, most of these attempts resulted in something that resembled XPath, but was not XPath. The problem is that implementing XPath, even version 1.0, is a difficult task. We propose an alternative approach. Generative XPath is an XPath 1.0 processor that can be adapted to different hierarchical memory structures and different programming languages. Customizing Generative XPath to a specific environment is several orders of magnitude easier than implementing XPath from scratch.

The Generative XPath framework consists of three components:

  • XPath compiler,
  • XML virtual machine,
  • native (customization) layer.

The XPath compiler transforms XPath expressions into executable code for the virtual machine. During execution, the code interacts with the native layer to access the tree nodes and their properties.

This paper explains what the virtual machine is, what is expected from the customization layer, and how they work together. Also, background information about the design and implementation of Generative XPath is given.
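The three-part split described in the abstract can be illustrated with a toy. This is only a sketch under loud assumptions: it handles nothing but absolute paths made of child steps (so it is exactly the kind of "resembles XPath but is not XPath" thing the abstract warns about), and `NativeLayer`, `compile_path` and `run` are hypothetical names, not part of Generative XPath.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class NativeLayer:
    """Hypothetical customization layer: tells the engine how to read one kind of tree."""
    name: Callable[[Any], str]
    children: Callable[[Any], Iterable[Any]]


def compile_path(path: str) -> List[str]:
    """'Compile' an absolute path such as '/a/*/b' into a list of name tests."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are supported in this sketch")
    return [step for step in path.split("/") if step]


def run(program: List[str], root: Any, native: NativeLayer) -> List[Any]:
    """Execute the compiled steps, touching nodes only through the native layer."""
    if not program:
        return [root]
    first, rest = program[0], program[1:]
    current = [root] if first in ("*", native.name(root)) else []
    for test in rest:
        current = [
            child
            for node in current
            for child in native.children(node)
            if test == "*" or native.name(child) == test
        ]
    return current


# A tree that is not XML at all: (name, [children]) tuples.
tuple_layer = NativeLayer(name=lambda n: n[0], children=lambda n: n[1])
tree = ("config", [("server", [("port", []), ("host", [])]),
                   ("client", [("port", [])])])
ports = run(compile_path("/config/*/port"), tree, tuple_layer)
print(len(ports))
```

Only `tuple_layer` knows what a node looks like, so pointing the same compiled path at another data structure means writing another two-function layer.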



XML Processing by Streaming
Mohamed Zergaoui
Innovimax

The first part will present the state of the art of XML streaming processing by reviewing the products in place (joost, cocoon, saxon, etc.), the APIs available (SAX, Stax, XOM), languages (CDuce, XDuce, XJ), and the specs in progress or stalled (STX, XML Processing, XQuery update). It will also cover what is currently in preparation (i.e. an XML Streaming XG at W3C) and take the time to present what was already done in SGML times (Balise and Omnimark, the cursor idea that can be found in Arbortext OID in ACL, etc.).

Then the goal is to present all the areas where work still has to be done and give some hints on an elaborated vision of XML processing through different kinds of processes: around constraints, normalizing, streamable path, multilayer transformation, and last but not least constraints-aware streamable path. Some light will be shed on static analysis of XSLT and XQuery to detect streamable instances. What are the needed evolutions of the cursor model? What value do XDuce-like languages add?
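For readers who have not used a streaming API, here is a minimal sketch of the idea using Python's standard `xml.etree.ElementTree.iterparse` (an assumption of this example; it is not one of the products or APIs named in the abstract). The function name `count_elements` and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(stream, tag):
    """Count <tag> elements while reading the document incrementally."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the text, attributes and children of the element that just closed.
        # The emptied element itself stays attached to its parent.
        elem.clear()
    return count


sample = io.BytesIO(b"<log><entry/><entry/><other/><entry/></log>")
total = count_elements(sample, "entry")
print(total)
```

The question was answered without ever holding the full tree in memory, which is the property the talk is about; queries that need to look backwards or sideways in the document are where it gets hard.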

Thursday, May 17, 2007

O'Reilly benchmarking XML parsers

O'Reilly published the results of XML parser benchmarks that they ran:

Quote: "Our intention was to find the right components for our high performance web service security gateway, so that it could be run on a small dedicated appliance. The limited resources of such a device brought the C tests into the game, since the Java virtual machine already needs a lot of memory. Object model parsers are the most important parser types in the context of web service security because they can be used to alter an XML document in memory."
See:
and

Frank Mantek at XTech 2007: "SOAP is a dead duck in the www"

I attended Frank Mantek's (Google Inc.) talk today at XTech 2007 on the Google Data API (see: http://2007.xtech.org/public/schedule/detail/33). He made a very surprising claim:

"SOAP is a dead duck in the www"

And he said it as someone who worked for Microsoft for years and developed WSDL from scratch. He said that interoperability across programming languages from different vendors (e.g., .NET languages and Java) is close to nonexistent, and that the chance that connecting via SOAP across languages and vendors will ever be easy is 0.00something%. He said that WSDL was and still is a BIG MISTAKE.

WOW!

Finally, I don't feel that I'm the only one who was not able to make C, .NET, Java and Perl talk SOAP together without ugly "magic" and some "voodoo", and who thought that the reason for that was WSDL...

All this came up when people in the audience asked why the Atom and REST approach was used rather than SOAP for the Google Data API.

Pentax to be bought by Hoya?

I just heard on BBC news that Hoya are to buy Pentax.
see: http://www.asahi.com/english/Herald-asahi/TKY200705170106.html

Some pictures I shot today in Paris

Here are a few shots I took this evening, sometime around 21:00-21:30.
You can probably notice the dirt on my camera's sensor and also the rain drops on my lens, as I was taking these pictures while it was raining.



Wednesday, May 16, 2007

Music and Lyrics (2007) movie

During the flight from Tel-Aviv to Paris I got to see a very nice movie called Music and Lyrics.
If you have a chance to see it, then go. It is a very nice romantic comedy.

No wifi at XTech 2007

What seemed to be a minor problem on Tuesday now looks like a big one: there is no wifi access at XTech 2007. If you look up the available connections, you see that idealliance has a network up and running, but when you ask the conference organizers for the credentials needed to use it, they say that they have a problem and that there is no wifi internet connection.

I paid for internet access via ADSL in the hotel room (man!! 15 euros for every 24h!!) in order to stay in touch with work. For those who don't want to pay up, there's a yoyo network that is mostly up and sometimes has a good connection that you can hijack (it doesn't seem to be using any security at all!!).

Hopefully, tomorrow morning and on Friday the organizers will have some wifi available for the many many nerds here that are in great need to feed their internet craving :-)

XML processor implementors at XTech 2007?

I haven't yet met anyone here at XTech 2007 who implements XML processors. I wonder if there are any here. I did meet a few web application developers interested in Widgets/APIs and some Widget/API developers. I also met people who work on data, such as for libraries or large databases, so they are, in a sense, processing content. I also got to talk to some W3C and other XML gurus, who know their standards :-)

What I am missing here are people who have hands-on experience in implementing the XML 1.0/1.1 or XML Schema standards. Hopefully, there are some such developers attending the conference. If there are, I hope I can find them, say hey and talk.

Extreme Markup Languages 2007

The Markup Theory & Practice Conference will take place on August 7-10 2007 in Montréal, Canada. On August the 6th the International Workshop on Markup of Overlapping Structures will take place at the same location.

I saw some interesting talks in the program that was just published, and I listed below the ones that interest me personally. I hope to be able to get a copy of the papers or some other form of publication from the authors, as I doubt that I'll be able to attend.

Here's a copy&paste of the abstracts I find most attractive:

Writing an XSLT optimizer in XSLT

Michael Kay, Saxonica

In principle, XSLT is ideally suited to the task of writing an XSLT or XQuery optimizer. After all, optimizers consist of a set of rules for rewriting a tree representation of the query or stylesheet, and XSLT is specifically designed as a language for rule-based tree rewriting. The paper illustrates how the abstract syntax tree representing a query or stylesheet can be expressed as an XML data structure making it amenable to XSLT processing, and shows how a selection of rewrites can be programmed in XSLT. The key question determining whether the approach is viable in practice is performance. Some simple measurements suffice to demonstrate that there is a significant performance penalty, but not an insurmountable one: further work is needed to see whether it can be reduced to an acceptable level.
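To make "rule-based tree rewriting" concrete, here is a small sketch of a single rewrite rule (constant folding) over an abstract syntax tree expressed as XML. The paper does this in XSLT; this sketch swaps in Python with the standard `xml.etree.ElementTree` module, and the element names `add`, `mul`, `num` and `var` are invented for the example.

```python
import xml.etree.ElementTree as ET


def fold_constants(node):
    """Rewrite <add> of two <num> literals into a single <num>, working bottom-up."""
    children = [fold_constants(child) for child in node]
    if (node.tag == "add" and len(children) == 2
            and all(child.tag == "num" for child in children)):
        total = int(children[0].get("value")) + int(children[1].get("value"))
        return ET.Element("num", {"value": str(total)})
    # No rule matched: copy the node and attach the (possibly rewritten) children.
    rewritten = ET.Element(node.tag, dict(node.attrib))
    rewritten.text = node.text
    rewritten.extend(children)
    return rewritten


tree = ET.fromstring(
    '<mul><add><num value="1"/><num value="2"/></add><var name="x"/></mul>'
)
optimized = fold_constants(tree)
print(ET.tostring(optimized, encoding="unicode"))
```

In XSLT the same rule would be one template matching `add[count(num) = 2]` plus an identity template for everything else, which is why the language looks like such a natural fit.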


Streaming validation of schemata: the Lazy Typing discipline

Paolo Marinelli, Fabio Vitali, Stefano Zacchiroli, University of Bologna

Assertions, identity constraints, and conditional type assignments are (planned) features of XML Schema which rely on XPath evaluation. The XPath subset exploitable in those features is limited, for several reasons, including (apparently) to avoid buffering in evaluation of an expression. We divide XPath into subsets with varying streamability characteristics. We also identify the larger XPath subset which is compatible with the typing discipline we believe underlies some of the choices currently present in the XML Schema specification. Such a discipline requires that the type of an element be decided when its start tag is encountered and its validity when its end tag is encountered. An alternative “lazy typing” discipline is proposed in which both type assignment and validity assessment are fired as soon as they are available. Our approach is more flexible, giving schema authors control over the trade-off between using larger XPath subsets (and thus increasing buffering requirements) and expeditiousness.



Localization of schema languages

Felix Sasaki, World Wide Web Consortium

Internationalization is the process of making a product ready for global use. Localization is the adaptation of a product to a specific locale (e.g., country, region, or market). Localization of XML schemas (XSD, DTD, Relax NG) can include translation of element and attribute names, modification of data types, and content or locale-specific modifications such as currency and dates. Combining the TEI ODD (One Document Does it all) approach for renaming and adaptation of documentation, the Common Locale Data Registry (CLDR) for the modification of data types, and the new Internationalization Tag Set (W3C 2007), the authors have produced an implementation that will take as input a schema without any localization and some external localization parameters (such as the locale, the schema language, any localization annotations, and the CLDR data) and produce a localized schema for XSD and Relax NG. For a DTD, the implementation produces a Schematron document for validation of the modified data types that can be used with a separate renaming stylesheet to generate a localized DTD.





Applying structured content transformation techniques to software source code

Roy Amodeo, Stilo International

In structured content processing, benefits of modeling information content rather than presentation include the ability to automate the publication of information in many formats, tailored for different audiences. Software programs are a form of content, usually authored by humans and “published” by compilers to the computer that runs these programs. However, programs are not written solely for use by machines. If they were, programming languages would have no need for comments or programming style guidelines. The application developers and maintainers themselves are also an audience. Modeling software programs as XML instances is not a new idea. This paper takes a fresh look at the challenge of producing XML markup from programming languages by recasting it as a content processing problem using tools developed in the same way as any other content-processing application. The XML instances we generate can be used to craft transformation and analysis tools useful for software engineering by leveraging the marked up structure of the program rather than the native syntax.
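A quick way to get a feel for "programs as XML instances" is to dump a parse tree as markup. The sketch below uses Python's own `ast` module and `xml.etree.ElementTree`; this is an assumption of the example and has nothing to do with the tooling the paper describes. The helper name `to_xml` and the element layout are made up.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node):
    """Turn a Python AST node into an XML element named after the node type."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(elem, field).append(to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):  # plain strings in lists are skipped
                    wrapper.append(to_xml(item))
        elif value is not None:
            elem.set(field, str(value))  # simple values become attributes
    return elem


source = "def area(w, h):\n    return w * h\n"
doc = to_xml(ast.parse(source))

# Once the program is markup, ordinary XML tooling can analyze it.
function_names = [f.get("name") for f in doc.iter("FunctionDef")]
multiplications = len(list(doc.iter("BinOp")))
print(function_names, multiplications)
```

The last lines are the point of the exercise: a question about the program ("which functions are defined?") becomes a query over element names instead of a pattern match over source text.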



Characterizing XQuery implementations: Categories and key features

Liam Quin, World Wide Web Consortium

XQuery 1.0 was published as a W3C Recommendation in January 2007, and there are fifty or more XQuery implementations. The XQuery Public Web page at W3C lists them but gives little or no guidance about choosing among them. The author proposes a simple ontology (taxonomy) to characterize XQuery implementations based on emergent patterns of the features appearing in implementations and suggests some ways to choose among those implementations. The result is a clearer view of how XQuery is being used and also provides insights that will help in designing system architectures that incorporate XQuery engines. Although specific products are not endorsed in this paper, actual examples are given. With XML in use in places as diverse as automobile engines and encyclopedias, the most important part of investigating an XML tool’s suitability to task is often the tool’s intended usage environment. It is not unreasonable to suppose that most XQuery implementations are useful for something. Let's see!


Building a C++ XSLT processor for large documents and high performance

Kevin Jones, Jianhui Li, & Lan Yi, Intel

Some current XML users require an XSLT processor capable of handling documents up to 2 gigabytes. To produce a high-speed processor for such large documents, the authors employed a data representation that supports minimal inter-record linking to provide a small, in-memory representation. XML documents are represented as a sequence of records; these records can be viewed as binary encodings of events produced by an XML parser based on the XPath data model. The format is designed to support documents in excess of the 32-bit boundary; its current theoretical limit is 32 gigabytes. To offset the slower navigation speed for a records-based data format, the processor uses a new Path Map algorithm for simultaneous XPath processing. The authors carried out a series of experiments comparing their newly constructed XSLT processor to an object-model-based XSLT processor (the Intel® XSLT Accelerator Software library).
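The trade-off in this abstract (a compact sequence of records versus slower navigation) can be seen in a toy version. This sketch is an illustration only: it uses Python's standard `iterparse`, keeps the records as Python tuples rather than a binary encoding, and has nothing to do with Intel's actual format or its Path Map algorithm. `Record`, `to_records` and `children_of` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Record(NamedTuple):
    kind: str   # "start" or "end"
    name: str   # element name
    depth: int  # nesting depth of the element


def to_records(xml_bytes: bytes) -> List[Record]:
    """Flatten a document into a sequence of parser-event records with no links."""
    records, depth = [], 0
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            records.append(Record("start", elem.tag, depth))
            depth += 1
        else:
            depth -= 1
            records.append(Record("end", elem.tag, depth))
    return records


def children_of(records: List[Record], index: int) -> List[str]:
    """Find child element names by scanning forward; there are no child pointers."""
    parent_depth = records[index].depth
    names = []
    for rec in records[index + 1:]:
        if rec.kind == "end" and rec.depth == parent_depth:
            break  # reached the parent's own end record
        if rec.kind == "start" and rec.depth == parent_depth + 1:
            names.append(rec.name)
    return names


records = to_records(b"<a><b><c/></b><d/></a>")
print(children_of(records, 0))
```

Listing the children of `a` means walking past everything nested inside `b` first. With an object model it would be one pointer hop per child, which is the navigation cost the authors say they had to offset.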


Converting into pattern-based schemas: A formal approach

Antonina Dattolo, University of Napoli Federico II
Angelo Di Iorio, Silvia Duca, Antonio Angelo Feliziani, & Fabio Vitali, University of Bologna

A traditional distinction among markup languages is how descriptive or prescriptive they are. We identify six levels along the descriptive/prescriptive spectrum. Schemas at a specific level of descriptiveness that we call "Descriptive No Order" (DNO) specify a list of allowable elements, their number and requiredness, but do not impose any order upon them. We have defined a pattern-based model based on a set of named patterns, each of which is an object and its composition rule (content model); we show that any schema can be converted into a pattern-based schema without loss of information at the DNO level. We present a formal analysis of lossless conversions of arbitrary schemas as a demonstration of the correctness and completeness of our pattern model. Although all examples are given in DTD syntax, the results should apply equally to XSD, Relax NG, or other schema languages.
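One way to read the "Descriptive No Order" level is as a check on which children appear and how often, ignoring their sequence. The sketch below follows that reading only; it is an interpretation of the abstract, not the authors' formal model, and `valid_dno` and the sample content model are invented for illustration.

```python
from collections import Counter


def valid_dno(children, model):
    """Check child element names against an order-free content model.

    model maps element name -> (min_occurs, max_occurs), with None for unbounded.
    """
    counts = Counter(children)
    if any(name not in model for name in counts):
        return False  # an element that is not on the allowed list
    for name, (lowest, highest) in model.items():
        seen = counts.get(name, 0)
        if seen < lowest or (highest is not None and seen > highest):
            return False
    return True


# Hypothetical model: exactly one title, at least one author, optional abstract.
model = {"title": (1, 1), "author": (1, None), "abstract": (0, 1)}

print(valid_dno(["author", "title", "author"], model))  # order does not matter
print(valid_dno(["title"], model))                      # required author missing
```

A DTD content model such as `(title, author+, abstract?)` carries the same counts plus an ordering, and the ordering is the part this level of description drops.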


Declarative specification of XML document fixup

Henry S. Thompson, University of Edinburgh

The historical and social complications of the development of the HTML family of languages defy easy analysis. In the recent discussion of the future of the family, one question has stood out: should ‘the next HTML’ have a schema or indeed any form of formal definition? One major constituency has vocally rejected the use of any form of schema, maintaining that the current behavior of deployed HTML browsers cannot usefully be described in any declarative notation. But a declarative approach, based on the Tag Soup work of John Cowan, proves capable of specifying the repair of ill-formed HTML and XHTML in a way that approximates the behavior of existing HTML browsers. A prototype implementation named PYXup demonstrates the capability; it operates on the PYX output produced by the Tag Soup scanner and fixes up well-formedness errors and some structural problems commonly found in HTML in the wild based on an easily understood declarative specification.




Some pictures that I shot in Paris while attending XTech 2007

Here are a few shots I took on a bridge a few minutes' walk away from the Novotel Paris Tour Eiffel hotel I'm staying at.

(I'll rotate the images... as soon as I can find how to do it with the Windows laptop that Martin gave me... ahhhh!! Where's GIMP when you need it?)



Tuesday, May 15, 2007

Priscilla Walmsley's tutorial on XPath 2.0, XSLT 2.0 and XQuery 1.0 at XTech 2007

Today I attended a full-day tutorial given by Priscilla Walmsley on XPath 2.0, XSLT 2.0 and XQuery 1.0 at XTech 2007.

It was fun!!

Priscilla is a very nice person and a very good presenter.

I learned a lot from her presentation and from the answers she gave to my numerous questions. I actually used her suggestion to ask questions freely and indeed asked a lot of them.

I found her book on XML Schema very useful both for designing XML Schemas and for insight into the various "dark corners" of the (sometimes tough to understand) XML Schema recommendation. And now, after attending her tutorial, I think that I'll go ahead and buy her newly published book on XQuery.

I had the good luck to find Eric van der Vlist and Priscilla Walmsley chatting together during the lunch break today. So I stepped in and introduced myself (which was not hard given the fact that I spent all morning in her class...). I had a good chance to describe some of the grief I was having with questions about XML Schema, and to say that the only references other than the (sometimes) cryptic W3C recommendation are their books on XML Schema and the xmlschema-dev mailing list. They agreed. Then I asked them a technical question related to namespace="##local" in an xsd:any in an XML Schema with no defined targetNamespace and non-qualified globally defined elements. It was a breeze for them to answer, and they also explained why and when this is useful.
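For readers unfamiliar with the namespace attribute of xsd:any, here is a rough sketch of the namespace test alone, written from a reading of the XML Schema 1.0 rules and not from what was said in that conversation. `wildcard_allows` is a hypothetical helper, `None` stands for "no namespace", and the sketch ignores processContents entirely.

```python
def wildcard_allows(namespace_attr, target_namespace, element_namespace):
    """Would an xsd:any with this namespace attribute accept the element's namespace?

    None means absent: a schema with no targetNamespace, or an unqualified element.
    """
    if namespace_attr == "##any":
        return True
    if namespace_attr == "##other":
        # XSD 1.0: any namespace except the target one, and never "no namespace".
        return element_namespace is not None and element_namespace != target_namespace
    allowed = set()
    for token in namespace_attr.split():
        if token == "##local":
            allowed.add(None)
        elif token == "##targetNamespace":
            allowed.add(target_namespace)
        else:
            allowed.add(token)  # an explicit namespace URI
    return element_namespace in allowed


# Schema with no targetNamespace, as in the question above.
print(wildcard_allows("##local", None, None))        # unqualified element
print(wildcard_allows("##local", None, "urn:demo"))  # qualified element
print(wildcard_allows("##other", None, None))        # unqualified element
```

Under these rules, in a schema without a targetNamespace, ##local and ##targetNamespace accept the same thing (unqualified elements), while ##other accepts only qualified ones.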

I hope that I'll be able to talk with these two nice people again in the remaining days of this conference.