Shlomo Yona's Blog -- יומן הרשת של שלמה יונה

Wednesday, May 16, 2007

Music and Lyrics (2007) movie

During the flight from Tel-Aviv to Paris I got to see a very nice movie called Music and Lyrics.
If you have a chance to see it, then go. It is a very nice romantic comedy.

What seemed to be a minor problem on Tuesday, now looks like a big problem: no wifi access on xtech 2007. If you lookup available connections you see that idealliance has a network up and running but when you approach the conference organizers and ask for the necessary credentials to be able to use the connection they say that they have a problem and that there is no wifi internet connection.

I payed for internet access via adsl at the hotel room (man!! 15 euros for every 24h!!) in order to be in touch with work, and for those who don't want to pay up there's a yoyo netowrk that is mostly up and sometimes has good connection that you can highjack (it doesn't seem to be using any security at all!!).

Hopefully, tomorrow morning and on Friday the organizers will have some wifi available for the many many nerds here that are in great need to feed their internet craving :-)

XML processor implementors on XTech 2007?

I didn't get to meet anyone here at XTech 2007 who implements XML processors yet. I wonder if there are any here. I did meet with a few web application developers interested in Widget/API and some Widgets/API developers. I also met people who work on data, such as for libraries or large databases, so they are, in a sense, processing content. I also got to talk to some w3c and other XML gurus, who know their standards :-)

What I am missing here are people who have hands on experience in implementing XML 1.0/1.1 or XML Schema standards. Hopefully, there are some such developers attending the conference. If there are I hope I can find them, say hey and talk.

Extreme Markup Languages 2007

The Markup Theory & Practice Conference will take place on August 7-10 2007 in Montréal, Canada. On August the 6th the International Workshop on Markup of Overlapping Structures will take place at the same location.

I saw some interesting talks in the program that was just published and I listed below the talks that seem interesting to me in person. I hope to be able to get a copy of the papers or some other form of publication from the authors, as I doubt that I'll be able to attend.

Here's a copy&paste of the abstracts I find most attractive:

Writing an XSLT optimizer in XSLT

Michael Kay, Saxonica

In principle, XSLT is ideally suited to the task of writing an XSLT or XQuery optimizer. After all, optimizers consist of a set of rules for rewriting a tree representation of the query or stylesheet, and XSLT is specifically designed as a language for rule-based tree rewriting. The paper illustrates how the abstract syntax tree representing a query or stylesheet can be expressed as an XML data structure making it amenable to XSLT processing, and shows how a selection of rewrites can be programmed in XSLT. The key question determining whether the approach is viable in practice is performance. Some simple measurements suffice to demonstrate that there is a significant performance penalty, but not an insurmountable one: further work is needed to see whether it can be reduced to an acceptable level.

Streaming validation of schemata: the Lazy Typing discipline

Paolo Marinelli, Fabio Vitali, Stefano Zacchiroli, University of Bologna

Assertions, identity constraints, and conditional type assignments are (planned) features of XML Schema which rely on XPath evaluation. The XPath subset exploitable in those features is limited, for several reasons, including (apparently) to avoid buffering in evaluation of an expression. We divide XPath into subsets with varying streamability characteristics. We also identify the larger XPath subset which is compatible with the typing discipline we believe underlies some of the choices currently present in the XML Schema specification. Such a discipline requires that the type of an element be decided when its start tag is encountered and its validity when its end tag is encountered. An alternative “lazy typing” discipline is proposed in which both type assignment and validity assessment are fired as soon as they are available. Our approach is more flexible, giving schema authors control over the trade-off between using larger XPath subsets (and thus increasing buffering requirements) and expeditiousness.

Localization of schema languages

Felix Sasaki, World Wide Web Consortium

Internationalization is the process of making a product ready for global use. Localization is the adaptation of a product to a specific locale (e.g., country, region, or market). Localization of XML schemas (XSD, DTD, Relax NG) can include translation of element and attribute names, modification of data types, and content or locale-specific modifications such as currency and dates. Combining the TEI ODD (One Document Does it all) approach for renaming and adaptation of documentation, the Common Locale Data Registry (CLDR) for the modification of data types, and the new Internationalization Tag Set (W3C 2007), the authors have produced an implementation that will take as input a schema without any localization and some external localization parameters (such as the locale, the schema language, any localization annotations, and the CLDR data) and produce a localized schema for XSD and Relax NG. For a DTD, the implementation produces a Schematron document for validation of the modified data types that can be used with a separate renaming stylesheet to generate a localized DTD.

Applying structured content transformation techniques to software source code

Roy Amodeo, Stilo International

In structured content processing, benefits of modeling information content rather than presentation include the ability to automate the publication of information in many formats, tailored for different audiences. Software programs are a form of content, usually authored by humans and “published” by compilers to the computer that runs these programs. However, programs are not written solely for use by machines. If they were, programming languages would have no need for comments or programming style guidelines. The application developers and maintainers themselves are also an audience. Modeling software programs as XML instances is not a new idea. This paper takes a fresh look at the challenge of producing XML markup from programming languages by recasting it as a content processing problem using tools developed in the same way as any other content-processing application. The XML instances we generate can be used to craft transformation and analysis tools useful for software engineering by leveraging the marked up structure of the program rather than th native syntax.

Characterizing XQuery implementations: Categories and key features

Liam Quin, World Wide Web Consortium

XQuery 1.0 was published as a W3C Recommendation in January 2007, and there are fifty or more XQuery implementations. The XQuery Public Web page at W3C lists them but gives little or no guidance about choosing among them. The author proposes a simple ontology (taxonomy) to characterize XQuery implementations based on emergent patters of the features appearing in implementations and suggests some ways to choose among those implementations. The result is a clearer view of how XQuery is being used and also provides insights that will help in designing system architectures that incorporate XQuery engines. Although specific products are not endorsed in this paper, actual examples are given. With XML in use in places as diverse as automobile engines and encyclopedias, the most important part of investigating an XML tool’s suitability to task is often the tool’s intended usage environment. It is not unreasonable to suppose that most XQuery implementations are useful for something. Let's see!

Building a C++ XSLT processor for large documents and high performance

Kevin Jones, Jianhui Li, & Lan Yi, Intel

Some current XML users require an XSLT processor capable of handling documents up to 2 gigabytes. To produce a high-speed processor for such large documents, the authors employed a data representation that supports minimal inter-record linking to provide a small, in-memory representation. XML documents are represented as a sequence of records; these records can be viewed as binary encodings of events produced by an XML parser based on the XPath data model. The format is designed to support documents in excess of the 32-bit boundary; its current theoretical limit is 32 gigabytes. To offset the slower navigation speed for a records-based data format, the processor uses a new Path Map algorithm for simultaneous XPath processing. The authors carried out a series of experiments comparing their newly constructed XSLT processor to an object-model-based XSLT processor (the Intel® XSLT Accelerator Software library).

Converting into pattern-based schemas: A formal approach

Antonina Dattolo, University of Napoli Federico II
Angelo Di Iorio, Silvia Duca, Antonio Angelo Feliziani, & Fabio Vitali, University of Bologna

A traditional distinction among markup languages is how descriptive or prescriptive they are. We identify six levels along the descriptive/prescriptive spectrum. Schemas at a specific level of descriptiveness that we call "Descriptive No Order" (DNO) specify a list of allowable elements, their number and requiredness, but do not impose any order upon them. We have defined a pattern-based model based on a set of named patterns, each of which is an object and its composition rule (content model); we show that any schema can be converted into a pattern-based schema without loss of information at the DNO level. We present a formal analysis of lossless conversions of arbitrary schemas as a demonstration of the correctness and completeness of our pattern model. Although all examples are given in DTD syntax, the results should apply equality to XSD, Relax NG, or other schema languages.

Declarative specification of XML document fixup

Henry S. Thompson, University of Edinburgh

The historical and social complications of the development of the HTML family of languages defy easy analysis. In the recent discussion of the future of the family, one question has stood out: should ‘the next HTML’ have a schema or indeed any form of formal definition? One major constituency has vocally rejected the use of any form of schema, maintaining that the current behavior of deployed HTML browsers cannot usefully be described in any declarative notation. But a declarative approach, based on the Tag Soup work of John Cowan, proves capable of specifying the repair of ill-formed HTML and XHTML in a way that approximates the behavior of existing HTML browsers. A prototype implementation named PYXup demonstrates the capability; it operates on the PYX output produced by the Tag Soup scanner and fixes up well-formedness errors and some structural problems commonly found in HTML in the wild based on an easily understood declarative specification.

Some picture that I shot of Paris while attending XTech 2007

Here are a few shots I took on a bridge a few minutes walk away from the Novotel Paris Tour Eiffel hotel I'm strying at.

(I'll rotate the images... as soon as I can find how to do it with the Windows laptop that Martin gave me... ahhhh!! Where's GIMP when you need it?)

Tuesday, May 15, 2007

Priscilla Walmsley's tutorial on XPath 2.0, XSLT 2.0 and XQuery 1.0 on XTech 2007

I attended today a full day tutorial given by Priscilla Walmsley's on XPath 2.0, XSLT 2.0 and XQuery 1.0 on XTech 2007.

It was fun!!

Priscella is a very nice person and a very good presenter.

I learned a lot from her presentation and from the answers she gave to my numerous questions. I actually used her suggestion to ask questions freely and indeed asked a lot of them.

I found her book on XML Schema very useful both for designing XML Schemas and for insight of various "dark corners" of the (sometimes though to understand) XML Schema recommendation. And now after attending her tutorial, I think that I'll go ahead and buy her newly published book on XQuery.

I had some good luck to find Eric van der Vlist and Priscella Walmsley chatting together during the lunch break today. So I stepped in and introduced myself (which was not hard given the fact the I spent all morning in her class...). I had a good chance of describing some of the grief I was having with questions about XML Schema and that the only references other than the (sometimes) cryptic w3c recommendation are their books on XML Schema and the xmlschema-dev mailing list. They agreed. Then I asked them a technical question related to the namespace="##local" in an xsd:any in an XML Schema with no defined targetNamespace and non-qualified globally defined elements. It was a breeze for them to answer, and they also reasoned why and when this is useful.

I hope that I'll be able to talk with these two nice people again in the remaining days of this conference.

Sunday, May 13, 2007

Sivan's Pet Eggplant

Sivan with her pet eggplant:

Wednesday, May 9, 2007

Telecommuting improves productivity?

The following paper from career news says that evidence shows that telecommuting improves productivity. Read the fine-print in the paper.

More on managing your boss...

Lobbying your opinion upwards to your superiors is something that is worth doing, but only if you think you can do it well.

Here are a few things to note what you're trying to promote your opinions to your boss:

* find a common terminology -- be confident that your boss indeed understands what you think s/he understands. Gently try to say the same thing in more than one way, or from more than one perspective. This will help make your point clear.
* try to think like your boss -- you need to be able to explain why doing things your way actually makes your boss look better and helps your boss achieve his/her goals better than the alternative way(s).
* be flexible to change priorities dynamically as a response to signals that you receive from you boss: think like an investor: if you were investing money in this project, how would you see your money being well spent -- what actions would you expect be taken?
* think globally and do your best to see a bigger picture -- you might want to consider other perspectives to your opinion by trying to take another point of view, for example, what gives more value to clients, what can give a larger impact financially vs. minimum usage of resources (such as time, money, expertise...). Once you consider your issue from other perspectives and weight how they might look with your boss's glasses, you might either find more good reasons for your agenda or realize that your agenda has low probability to be accepted, and why.
* Quality makes a good argument -- if you can clearly and wisely argue that your agenda increases quality, it will give you extra credit. At the same time, though, you should always think how your suggestions to your boss can be validated after the act and that the criteria for success is clear and the means to measure the success is also clear. What I mean here is that you should not only strive to make good suggestions and be convincing -- for your credibility's sake, you also need to make sure that both you and your boss understand the criteria for success of your suggestion, that you both understand how this criteria should be measured and that you both understand the "hidden" cost of this validity of success. So -- quality has more than one side here: one is promoting your idea by reasoning how it contributes to quality, and the other is that you can articulate the means of demonstrating the quality of your work once its done.
* If you feel that there is resistance to your ideas, try to think why, try to understand the motivation for resisting your idea -- you might then find out that you can find reasons how what seems to be a limitation is actually a plus, or that the benefits outweight the disadvantages. As soon as you can fairly reason to yourself that you have good points to remove the resistance, you might want to think how you lobby it to your boss without marking yourself as someone who likes to argue for the same of argument and as someone who does not know how to take a no. Letting your boss think that it was his/her idea at the first place will many times prove useful -- it can remove a great deal of subjective resistance that mostly stemmes from personal ego...

You can read about some of these ideas and other ideas in a nice article "The Language of Success".

The Quit-Lag Phenomenon

I read an interesting articled titled "The Quit-Lag Phenomenon". What it basically says is that once an employee is set to leave a company, that employee can become very productive by doing things that seemed impossible or not so important before. This is due to removal of stress, concern of company politics, long term tasks etc. I don't want to write too many details of the paper, as you can read it yourself. What I do want is to give the punch-line, as I understood it:

By removing stress factors from your working environment and by taking care of details, you'll be able to be more productive and be able to address tasks that otherwise would seem intimidating. I think this is something worth trying.

Sunday, May 6, 2007

מסיבת ל"ג בעומר עם ילדי הגן של אביב

Some photos from a party with the kindergarden kids of "גן נעים" for ל"ג בעומר

Friday, May 4, 2007

Healy's birth-party

Today we attended Healy's birthday party. She is 2.5 months old.
Here are some shots from the party:

Wednesday, May 2, 2007

Iterating over permutations of a list

While trying to write down a Perl program that tries to generate XML instances given an XML Schema I stumbled upon the problem of generating instances of xsd:all subtree.

What xsd:all means, basically, is that the listed elements can appear in any order (elements that are listed in an xsd:all can appear 0 or 1 times exactly).

So, in order to do this, the problem reduces to Create an iterator that returns the next permutation of a given list. Why do I want an iterator?

Consider the following implementation:


sub permute {
   my($list)=@_;
   return [[]] unless scalar @$list;
   my @permutations=();
   for(my $i=0; $i<@$list;++$i) {
      my @copy_of_list= @$list;
      my $removed_list_element = splice(@copy_of_list,$i,1);
      push(@permutations, map{[$removed_list_element,@$_]} @{permute(\@copy_of_list)});
   }
   return \@permutations;
}

The permute function receives a reference to a list, say, [ a b c ] and then it returns a list of lists, which is a list of all the permutations of the original list: [ [a b c] [a c b] [b a c] [b c a] [c a b] [c b a] ].

You can see that for a list with N items there are N! permutations.

Note that I'm not discussing lists that contain items that appear more than once. Those actually have less than N! permutations as disarranging places of similar items is meaningless. For example, the permutations of [a b a] are: [ [a b a] [ a a b] [b a a] ] because the remaining 3 permutations are the same.

Notice that the number of permutations grows rapidly as N becomes larger. This makes permute expensive.

We can try to do less calculations in permute by understanding that listing all permutations of two lists of the same length, say N, is actually listing the list of permutations for the list [0 1 2 ... N-1] and then mapping the elements to the proper positions. Combine this with a cache and we get a speedup whenever we want to permute a list of a length that was already computed before. Note also that once you permute a list of length N you also permuted lists of length 1 and of length 2 and so on up until N. So caching also helps when we want to list permutations of smaller lists than computed before.

Let's add a cache to the permute function:


{
   my %cache;
   sub permute {
      my($list)=@_;
      return [[]] unless scalar @$list;
      my $key = join ',',@$list;
      return $cache{$key} if exists $cache{$key};
      my @permutations=();
      for(my $i=0; $i<@$list;++$i) {
         my @copy_of_list= @$list;
         my $removed_list_element = splice(@copy_of_list,$i,1);
         push(@permutations, map{[$removed_list_element,@$_]} @{permute(\@copy_of_list)});
      }
      $cache{$key}=\@permutations;
      return \@permutations;
   }
}

Our cache is simply a hash. The key to the have is a serialization of the given list elements.

OK, now, but how do we map the elements back onto the list of permutations of indices? Consider the following code:


sub permutmap {
     my($original_list,$indices_lists) = @_;
     my @list_of_lists;
     foreach my $il (@$indices_lists){
             my @l;
             foreach my $i (@$il) {
                     push @l,$original_list->[$i];
             }
             push @list_of_lists,\@l;
     }
     return \@list_of_lists;
}

Which can similarly be implemented without the explicit loops using map:


sub mapermute {
     my($original_list,$indices_lists) = @_;
     return [ map { [map {$original_list->[$_]} @$_] } @$indices_lists ];
}

So all you need now is to feed permute with 1 .. @list and then apply mapermute or permutemap on the result.

Still, I have a problem -- there are too many permutations, listing them is expensive, and not necessarily useful (I might not be able to read them all and use them all). It would be great if I could iterate over the permutations until I have enough, and not generate any permutation beyond the ones that I need. This requires a different approach.

I found a few ones such as:
http://www.perlmonks.org/?node_id=29374
and
http://www.perlmonks.org/?node_id=520116
and also
http://hop.perl.plover.com/announce/07
although the latter hides some of the magic only for subscribers.
You can Google for more results.

Now, let's try and see how to convert my recursion based solution into an iterator.

A colleague at work suggested I take a look at http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
What we can learn from that link is basically:

Numbering permutations

Factoradic numbers can be used to assign unique numbers to permutations, such that given a factoradic of k one can quickly find the corresponding permutation.

Algorithm to generate permutations

For every number $k$ ( $0 \le k < n!$ ) this following algorithm generates the corresponding permutation of the initial sequence $\left( s_j \right)_{j=1}^n$ :

 function permutation(k, s) {
    var int factorial:= 1;
    for j = 2  to length(s) {
       factorial := factorial* (j-1);
       swap s[j - ((k / factorial) mod j)] with s[j];
    }
    return s;
}

Notation

k / j denotes integer division of k by j, i.e. the integral quotient without any remainder, and
k mod j is the remainder following integer division of k by j.

So, let's see how this translates to Perl:


sub permute {
        my ($k,$s) = @_;
        my @list=@$s;
        my $factorial= 1;
        local $[=1;
        for(my $j=2; $j<=@list;++$j) {
                $factorial*=($j-1);
                my $i=$j-(int($k/$factorial)%$j);
                @list[$i,$j]=@list[$j,$i];
        }
        return @list;
}

Surprisingly, for me, that is, if you iterate over from 1 up to N!-1 rather than up to N! the missing permutation will be the original list itself. I wonder if this is a general property of this formulation/solution and how to prove it.

See:


for (1 .. 6) { # 3! equals 6...
        print permute($_,[qw(a b c)]),"\n";
}

will produce all permutations, while iterating up to 5 will not produce [a b c].

I'm still pondering on what kind of refactoring is required in order to turn my recursive implementation into an iterative one, without using algebric tricks like the above mentioned implementation.