Wednesday, January 31, 2007
So far I have only heard about the "exploding entity": an attack in which the XML instance contains a DTD that defines entities recursively, so that each entity expands into two or more references to a previously defined entity. Expanding the outermost entity therefore produces an exponentially large replacement text, and the XML processor may exhaust its memory and crash while parsing the instance and trying to resolve the entity definitions. This kind of attack is usually referred to as an XML Bomb.
Here's an example, extracted from Hardening Network Security, chapter 5:
<!DOCTYPE foobar [
<!ENTITY x0 "hello">
<!ENTITY x1 "&x0;&x0;">
<!ENTITY x2 "&x1;&x1;">
<!ENTITY x3 "&x2;&x2;">
<!ENTITY x4 "&x3;&x3;">
<!-- ... x5 through x97 are defined the same way ... -->
<!ENTITY x98 "&x97;&x97;">
<!ENTITY x99 "&x98;&x98;">
<!ENTITY x100 "&x99;&x99;">
]>
<foobar>&x100;</foobar>
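To see why this DTD is dangerous, note that each entity expands into two copies of the previous one, so &x100; expands into 2^100 copies of "hello". A minimal Python sketch (mine, not from the book) of the expansion arithmetic:

```python
# Each entity x(n) expands to two copies of x(n-1), so the fully
# expanded form of x(n) contains 2**n copies of the base string.
def expansion_length(levels, base="hello"):
    """Length, in characters, of the fully expanded entity x(levels)."""
    return (2 ** levels) * len(base)

# x10 already expands to 5120 characters; x100 would expand to
# 2**100 * 5 characters -- far beyond any realistic amount of memory,
# which is why parsers that expand entities eagerly can be crashed.
print(expansion_length(10))
print(expansion_length(100))
```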
A well-known buzzword is the XXE (XML eXternal Entity) attack. This is a fancy name for an attack on an application that parses XML data from untrusted sources, which may lead to a denial of service (DoS), exposure of sensitive information, or other damage to the application or the infrastructure it uses. It can happen, for example, when the document defines an entity that refers to a local file (e.g., a password file...) and the parser resolves it. Processing of file inclusions and other attachments can also be considered XXE.
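A classic XXE payload declares an entity whose replacement text is to be read from a local file. As an illustration (using Python's standard library rather than any particular parser mentioned above), note that a safely configured parser refuses to resolve such entities; Python's xml.etree is one example of that behavior:

```python
import xml.etree.ElementTree as ET

# A minimal XXE payload: the entity "xxe" points at a local file.
payload = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""

# A vulnerable parser would inline /etc/passwd into <data>.
# xml.etree does not resolve external entities: it raises ParseError.
try:
    ET.fromstring(payload)
    resolved = True
except ET.ParseError as exc:
    resolved = False
    print("refused to resolve external entity:", exc)
```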
Another buzzword is XDoS: XML Denial of Service. This term describes attacks on an XML parser that cause it to consume too much memory, slow down, or simply do useless work. It may also refer to cases where the DoS targets some other component of the application and the XML is merely the tool that facilitates the attack.
Additional attacks are: signature redirects ...
Monday, January 29, 2007
When one is required to implement a tool that realizes complex and detailed standards, which in turn rely on other complex standards, especially when short on time, one seeks to classify the features into two main categories: "frequently used" and "rarely used". The motivation for this classification is that "frequently used" features are considered and implemented first, while the "rarely used" ones get time and attention later on.
Starting with XML Schema (with the aid of the very useful book Definitive XML Schema, of course), I was faced with the need to perform such a categorization. Not having enough time to read the standard thoroughly, follow the commentary about it in mailing lists such as the xml-dev mailing list, and complement that knowledge with explanations from books and, of course, from actual practice (examining freely available XML Schemas, for example), I was forced into an ad-hoc approach.
I browsed the web for an available summary of XML Schema, hopefully including the above-mentioned categorization. To my delight, I was successful: I came across the article Profiling XML Schema by Paul Kiel, which was published on xml.com on September 20th, 2006.
My short-term approach will be to try to confirm the results and conclusions presented in that article against several examples of WSDLs and XML Schemas available in the wild (e.g., Google's XML-based interfaces to its services).
Let's see how it goes.
Saturday, January 27, 2007
We drove to Netaniya's Winter puddle/swamp.
It is a nice area with Eucalyptus trees which, during rain season, becomes swampy.
There are ducks there, some parrots, some crows, and very nice wild flowers. Next to it there are several amusement facilities where the kids can play. We had fun.
See what I wrote about the place and about our good time there in a forum on Tapuz: Experiences at the winter puddle, as published in the Parents forum on Tapuz (in Hebrew).
See pictures at: http://yeda.cs.technion.ac.il/~yona/aviv/2007/1.2007/
Friday, January 26, 2007
It claims to "Critique Perl source code for best-practices".
The best practices are à la Damian Conway's book Perl Best Practices.
There's also a web service at: http://perlcritic.com/
Thursday, January 25, 2007
You can read it along with other reviews I wrote at: http://www.amazon.com/gp/pdp/profile/A10UA4V0Z4691N
A nice introduction to finite-state automata and regular expressions, with a critique of regular expression engines' implementations.
See: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) by Russ Cox
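Cox's motivating example is the pattern family a?ⁿaⁿ matched against aⁿ: the match succeeds (every optional a? matches empty), but a backtracking engine may try up to 2ⁿ ways of using the optional parts, while a Thompson-NFA engine runs in linear time. A small Python sketch of the pattern (with n kept tiny, so it stays fast even under backtracking):

```python
import re

def pathological(n):
    """Build Cox's a?^n a^n pattern, e.g. n=3 -> 'a?a?a?aaa'."""
    return "a?" * n + "a" * n

# The pattern matches a^n, but Perl/Python-style backtracking engines
# explore exponentially many alternatives; around n = 25-30 they can
# take minutes, while automaton-based engines stay linear.
n = 3
pattern = pathological(n)
assert re.fullmatch(pattern, "a" * n) is not None
print(pattern, "matches", "a" * n)
```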
His regexp page contains more interesting information and implementations.
Wednesday, January 24, 2007
Tuesday, January 23, 2007
Monday, January 22, 2007
Sunday, January 21, 2007
* Ligature Ltd
* Linguistic Agents Ltd
* Information Retrieval Group at the IBM Haifa Research Labs
* Knowledge Center for Processing Hebrew, Technion, IIT.
* The Pudding
* Nielsen BuzzMetrics
* White Smoke
- Compilers: Principles, Techniques, and Tools (2nd Edition)
- Advanced Compiler Design and Implementation
- Optimizing Compilers for Modern Architectures: A Dependence-based Approach
- Querying XML: XQuery, XPath, and SQL/XML in Context (The Morgan Kaufmann Series in Data Management Systems)
Saturday, January 20, 2007
If so, you probably know that there is a problem because there is more than one way to represent the Hebrew text in Latin script.
Should the city פתח-תקוה be transliterated as Petach-Tiquwa, Ptax-Tiqwa, or Petah-Tikva...?! (There are plenty more possibilities, and anyone driving to that city can have a laugh at the road signs...)
Apparently there are too many ways, some of them are even documented. See for example: http://en.wikipedia.org/wiki/Romanization_of_Hebrew. Sadly, there are also some ad-hoc ones...
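Part of the problem is that the Hebrew script records mostly consonants, so each romanization scheme both chooses different Latin letters for the same consonants and guesses the vowels differently. A toy Python sketch (the two tables below are illustrative inventions, not real standards) showing how two schemes already diverge on the consonants alone:

```python
# Two toy transliteration tables -- neither is a real standard.
# The same Hebrew consonants come out differently under each.
scheme_a = {"פ": "p", "ת": "t", "ח": "ḥ", "ק": "q", "ו": "w", "ה": "h", "-": "-"}
scheme_b = {"פ": "f", "ת": "t", "ח": "x", "ק": "k", "ו": "v", "ה": "h", "-": "-"}

def transliterate(word, table):
    """Map each Hebrew character through the given table."""
    return "".join(table.get(ch, ch) for ch in word)

city = "פתח-תקוה"
print(transliterate(city, scheme_a))  # scheme A's rendering
print(transliterate(city, scheme_b))  # scheme B's rendering
```

And neither output contains any vowels: Petah vs. Petach vs. Ptax must all be guessed on top of the consonant choices, which is where the road-sign chaos comes from.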
I find the suggestions made in ISO 259-3:1999 (Conversion of Hebrew Characters into Latin Characters. Part 3: Phonemic Conversion. ISO/TC46/SC2) useful. However, this standard has not yet been adopted.
I recently saw an article by Prof. Uzzi Ornan, published in SIGTRS, volume 13, number 1, January 2007. Ornan discusses the problems that the Hebrew script poses and the resulting requirements for a Latin transliteration. He then lists the pros and cons of several alternatives and suggests one that has more advantages than the other suggested alternatives.
I recommend reading this interesting text by Ornan: http://sigtrs.huji.ac.il/papers/131-ornan-072006.pdf
I also tried to start a discussion about this in the Linguistics forum and in the Hebrew Language forum on Tapuz.
Here's the abstract of that text:
When designing computer systems, one is often faced with a choice between using a more or less powerful language for publishing information, for expressing constraints, or for solving some problem. This finding explores tradeoffs relating the choice of language to reusability of information. The "Rule of Least Power" suggests choosing the least powerful language suitable for a given purpose.
Friday, January 19, 2007
Since my employer, as far as I know, does not support academic publications, and since I have already experienced the process of submitting research work to refereed conferences and journals, I feel ready to explore the process of filing for a patent.
I wrote a letter today to my manager and to the VP R&D that I report to. In that letter I enclosed a short description of the idea and its application. Let's see how it proceeds from there on.
Then we moved to Hod-Ha$aron and a year later Sivan was born.
Two and a half months before we moved to our current home, in Kfar-Yona, Nir was born.
In the picture you can see, from left to right, Sivan, Michal, Nir and Aviv. I am missing from the picture, as I was busy taking it... :-)
This blog will probably serve for a variety of things that I'm involved in, such as (but not limited to):
* Perl programming, especially events related to Israel.pm
* Interesting links
* Occasional posts on my family
Let's see how it goes...