Wednesday, January 31, 2007

XML processor attacks

I have posted a query to the xml-dev mailing list asking for a summary of XML parser attacks.

See: http://lists.xml.org/archives/xml-dev/200701/msg00343.html

So far I have only heard about the "exploding entity". This is what you get when your XML instance contains a DTD that declares entities in a nested manner, each one referring several times to the previous one, so that resolving the last entity makes the expanded text grow exponentially. The XML processor's memory usage grows accordingly while it tries to resolve the entity, and the processor might end up crashing. This kind of attack is usually referred to as an XML Bomb.

Here's an example, extracted from Hardening Network Security, chapter 5:

<!DOCTYPE foobar [
<!ENTITY x0 "hello">
<!ENTITY x1 "&x0;&x0;">
<!ENTITY x2 "&x1;&x1;">
<!ENTITY x3 "&x2;&x2;">
<!ENTITY x4 "&x3;&x3;">
...
<!ENTITY x98 "&x97;&x97;">
<!ENTITY x99 "&x98;&x98;">
<!ENTITY x100 "&x99;&x99;">
]>
<foobar>&x100;</foobar>
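
To get a feeling for the numbers, here is a minimal Python sketch of the arithmetic behind the example above. It assumes, as in the listing, that x0 is the 5-character string "hello" and that each entity refers to the previous one exactly twice. The function names are made up for illustration.

```python
def expanded_length(level: int, base: str = "hello", fanout: int = 2) -> int:
    """Length of entity x<level> after full expansion, computed without expanding."""
    return len(base) * fanout ** level


def expand(level: int, base: str = "hello", fanout: int = 2) -> str:
    """Actually build the expanded text; only sensible for small levels."""
    text = base
    for _ in range(level):
        text = text * fanout
    return text


if __name__ == "__main__":
    for level in (0, 4, 10, 100):
        print(f"x{level}: {expanded_length(level)} characters")
```

For x100 this comes to 5 * 2**100 characters, far more than any machine can hold, which is why a parser that expands the entity naively runs out of memory long before it finishes.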


A known buzzword is the XXE (XML eXternal Entity) attack. This is a fancy name for an attack on an application that parses XML data from untrusted sources as part of its implementation. Such an attack may lead to denial of service (DoS), exposure of sensitive information, or some other damage to the application or to the infrastructure that it uses. This can happen, for example, when the XML refers to an entity that is defined as an access to some local file (e.g., some password file...). Processing of file inclusions and other attachments can also be considered an XXE.
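
One common defensive measure, when the application does not need DTDs at all, is to refuse any document that carries a DOCTYPE declaration. Here is a minimal sketch using Python's standard xml.parsers.expat module. The names DtdForbidden and is_dtd_free are hypothetical, and this is an illustration of the idea rather than a complete hardening recipe.

```python
import xml.parsers.expat


class DtdForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees "<!DOCTYPE ...", before any
    # entity declared inside the DTD could be expanded or fetched.
    raise DtdForbidden(f"DOCTYPE declaration for {name!r} is not allowed")


def is_dtd_free(data: bytes) -> bool:
    """Return True if the document parses without any DOCTYPE declaration.

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _reject_doctype
    try:
        parser.Parse(data, True)
    except DtdForbidden:
        return False
    return True
```

Since both the exploding entity and the external entity need a DTD in order to be declared, rejecting the DOCTYPE up front covers both cases, at the price of also rejecting legitimate documents that use a DTD.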


Another buzzword is XDoS: XML Denial of Service. This term describes attacks on an XML parser that cause it to consume too much memory, slow down operations, or just work for nothing. It might also refer to cases where the DoS is on some other component of the application and the XML was the tool for facilitating the attack.
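
As an illustration of guarding against one XDoS variant, extremely deep element nesting, here is a small sketch that aborts parsing once a depth limit is crossed. It uses Python's standard xml.parsers.expat module. The names DepthExceeded and check_depth, and the default limit of 100, are assumptions made for this example only.

```python
import xml.parsers.expat


class DepthExceeded(ValueError):
    """Raised when element nesting goes beyond the allowed depth."""


def check_depth(data: bytes, max_depth: int = 100) -> int:
    """Parse the document and return its deepest nesting level.

    Raises DepthExceeded as soon as the nesting passes max_depth,
    so the rest of an abusive document is never processed.
    """
    depth = 0
    deepest = 0

    def on_start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)
        if depth > max_depth:
            raise DepthExceeded(f"nesting deeper than {max_depth} elements")

    def on_end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return deepest
```

The same pattern (count something in a handler, give up when a budget is exceeded) can be applied to other resources such as the number of attributes or the total size of text nodes.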

Additional attacks are: signature redirects ...

Monday, January 29, 2007

So, you want to implement an XML Schema Processor?

I was recently asked to look into the design and later on the implementation of XML processors that are able to do XML parsing, XML Schema validation, XPath/XQuery queries, WSDL and SOAP analysis and enforcement, and much much more.

When one is required to implement a tool which implements complex and detailed standards, which in turn rely on other complex standards, and especially when one is short on time, one seeks to classify the features into two main categories: "frequently used" and "rarely used". The motivation for this classification is that the "frequently used" features get to be considered and implemented first, while the "rarely used" ones get time and attention later on.

Starting with XML Schema (of course with the aid of the very useful book Definitive XML Schema), I was faced with the need to perform such a categorization. Not having enough time to read the standard thoroughly, read commentary about it in mailing lists such as the xml-dev mailing list, and complement the knowledge with explanations from books and of course from actual practice (examining freely available XML Schemas, for example), I was forced into an ad-hoc approach.

I browsed the web for an available summary of XML Schema, hopefully one that includes the above-mentioned categorization. Happily, I was successful. I came across the article Profiling XML Schema by Paul Kiel, which was published on xml.com on September 20th, 2006.

My short-term approach will be to try to confirm the results and conclusions that were presented in this article with several examples of WSDLs and XML Schemas available in the wild (e.g., Google's XML-based interfaces for its services).
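
A simple way to start such a confirmation is to count how often each XML Schema construct appears in the schemas at hand. Here is a minimal Python sketch of that idea. The function name and the sample schema are invented for illustration, and the counts say nothing about the article's actual findings.

```python
from collections import Counter
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def count_xsd_constructs(schema_text: str) -> Counter:
    """Count elements from the XML Schema namespace, keyed by local name."""
    prefix = "{" + XSD_NS + "}"
    counts = Counter()
    for node in ET.fromstring(schema_text).iter():
        # ElementTree spells qualified names as "{namespace}local".
        if isinstance(node.tag, str) and node.tag.startswith(prefix):
            counts[node.tag[len(prefix):]] += 1
    return counts


# A made-up schema, used only to exercise the function.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    for construct, count in count_xsd_constructs(SAMPLE).most_common():
        print(construct, count)
```

Summing such counters over many schemas gives a rough "frequently used" versus "rarely used" ranking. Schemas from untrusted sources should of course go through a hardened parser first.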

Let's see how it goes.

Saturday, January 27, 2007

I made it to the final of the Fadiha (embarrassing moments) contest

Apparently, my story made it to the final stage of the Fadiha contest in Tapuz.

See: http://www.tapuz.co.il/tapuzforum/main/Viewmsg.asp?forum=149&msgid=93059365

Winter puddle/swamp in Netaniya


We drove to Netaniya's Winter puddle/swamp.

It is a nice area with Eucalyptus trees which becomes swampy during the rainy season.

There are ducks there, some parrots, some crows, and very nice wild flowers. Next to it there are several amusement facilities where the kids can play. We had fun.

See what I wrote about the place and about our good time there on a forum in Tapuz: Experiences at the winter puddle, as published in the Parents forum on Tapuz
See pictures at: http://yeda.cs.technion.ac.il/~yona/aviv/2007/1.2007/

Friday, January 26, 2007

Perl::Critic: static code analysis for Perl based on Perl Best Practices

Check out the Perl::Critic module.
Its stated purpose is to "Critique Perl source code for best-practices".
The best practices are a la Damian Conway's book "Perl Best Practices".

There's also a web service at: http://perlcritic.com/

FireBug for FireFox

There's a nice addon/plugin for FireFox called FireBug, which gives you a nice view of the source of the page that you're currently viewing: HTML, CSS, JS and network statistics, plus debugging and a few more features.

See: http://www.getfirebug.com/

Thursday, January 25, 2007

Added a new book review on Amazon

I just finished writing a book review of Definitive XML Schema by Priscilla Walmsley.

You can read it along with other reviews I wrote at: http://www.amazon.com/gp/pdp/profile/A10UA4V0Z4691N

NIST announces competition for new cryptographic hash algorithm

NIST has announced a competition for a new cryptographic hash algorithm. This sounds like a nice opportunity to go public with ideas you have for a new cryptographic hash function.
See: http://www.networkworld.com/news/2007/012307-nist-cryptographic-algorithm.html

A nice introduction to Finite-State Automata and Regular Expressions, with a critique of Regular Expression Engines' implementations

Russ Cox explains and demonstrates how good old theory put to practice results in good computer programs while ignoring good theory leads to bad programs.

See: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) by Russ Cox


His regexp page contains more interesting information and implementations.
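
To illustrate the automata-based approach, here is a toy Python sketch that matches by tracking the set of all states the pattern could be in, instead of backtracking. It supports only literal characters and the `?` operator, it is not Russ Cox's implementation, and all the names in it are made up for this example.

```python
def parse(pattern: str) -> list:
    """Turn a pattern such as "ab?c" into [(char, is_optional), ...]."""
    items = []
    for ch in pattern:
        if ch == "?":
            if not items or items[-1][1]:
                raise ValueError("'?' must follow a single literal character")
            items[-1] = (items[-1][0], True)
        else:
            items.append((ch, False))
    return items


def closure(items: list, states: set) -> set:
    """Add every position reachable by skipping optional items."""
    result = set()
    for state in states:
        result.add(state)
        while state < len(items) and items[state][1]:
            state += 1
            result.add(state)
    return result


def matches(pattern: str, text: str) -> bool:
    """Whole-string match, advancing all possible states in lockstep."""
    items = parse(pattern)
    states = closure(items, {0})
    for ch in text:
        advanced = {s + 1 for s in states if s < len(items) and items[s][0] == ch}
        states = closure(items, advanced)
        if not states:
            return False
    return len(items) in states


if __name__ == "__main__":
    n = 30
    print(matches("a?" * n + "a" * n, "a" * n))
```

Each input character is processed once against at most len(pattern) + 1 states, so the running time stays polynomial even for patterns like the one in the usage example, a family that is known to be hard for backtracking matchers.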

Weekly sys-admin tips for Linux and Windows

Martin Kroser, a friend of mine, opened a blog with weekly sys-admin tricks for Windows and Linux. See: http://windowsandlinuxweeklytips.blogspot.com/

Sunday, January 21, 2007

Gnosis -- a cool addon plugin for FireFox

From the documentation of the Gnosis plugin:

"With a single click, Gnosis will identify the people, companies, organizations, geographies and products on the page you are viewing."

Check out: ClearForest Gnosis

Companies involved in Natural Language Processing in Israel

Here are some companies that are involved in Natural Language Processing in Israel:

* Ligature Ltd
* Linguistic Agents Ltd
* infoneto
* peer39
* Semingo
* Information Retrieval Group at the IBM Haifa Research Labs
* Knowledge Center for Processing Hebrew, Technion, IIT.
* The Pudding
* ClearForest
* PureSight
* Topixa
* Targetize
* internative
* 2001
* Wizsoft
* Activepoint
* Holtran
* Idioma
* Persay
* Celebros
* Nielsen BuzzMetrics
* Melingo
* Invoke
* Nuance
* Effective-i
* Quigo
* Intuview
* Answers.com
* Pidgin
* Babylon
* Google
* Mercado
* White Smoke
* CubeEffect

My 2006 Fadiha (embarrassing moment) is in the Semi-finals

Check out: The Fadiha contest - semi-final 1

New books that I ordered

I ordered a few books from Amazon:

Saturday, January 20, 2007

How to transliterate Hebrew in Latin script?

Have you ever had a chance to see how road signs, street signs and Hebrew text (e.g., Hebrew names) in Israel are represented in Latin script?

If so, you probably know that there is a problem because there is more than one way to represent the Hebrew text in Latin script.

Should the city פתח-תקוה be transliterated into Petach-Tiquwa or into Ptax-Tiqwa or into Petah-Tikva...?! (there are plenty more possibilities, and those driving to that city can laugh about it while viewing the road signs...).

Apparently there are too many ways, some of which are even documented. See for example: http://en.wikipedia.org/wiki/Romanization_of_Hebrew. Sadly, there are also some ad-hoc ones...

I find the suggestions made in ISO standard 259-3.1999 (Conversion of Hebrew Characters Into Latin Characters. Part 3: Phonemic Conversion. ISO/TC46/SC2) useful. However, this standard has not yet been adopted.

I recently saw an article by Prof. Uzzi Ornan, which was presented in SIGTRS, volume 13, number 1, in January 2007. Ornan discusses the problems that the Hebrew script poses and the resulting requirements on a Latin transliteration. He further lists the pros and cons of several alternatives and suggests an alternative that has more advantages than the other suggested alternatives.

I recommend reading this interesting text by Ornan: http://sigtrs.huji.ac.il/papers/131-ornan-072006.pdf

I also tried to start a discussion about this in the Linguistics forum in Tapuz and in the Hebrew Language forum in Tapuz.


The Rule of Least Power

I read an interesting text about sharing information and a simple principle which should help facilitate information sharing: The Rule of Least Power by Tim Berners-Lee and Noah Mendelsohn (editors).

Here's the abstract of that text:

Abstract

When designing computer systems, one is often faced with a choice between using a more or less powerful language for publishing information, for expressing constraints, or for solving some problem. This finding explores tradeoffs relating the choice of language to reusability of information. The "Rule of Least Power" suggests choosing the least powerful language suitable for a given purpose.

Friday, January 19, 2007

Photography



You can see some of the photographs that I take on my flickr account.
I update it occasionally.

Check it out at http://flickr.com/photos/shlomoyona

The picture you see was taken during my last visit to Istanbul in December 2006.

Patent idea

For a few months now, I've been playing with an idea for a new process that helps monitor QA's test coverage. After sharing my idea and its applications with a few colleagues at work, I was encouraged to either publish an academic paper about it or submit it as a patent.

Since my employer doesn't support, as far as I know, academic publications, and since I already experienced the process of submitting research work for refereed conferences and journals, I feel ready to explore the process of filing for a patent.

I wrote a letter today to my manager and to the VP R&D that I report to. In that letter I enclosed a short description of the idea and its application. Let's see how it proceeds from there on.

Introducing my family

Michal and I go way back... we've been together since high school. We went together to the Technion for our B.Sc. degrees, and stayed in Haifa (Ne$er, to be exact) for a few more years (Aviv was born, Michal did her M.Sc. and I started mine).

Then we moved to Hod-Ha$aron and a year later Sivan was born.

Two and a half months before we moved to our current home, in Kfar-Yona, Nir was born.

In the picture you can see, from left to right, Sivan, Michal, Nir and Aviv. I am missing from the picture, as I was busy taking it... :-)

My first blog post

I wanted to start using a blog a while ago but I never got to do it... well, until now.

This blog will probably serve for a variety of things that I'm involved in, such as (but not limited to):

* Photography
* Gardening
* Perl programming, especially events related to Israel.pm
* Interesting links
* Occasional posts on my family


Let's see how it goes...