Weak Arabic Name handling #133
Comments
Thanks for all the info. I studied Arabic for many years so I'm familiar with some of this, but I hadn't gotten all that detail.

For "Mohamad Zein El Dine", you could get the parser to handle this correctly by adding "al" and "el" to

Interesting that you consider "Mohamad Ali" as your first name. This potentially delves into the difference between Arab and Muslim names, but some Googling indicates that there are people named "Mohamad Ali" who consider "Mohamad" their first name and "Ali" their last name, so we probably wouldn't want to make assumptions about that name by default.

In the parser's terminology,

The parser does not currently check substrings; it only matches entire word parts (split on spaces). So, if we wanted to add that, it would be a new thing. Other languages also have specific suffixes on specific name parts, e.g. from #85, Russian/Ukrainian apparently have suffixes that help indicate middle names.

That's probably what we'd need to parse those name suffixes that are added to a word, "-eddine" and "-allah". (not to be confused with the parser's

Re: middle names, it seems like it would be pretty easy to implement some switch to turn off middle names. The parser does a much worse job with Chinese names than with Arabic ones, but I believe Chinese names also don't have middle names and instead sometimes have longer family names, so such a switch could probably be useful for other languages as well.
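As a rough illustration of the substring check discussed above, here is a minimal sketch in plain Python. It is not part of the parser; the function name `has_surname_suffix` and the suffix list are assumptions taken from the examples in this thread.

```python
# Hypothetical suffix list, collected from the examples mentioned in this thread.
ARABIC_SURNAME_SUFFIXES = ("eddine", "allah", "ullah", "ollah")


def has_surname_suffix(word):
    """Return True if the word ends with a known suffix and has text before it."""
    w = word.lower()
    # Require at least one character before the suffix, so a bare "Allah"
    # is not treated as a complete suffixed surname.
    return any(w.endswith(s) and len(w) > len(s) for s in ARABIC_SURNAME_SUFFIXES)


for name in ("Zeineddine", "Nasrallah", "Allah", "Ali"):
    print(name, has_surname_suffix(name))
```

A check like this would run per word part after splitting on spaces, which is the new step the comment above describes.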
The library does not handle Arabic names well, even the most common patterns. I'm no expert on the topic, but I'm Arabic and know the common patterns.
Compound Names
My first name is "Mohamad Ali", but the library identifies "Ali" as my middle name. Arabic full names of the form "Mohamad X Surname" almost always have "Mohamad X" as the first name. The exceptions are when X is "El" or "Al", in which case X is the first word of a compound surname, and when X is "Bin", which the library already handles correctly. Examples: Mohamad Khalil, Mohamad Amin, Mohamad Ali, Mohamad El Amin, Mohamad Bin Salman, etc.
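The rule above could be sketched roughly as follows. This is only an illustration, not the library's behavior: the function name, the spelling variants in `COMPOUND_LEADS`, and the surname "Hassan" in the usage lines are all made up for the example.

```python
# Hypothetical heuristic, not part of the library discussed in this issue.
COMPOUND_LEADS = {"mohamad", "mohammad", "mohamed", "muhammad"}  # assumed spelling variants
SURNAME_STARTERS = {"el", "al", "bin"}  # these begin a compound surname instead


def split_compound_first(full_name):
    """Return (first, rest), treating "Mohamad X Surname" as first name "Mohamad X"."""
    parts = full_name.split()
    if not parts:
        return "", ""
    if (
        len(parts) >= 3
        and parts[0].lower() in COMPOUND_LEADS
        and parts[1].lower() not in SURNAME_STARTERS
    ):
        # "Mohamad X Surname": the first two words form the first name.
        return " ".join(parts[:2]), " ".join(parts[2:])
    # Otherwise only the first word is the first name.
    return parts[0], " ".join(parts[1:])


print(split_compound_first("Mohamad Ali Hassan"))
print(split_compound_first("Mohamad El Amin"))
print(split_compound_first("Mohamad Bin Salman"))
```

A two-word input such as "Mohamad Ali" is deliberately left as first name plus last name, since nothing in the string says which reading is intended.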
Well-known Surname Suffixes
Some names like "Mohamad Zeineddine" can be written as "Mohamad Zein El Dine". Here the first name is "Mohamad" and the surname is "Zein El Dine", which is equivalent to "Zeineddine". "El Dine"/"-eddine" is an extremely common suffix in Arabic surnames (e.g. Zeineddine, Alameddine, Charafeddine, Safieddine, Saifeddine, etc.). Other suffixes like "-allah"/"-ullah"/"-ollah" are extremely common as well (e.g. Nasrallah). This is to say that "El Dine" and "Allah" are almost always the second part of a surname: at least one more word is needed on the left to complete the surname.
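One way to express "at least one more word is needed on the left" in code is sketched below. This is an assumption-laden illustration, not the library's implementation: `split_surname_tail`, the `SURNAME_TAILS` list, and the name "Hassan Nasr Allah" are hypothetical.

```python
# Hypothetical helper: a trailing "El Dine" or "Allah" pulls in the word on its left.
SURNAME_TAILS = (("el", "dine"), ("allah",))  # assumed list, extend as needed


def split_surname_tail(full_name):
    """Return (given, surname); a known tail adds one extra word to the surname."""
    parts = full_name.split()
    lowered = [p.lower() for p in parts]
    for tail in SURNAME_TAILS:
        n = len(tail)
        # Need the tail itself, one word to its left, and at least one given name.
        if len(parts) >= n + 2 and tuple(lowered[-n:]) == tail:
            cut = len(parts) - n - 1
            return " ".join(parts[:cut]), " ".join(parts[cut:])
    # Fallback: the last word alone is the surname.
    return " ".join(parts[:-1]), " ".join(parts[-1:])


print(split_surname_tail("Mohamad Zein El Dine"))
print(split_surname_tail("Hassan Nasr Allah"))
print(split_surname_tail("Mohamad Zeineddine"))
```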
Middle names hardly exist
An Arabic-looking name is a good hint that there is no middle name. Arabic cultures chain names instead of using middle names: the first name, followed by the father's name, followed by the father's father's name, and so on, and then the surname.
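To make the chaining idea concrete, here is a small sketch of a splitter with a switch that turns middle names off. Everything here is hypothetical (the function, the `allow_middle` flag, the dictionary keys, and the sample name); it only illustrates the idea and does not reflect an existing option in the library.

```python
def split_name_chain(full_name, allow_middle=True):
    """Split on spaces; with allow_middle=False, words between first and last form a chain."""
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "chain": [], "last": ""}
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    between = parts[1:-1]
    if allow_middle:
        # Conventional reading: everything in between is a middle name.
        return {"first": first, "middle": " ".join(between), "chain": [], "last": last}
    # Chained reading: father's name, grandfather's name, and so on.
    return {"first": first, "middle": "", "chain": between, "last": last}


print(split_name_chain("Omar Khalil Hassan Haddad", allow_middle=False))
```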
Edit: Honestly, the Wikipedia page discusses this really well https://en.wikipedia.org/wiki/Arabic_name