Closed Bug 116882 Opened 23 years ago Closed 23 years ago

A middle dot character is not displayed on this page

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.9

People

(Reporter: momoi, Assigned: ftang)

References

(URL)

Details

(Keywords: intl)

Attachments

(2 files, 1 obsolete file)

This is an image which points to the problem character. Compare this to NN4 or IE 5/6, 23 years ago, Katsuhiko Momoi, 111.67 KB, image/jpeg
patch v1 (obsolete), 23 years ago, Frank Tang, 4.08 KB, patch; shanjian: review+
patch v2, 23 years ago, Frank Tang, 4.07 KB, patch; nhottanscp: review+, kinmoz: superreview+, roc: approval+

Katsuhiko Momoi

Reporter

Description

•

23 years ago

** Observed with 2001-12-22 Win32 trunk build **

On the above page, there is one character which is not displayed
properly with Mozilla under Shift_JIS encoding. 

It looks like the character has the codepoint 0x81.

(A similar bug has been filed: bug 116880. In that bug, however, the byte
sequence of the problem character is 0x86 0xA6.)

Neither NN4 nor IE 5.5 has a problem displaying this character.

Teruko Kobayashi

Updated

•

23 years ago

Keywords: intl, nsbeta1

Roy Yokoyama

Comment 1

•

23 years ago

Over to Mr. Li.

Assignee: yokoyama → shanjian

Shanjian Li

Comment 2

•

23 years ago

The character in question is 0x81, which is followed by 0x20. 0x8120 is not a
legal sjis byte sequence. It is very strange that both IE and Netscape4.x
replace such a sequence with 0x8145, which is the middle dot. But anyway, I don't
think this is a mozilla problem. I believe mozilla's behavior is better than both IE
and Netscape4.x. Why replace illegal byte sequence to 0x8145? (I tried another
byte sequence, 0x8136, which was also replaced by 0x8145.)
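Why 0x8120 and 0x8136 are illegal while 0x8145 is legal comes down to the Shift_JIS byte ranges: the second (trail) byte of a two-byte character must lie in 0x40-0x7E or 0x80-0xFC, so 0x20 and 0x36 can never follow a lead byte. The following is a minimal sketch of that check, assuming the Windows code page 932 lead-byte range; the function names are hypothetical and are not taken from the Mozilla converter.

```cpp
#include <cstdint>

// Lead bytes of a two-byte Shift_JIS character. 0xE0..0xFC follows the
// Windows code page 932 convention (assumption); strict JIS X 0208
// Shift_JIS lead bytes stop at 0xEF.
bool IsSjisLeadByte(std::uint8_t b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Trail bytes: 0x40..0x7E and 0x80..0xFC (0x7F is excluded).
bool IsSjisTrailByte(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// A two-byte sequence is structurally legal only if both bytes are in range.
// (Whether it maps to an assigned character is a separate table lookup.)
bool IsLegalSjisPair(std::uint8_t lead, std::uint8_t trail) {
  return IsSjisLeadByte(lead) && IsSjisTrailByte(trail);
}
```

With these ranges, 0x81 0x20 and 0x81 0x36 fail the trail-byte test, while 0x81 0x45 (the middle dot) passes.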

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → WORKSFORME

Frank Tang

Assignee

Comment 3

•

23 years ago

Sorry, I cannot tell which character you are referring to.

Katsuhiko Momoi

Reporter

Comment 4

•

23 years ago

> I believe mozilla's behavior is better than both IE 
> and Netscape4.x. Why replace illegal byte sequence to 0x8145?

Windows applications that use the Windows OS converters
map this code point to the middle dot character. I am sorry,
but this is expected on Windows. The character is apparently
fairly widely used -- right or wrong. If you use Notepad,
Word, and other Windows applications, you see the same
character, not the "not found" character we show in Mozilla.

How are we going to convince Windows users that what they
see in every other application is wrong? 

Let me re-open this for re-consideration and let me provide
additional facts.

ftang: If you want to see which character we are referring to,
just open the URL with Mozilla and compare it with NN4 or IE5/6.
You will see one character with a question mark with Mozilla
but expressed with a middle-dot character in other browsers
and applications.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Shanjian Li

Comment 5

•

23 years ago

Kat, I am not convinced yet. Is this kind of practice common? Did user
do this intentionally? I mean, when they put 0x81 there, did they want a middle dot?
If MS just took 0x81 and mapped it to the middle dot, that would be easy to
understand as a "feature". But mapping a whole range of code points to one character
does not make much sense. Can you tell me how such page is created?

Frank Tang

Assignee

Comment 6

•

23 years ago

momoi, please attach a screenshot here (and circle the character). I cannot see
that ? mark.

Katsuhiko Momoi

Reporter

Comment 7

•

23 years ago

In comment 5, shanjian said:

> Is this kind of practice common? Did user do this intentionally? 
...
>Can you tell me how such page is created? 

Yes, this is the question we should be asking before we
decide on this bug. Let me dig around a bit more before
making a decision one way or the other. I suspect this is
an intentional character.

Katsuhiko Momoi

Reporter

Comment 8

•

23 years ago

Attached image: This is an image which points to the problem character. Compare this to NN4 or IE 5/6

Frank Tang

Assignee

Comment 9

•

23 years ago

Let's try to fix the SJIS-to-Unicode conversion to map 0x8120 to U+30fb so we have
backward compatibility?
Reassigning back to ftang and marking it as M1.0.

Assignee: shanjian → ftang

Status: REOPENED → NEW

Target Milestone: --- → mozilla1.0

Shanjian Li

Comment 10

•

23 years ago

As I mentioned in my previous comment, at least 0x8120 and 0x8136 are mapped to
u30fb. I believe all sequences from 0x8120 to 0x813f are mapped to u30fb, and probably
an even larger range. Adding such a nonsense conversion just for this page does not
make sense unless momoi's investigation shows that this is a common practice and many
webpages are doing it. In our charset detector, 0x8120 to 0x813f are illegal byte
sequences. That may confuse some users when they switch the detector on and off.

nhottanscp

Comment 11

•

23 years ago

nsbeta1+ per i18n triage

Keywords: nsbeta1 → nsbeta1+

Frank Tang

Assignee

Comment 12

•

23 years ago

let's fix this.

Status: NEW → ASSIGNED

Frank Tang

Assignee

Comment 13

•

23 years ago

p3

Priority: -- → P3

Frank Tang

Assignee

Comment 14

•

23 years ago

move to m0.9.9

Target Milestone: mozilla1.0 → mozilla0.9.9

Frank Tang

Assignee

Comment 15

•

23 years ago

Let's merge this bug into 116882. Basically, we want to be compatible with IE6 in
error handling to reduce the risk of site-compatibility problems.

What I found by looking at IE6 is the following:
a. IE6 treats 0xfd - 0xff as single bytes and converts them to f8f1-f8f3. We
currently treat them as two-byte characters and convert them to fffd.
b. If a lead byte is in the legal Shift_JIS range but the second byte is in an
illegal range, IE6 treats the pair as a two-byte character and converts it to 30fb.
We currently treat it as a single-byte character and convert it to 0xfffd.
c. For valid Shift_JIS, if a character has no definition, IE6 maps it to
30fb, but we map it to fffd.

We need to fix all three of the above so we have IE6 parity in error handling.

Also, I wrote a CGI script that generates legal Shift_JIS according to the Nadin
book, and also invalid Shift_JIS. I posted it at http://warp/u/ftang/utf8test/sjis.cgi
I will try to push it out to
http://people.netscape.com/ftang/testscript/sjis/sjis.cgi
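The three IE6 rules above can be sketched as a small decoder loop. This is a minimal illustration under stated assumptions, not the code in patch v1 or v2: the names DecodeSjisIE6Style and kSjisTable are hypothetical, the mapping table is a two-entry stand-in for a real Shift_JIS table, and the handling of 0x80, 0xA0, and a lead byte at end of input (U+FFFD) is an assumption, since the comment does not describe IE6's behavior for those bytes. Rule b is applied literally, so any byte following a legal lead byte is consumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for a real Shift_JIS -> Unicode table; it holds
// only two entries so the error-handling rules are easy to see.
static const std::map<std::uint16_t, char16_t> kSjisTable = {
    {0x8145, u'\u30FB'},  // KATAKANA MIDDLE DOT
    {0x82A0, u'\u3042'},  // HIRAGANA LETTER A
};

static bool IsLeadByte(std::uint8_t b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static bool IsTrailByte(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Decodes Shift_JIS applying rules a, b, and c as described above.
std::u16string DecodeSjisIE6Style(const std::vector<std::uint8_t>& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t b = in[i];
    if (b < 0x80) {
      out.push_back(char16_t(b));  // ASCII range, passed through
      i += 1;
    } else if (b >= 0xA1 && b <= 0xDF) {
      out.push_back(char16_t(0xFF61 + (b - 0xA1)));  // half-width katakana
      i += 1;
    } else if (b >= 0xFD) {
      // Rule a: 0xFD..0xFF are single bytes mapped to U+F8F1..U+F8F3.
      out.push_back(char16_t(0xF8F1 + (b - 0xFD)));
      i += 1;
    } else if (IsLeadByte(b) && i + 1 < in.size()) {
      const std::uint8_t t = in[i + 1];
      const std::uint16_t code = std::uint16_t((b << 8) | t);
      auto it = kSjisTable.find(code);
      if (IsTrailByte(t) && it != kSjisTable.end()) {
        out.push_back(it->second);  // legal and mapped
      } else {
        // Rule b (illegal trail byte) and rule c (legal but unmapped):
        // consume both bytes and emit U+30FB KATAKANA MIDDLE DOT.
        out.push_back(u'\u30FB');
      }
      i += 2;
    } else {
      // 0x80, 0xA0, or a lead byte at end of input: not covered by the
      // comment; this sketch emits U+FFFD.
      out.push_back(u'\uFFFD');
      i += 1;
    }
  }
  return out;
}
```

For example, the bytes 0x41 0x81 0x20 0x42 decode to "A", U+30FB, "B", which matches what the reporter saw in IE, whereas a strict decoder would emit U+FFFD for the 0x81 and keep the space.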

Frank Tang

Assignee

Comment 16

•

23 years ago

*** Bug 116880 has been marked as a duplicate of this bug. ***

Frank Tang

Assignee

Comment 17

•

23 years ago

Attached patch: patch v1 (obsolete)

Frank Tang

Assignee

Comment 18

•

23 years ago

add nhotta and shanjian to the list.

Shanjian Li

Comment 19

•

23 years ago

Comment on attachment 70437
patch v1

r=shanjian
(I suggest removing the break in original line 147.)

Attachment #70437 - Flags: review+

nhottanscp

Comment 20

•

23 years ago

+                   // IE convert fc-ff as single byte and convert to
+                   // U+f8f1 to U+f8f3
+                   if((0xfd == *src) || (0xfe == *src) || (0xff == *src))
+                   {
+                     *dest++ = (PRUnichar) 0xf8f1 + 
+                                   (*src - (unsigned char)(0xfd));

Does this mean, mapping like this? 
0xfd -> 0xf8f1
0xfe -> 0xf8f2
0xff -> 0xf8f3
But the comment says fc-ff (includes fc).

So is the IE6 behavior of mapping to 0x30fb (case c) specific to Shift_JIS, or is
there similar behavior for EUC-JP?

Frank Tang

Assignee

Comment 21

•

23 years ago

>Does this mean, mapping like this? 
>0xfd -> 0xf8f1
>0xfe -> 0xf8f2
>0xff -> 0xf8f3
>But the comment says fc-ff (includes fc).
Good catch: it is fd-ff, not fc. Sorry, I will change the comment.

>So is the IE6 behavior of mapping to 0x30fb (case c) specific to Shift_JIS, or is
>there similar behavior for EUC-JP?
Not sure; I need to develop more tests. Let's fix them one at a time.
Opened bug 127275 for the EUC-JP issue.

Frank Tang

Assignee

Updated

•

23 years ago

Attachment #70437 - Attachment is obsolete: true

Frank Tang

Assignee

Comment 22

•

23 years ago

Attached patch: patch v2

Frank Tang

Assignee

Comment 23

•

23 years ago

nhotta or shanjian, please r=

nhottanscp

Comment 24

•

23 years ago

Comment on attachment 70941
patch v2 

r=nhotta

Attachment #70941 - Flags: review+

Frank Tang

Assignee

Updated

•

23 years ago

Blocks: 104148

kinmoz

Comment 25

•

23 years ago

Comment on attachment 70941
patch v2 

sr=kin@netscape.com

Attachment #70941 - Flags: superreview+

Frank Tang

Assignee

Updated

•

23 years ago

Blocks: 104060
No longer blocks: 104148

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

23 years ago

Comment on attachment 70941
patch v2 

a=roc+moz for 0.9.9

Attachment #70941 - Flags: approval+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

23 years ago

Keywords: mozilla0.9.9+

Frank Tang

Assignee

Comment 27

•

23 years ago

Fixed and checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

Frank Tang

Assignee

Updated

•

22 years ago

No longer blocks: 104060

Teruko Kobayashi

Updated

•

22 years ago

Status: RESOLVED → VERIFIED

Teruko Kobayashi

Comment 28

•

22 years ago

Verified as fixed in 0329 Win32 trunk and 0402 0.9.9ec Win32 build.

Simon Montagu :smontagu

Comment 29

•

16 years ago

Tests: http://hg.mozilla.org/mozilla-central/rev/fb086cc13695

Flags: in-testsuite+
