Skip to content

Update param_decode to actually transcode to utf-8#1000

Closed
kjg wants to merge 1 commit into
mikel:masterfrom
kjg:transcode_params
Closed

Update param_decode to actually transcode to utf-8#1000
kjg wants to merge 1 commit into
mikel:masterfrom
kjg:transcode_params

Conversation

@kjg

@kjg kjg commented May 10, 2016

Copy link
Copy Markdown
Contributor

No description provided.

Comment thread lib/mail/version_specific/ruby_1_9.rb Outdated
str = uri_parser.unescape(str)
str = charset_encoder.encode(str, encoding) if encoding
str
decoded = str.encode(Encoding::UTF_8, :invalid => :replace, :replace => "")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the Unicode replacement char rather than drop the character entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I remember that we talked about this in #978

My end goal is that b_value_decode q_value_decode and param_decode all use the same method to encode to UTF-8 and that they all use the unicode replacement car. We can get there using whatever path you'd like, but my idea was to:

  1. add this to param_decode first
  2. refactor the three implementations down to 1 method
  3. change that one method to use the unicode replacement char

How do you feel about that approach? Do you have something different in mind?

@jeremy

jeremy commented May 10, 2016

Copy link
Copy Markdown
Collaborator

This changes the contract of param_decode. It attempts to respect the charset that's specified and map it to the corresponding Ruby string encoding. Now it transcodes from the specified charset to utf-8. Is this desirable for non-utf-8 values? Can we round-trip multilingual values?

@kjg

kjg commented May 10, 2016

Copy link
Copy Markdown
Contributor Author

Yes, it does effectively change the contract of param_decode, however I believe the current behaviour is unexpected. The method performs a very similar function and is used in the same context as q and b value decode. Both of those methods return the value transcoded from the parsed encoding specified into UTF-8.

param_decode is the method that ultimately produces the value returned by part.filename for example. All other similar parameter lookups like message.subject for example always return the value in UTF-8 and I think it is unexpected for part.filename to behave differently.

@kjg kjg force-pushed the transcode_params branch 2 times, most recently from 003b93d to 8343490 Compare May 11, 2016 02:01
This was referenced May 17, 2016
@kjg

kjg commented Jan 23, 2017

Copy link
Copy Markdown
Contributor Author

@jeremy Any guidance on how you'd like me to proceed here?

@jeremy jeremy added this to the 2.7.0 milestone Jan 27, 2017
@jeremy

jeremy commented Jan 27, 2017

Copy link
Copy Markdown
Collaborator

Merged @ b79fcd2 updated for #1002.

@jeremy jeremy closed this Jan 27, 2017
@kjg kjg deleted the transcode_params branch January 27, 2017 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants