Skip to content

Conversation

jirihnidek
Copy link
Contributor

}
if config.DefaultConfig.DataHost != "" {
URL.Host = config.DefaultConfig.DataHost
}

// Try to get content from given URL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is superfluous. The line below it clearly describes itself in its code language without the need for additional documentation. This might seem pedantic, but unnecessary comments contribute to the overall entropy of a codebase. In the future, if this section of code were to be refactored, the author would be expected to ensure that the comments are still accurate to the code they are commenting.

URL, err := url.Parse(string(data.Content))
// The type of data.Content is raw string, but it has to be JSON string fragment
// containing URL. It means that it has to be something like '"http://example.com"'
// and not only 'http://example.com'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is going to break existing message dispatch services that send a URL that isn't encapsulated in a JSON string fragment. I think we might have to somehow determine whether it's a raw URL or a URL encapsulated in a JSON string fragment. Maybe something like:

URL, err := url.Parse(string(data.Content))
if err != nil {
    var urlStr string
    err := json.Unmarshal(data.Content, &urlStr)
    if err != nil {
        return fmt.Errorf()
    }
    URL, err := url.Parse(urlStr)
    ...

Copy link
Contributor Author

@jirihnidek jirihnidek Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If URL is not encapsulated in a JSON string fragment or it isn't some JSON document, then it raises error here:

[yggd] 2024/03/04 16:26:25 /home/jhnidek/github/redhat-insights/yggdrasil/internal/transport/mqtt.go:72: cannot receive data message: cannot unmarshal data message: unexpected end of JSON input

So... I do not have any good idea how to be compatible with old services.

If we will have to break old services, what about requesting sending URL in more complex JSON document. It would be possible to specify method, HTTP headers, body, etc. Something like this:

{
  "url": "http://example.com/path/to/app",
  "method": "GET",
  "x-my-header": "foo"
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea; it streamlines all the assumptions that message content is raw JSON. But now I'm not so sure about this entirely; message content was meant to be opaque to yggdrasil. Now, with it being json.RawMessage, we're assuming message content is some JSON-serializable format. What about workers that want to send/receive BSON, or some other binary format? They'll be required to encode their content into something like base64 first. I need to step back and think about how yggdrasil 0.2 is handling this.

Plus, I'm not sure how we're going to handle this protocol break with existing services yet. Will they be forced to maintain logic that encodes the message content in JSON if the rhc version is greater than or equal to 0.3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked at the yggdrasil-0.2 branch and it defines the yggdrasil.Data.Content field as json.RawMessage. Additionally, the code that handles the URL unmarshals the message content into a string:

				var urlString string
				if err := json.Unmarshal(data.Content, &urlString); err != nil {
					log.Errorf("cannot unmarshal message content: %v", err)
					return
				}
				URL, err := url.Parse(urlString)
				if err != nil {
					log.Errorf("cannot parse message content as URL: %v", err)
					return
				}

I think this means that we are not, in fact, breaking existing services after all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In experimenting with this PR, I don't think the requirement of having the
URL encapsulated in a string is required. Unmarshalling the content into a
string, as is done here in line 213, succeeds so long as the data can be
interpolated into a string, since a URL string can be, it succeeds. (I hope this
makes sense.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking myself in circles. I'll leave the comments above for posterity, but they're not accurate, nor factual. The code proposed by this PR is correct. The content must first be unmarshalled into a string. This is because all incoming content must be a JSON-serializable data type. In the case of these "remote content" workers, it is correct to require that the content be a URL string. It is an error to directly parse the content as a URL, since the content is typed as raw JSON. Existing services are not broken by this change either. They correctly set the value of the "content" field to a string containing a URL. It's the responsibility of the client to correctly unmarshal the content when handling these "remote content" workers.

I find it helpful to think of the json.RawMessage data type as "everything after the colon". Consider the following Go (play.go.dev link):

type Person struct {
	Name    string          `json:"name"`
	RawName json.RawMessage `json:"raw_name"`
}

func main() {
	jsonDocument := []byte(`{"name":"Testy McTestface","raw_name":"Testy McTestface II"}`)

	var person Person

	if err := json.Unmarshal(jsonDocument, &person); err != nil {
		fmt.Printf("cannot unmarshal JSON: %v", err)
	}

	fmt.Printf("%#v", person)
}

The "Name" field gets its value set to the string "Testy McTestface". The "RawName" field gets its value set to a byte slice that includes the leading and trailing quotations. This is evident in the output:

main.Person{Name:"Testy McTestface", RawName:json.RawMessage{0x22, 0x54, 0x65, 0x73, 0x74, 0x79, 0x20, 0x4d, 0x63, 0x54, 0x65, 0x73, 0x74, 0x66, 0x61, 0x63, 0x65, 0x20, 0x49, 0x49, 0x22}}

The byte 0x22 is ".

Tying this all back to the yggdrasil.Data.Content field. Because it is typed as a json.RawMessage, when unmarshalling it from a JSON document into a yggdrasil.Data value, the "Content" field gets its value set to "everything after the colon" (including the quotation marks). So in order to properly parse it as a URL, it must first be unmarshalled from its JSON string format into a Go string.

This is as much an explanation for myself as it is to anyone else who might read this PR later.

Now, this comment is not factually correct either. If we must comment this at all, I think it needs to read something along the lines of:

// Because the content field is typed as json.RawMessage, it must first be unmarshalled into a Go string before parsing as a URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@subpop Hi, I updated the comment. Is there anything else that should be fixed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think this is good.

@jirihnidek jirihnidek force-pushed the jhnidek/fix_remote_content_mode branch from 33ab195 to 4b12a7e Compare March 6, 2024 14:16
* When worker was in remote content mode, then yggdrasil still
  expects raw data as valid JSON document. URL like:
  http://example.com/ is not valid JSON document. It could be
  e.g. "http://example.com/"
* Solves issue #208
@jirihnidek jirihnidek force-pushed the jhnidek/fix_remote_content_mode branch from 4b12a7e to 1fb9fd7 Compare March 7, 2024 09:49
@subpop subpop merged commit 0043989 into main Mar 8, 2024
@subpop subpop deleted the jhnidek/fix_remote_content_mode branch March 8, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants