Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram #4426

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Histogram #4426

wants to merge 4 commits into from

Conversation

deecay
Copy link
Contributor

@deecay deecay commented Dec 8, 2019

What type of PR is this? (check all applicable)

  • Feature

Description

New visualization: Histogram by Plotly
No tests ready yet.

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Example1: https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493
Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955

image

image

image

image

Copy link
Member

@arikfr arikfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @deecay! I'm sure this will make some people very happy :-)

Considering this is less likely to be used alongside other chart types, how about we create a dedicated visualization type for it with its own settings tab? I think it will reduce confusion on how to use it and what are the options that are available.

@deecay
Copy link
Contributor Author

deecay commented Dec 13, 2019

Could you advise what level of separation from chart visualization is necessary?

For example, I have added function prepareHistogramSeries(series, options) to chart/plotly/prepareDefaultData.js. Should I leave this function and the related code in the file here? Should I copy the whole file to histogram folder and leave the chart folder alone? If so, should I erase everything unrelated to histogram once I copy the file to histogram folder? Would it be better to move things to some common library?

@kravets-levko
Copy link
Collaborator

@arikfr Why do you think that histogram series cannot be used with other series types? Brief googling found pictures like this one:

image

@arikfr
Copy link
Member

arikfr commented Jan 21, 2020

@kravets-levko it looks like a line representation of the histogram. Can you think of a real use case? @deecay do you have a use case for mixing them?

@deecay
Copy link
Contributor Author

deecay commented Jan 21, 2020

@arikfr The curved line over histograms are density curves, which is a fairly common companion of the bar-histogram. The curve is a different representation of the same data set. I can't think of any other chart type mix to be honest.

@arikfr
Copy link
Member

arikfr commented Jan 22, 2020

Could you advise what level of separation from chart visualization is necessary?

The most important part is to have a dedicated interface for the Histogram visualization. The implementation can still share code with charts if it makes sense and/or makes code simpler.

Copy link
Collaborator

@kravets-levko kravets-levko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @deecay! Can you please resolve conflicts - I'd like to review it once more, and I guess we'll merge it. Thanks!

client/app/visualizations/chart/Editor/XAxisSettings.jsx Outdated Show resolved Hide resolved
@kravets-levko kravets-levko self-requested a review October 16, 2020 10:26
@deecay
Copy link
Contributor Author

deecay commented Oct 19, 2020

@kravets-levko

Resolved conflict and incorporated your point on utility function.

One thing I could not resolve is the issue around automatic update of the BinSize and BinStart. When an input value for these two inputs gets deleted in the editor, it should redraw the chart with default value but it doesn't. I've played around with debounce but I couldn't get it to work. Any help would be welcomed.

@kravets-levko
Copy link
Collaborator

Thank you @deecay! I'll check the issue with inputs, hope there's nothing serious. One more thing I'd like to revive in this discussion - having a new Histogram visualization vs. Histogram chart type (within existing Chart viz) - what's your point on this?

@deecay
Copy link
Contributor Author

deecay commented Oct 19, 2020

Hi @kravets-levko

On chart types:
About a year ago I have tried to implement plotly charts (treemap and radar) using "new Visualization" method.
It was difficult. What I realized was that the current structure of (/chart/ and /plotly/) functions was not easy to re-use in a new Visualization. I had to copy decent amount of code from the chart visualization, such as chart resizing, chart refreshing, Data Labels Editor interface, etc, and then tweak it. It made me want to just copy the chart visualization entirely and modify that copied code. But this method obviously decreases maintainability.

Ideally, we could refactor the code so that adding a new plotly-based visualization can be done with relatively small effort, avoiding duplicate code. This PR, the existing Chart viz method, has just +100 lines of code, with almost no duplicate code. So this is my preferred method for now.

If you could show me and example and/or mentor me, I might be able to see things differently.

Eager to hear your thoughts.

placeholder="Auto"
data-test="Chart.XAxis.BinSize"
defaultValue={options.binSize}
onChange={binSize => onOptionsChange({ binSize: cleanNumber(binSize) })}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kravets-levko using toNumber() from AxisSettings resolves the non-updating issue mentioned earlier. Should I move the toNumber function as a common function and use it here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sounds good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f1a8fe6

@borsi
Copy link

borsi commented Mar 22, 2021

This feature is so close to completion, I can feel it. :)

@mmcdermott
Copy link

Any ETA on this?

@susodapop
Copy link
Contributor

Currently only merging fixes directly related to the V10 beta. We will pipeline this for merge after the V10 release next month.

@gaganc-on
Copy link

Any updates on this

@susodapop susodapop added the Team Review Meets PR criteria, ready for full review label Apr 15, 2022
@thrau
Copy link
Contributor

thrau commented Jul 24, 2022

would be great if this would make it into a release!

@susodapop
Copy link
Contributor

Thanks for the ping! Will aim to merge and include this in V11 this summer.

@beknazar
Copy link

@susodapop Any news on this PR? This would be a great feature

@nathanlmeyers
Copy link

@susodapop yeah just wanted to bump this too!

@fibr
Copy link

fibr commented Mar 21, 2023

Bump. This feature would be very useful.

@guidopetri
Copy link
Contributor

@deecay , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

@justinclift
Copy link
Member

Yeah, this looks like it'd be really useful. Hopefully it gets rebased. 😄

@eradman
Copy link
Collaborator

eradman commented Aug 29, 2023

@deecay do you have time to rebase this change? There are quite a few conflicts, but all of them appear to be easy to resolve

@codecov
Copy link

codecov bot commented Sep 18, 2023

Codecov Report

Merging #4426 (7f9287a) into master (650ec90) will increase coverage by 0.03%.
Report is 12 commits behind head on master.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4426      +/-   ##
==========================================
+ Coverage   61.07%   61.10%   +0.03%     
==========================================
  Files         157      158       +1     
  Lines       12834    12883      +49     
  Branches     1747     1753       +6     
==========================================
+ Hits         7838     7872      +34     
- Misses       4755     4765      +10     
- Partials      241      246       +5     

see 4 files with indirect coverage changes

@deecay
Copy link
Contributor Author

deecay commented Sep 18, 2023

@guidopetri @justinclift
Rebased.

@justinclift
Copy link
Member

Awesome! 😄

@eradman Any interest in reviewing this PR?

@eradman
Copy link
Collaborator

eradman commented Sep 18, 2023

Awesome! 😄

@eradman Any interest in reviewing this PR?

Yes, I'll take this change for a test drive

@eradman
Copy link
Collaborator

eradman commented Sep 18, 2023

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Now that I spent some time with this I feel that the "binning" function should be more generic. In SQL we can bin values in may ways. For example this bins by powers of 2

WITH series AS (
    SELECT
      power(2, series-1)::int AS lo,
      power(2, series)::int AS hi
    FROM generate_series(0, 10) AS series
)
SELECT count(files), lo || '-' || hi AS range
FROM dir_stats
JOIN series ON (files >= series.lo AND files < series.hi)
GROUP BY series, lo, hi
ORDER BY lo

This could be easily adapted to date ranges, range categories, and so on.

A bar chart works fine, except that I do not believe that Redash supports multiple series per graph yet, so perhaps this is the feature we should focus on.

Secondarily, binning options could be added, but I feel that would be somewhat complicated to support multiple scales and data types.

@deecay
Copy link
Contributor Author

deecay commented Sep 18, 2023

@eradman

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Does the CPU spin for test deployed instance too? (2000 entries)
https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493

@eradman
Copy link
Collaborator

eradman commented Sep 18, 2023

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Does the CPU spin for test deployed instance too? (2000 entries) https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493

I seem to have no trouble with that demo. I'll try rebuilding my dev environment to see if this is repeatable

@eradman
Copy link
Collaborator

eradman commented Sep 19, 2023

Tried cleaning my test env (git clean, docker rm -f ...) and rebuilding dependencies I still hit the same problem with CPU. Is the demo link on netlify.com up to date?

What do the rest of you think?

My thoughts:

  1. The histogram feature should be a data filter, not a new graph type (Apologies in an advance! I know this contradicts early feedback)
  2. Binning on the client side won't scale very well, so this transform should happen on the server-side (this is the current limitation of pivot tables)
  3. The X axis should be able to handle dates and numbers

@deecay
Copy link
Contributor Author

deecay commented Sep 20, 2023

@eradman

Is the demo link on netlify.com up to date?

I think it is. And I think I've found the reason for the high CPU usage.

This problem happens when the X-axis data is specified as, or looks like, a date. In these cases plotly treats x-axis as type: date. The binning function in histogram treats the value of bin size differently between number axis and dates axis. If the x-axis is numbers, size: 1 means 1.0. But if x-axis is date, then size: 1 means 1ms (!). Therefore, when you have a data with timeseries with range scale of months or years, plotly tries to bin that data with 1ms-bin and gets overloaded.

I will look into what I can do with date binning. (your thought #3)

@deecay
Copy link
Contributor Author

deecay commented Oct 6, 2023

@eradman

I have added some features.

  1. Chart will not render if xAxis bin size is "too small". Here, "too small" means that the resulting bin count will be > 10000.
  2. xAxis bin size now works for date axis, too. Possible values for date axis bin size are:
    a. number in ms (86,400,000 will represent a day (1000 * 60 * 60 * 24) )
    b. "M1", "M2", etc. where M1 is one month, M2 is two months.
    See Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955

@deecay deecay requested a review from eradman November 27, 2023 02:11
@denisov-vlad
Copy link
Member

@eradman @justinclift Could you review it please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.