Stats explorer
Followup to #184 (closed)
We are close to having the following data collected every day on REX-DRI:
- How many users connected to the platform, with a 24h granularity.
- Information about exchange-related contributions, with a 24h granularity.
Example data
Here is a preview of what the two tables might contain:
- Connections table:
id | date | nb_connections |
---|---|---|
1 | 2020-05-01T00:00:00.0001Z | 5 |
2 | 2020-05-02T00:00:00.0001Z | 10 |
3 | 2020-05-03T00:00:00.0001Z | 3 |
- Contributions table:
id | date | major | minor | exchange_semester | university | nb_contributions |
---|---|---|---|---|---|---|
1 | 2020-05-01T00:00:00.0001Z | IM | CMI | P2020 | EPFL | 2 |
2 | 2020-05-01T00:00:00.0001Z | IM | MAT | A2020 | EPFL | 1 |
3 | 2020-05-03T00:00:00.0001Z | GI | FDD | A2020 | Tokyo | 1 |
N.B. 'university' will actually be a foreign key to the univ model in the database.
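To make the shape of the two tables concrete, here is a minimal sketch using plain Python dataclasses (the class names are assumptions; the field names mirror the columns above, and in practice these would be database models):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DailyConnections:
    """One row per 24h window of connection counts."""
    id: int
    date: datetime          # start of the 24h window, UTC
    nb_connections: int

@dataclass
class DailyContributions:
    """One row per (day, major, minor, semester, university) bucket."""
    id: int
    date: datetime
    major: str
    minor: str
    exchange_semester: str  # e.g. "P2020" or "A2020"
    university: str         # a foreign key to the univ model in practice
    nb_contributions: int

row = DailyConnections(id=1, date=datetime(2020, 5, 1), nb_connections=5)
```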
Objectives
These tables enable us to answer the following questions:
- Connections questions:
  - How many users connected to the site in the last 30 days?
  - How many users connected to the site between date1 and date2?
  - Can we plot the number of users that connected to the site between date1 and date2?
  - Etc.
- Contributions questions (there are many possibilities here -- each can be either counted or plotted):
  - How many exchange-related contributions were made between date1 and date2?
  - What is the distribution of contributions between majors?
  - ... between minors inside a major?
  - Etc.
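The kinds of aggregations listed above can be sketched in plain Python over sample contribution rows (field names are taken from the example table; a real implementation would translate these into database queries rather than iterating in memory):

```python
from collections import Counter
from datetime import date

# Sample rows mirroring the example contributions table.
rows = [
    {"date": date(2020, 5, 1), "major": "IM", "minor": "CMI", "nb_contributions": 2},
    {"date": date(2020, 5, 1), "major": "IM", "minor": "MAT", "nb_contributions": 1},
    {"date": date(2020, 5, 3), "major": "GI", "minor": "FDD", "nb_contributions": 1},
]

def count_between(rows, date1, date2):
    """How many contributions between date1 and date2 (inclusive)?"""
    return sum(r["nb_contributions"] for r in rows if date1 <= r["date"] <= date2)

def distribution_by(rows, field):
    """Distribution of contributions grouped by an arbitrary field."""
    dist = Counter()
    for r in rows:
        dist[r[field]] += r["nb_contributions"]
    return dict(dist)

print(count_between(rows, date(2020, 5, 1), date(2020, 5, 2)))  # 3
print(distribution_by(rows, "major"))  # {'IM': 3, 'GI': 1}
```

Because `distribution_by` takes the grouping field as a parameter, the same two primitives cover majors, minors, semesters, and universities alike.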
Technical feasibility
Request
In the frontend, we want to:
- Be able to plot data or display it as a table,
- Be able to export tables as CSV,
- Be able to filter data,
- Be able to "groupby" data,
- Be able to combine the last two settings in any configuration,
- Be able to share a reproducible page (filters, groupby, etc.) with just a URL.
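The "reproducible page with just a URL" requirement amounts to round-tripping the explorer state through a query string. A minimal sketch (parameter names such as `date_min` and `groupby` are hypothetical):

```python
from urllib.parse import urlencode, parse_qs

def state_to_query(filters, groupby):
    """Serialize the explorer state (filters + groupby fields) into a query string."""
    params = dict(filters)
    if groupby:
        params["groupby"] = ",".join(groupby)
    return urlencode(params)

def query_to_state(query):
    """Inverse of state_to_query: rebuild the explorer state from a query string."""
    parsed = parse_qs(query)
    groupby = parsed.pop("groupby", [""])[0]
    filters = {k: v[0] for k, v in parsed.items()}
    return filters, [g for g in groupby.split(",") if g]

qs = state_to_query({"date_min": "2020-05-01", "date_max": "2020-05-31"},
                    ["major", "minor"])
filters, groupby = query_to_state(qs)
```

Since the whole state lives in the URL, the frontend stays stateless with respect to sharing: pasting the link reproduces the exact same filter/groupby view.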
Constraints
- There are many filtering/groupby possibilities; our tool has to be generic enough to cover all of them.
- The solution should be highly maintainable.
- We must have strict control over which queries are performed against the database.
- We can assume that the end user has some data-wrangling knowledge.
- It should be implementable in under ~2 weeks.
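One way to reconcile "generic over any filter/groupby combination" with "strict control over which queries run" is to validate every request against an explicit whitelist before building a query. A minimal sketch (the field lists are assumptions drawn from the example tables):

```python
# Only these fields may ever reach a database query (hypothetical lists).
ALLOWED_GROUPBY = {"major", "minor", "exchange_semester", "university"}
ALLOWED_FILTERS = {"date_min", "date_max", "major", "university"}

def validate_request(filters, groupby):
    """Reject any field that is not explicitly whitelisted."""
    bad_filters = set(filters) - ALLOWED_FILTERS
    if bad_filters:
        raise ValueError(f"Forbidden filter field(s): {sorted(bad_filters)}")
    bad_groupby = set(groupby) - ALLOWED_GROUPBY
    if bad_groupby:
        raise ValueError(f"Forbidden groupby field(s): {sorted(bad_groupby)}")
    return True
```

The frontend can then offer any combination of whitelisted fields, while the backend guarantees that nothing outside the whitelist is ever turned into a query.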
Conclusion
How to do it?