Improved rhat diagnostic #3266

Merged
merged 36 commits into from
Apr 23, 2024

Conversation

aleksgorica
Collaborator

@aleksgorica aleksgorica commented Feb 6, 2024

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

Resolves issue #3269.
Changes compute_potential_scale_reduction, based on Vehtari.
Moves the Rhat computation into a new rhat function.
Adds the function rank_transform.
Updates test values to match the new output.

Intended Effect

rank_transform: Computes normalized average ranks for the draws and transforms them to normal scores using the inverse normal transformation with a fractional offset.

rhat: Computes rhat as before; the computational part is simply moved into a new function.

compute_potential_scale_reduction: Copies the draws into a matrix object, computes bulk rhat and tail rhat, and returns the maximum of the two.
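
As an illustration of the rank_transform step described above, here is a minimal standalone sketch. It is not the code from this PR: it works on a flat std::vector instead of an Eigen matrix, the names rank_normalize, std_normal_cdf, and std_normal_quantile are hypothetical, and the inverse normal CDF is found by plain bisection as a stand-in for boost::math::quantile. The offset 3/8 and the denominator S + 1/4 match the expression (avgRank - 3.0 / 8.0) / (size - 2.0 * 3.0 / 8.0 + 1.0) from the reviewed code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Standard normal CDF expressed through the complementary error function.
inline double std_normal_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Inverse standard normal CDF by bisection (assumes 0 < p < 1).
// Hypothetical stand-in for boost::math::quantile(normal, p).
inline double std_normal_quantile(double p) {
  double lo = -40.0;
  double hi = 40.0;
  for (int iter = 0; iter < 200; ++iter) {
    const double mid = 0.5 * (lo + hi);
    if (std_normal_cdf(mid) < p) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return 0.5 * (lo + hi);
}

// Rank-normalize pooled draws: tied values share their average rank, then
// z = Phi^{-1}((rank - 3/8) / (S + 1/4)) with S the number of draws.
inline std::vector<double> rank_normalize(const std::vector<double>& draws) {
  const std::size_t size = draws.size();
  std::vector<std::size_t> order(size);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&draws](std::size_t a, std::size_t b) {
    return draws[a] < draws[b];
  });

  std::vector<double> scores(size);
  for (std::size_t i = 0; i < size;) {
    // Extend j past i only while the sorted values are tied with position i.
    std::size_t j = i + 1;
    while (j < size && draws[order[j]] == draws[order[i]]) {
      ++j;
    }
    // Ranks i + 1 .. j (1-based) average to (i + 1 + j) / 2.
    const double avg_rank = 0.5 * static_cast<double>(i + 1 + j);
    const double p = (avg_rank - 0.375) / (static_cast<double>(size) + 0.25);
    const double score = std_normal_quantile(p);
    for (std::size_t k = i; k < j; ++k) {
      scores[order[k]] = score;
    }
    i = j;
  }
  return scores;
}
```

For the three draws 3, 1, 2 the middle value gets p = 0.5 and therefore a score of about zero, while the largest and smallest draws get scores of equal magnitude and opposite sign.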

How to Verify

Compare the results with the ArviZ rhat function.

Side Effects

Documentation

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Aleks Stepančič

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@bob-carpenter
Contributor

Hi, @aleksgorica and welcome to the Stan project.

In terms of process, there should be an issue specifying the feature which this PR addresses. Please add an issue.

I would also suggest not removing the old functionality but instead just adding new functionality for the ranked version. That way, it won't break backward compatibility when it's added.

When it's ready to review for inclusion, please ping me and I can do it.

Collaborator

@SteveBronder SteveBronder left a comment

Just leaving comments for now. Not sure if we want to modify this code directly or have a new function

Comment on lines 30 to 32
int rows = draws.rows();
int cols = draws.cols();
int size = rows * cols;
Collaborator

Just FYI: Eigen types use Eigen::Index for index values and std::vector uses std::size_t.

Generally for sizes that we know will not change we make them const

Suggested change
- int rows = draws.rows();
- int cols = draws.cols();
- int size = rows * cols;
+ const Eigen::Index rows = draws.rows();
+ const Eigen::Index cols = draws.cols();
+ const Eigen::Index size = rows * cols;

Collaborator

(and change rest of places that use int to use Eigen::Index for storing eigen matrix sizes)

Comment on lines 37 to 43
for (int col = 0; col < cols; ++col) {
for (int row = 0; row < rows; ++row) {
int index
= col * rows + row; // Calculating linear index in column-major order
valueWithIndex[index] = {draws(row, col), index};
}
}
Collaborator

For column-major Eigen matrices you can iterate through every element, column by column, with a single linear index using operator()(Eigen::Index). Also note we use snake_case for objects and CamelCase for template parameters.

Suggested change
- for (int col = 0; col < cols; ++col) {
-   for (int row = 0; row < rows; ++row) {
-     int index
-         = col * rows + row; // Calculating linear index in column-major order
-     valueWithIndex[index] = {draws(row, col), index};
-   }
- }
+ for (Eigen::Index i = 0; i < size; ++i) {
+   value_with_index[i] = {draws(i), i};
+ }

Comment on lines 63 to 69
for (int k = i; k < j; ++k) {
int index = valueWithIndex[k].second;
int row = index % rows; // Adjusting row index for column-major order
int col = index / rows; // Adjusting column index for column-major order
double p = (avgRank - 3.0 / 8.0) / (size - 2.0 * 3.0 / 8.0 + 1.0);
rankMatrix(row, col) = boost::math::quantile(dist, p);
}
Collaborator

You should be able to do the index without calculating rows / cols here like above

int rows = draws.rows();
int cols = draws.cols();
int size = rows * cols;
Eigen::MatrixXd rankMatrix = Eigen::MatrixXd::Zero(rows, cols);
Collaborator

We like to declare things as near as possible to where they are used so I'd move this down to right before the if

Comment on lines 50 to 58
int j = i;
double sumRanks = 0;
int count = 0;

while (j < size && valueWithIndex[j].first == valueWithIndex[i].first) {
sumRanks += j + 1; // Rank starts from 1
++j;
++count;
}
Collaborator

Right now this while loop will always run at least once. Can you start j at j = i + 1 etc. to avoid this? Then the while loop only runs for cases of duplicates.

Comment on lines 76 to 77
* Computes square root of marginal posterior variance of the estimand by
* weigted average of within-chain variance W and between-chain variance B.
Collaborator

Suggested change
- * Computes square root of marginal posterior variance of the estimand by
- * weigted average of within-chain variance W and between-chain variance B.
+ * Computes square root of marginal posterior variance of the estimand by the
+ * weighted average of within-chain variance W and between-chain variance B.

Comment on lines 92 to 104
for (int chain = 0; chain < num_chains; ++chain) {
boost::accumulators::accumulator_set<
double, boost::accumulators::stats<boost::accumulators::tag::mean,
boost::accumulators::tag::variance>>
acc_draw;
for (int n = 0; n < num_draws; ++n) {
acc_draw(draws(n, chain));
}
chain_mean(chain) = boost::accumulators::mean(acc_draw);
acc_chain_mean(chain_mean(chain));
chain_var(chain)
= boost::accumulators::variance(acc_draw) * unbiased_var_scale;
}
Collaborator

It doesn't look like you need an online way to update and view the current mean and variance. You can calculate the mean for each chain via draws.colwise().mean() and draws.array().mean() for the overall mean. Then a loop for the chain variance calculation.

Now that I'm reading this again, should that be named acc_chain_variance because of line 105? I've never used boost accumulators before, so I'm not sure I'm following all the way here.
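
To make the W and B bookkeeping in this hunk easier to follow, here is a rough standalone sketch of the classic (non-rank) rhat formula written with plain loops instead of boost accumulators. It is an illustration under assumptions, not the PR's implementation: the name rhat_sketch and the std::vector-of-chains layout are hypothetical, and all chains are assumed to have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// chains[c][n] is draw n of chain c. Assumes at least two chains, all of the
// same length (at least two draws), and a nonzero within-chain variance.
inline double rhat_sketch(const std::vector<std::vector<double>>& chains) {
  const double num_chains = static_cast<double>(chains.size());
  const double num_draws = static_cast<double>(chains[0].size());

  std::vector<double> chain_means;
  double var_within = 0.0;  // W: mean of the per-chain sample variances
  for (const auto& chain : chains) {
    double sum = 0.0;
    for (double x : chain) {
      sum += x;
    }
    const double mean = sum / num_draws;
    double sq_dev = 0.0;
    for (double x : chain) {
      sq_dev += (x - mean) * (x - mean);
    }
    chain_means.push_back(mean);
    var_within += sq_dev / (num_draws - 1.0);
  }
  var_within /= num_chains;

  double grand_mean = 0.0;
  for (double m : chain_means) {
    grand_mean += m;
  }
  grand_mean /= num_chains;
  double sq_dev_means = 0.0;
  for (double m : chain_means) {
    sq_dev_means += (m - grand_mean) * (m - grand_mean);
  }
  // B: num_draws times the sample variance of the chain means.
  const double var_between = num_draws * sq_dev_means / (num_chains - 1.0);

  // var_plus = (n - 1) / n * W + B / n, and rhat = sqrt(var_plus / W).
  return std::sqrt((var_between / var_within + num_draws - 1.0) / num_draws);
}
```

With two identical chains {1, 2, 3} the between-chain variance is zero and the sketch returns sqrt(2/3); shifting the second chain to {2, 3, 4} gives sqrt(3.5/3).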

*
*/

Eigen::MatrixXd rank_transform(const Eigen::MatrixXd& draws) {
Collaborator

This is for the error on Jenkins. Marking the function inline makes it so that there are not multiple definitions across different translation units.

Suggested change
- Eigen::MatrixXd rank_transform(const Eigen::MatrixXd& draws) {
+ inline Eigen::MatrixXd rank_transform(const Eigen::MatrixXd& draws) {

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.22 | 0.2 | 1.13 | 11.26% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.92 | -8.7% slower |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 0.97 | -2.6% slower |
| gp_regr/gp_regr.stan | 0.11 | 0.11 | 0.98 | -1.79% slower |
| sir/sir.stan | 80.14 | 78.73 | 1.02 | 1.76% faster |
| irt_2pl/irt_2pl.stan | 4.27 | 3.94 | 1.08 | 7.62% faster |
| eight_schools/eight_schools.stan | 0.06 | 0.05 | 1.05 | 4.36% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.26 | 0.26 | 1.01 | 1.3% faster |
| pkpd/one_comp_mm_elim_abs.stan | 18.89 | 18.45 | 1.02 | 2.3% faster |
| garch/garch.stan | 0.49 | 0.46 | 1.07 | 6.46% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.87 | 2.83 | 1.01 | 1.44% faster |
| arK/arK.stan | 1.67 | 1.65 | 1.01 | 1.07% faster |
| gp_pois_regr/gp_pois_regr.stan | 2.59 | 2.5 | 1.03 | 3.35% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.41 | 9.21 | 1.02 | 2.2% faster |
| performance.compilation | 176.71 | 180.14 | 0.98 | -1.94% slower |
Mean result: 1.0212747855344297

Jenkins Console Log
Blue Ocean
Commit hash: f70fb84db501edcbf822691c02664c43b76287b2


Machine information
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 3351.692
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Collaborator

@SteveBronder SteveBronder left a comment

I think this is good! The suggested changes mostly cover some C++ things, some Qs about the code's math, and handling some edge cases.

As a first PR this is very good so far!

*/
inline double compute_potential_scale_reduction_rank(
std::vector<const double*> draws, std::vector<size_t> sizes) {
int num_chains = sizes.size();
Collaborator

std::vector's size type is std::size_t

Suggested change
- int num_chains = sizes.size();
+ std::size_t num_chains = sizes.size();

or use auto

Collaborator

Why is only the first size checked to see if it is zero and then we return a NaN? If one chain failed we should still be able to use information from all of the other chains. Looking at the rest of the code, unless there is a math reason to not ignore zero sized chains I think we should just prune them

std::vector<const double*> nonzero_chain_begins;
std::vector<std::size_t> nonzero_chain_sizes;
for (std::size_t i = 0; i < chain_sizes.size(); ++i) {
  // Keep only the chains that actually contain draws
  if (chain_sizes[i] > 0) {
    nonzero_chain_begins.push_back(chain_begins[i]);
    nonzero_chain_sizes.push_back(chain_sizes[i]);
  }
}
if (nonzero_chain_sizes.empty()) {
  return std::numeric_limits<double>::quiet_NaN();
}

Comment on lines 112 to 115
size_t num_draws = sizes[0];
if (num_draws == 0) {
return std::numeric_limits<double>::quiet_NaN();
}
Collaborator

Also should num_draws be min_num_draws since it's the minimum number of draws received from each chain?

* @return potential scale reduction for the specified parameter
*/
inline double compute_potential_scale_reduction_rank(
std::vector<const double*> draws, std::vector<size_t> sizes) {
Collaborator

I'd rename these to be a little more clear. begin() is a function in standard library containers that means "an iterator pointing to the first element of a container" so using begin in the name here will signal to people "these pointers are the pointers to the first element of each chain"

Suggested change
- std::vector<const double*> draws, std::vector<size_t> sizes) {
+ std::vector<const double*> chain_begins, std::vector<size_t> chain_sizes) {

Comment on lines 141 to 147
if (are_all_const) {
// If all chains are constant then return NaN
// if they all equal the same constant value
if (init_draw.isApproxToConstant(init_draw(0))) {
return std::numeric_limits<double>::quiet_NaN();
}
}
Collaborator

But it is fine if each chain is constant, but each one is a different value? tbc I'm asking because idk if that is how the paper is written or not. I suppose this makes sense in the case of many short chains

Collaborator Author

Yes, you are correct. The current implementation fails if different chains are constant. For example, C1: [1, 1, 1]; C2: [2, 2, 2] would have a within-variance of 0, and the rhat function would return inf due to division by zero. I think the best way to correct this is to check if there exists a non-constant chain.

Contributor

Doing something intelligent around constant chains would be a big improvement on our current NaN behavior. But I'm not sure what that is as there's not a number that makes sense as the ESS.

Collaborator

If all chains have the same constant value, we can't tell the difference between all chains being stuck and the variable actually being constant (e.g. the diagonal of a correlation matrix), as Stan doesn't tag the variables. In that case the diagnostics in R return NA. If the chains have different constant values, then the variable can't be a true constant, and an Rhat of Inf is fine.
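
The two constant-chain cases discussed in this thread can be told apart with a small helper. The sketch below is only an illustration: ConstantCase and classify_constant_chains are hypothetical names, the chains are held in std::vector rather than raw pointers, every chain is assumed to hold at least one draw, and values are compared with exact equality rather than the isApproxToConstant check used in the reviewed code.

```cpp
#include <algorithm>
#include <vector>

enum class ConstantCase { not_constant, same_value, different_values };

// Classifies the draws of one parameter across chains.
// Assumes chains is non-empty and every chain holds at least one draw.
inline ConstantCase classify_constant_chains(
    const std::vector<std::vector<double>>& chains) {
  const double first_value = chains.front().front();
  bool same_value = true;
  for (const auto& chain : chains) {
    const double chain_value = chain.front();
    // A chain is constant when every draw equals its first draw.
    const bool is_constant
        = std::all_of(chain.begin(), chain.end(),
                      [chain_value](double x) { return x == chain_value; });
    if (!is_constant) {
      return ConstantCase::not_constant;
    }
    if (chain_value != first_value) {
      same_value = false;
    }
  }
  return same_value ? ConstantCase::same_value
                    : ConstantCase::different_values;
}
```

Following the comments above, a caller could map same_value to NaN (the analogue of NA in R) and let different_values go through to the usual formula, where a zero within-chain variance yields Inf.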

}
}

Eigen::MatrixXd matrix(num_draws, num_chains);
Collaborator

Suggested change
- Eigen::MatrixXd matrix(num_draws, num_chains);
+ Eigen::MatrixXd draws_matrix(num_draws, num_chains);

double rhat_tail = rhat(rank_transform(
(matrix.array() - math::quantile(matrix.reshaped(), 0.5)).abs()));

return std::max(rhat_bulk, rhat_tail);
Collaborator

Question for @avehtari

Do we want to just return the max or should we return a pair so the user can see the bulk and tail rhats?

Collaborator

It is useful for the user to know both. A diagnostic message could be simplified by reporting whether the max of these is too high, but otherwise I would prefer that both be available to the user. Making them both available does change the I/O via CSV, and changes to CSV structures need to be considered carefully.

Collaborator

Okay then @aleksgorica can you have this return back an std::pair?
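
One possible shape for the std::pair return is sketched below, together with the folding step that produces the tail input (absolute deviation from the median, as in the hunk above). This is a hedged illustration rather than the PR's code: median_of, fold_around_median, and bulk_and_tail_rhat are hypothetical names, the draws are a flat non-empty std::vector, the median is the plain sample median instead of math::quantile, and the rank_rhat callable stands in for rhat(rank_transform(...)).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Sample median; the argument is taken by value so it can be sorted locally.
// Assumes at least one draw.
inline double median_of(std::vector<double> x) {
  std::sort(x.begin(), x.end());
  const std::size_t n = x.size();
  return n % 2 == 1 ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
}

// Folds the draws around their median: |x - median|.
inline std::vector<double> fold_around_median(const std::vector<double>& draws) {
  const double med = median_of(draws);
  std::vector<double> folded(draws.size());
  std::transform(draws.begin(), draws.end(), folded.begin(),
                 [med](double x) { return std::fabs(x - med); });
  return folded;
}

// Returns {bulk, tail}. rank_rhat is any callable taking the draws and
// returning a double; it is a placeholder for the rank-normalized rhat.
template <typename RankRhat>
inline std::pair<double, double> bulk_and_tail_rhat(
    const std::vector<double>& draws, RankRhat rank_rhat) {
  return {rank_rhat(draws), rank_rhat(fold_around_median(draws))};
}
```

A caller that only wants the single summary value can still take std::max of the two members, so returning the pair does not lose the previous behavior.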

Comment on lines 296 to 297
inline double compute_split_potential_scale_reduction_rank(
std::vector<const double*> draws, std::vector<size_t> sizes) {
Collaborator

We want these arguments to come in as constant references. As written this will make a hard copy of the input vectors when you call this function. Making the arguments references (&) means the function will just use the already existing object without making a copy and const means the arguments will be constant in the function (i.e. we will not modify them)

Suggested change
- inline double compute_split_potential_scale_reduction_rank(
-     std::vector<const double*> draws, std::vector<size_t> sizes) {
+ inline double compute_split_potential_scale_reduction_rank(
+     const std::vector<const double*>& draws, const std::vector<size_t>& sizes) {

We want containers (like std::vector or Eigen::MatrixXd or std::map) to be passed by constant reference. Small types such as double, int, std::size_t etc. can be passed by value as making copies of them is trivial. Happy to explain this more if you like but don't want to overload you with info. A nice place to read about things like this is Scott Meyers "Effective Modern C++" which if you google should be easy to find a free copy of online.

This comment applies to all the function signatures you added here
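
As a small illustration of the const-reference convention described above, the helper below takes the container by const reference and returns a small type by value. The name min_num_draws echoes the naming question raised earlier in this review, but the function itself is a hypothetical sketch and not part of the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The vector is passed by const reference: no copy is made and the function
// cannot modify the caller's data. The std::size_t result is returned by
// value because copying it is trivial.
inline std::size_t min_num_draws(const std::vector<std::size_t>& chain_sizes) {
  if (chain_sizes.empty()) {
    return 0;  // no chains means no draws to use
  }
  return *std::min_element(chain_sizes.begin(), chain_sizes.end());
}
```

Had the parameter been declared as std::vector<std::size_t> chain_sizes, every call would copy the whole vector before the minimum is computed.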

Comment on lines 288 to 290
* Current implementation assumes draws are stored in contiguous
* blocks of memory. Chains are trimmed from the back to match the
* length of the shortest chain.
Collaborator

You use split_chains which also assumes each chain is the same length

Comment on lines 306 to 307
double half = num_draws / 2.0;
std::vector<size_t> half_sizes(2 * num_chains, std::floor(half));
Collaborator

Just to make it more clear you are using floating point division and then taking the floor to get the index

Suggested change
- double half = num_draws / 2.0;
- std::vector<size_t> half_sizes(2 * num_chains, std::floor(half));
+ std::size_t half = std::floor(num_draws / 2.0);
+ std::vector<size_t> half_sizes(2 * num_chains, half);
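
For context, here is a rough sketch of how the half sizes could be used to split each chain in two. It is written under assumptions and is not the split_chains implementation: SplitChains and split_in_half are hypothetical names, every chain is assumed to hold num_draws contiguous draws, and the choice to start the second half at ceil(num_draws / 2), which drops the middle draw of an odd-length chain, is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Pointers to the first draw of each half-chain plus their common length.
struct SplitChains {
  std::vector<const double*> begins;
  std::vector<std::size_t> sizes;
};

// Splits every chain of length num_draws into a first and a second half.
inline SplitChains split_in_half(const std::vector<const double*>& chain_begins,
                                 std::size_t num_draws) {
  // Unsigned integer division already rounds down, i.e. floor(num_draws / 2).
  const std::size_t half = num_draws / 2;
  // num_draws - half equals ceil(num_draws / 2): where the second half starts.
  const std::size_t second_start = num_draws - half;

  SplitChains split;
  split.begins.reserve(2 * chain_begins.size());
  split.sizes.assign(2 * chain_begins.size(), half);
  for (const double* begin : chain_begins) {
    split.begins.push_back(begin);
    split.begins.push_back(begin + second_start);
  }
  return split;
}
```

For a chain of five draws this gives two halves of length two, starting at draws 0 and 3.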

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.25 | 0.23 | 1.1 | 9.39% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.02 | 0.62 | -60.65% slower |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 0.98 | -1.8% slower |
| gp_regr/gp_regr.stan | 0.12 | 0.13 | 0.93 | -7.72% slower |
| sir/sir.stan | 87.69 | 85.71 | 1.02 | 2.25% faster |
| irt_2pl/irt_2pl.stan | 4.56 | 4.42 | 1.03 | 3.16% faster |
| eight_schools/eight_schools.stan | 0.06 | 0.07 | 0.83 | -20.07% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.28 | 0.26 | 1.08 | 7.16% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.41 | 20.05 | 1.02 | 1.78% faster |
| garch/garch.stan | 0.59 | 0.51 | 1.15 | 13.25% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.38 | 2.99 | 1.13 | 11.56% faster |
| arK/arK.stan | 1.83 | 1.75 | 1.05 | 4.45% faster |
| gp_pois_regr/gp_pois_regr.stan | 2.82 | 2.66 | 1.06 | 5.78% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 10.38 | 10.44 | 0.99 | -0.55% slower |
| performance.compilation | 209.61 | 208.84 | 1.0 | 0.37% faster |
Mean result: 1.0006665985989671

Jenkins Console Log
Blue Ocean
Commit hash: 728ec0a53bd4cce07bf159a426e13512de0b0263



@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.2 | 0.24 | 0.86 | -16.9% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.09 | 8.21% faster |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.08 | 7.31% faster |
| gp_regr/gp_regr.stan | 0.11 | 0.1 | 1.06 | 5.56% faster |
| sir/sir.stan | 77.99 | 75.3 | 1.04 | 3.45% faster |
| irt_2pl/irt_2pl.stan | 3.85 | 3.74 | 1.03 | 2.7% faster |
| eight_schools/eight_schools.stan | 0.05 | 0.05 | 1.03 | 2.73% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.01 | 1.24% faster |
| pkpd/one_comp_mm_elim_abs.stan | 18.27 | 17.61 | 1.04 | 3.62% faster |
| garch/garch.stan | 0.46 | 0.44 | 1.04 | 3.64% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.79 | 2.75 | 1.02 | 1.63% faster |
| arK/arK.stan | 1.63 | 1.59 | 1.02 | 2.16% faster |
| gp_pois_regr/gp_pois_regr.stan | 2.54 | 2.45 | 1.04 | 3.65% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.07 | 8.92 | 1.02 | 1.65% faster |
| performance.compilation | 179.41 | 180.07 | 1.0 | -0.37% slower |
Mean result: 1.0234405050935638

Jenkins Console Log
Blue Ocean
Commit hash: 0fcf10855f923eb24c8e9958f1f19fde97572810



@SteveBronder
Collaborator

@aleksgorica this code all looks good! The only thing to fix now is the API. We want to keep all the old code and functions the same for backwards compatibility and have all of this in new functions. For instance compute_potential_scale_reduction() should have the same results as previously.

So the new code you have in compute_potential_scale_reduction_rank etc. is good and you can revert your changes to compute_potential_scale_reduction. New code in the future can use your compute_potential_scale_reduction_rank for the new estimate.

Once that is changed then I think you are good to merge!

@aleksgorica
Collaborator Author

> @aleksgorica this code all looks good! The only thing to fix now is the API. We want to keep all the old code and functions the same for backwards compatibility and have all of this in new functions. For instance compute_potential_scale_reduction() should have the same results as previously.
>
> So the new code you have in compute_potential_scale_reduction_rank etc. is good and you can revert your changes to compute_potential_scale_reduction. New code in the future can use your compute_potential_scale_reduction_rank for the new estimate.
>
> Once that is changed then I think you are good to merge!

Okay, I hope I have understood correctly, I have just reverted the changes in the original non-rank functions to the previous code. However, I would like to know why that is necessary since I believe the new code yields the same results as the previous code according to the tests, and it is also a bit better written.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.26 | 0.19 | 1.35 | 25.96% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.08 | 7.44% faster |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.04 | 4.04% faster |
| gp_regr/gp_regr.stan | 0.11 | 0.1 | 1.04 | 3.42% faster |
| sir/sir.stan | 78.12 | 75.68 | 1.03 | 3.12% faster |
| irt_2pl/irt_2pl.stan | 3.87 | 3.93 | 0.99 | -1.51% slower |
| eight_schools/eight_schools.stan | 0.05 | 0.05 | 1.0 | 0.17% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 0.99 | -0.58% slower |
| pkpd/one_comp_mm_elim_abs.stan | 18.03 | 18.27 | 0.99 | -1.28% slower |
| garch/garch.stan | 0.45 | 0.46 | 0.98 | -2.51% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.78 | 2.83 | 0.98 | -1.99% slower |
| arK/arK.stan | 1.64 | 1.69 | 0.97 | -3.19% slower |
| gp_pois_regr/gp_pois_regr.stan | 2.5 | 2.61 | 0.96 | -4.57% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.11 | 9.43 | 0.97 | -3.52% slower |
| performance.compilation | 178.35 | 179.15 | 1.0 | -0.45% slower |
Mean result: 1.0234807232212786

Jenkins Console Log
Blue Ocean
Commit hash: ffae22a76851e3b49ca43f31234eeada692f1d9b



G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

@bob-carpenter
Contributor

Okay, I hope I have understood correctly, I have just reverted the changes in the original non-rank functions to the previous code. However, I would like to know why that is necessary since I believe the new code yields the same results.

There's no need to keep the old code, but we need to keep the old interfaces so we don't break anyone's existing code. The new code should be doing ranked R-hat, right? That won't provide the same answers. If you wrote general enough code to do both, you can have the old interface delegate to the new code. Then we can rewrite the interfaces to use the new code and get rid of the old code. @SteveBronder will know more about the specifics here.
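
The delegation idea described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `sketch` namespace, the `Chains` alias, the `rhat` helper, and the `transform` parameter are all hypothetical, and `std::vector` stands in for the Eigen types used in `chains.hpp`. The shared helper follows the usual between/within-variance form of R-hat.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of Stan

using Chains = std::vector<std::vector<double>>;  // equal-length chains assumed

// Shared helper (hypothetical): classic R-hat from the between-chain and
// within-chain variances.
inline double rhat(const Chains& chains) {
  const double m = static_cast<double>(chains.size());
  const double n = static_cast<double>(chains[0].size());
  std::vector<double> means, vars;
  for (const auto& x : chains) {
    const double mu = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double ss = 0.0;
    for (double v : x) ss += (v - mu) * (v - mu);
    means.push_back(mu);
    vars.push_back(ss / (n - 1));
  }
  const double grand = std::accumulate(means.begin(), means.end(), 0.0) / m;
  double between = 0.0;
  for (double mu : means) between += (mu - grand) * (mu - grand);
  between *= n / (m - 1);
  const double within = std::accumulate(vars.begin(), vars.end(), 0.0) / m;
  return std::sqrt((between / within + n - 1) / n);
}

// Old entry point: same name, same inputs, same result as before.
inline double compute_potential_scale_reduction(const Chains& chains) {
  return rhat(chains);
}

// New entry point added next to it. `transform` is a stand-in for the
// rank normalization; callers of the old function are unaffected.
template <typename F>
inline double compute_potential_scale_reduction_rank(const Chains& chains,
                                                     F transform) {
  return rhat(transform(chains));
}

}  // namespace sketch
```

With this layout, existing callers keep getting the classic value, and only code that opts in to the `_rank` name sees the new estimate.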

@SteveBronder
Collaborator

@bob-carpenter I rewrote this to dispatch to the old code from the previous API

@aleksgorica there's one signature missing for split_potential_scale_reduction_rank so it matches the API of split_potential_scale_reduction. See below. Can you add that signature and have the current split_potential_scale_reduction signature call your new split_potential_scale_reduction_rank returning bulk rhat?

https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/chains.hpp#L219

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.22 | 0.2 | 1.09 | 8.35% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.95 | -5.09% slower |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 0.99 | -0.75% slower |
| gp_regr/gp_regr.stan | 0.12 | 0.11 | 1.06 | 5.76% faster |
| sir/sir.stan | 81.19 | 82.66 | 0.98 | -1.81% slower |
| irt_2pl/irt_2pl.stan | 4.15 | 4.03 | 1.03 | 2.92% faster |
| eight_schools/eight_schools.stan | 0.05 | 0.05 | 1.01 | 0.63% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 1.01 | 0.65% faster |
| pkpd/one_comp_mm_elim_abs.stan | 18.61 | 18.33 | 1.01 | 1.47% faster |
| garch/garch.stan | 0.48 | 0.47 | 1.03 | 2.79% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.91 | 2.83 | 1.03 | 2.6% faster |
| arK/arK.stan | 1.66 | 1.67 | 0.99 | -0.59% slower |
| gp_pois_regr/gp_pois_regr.stan | 2.6 | 2.61 | 1.0 | -0.35% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.39 | 9.48 | 0.99 | -0.94% slower |
| performance.compilation | 190.84 | 190.77 | 1.0 | 0.04% faster |
Mean result: 1.011567169879977

Jenkins Console Log
Blue Ocean
Commit hash: d8ac1c67d08c8b579e1b20ed5747abacce87c6fc



@SteveBronder
Collaborator

Actually I reverted my old PR. Changing all the old functions to use the new one causes cmdstan tests to fail.

Let's have the _rank versions like you made here. We just need one last signature for

double split_potential_scale_reduction(
      const Eigen::Matrix<Eigen::VectorXd, Dynamic, 1>& samples)

Then we are good! After this is in we can then change cmdstan etc. to use the new ranked version. I think this is better because we need to change the output anyway to report back both the bulk and tail rhat
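
For readers following along, here is a rough, self-contained sketch of what "bulk and tail rhat" could look like. It is an assumption-laden illustration rather than the code merged in this PR: the names `rhat`, `inv_phi`, `rank_transform`, and `rhat_rank` are hypothetical, `std::vector` replaces the Eigen types, chains are not split, the inverse normal CDF is found by bisection instead of a library call, and the fractional offset (3/8 in the numerator, 1/4 in the denominator) is assumed from the rank-normalization described in Vehtari et al.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Chains = std::vector<std::vector<double>>;  // equal-length chains assumed

// Classic R-hat from between-chain and within-chain variance.
inline double rhat(const Chains& c) {
  const double m = static_cast<double>(c.size());
  const double n = static_cast<double>(c[0].size());
  std::vector<double> means, vars;
  for (const auto& x : c) {
    const double mu = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double ss = 0.0;
    for (double v : x) ss += (v - mu) * (v - mu);
    means.push_back(mu);
    vars.push_back(ss / (n - 1));
  }
  const double grand = std::accumulate(means.begin(), means.end(), 0.0) / m;
  double between = 0.0;
  for (double mu : means) between += (mu - grand) * (mu - grand);
  between *= n / (m - 1);
  const double within = std::accumulate(vars.begin(), vars.end(), 0.0) / m;
  return std::sqrt((between / within + n - 1) / n);
}

// Inverse standard normal CDF by bisection: slow, but needs no extra library.
inline double inv_phi(double p) {
  double lo = -40.0, hi = 40.0;
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (0.5 * std::erfc(-mid / std::sqrt(2.0)) < p) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return 0.5 * (lo + hi);
}

// Pool all draws, give tied values their average rank, and map the ranks
// to normal scores using the assumed fractional offset.
inline Chains rank_transform(const Chains& c) {
  const std::size_t n = c[0].size(), total = c.size() * n;
  std::vector<double> flat;
  for (const auto& x : c) flat.insert(flat.end(), x.begin(), x.end());
  std::vector<std::size_t> idx(total);
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::sort(idx.begin(), idx.end(),
            [&](std::size_t a, std::size_t b) { return flat[a] < flat[b]; });
  Chains z(c.size(), std::vector<double>(n));
  for (std::size_t i = 0; i < total;) {
    std::size_t j = i;
    while (j < total && flat[idx[j]] == flat[idx[i]]) ++j;
    // Mean of the 1-based ranks i+1 .. j shared by this run of ties.
    const double avg_rank = 0.5 * static_cast<double>(i + 1 + j);
    const double score = inv_phi((avg_rank - 0.375) / (total + 0.25));
    for (std::size_t k = i; k < j; ++k) z[idx[k] / n][idx[k] % n] = score;
    i = j;
  }
  return z;
}

// Returns {bulk, tail}: R-hat of the rank-normalized draws, and R-hat of the
// rank-normalized draws after folding them around the pooled median.
inline std::pair<double, double> rhat_rank(const Chains& c) {
  std::vector<double> flat;
  for (const auto& x : c) flat.insert(flat.end(), x.begin(), x.end());
  std::sort(flat.begin(), flat.end());
  const std::size_t h = flat.size() / 2;
  const double med = flat.size() % 2 ? flat[h] : 0.5 * (flat[h - 1] + flat[h]);
  Chains folded = c;
  for (auto& x : folded)
    for (double& v : x) v = std::fabs(v - med);
  return {rhat(rank_transform(c)), rhat(rank_transform(folded))};
}
```

Returning the pair, rather than a single maximum, lets a caller such as cmdstan print both values, which is the output change mentioned above.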

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| arma/arma.stan | 0.2 | 0.19 | 1.07 | 6.13% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.12 | 10.43% faster |
| gp_regr/gen_gp_data.stan | 0.02 | 0.02 | 1.1 | 8.7% faster |
| gp_regr/gp_regr.stan | 0.11 | 0.1 | 1.06 | 5.37% faster |
| sir/sir.stan | 77.58 | 75.31 | 1.03 | 2.92% faster |
| irt_2pl/irt_2pl.stan | 3.76 | 3.75 | 1.0 | 0.26% faster |
| eight_schools/eight_schools.stan | 0.05 | 0.05 | 1.02 | 1.8% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.04 | 3.67% faster |
| pkpd/one_comp_mm_elim_abs.stan | 17.97 | 17.57 | 1.02 | 2.27% faster |
| garch/garch.stan | 0.45 | 0.45 | 1.0 | -0.12% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.77 | 2.73 | 1.02 | 1.55% faster |
| arK/arK.stan | 1.63 | 1.6 | 1.02 | 2.0% faster |
| gp_pois_regr/gp_pois_regr.stan | 2.48 | 2.5 | 0.99 | -0.79% slower |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.05 | 9.02 | 1.0 | 0.4% faster |
| performance.compilation | 173.78 | 176.46 | 0.98 | -1.54% slower |
Mean result: 1.030804841038834

Jenkins Console Log
Blue Ocean
Commit hash: 934e17704b703a32c4c29f9ab5ae1913c4a58a57



@aleksgorica
Collaborator Author

Actually I reverted my old PR. Changing all the old functions to use the new one causes cmdstan tests to fail.

Let's have the _rank versions like you made here. We just need one last signature for

double split_potential_scale_reduction(
      const Eigen::Matrix<Eigen::VectorXd, Dynamic, 1>& samples)

Then we are good! After this is in we can then change cmdstan etc. to use the new ranked version. I think this is better because we need to change the output anyway to report back both the bulk and tail rhat

Ok, but why should we even keep double split_potential_scale_reduction( const Eigen::Matrix<Eigen::VectorXd, Dynamic, 1>& samples) or add double split_potential_scale_reduction_rank( const Eigen::Matrix<Eigen::VectorXd, Dynamic, 1>& samples), when it is declared private and there are no written tests for the function and no known references? Would we break backward compatibility by removing the function?

@SteveBronder
Collaborator

Oh, you're right, I didn't see that it was a private member that is never called. Let's leave it for now, but I don't think you need to write a rank version for that one.

@SteveBronder
Collaborator

@aleksgorica you should have received an email that adds you to the Stan project :) Clicking on the link in that email from GitHub should then allow you to press the "merge pull request" button.

@aleksgorica
Collaborator Author

Thank you all for accepting me to the Stan project :). I had a really great time working on the pull request. Also, thank you @SteveBronder and @bob-carpenter for mentoring.

But GitHub doesn't allow me to merge the pull request.

@bob-carpenter
Contributor

I just verified that Steve gave you permission to merge. Can you verify that works on your end by merging this?

Thanks, and welcome to the Stan team!

@SteveBronder
Collaborator

Did you click on the link in the email that was sent to you from GitHub? After you click that and refresh the page, you should have permission to merge.

@aleksgorica
Collaborator Author

Yes, I clicked multiple times, and now it always redirects me to https://github.com/stan-dev/stan. But it still does not allow me to merge. If I go to https://github.com/settings/organizations, it says that I am an outside collaborator on 1 repository for Stan.

@SteveBronder
Collaborator

Sorry, you should have just gotten a new email that should give you the right permissions.
