Reproducible Build #1341

Draft
wants to merge 3 commits into
base: master
4 changes: 4 additions & 0 deletions bin/misc-functions.sh
@@ -528,6 +528,10 @@ __dyn_package() {
echo -n "${BUILD_ID}" > "${PORTAGE_BUILDDIR}"/build-info/BUILD_ID
fi

if [[ "${BUILD_TIME}" == "ebuild" ]]; then
find ${D} -exec touch -h -r ${EBUILD} {} \;
Member

  • ${D} needs to be quoted.
  • Modifying the timestamps of installed files can be problematic, because the contents of one installed file may record the timestamp of another and require the two to match. Python bytecode is a notable case: if you touch the *.py files, all of the corresponding *.pyc files are invalidated, and the next time those modules are imported as root the interpreter regenerates and rewrites the .pyc files with new values.
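
To make the bytecode point concrete, here is a minimal sketch, assuming CPython 3.7+ and its default timestamp-based .pyc format (PEP 552), in which the header stores the source file's mtime. The helper names `recorded_source_mtime` and `pyc_is_stale_after_touch` are made up for illustration and are not part of Portage.

```python
import os
import py_compile
import struct
import tempfile


def recorded_source_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)  # magic, flags, source mtime, source size
    flags, mtime = struct.unpack("<II", header[4:12])
    if flags != 0:
        raise ValueError("hash-based .pyc: no source mtime is recorded")
    return mtime


def pyc_is_stale_after_touch(new_mtime):
    """Compile a throwaway module, change its mtime, compare with the header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        # Pin the source mtime before compiling so the header value is known.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "mod.pyc"),
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        recorded = recorded_source_mtime(pyc)
        # Roughly what `touch -r` does to the installed source file.
        os.utime(src, (new_mtime, new_mtime))
        current = int(os.stat(src).st_mtime) & 0xFFFFFFFF
        return recorded != current
```

A mismatch between the recorded and current mtime is what makes the importer treat the .pyc as out of date.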

Author

@eli-schwartz thank you for the feedback

I'll quote ${D} accordingly.

As for the timestamp, do you have an alternative suggestion for making it deterministic?

Member

It's not possible if we are to fulfill the "Preservation of file modification times" requirement of PMS:
https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-146001r1

There was an ignore-mtime option dropped from #991 due to the same requirement.

Member

Making file metadata deterministic when PMS explicitly says it shall not be deterministic is a tough topic.

All I can say is that from a pure usability standpoint, you don't really know what software depends on the timestamp. Python bytecode may be only one example.

Python bytecode compilation explicitly respects $SOURCE_DATE_EPOCH: when it is set, a slower and less efficient bytecode invalidation format is used. It is also what the actual reproducible builds specification defines. Any other software that depends on timestamps is likely to respect that variable, if it respects anything at all.
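
As a rough illustration of that behaviour, the sketch below compiles a throwaway module with and without SOURCE_DATE_EPOCH in the environment and reads the flags word from the .pyc header. It assumes CPython 3.7+ (PEP 552), where a non-zero flags word marks a hash-based .pyc. The helper name `pyc_flags` is hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def pyc_flags(source_date_epoch=None):
    """Compile a tiny module and return the flags word of the resulting .pyc."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    if source_date_epoch is not None:
        os.environ["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "mod.py")
            with open(src, "w") as f:
                f.write("x = 1\n")
            # No invalidation_mode given: py_compile picks its default,
            # which depends on whether SOURCE_DATE_EPOCH is set.
            pyc = py_compile.compile(
                src, cfile=os.path.join(tmp, "mod.pyc"), doraise=True
            )
            with open(pyc, "rb") as f:
                return struct.unpack("<I", f.read(8)[4:8])[0]
    finally:
        # Restore the caller's environment.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved
```

With the variable unset the flags word is zero (timestamp-based); with it set, the lowest bit is set, meaning the .pyc is validated by a hash of the source rather than by its mtime.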

Author

Is there a chance we can revise this specification?

Reproducibility has become more and more relevant these days, and it has become relevant to us Gentoo users in particular now that binary packages are offered officially.

As with the binary packages of other distros (e.g. Debian; even Arch is actively spending effort on it), it would be nice to be able to verify the official build somehow, even if that means having to match the USE flags and other configuration.

Author

I've submitted this ticket for a future EAPI version to allow mtime modification.

fi

if [[ "${BINPKG_FORMAT}" == "xpak" ]]; then
local tar_options=""

2 changes: 1 addition & 1 deletion bin/phase-functions.sh
@@ -90,7 +90,7 @@ __filter_readonly_variables() {
local bash_misc_vars="BASH BASH_.* COLUMNS COMP_WORDBREAKS HISTCMD
HISTFILE HOSTNAME HOSTTYPE IFS LINENO MACHTYPE OLDPWD
OPTERR OPTIND OSTYPE POSIXLY_CORRECT PS4 PWD RANDOM
-SECONDS SHLVL _"
+SECONDS SHLVL _ SRANDOM EPOCHREALTIME EPOCHSECONDS"
local filtered_sandbox_vars="SANDBOX_ACTIVE SANDBOX_BASHRC
SANDBOX_DEBUG_LOG SANDBOX_DISABLED SANDBOX_LIB
SANDBOX_LOG SANDBOX_ON"

16 changes: 13 additions & 3 deletions lib/_emerge/EbuildBinpkg.py
@@ -26,9 +26,19 @@ def _start(self):
pkg = self.pkg
root_config = pkg.root_config
bintree = root_config.trees["bintree"]
-pkg_allocated_path, build_id = bintree.getname_build_id(
-pkg.cpv, allocate_new=True
-)
+
+BUILD_ID_TYPE = self.settings.configdict["env"].get("BUILD_ID_TYPE")
+if BUILD_ID_TYPE == "int" or not BUILD_ID_TYPE:
+pkg_allocated_path, build_id = bintree.getname_build_id(
+pkg.cpv, allocate_new=True
+)
+elif BUILD_ID_TYPE == "hash":
+pkg_allocated_path, build_id = bintree._allocate_filename_hash(
+pkg.cpv, os.path.join(self.settings.get("T"), "environment")
+)
+else:
+raise Exception("Invalid BUILD_ID_TYPE")
+
bintree._ensure_dir(os.path.dirname(pkg_allocated_path))

self.pkg_allocated_path = pkg_allocated_path

1 change: 1 addition & 0 deletions lib/portage/cache/metadata.py
@@ -44,6 +44,7 @@ class database(flat_hash.database):
"EAPI",
"PROPERTIES",
"DEFINED_PHASES",
"_mtime_",
)

autocommits = True

23 changes: 23 additions & 0 deletions lib/portage/dbapi/bintree.py
@@ -2404,6 +2404,29 @@ def _allocate_filename_multi(self, cpv, remote_binpkg_format=None):
continue
return (filename, build_id)

def _allocate_filename_hash(self, cpv, hash_src, remote_binpkg_format=None):
import hashlib
try:
binpkg_format = get_binpkg_format(cpv._metadata["PATH"])
except (AttributeError, KeyError):
binpkg_format = self.settings.get(
"BINPKG_FORMAT", SUPPORTED_GENTOO_BINPKG_FORMATS[0]
)
if binpkg_format == "xpak":
binpkg_suffix = "xpak"
elif binpkg_format == "gpkg":
binpkg_suffix = "gpkg.tar"
else:
raise InvalidBinaryPackageFormat(binpkg_format)
pf = catsplit(cpv)[1]

with open(hash_src) as F:
build_id = hashlib.sha1(F.read().encode()).hexdigest()[:8]
filename = (
f"{os.path.join(self.pkgdir, cpv.cp, pf)}-{build_id}.{binpkg_suffix}"
)
return (filename, build_id)

@staticmethod
def _parse_build_id(filename):
build_id = -1
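
The hash-based build ID added in this hunk can be tried in isolation. The following is only a sketch of the same idea (SHA-1 of the saved environment text, truncated to 8 hex digits, appended to the package file name). The function names `hash_build_id` and `binpkg_filename`, and the example paths in the test, are hypothetical and not Portage API.

```python
import hashlib
import os


def hash_build_id(environment_text, length=8):
    """Derive a short, deterministic build ID from the environment text."""
    return hashlib.sha1(environment_text.encode()).hexdigest()[:length]


def binpkg_filename(pkgdir, cp, pf, build_id, suffix="gpkg.tar"):
    """Build <pkgdir>/<category>/<pn>/<pf>-<build_id>.<suffix>."""
    return f"{os.path.join(pkgdir, cp, pf)}-{build_id}.{suffix}"
```

Identical environment text always maps to the same ID, so two builds only share a file name when their saved environments match byte for byte. Anything non-deterministic in that file (timestamps, random values) would change the ID.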

1 change: 1 addition & 0 deletions lib/portage/dbapi/porttree.py
@@ -354,6 +354,7 @@ def __init__(self, _unused_param=DeprecationWarning, mysettings=None):
"SLOT",
"DEFINED_PHASES",
"REQUIRED_USE",
"_mtime_"
}

self._aux_cache = {}

1 change: 1 addition & 0 deletions lib/portage/package/ebuild/config.py
@@ -218,6 +218,7 @@ class config:
"repository",
"RESTRICT",
"LICENSE",
"_mtime_",
)

_module_aliases = {

13 changes: 12 additions & 1 deletion lib/portage/package/ebuild/doebuild.py
@@ -2526,6 +2526,8 @@ def _post_src_install_write_metadata(settings):
"""

eapi_attrs = _get_eapi_attrs(settings.configdict["pkg"]["EAPI"])
ebuild_mtime = settings.configdict["pkg"].get("_mtime_")
build_time = settings.configdict["env"].get("BUILD_TIME")
build_info_dir = os.path.join(settings["PORTAGE_BUILDDIR"], "build-info")
metadata_buffer = {}

@@ -2543,6 +2545,15 @@ def _post_src_install_write_metadata(settings):
if v is not None:
metadata_buffer[k] = v

if build_time == "pkg" or not build_time:
build_time = time.time()
elif build_time == "ebuild":
build_time = ebuild_mtime
else:
try:
build_time = int(build_time)
except ValueError:
raise Exception("Invalid BUILD_TIME")
with open(
_unicode_encode(
os.path.join(build_info_dir, "BUILD_TIME"),
@@ -2553,7 +2564,7 @@ def _post_src_install_write_metadata(settings):
encoding=_encodings["repo.content"],
errors="strict",
) as f:
-f.write(f"{time.time():.0f}\n")
+f.write(f"{build_time:.0f}\n")

use = frozenset(settings["PORTAGE_USE"].split())
for k in _vdb_use_conditional_keys:
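
For reference, the BUILD_TIME selection in the hunk above can be read as a small pure function. This is a sketch under the assumption that the intended semantics are: unset or "pkg" means the current time, "ebuild" means the ebuild's mtime, and anything else must be an integer epoch. `resolve_build_time` and its `now` parameter are illustrative names, not part of the PR.

```python
import time


def resolve_build_time(build_time, ebuild_mtime, now=None):
    """Map a BUILD_TIME setting to the epoch value written to build-info."""
    if not build_time or build_time == "pkg":
        # Default behaviour: stamp the package with the actual build time.
        return time.time() if now is None else now
    if build_time == "ebuild":
        if ebuild_mtime is None:
            raise ValueError("ebuild mtime is not available")
        return float(ebuild_mtime)
    try:
        # Any other value is treated as an explicit epoch in seconds.
        return float(int(build_time))
    except ValueError as exc:
        raise ValueError(f"Invalid BUILD_TIME: {build_time!r}") from exc
```

Keeping the selection separate from the file write makes each branch easy to test, and the `now` argument lets a test pin the clock.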