2020-06-06

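Pauli's Theory of Relativity

W. Pauli, Theory of Relativity (相対性理論), translated by Ryoyu Utiyama (内山龍雄), first printing October 28, 1974 (Showa 49), Kodansha

I took out a book that had been sleeping on my bookshelf.

Probably the only parts I can read and understand are the preface and readable material such as the historical background, but I will spend today sampling it.

The book is a monograph edition, published 35 years later, of an article that W. Pauli wrote at age 21 for the Mathematical Encyclopedia.

The aim of that article was reportedly to give a complete review of all the literature on the theory of relativity published up to 1921.

The main text is reproduced as originally written. Developments up to 1955 are covered in supplementary notes at the end of the book, and footnotes were added at the appropriate places in the text to refer to those notes.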

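Below I reproduce the latter half of W. Pauli's preface to the English edition.

　There is a view that the theory of relativity is the end point of "classical physics." Classical physics here means physics in the Newton-Faraday-Maxwell style, governed in space and time by the "deterministic" form of causality. In its place, it is said, a new quantum-mechanical style of natural law has appeared. In my opinion this view is partly correct. But it does not correctly and sufficiently appreciate Einstein's great influence on the general way of thinking of today's physicists. Through an epistemological analysis of the consequences of the finiteness of the velocity of light (and hence of all signal velocities), the special theory of relativity took a step beyond naive visualization. The concept of the state of motion of the "luminiferous aether," as the hypothetical medium was once called, had to be abandoned, not merely because it cannot be observed, but because it had become an obstacle to the mathematical formulation. That is, the aether got in the way of the group-theoretical character underlying the theory of relativity.

　In the general theory of relativity, by extending the transformation group to a more general one, Einstein also eliminated the special concept of inertial coordinate systems, because this concept is incompatible with the group-theoretical character of general relativity. Without the general, critical attitude illustrated by the example above, which abandons naive visualization in favor of a conceptual analysis of the correspondence between the mathematical quantities used in formulating a theory and the observed data, quantum mechanics in its present form could not have been created. In quantum mechanics, which obeys the principle of complementarity, an epistemological analysis based on the finiteness of the quantum of action led to a further departure from naive visualization: namely, from the concept of the classical field in space and time and from the concept of the orbit traced by a particle (electron). These concepts had to be abandoned for the sake of a rational generalization of the theory. Both were rejected not only because the orbit of an electron cannot be observed, but because they disturb the symmetry inherent in the general transformation group underlying the mathematical formulation of quantum mechanics.

　I believe the theory of relativity is the best example showing how a fundamental scientific discovery, following its own course and sometimes even against the opposition of its discoverer, gives birth to new and fruitful developments.

　Zürich, November 18, 1956　　　　　　　W. Pauli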

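Part I. Foundations of the Special Theory of Relativity

§1. Historical background (Lorentz, Poincaré, Einstein)

　The transformation of physical concepts brought about by the theory of relativity was in fact preceded by a long period of preparation. As early as 1887, Voigt pointed out, from the standpoint of the elastic theory of light, that in a moving coordinate system it is mathematically convenient to use a local time t'. In his paper the origin of t' is a linear function of the space coordinates, while the scale of t' is taken to be the same as that of the time t of the rest system. In this way it was shown that the wave equation of light keeps its form when viewed from the moving coordinate system. Voigt's remark, however, was subsequently completely forgotten.

I am not able to write a summary, so I will just pick out passages here and there.

　The negative result of Michelson's interferometer experiment (an experiment on quantities of second order in v/c), however, dealt a fatal blow to the theory. To resolve this problem, Lorentz, and independently FitzGerald, proposed the following hypothesis: every body moving in translation with speed v contracts in length. In the direction of motion the length is changed by the factor $\kappa\sqrt{1-(v/c)^2}$.

　The formal gaps left in Lorentz's work were filled by Poincaré. Poincaré asserted that the principle of relativity holds generally and rigorously. Like the others who have appeared in the discussion so far, he assumed that Maxwell's equations hold rigorously in vacuum. From this assumption follows the requirement that all laws of nature must be invariant under the "Lorentz transformation." That dimensions perpendicular to the direction of motion remain unchanged follows naturally from the requirement that the set of transformations taking the rest system to coordinate systems moving uniformly relative to it must form a group in the mathematical sense. The ordinary displacements of the coordinate system form a subgroup of this group. Poincaré further corrected Lorentz's erroneous transformation formulas for charge density and current. In this way he showed that the field equations of electron theory are completely covariant.

　Finally, it was Einstein who correctly formulated the basis of this new idea and brought the problem to a close. His 1905 paper was written at almost the same time as Poincaré's, and without knowledge of Lorentz's paper published in 1904. Einstein's paper not only contains all the essential parts of what is stated in the papers of Lorentz and Poincaré, but its presentation is far more elegant and comprehensive, and it shows a deeper understanding of the essence of the whole problem. We now explain Einstein's work in detail.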

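§2. The postulate of relativity

　The failure of the many attempts to measure, on the earth itself, the influence of the earth's motion on physical phenomena may fairly be said to prove the following assertion: the phenomena occurring in a given reference system are independent of the translational motion of the reference system as a whole.

§3. The postulate of the constancy of the velocity of light. Ritz's theory

　The postulate of relativity alone is not yet sufficient to derive the invariance of all laws of nature under Lorentz transformations. For example, the equations of classical (Newtonian) mechanics are not invariant under Lorentz transformations, yet they fully satisfy the postulate of relativity taken by itself. As already stated in §1, Lorentz and Poincaré took Maxwell's equations as the starting point of their arguments. A fundamental law such as the invariance of the laws of nature, however, ought to be derived from the simplest possible assumptions. It was Einstein who succeeded in doing this. He showed that the following simple law of electrodynamics needs to be assumed as a principle: the velocity of light is independent of the motion of the light source.

§4. Relativity of simultaneity. Derivation of the Lorentz transformation. Axiomatic nature of the Lorentz transformation

　The two postulates stated in the preceding sections, the postulate of relativity and the postulate of the constancy of the velocity of light, appear at first sight to be incompatible. Suppose, for example, that a light source L moves with speed v relative to an observer A, and that a second observer B is at rest relative to L. To both observers the wave front of the light appears as a sphere, and its center must appear at rest to A and to B respectively. A and B are therefore actually seeing different spheres. This contradiction disappears if we accept the following: seen by A, the light reaches every point of A's sphere at the same time, but seen from B, the light does not appear to reach all points of A's sphere at the same time. This asserts that simultaneity depends on the observer and is a relative concept. We must therefore first explain what it means to synchronize clocks located at different places. Einstein adopted the following definition. Light is emitted from a point P at the time tp shown by the clock at P, arrives at a point Q, is reflected there, and returns to P when the clock at P shows tp'. Let tq be the time shown by the clock at Q when the light is reflected there. If tq = (tp + tp')/2 holds, the clock at Q is said to be synchronized with the clock at P. Einstein used light for synchronizing clocks because the two postulates give us a clear and unambiguous prescription for how light signals propagate. Other means of synchronizing clocks are of course conceivable, for example transporting a clock from one place to another, or transmitting mechanical or elastic signals. Whatever method is used, however, it is an important condition that its result must not contradict the result of the synchronization by light described above.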
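As a small numerical illustration of the synchronization rule tq = (tp + tp')/2, here is a minimal sketch in Python. The function names and the 3000 m distance are my own choices for illustration and do not come from the book.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def reflection_time(t_p: float, t_p_prime: float) -> float:
    """Reading the clock at Q must show at the moment of reflection."""
    return (t_p + t_p_prime) / 2.0


def is_synchronized(t_p: float, t_q: float, t_p_prime: float, tol: float = 1e-12) -> bool:
    """True if Q's clock satisfies t_q = (t_p + t_p') / 2 within tol seconds."""
    return abs(t_q - reflection_time(t_p, t_p_prime)) <= tol


# Illustrative numbers: P and Q are at rest relative to each other, 3000 m apart.
distance = 3000.0
t_p = 0.0
t_p_prime = t_p + 2.0 * distance / C  # light goes out and comes back
t_q = reflection_time(t_p, t_p_prime)
```

With these numbers t_q equals distance / C, the one-way light travel time, which is what one would expect for two clocks at rest relative to each other.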

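§5. Lorentz contraction and time dilation

　The Lorentz contraction is one of the simplest consequences of the transformation formulas (I). It is therefore also a consequence of the two basic postulates.

§6. Addition theorem of velocities. Aberration and the dragging coefficient. Doppler effect

　It is easy to see that the law of addition of velocities of classical kinematics no longer holds in relativistic kinematics. In relativity, adding v (< c) to c must give c again, not c + v.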
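The excerpt above does not reproduce the formula itself, so the sketch below assumes the standard addition theorem for parallel velocities, w = (u + v) / (1 + uv/c^2), and checks numerically that adding any v to c gives c again. The function name is my own.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def add_velocities(u: float, v: float, c: float = C) -> float:
    """Relativistic composition of two parallel velocities (standard formula)."""
    return (u + v) / (1.0 + u * v / c**2)


light_plus_v = add_velocities(C, 0.5 * C)          # stays at c
half_plus_half = add_velocities(0.5 * C, 0.5 * C)  # 0.8 c, not 1.0 c
slow = add_velocities(10.0, 20.0)                  # practically 30 m/s
```

For everyday speeds the correction term uv/c^2 is far too small to notice, which is why the classical rule works so well in practice.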

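Part II. Mathematical Tools

§7. The four-dimensional world (Minkowski)

　Part I showed that the postulate of relativity and the postulate of the constancy of the velocity of light can be combined into a single requirement: "all physical laws must be invariant under the Lorentz group." From now on, the Lorentz group means the totality of all linear transformations that satisfy the identity (II). Any transformation belonging to this group can be built by combining rotations of the three-dimensional spatial coordinate axes with special Lorentz transformations of type (I). Mathematically speaking, the special theory of relativity is nothing other than the theory of invariants of the Lorentz group.

　Minkowski's work played an extremely important and fundamental role for the theory of relativity. He put the theory into a very transparent form by noting the following two facts:

1.

§8. Extension of the Lorentz group

　In order to develop here the mathematical tools that will be needed later for the general theory of relativity, we anticipate a few formal results of general relativity.

§9. Tensor calculus for affine transformations

　Since it is inconvenient to write the same formulas in different forms in the special and general theories, we avoid this by taking the affine group as the basis of the discussion from the outset, without restricting ourselves to orthogonal transformations (that is, the Lorentz group).

§10. Geometrical meaning of the contravariant and covariant components of a vector

§11. "Surface tensors" and "volume tensors." Four-dimensional volume

§12. Dual tensors

§13. Transition to Riemannian geometry

　We now turn to the theory of invariants of the group of all point transformations. For this purpose we must first define length, and we need to state the theorems of general Riemannian geometry. In the older geometry of Bolyai and Lobachevski, Euclid's axiom of parallels was abandoned, but the possibility of freely transporting any geometrical figure unchanged from one place to another was retained as an axiom. As a result, their geometry corresponds to the special case of a space of constant curvature. Starting from projective geometry also fails to lead to spaces with a more general metric. Riemann was the first to consider the possibility of a space with the most general metric. In the special and general theories of relativity the concept of the rigid body has been modified, which means it has now become clear that the axiom of congruence, long taken as self-evident, must be abandoned. It also shows that general Riemannian geometry must become the basis of our treatment of space and time.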

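§14. The concept of parallel displacement of a vector

§15. Geodesics

§16. Curvature of space

　Riemann was the first to introduce the concept of the curvature of space. He extended the Gaussian curvature of a surface to the case of n-dimensional manifolds. Until his Paris prize essay was published, however, his analytical method for this problem was not known. That essay contains his whole treatment of curvature, using both the method of elimination and the variational method. Before this work of Riemann's appeared, however, Christoffel and Lipschitz had already derived the same results.

§17. Riemann's normal coordinates and their applications

§18. Euclidean geometry and spaces of constant curvature

§19. The integral theorems of Gauss and Stokes in four-dimensional Riemannian space

§20. Covariant differentiation using geodesic components

§21. Affine tensors and free vectors

　General relativity deals only with equations whose form is invariant (covariant) under arbitrary transformations of the coordinate system, but sometimes certain quantities that transform like tensors only under linear (affine) coordinate transformations become important. Quantities that behave in this way are called affine tensors. The best-known example of an affine tensor is the geodesic components $\Gamma^{i}_{kl}$.

§22. Conditions for the real world

§23. Infinitesimal coordinate transformations and variational principles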

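Part III. Special Theory of Relativity

a) Kinematics

§24. Four-dimensional representation of the Lorentz transformation

§25. Composition law for velocities

§26. Transformation law for acceleration. Hyperbolic motion

b) Electrodynamics

§27. Invariance of charge. The four-current density

§28. Covariance of the basic equations of electron theory

　As already mentioned in §1, the fact that Maxwell's equations are not invariant under Galilean transformations was one of the major motives that gave birth to the theory of relativity. In his 1904 paper, Lorentz proved that Maxwell's equations are invariant under what we now call the Lorentz transformation. This proof, however, was limited to the case in which no charges or currents are present. It was Poincaré (and, independently, Einstein) who completely proved the invariance of the equations including the case with charges and currents. It was Minkowski who rewrote Maxwell's equations in four-dimensional tensor form; he was the first to emphasize the concept of the "surface tensor."

　To write the equations of the electromagnetic field in a four-dimensionally invariant form, let us first take up the four equations that do not involve the charge density and current density:

§29. Electromagnetic forces. Dynamics of the electron

　Einstein already showed in his first paper that if the law of motion of a point charge moving with infinitesimally small velocity in an electromagnetic field is known, then the theory of relativity makes it possible to give a definite prediction of the behavior of a point charge moving through the field with a velocity of arbitrary magnitude.

The expression (215) for the mass was first given by Lorentz, specifically for the mass of the electron. He derived it from the assumption that the electron itself also undergoes a "Lorentz contraction" as a result of its motion.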

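§30. Momentum and energy of the electromagnetic field. Differential and integral forms of the conservation laws

§31. The invariant variational principle in electrodynamics

§32. Applications

§33. Minkowski's phenomenological electrodynamics of moving bodies

§34. Electron-theoretical foundation of phenomenological electrodynamics

§35. Energy-momentum tensor and electromagnetic force in phenomenological electrodynamics. Joule heat

　According to the principle of relativity, if the (electromagnetic) energy-momentum tensor and the electromagnetic force are known for a body at rest, these quantities for a moving body should follow uniquely. Nevertheless, different authors have proposed different forms for the energy-momentum tensor, and it has not yet been settled which of them is correct. Let us therefore first consider the general conclusions of relativity that hold whatever the form of the energy-momentum tensor, independently of any particular choice.

　The energy density W, the energy current density (intensity) S, the momentum density g, and the components T (indices omitted) of the three-dimensional stress tensor are combined, as in electrodynamics in vacuum, into a single four-dimensional tensor S_ik:

§36. Applications of the theory

c) Mechanics and general dynamics

§37. Equation of motion. Momentum and kinetic energy

This is where E = mc^2 appears. (318b)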

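§38. Relativistic mechanics (derivation not based on electrodynamics)

§39. Hamilton's principle in relativistic mechanics

§40. Generalized coordinates. Canonical form of the equations of motion

§41. The inertia of energy

　The simple relation (318b) between kinetic energy and mass leads to the postulate that every energy E is accompanied by a mass m = E/c^2 (that is, E always has a mass of magnitude E/c^2). If this is accepted, the mass of any body increases when it is heated.

　With the arguments given above, we may consider it proved that the fundamental law that energy of any kind E always has a mass of magnitude E/c^2 follows from the principle of relativity and the conservation of energy and momentum. We regard this law as the most important of the conclusions obtained from the special theory of relativity (Einstein also thought so).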
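To get a feeling for how small the mass increase on heating is, here is a rough worked example in Python. The choice of 1 kg of water heated by 80 K and the specific heat of about 4186 J/(kg K) are my own illustrative assumptions, not numbers from the book.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def mass_equivalent(energy_joules: float) -> float:
    """Mass m = E / c^2 associated with an energy E."""
    return energy_joules / C**2


# Assumed example: heat 1 kg of water by 80 K.
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg K), approximate textbook value
heat_energy = 1.0 * SPECIFIC_HEAT_WATER * 80.0  # joules
delta_m = mass_equivalent(heat_energy)          # kilograms
```

The result is a few times 10^-12 kg, far below anything a kitchen scale could detect, which fits the fact that the effect went unnoticed before relativity.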

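§42. General dynamics

§43. Transformation properties of the energy and momentum of a physical system acted on by external forces

§44. Applications. The experiment of Trouton and Noble

§45. Hydrodynamics and theory of elasticity

d) Thermodynamics and statistical mechanics

§46. Behavior of thermodynamic quantities under Lorentz transformations

　How thermodynamic quantities transform on going from the rest system of matter to a coordinate system moving with constant velocity relative to it was answered by Planck's fundamental investigation of the dynamics of moving systems. He took a variational principle as his starting point. Einstein showed, however, that the transformation laws of these quantities can also be derived directly; in that case, conversely, the validity of the variational principle follows from these results.

§47. The principle of least action

§48. Application of relativistic mechanics to statistical mechanics

§49. Special examples

　α) Black-body radiation in a moving cavity

　This is an example of historical interest, because the problem can be answered on the basis of electrodynamics without using the theory of relativity. Considering the electromagnetic field on the basis of electrodynamics, one necessarily arrives at the conclusion that the energy of the electromagnetic waves in a moving cavity carries momentum and also inertial mass. It is very interesting that this conclusion was given by Hasenöhrl before the theory of relativity was proposed. His reasoning does need some correction on a few points, but it is nonetheless admirable. The complete solution of the problem was given by Mosengeil. By generalizing Mosengeil's results, Planck derived many formulas for the dynamics of moving systems. The questions of how the pressure, momentum, energy and entropy of the electromagnetic waves in a moving cavity depend on temperature, and how the spectral distribution of the waves is related to temperature and direction, can be answered directly by using the theory of relativity to reduce them to the case of a cavity at rest. For a cavity at rest the following relations hold:

　β) Ideal gas

　The behavior of an ideal gas deviates from the results of non-relativistic mechanics, because of relativistic effects (the change of the mass of the gas molecules with velocity), only when the mean velocity of the molecules approaches the velocity of light.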

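Part IV. General Theory of Relativity

§50. Historical review up to Einstein's paper of 1916

　Newton's law of gravitation rests on the idea that the action reaches distant points instantaneously, and is therefore incompatible with the special theory of relativity, according to which no action can propagate faster than the velocity of light. The law of gravitation must also be Lorentz invariant. Poincaré was quick to attempt a modification of Newton's law that satisfies both requirements. Such attempts can be carried out in various ways, but they share the following basic assumption: the gravitational force between two particles depends not on their relative position at the same instant, but on the position of the other particle at a time t = r/c earlier, and not only on position but also on velocity (and perhaps acceleration as well). The deviations from Newton's law, however, are always of second (or higher) order in v/c, so they are very small and do not contradict experience. Minkowski and Sommerfeld wrote Poincaré's idea in a form adapted to four-dimensional vector calculus; a special case was examined in detail by Lorentz.

　The objection to all these attempts is that they take the law of force itself as the starting point of the theory, instead of Poisson's equation, the equation of the gravitational field. Once it has become clear that action always propagates with finite velocity, we are convinced that, if the action is represented by a quantity varying continuously with position in space and with time (called a field), and if we look for the differential equations this field must satisfy, we shall arrive at simple laws of universal validity. Seen in this way, our problem is to rewrite Poisson's equation ($\Delta\Phi = 4\pi\kappa\mu_0$) and the equation of motion of a particle (...) in Lorentz-invariant form.

　The actual history, however, did not proceed by rewriting these two equations but took an unexpected direction. When the physical consequences of the special theory had reached a certain stage, Einstein attempted to extend the theory so that the principle of relativity would apply to coordinate systems in non-uniform motion. He postulated as a principle that all physical laws should keep the same mathematical form in coordinate systems other than Galilean ones as well. That this requirement can actually be satisfied is due to the so-called principle of equivalence. In Newtonian mechanics, the behavior of a physical system in a uniform gravitational field is, as far as mechanical phenomena are concerned, exactly the same as its behavior when viewed, in the absence of gravity, from a coordinate system moving with uniform acceleration relative to a Galilean system. The principle of equivalence asserts that not only mechanical phenomena but all physical phenomena should occur in exactly the same way in both cases. This principle is one of the basic principles of the general theory of relativity. This assertion was later adopted by Einstein as a principle and developed.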

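§51. The principle of equivalence. Relation between gravitation and metric

§52. The postulate of general covariance of the physical laws

§53. Simple consequences of the principle of equivalence

　α) Equations of motion of a point mass with small velocity in a weak gravitational field

　β) Red shift of spectral lines

　γ) Fermat's principle in static gravitational fields

§54. Influence of the gravitational field on material phenomena

§55. Action principles for material systems in the presence of gravitational fields

§56. The field equations of gravitation

　The most important problem that the general theory of relativity has to answer is to establish the law of the G-field itself. It is natural to require that this law, too, be generally covariant. To determine the law uniquely, however, further conditions must be imposed.

§57. Derivation of the gravitational field equations from a variational principle

§58. Comparison with experiment

　α) Relation to Newton's theory

　β) Exact solution for the gravitational field of a point mass

　γ) Perihelion advance of Mercury and the bending of light rays

§59. Exact solutions for static gravitational fields (continued)

§60. Einstein's approximate solutions and their applications

§61. The energy of the gravitational field

§62. Modification of the field equations. Relativity of inertia and the spatially closed universe

　α) Mach's principle

　β) Statistical equilibrium of the stellar system. The λ-term

　γ) The energy of a finite universe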

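Part V. Theory of Charged Elementary Particles

§63. The electron and the special theory of relativity

§64. Mie's theory

§65. Weyl's theory

§66. Einstein's theory

§67. General remarks on the present state of the problem of elementary particles

　Each of the theories discussed so far has its own merits and defects. It is worthwhile to summarize here why all of them ended in failure, and what defects and difficulties they have in common. The common aim of the field theories described here is to explain the atomic nature of charge (that is, the existence of elementary particles carrying an elementary charge) by the fact that the differential equations expressing the physical laws possess only a finite number of solutions of a certain special type, namely solutions representing static, spherically symmetric fields that are regular everywhere. In particular, there must be one such solution each for positive and negative charge. Differential equations satisfying these conditions must have a particularly complicated structure. This complexity of structure alone already seems to indicate that field theory is not the right line of attack on the problem. For, physically, the existence of the elementary charge is itself a very simple and basic fact. Something so simple and fundamental ought to be understood theoretically in a simple and elementary way, and should not be explained by special tricks of mathematical analysis.

　Furthermore, in field theory a special cohesive force that cancels the Coulomb repulsion is needed to keep the interior of a charged elementary particle in equilibrium. If this cohesive force is assumed to be electromagnetic in character, then, as Mie supposed, a physical meaning must be given to the electromagnetic potentials themselves. Such an interpretation, however, causes the serious difficulties described in §64. On the other hand, there is the idea that a charged particle is held stable by its own gravitation. This idea, too, meets very strong empirical objections, because on this interpretation there would have to be a simple numerical relation between the gravitational mass and the charge of the electron: it would be necessary that e^2 ≈ k m^2. In reality, however, e/(m√k) (k = Newton's gravitational constant) is an enormous number of the order of 10^20.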
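The size of the dimensionless ratio e/(m√k) can be checked with a few lines of Python. This is only a rough sketch: I assume Gaussian (CGS) units, in which e^2 and k m^2 have the same dimensions, and I plug in rounded present-day values of the constants rather than anything taken from the book.

```python
import math

# Rounded constants in Gaussian (CGS) units; these values are my assumption.
E_CHARGE_ESU = 4.803e-10   # elementary charge, statcoulomb
M_ELECTRON_G = 9.109e-28   # electron mass, gram
K_GRAVITY_CGS = 6.674e-8   # gravitational constant, cm^3 g^-1 s^-2

# Dimensionless ratio e / (m * sqrt(k)); it would be about 1 if e^2 ≈ k m^2.
ratio = E_CHARGE_ESU / (M_ELECTRON_G * math.sqrt(K_GRAVITY_CGS))
```

With these inputs the ratio comes out at roughly 2 × 10^21, so somewhat larger than the rough order quoted in the text. Either way it is nowhere near 1, which is all the argument needs.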

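　Field theory must also be able to explain the asymmetry between positive and negative charge, namely the fact that the mass of the proton, which carries positive charge, is about 1800 times that of the electron, which carries negative charge. It is easy to see, however, that such an asymmetry contradicts the general covariance of the theory.

　Finally, there is a conceptually doubtful point in field-theoretical considerations. Field theory applies the ordinary notion of electric field strength even to the interior of the electron. But the electric field strength was originally defined as the electric force acting on a test charge, and no charged test particle smaller than the electron or the proton exists. According to its original definition, therefore, the field strength inside the electron cannot be measured. Such a field is thus a fiction without physical meaning.

　Whatever the reader may think of the arguments given above, this much may be said with certainty: before a satisfactory answer to the problem of the structure of elementary particles can be obtained, a new concept entirely foreign to the concept of the continuous field must first be introduced into the foundations of the theory.

This ends my partial transcription of W. Pauli, Theory of Relativity, translated by Ryoyu Utiyama.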

ーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーー

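<Impressions>

　The theory itself is hard to understand, but the human side of the story and the roles played by the many researchers who took part in developing the theory come through clearly, which makes it interesting. I am in no position to judge, but both the author and the translator strike me as absolutely first-rate.

　Most sections still consist of the title only (as of June 8), but I would like to transcribe as many sections as I can.

　One point to keep in mind: this content is based on the literature on relativity published up to 1921. That is 100 years ago.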

ーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーー

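<Chat>

＊One goal of AI research is to develop a robot (program) that learns by itself, like Number Five, which I saw in a movie more than 30 years ago.

・Several things need to be decided for this.

・Purpose, goals, development period, development means, ...

(Let me list some example tasks.)

・Solving the ARC tasks: even if the tasks are at the level of an intelligence test, if they can be cleared with a novel algorithm, that algorithm may be valuable in its own right.

・Rediscovering a theory such as relativity: people want many different things from AI, but if we are to pursue a dream with AI, developing an AI that can make discoveries in the natural sciences would be an enormously large goal.

・To make discoveries, it must be able to do research.

・To do research, it will need knowledge of the field and of related fields, and a grasp of the latest developments.

・The hardest part is probably finding a problem that leads to a new and valuable discovery.

・Creating or finding a method for solving new problems.

・Describing concretely the scientific thinking abilities needed for that.

・Thinking.

・Thinking logically.

・Thinking scientifically.

・Thinking about time, space, matter, interactions, and phenomena.

・Thinking about creation, annihilation, and change.

・Thinking deductively.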

ーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーー

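Addendum, 2022/9/3

Einstein gave a lecture in Kyoto on December 14, 1922, and an English translation of the lecture appeared in Physics Today, August 1982, pp. 45-47.

The lecture was given in German and there is no manuscript by Einstein himself; the article is an English translation of the Japanese record of the lecture published in 1923 by Jun Ishiwara. Ishiwara was a theoretical physicist who studied under Sommerfeld and Einstein from 1912 to 1914, and he apparently served as interpreter for the lecture that day.

Einstein was awarded the Nobel Prize in 1922, but reportedly missed the award ceremony because the schedule of his Kyoto lecture had been fixed first.

English title of the lecture: How I created the theory of relativity

The record of the lecture takes up only 2 pages, including 3 photographs. In that short text, the creative process, from the special theory of relativity, which resolved a question Einstein had as a student, to the construction of the general theory of relativity published in 1915, is told in the first person as a reminiscence.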


2020-06-03

The frontier of simulation-based inference

Kyle Cranmer, Johann Brehmer, and Gilles Louppe

www.pnas.org/cgi/doi/10.1073/pnas.1912789117

Many domains of science have developed complex simulations to describe phenomena of interest.

While these simulations provide high-fidelity models, they are poorly suited for inference and lead to challenging inverse problems.

We review the rapidly developing field of simulation-based inference and identify the forces giving additional momentum to the field.

Finally, we describe how the frontier is expanding so that a broad audience can appreciate the profound influence these developments may have on science.

statistical inference | implicit models | likelihood-free inference |

approximate Bayesian computation | neural density estimation

Mechanistic models can be used to predict how systems will behave in a variety of circumstances.

These run the gamut of distance scales, with notable examples including particle physics, molecular dynamics, protein folding, population genetics, neuroscience, epidemiology, economics, ecology, climate science, astrophysics, and cosmology.

The expressiveness of programming languages facilitates the development of complex, high-fidelity simulations and the power of modern computing provides the ability to generate synthetic data from them.

Unfortunately, these simulators are poorly suited for statistical inference.

The source of the challenge is that the probability density (or likelihood) for a given observation - an essential ingredient for both frequentist and Bayesian inference methods - is typically intractable.

Such models are often referred to as implicit models and contrasted against prescribed models where the likelihood for an observation can be explicitly calculated (1).

The problem setting of statistical inference under intractable likelihoods has been dubbed likelihood-free inference - although it is a bit of a misnomer as typically one attempts to estimate the intractable likelihood, so we feel the term simulation-based inference is more apt.

f:id:AI_ML_DL:20200603225340p:plain

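＊Unfortunately, I don't think I can really digest this paper.

＊Below is part of the main text, which I read in machine translation.

＊ABC stands for Approximate Bayesian Computation.

Workflows for Simulation-Based Inference

This wide range of capabilities can be combined in different inference workflows. As a guide through this array of workflows, we first describe the common building blocks and the different approaches available for each of these components. In Fig. 1 and the following sections we then piece these blocks together into different inference algorithms.

An integral part of every inference method is running the simulator, visualized in Fig. 1 as a yellow pentagon. The parameters at which the simulator is run are drawn from some proposal distribution, which may or may not depend on the prior in a Bayesian setting, and which can be chosen statically or iteratively with an active learning method. The potentially high-dimensional output of the simulator can then be used directly as input to the inference method or reduced to low-dimensional summary statistics.

Inference techniques can be broadly separated into those that, like ABC, use the simulator itself during inference and those that construct a surrogate model and use it for inference. In the first case, the output of the simulator is compared directly with the data (Fig. 1 A–D). In the latter case, the simulator output is used as training data for an estimation or ML stage, shown as the green boxes in Fig. 1 E–H. The resulting surrogate model, shown as a red hexagon, is then used for inference.

The algorithms address the intractability of the true likelihood in different ways. Some methods construct a tractable surrogate for the likelihood function, others for the likelihood ratio function. In other methods the likelihood function never appears explicitly, for instance when it is implicitly replaced by a rejection probability.

The final target of Bayesian inference is the posterior. Methods differ in whether they provide access to samples of parameter points drawn from the posterior, as with MCMC or ABC, or to a tractable function that approximates the posterior. Similarly, some methods require specifying early in the workflow which quantities are to be inferred, while others allow this decision to be postponed.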
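To make the workflow above a little more concrete for myself, here is a minimal sketch of rejection ABC in plain Python: draw parameters from the prior, run the simulator, reduce the output to a summary statistic, and keep the parameters whose summary is close to that of the observed data. The toy Gaussian simulator, the uniform prior, the tolerance, and all function names are my own assumptions; this is not code from the paper.

```python
import random
import statistics


def simulator(theta, n, rng):
    """Toy simulator: n Gaussian draws with unknown mean theta, unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Low-dimensional summary statistic: the sample mean."""
    return statistics.fmean(data)


def rejection_abc(observed, prior_sampler, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the data's."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                       # proposal = prior here
        simulated = simulator(theta, len(observed), rng)
        if abs(summary(simulated) - s_obs) < epsilon:    # implicit likelihood
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulator(2.0, 50, rng)  # pretend this is the real data
posterior_samples = rejection_abc(
    observed, lambda r: r.uniform(-5.0, 5.0), n_draws=5000, epsilon=0.1, rng=rng
)
posterior_mean = statistics.fmean(posterior_samples)
```

The likelihood is never written down; it is replaced by the acceptance test, which is the "rejection probability" idea mentioned above. The price is that most simulator runs are thrown away, which is presumably one motivation for the surrogate-model methods in the paper.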

End


2020-06-02

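Learning from ARC competition code

Learning from the approach of Ilia Larchenko, 3rd place in the Kaggle ARC competition

The aim is to become able to solve the tasks with a Domain Specific Language.

I chose this one from among the solutions that placed in the top 7 of the ARC competition and are published on GitHub.

Ilia took part as a team of two, and the final result was 0.813 (19/104), but what is published on GitHub is Ilia's solo work, and the exact number of tasks it solved alone is not known.

The notebooks published on the Kaggle competition site achieve the strong 19/104 result, but several codes are mixed together in them.

The code on GitHub, developed by Ilia alone, has an easy-to-follow overall structure. It reportedly solves 138/400 of the train data and 96/400 of the evaluation data.

The source code is divided broadly into predictors (about 4500 lines), preprocessing (about 1300 lines), and functions (about 160 lines).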

predictors calls the following from functions:

combine_two_lists(list1, list2):

filter_list_of_dicts(list1, list2):
    """ returns the intersection of two lists of dicts """

find_mosaic_block(image, params):
    """ predicts 1 output image given input image and prediction params """

intersect_two_lists(list1, list2):
    """ intersects two lists of np.arrays """

reconstruct_mosaic_from_block(block, params, original_image=None):

swap_two_colors(image):
    """ swaps two colors """

and the following from preprocessing:

find_color_boundaries(array, color):
    """ looks for the boundaries of any color and returns them """

find_grid(image, frame=False, possible_colors=None):
    """ looks for the grid in image and returns color and size """

get_color(color_dict, colors):
    """ retrieve the absolute number corresponding a color set by color_dict """

get_color_max(image, color):
    """ return the part of image inside the color boundaries """

get_dict_hash(d):

get_grid(image, grid_size, cell, frame=False):
    """ returns the particular cell from the image with grid """

get_mask_from_block_params(image, params, block_cache=None, mask_cache=None, color_scheme=None):

get_predict(image, transform, block_cache=None, color_scheme=None):
    """ applies the list of transforms to the image """

preprocess_sample(sample, param=None, color_param=None, process_whole_ds=False):
    """ make the whole preprocessing for particular sample """

predictors contains the following classes.

1. Predictor

2. Puzzle(Predictor):
    """ Stack different blocks together to get the output """

3. PuzzlePixel(Puzzle):
    """ very similar to puzzle but applicable only to pixel_level blocks """

4. Fill(Predictor):
    """ applies different rules using 3x3 masks """

5. Fill3Colors(Predictor):
    """ same as Fill but iterates over 3 colors """

6. FillWithMask(Predictor):
    """ applies rules based on masks extracted from images """

7. FillPatternFound(Predictor):
    """ applies rules based on masks extracted from images """

8. ConnectDot(Predictor):
    """ connect dot of same color, on one line """

9. ConnectDotAllColors(Predictor):
    """ connect dot of same color, on one line """

10. FillLines(Predictor):
    """ fill the whole horizontal and/or vertical lines of one color """

11. ReconstructMosaic(Predictor):
    """ reconstruct mosaic """

12. ReconstructMosaicRR(Predictor):
    """ reconstruct mosaic using rotations and reflections """

13. ReconstructMosaicExtract(ReconstructMosaic):
    """ returns the reconstructed part of the mosaic """

14. ReconstructMosaicRRExtract(ReconstructMosaicRR):
    """ returns the reconstructed part of the rotate/reflect mosaic """

15. Pattern(Predictor):
    """ applies pattern to every pixel with particular color """

16. PatternFromBlocks(Pattern):
    """ applies pattern extracted from some block to every pixel with particular color """

17. Gravity(Predictor):
    """ move non_background pixels toward something """

18. GravityBlocks(Predictor):
    """ move non_background objects toward something """

19. GravityBlocksToColors(GravityBlocks):
    """ move non_background objects toward color """

20. GravityToColor

21. EliminateColor

22. EliminateDuplicate

23. ReplaceColumn

24. CellToColumn

25. PutBlockIntoHole

26. PutBlockOnPixel

27. EliminateBlock

28. InsideBlock

29. MaskToBlock

30. Colors

31. ExtendTargets

32. ImageSlicer

33. MaskToBlockParallel

34. RotateAndCopyBlock

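Let me pick one easy-to-understand task and look at the details.

First, the color, block, and mask of what is given as input (image, pattern, grid pattern) are represented as JSON-like objects.

JSON syntax looks like this.

{ "name": "Suzuki", "age": 22}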
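As a quick check of how such objects map to Python, here is a small sketch using the standard `json` module. The `color` and `grid` dictionaries are only examples in the style of the representations discussed in this post.

```python
import json

text = '{ "name": "Suzuki", "age": 22}'
record = json.loads(text)            # JSON object -> Python dict

color = {"type": "abs", "k": 0}      # a color representation as a Python dict
encoded = json.dumps(color)          # Python dict -> JSON text

# Python's True becomes JSON's lowercase true when serialized.
grid_block = json.dumps({"type": "grid", "frame": True})
```

This is why I call the representations "JSON-like": written with `True` they are Python dict literals, and they only become strict JSON after `json.dumps`.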

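Each of them is explained as follows.

However, it is hard to understand from this alone.

The only way is to compare against actual patterns and study the code in preprocessing.py.

This work is also an essential part of ARC, so I will examine it carefully.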

2.1.1 Colors

I use a few ways to represent colors; below are some of them:

Absolute values. Each color is described as a number from 0 to 9. Representation: {"type": "abs", "k": 0}
(Note: predefined number-to-color mapping: 0: black, 1: blue, 2: red, 3: green, 4: yellow, 5: grey, 6: magenta, 7: orange, 8: sky, 9: brown)
The numerical order of color in the list of all colors presented in the input image sorted (ascending or descending) by the number of pixels with these colors. Representation: {"type": "min", "k": 0}, {"type": "max", "k": 0}
(Note: the ordering of colors. Is 0 (black) treated as the maximum or the minimum? How is this used?)
The color of the grid if there is one on the input image. Representation: {"type": "grid_color"}
(Note: a single color? Outputs can be single-colored, but were there any single-colored inputs? Or does it mean a single-color pattern on a black background?)
The unique color in one of the image parts (top, bottom, left, or right part; corners, and so on). Representation: {"type": "unique", "side": "right"}, {"type": "unique", "side": "tl"}, {"type": "unique", "side": "any"}
(Note: only the color of some part, top, bottom, left, right, or a corner, differs. Does "tl" mean top + left?)
No background color for cases where every input has only two colors and 0 is one of them for every image. Representation: {"type": "non_zero"}
(Note: when the input grid consists of two colors, 0 (black) is normally treated as the background, but when black is treated like any other color it is identified as "non_zero". Is that what this means?)

Etc.
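To test my reading of the color representations, here is a small sketch that resolves a color dict against an image given as a list of rows. The helper name `resolve_color` is hypothetical and this is not the repository's `get_color`; in particular, the reading of "min"/"max" as "k-th least/most frequent color", the tie-breaking by color number, and the handling of "non_zero" are my assumptions.

```python
from collections import Counter


def resolve_color(color_dict, image):
    """Turn an abstract color description into a number 0-9 (or None)."""
    counts = Counter(pixel for row in image for pixel in row)
    kind = color_dict["type"]
    if kind == "abs":
        return color_dict["k"]
    if kind in ("min", "max"):
        # Sort colors by pixel count; ties broken by color number (assumption).
        ordered = sorted(counts, key=lambda c: (counts[c], c), reverse=(kind == "max"))
        k = color_dict["k"]
        return ordered[k] if k < len(ordered) else None
    if kind == "non_zero":
        # The single color other than 0, if the image has exactly one.
        non_zero = [c for c in counts if c != 0]
        return non_zero[0] if len(non_zero) == 1 else None
    raise ValueError(f"color type not covered by this sketch: {kind}")


image = [
    [0, 0, 0],
    [0, 2, 2],
    [0, 0, 3],
]
most_common = resolve_color({"type": "max", "k": 0}, image)   # black background
least_common = resolve_color({"type": "min", "k": 0}, image)  # the single green pixel
```

Under this reading, {"type": "max", "k": 0} would usually pick out the background and {"type": "min", "k": 0} the rarest color, which would at least explain why both orderings are useful.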

2.1.2 Blocks

Block is a 10-color image somehow derived from the input image.

Each block is represented as a list of dicts; each dict describes some transformation of an image.

One should apply all these transformations to the input image in the order they are presented in the list to get the block.

Below are some examples.

The first order blocks (generated directly from the original image):

The image itself. Representation: [{"type": "original"}]
One of the halves of the original image. Representation: [{"type": "half", "side": "t"}], [{"type": "half", "side": "b"}], [{"type": "half", "side": "long1"}]
(Note: top half and bottom half; the meaning of "long1" is unclear.)
(Note: "t": top, "b": bottom)
The largest connected block excluding the background. Representation: [{"type": "max_block", "full": 0}]
(Note: so this focuses on the largest block other than the background? Does "full": 0 mean that the background is black (0)?)
The smallest possible rectangle that covers all pixels of a particular color. Representation: [{"type": "color_max", "color": color_dict}] color_dict – means here can be any abstract representation of color, described in 2.1.1.
(Note: does this mean the smallest rectangular block of a particular color? The meaning of "color_max" is unclear.)
Grid cell. Representation: [{"type": "grid", "grid_size": [4,4],"cell": [3, 1],"frame": True}]
(Note: does the grid refer to the whole and the cell to a part?)
The pixel with particular coordinates. Representation: [{"type": "pixel", "i": 1, "j": 4}]
(Note: the relation between "particular coordinates" and i, j is unclear.)

Etc.

The second-order blocks – generated by applying some additional transformations to the other blocks:

Rotation. Representation: [source_block ,{"type": "rotation", "k": 2}] source_block means that there can be one or several dictionaries, used to generate some source block from the original input image, then the rotation is applied to this source block
(Note: rotation; is "k" the number of times the unit operation is repeated?)
Transposing. Representation: [source_block ,{"type": "transpose"}]
(Note: "transpose": swaps rows and columns.)
Edge cutting. Representation: [source_block ,{"type": "cut_edge", "l": 0, "r": 1, "t": 1, "b": 0}] In this example, we cut off 1 pixel from the left and one pixel from the top of the image.
(Note: if the numbers are pixel counts, this would cut 1 pixel from the right and 1 from the top. Is the description wrong?)
Resizing image with some scale factor. Representation: [source_block , {"type": "resize", "scale": 2}], [source_block , {"type": "resize", "scale": 1/3}]
(Note: scale by 2, or by 1/3.)
Resizing image to some fixed shape. Representation: [source_block , {"type": "resize_to", "size_x": 3, "size_y": 3}]
(Note: does this mean 3x in the x direction and 3x in the y direction?)
Swapping some colors. Representation: [source_block , {"type": "color_swap", "color_1": color_dict_1, "color_2": color_dict_2}]
(Note: color swap.)

Etc.
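The idea that a block is "a list of dicts applied in order" can be sketched as follows, on plain lists of rows. The helper names `apply_transform` and `get_block` are hypothetical and this is not the code in preprocessing.py. Two assumptions are built in: rotation follows the counterclockwise convention of `np.rot90`, and the `cut_edge` keys name the sides literally ("r": 1 removes one column on the right), which is my guess and may not match the repository.

```python
def rot90_ccw(grid):
    """Rotate a grid (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def apply_transform(grid, transform):
    """Apply one transformation dict to a grid and return the new grid."""
    kind = transform["type"]
    if kind == "original":
        return [row[:] for row in grid]
    if kind == "half":
        middle = len(grid) // 2
        if transform["side"] == "t":
            return [row[:] for row in grid[:middle]]
        if transform["side"] == "b":
            return [row[:] for row in grid[middle:]]
        raise ValueError("only top/bottom halves are sketched here")
    if kind == "rotation":
        for _ in range(transform["k"] % 4):
            grid = rot90_ccw(grid)
        return grid
    if kind == "transpose":
        return [list(row) for row in zip(*grid)]
    if kind == "cut_edge":
        # Assumption: each key is the number of pixels removed on that side.
        rows = grid[transform["t"]: len(grid) - transform["b"]]
        return [row[transform["l"]: len(row) - transform["r"]] for row in rows]
    raise ValueError(f"transform not covered by this sketch: {kind}")


def get_block(image, transforms):
    """Apply the transformations in list order, as the README describes."""
    block = image
    for transform in transforms:
        block = apply_transform(block, transform)
    return block


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 0, 0]]
rotated_top = get_block(image, [{"type": "half", "side": "t"}, {"type": "rotation", "k": 1}])
trimmed = get_block(image, [{"type": "cut_edge", "l": 0, "r": 1, "t": 1, "b": 0}])
```

Seen this way, a second-order block is simply a longer list: the source block's dicts first, then the extra transformation appended at the end.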

There is also one special type of blocks - [{"type": "target", "k": I}]. It is used when, to solve the task, we need a block that is not present in any of the input images but is present in all target images in the train examples. See the example below.
(Note: as in the next figure, this refers to a block structure that is contained only in the output image (target) and not in the input image.)

train1

2.1.3 Masks

Masks are binary images somehow derived from original images. Each mask is represented as a nested dict.

Initial mask literally: block == color. Representation: {"operation": "none", "params": {"block": bloc_list,"color": color_dict}} bloc_list here is a list of transforms used to get the block for the mask generation
Logical operations over different masks Representation: {"operation": "not", "params": mask_dict}, {"operation": "and", "params": {"mask1": mask_dict 1, "mask2": mask_dict 2}}, {"operation": "or", "params": {"mask1": mask_dict 1, "mask2": mask_dict 2}}, {"operation": "xor", "params": {"mask1": mask_dict 1, "mask2": mask_dict 2}}
Mask with the original image's size, representing the smallest possible rectangle covering all pixels of a particular color. Representation: {"operation": "coverage", "params": {"color": color_dict}}
Mask with the original image's size, representing the largest or smallest connected block excluding the background. Representation: {"operation": "max_block"}
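The nested-dict form of masks suggests a small recursive evaluator. The sketch below is my own illustration on plain lists of rows, with a hypothetical helper `evaluate_mask`. To keep it short it assumes the block is always the original image and the color is always an absolute one, and it covers only the "none", "not", "and", "or" and "xor" operations.

```python
import operator

BINARY_OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def evaluate_mask(mask_dict, image):
    """Evaluate a nested mask description into a grid of booleans."""
    op = mask_dict["operation"]
    params = mask_dict.get("params")
    if op == "none":
        # Initial mask: block == color (block assumed to be the original image).
        color = params["color"]["k"]
        return [[pixel == color for pixel in row] for row in image]
    if op == "not":
        inner = evaluate_mask(params, image)
        return [[not value for value in row] for row in inner]
    if op in BINARY_OPS:
        mask1 = evaluate_mask(params["mask1"], image)
        mask2 = evaluate_mask(params["mask2"], image)
        combine = BINARY_OPS[op]
        return [[combine(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(mask1, mask2)]
    raise ValueError(f"operation not covered by this sketch: {op}")


def color_mask(k):
    """Build the dict for the initial mask 'original image == absolute color k'."""
    return {"operation": "none",
            "params": {"block": [{"type": "original"}], "color": {"type": "abs", "k": k}}}


image = [[1, 2], [0, 1]]
blue = evaluate_mask(color_mask(1), image)
blue_or_red = evaluate_mask(
    {"operation": "or", "params": {"mask1": color_mask(1), "mask2": color_mask(2)}}, image
)
```

Because "not" takes a whole mask dict as its params while the binary operations take two, arbitrarily deep combinations can be written without any extra syntax.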

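Example of a mask with the original image's size

After enlarging the original image 4x4, it is masked with the original image!

train1

The following two pairs are also examples of masks.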

f:id:AI_ML_DL:20200605120419p:plain

f:id:AI_ML_DL:20200605120609p:plain

You can find more information about existing abstractions and the code to generate them in preprocessing.py.

2.2 Predictors

I have created 32 different classes to solve different types of abstract task using the abstractions described earlier.

All of them inherit from Predictor class.

The general logic of every predictor is described in the pseudo-code below (it can differ for some classes).

for n, (input_image, output_image) in enumerate(sample['train']):
    list_of_solutions = []
    for possible_solution in all_possible_solutions:
        if apply_solution(input_image, possible_solution) == output_image:
            list_of_solutions.append(possible_solution)
    if n == 0:
        final_list_of_solutions = list_of_solutions
    else:
        final_list_of_solutions = intersection(list_of_solutions, final_list_of_solutions)
    if len(final_list_of_solutions) == 0:
        return None

answers = []
for test_input_image in sample['test']:
    answers.append([])
    for solution in final_list_of_solutions:
        answers[-1].append(apply_solution(test_input_image, solution))
return answers
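The pseudo-code can be turned into a tiny runnable toy. In the sketch below the "possible solutions" are just four fixed grid transformations that I made up for illustration; the real predictors search much richer parameter spaces. The task layout (a dict with "train" and "test" lists of {"input", "output"} pairs) follows the ARC JSON files.

```python
# Toy solution space: name -> function on a grid given as a list of rows.
CANDIDATES = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(row) for row in zip(*g)],
}


def solve(sample):
    """Keep the candidates consistent with every train pair, then apply them."""
    consistent = None
    for pair in sample["train"]:
        matching = [name for name, fn in CANDIDATES.items()
                    if fn(pair["input"]) == pair["output"]]
        if consistent is None:
            consistent = matching
        else:
            consistent = [name for name in consistent if name in matching]
        if not consistent:
            return None  # no candidate explains all train pairs
    return [[CANDIDATES[name](test["input"]) for name in consistent]
            for test in sample["test"]]


sample = {
    "train": [
        {"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]},
        {"input": [[5, 0], [0, 0]], "output": [[0, 5], [0, 0]]},
    ],
    "test": [{"input": [[7, 8, 9]]}],
}
answers = solve(sample)
```

The intersection over train pairs is what makes several examples useful: each additional pair can only shrink the set of surviving solutions.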

The examples of some predictors and the results are below.

・Puzzle - generates the output image by concatenating blocks generated from the input image

It looks very simple, but the program is about 130 lines long.

First, let me transcribe it.

# puzzle like predictors

class Puzzle(Predictor):
    """ Stack different blocks together to get the output """

    def __init__(self, params=None, preprocess_params=None):
        super().__init__(params, preprocess_params)
        self.intersection = params["intersection"]

    def initiate_factors(self, target_image):
        t_n, t_m = target_image.shape
        factors = []
        grid_color_list = []
        if self.intersection < 0:
            # negative intersection: take the factors from a grid found in the target
            grid_color, grid_size, frame = find_grid(target_image)
            if grid_color < 0:
                return factors, []
            factors = [grid_size]
            grid_color_list = self.sample["train"][0]["colors"][grid_color]
            self.frame = frame
        else:
            # otherwise: every (i, j) that divides the target shape evenly
            for i in range(1, t_n + 1):
                for j in range(1, t_m + 1):
                    if (t_n - self.intersection) % i == 0 and (t_m - self.intersection) % j == 0:
                        factors.append([i, j])
        return factors, grid_color_list

＊Here, let's take a look at find_grid( ) in preprocessing.

def find_grid(image, frame=False, possible_colors=None):
    """ Looks for the grid in image and returns color and size """
    grid_color = -1
    size = [1, 1]
    if possible_colors is None:
        possible_colors = list(range(10))
    for color in possible_colors:
        # rows: is every step-th row filled with this color?
        for i in range(size[0] + 1, image.shape[0] // 2 + 1):
            if (image.shape[0] + 1) % i == 0:
                step = (image.shape[0] + 1) // i
                if (image[(step - 1)::step] == color).all():
                    size[0] = i
                    grid_color = color
        # columns: same test along axis 1 (note the column slice)
        for i in range(size[1] + 1, image.shape[1] // 2 + 1):
            if (image.shape[1] + 1) % i == 0:
                step = (image.shape[1] + 1) // i
                if (image[:, (step - 1)::step] == color).all():
                    size[1] = i
                    grid_color = color
    # (my transcription stops here; the rest of the function is not reproduced)

Let's go through preprocessing.py starting from the simple code.

def get_rotation(image, k):
    return 0, np.rot90(image, k)

k is an integer; the rotation angle is 90 * k degrees, counterclockwise.

def get_transpose(image):
    return 0, np.transpose(image)

Swaps rows and columns (transpose).

def get_roll(image, shift, axis):
    return 0, np.roll(image, shift=shift, axis=axis)

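＊Once again I ended up abandoning this partway through.

＊I lost interest in ARC.

＊For the purpose of developing a program that can solve intelligence tests the way a human does, what matters is learning how to solve from the examples.

＊Every ARC task has about three examples. For some tasks the output is uniquely determined only because there are several examples, but many would be more fun with a single example, and it is more enjoyable to find the ones whose output is still uniquely determined that way.

＊If I may go further, why not give every task just one example and allow multiple correct answers?

＊Beyond that, I do want to try writing a program that works out the transformation from a single example, so I will work on that instead.

End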


2020-05-20

Chapter 19 Training and Deploying TensorFlow Models at Scale

Hands-On Machine Learning with Scikit-Learn, Keras & Tensorflow 2nd Edition by A. Geron

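I studied Chapter 2 in parallel with Kaggle's Titanic, so I have forgotten what I learned and how far I got, but the chapter is titled "End-to-End Machine Learning Project," and near the end there is a section called "Launch, Monitor, and Maintain Your System." What left a strong impression was that it explains everything up to putting the developed machine learning model on the market and operating it.

Program development is manufacturing: it only counts once it is on the market.

Unless you have an idea of who will use it, where, and how, both the collected data and the developed program may end up buried without ever being used.

If the goal is just to do the research and publish a paper, this may not matter.

Even so, thinking about the future development of this field, I will study it, including being able to use the constantly changing state-of-the-art development environments.

So, let me start by reviewing the relevant part of Chapter 2.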

Chapter 2: End-to-End Machine Learning Project

Launch, Monitor, and Maintain Your System

Perfect, you got approval to launch!

You now need to get your solution ready for production (e.g., polish the code, write documentation and tests, and so on).

Then you can deploy your model to your production environment.

One way to do this is to save the trained Scikit-Learn model (e.g., using joblib), including the full preprocessing and prediction pipeline, then load this trained model within your production environment and use it to make predictions by calling its predict( ) method.

For example, perhaps the model will be used within a website:

the user will type in some data about a new district and click the Estimate Price button.

This will send a query containing the data to the web server, which will forward it to your web application, and finally your code will simply call the model's predict( ) method (you want to load the model upon server startup, rather than every time the model is used).

Alternatively, you can wrap the model within a dedicated web service that your web application can query through a REST API.

REST API: In a nutshell, a REST (or RESTful) API is an HTTP-based API that follows some conventions, such as using standard HTTP verbs to read, update, or delete resources (GET, POST, PUT, and DELETE) and using JSON for the inputs and outputs.

This makes it easier to upgrade your model to new versions without interrupting the main application.

It also simplifies scaling, since you can start as many web services as needed and load-balance the requests coming from your web application across these web services.

Moreover, it allows your web application to use any language, not just Python.
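
As a rough illustration of wrapping a model in a dedicated web service, here is a sketch built only on Python's standard library. The `/predict` path, the `instances` and `predictions` JSON keys, and the toy `model_predict` function are all illustrative choices of mine, not something the book prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model_predict(instances):
    """Hypothetical model: predicts the sum of each instance's features."""
    return [float(sum(row)) for row in instances]


def predict_response(raw_body):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        instances = json.loads(raw_body)["instances"]
        payload = {"predictions": model_predict(instances)}
        status = 200
    except (ValueError, KeyError, TypeError) as exc:
        payload = {"error": str(exc)}
        status = 400
    return status, json.dumps(payload).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # A single hypothetical endpoint; anything else is a 404.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = predict_response(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call this from a script entry point."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Because the web application only sees HTTP and JSON, the model behind `model_predict` can be swapped for a new version, or the service replicated behind a load balancer, without touching the application code.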

Another popular strategy is to deploy your model on the cloud, for example on Google Cloud AI Platform (formerly known as Google Cloud ML Engine):

just save your model using joblib and upload it to Google Cloud Storage (GCS), then head over to Google Cloud AI Platform and create a new model version, pointing it to the GCS file.

That's it!

This gives you a simple web service that takes care of load balancing and scaling for you.

It takes JSON requests containing the input data (e.g., of a district) and returns JSON responses containing the predictions.

You can then use this web service in your website (or whatever production environment you are using).

As we will see in Chapter 19, deploying TensorFlow models on AI Platform is not much different from deploying Scikit-Learn models.

But deployment is not the end of the story.

You also need to write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.

This could be a steep drop, likely due to a broken component in your infrastructure, but be aware that it could also be a gentle decay that could easily go unnoticed for a long time.

This is quite common because models tend to "rot" over time:

indeed, the world changes, so if the model was trained with last year's data, it may not be adapted to today's data.

Even a model trained to classify pictures of cats and dogs may need to be retrained regularly, not because cats and dogs will mutate overnight, but because cameras keep changing, along with image formats, sharpness, brightness, and size ratios.

Moreover, people may love different breeds next year, or they may decide to dress their pets with tiny hats. Who knows?

So you need to monitor your model's live performance.

But how do you do that?

Well, it depends.

In some cases the model's performance can be inferred from downstream metrics.

For example, if your model is part of a recommender system and it suggests products that the users may be interested in, then it's easy to monitor the number of recommended products sold each day.

If this number drops (compared to nonrecommended products), then the prime suspect is the model.

This may be because the data pipeline is broken, or perhaps the model needs to be retrained on fresh data (as we will discuss shortly).
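
A small sketch of what such monitoring code might look like for a daily metric, covering both the steep drop and the gentle decay mentioned earlier. The thresholds, the 7-day window, and the function name are arbitrary assumptions for illustration.

```python
from statistics import mean


def check_performance(history, baseline, steep_drop=0.10, slow_decay=0.03, window=7):
    """Return a list of alert strings for a series of daily metric values.

    history  -- most recent value last (e.g., a daily accuracy or sales ratio)
    baseline -- the value measured when the model was deployed
    """
    alerts = []
    # A sudden fall between two consecutive days.
    if len(history) >= 2 and history[-2] - history[-1] > steep_drop:
        alerts.append("steep drop")
    # A slow slide that only shows up when averaging over a window.
    if len(history) >= window and baseline - mean(history[-window:]) > slow_decay:
        alerts.append("gentle decay")
    return alerts
```

The second check is the one that catches the decay that "could easily go unnoticed," since no single day looks alarming on its own.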

However, it's not always possible to determine the model's performance without any human analysis.

For example, suppose you trained an image classification model (see Chapter 3) to detect several product defects on a production line.

How can you get an alert if the model's performance drops, before thousands of defective products get shipped to your clients?

One solution is to send to human raters a sample of all the pictures that the model classified (especially pictures that the model wasn't so sure about).

Depending on the task, the raters may need to be experts, or they could be nonspecialists, such as workers on a crowdsourcing platform (e.g., Amazon Mechanical Turk).

In some applications they could even be the users themselves, responding for example via surveys or repurposed captchas.
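
One way to pick which pictures go to the raters, sketched under the assumption that each prediction comes with a confidence score between 0 and 1: take the least confident ones, plus a few random spot checks from the rest. The function name and the sample sizes are hypothetical.

```python
import random


def sample_for_review(predictions, n_random=2, n_uncertain=2, seed=0):
    """Pick items for human raters.

    predictions -- list of (item_id, confidence) pairs, confidence in [0, 1]
    Returns the least confident items followed by a random sample of the rest.
    """
    ranked = sorted(predictions, key=lambda pair: pair[1])
    uncertain = ranked[:n_uncertain]
    rest = ranked[n_uncertain:]
    rng = random.Random(seed)
    spot_checks = rng.sample(rest, min(n_random, len(rest)))
    return [item_id for item_id, _ in uncertain + spot_checks]
```

The random spot checks matter because a model can also be confidently wrong, and those cases never show up among the low-confidence items.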

Either way, you need to put in place a monitoring system (with or without human raters to evaluate the live model), as well as all the relevant processes to define what to do in case of failures and how to prepare for them.

Unfortunately, this can be a lot of work.

In fact, it is often much more work than building and training a model.

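Well, of course: unlike a toy model, a model used in a production factory has to be designed in a fundamentally different way.

Maintaining the initial performance is a given, and overlooking a defect cannot be tolerated.

At the very least, detection of defective products and detection of good products have to run in parallel.

The experience gained from detecting good and defective products should be fed back into the model regularly so that its performance keeps improving.

The system also has to keep up with hardware improvements, and its performance should improve along with them.

It will probably be necessary to run several models redundantly.

In terms of raw performance, an elaborate deep learning model may score highest, but even among DNN models you could run everything from simple to complex ones in parallel, and alongside them keep running classical machine learning models that score lower on predictive power but behave stably, and so on.

It will probably also be necessary to inspect randomly sampled high-resolution images offline, either periodically or exhaustively.

Imaging need not be limited to visible light either: infrared or ultraviolet, spectroscopy of the interference light produced by laser irradiation, high-speed Raman spectroscopy, detecting characteristic X-rays under X-ray or electron-beam irradiation, and so on.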

If the data keeps evolving, you will need to update your datasets and retrain your model regularly.

You should probably automate the whole process as much as possible.

Here are a few things you can automate:

・Collect fresh data regularly and label it (e.g., using human raters).

・Write a script to train the model and fine-tune the hyperparameters automatically.

This script could run automatically, for example every day or every week, depending on your needs.

・Write another script that will evaluate both the new model and the previous model on the updated test set, and deploy the model to production if the performance has not decreased (if it did, make sure you investigate why).
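
The last item in the list above can be sketched as a small gate function. This assumes a model is just a callable from features to a label and that accuracy is the metric; both are simplifications of mine.

```python
def accuracy(model, test_set):
    """Fraction of (features, label) pairs the model gets right."""
    hits = sum(1 for features, label in test_set if model(features) == label)
    return hits / len(test_set)


def choose_model(new_model, previous_model, test_set):
    """Return (model_to_serve, new_score, previous_score).

    The new model is promoted only if its score has not decreased.
    """
    new_score = accuracy(new_model, test_set)
    previous_score = accuracy(previous_model, test_set)
    winner = new_model if new_score >= previous_score else previous_model
    return winner, new_score, previous_score
```

Returning both scores, not just the winner, leaves a record to look at when the new model is rejected and you need to investigate why.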

You should also make sure you evaluate the model's input data quality.

Sometimes performance will degrade slightly because of a poor-quality signal (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale), but it may take a while before your system's performance degrades enough to trigger an alert.

If you monitor your model's inputs, you may catch this earlier.

For example, you could trigger an alert if more and more inputs are missing a feature, or if its mean or standard deviation drifts too far from the training set, or a categorical feature starts containing new categories.
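
A sketch of such input checks for a single feature, using only the standard library. The thresholds are arbitrary, `None` is assumed to mark a missing value, and only the mean is checked for drift here; a standard deviation check would follow the same pattern.

```python
from statistics import mean, stdev


def input_alerts(train_values, live_values, known_categories=None,
                 max_missing=0.05, max_shift=3.0):
    """Compare one feature's live values against its training values.

    None marks a missing value. Numeric drift is measured as the shift of the
    live mean in units of the training standard deviation.
    live_values is assumed to be non-empty.
    """
    alerts = []
    present = [v for v in live_values if v is not None]
    missing_rate = 1 - len(present) / len(live_values)
    if missing_rate > max_missing:
        alerts.append("missing")
    if known_categories is not None:
        # Categorical feature: flag values never seen during training.
        if set(present) - set(known_categories):
            alerts.append("new category")
    elif present:
        shift = abs(mean(present) - mean(train_values)) / stdev(train_values)
        if shift > max_shift:
            alerts.append("mean drift")
    return alerts
```

Checks like these run on the inputs alone, so they can fire before any labels are available to measure the model's actual performance.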

Finally, make sure you keep backups of every model you create and have the process and tools in place to roll back to a previous model quickly, in case the new model starts failing badly for some reason.

Having backups also makes it possible to easily compare new models with previous ones.

Similarly, you should keep backups of every version of your datasets so that you can roll back to a previous dataset if the new one ever gets corrupted (e.g., if the fresh data that gets added to it turns out to be full of outliers).

Having backups of your datasets also allows you to evaluate any model against any previous dataset.
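
To make the rollback idea concrete, here is a toy in-memory registry. It is only a sketch: the class and method names are hypothetical, and real backups would live on disk or in cloud storage, not in a Python list.

```python
class ModelRegistry:
    """Keeps every deployed version so that an earlier one can be restored."""

    def __init__(self):
        self._versions = []   # all models ever deployed, oldest first
        self._live = None     # index of the version currently serving

    def deploy(self, model):
        """Store a new version and make it the live one."""
        self._versions.append(model)
        self._live = len(self._versions) - 1

    def rollback(self):
        """Switch back to the version stored just before the live one."""
        if not self._live:    # None (nothing deployed) or 0 (already the oldest)
            raise RuntimeError("no earlier version to roll back to")
        self._live -= 1

    def live(self):
        """Return the model that is currently serving."""
        if self._live is None:
            raise RuntimeError("nothing deployed yet")
        return self._versions[self._live]
```

Nothing is ever deleted, which is also what makes it possible to compare a new model with any earlier one.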

You may want to create several subsets of the test set in order to evaluate how well your model performs on specific parts of the data.

For example, you may want to have a subset containing only the most recent data, or a test set for specific kinds of inputs (e.g., districts located inland versus districts located near the ocean).

This will give you a deeper understanding of your model's strengths and weaknesses.
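
Evaluating on subsets can be as simple as grouping the test examples by a slice name before computing the metric. In this sketch the slice function, the feature dictionaries, and the use of accuracy are all assumptions for illustration.

```python
def accuracy_by_slice(examples, predict, slice_of):
    """Accuracy per slice of the test set.

    examples -- list of (features, label) pairs
    predict  -- function mapping features to a predicted label
    slice_of -- function mapping features to a slice name
    """
    totals, hits = {}, {}
    for features, label in examples:
        name = slice_of(features)
        totals[name] = totals.get(name, 0) + 1
        hits[name] = hits.get(name, 0) + (predict(features) == label)
    return {name: hits[name] / totals[name] for name in totals}
```

For example, with `slice_of` returning "inland" or "near ocean" for each district, a model that looks fine overall may turn out to be noticeably worse on one of the two.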

As you can see, Machine Learning involves quite a lot of infrastructure, so don't be surprised if your first ML project takes a lot of effort and time to build and deploy to production.

Fortunately, once all the infrastructure is in place, going from idea to production will be much faster.

Chapter 19 Training and Deploying TensorFlow Models at Scale

A great solution to scale up your service, as we will see in this chapter, is to use TF Serving, either on your own hardware infrastructure or via a cloud service such as Google Cloud AI Platform.

It will take care of efficiently serving your model, handle graceful model transitions, and more.

If you use the cloud platform, you will also get many extra features, such as powerful monitoring tools.

In this chapter we will look at how to deploy models, first to

2020-05-20

Chapter 18 Reinforcement Learning

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow 2nd Edition by A. Geron

Reinforcement Learning (RL) is one of the most exciting fields of Machine Learning today, and also one of the oldest.

It has been around since the 1950s, producing many interesting applications over the years, particularly in games (e.g., TD-Gammon, a Backgammon-playing program) and in machine control, but seldom making the headline news.

But a revolution took place in 2013, when researchers from a British startup called DeepMind demonstrated a system that could learn to play just about any Atari game from scratch (https://homl.info/dqn), eventually outperforming humans (https://homl.info/dqn2) in most of them, using only raw pixels as inputs and without any prior knowledge of the rules of the games.

This was the first of a series of amazing feats, culminating in March 2016 with the victory of their system AlphaGo against Lee Sedol, a legendary professional player of the game of Go, and in May 2017 against Ke Jie, the world champion.

No program had ever come close to beating a master of this game, let alone the world champion.

Today the whole field of RL is boiling with new ideas, with a wide range of applications.

DeepMind was bought by Google for over $500 million in 2014.

So how did DeepMind achieve all this?

With hindsight it seems rather simple: they applied the power of Deep Learning to the field of Reinforcement Learning, and it worked beyond their wildest dreams.

In this chapter we will first explain what Reinforcement Learning is and what it's good at, then present two of the most important techniques in Deep Reinforcement Learning: policy gradients and deep Q-networks (DQNs), including a discussion of Markov decision processes (MDPs).

We will use these techniques to train models to balance a pole on a moving cart; then I'll introduce the TF-Agents library, which uses state-of-the-art algorithms that greatly simplify building powerful RL systems, and we will use the library to train an agent to play Breakout, the famous Atari game.

I'll close the chapter by taking a look at some of the latest advances in the field.

Learning to Optimize Rewards

In Reinforcement Learning, a software agent makes observations and takes actions within an environment, and in return it receives rewards.

Its objective is to learn to act in a way that will maximize its expected rewards over time.

If you don't mind a bit of anthropomorphism, you can think of positive rewards as pleasure, and negative rewards as pain (the term "reward" is a bit misleading in this case).

In short, the agent acts in the environment and learns by trial and error to maximize its pleasure and minimize its pain.

This is quite a broad setting, which can apply to a wide variety of tasks.

Here are a few examples (see Figure 18-1):

a. The agent can be the program controlling a robot.

In this case, the environment is the real world, the agent observes the environment through a set of sensors such as cameras and touch sensors, and its actions consist of sending signals to active motors.

It may be programmed to get positive rewards whenever it approaches the target destination, and negative rewards whenever it wastes time or goes in the wrong direction.

b. The agent can be the program controlling Ms. Pac-Man.

In this case, the environment is a simulation of the Atari game, the actions are the nine possible joystick positions (upper left, down, center, and so on), the observations are screenshots, and the rewards are just the game points.

c. Similarly, the agent can be the program playing a board game such as Go.

d. The agent does not have to control a physically (or virtually) moving thing.

For example, it can be a smart thermostat, getting positive rewards whenever it is close to the target temperature and saves energy, and negative rewards when humans need to tweak the temperature, so the agent must learn to anticipate human needs.

e. The agent can observe stock market prices and decide how much to buy or sell every second.

Rewards are obviously the monetary gains and losses.

Note that there may not be any positive rewards at all; for example, the agent may move around in a maze, getting a negative reward at every time step, so it had better find the exit as quickly as possible!
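
To make the observe, act, reward loop concrete, here is a toy version of the maze example: a one-dimensional corridor where every step costs a reward of -1. The corridor, the two example policies, and all names are hypothetical and not taken from the book.

```python
import random


def run_episode(policy, length=5, max_steps=100, seed=0):
    """Walk a 1D corridor from cell 0 to the exit at cell `length`.

    The policy maps the observed position to an action: -1 (left) or +1 (right).
    Every step yields a reward of -1, so shorter episodes score higher.
    """
    rng = random.Random(seed)
    position, total_reward = 0, 0
    for _ in range(max_steps):
        action = policy(position, rng)          # the agent acts on its observation
        position = max(0, position + action)    # a wall blocks moves below 0
        total_reward -= 1                       # negative reward at every time step
        if position == length:
            break
    return total_reward


def always_right(position, rng):
    """Deterministic policy: head straight for the exit."""
    return 1


def random_walk(position, rng):
    """Stochastic policy: ignore the observation and move at random."""
    return rng.choice([-1, 1])
```

With only negative rewards available, the best achievable total here is -5, which `always_right` reaches; `random_walk` can only do as well or worse.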

There are many other examples of tasks to which Reinforcement Learning is well suited, such as self-driving cars, recommender systems, placing ads on a web page, or controlling where an image classification system should focus its attention.

Policy Search

The algorithm a software agent uses to determine its actions is called its policy.

The policy could be a neural network taking observations as inputs and outputting the action to take (see Figure 18-2).

The policy can be any algorithm you can think of, and it does not have to be deterministic.

In fact, in some cases it does not even have to observe the environment!

For example, consider a robotic vacuum cleaner whose reward is the amount of dust it picks up in 30 minutes.

Its policy could be to move forward with some probability p every second, or randomly rotate left or right with probability 1 - p.

The rotation angle would be a random angle between -r and +r.

Since this policy involves some randomness, it is called a stochastic policy.

The robot will have an erratic trajectory, which guarantees that it will eventually get to any place it can reach and pick up all the dust.

The question is, how much dust will it pick up in 30 minutes?

How would you train such a robot?

There are just two policy parameters you can tweak: the probability p and the angle range r.

One possible learning algorithm could be to try out many different values for these parameters, and pick the combination that performs best (see Figure 18-3).

This is an example of policy search, in this case using a brute force approach.
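
Here is a rough sketch of the vacuum cleaner policy and the brute-force search over p and r. The simulation is entirely my own simplification: a square room of 10 x 10 cells, "dust" counted as the number of distinct cells visited, one decision per second (1,800 steps for 30 minutes), and r given in radians.

```python
import math
import random


def dust_collected(p, r, steps=1800, room=10.0, seed=0):
    """Simulate the stochastic policy and count distinct cells cleaned.

    Each second the robot moves forward one unit with probability p,
    otherwise it rotates by a random angle in [-r, +r] (radians).
    """
    rng = random.Random(seed)
    x = y = room / 2
    heading = 0.0
    cleaned = set()
    for _ in range(steps):
        if rng.random() < p:
            # Move forward, clamped to the walls of the square room.
            x = min(max(x + math.cos(heading), 0.0), room - 1e-9)
            y = min(max(y + math.sin(heading), 0.0), room - 1e-9)
            cleaned.add((int(x), int(y)))
        else:
            heading += rng.uniform(-r, r)
    return len(cleaned)


def brute_force_search(p_values, r_values, trials=5):
    """Try every (p, r) pair and return (mean score, p, r) for the best one."""
    best = None
    for p in p_values:
        for r in r_values:
            score = sum(dust_collected(p, r, seed=s) for s in range(trials)) / trials
            if best is None or score > best[0]:
                best = (score, p, r)
    return best
```

Averaging over several seeds matters because the policy is stochastic, so a single run can make a poor (p, r) pair look good by luck.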

When the policy space is too large (which is generally the case), finding a good set of parameters this way is like searching for a needle in a gigantic haystack.

Another way to explore the policy space is to use genetic algorithms.

For example, you could randomly create a first generation of 100 policies and try them out, then "kill" the 80 worst policies and make the 20 survivors produce 4 offspring each.

An offspring is a copy of its parent plus some random variation.

The surviving policies plus their offspring together constitute the second generation.

You can continue to iterate through generations this way until you find a good policy.
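
The generation scheme described above (100 policies, keep the best 20, 4 offspring each) can be sketched as follows. To keep it short, a policy is reduced to a single parameter in [0, 1], and the mutation noise (Gaussian, standard deviation 0.05) is an arbitrary choice of mine.

```python
import random


def evolve(fitness, generations=10, seed=0):
    """Genetic search over a single policy parameter in [0, 1].

    Each generation keeps the 20 best of 100 policies, and every survivor
    produces 4 offspring (a copy of the parent plus Gaussian noise).
    """
    rng = random.Random(seed)
    population = [rng.random() for _ in range(100)]
    for _ in range(generations):
        survivors = sorted(population, key=fitness, reverse=True)[:20]
        offspring = [
            min(1.0, max(0.0, parent + rng.gauss(0.0, 0.05)))
            for parent in survivors
            for _ in range(4)
        ]
        population = survivors + offspring   # 20 + 80 = 100 again
    return max(population, key=fitness)
```

Because the survivors are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.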

Yet another approach is to use optimization techniques, by evaluating the gradients of the rewards with regard to the policy parameters, then tweaking these parameters by following the gradients toward higher rewards.

We will discuss this approach, called policy gradients (PG), in more detail later in this chapter.

Going back to the vacuum cleaner robot, you could slightly increase p and evaluate whether doing so increases the amount of dust picked up by the robot in 30 minutes; if it does, then increase p some more, or else reduce p.
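
The nudging procedure for p can be sketched as a simple hill climb. Note that this is not the PG algorithm itself, only the intuition behind it; it also assumes `evaluate` is deterministic, and it lowers p only when doing so actually improves the reward.

```python
def tune_p(evaluate, p=0.5, step=0.05, iterations=20):
    """Nudge p upward while the reward improves, otherwise try nudging it down."""
    reward = evaluate(p)
    for _ in range(iterations):
        trial = min(1.0, p + step)
        trial_reward = evaluate(trial)
        if trial_reward > reward:
            p, reward = trial, trial_reward      # increasing p helped: keep it
        else:
            trial = max(0.0, p - step)
            trial_reward = evaluate(trial)
            if trial_reward > reward:
                p, reward = trial, trial_reward  # decreasing p helped instead
    return p
```

With a noisy reward such as the dust collected in one run, each evaluation would need to be averaged over several episodes before comparing.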

We will implement a popular PG algorithm using TensorFlow, but before we do, we need to create an environment for the agent to live in, so it's time to introduce OpenAI Gym.

Introduction to OpenAI Gym

Here, we've created a CartPole environment.

This is a 2D simulation in which a cart can be accelerated left or right in order to balance a pole placed on top of it (see Figure 18-4).

You can get the list of all available environments by running gym.envs.registry.all().

After the environment is created, you must initialize it using the reset() method.

This returns the first observation.

Observations depend on the type of environment.

For the CartPole environment, each observation is a 1D NumPy array containing four floats: these floats represent the cart's horizontal position (0.0 = center), its velocity (positive means right), the angle of the pole (0.0 = vertical), and its angular velocity (positive means clockwise).
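
To keep the four floats straight, here is a sketch that names them and applies a very simple hardcoded policy. It does not use Gym: the observation is a made-up list, and I am assuming that a positive angle means the pole leans to the right and that action 0 accelerates left while action 1 accelerates right.

```python
from collections import namedtuple

CartPoleObs = namedtuple(
    "CartPoleObs", ["position", "velocity", "angle", "angular_velocity"]
)


def basic_policy(obs):
    """Push the cart toward the side the pole is leaning to.

    Returns 0 (accelerate left) or 1 (accelerate right).
    """
    return 0 if obs.angle < 0 else 1


# Hypothetical observation: cart slightly left of center, pole leaning right.
obs = CartPoleObs(*[-0.01, 0.02, 0.03, -0.04])
action = basic_policy(obs)   # 1, i.e., accelerate right
```

This policy looks only at the angle and ignores the other three values, which is why it can serve as a baseline but not much more.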

Neural Network Policies

Let's create a neural network policy.

Just like with the policy we hardcoded earlier, this neural network will take an observation as input, and it will output the action to be executed.

More precisely, it will estimate a probability for each action, and then we will select an action randomly, according to the estimated probabilities (see Figure 18-5).

In the case of the CartPole environment, there are just two possible actions (left or right), so we only need one output neuron.

It will output the probability p of action 0 (left), and of course the probability of action 1 (right) will be 1 - p.

For example, if it outputs 0.7, then we will pick action 0 with 70% probability, or action 1 with 30% probability.
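
Sampling the action from the output probability is a one-liner. In this sketch the probability is passed in directly instead of coming from a network, and the function name is my own.

```python
import random


def choose_action(p_left, rng=random):
    """Return 0 (left) with probability p_left, otherwise 1 (right)."""
    return 0 if rng.random() < p_left else 1


rng = random.Random(42)
actions = [choose_action(0.7, rng) for _ in range(10_000)]
left_share = actions.count(0) / len(actions)   # close to 0.7
```

Over many draws the share of action 0 approaches the output probability, while any single decision remains random.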

You may wonder why we are picking a random action based on the probabilities given by the neural network, rather than just picking the action with the highest score.

This approach lets the agent find the right balance between exploring new actions and exploiting the actions that are known to work well.

Here's an analogy: suppose you go to a restaurant for the first time, and all the dishes look equally appealing, so you randomly pick one.

If it turns out to be good, you can increase the probability that you'll order it next time, but you shouldn't increase that probability up to 100%, or else you will never try out the other dishes, some of which may be even better than the one you tried.

Also note that in this particular environment, the past actions and observations can safely be ignored, since each observation contains the environment's full state.

If there were some hidden state, then you might need to consider past actions and observations as well.

For example, if the environment only revealed the position of the cart but not velocity, you would have to consider not only the current observation but also the previous observation in order to estimate the current velocity.

Another example is when the observations are noisy; in that case, you generally want to use the past few observations to estimate the most likely current state.
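
Both cases can be illustrated with a few lines of arithmetic. This is a sketch with hypothetical names: the velocity is recovered from two consecutive position readings taken `dt` seconds apart, and noise is reduced by averaging the last few readings.

```python
def estimate_velocity(previous_position, current_position, dt):
    """Finite-difference estimate of velocity from two position readings."""
    return (current_position - previous_position) / dt


def smooth(readings, window=3):
    """Average the last `window` noisy readings to estimate the current value."""
    recent = readings[-window:]
    return sum(recent) / len(recent)
```

In other words, when a single observation does not reveal the full state, the missing part has to be reconstructed from the history of observations.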

The CartPole problem is thus as simple as can be; the observations are noise-free, and they contain the environment's full state.

* I do not understand what this means at all.

Evaluating Actions: The Credit Assignment Problem

Policy Gradients

Markov Decision Processes

Temporal Difference Learning

Q-Learning

Exploration Policies

Approximate Q-Learning and Deep Q-Learning

Implementing Deep Q-Learning

Deep Q-Learning Variants

Fixed Q-Value Targets

Double DQN

Prioritized Experience Replay

Dueling DQN

The TF-Agents Library

Installing TF-Agents

TF-Agents Environments

Environment Specifications

Environment Wrappers and Atari Preprocessing

Training Architecture

Creating the Deep Q-Network

Creating the DQN Agent

Creating the Replay Buffer and the Corresponding Observer

Creating Training Metrics

Creating the Collect Driver

Creating the Dataset

Creating the Training Loop

Overview of Some Popular RL Algorithms

Exercises

2020-05-20

Chapter 17 Representation Learning and Generative Learning Using Autoencoders and GANs

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow 2nd Edition by A. Geron

Autoencoders are artificial neural networks capable of learning dense representations of the input data, called latent representations or codings, without any supervision (i.e., the training set is unlabeled).

These codings typically have a much lower dimensionality than the input data, making autoencoders useful for dimensionality reduction (see Chapter 8), especially for visualization purposes.

Autoencoders also act as feature detectors, and they can be used for unsupervised pretraining of deep neural networks (as we discussed in Chapter 11).

Lastly, some autoencoders are generative models: they are capable of randomly generating new data that looks very similar to the training data.

For example, you could train an autoencoder on pictures of faces, and it would then be able to generate new faces.

However, the generated images are usually fuzzy and not entirely realistic.

In contrast, faces generated by generative adversarial networks (GANs) are now so convincing that it is hard to believe that the people they represent do not exist.

You can judge so for yourself by visiting https://thispersondoesnotexist.com/, a website that shows faces generated by a recent GAN architecture called StyleGAN (you can also check out https://thisrentaldoesnotexist.com/ to see some generated Airbnb bedrooms).

GANs are now widely used for super resolution (increasing the resolution of an image), colorization (https://github.com/jantic/DeOldify), powerful image editing (e.g., replacing photo bombers with realistic background), turning a simple sketch into a photorealistic image, predicting the next frames in a video, augmenting a dataset (to train other models), generating other types of data (such as text, audio, and time series), identifying the weaknesses in other models and strengthening them, and more.

Autoencoders and GANs are both unsupervised, they both learn dense representations, they can both be used as generative models, and they have many similar applications.

However, they work very differently:

Efficient Data Representations

Performing PCA with an Undercomplete Linear Autoencoder

Stacked Autoencoders

Implementing a Stacked Autoencoder Using Keras

Visualizing the Reconstructions

Visualizing the Fashion MNIST Dataset

Unsupervised Pretraining Using Stacked Autoencoders