Persistent Homology — a Survey, Herbert Edelsbrunner and John Harer, 2008, DOI: 10.1090/conm/453/08802
ABSTRACT.
Persistent homology is an algebraic tool for measuring topological features of shapes and functions. It casts the multi-scale organization we frequently observe in nature into a mathematical formalism. Here we give a record of the short history of persistent homology and present its basic concepts. Besides the mathematics we focus on algorithms and mention the various connections to applications, including to biomolecules, biological networks, data analysis, and geometric modeling.
A Topological Loss Function for Deep-Learning based Image Segmentation using Persistent Homology James R. Clough, Nicholas Byrne, Ilkay Oksuz, Veronika A. Zimmer, Julia A. Schnabel, Andrew P. King Abstract
We introduce a method for training neural networks to perform image or volume segmentation in which prior knowledge about the topology of the segmented object can be explicitly provided and then incorporated into the training process. By using the differentiable properties of persistent homology, a concept used in topological data analysis, we can specify the desired topology of segmented objects in terms of their Betti numbers and then drive the proposed segmentations to contain the specified topological features. Importantly, this process does not require any ground-truth labels, just prior knowledge of the topology of the structure being segmented. We demonstrate our approach in four experiments: one on MNIST image denoising and digit recognition, one on left ventricular myocardium segmentation from magnetic resonance imaging data from the UK Biobank, one on the ACDC public challenge dataset and one on placenta segmentation from 3-D ultrasound. We find that embedding explicit prior knowledge in neural network segmentation tasks is most beneficial when the segmentation task is especially challenging and that it can be used in either a semi-supervised or post-processing context to extract a useful training gradient from images without pixelwise labels.
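As a rough illustration of the idea, the following sketch computes the Betti numbers of a predicted probability map with the gudhi library and scores the mismatch against a topological prior. This is a simplified, non-differentiable stand-in: the paper's actual loss uses the persistence values themselves so that gradients can flow back to the pixels; the library choice and all numbers here are assumptions.

import numpy as np
import gudhi  # assumed persistent-homology library; the authors' code may differ

def betti_prior_loss(prob_map, desired_betti=(1, 1), min_persistence=0.1):
    """prob_map: 2-D array of foreground probabilities in [0, 1].
    desired_betti: prior (b0, b1), e.g. (1, 1) for a ring-shaped myocardium."""
    # Sub-level-set filtration of 1 - p, so high-probability pixels appear first.
    cc = gudhi.CubicalComplex(top_dimensional_cells=1.0 - prob_map)
    diagram = cc.persistence()
    loss = 0.0
    for dim, target in enumerate(desired_betti):
        # persistence (death - birth) per feature; essential classes capped at 1.0
        pers = [1.0 if d == float("inf") else d - b
                for dd, (b, d) in diagram if dd == dim]
        n_features = sum(p > min_persistence for p in pers)
        loss += abs(n_features - target)  # surplus or deficit of long-lived features
    return loss

yy, xx = np.mgrid[:32, :32]
r = np.hypot(yy - 16, xx - 16)
ring = np.where((r > 6) & (r < 12), 0.9, 0.0)  # annulus: b0 = 1, b1 = 1
print(betti_prior_loss(ring, desired_betti=(1, 1)))  # -> 0.0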
Explicit topological priors for deep-learning based image segmentation using persistent homology
James R. Clough, Ilkay Oksuz, Nicholas Byrne, Julia A. Schnabel and Andrew P. King School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
1 Introduction Image segmentation, the task of assigning a class label to each pixel in an image, is a key problem in computer vision and medical image analysis. The most successful segmentation algorithms now use deep convolutional neural networks (CNN), with recent progress made in combining fine-grained local features with coarse-grained global features, such as in the popular U-net architecture [17]. Such methods allow information from a large spatial neighbourhood to be used in classifying each pixel. However, the loss function is usually one which considers each pixel individually rather than considering higher-level structures collectively.
In many applications it is important to correctly capture the topological characteristics of the anatomy in a segmentation result. For example, detecting and counting distinct cells in electron microscopy images requires that neighbouring cells are correctly distinguished. Even very small pixelwise errors, such as incorrectly labelling one pixel in a thin boundary between cells, can cause two distinct cells to appear to merge. In this way significant topological errors can be caused by small pixelwise errors that have little effect on the loss function during training but may have large effects on downstream tasks. Another example is the modelling of blood flow in vessels, which requires accurate determination of vessel connectivity. In this case, small pixelwise errors can have a significant impact on the subsequent modelling task. Finally, when imaging subjects who may have congenital heart defects, the presence or absence of small holes in the walls between two chambers is diagnostically important and can be identified from images, but using current techniques it is difficult to incorporate this relevant information into a segmentation algorithm. For downstream tasks it is important that these holes are correctly segmented but they are frequently missed by current segmentation algorithms as they are insufficiently penalised during training. See Figure 1 for examples of topologically correct and incorrect segmentations of cardiac magnetic resonance images (MRI).
Persistent-Homology-based Machine Learning and its Applications – A Survey Chi Seng Pun et al., arXiv:1811.00252v1 [math.AT] 1 Nov 2018
Abstract A suitable feature representation that can both preserve the intrinsic information of the data and reduce its complexity and dimensionality is key to the performance of machine learning models. Deeply rooted in algebraic topology, persistent homology (PH) provides a delicate balance between data simplification and intrinsic structure characterization, and has been applied successfully to various areas. However, the combination of PH and machine learning has been greatly hindered by three challenges, namely topological representation of data, PH-based distance measurements or metrics, and PH-based feature representation. With the development of topological data analysis, progress has been made on all three problems, but it is widely scattered across the literature. In this paper, we provide a systematic review of PH and PH-based supervised and unsupervised models from a computational perspective. Our emphasis is on the recent development of mathematical models and tools, including PH software and PH-based functions, feature representations, kernels, and similarity models. Essentially, this paper can serve as a roadmap for the practical application of PH-based machine learning tools. Further, we consider different topological feature representations in different machine learning models, and investigate their impact on protein secondary structure classification.
This month I will study applications of machine learning and deep learning to fuel cell development.
Fundamentals, materials, and machine learning of polymer electrolyte membrane fuel cell technology Yun Wang et al., Energy and AI 1 (2020) 100014
Machine learning and artificial intelligence (AI) have received increasing attention in material/energy development. This review also discusses their applications and potential in the development of fundamental knowledge and correlations, material selection and improvement, cell design and optimization, system control, power management, and monitoring of operation health for PEM fuel cells, along with main physics in PEM fuel cells for physics-informed machine learning.
4. Machine learning in PEMFC development
4.1. Machine learning overview
According to learning style, machine learning algorithms can be generally classified into three types: supervised learning, unsupervised learning, and reinforcement learning, as shown in Table 9.
Table 10 lists popular supervised learning algorithms and their characteristics.
Deep learning refers to ANNs with deep structures, i.e., multiple hidden layers [229-232].
With the support of big data it can achieve good performance on complex physics, and it has a much simpler mathematical form than many traditional machine learning algorithms.
However, deep learning relies on big data, so traditional machine learning still has strong applications, especially in interdisciplinary studies, and can solve problems with reasonable amounts of data.
Many open-source machine learning frameworks have been developed and made available to the general public, including Scikit-Learn, Caffe2, H2O, PyTorch (for neural networks), TensorFlow (for neural networks), and Keras (for neural networks).
4.2. Machine learning for performance prediction
PEMFC performance is characterized by the polarization curve, also called the I-V curve, which is determined by a number of factors including fuel cell dimensions, material properties, operation conditions, and electrochemical/physical processes [233-236].
Various physical models and experimental methods have been proposed to predict or directly measure the I-V curve, which are reviewed by many other works [158, 160, 202, 237].
As an alternative approach, machine learning is capable of establishing the relationship between inputs and output performance through proper training of existing data, as shown in Fig. 18.
Mehrpooya et al. [233] experimentally constructed a database of PEMFC performance under various inlet humidities, temperatures, and oxygen and hydrogen flow rates.
A two-hidden-layer ANN was then trained using the database to predict the performance under new conditions.
The database contains 460 points in total, 400 for training and 60 for testing; R² values of 0.982 (training) and 0.9723 (testing) were achieved in their study.
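For orientation, the study's setup (four operating inputs, a two-hidden-layer ANN, a 400/60 split, R² scoring) can be mimicked in a few lines of scikit-learn. The data below are synthetic stand-ins, not the authors' database; the input ranges and voltage function are invented for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# columns: inlet humidity, temperature [C], O2 flow, H2 flow (illustrative ranges)
X = rng.uniform([0.3, 40, 0.1, 0.1], [1.0, 80, 2.0, 2.0], size=(460, 4))
V = (0.9 - 0.1 * X[:, 2] + 0.002 * X[:, 1] + 0.05 * X[:, 0]
     + rng.normal(0, 0.01, 460))          # hypothetical cell voltage [V]

X_tr, X_te, y_tr, y_te = train_test_split(X, V, train_size=400, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(16, 16),  # two hidden layers, as in the study
                   max_iter=5000, random_state=0).fit(X_tr, y_tr)
print("train R2:", r2_score(y_tr, ann.predict(X_tr)))
print("test  R2:", r2_score(y_te, ann.predict(X_te)))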
(At this level of content, I think the payoff is small relative to the effort involved.)
Unlike physical models, the mapping between inputs and outputs constructed by machine learning models does not follow an actual physical process; thus, the machine learning approach is also called the black-box model.
Machine learning has unique advantages in PEMFC modeling: it requires no prior knowledge, especially of the complex coupled transport and electrochemical processes occurring in PEMFC operation.
This significantly reduces the level of modeling difficulty and also makes it possible to take into account any processes in which the physical mechanisms are not yet known or formulated.
The machine learning method is also advantageous in terms of computational efficiency in the implementation process after proper training.
This characteristic makes machine learning potentially extremely important in practical PEMFC applications, which usually involve a large multiple-cell system, dynamic variation, and long-term operation.
For a complex physical model that takes multi-physics into account, the computational and time costs are usually too high, while a simplified physical model lacks high prediction accuracy.
Even for a small-scale stack of 5–10 cells, physics-based 3D simulation usually requires 10–100 million grid points and takes days or weeks to predict one case of steady-state operation [158, 160, 241].
In this regard, machine learning could greatly help to broaden the application of complex physical models by combining their prediction accuracy with the computational efficiency of trained models.
Using the simulation data from complex physical models to train a machine learning model is a popular approach, usually referred to as surrogate modeling.
A surrogate model can replace the complex physical model with similar prediction accuracy but higher computational efficiency.
Wang et al. [242] developed a 3D fuel cell model with a CL agglomerate sub-model to construct a database of the PEMFC performance with various CL compositions.
A data-driven surrogate model based on the SVM was then trained using the database, which exhibited prediction capability comparable to the original physical model with several orders of magnitude higher computational efficiency.
It only took a second to predict an I-V curve using the surrogate model versus hundreds of processor-hours using the 3D physics-based model.
Owing to its computational efficiency, the surrogate model, coupled with a genetic algorithm (GA), is suitable for CL composition optimization.
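A minimal sketch of this surrogate-plus-GA loop follows, with SVR standing in for the SVM-based surrogate and a toy objective standing in for the expensive 3D model; the variable ranges, sample counts, and GA settings are all assumptions.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# design variables: ionomer fraction, Pt loading [mg/cm2], porosity (illustrative)
lo, hi = np.array([0.2, 0.05, 0.3]), np.array([0.5, 0.4, 0.6])
X = rng.uniform(lo, hi, size=(200, 3))
# stand-in for expensive 3D-model output: current density at a fixed voltage
y = 1.5 - 8 * (X[:, 0] - 0.35) ** 2 + 2 * X[:, 1] - 3 * (X[:, 2] - 0.45) ** 2
surrogate = SVR(C=100, gamma="scale").fit(X, y)

pop = rng.uniform(lo, hi, size=(40, 3))        # initial GA population
for gen in range(50):                          # generational GA loop
    fitness = surrogate.predict(pop)           # cheap surrogate evaluation
    parents = pop[np.argsort(fitness)[-20:]]   # truncation selection
    kids = (parents[rng.integers(0, 20, 40)] +
            parents[rng.integers(0, 20, 40)]) / 2          # arithmetic crossover
    kids += rng.normal(0, 0.02, kids.shape)                # Gaussian mutation
    pop = np.clip(kids, lo, hi)
best = pop[np.argmax(surrogate.predict(pop))]
print("surrogate-optimal CL composition:", best)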
Similarly, Khajeh-Hosseini-Dalasm et al. [243] combined a CL physical model and ANN to develop a surrogate model to predict the cathode CL performance and activation overpotential.
For fast prediction of the multi-physics state of a PEM fuel cell, Wang et al. [244] developed a data-driven digital twinning framework, as shown in Fig. 20.
A database of temperature, gas reactant, and water content fields in a PEM fuel cell under various operating conditions was constructed using a 3D physical model.
Both ANN and SVM were used to model the multi-physics data with spatial distribution characteristics.
The data-driven digital twinning framework mirrored the distribution characteristics of multi-physics fields, and ANN and SVM exhibited different prediction performances on different physics fields.
There is great potential to improve the current two-phase models (e.g. the two-fluid and mixture approaches) of PEM fuel cells by using AI technology, for example, machine learning analysis of visualization data and VOF/LBM simulation results.
Physics-informed neural networks were recently proposed by Raissi et al. [174], known as hidden fluid mechanics (HFM), which encode the Navier-Stokes (NS) equations into deep learning for analyzing fluid flow images, as shown in Fig. 21.
Such a strategy can be extended to the deep learning of two-phase flow and fuel cell performance by incorporating relevant physics, such as the capillary pressure correlation, Darcy’s law, and the Butler-Volmer equation, into the neural networks.
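To make the idea concrete, here is a minimal PINN sketch in PyTorch on a far simpler stand-in problem than HFM: steady 1-D Darcy flow through a porous layer, where the residual of d/dx(-(k/μ) dp/dx) = 0 is added to the data loss. Extending the pattern to Butler-Volmer kinetics or capillary-pressure correlations would follow the same recipe; all numbers here are illustrative.

import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
k_over_mu = 1.0                                    # lumped permeability/viscosity

x_data = torch.tensor([[0.0], [1.0]])              # boundary "measurements"
p_data = torch.tensor([[2.0], [1.0]])              # inlet/outlet pressures
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    opt.zero_grad()
    loss_data = ((net(x_data) - p_data) ** 2).mean()       # data misfit
    # physics residual at interior collocation points (autograd gives dp/dx)
    x = torch.rand(64, 1, requires_grad=True)
    p = net(x)
    dp = torch.autograd.grad(p.sum(), x, create_graph=True)[0]
    flux = -k_over_mu * dp                                  # Darcy flux
    dflux = torch.autograd.grad(flux.sum(), x, create_graph=True)[0]
    loss_phys = (dflux ** 2).mean()                         # mass conservation
    (loss_data + loss_phys).backward()
    opt.step()
# the linear profile p(x) ~ 2 - x is recovered from two points plus physics
print(net(torch.tensor([[0.5]])).item())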
Machine learning is widely used in the chemistry and materials communities to discover new material properties and develop next-generation materials [245-247].
Experimental measurement, characterization, and theoretical calculation are the main traditional methods to diagnose or predict the properties of a material, and they are usually expensive in terms of cost, time, and computational resources.
Material properties are influenced by many intricate factors, which increases the difficulty level in the search for optimal material synthesis using only traditional methods.
Machine learning can assist in material selection and property prediction using existing databases, which is advantageous for taking unknown physics into account and greatly increases efficiency.
As an example, in catalyst design, adsorbate binding energy prediction via the empirical Sabatier principle is widely used for the optimization of catalytic activity (Fig. 22(a)) [247].
To remove the empirical equation, a database of binding energies for different catalyst structures, constructed by characterization or theoretical calculation, is used to train a machine learning model, which shows great efficiency in predicting catalyst activity over a wide range and identifying the optimal catalyst structure (Fig. 22(b)).
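A schematic version of that workflow, with a random forest standing in for the trained model and a symmetric volcano standing in for the Sabatier relation; the descriptors, the linear ground truth, and the optimum at 0 eV are all invented for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
descriptors = rng.uniform(-1, 1, size=(300, 5))    # e.g. d-band center, strain, ...
E_bind = descriptors @ np.array([0.4, -0.3, 0.2, 0.1, -0.2]) \
         + rng.normal(0, 0.02, 300)                # DFT/experiment stand-in [eV]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(descriptors, E_bind)                     # replaces the empirical equation

def volcano_activity(e):
    # Sabatier optimum near an assumed e* = 0: binding too strong or too weak hurts
    return -np.abs(e - 0.0)

candidates = rng.uniform(-1, 1, size=(10000, 5))   # cheap large-scale screen
activity = volcano_activity(model.predict(candidates))
print("best candidate descriptors:", candidates[np.argmax(activity)])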
Owing to the great potential of machine learning in chemistry and materials science, professional tools have been developed alongside universal machine learning frameworks, and numerous structure and property databases for molecules and solids can easily be accessed for model training.
Popular professional machine learning tools and databases are summarized in Table 12.
4.4. Machine learning for durability
A durable and stable PEM fuel cell that is reliable for the entire life of the system is crucial for its commercialization.
Thus, it is important to predict the state of health (SoH), the remaining useful life (RUL), and the durability of PEM fuel cells using the data generated from monitoring units.
The cell voltage is the most important indicator of fuel cell performance and is thus a popular output parameter in machine learning.
In recent years, machine learning has been employed to predict fuel cell durability and SoH, which can generally be classified as model-based and data-driven approaches.
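As a baseline for the data-driven branch, the simplest possible RUL estimate fits the aging trend of the stack voltage and extrapolates it to an end-of-life threshold. The sketch below assumes a linear degradation model and a 4% voltage-drop criterion; both are illustrative choices, not taken from any of the cited studies.

import numpy as np

rng = np.random.default_rng(3)
t = np.arange(0, 600.0)                             # operating hours observed so far
v = 3.30 - 2e-4 * t + rng.normal(0, 0.003, t.size)  # noisy 5-cell stack voltage [V]

slope, intercept = np.polyfit(t, v, 1)              # linear degradation model
v_eol = 0.96 * intercept                            # end of life: 4% below initial
t_eol = (v_eol - intercept) / slope                 # hour at which threshold is hit
print(f"predicted RUL: {t_eol - t[-1]:.0f} h beyond hour {t[-1]:.0f}")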
Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations Maziar Raissi, Alireza Yazdani, and George Em Karniadakis, Science 367, 1026–1030 (2020)
For centuries, flow visualization has been the art of making fluid motion visible in physical and biological systems. Although such flow patterns can be, in principle, described by the Navier-Stokes equations, extracting the velocity and pressure fields directly from the images is challenging. We addressed this problem by developing hidden fluid mechanics (HFM), a physics-informed deep-learning framework capable of encoding the Navier-Stokes equations into the neural networks while being agnostic to the geometry or the initial and boundary conditions. We demonstrate HFM for several physical and biomedical problems by extracting quantitative information for which direct measurements may not be possible. HFM is robust to low resolution and substantial noise in the observation data, which is important for potential applications.
We developed an alternative approach, which we call hidden fluid mechanics (HFM), that simultaneously exploits the information available in snapshots of flow visualizations and the NS equations, combined in the context of physics-informed deep learning (5) by using automatic differentiation. In mathematics, statistics, and computer science—in particular, in machine learning and inverse problems—regularization is the process of adding information in order to prevent overfitting or to solve an ill-posed problem. The prior knowledge of the NS equations introduces important structure that effectively regularizes the minimization procedure in the training of neural networks. For example, using several snapshots of concentration fields (inspired by the drawings of da Vinci in Fig. 1A), we obtained quantitatively the velocity and pressure fields (Fig. 1, B to D).
In general, chemisorption of gases on metal surfaces requires little activation energy. J. K. Roberts showed that adsorption of hydrogen on carefully cleaned metal wires proceeds rapidly even at about 25 K, forming a strongly adsorbed monolayer of hydrogen atoms (a monatomic layer). The heat of adsorption in this case is close to the heat required to form the covalent bonds of the metal hydride.
For the hydrogen oxidation reaction (HOR) and oxygen reduction reaction (ORR) to proceed efficiently, the materials used in fuel cells must be chosen to ensure high beginning-of-life performance and durability.
For example, to improve activation and reduce transport losses, various issues discussed earlier need to be addressed, including durable electrocatalysts and reduction of their loading [2], reactant/membrane contamination [91, 92], water management [93, 94], and degradation [95, 96].
Material advance and improvement are therefore important for fuel cell R&D, and fundamentals that establish the material properties and fuel cell performance under various operation conditions are highly needed.
3.1. Materials
3.1.1. Membrane
The PEM is located between the anode and cathode CLs.
Its main functions are two-fold:
(i) it acts as a separator between the anode and the cathode reactant gases and electrons, and
(ii) it conducts protons from the anode to cathode CLs.
Therefore, as a separator it must be impermeable to gases (i.e., it should not allow the crossover of hydrogen and oxygen) and must be electrically insulating.
In addition, the membrane material must withstand the harsh operating conditions of PEM fuel cells, and thus possess high chemical and mechanical stability [97].
The CL material is a major factor affecting fuel cell performance and durability.
Conventional CLs are composed of electrocatalyst, carbon support, ionomer, and void space.
Optimization of the CL ink preparation has been a main driver in PEMFC development [21, 102].
This breakthrough highlights the importance of the so-called triple-phase boundaries of ionomer, Pt/C, and void space, which allow all reactants to access the reaction sites.
Conventional CLs are prepared based on the dispersion of a catalyst ink comprising a Pt/C catalyst, ionomer, and solvent.
Ink composition is important for aggregation of the ionomer and agglomeration of carbon particles, and the dispersion medium governs the ink's properties, such as the aggregation dimension of the catalyst/ionomer particles, viscosity, and rate of solidification, and ultimately, the electrochemical and transport properties of the CLs [103-105].
The ionomer not only acts as a binder for the Pt/C particles but also serves as a proton conductor.
Imbalance in the ionomer loading increases the transport or ohmic loss: too little ionomer reduces the proton conductivity, while too much increases the transport resistance of gaseous reactants.
Understanding inks for porous-electrode formation Kelsey B. Hatzell, Marm B. Dixit, Sarah A. Berlinger and Adam Z. Weber J. Mater. Chem. A, 5, 20527 (2017)
Scalable manufacturing of high-aspect-ratio multi-material electrodes is important for advanced energy storage and conversion systems. Such technologies often rely on solution-based processing methods where the active material is dispersed in a colloidal ink. To date, ink formulation has primarily focused on macro-scale process-specific optimization (i.e. viscosity and surface/interfacial tension), and been optimized mainly empirically. Thus, there is a further need to understand nano- and mesoscale interactions and how they can be engineered for controlled macroscale properties and structures related to performance, durability, and material utilization in electrochemical systems.
In summary, there is a growing need for fabricating porous electrodes with unprecedented control of layer composition. Key to this is knowledge of the underlying physics and phenomena going from multicomponent dispersions and inks to casting/processing to 3D structure. While there has been some recent work as highlighted herein, a great deal remains to be accomplished in order to inform predictive and not empirical optimizations. Such investigations have occurred in other fields such as semiconductors and coatings and dispersions in general, but this has not been translated to thin-film properties and functional layers as occur in electrochemical devices. Overall, ink engineering is an exciting opportunity to achieve next-generation composite materials, but requires systematic studies to elucidate design rules and metrics and identify controlling parameters and phenomena.
Fundamentals, materials, and machine learning of polymer electrolyte membrane fuel cell technology, Yun Wang et al., Energy and AI 1 (2020) 100014
In contrast, non-conventional CLs are structured such that one of the major ingredients of their conventional counterparts is eliminated [2, 102].
Nanostructured thin film (NSTF) CLs from 3M are the most successful non-conventional CLs.
They consist of whiskers on which the catalyst is deposited; no ionomer is used for proton conduction.
Over the years, they have proven to provide a higher activity than conventional CLs, as seen in Fig. 5.
In addition, similar to conventional CLs, annealing can be used to change the CL structure and ultimately change its activity.
Fig. 5. Schematic illustration and corresponding HRTEM images of the mesoscale ordering during annealing and formation of the mesostructured thin film, starting from the as-deposited Pt–Ni on whiskers (A), annealed at 300 °C (B) and 400 °C (C). Specific activities of Pt–Ni NSTF as compared to those of polycrystalline Pt and Pt-NSTF at 0.9 V (D) [106]. [106] van der Vliet DF, Wang C, Tripkovic D, et al. Mesostructured thin films as electrocatalysts with tunable composition and surface morphology. Nat Mater 2012;11:1051-8.
August 10 (Tue): picking up the pace.
Carbon is the most commonly used catalyst support material because of its low cost, chemical stability, high surface area, and affinity for metallic nanoparticles.
The surface area of the support varies depending on its graphitization process and is reported to range from 10 to 2000 m²/g [107].
Ketjen Black and Vulcan XC-72 are popular carbons, with surface areas of 890 m²/g and 228 m²/g, respectively [108].
Carbon tends to aggregate, forming carbon particle agglomerates with a bimodal pore size distribution (PSD).
This PSD is usually composed of primary pores, typically 2–20 nm in size, and secondary pores larger than 20 nm.
The primary pores are located between carbon particles in an agglomerate, while the secondary pores are between agglomerates.
Depending on the Pt distribution and utilization within an agglomerate, the primary pores play a key role in determining the electrochemical kinetics, while the secondary pores are important for reactant transport across a CL.
The proportion of primary and secondary pores is largely determined by the surface area of the carbon support [108].
Hence, it has been reported that carbon supports also determine the optimal ionomer content and the Pt distribution in CLs [109, 110].
Additionally, the anode overpotential is usually considered negligible in comparison with its cathode counterpart because of the sluggish ORR at the cathode.
Thus, most work in the literature is focused on cathode CLs.
CL optimization is focused on not only enhanced durability but also reduction of the Pt loading.
For this purpose, it is crucial to determine the optimal combination of the carbon support and catalyst for loading reduction.
An example is highlighted in Fig. 6, where different carbons are heat-treated to induce the catalytic activities of PANI-derived catalysts and to ensure their performance and stability.
Rotating ring-disk electrode (RRDE) measurements were conducted to study the ORR activity of various heat-treated PANI-C catalysts as a function of temperature.
The durability and stability of CL materials are a major subject of R&D and are related to multiple factors, mainly including (i) operating and environmental conditions, (ii) oxidant and fuel impurities, and (iii) contaminants and corrosion in cell components.
For instance, operation at high voltages (above 1.35 V), which may occur during fuel cell startup and shutdown, can lead to Pt dissolution [112].
Operation further above this voltage will cause degradation of the carbon support, known as carbon corrosion.
In addition, any traces of a contaminant in the fuel or oxidant feeds can lead to a decrease in fuel cell performance by poisoning CL materials [113, 114].
Some contaminants cover the Pt catalyst and then reduce the electrochemical surface area (ECSA) available for the reaction.
This catalytic contamination is usually reversible upon removal of the contaminants.
In certain instances, contaminants such as ammonia will cause irreversible degradation given sufficient exposure time and concentration [44].
Further, cell components, such as CLs and BPs, may contain contaminants from their manufacturing processes and/or materials, which eventually leach out and cause poisoning of the MEA.
This may include membrane poisoning by metallic cations [91].
To date, Pt is the electrocatalyst of choice for the ORR in PEM fuel cells because of its high activity.
However, Pt is costly and is mined mainly in a few countries, such as South Africa and Russia.
Furthermore, high Pt loading is required to reach the target lifetime without major efficiency loss.
Using state-of-the-art methods, the Pt catalyst is distributed in a way that does not allow its full utilization in CLs [115, 116].
Alternative catalysts that are either Pt free or Pt alloys are under research.
Two excellent review papers on the topic are provided in Refs. [117, 118].
A summary of some of these catalysts, their current status, and remaining challenges is provided in Fig. 7.
Machine learning and AI are extremely helpful and in high demand for CL development, given that CLs have been extensively studied not only for PEM fuel cells but also for many other systems, such as electrolyzers and sensors with Pt-catalyst electrodes.
The species transport equations, ORR reaction kinetics, two-phase flow, and degradation mechanisms can be encoded into neural networks for effective physics-informed deep learning, to understand the impacts of catalyst materials on fuel cell performance/durability and to optimize the pore size, PSD, PTFE loading, ionomer content, and carbon and electrocatalyst loadings.
In the mass-production phase, machine learning and AI can assist the quality control of CL composition in signal processing and element analysis when integrated with detection techniques such as Laser Induced Breakdown Spectroscopy (LIBS) [119].
F.-K. Wang et al.: Hybrid Method for Remaining Useful Life Prediction of PEMFC Stack
ABSTRACT
Proton exchange membrane fuel cell (PEMFC) is a clean and efficient alternative technology for transport applications. The degradation analysis of the PEMFC stack plays a vital role in electric vehicles. We propose a hybrid method based on a deep neural network model that uses the Monte Carlo dropout approach, called MC-DNN, together with a sparse autoencoder model to analyze the power degradation trend of the PEMFC stack. The sparse autoencoder can map a high-dimensional data space to a low-dimensional latent space and significantly reduce noise in the data. Using two experimental PEMFC stack datasets under static and dynamic operating conditions, the predictive performance of our proposed model is compared with some published models. The results show that the MC-DNN model is better than the other models. Regarding the remaining useful life (RUL) prediction, the proposed model can obtain more accurate results under different training lengths, with relative errors between 0.19% and 1.82%. In addition, the prediction interval of the predicted RUL is derived by using the MC dropout approach.
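The Monte Carlo dropout ingredient of MC-DNN is easy to sketch: keep dropout active at inference and run many stochastic forward passes, whose spread yields the prediction interval. The network size, data, and pass count below are assumptions, not the authors' configuration.

import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(),
                          torch.nn.Dropout(p=0.1),
                          torch.nn.Linear(64, 64), torch.nn.ReLU(),
                          torch.nn.Dropout(p=0.1),
                          torch.nn.Linear(64, 1))
t = torch.linspace(0, 1, 200).unsqueeze(1)          # normalized aging time
v = 3.3 - 0.2 * t + 0.01 * torch.randn_like(t)      # synthetic stack voltage [V]
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    ((net(t) - v) ** 2).mean().backward()
    opt.step()

net.train()                                         # keep dropout ON for MC sampling
with torch.no_grad():
    samples = torch.stack([net(t) for _ in range(100)])
mean, std = samples.mean(0), samples.std(0)         # prediction and interval width
print(mean[-1].item(), "+/-", (2 * std[-1]).item())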
Y. Xie et al.: Novel DBN and ELM Based Performance Degradation Prediction Method for PEMFC
ABSTRACT
Lifetime and reliability seriously affect the applications of proton exchange membrane fuel cells (PEMFC). Performance degradation prediction of PEMFC is the basis for improving the lifetime and reliability of PEMFC. To overcome the lower prediction accuracy caused by the uncertainty and nonlinearity of degradation voltage data, this article proposes a novel deep belief network (DBN) and extreme learning machine (ELM) based performance degradation prediction method for PEMFC. A DBN-based fuel cell degradation feature extraction model is designed to extract high-quality degradation features from the original degradation data by layer-wise learning. To tackle the issues of overfitting and instability in fuel cell performance degradation prediction, an ELM with good generalization performance is introduced as a nonlinear prediction model, which can enhance prediction precision and reliability. Based on the designed DBN-ELM model, the particle swarm optimization (PSO) algorithm is used in the model training process to further optimize the basic network structure of DBN-ELM and improve the prediction accuracy of the hybrid neural network. Finally, the proposed prediction method is experimentally validated using actual data collected from a 5-cell PEMFC stack. The results demonstrate that the proposed approach always has better prediction performance compared with existing conventional methods, whether for various training phases or for multi-step-ahead prediction.
I. INTRODUCTION Proton exchange membrane fuel cells (PEMFC) have been regarded as a potential power generation system for many fields, including electric vehicles, aerospace electronics, and aircraft [1], [2], due to their high conversion efficiency, low operation temperature, and clean reaction products [3], [4].
However, the fuel cell system is affected by multiple factors during operation, which reduces its reliability and shortens its lifetime [5].
Therefore, predicting the performance degradation can effectively indicate the health status of PEMFCs, which could provide a maintenance plan to reduce the failures and downtimes of PEMFCs, thereby extending their lifetime and increasing their reliability [6], [7].
The degradation prediction of PEMFCs can use the historical operating data, such as voltage, power, and impedance, to obtain early indications about fuel cell degradation trend and failure time [8].
The voltage drop is directly associated with failure modes and component aging of fuel cells, and it is also the easiest to obtain.
Thus, the voltage is commonly treated as the critical deterioration indicator reflecting the performance degradation of PEMFC [9], [10].
Current aging voltage prediction approaches can be grouped into two categories: model-based methods and data-based methods [11].
The model-based methods use the specific physical model or semi-empirical degradation model to provide the degradation estimation for the fuel cells.
However, their reliability is limited because the degradation mechanisms inside PEMFCs are still not fully understood [12].
Some other model-based methods use the particle filter [13], the Kalman filter [14], and their variants to estimate the health of PEMFC.
However, due to their limited nonlinear processing capabilities or low computational efficiency, they have difficulty describing the high nonlinearity and complexity of PEMFC aging processes.
From a practical point of view, the data-based methods are more advantageous because they can flexibly represent the degradation features observed in the aging voltage data without any prior knowledge about the fuel cells [15].
Moreover, the data-based methods are easy to deploy, less computationally complex, and more suitable for practical online applications [8].
The existing different data-based methods can be divided into data analytics methods and machine learning methods.
Regression analysis approaches, such as autoregressive integrated moving average methods [15], locally weighted projection regression methods [16], and regime switch vector autoregressive methods [17], are some of the data analytics methods that have been adopted.
A large number of machine learning methods have also made great strides in PEMFC degradation prediction, including support vector machine (SVM) based methods [18], relevance vector machine (RVM) based methods [19], Gaussian process state space based methods [20], back propagation neural network based methods [21], echo state network based methods [22], adaptive neuro-fuzzy inference system (ANFIS) based methods [23], extreme learning machine (ELM) based methods [24], and so on.
However, the above data-based methods build the prediction model without considering the degradation characteristics of the voltage data.
Thus, they may not achieve the best possible performance.
The actual data contain more fluctuations and noises, which limit the effectiveness of the regression analysis approaches.
Besides, some voltage recovery phenomena contained in the voltage degradation process of fuel cells exhibit highly nonlinear characteristics that cannot be fully extracted by the shallow neural networks mentioned in [21]-[24].
The general machine learning methods noted in [18]-[20] not only have weak feature extraction ability but are also affected by many manually determined factors, such as the construction of their kernel functions [25].
Therefore, to improve the unsatisfactory prediction performance, the designed prediction method should be tightly integrated with data characteristics.
Furthermore, considering the weak feature extraction ability of shallow models, it is better to employ the deep learning architecture for PEMFC degradation prediction.
To overcome the above problems, a novel PEMFC performance degradation prediction model based on the deep belief network (DBN) and extreme learning machine (ELM) is proposed for the first time, which considers the statistical characteristics of original degradation data.
Deep Belief Network, as a deep learning method [26], has achieved state-of-the-art results on challenging modelling and regression problems for highly nonlinear statistical data.
DBN can learn high-quality and robust features from the data through multiple layers of nonlinear feature transformation [27], which achieves high precision recognition on handwritten digits [28] and facial expression [29].
It can also accurately describe the complex mapping relationships between inputs and features and has achieved state-of-the-art results on lifetime prediction problems of Multi-bearing [30], lithium batteries [31] and rotating components [32].
Thus, the DBN method, with its good feature extraction and expression abilities, is adopted in this article to learn deep PEMFC degradation features from a large amount of voltage data containing substantial noise and redundancy.
However, the DBN model may encounter the problems of the overfitting and local minima when using the gradient-based learning algorithm to obtain network parameters.
The ELM method with good generalization and universal approximation capability [33] is introduced to solve these limitations.
In the proposed DBN-ELM model, the ELM serves as a supervised regressor on the top layer to obtain the solutions directly without such trivial issues [34].
Furthermore, the ELM regressor can employ the deep feature provided by DBN to obtain a relatively stable prediction performance, which can avoid the ill-posed problems [35] in common ELM caused by data statistical characteristics [36] and the initialization mode [37].
In short, the proposed DBN-ELM method employs the DBN to extract high-quality degradation features and generate a relatively stable feature space which is, in turn, fed into an ELM to perform PEMFC degradation voltage prediction.
The proposed novel prediction model combines the excellent feature learning ability of DBN with the generalization performance of ELM, aiming to enhance PEMFC degradation prediction performance.
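The ELM half of this pairing is compact enough to sketch directly: hidden weights are drawn at random and frozen, and only the output weights are solved in closed form by least squares. The "DBN features" below are random stand-ins for illustration.

import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(500, 30))                 # stand-in for DBN-extracted features
y = F @ rng.normal(size=30) + rng.normal(0, 0.05, 500)  # degradation voltage target

n_hidden = 100
W = rng.normal(size=(30, n_hidden))            # random input weights (never trained)
b = rng.normal(size=n_hidden)                  # random biases
H = np.tanh(F @ W + b)                         # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # closed-form output weights
print("train RMSE:", np.sqrt(np.mean((H @ beta - y) ** 2)))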
Furthermore, to further improve the prediction accuracy, the particle swarm optimization (PSO) algorithm is adopted as the optimization tool in the design of the DBN-ELM model.
The PSO algorithm, with the advantages of fast search speed, simple structure, and good memory ability [23], is widely used to optimize the structure [38]-[40] and parameters [23], [41], [42] of neural networks (NN).
Thus, this article uses the PSO algorithm with time-varying inertia weight [43] to adjust the structural parameters of the DBN-ELM and improve prediction accuracy.
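For reference, a bare-bones PSO with a time-varying inertia weight (here decreasing linearly from 0.9 to 0.4, a common choice; the exact scheme of [43] may differ). The objective is a hypothetical validation error over a real-valued encoding of the network structure.

import numpy as np

rng = np.random.default_rng(5)
def objective(x):                              # hypothetical validation error
    return np.sum((x - np.array([50, 100, 0.1])) ** 2, axis=1)

n, dim, iters = 20, 3, 100
pos = rng.uniform(0, 200, size=(n, dim))
vel = np.zeros((n, dim))
pbest, pbest_val = pos.copy(), objective(pos)
gbest = pbest[np.argmin(pbest_val)]
for k in range(iters):
    w = 0.9 - (0.9 - 0.4) * k / iters          # time-varying inertia weight
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    better = val < pbest_val                   # update personal bests
    pbest[better], pbest_val[better] = pos[better], val[better]
    gbest = pbest[np.argmin(pbest_val)]        # update global best
print("best structure parameters:", gbest)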
Finally, the proposed DBN-ELM method is verified by different case studies on a 1 kW PEMFC experimental platform.
The novelty and contributions of this article can be summarized as follows:
• The degradation characteristics of the experimental voltage data are first analyzed, which guides the tailored design of the high-performance prediction model.
• The DBN method is applied for the first time to PEMFC performance degradation prediction for high-level degradation feature extraction and learning.
• The novel DBN-ELM method can accurately infer future voltage degradation changes of the PEMFC stack.
• The PSO algorithm is introduced into the design of the proposed DBN-ELM prediction model to further improve the performance of PEMFC degradation prediction.
• Experimental results demonstrate the accuracy and generalization performance of the proposed method in PEMFC degradation prediction.
This paper, too, uses data from the IEEE PHM 2014 Data Challenge; it is no different from chasing scores in a Kaggle competition.
I was curious about TEM observation of catalyst layers, so I searched the literature. Testing fuel cell catalysts under more realistic reaction conditions: accelerated stress tests in a gas diffusion electrode setup, Shima Alinejad et al., J. Phys.: Energy 2 (2020) 024003 Abstract
Gas diffusion electrode (GDE) setups have very recently received increasing attention as a fast and straightforward tool for testing the oxygen reduction reaction (ORR) activity of high-surface-area proton exchange membrane fuel cell (PEMFC) catalysts under more realistic reaction conditions. In the work presented here, we demonstrate that our recently introduced GDE setup is suitable for benchmarking the stability of PEMFC catalysts as well. Based on the obtained results, it is argued that the GDE setup offers inherent advantages for accelerated degradation tests (ADT) over classical three-electrode setups using liquid electrolytes. Instead of the solid–liquid electrolyte interface in classical electrochemical cells, in the GDE setup a realistic three-phase boundary of (humidified) reactant gas, proton exchange polymer (e.g. Nafion) and the electrocatalyst is formed. Therefore, the GDE setup not only allows accurate potential control but also independent control over the reactant atmosphere, humidity and temperature. In addition, the identical location transmission electron microscopy (IL-TEM) technique can easily be adopted into the setup, enabling a combination of benchmarking with mechanistic studies.
2.2. Gas diffusion electrode cell setup. An in-house developed GDE cell setup, initially designed for measurements in hot phosphoric acid [24], was employed in all electrochemical measurements. The design used in the present study has been described before [31]. In short, it was optimized for low-temperature PEMFC conditions (<100 °C) by placing a Nafion membrane between the catalyst layer and the liquid electrolyte; no liquid electrolyte is in direct contact with the catalyst [31]. A photograph of the parts of the improved GDE setup is shown in figure 1.
An advantage of half-cells with a liquid electrolyte, compared to MEA tests, is the possibility of performing IL-TEM measurements to analyze the degradation mechanism leading to the loss in active surface area.
Here, we demonstrate that the same is feasible in the GDE setup, and even elevated temperatures can be used; see figure 5.
By placing the TEM grid between the membrane electrolyte and GDL, the IL-TEM method can be applied straightforwardly.
For the demonstration, a catalyst with lower Pt loading (20 wt%) was used to make it easier to follow changes in individual particles.
The typical degradation phenomena, such as migration and coalescence (yellow circles) and particle detachment (red circle), can clearly be seen to occur as a consequence of the load-cycle treatment.
Chemical States of Water Molecules Distributed Inside a Proton Exchange Membrane of a Running Fuel Cell Studied by Operando Coherent Anti-Stokes Raman Scattering Spectroscopy, Hiromichi Nishiyama, Shogo Takamuku, Katsuhiko Oshikawa, Sebastian Lacher, Akihiro Iiyama and Junji Inukai, J. Phys. Chem. C 2020, 124, 9703−9711
ABSTRACT:
The water distribution inside the membrane has a direct influence on the performance and stability of proton exchange membrane fuel cells (PEMFCs).
In this study, coherent anti-Stokes Raman scattering (CARS) spectroscopy was applied to investigate the different chemical states of water (protonated, hydrogen-bonded (H-bonded) and non-H-bonded water) inside the membrane with high spatial (10 μm φ (area) × 1 μm (depth)) and time (1.0 s) resolutions.
The number of water molecules in different states per sulfonic acid group in a Nafion membrane was calculated using the intensity ratio of deconvoluted O−H and C−F stretching bands in CARS spectra as a function of current density and at different locations.
The number of protonated water species was unchanged regardless of the relative humidity (RH) and current density, whereas H-bonded water molecules increased with RH and current density.
This monitoring system is expected to be used for analyzing the transient states during the PEMFC operation.
Signatures of the hydrogen bonding in the infrared bands of water, J.-B. Brubach et al., The Journal of Chemical Physics 122, 184509 (2005)
Following the above considerations on the OH bond oscillator strength as a function of the number of established H bonds, the three-Gaussian components were assigned to three dominating populations of water molecules.
The lowest frequency Gaussian (ω=3295 cm−1) is assigned to molecules having H-bond coordination number close to four, as this component sits close to the OH band observed in ice.
The corresponding population is labeled “network water.”
Conversely, the highest frequency Gaussian (ω = 3590 cm−1) is ascribed to water molecules being poorly connected to their environment, since the frequency position of this component lies close to that of multimer molecules (for instance, ω(dimer) = 3640 cm−1).
This population is called “multimer water.”
In between the two extreme Gaussians lies a third component (ω = 3460 cm−1), which we associate with water molecules having an average degree of connection larger than that of dimers or trimers but lower than those participating in the percolating network.
This type of molecules is referred to as “intermediate water.”
Obviously, this picture describes a situation averaged over time and any one molecule is expected to belong to the three types of population over several picoseconds.
The fact that the intermediate-water Gaussian sits very close to the quasi-isosbestic point frequency means, in our view, that the quasi-isosbestic point separates water molecules with respect to their involvement or non-involvement in the long-range connective structures built up by almost fully bonded water molecules.
third component (ω = 3460 cm−1): intermediate water
Peak 4 (3483 cm−1): H-bonded to H2O
highest-frequency Gaussian (ω = 3590 cm−1): poorly connected to the environment
Peak 5 (3559 cm−1): non-H-bonded water
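The three-Gaussian deconvolution described above is straightforward to reproduce; the sketch below fits components seeded near 3295, 3460, and 3590 cm−1 with scipy, on a synthetic spectrum (the real CARS/IR data are of course not reproduced here).

import numpy as np
from scipy.optimize import curve_fit

def three_gaussians(w, a1, a2, a3, w1, w2, w3, s1, s2, s3):
    g = lambda a, c, s: a * np.exp(-((w - c) ** 2) / (2 * s ** 2))
    return g(a1, w1, s1) + g(a2, w2, s2) + g(a3, w3, s3)

wn = np.linspace(3000, 3800, 400)              # wavenumber axis [cm-1]
truth = three_gaussians(wn, 1.0, 0.7, 0.3, 3295, 3460, 3590, 90, 80, 60)
spec = truth + np.random.default_rng(6).normal(0, 0.01, wn.size)

p0 = [1, 1, 1, 3295, 3460, 3590, 80, 80, 80]   # centers seeded from the paper
popt, _ = curve_fit(three_gaussians, wn, spec, p0=p0)
areas = popt[0:3] * popt[6:9] * np.sqrt(2 * np.pi)  # Gaussian areas ~ populations
print("network / intermediate / multimer areas:", areas)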
I would like to read the following paper, but it is paywalled, so it will have to wait for another occasion!
Mechanism of Ionization, Hydration, and Intermolecular H-Bonding in Proton Conducting Nanostructured Ionomers, Simona Dalla Bernardina, Jean-Blaise Brubach, Quentin Berrod, Armel Guillermo, Patrick Judeinstein, Pascale Roy and Sandrine Lyonnard
Abstract
Water–ion interactions and spatial confinement largely determine the properties of hydrogen-bonded nanomaterials. Hydrated acidic polymers possess outstanding proton-conducting properties due to the interconnected H-bond network that forms inside hydrophilic channels upon water loading.
We report here the first far-infrared (FIR) coupled to mid-infrared (MIR) kinetics study of the hydration mechanism in benchmark perfluorinated sulfonic acid (PFSA) membranes, e.g., Nafion.
The hydration process was followed in situ, starting from a well-prepared dry state, with unprecedented continuous control of the relative humidity.
A step-by-step mechanism involving two hydration thresholds, at respectively λ = 1 and λ = 3 water molecules per ionic group, is assessed.
The molecular environment of water molecules, protonic species, and polar groups are thoroughly described along the various states of the polymer membrane, i.e., dry (λ ≈ 0), fully ionized (λ = 1), interacting (λ = 1–3), and H-bonded (λ > 3).
This unique extended set of IR data provides a comprehensive picture of the complex chemical transformations upon loading water into proton-conducting membranes, giving insights into the state of confined water in charged nanochannels and its role in driving key functional properties such as ionic conduction.
Next, let's look at papers on evaluating platinum catalysts!
New approach for rapidly determining Pt accessibility of Pt/C fuel cell catalysts Ye Peng et al., J. Mater. Chem. A, 9, 13471 (2021)
A rapid method for evaluating accessibility of Pt within Pt/C catalysts for proton exchange membrane fuel cells (PEMFCs) is provided. This method relies on 3-electrode techniques which are available to most materials scientists, and will accelerate development of next generation PEMFC catalysts with optimal distribution of Pt within the carbon support.
Proton exchange membrane fuel cells (PEMFCs) are rapidly gaining entry into many commercial markets ranging from stationary power to heavy duty/light duty transportation.
However, as the technology continues to advance, operating current densities are pushed ever higher while platinum group metal (PGM) loadings are pushed ever lower.
To reduce cost and improve performance, it is necessary to reduce the catalyst loading and increase the current density.
As this occurs, new challenges are being discovered which require materials-level advances to overcome.
In particular, as PGM loadings are reduced to a level ≤ 0.125 mg cm−2, significant performance losses have been widely reported.
These losses are most clearly observed at current densities of >1.5 A cm−2, and have been correlated very strongly with a decrease in 'roughness factor' ('r.f.', a measure of cm² Pt per cm² membrane electrode assembly (MEA)) at the cathode, leading several researchers to attribute this to an oxygen transport phenomenon occurring at each individual Pt site.
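For a sense of scale, the roughness factor is commonly related to the ECSA and the Pt loading by a standard unit conversion (not a formula from this paper):

$$ \mathrm{r.f.}\;\left[\mathrm{cm^2_{Pt}/cm^2_{MEA}}\right] = \mathrm{ECSA}\;\left[\mathrm{m^2/g_{Pt}}\right] \times L_{\mathrm{Pt}}\;\left[\mathrm{mg/cm^2}\right] \times 10 $$

So, for example, a cathode with an assumed ECSA of 60 m²/g at the 0.125 mg cm−2 loading mentioned above gives r.f. = 60 × 0.125 × 10 = 75.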
Polarization curves can be measured by sweeping the cell voltage at a very slow scan rate, but the steady-state method is generally used: the cell voltage obtained after holding at a given current density for a fixed time is recorded, proceeding sequentially from low to high current density. This is because changing the current density alters the distributions of gas, water, current, and so on within the MEA, and it takes roughly 5 to 10 minutes for these to settle into a steady state.
On the other hand, degradation of catalyst supports such as carbon black is known to be accelerated at high potentials exceeding 1 V [15]. Under normal conditions a fuel cell electrode is never exposed to such high potentials, but during startup and shutdown, for example, the cathode potential can reach up to 1.5 V through the so-called reverse current mechanism [16]. Test conditions such as those shown in Fig. 7(b) have been proposed as startup/shutdown tests to simulate this situation: tolerance to the abnormal potentials at startup and shutdown is evaluated by repeating a 0.9 V / 1.3 V square-wave cycle under a nitrogen atmosphere.
High Pressure Nitrogen-Infused Ultrastable Fuel Cell Catalyst for Oxygen Reduction Reaction, Eunjik Lee et al., ACS Catal., 11, 5525−5531 (2021)
ABSTRACT:
The mass activity of a Pt-based catalyst can be sustained throughout the fuel cell vehicle life by optimizing its stability under the conditions of an oxygen reduction reaction (ORR) that drives the cells. Here, we demonstrate improvement in the stability of a readily available PtCo core−shell nanoparticle catalyst over 1 million cycles by maintaining its electrochemical surface area by regulating the amount of nitrogen doped into the nanoparticles. The high pressure nitrogen-infused PtCo/C catalyst exhibited a 2-fold increase in mass activity and a 5-fold increase in durability compared with commercial Pt/C, exhibiting a retention of 80% of the initial mass activity after 180 000 cycles and maintaining the core−shell structure even after 1 000 000 cycles of accelerated stress tests. Synchrotron studies coupled with pair distribution function analysis reveal that inducing a higher amount of nitrogen in core−shell nanoparticles increases the catalyst durability.
INTRODUCTION Extensive practical applications of the commercial hydrogen fuel cell vehicle have been delayed because of the high cost and limited durability of the membrane electrode assembly (MEA).
One of the main reasons for the high cost of the MEA is the large amount of Pt used to catalyze the oxygen reduction reaction (ORR) at the cathode of the proton exchange membrane (PEM) fuel cell.
In the past decade, several studies investigated ORR electrocatalysts to reduce the cost of the MEA.
One of the main strategies is to add modifiers to the Pt catalyst by changing the structure and morphology of PtM (M = metal) alloy catalysts, while others completely avoid Pt by using various non-precious M−N−C moiety catalysts.
Although the addition of modifiers can drastically increase catalytic performance, it cannot be sustained for prolonged periods, which is a major factor impeding commercialization.
To date, carbon-supported PtCo alloy nanoparticles have emerged as the best alternative to Pt/C; original equipment manufacturers are already using them in first-generation hydrogen fuel cell vehicles.
For better Pt utilization efficiency throughout the fuel cell lifetime, an ideal catalyst should be able to maintain its electrochemical surface area (ECSA).
Although earlier studies have corroborated nitrogen's role in stabilizing the catalyst, high-pressure doping of nitrogen in a controlled environment on industrial-scale core−shell nanoparticles had not been achieved.
(25) Kuttiyiel, K. A.; Sasaki, K.; Choi, Y.; Su, D.; Liu, P.; Adzic, R. R. Nitride Stabilized PtNi Core−Shell Nanocatalyst for High Oxygen Reduction Activity. Nano Lett. 2012, 12 (12), 6266−6271. (26) Kuttiyiel, K. A.; Choi, Y.; Hwang, S.-M.; Park, G.-G.; Yang, T.-H.; Su, D.; Sasaki, K.; Liu, P.; Adzic, R. R. Enhancement of the oxygen reduction on nitride stabilized Pt-M (M = Fe, Co, and Ni) core−shell nanoparticle electrocatalysts. Nano Energy 2015, 13, 442−449.
Thus, in this study, to obtain a highly stable and active ORR catalyst, a high-pressure nitriding reactor that can infuse a controlled number of nitrogen (N) atoms into the alloy nanoparticles was developed.
Varying the ratio of N atoms in the PtCo/C core−shell nanoparticles can significantly affect the morphology of the nanoparticles and simultaneously increase their stability without impacting the activity.
Herein, we report the preparation of N-stabilized PtCo core−shell nanoparticles with ultrastable configurations; the result is a highly durable ORR catalyst that can withstand up to 1 000 000 cycles in accelerated stress tests (ASTs), enabling rapid commercialization of fuel cell vehicles.
To the best of our knowledge, thus far, no catalysts have been reported that can last 1 million cycles.
The best configuration (Pt40Co36N24/C) retained 93% of its ECSA, while its initial half-wave potential decreased by only 6 mV after 30 000 cycles.
This confirms that the proposed configuration is a suitable alternative to the commercial Pt/C catalyst, whose ECSA deteriorated by 40% under similar conditions.
CONCLUSION We demonstrated that nanostructured core−shell materials with high N contents in their cores can be engineered to withstand the harsh, oxidative electrochemical environments encountered during fuel cell operation.
X-ray experiments and PDF analyses revealed that a high N content could protect the Co core against dissolution.
Withstanding 1 million cycles of harsh and corrosive ASTs without significant dissolution facilitates the potential industrial-scale application of these catalysts.
This strategy presents a promising approach to develop cheap and ultradurable core−shell catalysts using other 3d transition metal cores.
August 14 (Sat)
High Pressure Nitrogen-Infused Ultrastable Fuel Cell Catalyst for Oxygen Reduction Reaction, Eunjik Lee et al., ACS Catal., 11, 5525−5531 (2021)
RESULTS AND DISCUSSION Carbon-supported PtCo core−shell nanoparticles were prepared by reducing platinum acetylacetonate [Pt(acac)2] and cobalt acetylacetonate [Co(acac)2] via ultrasound-assisted polyol synthesis.
Transmission electron microscopy (TEM) analysis showed that the as-synthesized PtCo nanoparticles exhibited a core−shell structure with an average particle size of ∼2.3 nm (Figure S1).
Scanning TEM (STEM) and energy dispersive X-ray spectroscopy (EDS) confirmed the core−shell structure with 1−2 Pt monolayers on the Co-rich core (Figure 1B−D).
The PtCo core−shell nanoparticles were annealed in a nitrogen/ammonia mixture (N2/NH3: 5/95) at 510 °C in three pressurized environments (1, 40, and 80 bar).
The nanoparticles maintained their core−shell structures and exhibited an increase in the particle size and a change in composition (Figure 1F−H).
As shown in Figure 1E, higher pressure increases the N content in the nanoparticles but ultimately decreases the particle size.
On the basis of the N content in the nanoparticles, the molar ratio changes drastically; the resultant nanoparticles are denoted as Pt52Co48/C, Pt53Co45N2/C, Pt44Co42N14/C, and Pt40Co36N24/C (Table 1).
For all samples, in-house X-ray diffraction (XRD) patterns exhibit the typical face-centered cubic (fcc) structure, with no phase segregation, corresponding to Pt and its alloys with transition metals (JCPDS No. 87-0646) (Figure 1A).
The position of the (111) peak of PtCo/C shifts to a higher angle compared with that of Pt/C, indicating that Co atoms with relatively smaller atomic sizes are incorporated into the Pt lattice, causing compressive strain.
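The reasoning here is just Bragg's law,

$$ n\lambda = 2 d_{111} \sin\theta \;\Rightarrow\; d_{111} = \frac{n\lambda}{2\sin\theta}, $$

so a (111) reflection at higher angle corresponds to a smaller interplanar spacing, i.e., a compressed lattice.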
Interestingly, the nitriding pressure directly affects the full width at half-maximum (fwhm) and position of the (111) peak.
In particular, the fwhm increases and the (111) peak position gradually shifts to a lower angle with an increase in the nitriding pressure.
This suggests that the nitriding pressure changes the atomic structure of the catalyst particles while relaxing the lattice mismatch between Pt skin and cobalt nitride core (Table 1).
Furthermore, X-ray photoelectron spectroscopy (XPS) studies indicate that, compared with metallic Pt, the Pt 4f peak in all samples shifts to a lower binding energy (BE), likely owing to the charge transfer from Co to Pt (Figure S2).
Additionally, no peaks (∼399.8 eV) for imides/lactams/amides are observed, indicating that most N in the samples exists in the form of nitrides.
To gain further insights about how the as-synthesized PtCo core−shell nanoparticles maintain their structures while incorporating N atoms, we carried out ab initio molecular dynamics (AIMD) studies to simulate the formation of the CoN nanophase in the nanoparticle core.
Before running AIMD, NH3 molecules were packed into a unit cell containing cuboctahedral PtCo nanoparticles at pressures of 1, 10, and 45 bar using the COMPASSII force field.
The entropic effect was considered so as to capture the continuous reaction process at a finite temperature of 783 K.
In the case of a single PtCo nanoparticle, it is found that N atoms from the NH3 molecules cannot penetrate the Co core even at a high pressure of NH3, as shown in Figure S3 and Movie S1.
Therefore, we tested the case of formation of PtCoN core−shell nanoparticles through a particle growth process involving the agglomeration of the preformed PtCo fragments into nitride cores that are consequently covered by a Pt shell.
The results shown in Figure 2A indicate that this is the likely mechanism of the particle size increasing from ∼2.3 nm for pure PtCo nanoparticles to ∼4.2 nm for Pt53Co45N2/C (Table 1).
Interestingly, the AIMD results are appreciably consistent with this picture: two Pt12Co1 nanoparticles at 10 bar of NH3 (equivalent to 28.7 bar at 783 K) can spontaneously merge without any considerable activation barrier.
The simulations indicate the formation of irregular particles with a compressed Pt−Pt distance depending on the location of nearby N atoms, as revealed by the atomic pair distribution function (PDF) analysis and the reverse Monte Carlo modeling (discussed below), thereby increasing the number of N atoms that exist near the Pt sublayer.
In situ Co K edge X-ray absorption near-edge structure (XANES) spectra of Pt52Co48/C, Pt53Co45N2/C, Pt44Co42N14/C, and Pt40Co36N24/C nanoparticles (Figure 2B) were obtained in 0.1 M HClO4 at a potential of 0.42 V.
As the N concentration increases, the peak intensity at 7724 eV starts decreasing; the highest peak at 7727 eV is observed at a N concentration of >14 at%.
This change can be ascribed to a change in the electronic structures of Co due to N doping.
As shown in Figure S7, the XANES spectra of CoO (Co2+) and Co3O4 (Co2.67+) exhibit the highest peaks at 7725 and 7729 eV, respectively; meanwhile, the highest peak for Pt40Co36N24/C lies between them.
Thus, the N doping of PtCo catalysts alters the electronic state of Co, resulting in an increase in the oxidation state.
The increase in the oxidation state with an increase in the N content is also supported by the data shown in the inset of Figure 2B; half-step energy values (at 0.5 of the normalized absorption in the XANES spectra) increase with an increase in the N concentration.
Figure 2C shows the in situ Pt L3 edge XANES spectra of the PtCo/C and N−PtCo/C catalysts measured in 0.1 M HClO4 at a potential of 0.42 V.
The intensities of the white lines (first peaks in XANES data) change with the variation in the N content in the N−PtCo/C catalysts.
As shown in the inset of Figure 2C, the intensity increases with an increase in N concentration; it is higher than that of a Pt foil but lower than that of the PtCo/C catalyst.
The change in white line intensity is related to the d-band structure of Pt: it is well known that higher intensities correspond to an increase in d-band vacancies, which in turn weakens the adsorption of intermediate molecules (such as OOH and OH) on the Pt surface.
Thus, N doping can weaken the interaction of the Pt surface with oxygen, compared with that of bulk Pt.
However, the effect is not as strong as that for the PtCo/C catalyst as the white line intensity for the N−PtCo/C catalysts is lower than that of PtCo/C and varies with the N content.
The XANES data suggest that N doping in N−PtCo/ C alters the electronic states of Co and Pt, resulting in moderate adsorption strength of oxygen on the Pt surface.
To comprehensively understand the particle structure, high-energy synchrotron XRD experiments coupled with atomic PDF analysis were carried out.
Experimental PDFs (Figure S8) were fit with 3D models for the nanoparticles using classical molecular dynamics (MD) simulations and were further refined against the experimental PDF data by employing reverse Monte Carlo modeling.
Cross sections of the models emphasizing the core−shell characteristics of the particles are shown in Figure 3.
The models exhibit a distorted fcc-type structure and reproduce the experimental data in exceedingly good detail (Figure S8).
The bonding distances between the surface Pt atoms and surface Pt coordination numbers extracted from the models are also shown in Figure 3.
As observed, PtCo core−shell particles exhibit large structural distortions (∼1.8%).
The surface Pt−Pt distance in Pt53Co45N2 is 2.739 Å, which is approximately 1.5% shorter than the surface Pt−Pt distances in bulk Pt (2.765 Å).
Furthermore, the surface Pt−Pt distance in PtCo is 2.731 Å, indicating 0.3% more strain compared with the strain observed in the Pt53Co45N2 particles.
This indicates that N relaxes the compressive stress in PtCo core−shell particles.
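The ~0.3% relative strain difference quoted above follows directly from the bond distances; a minimal arithmetic check in Python (distances taken from the text):

```python
# Surface Pt-Pt bond distances quoted in the text, in angstroms.
d_bulk  = 2.765   # bulk Pt
d_ptco  = 2.731   # PtCo core-shell particles
d_ptcon = 2.739   # Pt53Co45N2 particles

# Compressive strain relative to bulk Pt, in percent.
strain_ptco  = (d_bulk - d_ptco)  / d_bulk * 100
strain_ptcon = (d_bulk - d_ptcon) / d_bulk * 100

# The difference is the strain relaxation attributed to N (~0.3%).
print(f"PtCo: {strain_ptco:.2f}%  Pt53Co45N2: {strain_ptcon:.2f}%  "
      f"difference: {strain_ptco - strain_ptcon:.2f}%")
```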
Moreover, the average surface Pt coordination number for the particles with CoN cores increases and becomes more evenly distributed than in the case of pure Pt particles; that is, the surfaces of N-treated particles appear less rough (fewer undercoordinated sharp edges and corners), which can affect the binding strength of oxygen molecules to the particle surface and accelerate the ORR kinetics.
As expected, the N-treated particles show an increased number of N atoms located near the Pt shell, which explains the increased stability of the nanoparticles compared with those of pure Pt and PtCo particles.
The electrochemical performances of all the catalysts were compared using cyclic voltammetry (CV) curves (Figure S4).
The incorporation of Co into the Pt nanoparticles increases the ECSAs of the catalysts, while that of N into the PtCo nanoparticles decreases their ECSAs (Figure 4A).
A slightly different trend was observed with respect to the specific and mass activities of the catalyst (Figure 4B).
The PtCo/C catalyst with low nitrogen content shows the highest activity among the catalysts; however, an increase in N content does not drastically change its catalytic behavior.
Our study was mainly focused on achieving structural stability of the catalyst.
AST cycles at 0.6−0.95 V and 3 s hold were employed for each catalyst.
All N-infused PtCo/C catalysts showed higher stability and activity compared with commercially available Pt/C and PtCo/C catalysts (Figure S5).
The catalyst with the highest N amount (Pt40Co36N24/C) retained 93% of its ECSA, with a decrease of only 6 mV in its initial half-wave potential after 30 000 cycles.
To further investigate the structural integrity of all the catalysts, we cycled them until the ORR activity decreased to half its initial value.
As observed in Figure 4C, most of the N-infused catalysts retained their structures up to 230 000 cycles; however, the catalyst with the highest amount of N (Pt40Co36N24/C) retained its structural integrity until 1 000 000 cycles and lost just 44 mV from its initial half-wave potential (Figure S6).
Fuel cell (25 cm2) performance tests with 0.1 mg cm−2 Pt loading showed promising results (Figure 4D,E). The Pt40Co36N24/C catalyst meets the U.S. Department of Energy durability target (a voltage drop of no more than 30 mV at 0.8 A cm−2) after 30 000 AST cycles (Figure 4H).
Moreover, after 30 000 AST cycles the PtCo nanoparticles had grown by 41% from their initial average size (Figure 4F), whereas the N-infused PtCo nanoparticles grew by only 21%, confirming that N plays a key role in impeding nanoparticle coarsening (Figure 4G).
As previously reported, DFT-based studies clearly support the higher ORR activities of nitride-stabilized Pt−metal electrocatalysts over Pt/C catalysts.
Their volcano-like trends show that the interactions of Pt/C and PtCo/C with oxygen are significantly stronger and weaker, respectively, compared with those of PtCoN/C.
The outstanding stability of the high-pressure N-infused PtCoN/C catalysts can be readily explained on the basis of our recent DFT findings.
The Pt segregation promoted by the higher N concentration facilitates the diffusion of Pt atoms to vacant sites in the outermost shell, preventing dissolution.
Evidently, these results demonstrate the enhanced catalytic stability of the Pt40Co36N24/C catalyst over the other N-infused PtCo catalysts.
The following review comprehensively covers methods for preparing a wide variety of catalysts.
Ultra‑low loading of platinum in proton exchange membrane‑based fuel cells: a brief review, Aristatil Ganesan and Mani Narayanasamy, Materials for Renewable and Sustainable Energy (2019) 8:18
Abstract This review summarizes different synthesis methods for PEM-based fuel cell catalysts, with a focus on ultra-low Pt loading. It also surveys the fuel cell performances reported so far with ultra-low Pt loadings and suggests a combined synthesis method for efficient fuel cell performance at low Pt loading. Maximum mass-specific power density (MSPD) values are calculated from various reported performance values, discussed, and compared with the Department of Energy (DOE) 2020 target values.
Introduction
・・・・・・・・・・
Regrettably, expensive platinum group metal (PGM) catalysts hinder commercial sales and volume. PGM (plus application) costs are expected to contribute from 21% (at 1000 FC systems/year) to 45% (at 500,000 systems/year) of the total FC stack cost [5]. Since PGMs are expensive, their loading should be reduced from current (target) levels. As PGMs play a critical role in both the hydrogen oxidation reaction (anode, HOR) and the oxygen reduction reaction (cathode, ORR), the PEMFC community faces the challenge of addressing PGM cost for both the anode and the cathode of the fuel cell.
・・・・・・・・・・
According to DOE 2020, the PGM loading targets are 0.125 mg cm−2 and < 0.1 mg cm−2 for the anode and cathode, respectively. Nevertheless, a still lower loading of about 0.0625 mg cm−2 is required for PEMFC vehicles to compete with IC-engine vehicles.
Literature
Many research groups are working on Pt alloy catalysts such as PtCo, PtNi, PtCoMn, WSnMo, PtRu, PtAgRu, PtAuRu, PtRhRu, and Pt–Ru–W2C to replace Pt/C [7–10]. By providing high-surface-area carbon supports, the Pt content can be reduced while maintaining high Pt utilization [11, 12]. Using the plasma sputtering technique [13], the total Pt loading in both anode and cathode has been reduced to 20 μg cm−2; this method achieves uniform dispersion of Pt as clusters smaller than 2 nm with high catalyst utilization.
Many researchers have attempted to reduce Pt loading by providing novel catalyst supports such as multiwalled and single-walled CNTs [14]. Binary alloys of Pt, namely Pt–Cu [15], Pt–Co [16–19], Pt–Ni [17, 18], and Pt–Cr [17], revealed 2–3 times higher mass-specific activity than Pt/C, owing to alloy and ligand effects. Ternary alloys of PtFeNi and PtFeCo [19] showed excellent ORR activity but in some cases exhibit Pt particle aggregation. A bimetallic alloy of Pd–Pt on hollow-core mesoporous-shell carbon (PtPd/HCMSC) demonstrated enhanced ORR activity and stability [20, 21]. Recently, a PtCo@Pt core–shell catalyst offered low catalyst loading but suffered from leaching (dissolution) of the base metal cobalt (Co) from the bulk to the surface [22]. Wang et al. [21] investigated a PtNi alloy as a high-performing catalyst for automotive applications with a low Pt loading of 0.125 mg cm−2, which satisfied the DOE 2020 target.
A Pt–Ni alloy catalyst synthesized by direct-current magnetron sputtering involves sputtering Pt onto a synthesized PtNi/C substrate, which forms a multilayered Pt-skin surface with superior ORR activity. This catalyst relies on a mature synthesis technology and offers improved performance compared to Pt/C. Though it presents superior performance, it requires careful preparation of the Pt target material for sputtering (costly), preparation of PtNi by chemical reduction, thermal decomposition, and acid treatment, followed by a final heat treatment. The materials preparation involves many steps and needs careful optimization to obtain a reasonable catalyst yield. Durability studies were not conducted at the MEA level as specified by DOE.
Kongkanand et al. [22] investigated PtCo on high-surface-area carbon (HSC), which exhibits a lesser degree of PtCo particle coalescence after the stability test; HSC is also favorable for start-up performance and long-term durability. The dissolution of Pt and Co was addressed by developing a deposition model [23]. DOE has updated its cost estimate for an automotive fuel cell downward by 15%, to $45/kW, because of the development of the PtCo/HSC catalyst; this catalyst system would reduce the total system cost by a further 14%, or $7.5/kW [22]. These catalysts (PtNi/PtCo) cost about $15.20/g for the cathode (Pt 0.100 mg cm−2) and $10.86/g for the anode (Pt 0.025 mg cm−2) [1]. Chen et al. [23] investigated Pt3Ni nanoframes and demonstrated high mass activity with durability, but MEA performance at high current density was challenging. The shape-controlled synthesis of Pt–Pd and Ru–Rh catalysts showed high mass activity and offers a commercially efficient scale-up method, but these catalysts have issues with MEA-level performance and stability [24]. In addition to catalyst support modification and Pt alloying, a proper MEA fabrication methodology must be identified for low Pt loading. This review provides intensive guidance for researchers working on low-Pt-loading catalysts for fuel cells.
Most promising methods for the preparation of electrodes
Though several methods are available for catalyst synthesis and coating, such as physical vapor deposition, chemical vapor deposition, sputter deposition, galvanic replacement reaction (Pd nanocrystals with different shapes) [24], hydrothermal synthesis [25], electrodeposition (hetero-structured nanotube dual catalysts) [26, 29], electrospinning [27], and the molten salt method [28], only very few are practically feasible for producing catalyst nanoparticles and coating them effectively onto electrodes.
Electrodeposition The need for high-surface-area nanostructured energy materials, and for their efficient application in energy conversion devices, is best met by the electrochemical synthesis route. The electrodeposition technique proves to be the best method for the following reasons (a loading estimate based on Faraday's law is sketched after the list).
1. Electrode potential, deposition potentials, current densities, and bath concentrations can be controlled for the synthesis of homogeneous nanostructured materials.
Hence, by varying the deposition parameters, one can synthesize thin catalyst films with the desired stoichiometry, thickness, and microstructure.
2. Particle size, desired surface morphology, catalyst loading, thickness, and microstructure can be easily achieved using various control parameters involved in electroplating.
3. Electrochemical reactions proceed at ambient temperature and pressure, so high thermodynamic efficiency is maintained during plating.
4. Environmentally friendly.
5. Synthesis can be started with low-cost chemicals as precursor materials.
6. One-pot single-step synthesis of the final product is possible by avoiding a number of steps.
7. Any metal or alloy can be easily doped into desired nanostructured materials.
8. The required nanostructured energy materials can be grown directly on the electrode surface by the electrochemical method, which provides good adhesion, large surface area, and electrical conductivity.
Hence, this method is suitable for constructing high-efficiency, low-cost energy devices.
9. Poorly conducting materials, such as the metal oxides used as catalyst supports, can be easily incorporated into advanced energy materials, facilitating fast electron transport; the electrical conductivity of catalyst supports can thus be enhanced by the electrodeposition method.
10. The electrochemical synthesis route eliminates the complexity of mixing catalyst powders with carbon black and polymer binder when fabricating fuel cell electrodes, and does so in a short time [29].
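To make point 2 above concrete, the sketch below estimates electrodeposited Pt loading from Faraday's law, m = QM/(nF). The current density, deposition time, the assumption of 100% current efficiency, and the four-electron reduction of a Pt(IV) salt are illustrative assumptions, not values from the review:

```python
# Faraday's-law estimate of electrodeposited Pt loading: m = Q * M / (n * F).
F = 96485.0        # Faraday constant, C/mol
M_PT = 195.08      # molar mass of Pt, g/mol
n_electrons = 4    # assumed Pt(IV) -> Pt(0) reduction (illustrative)

current_density = 5e-3   # A/cm^2, illustrative plating current density
time_s = 60.0            # deposition time in seconds, illustrative

charge = current_density * time_s                 # C/cm^2, assumes 100% efficiency
loading_mg = charge * M_PT / (n_electrons * F) * 1e3
print(f"estimated Pt loading: {loading_mg:.3f} mg/cm^2")  # ~0.152 mg/cm^2
```

Halving the deposition time or current halves the loading, which is why the charge passed is the natural control knob for targeting DOE-level loadings.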
・・・・・・・・・・
Chemical precipitation method
A thin nanocatalyst layer is formed by reducing the precursor solution with a reducing agent. The desired catalyst particle size can be achieved by varying parameters such as the temperature, pH, ratio of reducing ions to Pt, reaction time, and stirring rate. The main disadvantage of this method is that it produces irregular particle sizes and shapes, resulting in an inhomogeneous layer; this stems from the varied growth kinetics and conditions, so the method is least used for catalyst synthesis.
Colloidal method
In this method, a colloidal dispersion is formed by a stabilizer and the precursor. A suitable support material is added, onto whose surface the colloid deposits; in the final stage, decomposition of the colloid results in the formation of the catalyst. The colloidal particles are commonly formed from the precursors H2PtCl6 and RuCl3 and reduced with a reducing agent. The stabilizers and reducing agents remaining in the final product have to be removed by thermal treatment. This method thus involves several steps for catalyst synthesis.
Sol–gel synthesis method
This method forms solid particles suspended in a liquid solution (sol), which upon subsequent aging and drying forms a semi-solid suspension in a liquid (gel); subsequent calcination yields a mesoporous solid or powder on the substrate. The pore size distribution of the catalyst layer can be varied through the experimental parameters. The disadvantage of this technique is that catalyst particles can become buried in the pores, making them inaccessible to reactants and resulting in low catalyst utilization.
Impregnation method
This method uses high-surface-area carbon supports for catalyst formation. A Pt chloride salt is directly mixed with reducing agents (Na2S2O3, NaBH4, N2H4, formic acid, or H2 gas) in an aqueous solvent. The method results in Pt agglomeration and weak attachment to the support owing to the high surface tension of the liquid solution [56].
Microemulsion method
A water-soluble inorganic salt is used as the metal precursor in solution. The particle growth rate, size, and shape are determined by the proportion of metal salt to organic solvent, and the resulting solution forms a water-in-oil structure (microemulsion). The hydrophobic organic molecules protect the metal particles as an insulating layer and prevent agglomeration when the reducing agent is added; this surfactant-assisted synthesis forms a suitable catalyst support with a protective layer. The main drawbacks of this method are its use of expensive chemicals and its lack of environmental friendliness [39].
Microwave‑assisted polyol method
Here, Pt metal salts are reduced in ethylene glycol, with the reduction reaction occurring at temperatures above 120 °C. Microwave-assisted heating can produce a more active ORR catalyst than conventional heat treatment, yielding uniform dispersion and greater morphological control over particle size (< 3 nm). The main advantages of this method are that no surfactant is added and that it uses an inexpensive solvent such as ethylene glycol; its disadvantage is that it is time-consuming.
Chemical vapor deposition (CVD)
This method delivers the required precursors in the gas phase, using external heat energy or plasma sources, in an enclosed chamber with a carrier gas. A thin solid film forms on the substrate through the decomposition reaction of the precursors, and the impurities produced during the reaction are removed by the carrier gas flowing through the chamber. CVD is most widely used for the synthesis of advanced materials such as CNT and graphene, but it entails high instrument and process costs.
Spray technique
Spray techniques cover printing methods for coating the catalyst directly onto the substrate, including inkjet printing, casting, and sonic methods. Their advantage is that a large electrode area can be coated, irrespective of whether the substrate surface is conductive or non-conductive; after coating, the solvent is allowed to evaporate. Though the technique offers many advantages for practical applicability and mass production, catalyst utilization is very low.
Atomic layer deposition (ALD)
This method is a subclass of CVD. Gas-phase molecules are used sequentially to deposit atoms on the substrate: the precursors react with the surface one at a time, in sequential order, and the substrate's successive exposures build up a uniform nanocoating. Each cycle involves four steps: (1) exposure to the first precursor, (2) purging of the reaction chamber, (3) exposure to the second reactant precursor, and (4) a further purge of the reaction chamber. During steps 1 and 2, the precursor reacts with the substrate at all available reactive sites, and unused precursor and impurities are removed by the inert-gas purge. In step 3, the precursor adsorbed on the substrate reacts with the second reactant, eliminating the ligands of the first precursor to form the target material; the residues formed in step 3 are removed by the inert-gas purge of step 4, which completes one cycle. Many cycles are repeated to achieve the desired thickness of the target material.
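The four-step cycle reads naturally as a loop. Below is a minimal Python sketch assuming a hypothetical reactor-controller interface; the pulse/purge callbacks and the growth-per-cycle (GPC) value are illustrative, not from the review:

```python
import math

def ald_cycle(pulse, purge):
    """One canonical four-step ALD cycle; `pulse` and `purge` are
    hypothetical controller callbacks supplied by the user."""
    pulse("precursor_A")   # step 1: first precursor saturates surface sites
    purge()                # step 2: inert-gas purge removes excess precursor
    pulse("precursor_B")   # step 3: second reactant forms the target material
    purge()                # step 4: purge removes reaction by-products

# Film thickness is set by repeating the cycle: n = target / GPC.
gpc_nm = 0.05                           # illustrative growth per cycle, nm
target_nm = 2.0                         # illustrative target thickness
n_cycles = math.ceil(target_nm / gpc_nm)
print(f"{n_cycles} cycles for a {target_nm} nm film")  # 40 cycles
```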
Key features to consider when preparing the electrodes
In the emerging hydrogen economy, fuel cell technology development must address cost effectiveness, the benchmark performance directed by the U.S. DOE, and operation over long life cycles. There are many ways to reduce the cost of fuel cells without sacrificing performance [45–50], as listed below:
1. Reduction of precious metal loading.
2. Nanostructured thin-film (NSTF) development for the catalyst layer.
3. Particle size reduction for the electrocatalyst.
4. Developing non-precious metals/alloys.
5. Developing novel catalyst preparation methods.
6. Using novel MEA fabrication methods adapted to advanced catalyst and membrane materials.
7. Adopting new techniques to promote triple-phase boundaries and mitigate mass-transfer limitations.
8. Developing carbonaceous and non-carbonaceous catalyst support materials to achieve peak performance at low cost.
In addition to the various useful applications of PEMFC, it still has a long way to go in terms of catalysts for successful commercialization, with respect to cost, efficiency, and cycle stability. Even now, Pt and Pt-based materials hold a strong position as efficient catalysts for PEMFC and DMFC, exhibiting superior catalytic activity, electrochemical stability, high exchange current density, and an excellent work function [50–53].
Because Pt is scarce in the earth's crust, its supply to industry is costly and limited. For PEMFC automotive applications, present Pt resources are insufficient to fulfill the requirements, and the obtained ORR activity also falls short of benchmark performance [51]. For these reasons, researchers now focus mainly on synthesizing ultra-fine Pt nanoparticles, alloying Pt with other metals, and achieving ultra-low Pt loading on highly porous, high-surface-area metal oxide/composite supports to reduce cost without sacrificing performance [52]. Usually, conductive porous membranes are used as catalyst supports for PEMFC and DMFC, but the use of a metal catalyst support shows higher stability and activity compared with an unsupported catalyst.
The typical characteristics of a catalyst support are as follows:
・High surface area.
・Ability to maximize the triple-phase boundary through a mesoporous structure.
・Good metal–support interaction.
・High electrical conductivity.
・Good water management.
・Increased resistance to corrosion.
・Ease of catalyst recovery [54].
The support material, in addition to increasing catalytic activity and durability, also determines the particle size of the metal catalyst. Hence, the support material should be chosen such that it supports performance, behavior, long operational cycles, and a low catalyst cost. The following steps should be considered when developing a new catalyst system:
• Developing non-precious metal catalysts.
• Choosing suitable catalyst support materials.
The Pt-group metals other than Pt are palladium, ruthenium, rhodium, iridium, and osmium, whose availability is even scarcer than that of Pt. Hence, by incorporating all the above points and alloying with non-Pt-group metals, the loading of precious metal could be reduced while achieving higher performance [55]. The essential properties of support materials discussed above are important for achieving better fuel cell performance at lower cost.
Stability
The major issue with PEMFC catalysts is long-term durability. During continuous PEMFC operation, catalyst agglomeration and electrochemical corrosion of carbon-based supports deteriorate the catalytic activity [53]. By choosing the correct catalyst support, one can mitigate catalyst agglomeration and support corrosion. With the existing carbon black support, electrochemical corrosion is triggered above 0.9 V, causing the catalyst to detach from the support and agglomerate; this impedes the diffusion of fuel/oxidant reactants and reduces overall fuel cell performance and life. These issues force us to seek long cyclic stability for PEMFC by choosing a proper support with strong electrochemical stability in acid/alkaline media.
The most widely used support materials are carbon blacks of various grades from various companies, differing in quality in terms of porosity and surface area. Over the last decade, researchers have focused on nanostructured catalyst supports, as they deliver faster charge transfer, higher surface area, and improved catalytic activity. These are broadly classified into carbonaceous and non-carbonaceous supports. The carbonaceous type includes different modified carbon materials such as mesoporous carbon, carbon nanotubes (CNTs), nanodiamonds, carbon nanofibers (CNFs), and graphene [36, 54–61]. Such nanostructured modified carbons offer high surface area, high electrical conductivity, and good stability in acid and alkaline environments, and highly crystalline carbon nanomaterials such as CNTs and CNFs exhibit stability and good activity [62]. However, under repeated cycles of fuel cell operation, carbon materials such as carbon black face serious corrosion problems. Though the corrosion rate decreases considerably with more graphitic carbon materials such as carbon nanotubes and carbon nanofibers, these do not prevent carbon oxidation [63]. To achieve high corrosion/oxidation resistance, stability, and durability, metal oxides are preferred as catalyst support materials instead of carbon [52, 64]. Metal oxides offer [62, 64]:
high electrochemical stability, mechanical stability, porosity, high surface area, cycling stability and durability [62, 63].
Debe et al. derived development criteria for automotive fuel cell electrocatalysts, as given in Tables 1 and 2, and proposed that an increased catalyst surface area will improve the activity of the outer Pt layer [65]. Nanostructured thin-film (NSTF) catalysts provide a high surface area for efficient catalytic activity; NSTF electrocatalysts offer high area-specific activity, catalyst utilization, stability, and performance at ultra-low PGM loadings.
Problems associated with ultra‑low loading
During continuous fuel cell operation, ECSA is lost through dissolution, agglomeration, and Ostwald ripening, so catalyst stability and durability are judged by the ECSA loss before and after operation for a specified number of hours. Most recent catalyst systems with ultra-low loading present very high mass activity (30× higher than Pt/C) but fail at high-current-density targets. For example, core–shell (Pt@Pd/C) catalysts exhibit higher mass activity but undergo some degree of base metal dissolution [71]. Therefore, new catalysts must be developed with a focus on ultra-low precious metal loading and stability at high current densities (HCD), even if they already exhibit high mass activity.
Requirements of cathode catalysts
PGM alloys show high performance at the beginning of life but incur increasing ohmic/mass-transport losses during continuous operation. Over long cycling, conventional Pt/C loses performance through degradation (dissolution, agglomeration, and Ostwald ripening), and PGM alloys contaminate the ionomer through ion dissolution, causing additional performance loss at high current densities. Hence, a novel cathode catalyst layer is required for high performance and durability. As pointed out earlier, most Pt alloy catalysts with high mass activity perform well at low current densities but suffer performance loss at high current densities owing to base metal or support dissolution, which is progressive under voltage cycling. A novel cathode catalyst layer design is therefore proposed to overcome the problems discussed above and deliver stable performance and durability. Dustin Banham et al. [72] present real-world requirements for the design of PEMFC catalysts.
Platinum is a superior catalyst for the hydrogen oxidation reaction at the fuel cell anode, and it accounts for 50% of the fuel cell cost [72]. During stack operation, if the flow field on the anode side is blocked, the imposed current forces the cell, and hence the stack, to malfunction: materials in the anode layer such as carbon, catalyst, and water are oxidized to supply the necessary electrons. This in turn leads to a high anodic potential (> 1.5 V) and deterioration of the anode catalyst layer, implying the need for a novel catalyst layer with a strong support material that has electrochemical stability and durability. Nowadays, catalyst research groups must have a strategy to test their catalysts for fuel cell performance and durability at the MEA level; this will further require real-time stack testing and the optimization of various parameters, accounting for the interdependency of the materials involved in the system.
Maximum mass‑specific power density (MSPD)
DOE has targeted maximum mass-specific power density (MSPD) values [73], which account for both low-Pt anode and low-Pt cathode catalysts, as an index of performance with reference to Pt loading. DOE targets more than 5 mW μg−1 Pt total at cell voltages higher than 0.65 V [Department of Energy (DOE)]. The cost reduction needed to meet the DOE 2020 target is possible if Pt loading in MEAs can be reduced to less than 125 μg cm−2 per MEA. In general, catalysts fall into three regions: (1) > 5 mW μg−1 Pt total, (2) between 1 and 5 mW μg−1 Pt total, and (3) < 1 mW μg−1 Pt total. The maximum MSPD value of 8.76 mW μg−1 Pt total at 0.65 V is obtained by a proprietary PtNi/PtCo catalyst from General Motors and the United Technologies Research Center (UTRC), with stack modeling performed by ANL [23] (Fig. 3).
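A minimal sketch of the MSPD arithmetic: power density at the rated cell voltage divided by the total (anode + cathode) Pt loading. The operating point below is illustrative, chosen to show why loadings must fall to about 125 μg cm−2 or less to clear the > 5 mW μg−1 region:

```python
def mspd(cell_voltage_v, current_density_a_cm2, total_pt_ug_cm2):
    """Mass-specific power density in mW per microgram of total Pt."""
    power_mw_cm2 = cell_voltage_v * current_density_a_cm2 * 1000.0
    return power_mw_cm2 / total_pt_ug_cm2

# Illustrative: 0.65 V at 1.0 A/cm^2 with 125 ug/cm^2 total Pt loading.
print(f"{mspd(0.65, 1.0, 125.0):.2f} mW/ug")  # 5.20 -> just clears the 5 mW/ug region
```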
Catalyst synthesis and deposition methods: MSPD values
Various catalyst synthesis methods are listed in Tables 4 and 5 with a primary focus on how an ultra-low loading of catalyst impacts the fuel cell performance by the influence of maximum mass-specific power density (MSPD) values. Each method has achieved maximum performance with low loading of catalyst within the boundary of its limitation.
Combination method of synthesis and coating
By comparing all synthesis methods (Fig. 4), it is found that a combination of synthesis and coating methods (e.g., spraying and sputtering) achieves higher MSPD values than any single method. It is also encouraging that combining synthesis and coating methods may eliminate the limitations posed by each specific method. In this review, for example, electrodeposition combined with plasma sputtering/spraying is recommended for developing an efficient catalyst system that would deliver good performance and stability at high current density with long-term durability, the disadvantages of each method being overcome by the other. Scaling up a catalyst synthesis and coating technique that yields a high-performance, durable catalyst layer is now a top priority. Hence, greater attention should be paid not only to the alloy catalyst but also to the catalyst preparation methods and the choice of catalyst support materials [64]. Table 4 shows various catalyst synthesis methods and their respective MSPD values, with references; Table 5 shows various synthesis methods and their merits and demerits.
Conclusion
Here, a brief review of various catalyst synthesis methods and their efficacies has been performed with a focus on ultra-low catalyst loading, and the merits and demerits of the various synthesis methods have been discussed. Ultra-low loading in electrodes was discussed in terms of MSPD values and compared with the DOE 2020 target values. A catalyst prepared by any combination of synthesis methods that yields MSPD values of more than 5 mW μg−1 Pt total at > 0.65 V will best meet the DOE 2020 target.
Deep neural network for x-ray photoelectron spectroscopy data analysis G. Drera, C. M. Kropf and L. Sangaletti, Mach. Learn.: Sci. Technol. 1 (2020) 015008
Abstract In this work, we characterize the performance of a deep convolutional neural network designed to detect and quantify chemical elements in experimental x-ray photoelectron spectroscopy data.
Given the lack of a reliable database in the literature, in order to train the neural network we computed a large (>100 k) dataset of synthetic spectra, based on randomly generated materials covered with a layer of adventitious carbon.
Fine details about the net layout, the choice of the loss function and the quality assessment strategies are presented and discussed.
The paper appears to describe the details of the CNN, the choice of loss function, and the performance evaluation results.
Given the synthetic nature of the training set, this approach could be applied to the automatization of any photoelectron spectroscopy system, without the need of experimental reference spectra and with a low computational effort.
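As a toy illustration of the synthetic-training-set idea (not the authors' generator, which is built on physically modelled spectra of randomly generated materials with an adventitious-carbon overlayer), labelled spectra can be synthesized as random peaks on a background:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_spectrum(n_points=1024, max_peaks=5):
    """Toy labelled XPS-like spectrum: random Gaussian peaks on a sloped
    background with counting noise. Illustrative only."""
    energy = np.linspace(0.0, 1000.0, n_points)   # binding-energy axis, eV
    spectrum = 1.0 + 0.001 * energy               # simple sloped background
    centers = rng.uniform(50, 950, rng.integers(1, max_peaks + 1))
    for c in centers:                             # `centers` doubles as the label
        amp, width = rng.uniform(0.5, 5.0), rng.uniform(1.0, 5.0)
        spectrum += amp * np.exp(-0.5 * ((energy - c) / width) ** 2)
    noisy = rng.poisson(spectrum * 100) / 100.0   # Poisson counting noise
    return energy, noisy, centers

energy, spectrum, labels = toy_spectrum()
```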
4. Conclusions In conclusion, we have shown the application of a neural network to the identification and quantification task of XPS data on the basis of a synthetic random training set.
The results are encouraging, showing detection capability and accuracy comparable with those of standard XPS users, supporting both the training-set generation algorithm and the DNN layout.
This approach can easily be scaled to different photon energies, energy resolutions, and data ranges; furthermore, the DNN could be trained to provide more output values, such as the actual chemical shifts for each element, extending the network's sensitivity towards chemical bond classification.
Deep learning in electron microscopy Jeffrey M Ede, Mach. Learn.: Sci. Technol. 2 (2021) 011004
Abstract Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.
1. Introduction Following decades of exponential increases in computational capability [1] and widespread data availability [2, 3], scientists can routinely develop artificial neural networks [4–11] (ANNs) to enable new science and technology [12–17].
1.1. Improving signal-to-noise A popular application of deep learning is to improve signal-to-noise [74, 75], for example, of medical electrical [76, 77], medical image [78–80], optical microscopy [81–84], and speech [85–88] signals.
1.2. Compressed sensing Compressed sensing [203–207] is the efficient reconstruction of a signal from a subset of measurements. Applications include faster medical imaging [208–210], image compression [211, 212], increasing image resolution [213, 214], lower medical radiation exposure [215–217], and low-light vision [218, 219]. In STEM, compressed sensing has enabled electron beam exposure and scan time to be decreased by 10–100× with minimal information loss [201, 202].
1.3. Labelling Deep learning has been the basis of state-of-the-art classification [270–273] since convolutional neural networks (CNNs) enabled a breakthrough in classification accuracy on ImageNet [71].
1.4. Semantic segmentation Semantic segmentation is the classification of pixels into discrete categories. In electron microscopy, applications include the automatic identification of local features [288, 289], such as defects [290, 291], dopants [292], material phases [293], material structures [294, 295], dynamic surface phenomena [296], and chemical phases in nanoparticles [297].
1.5. Exit wavefunction reconstruction Electrons exhibit wave-particle duality [350, 351], so electron propagation is often described by wave optics [352]. Applications of electron wavefunctions exiting materials [353] include determining projected potentials and corresponding crystal structure information [354, 355], information storage, point spread function deconvolution, improving contrast, aberration correction [356], thickness measurement [357], and electric and magnetic structure determination [358, 359].
2. Resources Access to scientific resources is essential to scientific enterprise [378]. Fortunately, most resources needed to get started with machine learning are freely available.
2.1. Hardware acceleration A DNN is an ANN with multiple layers that perform a sequence of tensor operations. Tensors can either be computed on central processing units (CPUs) or hardware accelerators [62], such as FPGAs [382–385], GPUs [386–388], and TPUs [389–391]. Most benchmarks indicate that GPUs and TPUs outperform CPUs for typical DNNs that could be used for image processing [392–396] in electron microscopy.
2.2. Deep learning frameworks A DLF [9, 458–464] is an interface, library or tool for DNN development. Features often include automatic differentiation [465], heterogeneous computing, pretrained models, and efficient computing [466] with CUDA [467–469], cuDNN [415, 470], OpenMP [471, 472], or similar libraries.
2.3. Pretrained models Training ANNs is often time-consuming and computationally expensive [403]. Fortunately, pretrained models are available from a range of open access collections [505], such as Model Zoo [506], Open Neural Network Exchange [507–510] (ONNX) Model Zoo [511], TensorFlow Hub [512, 513], and TensorFlow Model Garden [514].
2.4. Datasets Randomly initialized ANNs [537] must be trained, validated, and tested with large, carefully partitioned datasets to ensure that they are robust to general use [538].
2.5. Source code Software is part of our cultural, industrial, and scientific heritage [612]. Source code should therefore be archived where possible. For example, on an open source code platform such as Apache Allura [613], AWS CodeCommit [614], Beanstalk [615], BitBucket [616], GitHub [617], GitLab [618], Gogs [619], Google Cloud Source Repositories [620], Launchpad [621], Phabricator [622], Savannah [623] or SourceForge [624].
2.6. Finding information Most web traffic [636, 637] goes to large-scale web search engines [638–642] such as Bing, DuckDuckGo, Google, and Yahoo. This includes searches for scholarly content [643–645]. We recommend Google for electron microscopy queries as it appears to yield the best results for general [646–648], scholarly [644, 645] and other [649] queries.
2.7. Scientific publishing The number of articles published per year in reputable peer-reviewed [693–697] scientific journals [698, 699] has roughly doubled every nine years since the beginning of modern science [700].
3. Electron microscopy An electron microscope is an instrument that uses electrons as a source of illumination to enable the study of small objects. Electron microscopy competes with a large range of alternative techniques for material analysis [732–734], including atomic force microscopy [735–737]; Fourier transformed infrared spectroscopy [738, 739]; nuclear magnetic resonance [740–743]; Raman spectroscopy [744–750]; and x-ray diffraction (XRD) [751, 752], dispersion [753], fluorescence [754, 755], and photoelectron spectroscopy [756, 757].
3.1. Microscopes There are a variety of electron microscopes that use different illumination mechanisms. For example, reflection electron microscopy (REM) [759, 760], SEM [761, 762], STEM [763, 764], scanning tunnelling microscopy [765, 766] (STM), and TEM [767–769].
3.2. Contrast simulation The propagation of electron wavefunctions through electron microscopes can be described by wave optics [136]. Following, the most popular approach to modelling measurement contrast is multislice simulation [853, 854], where an electron wavefunction is iteratively perturbed as it travels through a model of a specimen.
3.3. Automation Most modern electron microscopes support Gatan Microscopy Suite (GMS) Software [894]. GMS enables electron microscopes to be programmed with DigitalMicrograph Scripting, a proprietary Gatan programming language akin to a simplified version of C++.
4. Components Most modern ANNs are configured from a variety of DLF components. To take advantage of hardware accelerators [62], most ANNs are implemented as sequences of parallelizable layers of tensor operations [914]. Layers are often parallelized across data and may be parallelized across other dimensions [915]. This section introduces popular non-linear activation functions, normalization layers, convolutional layers, and skip connections. To add insight, we provide comparative discussion and address some common causes of confusion.
5. Architecture There is a wide variety of ANN architectures [4–7] that are trained to minimize losses for a range of applications. Many of the most popular ANNs are also the simplest, and information about them is readily available. For example, encoder-decoder [305–308, 502–504] or classifier [272] ANNs usually consist of single feedforward sequences of layers that map inputs to outputs. This section introduces more advanced ANNs used in electron microscopy, including actor-critics, GANs, RNNs, and variational autoencoders (VAEs). These ANNs share weights between layers or consist of multiple subnetworks. Other notable architectures include recursive CNNs [1078, 1079], network-in-networks [1141], and transformers [1142, 1143]. Although they will not be detailed in this review, their references may be good starting points for research.
6. Optimization Training, testing, deployment and maintenance of machine learning systems is often time-consuming and expensive [1287–1290]. The first step is usually preparing training data and setting up data pipelines for ANN training and evaluation. Typically, ANN parameters are randomly initialized for optimization by gradient descent, possibly as part of an automatic machine learning (autoML) algorithm. RL is a special optimization case where the loss is a discounted future reward. During training, ANN components are often regularized to stabilize training, accelerate convergence, or improve performance. Finally, trained models can be streamlined for efficient deployment. This section introduces each step. We find that electron microscopists can be apprehensive about robustness and interpretability of ANNs, so we also provide subsections on model evaluation and interpretation.
On the use of deep learning for computational imaging G. BARBASTATHIS, A. OZCAN AND G. SITU, Vol. 6, No. 8 / August 2019 / Optica
Since their inception in the 1930–1960s, the research disciplines of computational imaging and machine learning have followed parallel tracks and, during the last two decades, experienced explosive growth drawing on similar progress in mathematical optimization and computing hardware.
While these developments have always been to the benefit of image interpretation and machine vision, only recently has it become evident that machine learning architectures, and deep neural networks in particular, can be effective for computational image formation, aside from interpretation.
The deep learning approach has proven to be especially attractive when the measurement is noisy and the measurement operator ill posed or uncertain.
Examples reviewed here are: super-resolution; lensless retrieval of phase and complex amplitude from intensity; photon-limited scenes, including ghost imaging; and imaging through scatter.
In this paper, we cast these works in a common framework.
We relate the deep-learning-inspired solutions to the original computational imaging formulation and use the relationship to derive design insights, principles, and caveats of more general applicability.
We also explore how the machine learning process is aided by the physics of imaging when ill posedness and uncertainties become particularly severe.
It is hoped that the present unifying exposition will stimulate further progress in this promising field of research.
1. INTRODUCTION Computational imaging (CI) is a class of imaging systems that, starting from an imperfect physical measurement and prior knowledge about the class of objects or scenes being imaged, deliver estimates of a specific object or scene presented to the imaging system [1–7]. This is shown schematically in Fig. 1.
The specific architecture of interest here is based on the neural network (NN), a multilayered computational geometry. Each layer is composed of simple nonlinear processing units, also referred to as activation units (or elements); and each unit receives its inputs as weighted sums from the previous layer (except the very first layer, whose inputs are the quantities we wish the NN to process). Until about two decades ago, students were advised to design NNs with up to three layers: the input layer, the hidden layer, and the output layer. Recent progress in ML has demonstrated the superiority of architectures with many more than three layers, referred to as deep NNs (DNNs) [14–17]. Figure 2 is a simplified schematic diagram of the multi-layered DNN architecture.
During the past few years, a number of researchers have shown convincingly that the ML formulation is not only computationally efficient, but it also yields high-quality solutions in several CI problems. In this approach, shown in Fig. 3, the raw intensity image is fed into a computational engine specifically incorporating ML components, i.e., multilayered structures as in Fig. 2 and trained from examples—taking the place of the generic computational engine in Fig. 1. CI problems so solved have included lensless imaging, imaging through scatter, bandwidth- or sampling-limited imaging (also referred to as “super-resolution”), and extremely noisy imaging, e.g., under the constraint of very low photon counts.
2. OVERVIEW OF COMPUTATIONAL IMAGING A. General Formulation Referring to Fig. 1, let f denote the object or scene that the imaging system’s user wishes to retrieve. To avoid complications that are beyond the scope of this review, we will assume that even though objects are generally continuous, a discrete representation suffices [31–33]. Therefore, f is a vector or matrix matching the spatial dimension where the object is sampled. Light–object interaction is denoted by the illumination operator Hi, whereas the collection operator Hc models propagation through the rest of the optical system.
The output of the collection optics is the optical intensity g, sampled and digitized at the output (camera) plane. After aggregating the illumination and collection models into the forward operator H = HcHi, the noiseless measurement model is g = Hf. (1) Since the measurements are by necessity discrete, g is arranged into a matrix of the appropriate dimension or rastered into a one-dimensional vector. For a single raw intensity image, g may be up to two dimensional; however, if scanning is involved (as, e.g., in computed tomography, where multiple projections are obtained with the object rotated to various angles), then g must be augmented accordingly.
Uncertainty in the measurements and/or the forward operator is the main challenge in inverse problems. Typically, an optical measurement is subject to signal-dependent Poisson statistics due to the random arrival of signal photons, and additive signal-independent statistics due to thermal electrons in the detector circuitry. Thus, the deterministic model (1) should be replaced by g = P{Hf} + T, (2) where P generates a Poisson random process with arrival rate equal to its argument, and T is the thermal random process, often modeled as additive white Gaussian noise (AWGN). In realistic sensors, noise may originate from multiple causes, such as environmental disturbances. For large photon counts, signal quantization is also modeled as AWGN.
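Measurement model (2) is straightforward to simulate; a minimal numpy sketch, in which the forward operator, photon budget, and thermal noise level are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 64
f = rng.random(n)                                   # ground-truth object (illustrative)
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # illustrative forward operator

photons = 1000.0        # mean photon budget scaling the Poisson rate
sigma_thermal = 2.0     # AWGN standard deviation for the thermal term T

rate = np.clip(photons * (H @ f), 0.0, None)        # Poisson arrival rate for P{Hf}
g = rng.poisson(rate) + rng.normal(0.0, sigma_thermal, n)   # g = P{Hf} + T
```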
B. Linear Inverse Problems, Regularization, and Sparsity For linear forward operators H, the image is obtained by minimizing the Tikhonov [3,4] functional fˆ = argmin_f { ||g − Hf||² + α Φ(f) }, (3)
where || · || denotes the L2 norm. The first term expresses fitness, i.e., matching the measurement to the forward model for the assumed object in the least-squares sense. The fitness term is constructed for AWGN errors, even though it is often used with more general noise models (2). The regularization parameter α expresses our relative belief in the measurement fitness versus our prior knowledge. Setting α = 0 to obtain the image from the fitness term alone yields only the pseudo-inverse solution, or its Moore–Penrose improvement [59,60]; the results are often prone to artifacts and seldom satisfactory, due to ill posedness in the forward operator H. To improve on this, the second, regularizing term Φ(f) is meant to compete with the fitness term by driving the estimate fˆ to also match prior knowledge about the class of objects being imaged.
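For the simplest regularizer Φ(f) = ||f||² (plain ridge regularization, an assumed special case; the paper goes on to discuss sparsity-promoting priors), the minimizer of (3) has the closed form fˆ = (HᵀH + αI)⁻¹Hᵀg, sketched below:

```python
import numpy as np

def tikhonov_solve(H, g, alpha):
    """Minimize ||g - H f||^2 + alpha * ||f||^2, the ridge case of (3)."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + alpha * np.eye(n), H.T @ g)

rng = np.random.default_rng(1)
n = 64
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # illustrative ill-conditioned H
f_true = rng.random(n)
g = H @ f_true + rng.normal(0.0, 0.05, n)           # AWGN approximation of (2)
f_hat = tikhonov_solve(H, g, alpha=0.1)             # alpha = 0 recovers the pseudo-inverse
```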
3. OVERVIEW OF NEURAL NETWORKS A. Neural Network Fundamentals
Classification tasks generally produce representations of much lower dimension than that of the input images; therefore, the width decreases progressively toward the output, following the contracting architecture in Fig. 4(a).
Up-sampling tasks, as in the image super-resolution examples that we discuss in Section 4.A, require output dimension larger than the input, so expanding architectures such as Fig. 4(b) may be considered.
The concatenation of the two is the encoder–decoder architecture in Fig. 4(c). The unit widths progressively decrease, forming a compressed (encoded) representation of the input near the waist of the structure, and then progressively increase again to produce the final reconstructed (decoded) image. In the encoder–decoder structure, skip connections are also used to transfer information directly between layers of the same width, bypassing the encoded channels.
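A minimal PyTorch sketch of the encoder–decoder idea with a single skip connection; layer counts and widths are illustrative and far smaller than in real CI networks:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder with one skip connection (illustrative)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.MaxPool2d(2))              # contract to the waist
        self.dec = nn.Sequential(nn.Upsample(scale_factor=2),  # expand back
                                 nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(17, 1, 1)  # 16 decoded channels + 1 skipped input channel

    def forward(self, x):
        z = self.enc(x)               # compressed (encoded) representation
        y = self.dec(z)               # decoded back to input resolution
        y = torch.cat([y, x], dim=1)  # skip connection bypasses the encoded waist
        return self.out(y)

net = TinyEncoderDecoder()
print(net(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 1, 32, 32])
```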
B. Training and Testing Neural Networks The power of NNs to perform demanding computational tasks is drawn from the complex connectivity between very simple activation units. The training process determines the connectivity from examples, and can be supervised or unsupervised. The supervised mode has generally been used for CI tasks, though unsupervised training has also been proposed [111–113]. After training, performance is evaluated from test examples that were never presented during training.
The supervised training mode requires examples of inputs u and the corresponding precisely known outputs v˜. In practice, one starts from a database of available examples and splits them into training examples, validation examples, and test examples. The training examples are used to specify the network weights; the validation examples are used to determine when to stop training; and the test examples are never to be used during the training process, only to evaluate it.
Even if the test metric is the same as the training metric, generally the two do not evolve in the same way during training. Recall that test examples are not supposed to be used in any way during training; however, the test error may be monitored and plotted as a function of training epoch t, and typically its evolution compared to the training error is as shown in Fig. 5. The reason test error begins to increase after a certain training duration is that overtraining results in overfitting: network function becomes so specific to the training examples that it damages generalization. It is tempting to use the test error evolution to determine the optimum training duration topt; however, that is not permissible because it would contaminate the integrity of the test examples. This is the reason we set aside the third set of validation examples; their only purpose is to monitor error on them, and stop training just before this validation error starts to increase. Assuming that all three sets of training, test, and validation examples have been drawn so that they are statistically representative of the class of objects of interest, there is a reasonable guarantee that topt for validation and test error will be the same.
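The train/validation/test discipline described above amounts to early stopping on the validation error. A minimal framework-agnostic sketch, where train_epoch and eval_loss are hypothetical callbacks supplied by the user:

```python
def train_with_early_stopping(train_epoch, eval_loss, max_epochs=200, patience=5):
    """Stop when validation loss stops improving; touch the test set only once.
    `train_epoch()` runs one epoch on the training examples;
    `eval_loss(split)` returns the loss on the 'val' or 'test' examples."""
    best_val, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_epoch()
        val = eval_loss("val")
        if val < best_val:
            best_val, best_epoch, waited = val, epoch, 0   # checkpoint here in practice
        else:
            waited += 1
            if waited >= patience:   # validation error has started to rise: t_opt reached
                break
    return best_epoch, eval_loss("test")   # test error reported once, after training
```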
C. Weight Regularization Overtraining and overfitting relate to the complexity of the model being learned vis-à-vis the complexity of the NN. Here, we use the term complexity in the context of degrees of freedom in a computational model [133,134]. For learning models, in particular, model complexity is known as Vapnik–Chervonenkis (VC) dimension [135–138], and it should match the complexity of the computational task. Unfortunately, the VC dimension itself is seldom directly computable except for the simplest ML architectures.
D. Convolutional Neural Networks Certain tasks, such as speech and image processing, are naturally invariant to temporal and spatial shifts, respectively. This may be exploited to regularize the weights through convolutional architectures [146,147]. The convolutional NN (CNN) principle limits the spatial range of influence on the next layer, i.e., the neighborhood that each unit is allowed to influence, and makes the weights spatially repeating.
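The saving from weight sharing is easy to quantify: a convolutional layer reuses one small kernel at every spatial position, whereas a dense layer learns a separate weight for every input–output pair. A minimal comparison with illustrative sizes (biases ignored):

```python
# Parameters needed to map a 64x64 single-channel image to a same-size output.
h = w = 64
dense_params = (h * w) * (h * w)   # fully connected: one weight per pixel pair
conv_params = 3 * 3 * 1 * 1        # one shared 3x3 kernel, 1 input/output channel
print(dense_params, conv_params)   # 16777216 vs 9
```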
E. Training Loss Functions The most obvious TLF choices are the L2 (minimum square error, MSE) and L1 (mean absolute error, MAE) metrics.
F. Physics Priors Unlike abstract classification, e.g., face recognition and customer taste prediction, in CI, the input g and intended output fˆ ≈ f of the NN are related by the known physics of the imaging system, i.e., by the operator H. Physical knowledge should be useful; how then to best incorporate it into an ML engine for imaging?
One possibility is to not incorporate it at all, as depicted in Fig. 10(a).
A compromise is the single-pass ML engine in Fig. 10(d). Here, an approximate inverse operator H* produces the single approximant f [0]. The single DNN is trained to receive f [0] as input and produce the image fˆ as output directly, rather than its projection onto the null space. In practice, the single-pass approach has proven to be robust and reliable even for CI problems with high ill posedness or uncertainty, as we will see in Sections 4.A (super-resolution)–4.C.
4. COMPUTATIONAL IMAGING REALIZATIONS WITH MACHINE LEARNING
The strategy for using ML for computational image formation is broadly described as follows:
(1) Obtain a database of physical realizations of objects and their corresponding raw intensity images through the instrument of interest. For example, such a physical database may be built by using an alternative imaging instrument considered accurate enough to be trusted as ground truth; or by displaying objects from a publicly available abstract database, e.g., ImageNet [178], on a spatial light modulator (SLM) as phase or intensity; or by rigorous simulation of the forward operator and associated noise processes.
(2) Decide on an ML engine, regularization strategy, TLF (training loss function), and physical priors according to the principles of Sections 3.C–3.F, and then train the NN from the training and validation subsets of the database, as described in Section 3.B.
(3) Test the ML engine for generalization by measuring a TLF, the same as in training or a different one, on the test example subset of the database.
A. Super-Resolution
The two-point resolution problem was first posed by Airy [179] and Lord Rayleigh [180]. In modern optical imaging systems, resolution is understood to be limited mainly by two factors: undersampling by the camera, whence super-resolution should be taken to mean upsampling; and blur by the optics or camera motion, in which case super-resolution means deblurring. Both situations or their combination lead to a singular or severely ill-posed inverse problem due to suppression or loss of entire spatial frequency bands; therefore, they have attracted significant research interest, including some of the earliest uses of ML in the CI context.
[179]. G. B. Airy, “On the diffraction of an object-glass with circular aperture,” Trans. Cambridge Philos. Soc. 5, 283–291 (1834). [180]. L. Rayleigh, “Investigations in optics, with special reference to the spectroscope,” Philos. Mag. 8(49), 261–274 (1879).
A comprehensive review of methods for super-resolution in the sense of upsampling, based on a single image, is in [181]. To our knowledge, the first effort to use a DNN in the same context was by Dong et al. [182,183]. The key insight, as with LISTA (Learned Iterative Shrinkage and Thresholding Algorithm) (Section 3.F), was that dictionary-based sparse representations for upsampling [92,93] could equivalently be learned by DNNs. Both approaches similarly start by extracting compressed feature maps and then expanding these maps to a higher sampling rate. The difference is that sparse coding solvers are iterative, whereas, as we also pointed out in Section 1, with the ML approach the iterative scheme takes place during training only; the trained ML engine operates feed-forward and is thus very fast. To combine super-resolution with motion compensation, a spatio-temporal CNN has been proposed, where, rather than single images, the inputs are blocks consisting of multiple frames from video [184].
[92]. J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Proc. 19, 2861–2873 (2010).
[93]. J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Trans. Image Proc. 21, 3467–3478 (2012).
[181]. C.-Y. Yang, C. Ma, and M.-H. Yang, “Single-image super-resolution: a benchmark,” in European Conference on Computer Vision (ECCV)/ Lecture Notes on Computer Science, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds. (2014), Vol. 8692, pp. 372–386.
[182]. C. Dong, C. Loy, K. He, and X. Tang, “Learning a deep convolutional neural network for image super-resolution,” in European Conference on Computer Vision (ECCV)/Lecture Notes on Computer Science Part IV (2014), Vol. 8692, pp. 184–199.
[183]. C. Dong, C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intel. 38, 295–307 (2015).
[184]. J. Caballero, C. Ledig, A. Aitken, A. Acosta, J. Totz, Z. Wang, and W. Shi, “Real-time video super-resolution with spatio-temporal networks and motion compensation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 4778–4787.
The ML approach to the super-resolution problem also served as motivation and testing ground for the perceptual TLF [170,171] (Section 3.E). The structure of the downsampling kernel was exploited in [177] using the cascaded ML engine architecture in Fig. 10(c) with M = 4. Figure 11 is a representative result showing the evolution of the image estimates along the ML cascade, as well as their spatial spectra. It is interesting that, by the final stage, the ML engine has succeeded in both suppressing high-frequency artifacts due to undersampling and boosting low frequency components to make the reconstruction appear smooth.
[170]. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (ECCV)/Lecture Notes on Computer Science, B. Leide, J. Matas, N. Sebe, and M. Welling, eds. (2016), vol. 9906, pp. 694–711. [171]. C. Ledig, L. Theis, F. Huczar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 4681–4690.
[177]. M. Mardani, H. Monajemi, V. Papyan, S. Vasanawala, D. Donoho, and J. Pauly, “Recurrent generative residual networks for proximal learning and automated compressive image recovery,” arXiv:1711.10046 (2017).
Turning to inverse problems dominated by blur, early work [185] used a perceptron network with two hidden layers and a sigmoidal activation function to compensate for static blur caused by Gaussian and rectangular kernels, as well as motion blur [186]. Two years later, Sun et al. [187] showed that a CNN can learn to compensate even when the motion blur kernel is non-uniform across the image. This was accomplished by feeding the CNN with rotated patches containing simple object features, such that the network learned to predict the direction of motion.
[185]. C. J. Schuler, H. Christopher Burger, S. Harmeling, and B. Scholkopf, “A machine learning approach for non-blind image deconvolution,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013). [186]. A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009). [187]. J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015).
In optical microscopy, blur is typically caused by aberrations and diffraction [188]. More than 100 years of research, tracing back to Airy and Rayleigh’s observations, have been oriented toward modifying the optical hardware—in our language, designing the illumination and collection operators—to compensate for the blur and obtain sharp images of objects down to sub-micrometer size. A thorough review of this literature is beyond the present scope; we just point out the culmination of optical super-resolution methods with the 2014 Nobel Prize in Chemistry [189–192]. Stochastic optical reconstruction microscopy (STORM) and fluorescence photoactivation localization microscopy (PALM) for single-molecule imaging [193,194] and localization [195] are examples of co-designing the illumination operator Hi and the computational inverse to achieve performance vastly better than an unaided microscope could do.
[188]. M. Sarikaya, “Evolution of resolution in microscopy,” Ultramicroscopy 47, 1–14 (1992). [189]. W. E. Moerner and L. Kador, “Optical detection and spectroscopy of single molecules in a solid,” Phys. Rev. Lett. 62, 2535–2538 (1989). [190]. S. W. Hell and J. Wichmann, “Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy,” Opt. Lett. 19, 780–782 (1994). [191]. E. Betzig, “Proposed method for molecular optical imaging,” Opt. Lett. 20, 237–239 (1995). [192]. R. M. Dickson, A. B. Cubitt, R. Y. Tsien, and W. E. Moerner, “On/off blinking and switching behaviour of single molecules of green fluorescent protein,” Nature 388, 355–358 (1997). [193]. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods 3, 793–796 (2006). [194]. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwarz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313, 1642–1645 (2006). [195]. S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophys. J. 91, 4258–4272 (2006).
Computationally, the blur kernel can be compensated for through iterative blind deconvolution [196,197] or learned from examples [198]. A DNN-based solution to the inverse problem was proposed for the first time, to our knowledge, by Rivenson et al. [199] in a wide-field microscope. The approach and results are summarized in Fig. 12. For training, the samples were imaged twice, once with a 40×/0.95 NA objective lens and again with a 100×/1.4 NA objective lens. The training goal was that, with the 40×/0.95 NA raw images as input g, the DNN would produce estimates fˆ matching the 100×/1.4 NA images, i.e., the latter were taken to approximate the true objects f. The number of pixels in the high-resolution images was (2.5)^2 times the number of pixels in the low-resolution representation. Of course, the low-resolution images were also subject to stronger blur due to the lower-NA objective lens. Therefore, the inverse algorithm had to perform both upsampling and deblurring in this case. The ML engine was of the end-to-end type, as in Fig. 10(a), implemented as a convolutional DNN with pyramidal progression for upsampling. The TLF was a mixture of the MSE metric (23) and a TV-like penalty [Eq. (6)]. Since then, ML has been shown to improve the resolution of fluorescence microscopy [200], as well as single-molecule STORM imaging [201] and 3D localization [202].
[196]. T. G. Stockham, T. M. Cannon, and R. B. Ingebretsen, “Blind deconvolution through digital signal processing,” Proc. IEEE 63, 678–692 (1975). [197]. G. R. Ayers and J. C. Dainty, “Iterative blind deconvolution method and its applications,” Opt. Lett. 13, 547–549 (1988). [198]. T. Kenig, Z. Kam, and A. Feuer, “Blind image deconvolution using machine learning for three-dimensional microscopy,” IEEE Trans. Pattern Anal. Mach. Intel. 32, 2191–2204 (2010). [199]. Y. Rivenson, Z. Gorocs, H. Gunaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4, 1437–1443 (2017).
[200]. H. Wang, Y. Rivenson, Z. Wei, H. Gunaydin, L. Bentolila, and A. Ozcan, “Deep learning achieves super-resolution in fluorescence microscopy,” Nat. Methods (2018).
[201]. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5, 458–464 (2018).
[202]. N. Boyd, E. Jonas, H. P. Babcock, and B. Recht, “DeepLoco: fast 3D localization microscopy using neural networks,” bioRxiv.
B. Quantitative Phase Retrieval and Lensless Imaging
The forward operator relating the complex amplitude of an object to the raw intensity image at the exit plane of an optical system is nonlinear. Classical iterative solutions are the Gerchberg–Saxton algorithm [203,204]; the input–output algorithm, originally proposed by Fienup [205] and subsequent variants [206–208]; and the gradient descent [209] or its variants, steepest descent and conjugate gradient [210]. This inverse problem has attracted considerable attention because of its importance in retrieving the shape or optical density of transparent samples with visible light [211,212] and x rays [213,214].
[203]. R. W. Gerchberg and W. O. Saxton, “Phase determination from image and diffraction plane pictures in electron-microscope,” Optik 34, 275–284 (1971). [204]. R. W. Gerchberg and W. O. Saxton, “Practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972). [205]. J. R. Fienup, “Reconstruction of an object from the modulus of its Fourier transform,” Opt. Lett. 3, 27–29 (1978). [206]. J. Fienup and C. Wackerman, “Phase-retrieval stagnation problems and solutions,” J. Opt. Soc. Am. A 3, 1897–1907 (1986). [207]. H. H. Bauschke, P. L. Combettes, and D. R. Luke, “Phase retrieval, error reduction algorithm, and fienup variants: a view from convex optimization,” J. Opt. Soc. Am. A 19, 1334–1345 (2002). [208]. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20, 40–55 (2003). [209]. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21, 2758–2769 (1982). [210]. M. R. Hestenes and E. Stiefel, “Method of conjugate gradients for solving linear systems,” J. Res. Natl. Bur. Stand. 49, 409–436 (1952). [211]. P. Marquet, B. Rappaz, P. J. Magistretti, E. Cuche, Y. Emery, T. Colomb, and C. Depeursinge, “Digital holographic microscopy: a noninvasive contrast imaging technique allowing quantitative visualization of living cells with subwavelength axial accuracy,” Opt. Lett. 30, 468–470 (2005). [212]. G. Popescu, T. Ikeda, R. R. Dasari, and M. S. Feld, “Diffraction phase microscopy for quantifying cell structure and dynamics,” Opt. Lett. 31, 775–777 (2006). [213]. S. C. Mayo, T. J. Davis, T. E. Gureyev, P. R. Miller, D. Paganin, A. Pogany, A. W. Stevenson, and S. W. Wilkins, “X-ray phase-contrast microscopy and microtomography,” Opt. Express 11, 2289–2302 (2003). [214]. F. Pfeiffer, T. Weitkamp, O. Bunk, and C. David, “Phase retrieval and differential phase-contrast imaging with low-brilliance x-ray sources,” Nat. Phys. 2, 258–261 (2006).
I don't quite understand this, but let's move on.
In the case of weak scattering, the problem may be linearized through a quasi-hydrodynamic approximation leading to the transport of intensity equation (TIE) formulation [215,216]. Alternatively, if a reference beam is provided in the optical system, the measurement may be interpreted as a digital hologram [217], and the object may be reconstructed by a computational backpropagation algorithm [218,219] (not to be confused with the back-propagation algorithm for NN training, Section 3.B.) Ptychography captures measurements effectively in the phase (Wigner) space, where the problem is linearized, by modulating the illumination with a quadratic phase and structuring it so that it is confined and scanned in either space [220–224] or angle [225–227]. Due to the difficulty of the phase retrieval inverse problem, compressive priors have often been used to regularize it in its various linear forms, including digital holography [228,229], TIE [82,230], and Wigner deconvolution ptychography [231,232].
[215]. M. R. Teague, “Deterministic phase retrieval: a Green’s function solution,” J. Opt. Soc. Am. 73, 1434–1441 (1983). [216]. N. Streibl, “Phase imaging by the transport-equation of intensity,” Opt. Commun. 49, 6–10 (1984). [217]. J. W. Goodman and R. Lawrence, “Digital image formation from electronically detected holograms,” Appl. Phys. Lett. 11, 77–79 (1967). [218]. W. Xu, M. H. Jericho, I. A. Meinertzhagen, and H. J. Kreuzer, “Digital inline holography for biological applications,” Proc. Nat. Acad. Sci. USA 98, 11301–11305 (2001). [219]. J. H. Milgram and W. Li, “Computational reconstruction of images from holograms,” Appl. Opt. 41, 853–864 (2002). [220]. S. L. Friedman and J. M. Rodenburg, “Optical demonstration of a new principle of far-field microscopy,” J. Phys. D 25, 147–154 (1992). [221]. B. C. McCallum and J. M. Rodenburg, “Two-dimensional demonstration of Wigner phase-retrieval microscopy in the STEM configuration,” Ultramicroscopy 45, 371–380 (1992). [222]. J. M. Rodenburg and R. H. T. Bates, “The theory of super-resolution electron microscopy via Wigner-distribution deconvolution,” Philos. Trans. R. Soc. London A 339, 521–553 (1992). [223]. A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy 109, 1256–1262 (2009). [224]. P. Li, T. B. Edo, and J. M. Rodenburg, “Ptychographic inversion via wigner distribution deconvolution: noise suppression and probe design,” Ultramicroscopy 147, 106–113 (2014). [225]. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7, 739–745 (2013). [226]. X. Ou, R. Horstmeyer, and C. Yang, “Quantitative phase imaging via Fourier ptychographic microscopy,” Opt. Lett. 38, 4845–4848 (2013). [227]. R. Horstmeyer, “A phase space model for Fourier ptychographic microscopy,” Opt. Express 22, 338–358 (2014). [228]. D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive holography,” Opt. Express 17, 13040–13049 (2009). [229]. Y. Rivenson, A. Stern, and B. Javidi, “Compressive Fresnel holography,” J. Disp. Technol. 6, 506–509 (2010). [230]. A. Pan, L. Xu, J. C. Petruccelli, R. Gupta, B. Singh, and G. Barbastathis, “Contrast enhancement in x-ray phase contrast tomography,” Opt. Express 22, 18020–18026 (2014). [231]. Y. Zhang, W. Jiang, L. Tian, L. Waller, and Q. Dai, “Self-learning based Fourier ptychographic microscopy,” Opt. Express 23, 18471–18486 (2015). Review Article Vol. 6, No. 8 / August 2019 / Optica 941 [232]. J. Lee and G. Barbastathis, “Denoised Wigner distribution deconvolution via low-rank matrix completion,” Opt. Express 24, 20069–20079 (2016).
When the linearization assumptions do not apply or regularization priors are not explicitly available, an ML engine may instead be applied directly to the nonlinear inverse problem. To our knowledge, this was first attempted by Sinha et al. with binary pure-phase objects [233], and subsequently with multi-level pure-phase objects [234]. Representative results are shown in Fig. 13. The phase objects were displayed on a reflective SLM (spatial light modulator), and the light propagated in free space until intensity sampling by the camera. The ML engine, of the end-to-end type [Fig. 10(a)], was a convolutional DNN with residuals. Training was carried out by drawing objects from the standard databases Faces-LFW and ImageNet, converting each object's grayscale intensity to a phase signal in the range (0, π), and then displaying that signal on the SLM. Because of the relatively large range of phase modulation, linearizing assumptions would have been invalid in this arrangement.
[233]. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” arXiv:1702.08516 (2017). [234]. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4, 1117–1125 (2017). Abstract : Deep learning has been proven to yield reliably generalizable solutions to numerous classification and decision tasks. Here, we demonstrate for the first time to our knowledge that deep neural networks (DNNs) can be trained to solve end-to-end inverse problems in computational imaging. We experimentally built and tested a lensless imaging system where a DNN was trained to recover phase objects given their propagated intensity diffraction patterns.
Retrieval of the complex amplitude, i.e., of both the magnitude and phase, of biological samples using ML in the digital holography (DH) arrangement was reported by Rivenson et al. [240]; see Fig. 14. The samples used in the experiments were from breast tissue, Papanicolaou (Pap) smears, and blood smears. In this case, the ML engine used a single-pass physics-informed preprocessor, as in Fig. 10(d), with the approximant H* implemented as the (optical) backpropagation algorithm. The DNN was of the convolutional type. Training was carried out using up to eight holograms to produce accurate estimates of the samples' phase profiles. After training, the ML engine was able, with a single hologram as input, to match the imaging quality, in terms of SSIM (structural similarity index measure; Section 3.E), of traditional algorithms that would have required two to three times as many holograms, and it was also faster by a factor of three to four.
[240]. Y. Rivenson, Y. Zhang, H. Gunaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light Sci. Appl. 7, 17141 (2018).
Abstract : Phase recovery from intensity-only measurements forms the heart of coherent imaging techniques and holography. In this study, we demonstrate that a neural network can learn to perform phase recovery and holographic image reconstruction after appropriate training. This deep learning-based approach provides an entirely new framework to conduct holographic imaging by rapidly eliminating twin-image and self-interference-related spatial artifacts. This neural network-based method is fast to compute and reconstructs phase and amplitude images of the objects using only one hologram, requiring fewer measurements in addition to being computationally faster. We validated this method by reconstructing the phase and amplitude images of various samples, including blood and Pap smears and tissue sections. These results highlight that challenging problems in imaging science can be overcome through machine learning, providing new avenues to design powerful computational imaging systems.
[256]. M. Deng, S. Li, and G. Barbastathis, “Learning to synthesize: splitting and recombining low and high spatial frequencies for image recovery,” arXiv:1811.07945 (2018).
C. Imaging of Dark Scenes
The challenges associated with super-resolution and phase retrieval are greatly exacerbated when the photon budget is tight or other sources of noise are strong. This is because deconvolutions in general tend to amplify noise artifacts [5]. In standard photography, histogram equalization and gamma correction are automatically applied by modern high-end digital cameras and even in smartphones; however, “grainy” images and color distortion still occur. In more challenging situations, a variety of more sophisticated denoising algorithms utilizing compressed sensing and local feature representations have been investigated and benchmarked [257–262]. What these algorithms exploit, with varying success, is the prior that natural images have a strong correlation structure, which should persist even under noise fluctuations that far exceed the signal. Understood in this sense, ML presents itself as an attractive option to learn the correlation structures and then recover high-resolution content from the noisy raw images.
The first use of a CNN for monochrome Poisson denoising, to our knowledge, was by Remez et al. [263]. More recently, a convolutional network of the U-net type was trained to operate on all three color channels under illumination and exposure conditions that, to the naked eye, make the raw images appear entirely dark while histogram- and gamma-corrected reconstructions are severely color distorted [169]; see Fig. 17. The authors created a see-in-the-dark (SID) dataset of short-exposure images, coupled with their respective long-exposure images, for training and testing; and used Amazon’s Mechanical Turk platform for perceptual image evaluation by humans [168]. They also report that, unlike other related works, neither skip connections in U-net nor generative adversarial training led to any improvement in their reconstructions.
[169]. C. Chen, Q. Chen, J. Xu, and V. Koltun, “Learning to see in the dark,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 3291–3300.
Lyu et al. [279] used deep learning with the single-pass physics-informed engine [Fig. 10(d)] and the approximant H* computed according to the original computational ghost imaging scheme [273]. Due to the low sampling rate and the noisy nature of the raw measurements, the approximant reconstructions f^[0] were corrupted and unrecognizable. However, when these f^[0] were used as input to the DNN, high-quality final estimates fˆ were obtained even with sampling rates β as low as 5%, as shown in Fig. 19.
[273]. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78, 061802 (2008).
[279]. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7, 17865 (2017).
D. Imaging in the Presence of Strong Scattering
Imaging through diffuse media [280,281] is a classical, challenging inverse problem with significant practical applications, ranging from non-invasive medical imaging through tissue to autonomous navigation of vehicles in foggy conditions. The noisy statistical inverse model formulation (2) must now be reinterpreted with the forward operator H itself becoming random. When f is the index of refraction of the strongly scattering medium itself, H is also nonlinear. Not surprisingly, this topic has attracted considerable attention in the literature, with most attempts generally belonging to one of two categories. The first is to characterize the diffuse medium, assuming it is accessible and static, through (incomplete) measurement of the operator H, which in this context is referred to as the transmission matrix [282–284]. The alternative is to characterize statistical similarities between moments of H. The second-order moment, or speckle correlation, is known as the memory effect. The idea originated in the context of electron propagation in disordered conductors [285] and of course is also valid for the analogous problem of optical disordered media [286–290].
Deep learning solutions to the problem were first presented in [299] and [155], using end-to-end fully connected and residual convolutional (CNN) architectures, respectively. Results are shown in Figs. 22 and 23. The fully connected solution [299] is motivated by the physical fact that when light propagates through a strongly scattering medium, every object pixel influences every raw image pixel in a shift non-invariant fashion. However, the large number of connections creates risks of undertraining and overfitting, and limits the space-bandwidth product (SBP) of the reconstructions due to limited computational resources. On the other hand, the CNN trained with the NPCC loss function [155,300], despite being designed for situations where the constraints of limited range of influence and shift invariance are valid (Section 3.D), does a surprisingly good job at learning shift variance (presumably through the ReLU nonlinearities and pooling operations) and achieves a larger SBP. Both methods work well with spatially sparse datasets, e.g., handwritten numerical digits and Latin and Chinese characters. Compared to Horisaki et al. [18], the deep architectures perform comparably well with spatially dense datasets of restricted content, e.g., faces, and also hallucinate when tested outside their learned priors.
Non-line-of-sight (NLOS) imaging, recognition, and tracking belong to a related class of problems, because capturing details about objects in such cases must rely on scattering, typically of light pulses [301–309] or spatially incoherent light [310–313]. Convolutional DNNs have been found to be useful for improving gesture classification [314], and person identification and three-dimensional localization [315]; in the latter case even with only a single-photon, single-pixel detector.
5. CONCLUDING REMARKS
The diverse collection of ML flavors adopted and problems tackled by the CI community in a relatively brief time period, mostly since ∼2010 [104], indicates that the basic idea of having a DNN do at least part of the job of Tikhonov–Wiener optimization holds much promise. A significant increase in the rate of related publications is evident (we had trouble keeping up while crafting the present review) and is likely to accelerate, at least in the near future. As we saw in Section 4, in many cases ML algorithms have been discovered to offer new insights or substantial performance improvements over previous CI approaches, mostly compressive-sensing based, whereas in other cases particular challenges associated with acute CI problems have prompted innovations in ML architectures themselves. This productive interplay is likely to benefit both disciplines in the long run, especially because of the strong connection they share through optimization theory and practice.
(The latter half is omitted.)
AtomAI: A Deep Learning Framework for Analysis of Image and Spectroscopy Data in (Scanning) Transmission Electron Microscopy and Beyond
Maxim Ziatdinov, Ayana Ghosh, Tommy Wong and Sergei V. Kalinin
AtomAI is an open-source software package bridging instrument-specific Python libraries, deep learning, and simulation tools into a single ecosystem. AtomAI allows direct application of deep convolutional neural networks for atomic and mesoscopic image segmentation, converting image and spectroscopy data into class-based local descriptors for downstream tasks such as statistical and graph analysis. For atomically resolved imaging data, the output is the types and positions of atomic species, with an option for subsequent refinement. AtomAI further allows the implementation of a broad range of image and spectrum analysis functions, including invariant variational autoencoders (VAEs). The latter consist of VAEs with rotational and (optionally) translational invariance for unsupervised and class-conditioned disentanglement of categorical and continuous data representations. In addition, AtomAI provides utilities for mapping structure-property relationships via im2spec and spec2im types of encoder-decoder models. Finally, AtomAI allows seamless connection to first-principles modeling with a Python interface, including molecular dynamics and density functional theory calculations on the inferred atomic positions. While the majority of applications to date have been based on atomically resolved electron microscopy, the flexibility of AtomAI allows straightforward extension towards the analysis of mesoscopic imaging data once the labels and feature identification workflows are established/available. The source code and example notebooks are available at https://github.com/pycroscopy/atomai.
Jones R R, Hooper D C, Zhang L, Wolverson D and Valev V K 2019 "Raman techniques: fundamentals and frontiers," Nanoscale Res. Lett. 14, 1–34
Raman spectroscopy is now an eminent technique for the characterisation of 2D materials (e.g. graphene [8–10] and transition metal dichalcogenides [11–13]) and phonon modes in crystals [14–16]. Properties such as number of monolayers [9, 12, 17, 18], inter-layer breathing and shear modes [19], in-plane anisotropy [20], doping [21–23], disorder [10, 24–26], thermal conductivity [11], strain [27] and phonon modes [14, 16, 28] can be extracted using Raman spectroscopy.
Denoising of stimulated Raman scattering microscopy images via deep learning, B. Manifold, E. Thomas, A. T. Francis, A. H. Hill, and Dan Fu, Vol. 10, No. 8, 1 Aug 2019, Biomedical Optics Express 3861
Abstract: Stimulated Raman scattering (SRS) microscopy is a label-free quantitative chemical imaging technique that has demonstrated great utility in biomedical imaging applications ranging from real-time stain-free histopathology to live animal imaging. However, similar to many other nonlinear optical imaging techniques, SRS images often suffer from low signal to noise ratio (SNR) due to absorption and scattering of light in tissue as well as the limitation in applicable power to minimize photodamage. We present the use of a deep learning algorithm to significantly improve the SNR of SRS images. Our algorithm is based on a U-Net convolutional neural network (CNN) and significantly outperforms existing denoising algorithms. More importantly, we demonstrate that the trained denoising algorithm is applicable to images acquired at different zoom, imaging power, imaging depth, and imaging geometries that are not included in the training. Our results identify deep learning as a powerful denoising tool for biomedical imaging at large, with potential towards in vivo applications, where imaging parameters are often variable and ground-truth images are not available to create a fully supervised learning training set.
Rapid histology of laryngeal squamous cell carcinoma with deep-learning based stimulated Raman scattering microscopy, Lili Zhang et al., Theranostics 2019, Vol. 9, Issue 9 2541
Abstract Maximal resection of tumor while preserving the adjacent healthy tissue is particularly important for larynx surgery, hence precise and rapid intraoperative histology of laryngeal tissue is crucial for providing optimal surgical outcomes. We hypothesized that deep-learning based stimulated Raman scattering (SRS) microscopy could provide automated and accurate diagnosis of laryngeal squamous cell carcinoma on fresh, unprocessed surgical specimens without fixation, sectioning or staining. Methods: We first compared 80 pairs of adjacent frozen sections imaged with SRS and standard hematoxylin and eosin histology to evaluate their concordance. We then applied SRS imaging on fresh surgical tissues from 45 patients to reveal key diagnostic features, based on which we constructed a deep-learning based model to generate automated histologic results. 18,750 SRS fields of view were used to train and cross-validate our 34-layer residual convolutional neural network, which was used to classify 33 untrained fresh larynx surgical samples into normal and neoplasia. Furthermore, we simulated intraoperative evaluation of resection margins on totally removed larynxes. Results: We demonstrated near-perfect diagnostic concordance (Cohen's kappa, κ > 0.90) between SRS and standard histology as evaluated by three pathologists. Deep-learning based SRS correctly classified 33 independent surgical specimens with 100% accuracy. We also demonstrated that our method could identify tissue neoplasia at simulated resection margins that appear grossly normal to the naked eye. Conclusion: Our results indicate that SRS histology integrated with a deep learning algorithm has the potential to deliver rapid intraoperative diagnosis that could aid the surgical management of laryngeal cancer.
European Gravitational Observatory - EGO: 17 teams, 3 months to go
A new competition has started.
G2Net Gravitational Wave Detection Find gravitational wave signals from binary black hole collisions
Observation of Gravitational Waves from a Binary Black Hole Merger B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration)
On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory simultaneously observed a transient gravitational-wave signal. The signal sweeps upwards in frequency from 35 to 250 Hz with a peak gravitational-wave strain of 1.0 × 10^−21. It matches the waveform predicted by general relativity for the inspiral and merger of a pair of black holes and the ringdown of the resulting single black hole. The signal was observed with a matched-filter signal-to-noise ratio of 24 and a false alarm rate estimated to be less than 1 event per 203 000 years, equivalent to a significance greater than 5.1σ. The source lies at a luminosity distance of 410 (+160/−180) Mpc corresponding to a redshift z = 0.09 (+0.03/−0.04). In the source frame, the initial black hole masses are 36 (+5/−4) M⊙ and 29 (+4/−4) M⊙, and the final black hole mass is 62 (+4/−4) M⊙, with 3.0 (±0.5) M⊙c^2 radiated in gravitational waves. All uncertainties define 90% credible intervals. These observations demonstrate the existence of binary stellar-mass black hole systems. This is the first direct detection of gravitational waves and the first observation of a binary black hole merger.
Virgo is part of a scientific collaboration of laboratories from six countries: Italy, France, the Netherlands, Poland, Hungary, and Spain. Other large interferometers similar to Virgo, including the two LIGO interferometers at the Hanford Site in Washington State and in Livingston, Louisiana, share the same goal of detecting gravitational waves. Since 2007, Virgo and LIGO have agreed to share the data recorded by their detectors, analyze them jointly, and publish the results together [1]. Because interferometric detectors are not directional (they survey the whole sky) and search for signals from weak, infrequent, one-off events, simultaneous detection of a gravitational wave by multiple interferometers is needed to confirm a signal's validity and to estimate the direction of its source. (from Wikipedia)
July 2 (Fri)
European Gravitational Observatory - EGO: 45 teams, 3 months to go
European Gravitational Observatory - EGO: 110 teams, 3 months to go
In this competition, you’ll aim to detect GW signals from the mergers of binary black holes. Specifically, you'll build a model to analyze simulated GW time-series data from a network of Earth-based detectors.
In that it involves analyzing spectra created by simulation, this resembles the currently suspended competition "SETI Breakthrough Listen - E.T. Signal Search, Find extraterrestrial signals in data from deep space". The difference is that gravitational waves have been detected, whereas signals from extraterrestrial intelligence have not. Gravitational waves have a theoretical basis; signals from extraterrestrial intelligence (presumably) do not.
The parameters that determine the exact form of a binary black hole waveform are the masses, sky location, distance, black hole spins, binary orientation angle, gravitational wave polarisation, time of arrival, and phase at coalescence (merger). These parameters (15 in total) have been randomised according to astrophysically motivated prior distributions and used to generate the simulated signals present in the data, but are not provided as part of the competition data.
Gravitational wave denoising of binary black hole mergers with deep learning
Wei Wei and E. A. Huerta, Physics Letters B 800 (2020) 135081. Gravitational wave detection requires an in-depth understanding of the physical properties of gravitational wave signals, and the noise from which they are extracted. Understanding the statistical properties of noise is a complex endeavor, particularly in realistic detection scenarios. In this article we demonstrate that deep learning can handle the non-Gaussian and non-stationary nature of gravitational wave data, and showcase its application to denoise the gravitational wave signals generated by the binary black hole mergers GW150914, GW170104, GW170608 and GW170814 from advanced LIGO noise. To exhibit the accuracy of this methodology, we compute the overlap between the time-series signals produced by our denoising algorithm and the numerical relativity templates that are expected to describe these gravitational wave sources, finding overlaps O ≳ 0.99. We also show that our deep learning algorithm is capable of removing noise anomalies from numerical relativity signals that we inject in real advanced LIGO data. We discuss the implications of these results for the characterization of gravitational wave signals.
Analytical study of relativistic two-body dynamics then advanced further, and in 2005 (Pretorius; Campanelli et al.; Baker et al.) calculations of the gravitational waves generated by the merger of two black holes were reported. According to Pretorius's calculation, roughly 5% of the mass is lost in the merger and radiated as energy.
The situation in Japan: KAGRA, the successor to TAMA 300, began development in 2010; its progress was published in 2013 (Phys. Rev. D 88, 043007), which stated that operation was planned for 2017: "The construction of KAGRA started in 2010 and it is planned to start the operation of the detector at its full configuration in 2017." It was thus too late for the great discovery of 2015.
Interferometric detectors are not directional (they survey the whole sky), and they search for signals from weak, infrequent, one-off events, so simultaneous detection of a gravitational wave by multiple interferometers is needed to confirm a signal's validity and to estimate the direction of its source (from Wikipedia). This may be relevant to the competition. Since H1 and L1 have the same characteristics, if a GW has been detected, the GW signal strength (the size of the strain) should be the same in both, with the phase offset by about 6 ms. Virgo is less sensitive than LIGO, so its spectrum will be noisier, but a GW should still be detected with some additional phase offset. Turning these spectra into images and classifying them with a CNN seems promising, but how best should the three spectra be presented to the model? One possible encoding is sketched below.
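One possibility, purely a guess on my part: treat the three detectors' time-frequency images as the three channels of a single image, so a standard CNN can compare amplitudes and relative time offsets across sites. A sketch with placeholder arrays and an off-the-shelf ResNet as a stand-in classifier:

import numpy as np
import torch
import torchvision

# Hypothetical: spec_h1, spec_l1, spec_virgo are 2-D time-frequency images
# (e.g. CQT spectrograms) computed from the three detectors' strain series.
spec_h1 = np.random.rand(64, 128).astype(np.float32)
spec_l1 = np.random.rand(64, 128).astype(np.float32)
spec_virgo = np.random.rand(64, 128).astype(np.float32)

# Stack the three sites as the three channels of one "image".
x = torch.from_numpy(np.stack([spec_h1, spec_l1, spec_virgo]))  # (3, 64, 128)
x = x.unsqueeze(0)                                              # add batch dim

model = torchvision.models.resnet18(num_classes=1)  # binary GW / no-GW logit
prob = torch.sigmoid(model(x))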
Two methods were used to search for gravitational-wave signals: one filtered with the waveforms predicted by general relativity; the other searched for generic transient signals using waveforms constructed under minimal assumptions. Searching independently, both methods found a strong signal from a binary black hole merger. Moreover, the results at the two observation sites agreed, with a time difference corresponding to the distance between the sites.
Your final score may not be based on the same exact subset of data as the public leaderboard, but rather a different private data subset of your full submission — your public score is only a rough indication of what your final score is.
You should thus choose submissions that will most likely be best overall, and not necessarily on the public subset.
How to access LIGO data
How to do some basic signal processing
Data visualization of LIGO data in time-frequency plots
Matched filtering to extract a known signal
July 11 (Sun)
European Gravitational Observatory - EGO: 220 teams, 3 months to go
Several of the public notebooks use CQT1992v2, so I looked into the CQT. A 2020 paper: nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks:
I. INTRODUCTION
Spectrograms, as time-frequency representations of audio signals, have been used as input for neural network models since the 1980s [1–3]. Different types of spectrograms are tailored to different applications. For example, Mel spectrograms and Mel frequency cepstral coefficients (MFCCs) are designed for speech-related applications [4, 5], and the constant-Q transformation is best for music-related applications [6, 7]. Despite recent advances in end-to-end learning in the audio domain, such as WaveNet [8] and SampleCNN [9], which make model training on raw audio data possible, many recent publications still use spectrograms as the input to their models for various applications [10]. (Note: CQT is short for constant-Q transform.)
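A minimal sketch of how the public notebooks appear to use CQT1992v2, assuming nnAudio's nnAudio.Spectrogram import path and G2Net-like parameters (2 s of strain at 2048 Hz; the 20-1024 Hz band and hop length 32 are my assumptions, not anything fixed by the library):

import numpy as np
import torch
from nnAudio.Spectrogram import CQT1992v2

# Illustrative parameters: 2 s of strain sampled at 2048 Hz,
# transformed over an assumed 20-1024 Hz band.
cqt = CQT1992v2(sr=2048, fmin=20, fmax=1024, hop_length=32)

strain = np.random.randn(4096).astype(np.float32)  # stand-in for one detector
x = torch.from_numpy(strain).unsqueeze(0)          # (batch, samples)
spec = cqt(x)                                      # (batch, n_bins, n_frames)
print(spec.shape)

Because the transform is implemented as 1-D convolutions, it runs on the GPU and can even sit as the first layer of the network.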
An audio signal is a representation of sound, typically using either a changing level of electrical voltage for analog signals, or a series of binary numbers for digital signals. Audio signals have frequencies in the audio frequency range of roughly 20 to 20,000 Hz, which corresponds to the lower and upper limits of human hearing. Audio signals may be synthesized directly, or may originate at a transducer such as a microphone, musical instrument pickup, phonograph cartridge, or tape head. Loudspeakers or headphones convert an electrical audio signal back into sound.
Since this is gravitational-wave analysis, let's look through the PyCBC tutorial "How to do some basic signal processing".
Transformers State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow
RoBERTa Overview The RoBERTa model was proposed in
RoBERTa: A Robustly Optimized BERT Pretraining Approach
by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google’s BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates.
The abstract from the paper is the following:
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
Coleridge Initiative - Show US the Data, Discover how data is used for the public good
In this competition, you'll use natural language processing (NLP) to automate the discovery of how scientific data are referenced in publications. Utilizing the full text of scientific publications from numerous research areas gathered from CHORUS publisher members and other sources, you'll identify data sets that the publications' authors used in their work.
Harvard Data Science Review • Issue 3.2, Spring 2021 Enhancing and Accelerating Social Science Via Automation: Challenges and Opportunities
Tal Yarkoni, Dean Eckles, James A. J. Heathers, Margaret C. Levenstein, Paul E. Smaldino, Julia Lane Published on: Apr 30, 2021 DOI: 10.1162/99608f92.df2262f5
June 11 (Fri)
Coleridge Initiative: 1,483 teams, 12 days to go
The objective of the competition is to identify the mention of datasets within scientific publications. Your predictions will be short excerpts from the publications that appear to note a dataset.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, Google AI Language arXiv:1810.04805v2 [cs.CL] 24 May 2019
Abstract We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
A.3 Fine-tuning Procedure For fine-tuning, most model hyperparameters are the same as in pre-training, with the exception of the batch size, learning rate, and number of training epochs. The dropout probability was always kept at 0.1. The optimal hyperparameter values are task-specific, but we found the following range of possible values to work well across all tasks:
• Batch size: 16, 32
• Learning rate (Adam): 5e-5, 3e-5, 2e-5
• Number of epochs: 2, 3, 4
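A minimal fine-tuning sketch following these ranges; the model name, the two-example batch, and the labels are placeholders, not anything prescribed by the paper:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2,
    hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1)  # dropout 0.1

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # from {5e-5, 3e-5, 2e-5}

batch = tokenizer(["a labeled example", "another example"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])

model.train()
for epoch in range(3):                  # 2-4 epochs suggested
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()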
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") >>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
Extractive Question Answering:
>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad") >>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
Language Modeling:
Language modeling is the task of fitting a model to a corpus, which can be domain specific. All popular transformer-based models are trained using a variant of language modeling, e.g. BERT with masked language modeling, GPT-2 with causal language modeling.
Language modeling can be useful outside of pretraining as well, for example to shift the model distribution to be domain-specific: using a language model trained over a very large corpus, and then fine-tuning it to a news dataset or on scientific papers e.g. LysandreJik/arxiv-nlp.
Masked Language Modeling:
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased") >>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased")
Causal Language Modeling:
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2") >>> model = AutoModelWithLMHead.from_pretrained("gpt2")
Text Generation:
>>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased") >>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
Named Entity Recognition:
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english") >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
Summarization:
Summarization is the task of summarizing a document or an article into a shorter text. If you would like to fine-tune a model on a summarization task, you may leverage the run_summarization.py script.
>>> model = AutoModelWithLMHead.from_pretrained("t5-base") >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
Translation:
Translation is the task of translating a text from one language to another. If you would like to fine-tune a model on a translation task, you may leverage the run_translation.py script.
>>> model = AutoModelWithLMHead.from_pretrained("t5-base") >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt") >>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> print(tokenizer.decode(outputs[0])) Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer, Facebook AI arXiv:1910.13461v1 [cs.CL] 29 Oct 2019
In this paper, we present BART, which pre-trains a model combining Bidirectional and Auto-Regressive Transformers. BART is a denoising autoencoder built with a sequence-to-sequence model that is applicable to a very wide range of end tasks. Pretraining has two stages: (1) text is corrupted with an arbitrary noising function, and (2) a sequence-to-sequence model is learned to reconstruct the original text. BART uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes (see Figure 1).
Google Smartphone Decimeter Challenge Improve high precision GNSS positioning and navigation accuracy on smartphones
Global Navigation Satellite System (GNSS) provides raw signals, which the GPS chipset uses to compute a position. Current mobile phones only offer 3-5 meters of positioning accuracy. While useful in many cases, it can create a “jumpy” experience. For many use cases the results are neither fine nor stable enough to be reliable.
In this competition, you'll use data collected from the host team’s own Android phones to compute location down to decimeter or even centimeter resolution, if possible. You'll have access to precise ground truth, raw GPS measurements, and assistance data from nearby GPS stations, in order to train and test your submissions.
Fast Kalman filters in Python leveraging single-instruction multiple-data (SIMD) vectorization; that is, running n similar Kalman filters on n independent series of observations. Not to be confused with SIMD processor instructions.
A Kalman filter is used to estimate quantities that change from moment to moment (for example, the position and velocity of an object) from discrete, noisy observations. It is widely used in engineering fields such as radar and computer vision. In car navigation, for instance, it is applied to estimate the continually changing position of the car by fusing noisy information from an onboard accelerometer and from satellites. By exploiting the laws governing the target's evolution in time, a Kalman filter can estimate the target's position in the present (filtering), the future (prediction), and the past (interpolation or smoothing). (from Wikipedia) A minimal sketch follows.
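A minimal predict/update cycle for a 1-D constant-velocity model, with illustrative noise covariances; this is generic textbook Kalman filtering, not the simdkalman API:

import numpy as np

dt = 1.0
F = np.array([[1, dt], [0, 1]])   # state transition: s = [position, velocity]
H = np.array([[1, 0]])            # we observe position only
Q = 0.01 * np.eye(2)              # process noise covariance (illustrative)
R = np.array([[1.0]])             # measurement noise covariance (illustrative)

s = np.array([0.0, 0.0])          # initial state estimate
P = np.eye(2)                     # initial state covariance

def kalman_step(s, P, z):
    # predict
    s = F @ s
    P = F @ P @ F.T + Q
    # update with measurement z
    y = z - H @ s                  # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S) # Kalman gain
    s = s + K @ y
    P = (np.eye(2) - K @ H) @ P
    return s, P

true_pos = np.cumsum(np.full(50, 0.5))        # object moving at 0.5 per step
for z in true_pos + np.random.randn(50):      # noisy observations
    s, P = kalman_step(s, P, np.array([z]))
print(s)  # estimated [position, velocity]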
Berkeley SETI Research Center: 942 teams, a month to go
Information is being exchanged in a thread titled "Image Size vs Score": higher image resolution tends to give higher scores. Since this is not a code competition, differences in available compute resources show up easily. In "What's your best single model?" image size again became the topic: large images give good scores, but such models will not run in a Kaggle kernel. The thread ends with talk of ensembling.
MLB Player Digital Engagement Forecasting Predict fan engagement with baseball player digital content
It is not clear to me what "engagement" means here.
In this competition, you’ll predict how fans engage with MLB players’ digital content on a daily basis for a future date range. You’ll have access to player performance data, social media data, and team factors like market size. Successful models will provide new insights into what signals most strongly correlate with and influence engagement.
Google Smartphone Decimeter Challenge: 474 teams, 2 months to go
Our team is currently using only post processing to improve the accuracy. We have found that the order of post processing changes the accuracy significantly, so we share the results.
A discussion is underway on post-processing procedures and their effects.
June 18 (Fri)
Berkeley SETI Research Center: 1,001 teams, a month to go
There are just a few of us data scientists at Kaggle launching about 50 competitions a year with many different data types over a very wide range of domains. Worrying about leakage and other failure points keeps us up at night. We absolutely value our community's time and effort and know how important it is to have fun and challenging competitions.
As this competition is brought to you in collaboration with the launch of Vertex AI, we're providing GCP coupons for users to try out some of the great, powerful new resources made available through Vertex AI. This includes JupyterLab Notebooks, Explainable AI, hyperparameter tuning through Vizier, and countless other AI training and deployment tools.
Vertex AI brings AutoML and AI Platform together into a unified API, client library, and user interface. AutoML allows you to train models on image, tabular, text, and video datasets without writing code, while training in AI Platform lets you run custom training code. With Vertex AI, both AutoML training and custom training are available options. Whichever option you choose for training, you can save models, deploy models and request predictions with Vertex AI.
JupyterLab is a next-generation web-based user interface for Project Jupyter.
Notebooks enables you to create and manage virtual machine (VM) instances that are pre-packaged with JupyterLab.
Notebooks instances have a pre-installed suite of deep learning packages, including support for the TensorFlow and PyTorch frameworks. You can configure either CPU-only or GPU-enabled instances, to best suit your needs.
Your Notebooks instances are protected by Google Cloud authentication and authorization, and are available using a Notebooks instance URL. Notebooks instances also integrate with GitHub so that you can easily sync your notebook with a GitHub repository.
Notebooks saves you the difficulty of creating and configuring a Deep Learning virtual machine by providing verified, optimized, and tested images for your chosen framework.
Introduction to Vertex Explainable AI for Vertex AI
Images in the test set may contain more than one object.
For each object in a given test image, you must predict a class ID of "opacity", a confidence score, and a bounding box in the format xmin ymin xmax ymax.
If you predict that there are NO objects in a given image, you should predict none 1.0 0 0 1 1, where none is the class ID for "No finding", 1.0 is the confidence, and 0 0 1 1 is a one-pixel bounding box.
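A small sketch of assembling such a prediction string in Python; the boxes list and the helper name are hypothetical:

# boxes is a hypothetical list of (confidence, xmin, ymin, xmax, ymax) detections.
def prediction_string(boxes):
    if not boxes:
        # no findings: required placeholder with a one-pixel bounding box
        return "none 1.0 0 0 1 1"
    return " ".join(
        f"opacity {conf:.3f} {xmin} {ymin} {xmax} {ymax}"
        for conf, xmin, ymin, xmax, ymax in boxes)

print(prediction_string([(0.87, 100, 150, 300, 400),
                         (0.42, 50, 60, 120, 200)]))
print(prediction_string([]))   # "none 1.0 0 0 1 1"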
<Aside> AI, Machine Learning, Deep Learning, Data Science, Engineer or Scientist or Programmer, Kaggler: among the fields these words evoke, which best describes the direction I am aiming for? "AI" is vague and hard to pin down. "Data science", to me, carries a strong image of preprocessing. Let's look them up on Wikipedia.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data,[1][2] and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Data science, then, seems to be something that can be regarded as cutting across not only machine learning and deep learning but all fields of science and technology.
Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use.
It may be a self-serving interpretation, but let us say that data mining + big data = data science, and tentatively label my current and near-future specialty "data scientist".
AutoDS: Towards Human-Centered Automation of Data Science
Dakuo Wang et al., arXiv:2101.05273v1 [cs.HC] 13 Jan 2021
Abstract
Data science (DS) projects often follow a lifecycle that consists of laborious tasks for data scientists and domain experts (e.g., data exploration, model training, etc.). Only till recently, machine learning (ML) researchers have developed promising automation techniques to aid data workers in these tasks. This paper introduces AutoDS, an automated machine learning (AutoML) system that aims to leverage the latest ML automation techniques to support data science projects. Data workers only need to upload their dataset, then the system can automatically suggest ML configurations, preprocess data, select algorithm, and train the model. These suggestions are presented to the user via a web-based graphical user interface and a notebook-based programming user interface. We studied AutoDS with 30 professional data scientists, where one group used AutoDS, and the other did not, to complete a data science project. As expected, AutoDS improves productivity; Yet surprisingly, we find that the models produced by the AutoDS group have higher quality and less errors, but lower human confidence scores. We reflect on the findings by presenting design implications for incorporating automation techniques into human work in the data science lifecycle.
5.2. ImageNet Results for EfficientNet We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5; initial learning rate 0.256 that decays by 0.97 every 2.4 epochs. We also use swish activation (Ramachandran et al., 2018; Elfwing et al., 2018), fixed AutoAugment policy (Cubuk et al., 2019), and stochastic depth (Huang et al., 2016) with drop connect ratio 0.3. As is commonly known, bigger models need more regularization; we linearly increase dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7.
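Translating the quoted optimizer settings into a PyTorch sketch; the paper trained in TensorFlow, so mapping RMSProp "decay" to torch.optim.RMSprop's alpha and expressing the schedule per epoch are my assumptions:

import torch

model = torch.nn.Conv2d(3, 32, 3)   # stand-in for an EfficientNet

# RMSProp with decay 0.9, momentum 0.9, weight decay 1e-5, as quoted.
# (TF batch-norm momentum 0.99 would correspond to momentum=0.01 in
# torch.nn.BatchNorm2d, due to the opposite convention.)
opt = torch.optim.RMSprop(model.parameters(), lr=0.256, alpha=0.9,
                          momentum=0.9, weight_decay=1e-5)

# lr decays by 0.97 every 2.4 epochs: lr(e) = 0.256 * 0.97**(e / 2.4)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda epoch: 0.97 ** (epoch / 2.4))

for epoch in range(5):
    # ... one training epoch ...
    sched.step()
    print(epoch, opt.param_groups[0]["lr"])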
AutoAugment: Learning Augmentation Strategies from Data
RandAugment: Practical automated data augmentation with a reduced search space
4. Stochastic depth (Huang et al., 2016) with drop connect ratio 0.3
Stochastic depth aims to shrink the depth of a network during training, while keeping it unchanged during testing. We can achieve this goal by randomly dropping entire ResBlocks during training and bypassing their transformations through skip connections (a sketch follows this list).
5. Linearly increase dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7
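A minimal sketch of a residual block with stochastic depth, using the quoted drop-connect ratio 0.3; here the rate is kept constant across blocks for simplicity, whereas Huang et al. scale it linearly with depth, and the kept branch is rescaled by its survival probability at test time:

import torch
import torch.nn as nn

class StochasticDepthResBlock(nn.Module):
    # Residual block whose transform branch is randomly dropped in training
    # (survival probability 1 - drop_rate) and always kept at test time.
    def __init__(self, channels, drop_rate=0.3):
        super().__init__()
        self.drop_rate = drop_rate
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        if self.training:
            if torch.rand(()) < self.drop_rate:
                return x                      # skip the whole block this step
            return x + self.body(x)
        # at test time, keep the block, scaled by its survival probability
        return x + (1 - self.drop_rate) * self.body(x)

block = StochasticDepthResBlock(16)
y = block(torch.randn(2, 16, 8, 8))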
May 13, 2021: Initial code release for EfficientNetV2 models; accepted to ICML'21.
1. About EfficientNetV2 Models
EfficientNetV2 are a family of image classification models, which achieve better parameter efficiency and faster training speed than prior arts.
Built upon EfficientNetV1, our EfficientNetV2 models use neural architecture search (NAS) to jointly optimize model size and training speed, and are scaled up in a way for faster training and inference speed.
June 7 (Mon)
Society for Imaging Informatics in Medicine (SIIM): 386 teams, 2 months to go
Abstract In this paper, we address the problem of image captioning specifically for molecular translation where the result would be a predicted chemical notation in InChI format for a given molecular structure. Current approaches mainly follow rule-based or CNN+RNN based methodology. However, they seem to underperform on noisy images and images with small number of distinguishable features. To overcome this, we propose an end-to-end transformer model. When compared to attention-based techniques, our proposed model outperforms on molecular datasets.
The molecular structures are given as images. The image quality is not that of crisp figures in a new textbook; they look like images blurred by repeated photocopying. Element symbols such as O, N, P, and S are indistinct, some single and double bonds are hard to tell apart, bonds indicating stereochemistry can also be unclear, and speckled noise is overlaid on the images.
Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI,
Noel M O’Boyle, Journal of Cheminformatics 2012, 4:22
Figure 1 An overview of the steps involved in generating Universal and Inchified SMILES. The normalisation step just applies to Inchified SMILES. To simplify the diagram a Standard InChI is shown, but in practice a non-standard InChI (options FixedH and RecMet) is used for Universal SMILES.
The SMILES used in the Korean competition was, as of 2012, the most popular line notation: "The SMILES format is the most popular line notation in use today." Its problems, besides the difficulty of representing stereochemistry, are that it is not standardized, and this paper proposes a standardization based on InChI. As reasons why SMILES standardization never progressed, the paper cites proposed methods that did not support stereochemistry, proprietary commercial implementations, and free implementations that were mutually incompatible and never published.
In 1999, NIST (the National Institute of Standards and Technology) began developing a new line notation for molecules, and InChI was apparently proposed as an international standard.
This paper takes the approach of generating canonical SMILES from the canonical labels provided by InChI. Code included in various open-source cheminformatics libraries (Open Babel, Chemistry Development Kit, RDKit, Chemkit, Indigo, and so on) can apparently be used.
The following commands show how to use the obabel command-line program to generate Universal and Inchified SMILES strings for a structure stored in a Mol file:
C:\>obabel figure1.mol -osmi -xU
c1cc(/C=C/F)cc(c1)[N+](=O)[O-]
C:\>obabel figure1.mol -osmi -xI
c1cc(/C=C/F)cc(c1)N(=O)=O
Last September, among the many competitions I entered, there was Halite by Two Sigma ("Collect the most halite during your match in space"). It looked like a competition for trying out Reinforcement Learning, so I intended to learn RL there. But I was in several other competitions at the same time, had far too little time for each, and in the end both my RL study and my score finished half-baked. Remembering that competition, I went back to its site, skimmed the write-up of the top team (an individual), and was astonished. The winner, it turns out, was at DeepMind, the home of much of deep reinforcement learning, and knew it thoroughly; naturally the programming skill involved must be immeasurable too. Judging RL insufficient for taking the top spot, the winner competed with conventional programming instead: some 11,000 lines of it. The point was not that Reinforcement Learning itself is hopeless, but that there was not enough training time. What struck me as impossible to imitate was that the winner observed matches, read opponents' tactics, and then programmed tactics to beat them.
DEEP REINFORCEMENT LEARNING Yuxi Li (yuxili@gmail.com), arXiv:1810.06339v1 [cs.LG] 15 Oct 2018
ABSTRACT We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multiagent RL, relational RL, and learning to learn. After that, we discuss RL applications, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems Sergey Levine, Aviral Kumar, George Tucker, Justin Fu arXiv:2005.01643v3 [cs.LG] 1 Nov 2020
Abstract In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection. Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines. Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making domains, from healthcare and education to robotics. However, the limitations of current algorithms make this difficult. We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods, and describe some potential solutions that have been explored in recent work to mitigate these challenges, along with recent applications, and a discussion of perspectives on open problems in the field.
World Models David Ha and Jurgen Schmidhuber, arXiv:1803.10122v4 [cs.LG] 9 May 2018
Abstract We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment.
Humans develop a mental model of the world based on what they are able to perceive with their limited senses. The decisions and actions we make are based on this internal model. Jay Wright Forrester, the father of system dynamics, described a mental model as: The image of the world around us, which we carry in our head, is just a model. Nobody in his head imagines all the world, government or country. He has only selected concepts, and relationships between them, and uses those to represent the real system. (Forrester, 1971)
To handle the vast amount of information that flows through our daily lives, our brain learns an abstract representation of both spatial and temporal aspects of this information. We are able to observe a scene and remember an abstract description thereof. Evidence also suggests that what we perceive at any given moment is governed by our brain’s prediction of the future based on our internal model.
Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore,
Journal of Artificial Intelligence Research 4 (1996) 237-285
Abstract This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Deep Reinforcement Learning for Autonomous Driving: A Survey B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al Sallab, Senthil Yogamani, and Patrick Pérez, arXiv:2002.00444v2 [cs.LG] 23 Jan 2021
Abstract—With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed. Index Terms—Deep reinforcement learning, Autonomous driving, Imitation learning, Inverse reinforcement learning, Controller learning, Trajectory optimisation, Motion planning, Safe reinforcement learning.
The main contributions of this work can be summarized as follows:
・Self-contained overview of RL background for the automotive community as it is not well known.
・Detailed literature review of using RL for different autonomous driving tasks.
・Discussion of the key challenges and opportunities for RL applied to real world autonomous driving.
The rest of the paper is organized as follows.
Section II provides an overview of components of a typical autonomous driving system.
Section III provides an introduction to reinforcement learning and briefly discusses key concepts.
Section IV discusses more sophisticated extensions on top of the basic RL framework.
Section V provides an overview of RL applications for autonomous driving problems.
Section VI discusses challenges in deploying RL for real-world autonomous driving systems.
Section VII concludes this paper with some final remarks.
The caption of this table is: OPEN-SOURCE FRAMEWORKS AND PACKAGES FOR STATE OF THE ART RL/DRL ALGORITHMS AND EVALUATION.
Behaviour Suite for Reinforcement Learning Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado Van Hasselt, arXiv:1908.03568v3 [cs.LG] 14 Feb 2020
Abstract This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. This library facilitates reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Our code is Python, and easy to use within existing projects. We include examples with OpenAI Baselines, Dopamine as well as new reference implementations. Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of bsuite from a committee of prominent researchers.
Interest in artificial intelligence has undergone a resurgence in recent years. Part of this interest is driven by the constant stream of innovation and success on high profile challenges previously deemed impossible for computer systems. Improvements in image recognition are a clear example of these accomplishments, progressing from individual digit recognition (LeCun et al., 1998), to mastering ImageNet in only a few years (Deng et al., 2009; Krizhevsky et al., 2012). The advances in RL systems have been similarly impressive: from checkers (Samuel, 1959), to Backgammon (Tesauro, 1995), to Atari games (Mnih et al., 2015a), to competing with professional players at DOTA (Pachocki et al., 2019) or StarCraft (Vinyals et al., 2019) and beating world champions at Go (Silver et al., 2016). Outside of playing games, decision systems are increasingly guided by AI systems (Evans & Gao, 2016).
As we look towards the next great challenges for RL and AI, we need to understand our systems better (Henderson et al., 2017). This includes the scalability of our RL algorithms, the environments where we expect them to perform well, and the key issues outstanding in the design of a general intelligence system. We have the existence proof that a single self-learning RL agent can master the game of Go purely from self-play (Silver et al., 2018). We do not have a clear picture of whether such a learning algorithm will perform well at driving a car, or managing a power plant. If we want to take the next leaps forward, we need to continue to enhance our understanding.
1.1 Practical theory often lags practical algorithms:
The current theory of deep RL is still in its infancy. In the absence of a comprehensive theory, the community needs principled benchmarks that help to develop an understanding of the strengths and weaknesses of our algorithms.
Just as the MNIST dataset offers a clean, sanitised, test of image recognition as a stepping stone to advanced computer vision; so too bsuite aims to instantiate targeted experiments for the development of key RL capabilities.
1.3 Open source code, reproducible research
As part of this project we open source github.com/deepmind/bsuite, which instantiates all experiments in code and automates the evaluation and analysis of any RL agent on bsuite. This library serves to facilitate reproducible and accessible research on the core issues in reinforcement learning.
1.4 Related work
2 Experiments
2.1 Example experiment: memory length
I tried to read and understand this, but I have no idea what the task is or what the following phrase means (I have only ever glimpsed DQN): actor-critic with a recurrent neural network.
In this chapter we will first explain what Reinforcement Learning is and what it's good at, then present two of the most important techniques in Deep Reinforcement Learning: policy gradients and deep Q-networks (DQNs), including a discussion of Markov decision processes (MDPs).
Floris den Hengst et al., Data Science 3 (2020) 107–147
Abstract.
The major application areas of reinforcement learning (RL) have traditionally been game playing and continuous control. In recent years, however, RL has been increasingly applied in systems that interact with humans. RL can personalize digital systems to make them more relevant to individual users. Challenges in personalization settings may be different from challenges found in traditional application areas of RL. An overview of work that uses RL for personalization, however, is lacking. In this work, we introduce a framework of personalization settings and use it in a systematic literature review. Besides setting, we review solutions and evaluation strategies. Results show that RL has been increasingly applied to personalization problems and realistic evaluations have become more prevalent. RL has become sufficiently robust to apply in contexts that involve humans and the field as a whole is growing. However, it seems not to be maturing: the ratios of studies that include a comparison or a realistic evaluation are not showing upward trends and the vast majority of algorithms are used only once. This review can be used to find related work across domains, provides insights into the state of the field and identifies opportunities for future work.
Jess Whittlestone et al., Journal of Artificial Intelligence Research 70 (2021) 1003–1030
Abstract Deep Reinforcement Learning (DRL) is an avenue of research in Artificial Intelligence (AI) that has received increasing attention within the research community in recent years, and is beginning to show potential for real-world application. DRL is one of the most promising routes towards developing more autonomous AI systems that interact with and take actions in complex real-world environments, and can more flexibly solve a range of problems for which we may not be able to precisely specify a correct ‘answer’. This could have substantial implications for people’s lives: for example by speeding up automation in various sectors, changing the nature and potential harms of online influence, or introducing new safety risks in physical infrastructure. In this paper, we review recent progress in DRL, discuss how this may introduce novel and pressing issues for society, ethics, and governance, and highlight important avenues for future research to better understand DRL’s societal implications.
Wikipedia : Personalization (broadly known as customization) consists of tailoring a service or a product to accommodate specific individuals, sometimes tied to groups or segments of individuals. A wide variety of organizations use personalization to improve customer satisfaction, digital sales conversion, marketing results, branding, and improved website metrics as well as for advertising. Personalization is a key element in social media and recommender systems.
systematic literature review (SLR): I assumed this paper reviews papers on reinforcement learning for personalization, but does it instead mean that they carried out a systematic literature review of studies applying RL to personalization?
Our discussion aims to provide important context and a clear starting point for the AI ethics and governance community to begin considering the societal implications of DRL in more depth.
The algorithm a software agent uses to determine its actions is called its policy. The policy could be a neural network taking observations as inputs and outputting the action to take (see Figure 18-2).
Figure 18-2. Reinforcement Learning using a neural network policy: I had looked at this figure many times, but today I think I finally understood what this schematic means.
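A minimal Keras sketch of such a policy network for CartPole (four observations in, one action probability out; the layer sizes are illustrative):

import numpy as np
from tensorflow import keras

n_inputs = 4  # CartPole observation size
model = keras.Sequential([
    keras.layers.Dense(5, activation="elu", input_shape=[n_inputs]),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of one of the two actions
])

obs = np.array([0.0, 0.1, 0.02, -0.3], dtype=np.float32)  # made-up observation
left_proba = model(obs[np.newaxis])  # an action is then sampled from this probability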
Structure prediction of surface reconstructions by deep reinforcement learning Søren A Meldgaard, Henrik L Mortensen, Mathias S Jørgensen and Bjørk Hammer
Abstract We demonstrate how image recognition and reinforcement learning combined may be used to determine the atomistic structure of reconstructed crystalline surfaces. A deep neural network represents a reinforcement learning agent that obtains training rewards by interacting with an environment. The environment contains a quantum mechanical potential energy evaluator in the form of a density functional theory program. The agent handles the 3D atomistic structure as a series of stacked 2D images and outputs the next atom type to place and the atomic site to occupy. Agents are seen to require 1000–10 000 single point density functional theory evaluations, to learn by themselves how to build the optimal surface reconstructions of anatase TiO2(001)-(1 × 4) and rutile SnO2(110)-(4 × 1).
Gaussian representation for image recognition and reinforcement learning of atomistic structure
Mads-Peter V Christiansen, Henrik Lund Mortensen, Søren Ager Meldgaard, and Bjørk Hammer, J Chem Phys. 2020 Jul 28;153(4):044107
Abstract The success of applying machine learning to speed up structure search and improve property prediction in computational chemical physics depends critically on the representation chosen for the atomistic structure. In this work, we investigate how different image representations of two planar atomistic structures (ideal graphene and graphene with a grain boundary region) influence the ability of a reinforcement learning algorithm [the Atomistic Structure Learning Algorithm (ASLA)] to identify the structures from no prior knowledge while interacting with an electronic structure program. Compared to a one-hot encoding, we find a radial Gaussian broadening of the atomic position to be beneficial for the reinforcement learning process, which may even identify the Gaussians with the most favorable broadening hyperparameters during the structural search. Providing further image representations with angular information inspired by the smooth overlap of atomic positions method, however, is not found to cause further speedup of ASLA.
Predictive Synthesis of Quantum Materials by Probabilistic Reinforcement Learning Pankaj Rajak, Aravind Krishnamoorthy, Ankit Mishra, Rajiv Kalia, Aiichiro Nakano and Priya Vashishta, arXiv.org > cond-mat > arXiv:2009.06739v1, [Submitted on 14 Sep 2020]
Abstract Predictive materials synthesis is the primary bottleneck in realizing new functional and quantum materials. Strategies for synthesis of promising materials are currently identified by time consuming trial and error approaches and there are no known predictive schemes to design synthesis parameters for new materials. We use reinforcement learning to predict optimal synthesis schedules, i.e. a time-sequence of reaction conditions like temperatures and reactant concentrations, for the synthesis of a prototypical quantum material, semiconducting monolayer MoS2, using chemical vapor deposition. The predictive reinforcement leaning agent is coupled to a deep generative model to capture the crystallinity and phase-composition of synthesized MoS2 during CVD synthesis as a function of time-dependent synthesis conditions. This model, trained on 10000 computational synthesis simulations, successfully learned threshold temperatures and chemical potentials for the onset of chemical reactions and predicted new synthesis schedules for producing well-sulfidized crystalline and phase-pure MoS2, which were validated by computational synthesis simulations. The model can be extended to predict profiles for synthesis of complex structures including multi-phase heterostructures and can also predict long-time behavior of reacting systems, far beyond the domain of the MD simulations used to train the model, making these predictions directly relevant to experimental synthesis.
Learning to grow: control of material self-assembly using evolutionary reinforcement learning Stephen Whitelam and Isaac Tamblyn, arXiv:1912.08333v3 [cond-mat.stat-mech] 28 May 2020
We show that neural networks trained by evolutionary reinforcement learning can enact efficient molecular self-assembly protocols. Presented with molecular simulation trajectories, networks learn to change temperature and chemical potential in order to promote the assembly of desired structures or choose between competing polymorphs. In the first case, networks reproduce in a qualitative sense the results of previously-known protocols, but faster and with higher fidelity; in the second case they identify strategies previously unknown, from which we can extract physical insight. Networks that take as input the elapsed time of the simulation or microscopic information from the system are both effective, the latter more so. The evolutionary scheme we have used is simple to implement and can be applied to a broad range of examples of experimental self-assembly, whether or not one can monitor the experiment as it proceeds. Our results have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.
Generative Adversarial Networks for Crystal Structure Prediction Sungwon Kim, Juhwan Noh, Geun Ho Gu, Alan Aspuru-Guzik, and Yousung Jung ACS Cent. Sci. 2020, 6, 1412−1420
ABSTRACT: The constant demand for novel functional materials calls for efficient strategies to accelerate the materials discovery, and crystal structure prediction is one of the most fundamental tasks along that direction. In addressing this challenge, generative models can offer new opportunities since they allow for the continuous navigation of chemical space via latent spaces. In this work, we employ a crystal representation that is inversion-free based on unit cell and fractional atomic coordinates and build a generative adversarial network for crystal structures. The proposed model is applied to generate the Mg−Mn−O ternary materials with the theoretical evaluation of their photoanode properties for high-throughput virtual screening (HTVS). The proposed generative HTVS framework predicts 23 new crystal structures with reasonable calculated stability and band gap. These findings suggest that the generative model can be an effective way to explore hidden portions of the chemical space, an area that is usually unreachable when conventional substitution-based discovery is employed.
Training an agent requires a working environment, and OpenAI Gym provides such simulated environments (Atari games, board games, 2D and 3D physical simulations, and so on).
obs is a 1D NumPy array of four numbers, in this order: obs[0]: the cart's horizontal position (0.0 = center), obs[1]: its velocity (positive means right), obs[2]: the angle of the pole (0.0 = vertical), and obs[3]: its angular velocity (positive means clockwise).
The step() method executes the given action and returns four values:
obs:
This is the new observation. The cart is now moving toward the right (obs[1] > 0). The pole is still tilted toward the right (obs[2] > 0), but its angular velocity is now negative (obs[3] < 0), so it will likely be tilted toward the left after the next step.
reward:
In this environment, you get a reward of 1.0 at every step, no matter what you do, so that the goal is to keep the episode running as long as possible.
done:
This value will be True when the episode is over. This will happen when the pole tilts too much, or goes off the screen, or after 200 steps (in this last case, you have won). After that, the environment must be reset before it can be used again.
info:
This environment-specific dictionary can provide some extra information that you may find useful for debugging or for training. For example, in some games it may indicate how many lives the agent has.
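Putting the four return values together, a minimal episode loop looks like this (assuming the classic four-value Gym step() API described above; the hand-written policy is just for illustration):

import gym

env = gym.make("CartPole-v1")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = 1 if obs[2] > 0 else 0           # crude policy: push toward the tilt
    obs, reward, done, info = env.step(action)
    total_reward += reward                     # +1.0 per step in this environment
print("episode length:", total_reward)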
Fairseq is a sequence modeling toolkit written in PyTorch that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.
pre-norm activation transformer:
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen and Julian Salazar, arXiv:1910.05895v2 [cs.CL] 30 Dec 2019
Abstract We evaluate three simple, normalization-centric changes to improve Transformer training. First, we show that pre-norm residual connections (PRENORM) and smaller initializations enable warmup-free, validation-based training with large learning rates. Second, we propose ℓ2 normalization with a single scale parameter (SCALENORM) for faster training and better performance. Finally, we reaffirm the effectiveness of normalizing word embeddings to a fixed length (FIXNORM). On five low-resource translation pairs from TED Talks-based corpora, these changes always converge, giving an average +1.1 BLEU over state-of-the-art bilingual baselines and a new 32.8 BLEU on IWSLT '15 English-Vietnamese. We observe sharper performance curves, more consistent gradient norms, and a linear relationship between activation scaling and decoder depth. Surprisingly, in the high-resource setting (WMT '14 English-German), SCALENORM and FIXNORM remain competitive but PRENORM degrades performance.
Show and Tell: A Neural Image Caption Generator Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan arXiv:1411.4555v2 [cs.CV] 20 Apr 2015
Abstract Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu et al., arXiv:1502.03044v3 [cs.LG] 19 Apr 2016
Abstract Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
If we knew what the best action was at each step, we could train the neural network as usual, by minimizing the cross entropy between the estimated probability distribution and the target probability distribution.
It would just be regular supervised learning.
However, in Reinforcement Learning the only guidance the agent gets is through rewards, and rewards are typically sparse and delayed.
For example, if the agent manages to balance the pole for 100 steps, how can it know which of the 100 actions it took were good, and which of them were bad?
All it knows is that the pole fell after the last action, but surely this last action is not entirely responsible.
This is called the credit assignment problem: when the agent gets a reward, it is hard for it to know which actions should get credited (or blamed) for it.
Think of a dog that gets rewarded hours after it behaved well: will it understand what it is being rewarded for?
To tackle this problem, a common strategy is to evaluate an action based on the sum of all the rewards that come after it, usually applying a discount factor γ (gamma) at each step. I don't yet understand the meaning and role of the discount factor.
This sum of discounted rewards is called the action's return. The "action" here seems to refer not to a single step but to a whole series of steps taken together.
Consider the example in Figure 18-6.
If an agent decides to go right three times in a row and gets a +10 reward after the first step, 0 after the second step, and finally -50 after the third step, then, assuming we use a discount factor γ = 0.8, the first action will have a return of 10 + γ x 0 + γ^2 x (-50) = -22.
If the discount factor is close to 0, then future rewards won't count for much compared to immediate rewards.
Conversely, if the discount factor is close to 1, then rewards far into the future will count almost as much as immediate rewards.
Typical discount factors vary from 0.9 to 0.99.
With a discount factor of 0.95, rewards 13 steps into the future count roughly for half as much as immediate rewards (since 0.95^13 ≈ 0.5), while with a discount factor of 0.99, rewards 69 steps into the future count for half as much as immediate rewards.
In the CartPole environment, actions have fairly short-term effects, so choosing a discount factor of 0.95 seems reasonable.
Figure 18-6. Computing an action's return: the sum of discounted future rewards
Of course, a good action may be followed by several bad actions that cause the pole to fall quickly, resulting in the good action getting a low return (similarly, a good actor may sometimes star in a terrible movie).
However, if we play the game enough times, on average good actions will get a higher return than bad ones.
We want to estimate how much better or worse an action is, compared to the other possible actions, on average.
This is called the action advantage.
For this, we must run many episodes and normalize all the action returns (by subtracting the mean and dividing by the standard deviation).
After that, we can reasonably assume that actions with a negative advantage were bad while actions with a positive advantage were good.
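These two steps fit in a few lines of NumPy; the function names below are mine, and the first call reproduces the Figure 18-6 numbers.

import numpy as np

def discount_rewards(rewards, gamma):
    """Walk backwards, adding the discounted future return to each step's reward."""
    discounted = np.array(rewards, dtype=np.float64)
    for step in range(len(rewards) - 2, -1, -1):
        discounted[step] += discounted[step + 1] * gamma
    return discounted

def discount_and_normalize_rewards(all_rewards, gamma):
    """Normalize the returns over many episodes: subtract the mean, divide by the std."""
    all_discounted = [discount_rewards(r, gamma) for r in all_rewards]
    flat = np.concatenate(all_discounted)
    mean, std = flat.mean(), flat.std()
    return [(d - mean) / std for d in all_discounted]

print(discount_rewards([10, 0, -50], 0.8))  # [-22. -40. -50.], matching the example above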
Perfect - now that we have a way to evaluate each action, we are ready to train our first agent using policy gradients.
Let's see how.
Policy Gradients
As discussed earlier, PG algorithms optimize the parameters of a policy by following the gradients toward higher rewards.
One popular class of PG algorithms, called REINFORCE algorithms, was introduced back in 1992 (https://homl.info/132) by Ronald Williams.
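A compact sketch in the spirit of REINFORCE (a simplified log-probability formulation, not Williams' exact algorithm), reusing the classic Gym API from above; the episode count and layer sizes are illustrative:

import numpy as np
import tensorflow as tf
from tensorflow import keras
import gym

env = gym.make("CartPole-v1")
policy = keras.Sequential([
    keras.layers.Dense(32, activation="elu", input_shape=[4]),
    keras.layers.Dense(2, activation="softmax"),  # probabilities of the two actions
])
optimizer = keras.optimizers.Adam(learning_rate=0.01)

def discounted_returns(rewards, gamma=0.95):
    out = np.array(rewards, dtype=np.float32)
    for t in range(len(out) - 2, -1, -1):
        out[t] += gamma * out[t + 1]
    return (out - out.mean()) / (out.std() + 1e-8)  # normalized, as discussed above

for episode in range(150):
    obs, done = env.reset(), False
    observations, actions, rewards = [], [], []
    while not done:
        p = policy(obs[np.newaxis].astype(np.float32))[0].numpy()
        action = int(np.random.choice(2, p=p / p.sum()))  # sample from the policy
        observations.append(obs.astype(np.float32))
        actions.append(action)
        obs, reward, done, _ = env.step(action)
        rewards.append(reward)
    returns = discounted_returns(rewards)
    with tf.GradientTape() as tape:
        probs = policy(np.array(observations))
        chosen = tf.reduce_sum(probs * tf.one_hot(actions, 2), axis=1)
        # make actions with higher normalized return more probable
        loss = -tf.reduce_mean(tf.math.log(chosen + 1e-8) * returns)
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))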
Against this background, research on removing RNNs (or on RNNs that can be computed in parallel) is now very active. The Transformer is the method proposed in the paper titled "Attention is all you need". As the title suggests, it processes sequences with attention instead of RNNs. Here we take a brief look at this Transformer.
Choosing an appropriate image size is difficult: complex molecules need a high resolution to preserve detail, but training on 2.4 million images at high resolution is infeasible. The chosen resolution of 256*448 should preserve enough detail while allowing training on a TPU within the 3-hour limit.
A. Geron's text introduces Glorot and He initialization, LeCun initialization, Xavier initialization, and so on. Segmentation Models, which I am using now, does not seem to support these initializations.
random initialization (epochs=30): LB=0.819. It seems to be holding up surprisingly well.
So far I have evaluated with "An activation function to apply after the final convolution layer." set to None and a score > 0 criterion; next I will set the activation function to "sigmoid", use score > threshold, and try optimizing the threshold.
I gave it a quick try, but it doesn't work. Something is off; I may be doing something wrong.
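For the record, a minimal sketch of what the intended sweep looks like (dice_score, probs, and masks are illustrative names; probs is assumed to hold sigmoid outputs and masks the binary ground truth):

import numpy as np

def dice_score(pred, target, eps=1e-7):
    inter = (pred * target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def best_threshold(probs, masks):
    """Evaluate Dice at a grid of thresholds and return the best one."""
    candidates = np.arange(0.30, 0.75, 0.05)
    scores = {t: dice_score((probs > t).astype(np.float32), masks) for t in candidates}
    best = max(scores, key=scores.get)
    return best, scores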
In a discussion titled "best single model", models (encoder+decoder) with LB scores of roughly 0.84+ to 0.86+ are outlined, along with their optimizers, loss functions, epochs, and so on. The gap from my result, 0.83+, is at most about 0.03. There is little information on pre- and post-processing, so I don't know what to take as reference, but what strikes me is that the four entries with high scores of 0.85+ do not use EfficientNet as the encoder, and instead use older encoders with larger epoch counts. It is also surprising that resnet34-Unet achieves LB=0.857. Only one entry uses FPN; the rest use Unet.
EfficientDet, the new model proposed in Google's paper from last year, "EfficientDet: Scalable and Efficient Object Detection", appears to be derived from FPN: "we propose a weighted bi-directional feature pyramid network (BiFPN)".
Bristol-Myers Squibb – Molecular Translation Can you translate chemical images to text?
"Chemical images" here means structural formulas, and "text" means InChI strings; the structural formulas are given as images. Wikipedia describes InChI as follows.
InChI (the International Chemical Identifier) provides molecular information in a standard, human-readable form and enables retrieval of that information from databases on the web. It was originally developed by IUPAC and NIST between 2000 and 2005; the format and algorithms are non-proprietary, and continued development was supported until 2010 by the InChI Trust, a non-profit organization in which IUPAC participates. The current version, 1.04, was released in September 2011.
Before version 1.04, the software was freely available under the open-source GNU Lesser General Public License [3], but it now uses a custom license called the IUPAC-InChI Trust License [4]. (Quoted from Wikipedia.)
Deep Semantic Segmentation of Natural and Medical Images: A Review Saeid Asgari Taghanaki, Kumar Abhishek, Joseph Paul Cohen, Julien Cohen-Adad and Ghassan Hamarneh, arXiv:1910.07655v3 [cs.CV] 3 Jun 2020,
Cross Entropy --> Weighted Cross Entropy --> Focal Loss (Lin et al. (2017b) added the term (1 − p̂)^γ to the cross entropy loss)
Overlap measure based loss functions: Dice Loss / F1 Score --> Tversky Loss (Tversky loss (TL) (Salehi et al., 2017) is a generalization of the Dice Loss) --> Exponential Logarithmic Loss --> Lovasz-Softmax loss (a smooth extension of the discrete Jaccard loss, i.e. the IoU loss)
At the end of its walkthrough of these various loss functions, the review says the following.
The loss functions which use cross-entropy as the base and the overlap measure functions as a weighted regularizer show more stability during training.
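A minimal PyTorch sketch of that recommendation, with cross-entropy as the base and Dice as the weighted overlap regularizer (the weighting is a free hyperparameter; bce_weight=0.2 mimics the roughly 1:4 BCE:Dice mix mentioned further down):

import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, bce_weight=0.2, eps=1e-7):
    """target is a float mask of 0s and 1s with the same shape as logits."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    dice = 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return bce_weight * bce + (1.0 - bce_weight) * dice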
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam Google Inc. ECCV 2018
Abstract.
Spatial pyramid pooling module or encode-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.
In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Microsoft COCO: Common Objects in Context Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
arXiv:1405.0312v3 [cs.CV] 21 Feb 2015 Abstract—We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Panoptic segmentation aims to unify the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). Existing metrics are specialized for either semantic or instance segmentation and cannot be used to evaluate the joint task involving both stuff and thing classes. Rather than using a heuristic combination of disjoint metrics for the two tasks, the panoptic task introduces a new Panoptic Quality (PQ) metric. PQ evaluates performance for all categories, including both stuff and thing categories, in a unified manner.
The paper of the top team on the Panoptic leaderboard:
Joint COCO and Mapillary Workshop at ICCV 2019, Panoptic Segmentation Challenge Track, Technical Report: Explore Context Relation for Panoptic Segmentation. Shen Wang, Tao Liu, Huanyu Liu, Yuchen Ma, Zeming Li, Zhicheng Wang, Xinyu Zhou, Gang Yu, Erjin Zhou, Xiangyu Zhang, Jian Sun
Next up is the loss function. The review above said: "The loss functions which use cross-entropy as the base and the overlap measure functions as a weighted regularizer show more stability during training." Of the losses available in what I am using now, the overlap-based ones are Dice, Jaccard, and Lovasz; I will just have to try them. In the MAnet paper, I believe the ratio of BCE to Dice was 1 to 4: Dice is the base, with BCE added on. All of the LB=0.836 results use DiceLoss alone, so imitating the MAnet paper may be a good place to start.
We've got a new private test set incoming. Some of the old test samples will be moved to train. The process is currently underway and may take some time.Submissions are disabled while this is taking place. We will notify you when the update is complete. Thank you for your patience!
UNet_3Plus.py contains, in addition to UNet_3Plus, Unet_3Plus_DeepSup and Unet_3Plus_DeepSup_CGM. These are, respectively, the base model, the model with deep supervision, and the model with deep supervision and a class-guided module, the add-on features described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from layers import unetConv2, unetUp, unetUp_origin
from init_weights import init_weights
from torchvision import models
import numpy as np
When I copied UNet.py into a notebook and ran it, it stopped with the following message:
ModuleNotFoundError: No module named 'layers'
from layers import unetConv2, unetUp, unetUp_origin: this is the line it trips on.
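The error just means the notebook cannot see the repo's sibling modules. A minimal fix, assuming the repository files have been downloaded somewhere (the path below is hypothetical):

import sys

# layers.py and init_weights.py sit next to UNet.py in the UNet 3+ repository;
# when pasting UNet.py into a notebook, that directory must be on sys.path first
sys.path.append("/kaggle/input/unet-3plus/UNet-Version/models")  # hypothetical path

from layers import unetConv2, unetUp, unetUp_origin
from init_weights import init_weights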
In the public code with LB=0.916 mentioned above, loss=bce_dice or bce_jaccard, but looking closely at the code, bce_weight=1, so it is effectively BCE alone. This matches my recent experience: with the activation set to "sigmoid", DiceLoss and JaccardLoss do not work at all in the code I am using.
Furthermore, there is a certain elegance to the rhythm of these cycles and it simplifies the decision of when to drop learning rates and when to stop the current training run. Experiments show that replacing each step of a constant learning rate with at least 3 cycles trains the network weights most of the way and running for 4 or more cycles will achieve even better performance. Also, it is best to stop training at the end of a cycle, which is when the learning rate is at the minimum value and the accuracy peaks.
Looking at the tables in Leslie N. Smith's paper, the score rises steadily as the number of epochs is increased through 25, 50, 75, 100, and 150. That is probably the trend when there is plenty of data. When data is scarce, on the other hand, his experiments appear to confirm that CyclicLR and OneCycleLR give good results at around 20 epochs.
scale_fn (function) – Custom scaling policy defined by a single argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, then ‘mode’ is ignored. Default: None
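As a sketch, this is how a custom scale_fn plugs into torch.optim.lr_scheduler.CyclicLR (all values illustrative): the lambda halves the triangle's amplitude on each cycle, and "mode" is ignored because scale_fn is given.

import torch

model = torch.nn.Linear(10, 1)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    step_size_up=2000,
    scale_fn=lambda x: 1.0 / (2.0 ** (x - 1)),  # x counts cycles, starting at 1
    scale_mode="cycle",
)

# inside the training loop: optimizer.step(); scheduler.step()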
This intrigued me, so I searched for "super convergence deep learning" and found a paper that (apparently) explains super-convergence theoretically.
Super-Convergence with an Unstable Learning Rate Samet Oymak∗, arXiv:2102.10734v1 [cs.LG] 22 Feb 2021 Abstract Conventional wisdom dictates that learning rate should be in the stable regime so that gradient-based algorithms don’t blow up. This note introduces a simple scenario where an unstable learning rate scheme leads to a super fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019] where they claim CLR with a large maximum learning rate leads to “super-convergence”. We prove that our scheme excels in the problems where Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large). The unstable step is the key to enabling fast convergence over the small eigen-spectrum.
Our scheme uses a Cyclical Learning Rate where we periodically take one large unstable step and several small stable steps to compensate for the instability.
GRADIENT DESCENT ON NEURAL NETWORKS TYPICALLY OCCURS AT THE EDGE OF STABILITY. Jeremy Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, and Ameet Talwalkar (Carnegie Mellon University, Bosch AI, Determined AI), correspondence: jeremycohen@cmu.edu
arXiv:2103.00065v1 [cs.LG] 26 Feb 2021, published as a conference paper at ICLR 2021
In the cats-and-dogs classification in F. Chollet's text, using 500 images per class, accuracy plateaus at about 0.7 because of overfitting, and growing the data through augmentation raises it to about 0.85; the exercise was to experience the effect of augmentation first-hand.
AutoML: A Survey of the State-of-the-Art Xin He, Kaiyong Zhao, Xiaowen Chu, arXiv:1908.00709v6 [cs.LG] 16 Apr 2021
Abstract Deep learning (DL) techniques have obtained remarkable achievements on various tasks, such as image recognition, object detection, and language modeling. However, building a high-quality DL system for a specific task highly relies on human expertise, hindering its wide application. Meanwhile, automated machine learning (AutoML) is a promising solution for building a DL system without human assistance and is being extensively studied. This paper presents a comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML. According to the DL pipeline, we introduce AutoML methods –– covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS) –– with a particular focus on NAS, as it is currently a hot sub-topic of AutoML. We summarize the representative NAS algorithms’ performance on the CIFAR-10 and ImageNet datasets and further discuss the following subjects of NAS methods: one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS. Finally, we discuss some open problems related to the existing AutoML methods for future research.
2.3. Data Augmentation To some degree, data augmentation (DA) can also be regarded as a tool for data collection, as it can generate new data based on the existing data. However, DA also serves as a regularizer to avoid over-fitting of model training and has received more and more attention. Therefore, we introduce DA as a separate part of data preparation in detail. Figure 3 classifies DA techniques from the perspective of data type (image, audio, and text), and incorporates automatic DA techniques that have recently received much attention. For image data, the affine transformations include rotation, scaling, random cropping, and reflection; the elastic transformations contain the operations like contrast shift, brightness shift, blurring, and channel shuffle; the advanced transformations involve random erasing, image blending, cutout [89], and mixup [90], etc. These three types of common transformations are available in some open source libraries, like torchvision, ImageAug [91], and Albumentations [92]. In terms of neural-based transformations, it can be divided into three categories: adversarial noise [93], neural style transfer [94], and GAN technique [95].
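As a sketch, one transform from each of those groups can be composed in Albumentations roughly as follows (parameters are illustrative, and signatures may differ slightly across library versions):

import albumentations as A

train_tf = A.Compose([
    A.Rotate(limit=30, p=0.5),                       # affine: rotation
    A.HorizontalFlip(p=0.5),                         # affine: reflection
    A.RandomBrightnessContrast(p=0.5),               # elastic group: brightness/contrast shift
    A.ChannelShuffle(p=0.1),                         # elastic group: channel shuffle
    A.CoarseDropout(max_holes=8, max_height=16,
                    max_width=16, p=0.3),            # advanced: cutout-style random erasing
])

# augmented = train_tf(image=image)["image"]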
THE BREAK-EVEN POINT ON OPTIMIZATION TRAJECTORIES OF DEEP NEURAL NETWORKS. Stanisław Jastrzębski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras. Published as a conference paper at ICLR 2020
ABSTRACT The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD. In particular, we demonstrate on multiple classification tasks that using a large learning rate in the initial phase of training reduces the variance of the gradient, and improves the conditioning of the covariance of gradients. These effects are beneficial from the optimization perspective and become visible after the break-even point. Complementing prior work, we also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers. In short, our work shows that key properties of the loss surface are strongly influenced by SGD in the early phase of training. We argue that studying the impact of the identified effects on generalization is a promising future direction.
Lookahead Optimizer: k steps forward, 1 step back. Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba, arXiv:1907.08610
Abstract The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
<Aside> The code I am using now is written in Keras, and I realized I have not understood it well enough to use callbacks, so I decided to re-read F. Chollet's Deep Learning text. For image processing I had studied up through Chapter 5, Deep learning for computer vision, but had hardly read Chapter 7, Advanced deep-learning best practices. Section 7.2 has a nice passage: launching a training run of tens of epochs on a large dataset with model.fit() or model.fit_generator() is like throwing a paper airplane: once it leaves your hand, you control neither its flight path nor its landing spot. Launch a drone instead of a paper airplane and it can sense its environment, report back to its operator, and fly according to the situation; that is what callbacks provide. The section goes on to describe keras.callbacks such as ModelCheckpoint, EarlyStopping, LearningRateScheduler, ReduceLROnPlateau, and CSVLogger.
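A minimal sketch of that machinery (model and the training arrays are placeholders for whatever is being trained):

from tensorflow import keras

# the "drone telemetry" described above: monitor validation loss and react mid-flight
callbacks = [
    keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    keras.callbacks.CSVLogger("training_log.csv"),
]

# model.fit(x_train, y_train, epochs=100,
#           validation_data=(x_val, y_val), callbacks=callbacks)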
TPU: my impression is that the speedup comes from large batch sizes. In the Flower Classification with TPUs competition, the tutorial model trained with a batch size of 128 using Adam, with a learning rate that started at 0.00001, rose to about 0.0004 over roughly 5 epochs, then decayed to about 0.00011 by epoch 12. It looks rather like CyclicLR.
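As far as I can reconstruct it, that tutorial schedule is a warmup-then-exponential-decay function along these lines (constants approximate, plugged in via keras.callbacks.LearningRateScheduler):

LR_START, LR_MAX, LR_MIN = 1e-5, 4e-4, 1e-5
LR_RAMPUP_EPOCHS, LR_SUSTAIN_EPOCHS, LR_EXP_DECAY = 5, 0, 0.8

def lrfn(epoch):
    if epoch < LR_RAMPUP_EPOCHS:                      # linear warmup: 1e-5 -> 4e-4
        return (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    if epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:  # optional plateau at the peak
        return LR_MAX
    return (LR_MAX - LR_MIN) * LR_EXP_DECAY ** (
        epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS) + LR_MIN  # exponential decay

# lr_callback = keras.callbacks.LearningRateScheduler(lrfn)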
HuBMAP: 1,162 teams, 3 days to go. That is about 400 fewer teams than yesterday; what happened? It seems the teams that had been left in the state before the dataset update are gone, and my own submission count has fallen accordingly. With the smaller denominator, the number of teams in medal range shrank by about 40, so I was pushed out of medal range at a stroke and my motivation has evaporated. A pity.