Classes

The following classes are available globally.

  • A mutable, shareable, owning reference to a tensor.

    Declaration

    public final class Parameter<Scalar> where Scalar : TensorFlowScalar
    extension Parameter: CopyableToDevice
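
    A brief illustration of the reference semantics described above; it assumes, as in swift-apis, that Parameter exposes its tensor through a mutable value property and an init(_:) that takes a tensor:

    import TensorFlow

    // Two bindings to the same Parameter object share one underlying tensor,
    // so a mutation made through either binding is visible through both.
    let weights = Parameter(Tensor<Float>([1, 2, 3]))
    let sharedView = weights
    sharedView.value += Tensor<Float>([10, 10, 10])
    print(weights.value)  // prints [11.0, 12.0, 13.0]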
  • Class wrapping a C pointer to a TensorHandle. This class owns the TensorHandle and is responsible for destroying it.

    Declaration

    public class TFETensorHandle : _AnyTensorHandle
    extension TFETensorHandle: Equatable
  • An RMSProp optimizer.

    Implements the RMSProp optimization algorithm. RMSProp is a form of stochastic gradient descent where the gradients are divided by a running average of their recent magnitude. RMSProp keeps a moving average of the squared gradient for each weight.

    References:

    Declaration

    public class RMSProp<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
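
    As a hedged illustration of the update described above, a minimal scalar sketch of one common RMSProp formulation (the ScalarRMSProp type, its hyperparameter defaults, and the placement of epsilon are illustrative assumptions, not the library implementation):

    // Illustrative scalar RMSProp step: keep a running average of squared gradients
    // and divide each gradient step by the root of that average.
    struct ScalarRMSProp {
        var learningRate: Float = 1e-3
        var rho: Float = 0.9        // decay rate of the running average (assumed value)
        var epsilon: Float = 1e-7   // small constant for numerical stability
        var meanSquaredGradient: Float = 0

        mutating func update(_ weight: inout Float, gradient g: Float) {
            meanSquaredGradient = rho * meanSquaredGradient + (1 - rho) * g * g
            weight -= learningRate * g / (meanSquaredGradient.squareRoot() + epsilon)
        }
    }

    // Example: one update step on a single weight.
    var w: Float = 1.0
    var opt = ScalarRMSProp()
    opt.update(&w, gradient: 0.5)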
  • An AdaGrad optimizer.

    Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad has parameter-specific learning rates, which are adapted relative to how frequently parameters get updated during training. Parameters that receive more updates have smaller learning rates.

    AdaGrad individually adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the running sum of squares of gradient norms.

    Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)

    Declaration

    public class AdaGrad<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
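
    As a hedged illustration of the adaptation described above, a minimal scalar sketch (the ScalarAdaGrad type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar AdaGrad step: accumulate the running sum of squared gradients
    // and scale each step by its inverse square root, so frequently updated parameters
    // receive smaller effective learning rates.
    struct ScalarAdaGrad {
        var learningRate: Float = 1e-2
        var epsilon: Float = 1e-8   // small constant for numerical stability
        var accumulator: Float = 0  // running sum of g * g

        mutating func update(_ weight: inout Float, gradient g: Float) {
            accumulator += g * g
            weight -= learningRate * g / (accumulator.squareRoot() + epsilon)
        }
    }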
  • An AdaDelta optimizer.

    Implements the AdaDelta optimization algorithm. AdaDelta is a stochastic gradient descent method based on first-order information. It adapts learning rates based on a moving window of gradient updates rather than by accumulating all past gradients, so it continues learning even after many updates have been made and adapts faster to the changing dynamics of the optimization problem.

    Reference: “ADADELTA: An Adaptive Learning Rate Method” (Zeiler, 2012)

    Declaration

    public class AdaDelta<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
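
    As a hedged illustration of the rule from the reference above, a minimal scalar sketch (the ScalarAdaDelta type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar AdaDelta step: maintain decaying averages of squared gradients
    // and squared updates; the ratio of their roots sets the step size, so the method
    // keeps adapting even after many updates.
    struct ScalarAdaDelta {
        var rho: Float = 0.95       // decay rate of both running averages (assumed value)
        var epsilon: Float = 1e-6
        var avgSquaredGradient: Float = 0
        var avgSquaredUpdate: Float = 0

        mutating func update(_ weight: inout Float, gradient g: Float) {
            avgSquaredGradient = rho * avgSquaredGradient + (1 - rho) * g * g
            let step = ((avgSquaredUpdate + epsilon).squareRoot()
                / (avgSquaredGradient + epsilon).squareRoot()) * g
            avgSquaredUpdate = rho * avgSquaredUpdate + (1 - rho) * step * step
            weight -= step
        }
    }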
  • Adam optimizer.

    Implements the Adam optimization algorithm. Adam is a stochastic gradient descent method that computes individual adaptive learning rates for different parameters from estimates of first- and second-order moments of the gradients.

    Reference: “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014).

    Examples:

    • Train a simple reinforcement learning agent:
    ...
    // Instantiate an agent's policy, approximated by the neural network (`net`) defined in advance.
    var net = Net(observationSize: Int(observationSize), hiddenSize: hiddenSize, actionCount: actionCount)
    // Define the Adam optimizer for the network with a learning rate set to 0.01.
    let optimizer = Adam(for: net, learningRate: 0.01)
    ...
    // Begin training the agent (over a certain number of episodes).
    while true {
        ...
        // Implementing the gradient descent with the Adam optimizer:
        // Compute the gradients (use `withLearningPhase` to call a closure under a learning phase).
        let gradients = withLearningPhase(.training) {
            TensorFlow.gradient(at: net) { net -> Tensor<Float> in
                // Return the softmax cross-entropy loss.
                return softmaxCrossEntropy(logits: net(input), probabilities: target)
            }
        }
        // Update the differentiable variables of the network (`net`) along the gradients
        // with the Adam optimizer.
        optimizer.update(&net, along: gradients)
        ...
    }
    
    • Train a generative adversarial network (GAN):
    ...
    // Instantiate the generator and the discriminator networks after defining them.
    var generator = Generator()
    var discriminator = Discriminator()
    // Define the Adam optimizers for each network with a learning rate set to 2e-4 and beta1 set to 0.5.
    let adamOptimizerG = Adam(for: generator, learningRate: 2e-4, beta1: 0.5)
    let adamOptimizerD = Adam(for: discriminator, learningRate: 2e-4, beta1: 0.5)
    ...
    // Start the training loop over a certain number of epochs (`epochCount`).
    for epoch in 1...epochCount {
        // Start the training phase.
        ...
        for batch in trainingShuffled.batched(batchSize) {
            // Implementing the gradient descent with the Adam optimizer:
            // 1) Update the generator.
            ...
            let 𝛁generator = TensorFlow.gradient(at: generator) { generator -> Tensor<Float> in
                ...
                return loss
            }
            // Update the differentiable variables of the generator along the gradients (`𝛁generator`)
            // with the Adam optimizer.
            adamOptimizerG.update(&generator, along: 𝛁generator)

            // 2) Update the discriminator.
            ...
            let 𝛁discriminator = TensorFlow.gradient(at: discriminator) { discriminator -> Tensor<Float> in
                ...
                return loss
            }
            // Update the differentiable variables of the discriminator along the gradients (`𝛁discriminator`)
            // with the Adam optimizer.
            adamOptimizerD.update(&discriminator, along: 𝛁discriminator)
        }
    }
    

    Declaration

    public class Adam<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
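
    Complementing the usage examples above, a hedged scalar sketch of the moment estimates Adam maintains (the ScalarAdam type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar Adam step: track bias-corrected estimates of the first moment (mean)
    // and second moment (uncentered variance) of the gradient.
    struct ScalarAdam {
        var learningRate: Float = 1e-3
        var beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
        var m: Float = 0, v: Float = 0
        var beta1Power: Float = 1, beta2Power: Float = 1

        mutating func update(_ weight: inout Float, gradient g: Float) {
            beta1Power *= beta1
            beta2Power *= beta2
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            let mHat = m / (1 - beta1Power)   // bias-corrected first moment
            let vHat = v / (1 - beta2Power)   // bias-corrected second moment
            weight -= learningRate * mHat / (vHat.squareRoot() + epsilon)
        }
    }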
  • AdaMax optimizer.

    A variant of Adam based on the infinity-norm.

    Reference: Section 7 of “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014)

    Declaration

    public class AdaMax<Model: Differentiable & KeyPathIterable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
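
    As a hedged illustration of the infinity-norm variant, a minimal scalar sketch (the ScalarAdaMax type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar AdaMax step: replace Adam's second-moment estimate with an
    // exponentially weighted infinity norm of the gradients.
    struct ScalarAdaMax {
        var learningRate: Float = 2e-3
        var beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
        var m: Float = 0            // first-moment estimate
        var u: Float = 0            // exponentially weighted infinity norm
        var beta1Power: Float = 1

        mutating func update(_ weight: inout Float, gradient g: Float) {
            beta1Power *= beta1
            m = beta1 * m + (1 - beta1) * g
            u = max(beta2 * u, abs(g))
            weight -= learningRate * (m / (1 - beta1Power)) / (u + epsilon)
        }
    }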
  • AMSGrad optimizer.

    This algorithm is a modification of Adam with better convergence properties when close to local optima.

    Reference: “On the Convergence of Adam and Beyond” (Reddi et al., 2018)

    Declaration

    public class AMSGrad<Model: Differentiable & KeyPathIterable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
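
    As a hedged illustration of the modification, a minimal scalar sketch (the ScalarAMSGrad type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar AMSGrad step: like Adam, but divide by the running maximum of the
    // second-moment estimate, which keeps the effective learning rate from increasing.
    struct ScalarAMSGrad {
        var learningRate: Float = 1e-3
        var beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
        var m: Float = 0, v: Float = 0, vMax: Float = 0

        mutating func update(_ weight: inout Float, gradient g: Float) {
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            vMax = max(vMax, v)
            weight -= learningRate * m / (vMax.squareRoot() + epsilon)
        }
    }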
  • RAdam optimizer.

    Rectified Adam, a variant of Adam that introduces a term to rectify the adaptive learning rate variance.

    Reference: “On the Variance of the Adaptive Learning Rate and Beyond” (Liu et al., 2019)

    Declaration

    public class RAdam<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
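
    As a hedged illustration of the rectification described in the reference, a minimal scalar sketch (the ScalarRAdam type, its hyperparameter defaults, and the rho > 4 threshold follow the paper and are illustrative assumptions, not the library implementation):

    // Illustrative scalar RAdam step: while the second-moment estimate is unreliable
    // (small rho), take an unadapted momentum step; afterwards, apply a rectified adaptive step.
    struct ScalarRAdam {
        var learningRate: Float = 1e-3
        var beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
        var m: Float = 0, v: Float = 0
        var beta1Power: Float = 1, beta2Power: Float = 1
        var step: Float = 0

        mutating func update(_ weight: inout Float, gradient g: Float) {
            step += 1
            beta1Power *= beta1
            beta2Power *= beta2
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            let mHat = m / (1 - beta1Power)
            let rhoInf = 2 / (1 - beta2) - 1
            let rho = rhoInf - 2 * step * beta2Power / (1 - beta2Power)
            if rho > 4 {
                // Enough steps taken: the adaptive learning rate's variance is tractable.
                let vHat = (v / (1 - beta2Power)).squareRoot()
                let rectification = ((rho - 4) * (rho - 2) * rhoInf
                    / ((rhoInf - 4) * (rhoInf - 2) * rho)).squareRoot()
                weight -= learningRate * rectification * mHat / (vHat + epsilon)
            } else {
                // Otherwise fall back to an unadapted step using the bias-corrected first moment.
                weight -= learningRate * mHat
            }
        }
    }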
  • An infinite sequence of collections of sample batches suitable for training a DNN when samples are not uniformly sized.

    The batches in each epoch:

    • all have exactly the same number of samples.
    • are formed from samples of similar size.
    • start with a batch whose maximum sample size is the maximum size over all samples used in the epoch.

    Declaration

    public final class NonuniformTrainingEpochs<
      Samples: Collection,
      Entropy: RandomNumberGenerator
    >: Sequence, IteratorProtocol
  • A general optimizer that can express multiple possible optimizations. The optimizer is composed of a mapping from ParameterGroup to ParameterGroupOptimizer. It also stores the number of elements participating in a cross-replica sum; this is an efficiency measure that avoids multiple inefficient passes over the gradient.

    Declaration

    public class GeneralOptimizer<Model: EuclideanDifferentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
  • A stochastic gradient descent (SGD) optimizer.

    Implements the stochastic gradient descent algorithm with support for momentum, learning rate decay, and Nesterov momentum. Momentum and Nesterov momentum (a.k.a. the Nesterov accelerated gradient method) are first-order optimization methods that can improve the training speed and convergence rate of gradient descent.

    References:

    Declaration

    public class SGD<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
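
    As a hedged illustration of classical and Nesterov momentum, a minimal scalar sketch of one common formulation (the ScalarSGD type and its hyperparameter defaults are illustrative assumptions, not the library implementation):

    // Illustrative scalar SGD step with optional momentum and Nesterov momentum.
    struct ScalarSGD {
        var learningRate: Float = 1e-2
        var momentum: Float = 0.9
        var nesterov: Bool = false
        var velocity: Float = 0

        mutating func update(_ weight: inout Float, gradient g: Float) {
            velocity = momentum * velocity - learningRate * g
            if nesterov {
                // Nesterov momentum looks ahead along the accumulated velocity
                // before applying the current gradient step.
                weight += momentum * velocity - learningRate * g
            } else {
                weight += velocity
            }
        }
    }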
  • An infinite sequence of collections of sample batches suitable for training a DNN when samples are uniformly sized.

    The batches in each epoch all have exactly the same size.

    Declaration

    public final class TrainingEpochs<
      Samples: Collection,
      Entropy: RandomNumberGenerator
    >: Sequence, IteratorProtocol
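
    A brief usage sketch; it assumes the TrainingEpochs(samples:batchSize:) convenience initializer from the swift-apis Epochs module, and uses prefix(_:) to bound the otherwise infinite sequence:

    import TensorFlow

    let samples = Array(0..<100)   // any Collection of training samples
    let epochs = TrainingEpochs(samples: samples, batchSize: 32)

    for (epochIndex, batches) in epochs.prefix(3).enumerated() {
        for batch in batches {
            // Every batch in an epoch has exactly `batchSize` samples (32 here),
            // so 100 samples yield 3 full batches per epoch and the remainder is unused.
            print("epoch \(epochIndex): batch of \(batch.count) samples")
        }
    }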
  • Declaration

    public class EpochPipelineQueue
  • The state of a training loop on a device.

    Declaration

    public class ThreadState<Model: Layer, Opt: Optimizer>
    where
      Opt.Model == Model, Opt.Scalar == Float, Model.Input == Tensor<Float>,
      Model.Output == Tensor<Float>,
      Model.TangentVector.VectorSpaceScalar == Float