Q: Which optimization algorithm adapts the learning rate for each parameter?
A) SGD  B) Adam  C) Momentum  D) Gradient Descent
Answer: B) Adam. Adam adjusts the learning rate for each parameter individually, improving convergence.
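For reference, a minimal sketch of Adam in PyTorch (the toy objective, learning rate, and number of steps are illustrative): the optimizer keeps running first- and second-moment estimates of each parameter's gradient, so every parameter gets its own effective step size.

```python
import torch

# Toy objective: pull w toward a fixed target. torch.optim.Adam tracks per-parameter
# moment estimates, so each element of w is updated with its own adaptive step.
w = torch.randn(3, requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0])
optimizer = torch.optim.Adam([w], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()
    optimizer.step()          # adaptive, per-parameter update

print(w.detach())             # close to the target after training
```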
Q: Dropout in neural networks is used to:
A) Speed up training  B) Prevent overfitting  C) Increase accuracy directly  D) Reduce dataset size
Answer: B) Prevent overfitting. Dropout randomly ignores neurons during training, reducing reliance on any specific neuron.

Q: Which function is commonly used as an output activation in classification tasks?
A) ReLU  B) Sigmoid  C) Tanh  D) Softmax
Answer: D) Softmax. Softmax converts raw outputs into probabilities across multiple classes.
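A quick illustration of what softmax does, sketched in PyTorch (the logit values are arbitrary): it maps raw class scores to probabilities that sum to 1.

```python
import torch

# Raw scores (logits) for three classes; softmax turns them into a probability
# distribution, which is why it is the usual output activation for classification.
logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)
print(probs)        # roughly tensor([0.659, 0.242, 0.099])
print(probs.sum())  # tensor(1.)
```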
Q: Backpropagation is primarily used for:
A) Data preprocessing  B) Weight adjustment  C) Model evaluation  D) Feature selection
Answer: B) Weight adjustment. Backpropagation updates weights by propagating the error backward through the network.

Q: CNNs are especially effective for:
A) Sequential data  B) Image data  C) Tabular data  D) Graph data
Answer: B) Image data. CNNs excel at extracting spatial features from images.

Q: Which activation function solves the vanishing gradient problem better than sigmoid?
A) ReLU  B) Tanh  C) Linear  D) Step
Answer: A) ReLU. ReLU does not saturate for positive inputs, which reduces vanishing-gradient issues.

Q: Dropout in deep learning is used for:
A) Increasing accuracy  B) Reducing overfitting  C) Faster training  D) Normalization
Answer: B) Reducing overfitting. Dropout randomly removes neurons during training to prevent overfitting.
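A minimal dropout sketch in PyTorch (the drop probability is chosen arbitrarily), showing that units are zeroed only in training mode and that dropout is a no-op at inference.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each activation is zeroed with probability 0.5
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the values are zeroed, survivors scaled by 1/(1-p)

drop.eval()
print(drop(x))   # at inference, dropout does nothing
```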
Q: CNNs are mainly used for:
A) Text processing  B) Image recognition  C) Time series prediction  D) Sorting algorithms
Answer: B) Image recognition. Convolutional neural networks excel at extracting spatial features from images.

Q: Backpropagation updates weights using:
A) Random values  B) Gradient descent  C) Decision trees  D) PCA
Answer: B) Gradient descent. Gradient descent adjusts weights to minimize the loss by following the error gradients.
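A hand-rolled single update, sketched with PyTorch autograd (the numbers are arbitrary): backpropagation computes the gradient of the loss with respect to the weight, and gradient descent moves the weight a small step against that gradient.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)    # a single weight
x, y = torch.tensor(3.0), torch.tensor(9.0)  # input and target
lr = 0.01                                    # learning rate

loss = (w * x - y) ** 2
loss.backward()                 # backpropagation fills w.grad with dloss/dw

with torch.no_grad():
    w -= lr * w.grad            # gradient-descent step
    w.grad.zero_()

print(w)                        # nudged toward the value that fits y = w * x
```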
Q: LSTM networks are designed to handle:
A) Random noise  B) Long-term dependencies  C) Linear regression  D) Parallel computing
Answer: B) Long-term dependencies. LSTMs use memory cells and gates to capture long-term patterns in sequences.
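A minimal LSTM sketch in PyTorch (all sizes are arbitrary): the gates and cell state are what let the network carry information across many time steps.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 50, 10)            # 4 sequences, 50 time steps, 10 features each
output, (h_n, c_n) = lstm(x)

print(output.shape)                   # torch.Size([4, 50, 20]) - one hidden state per step
print(h_n.shape, c_n.shape)           # final hidden and cell states: [1, 4, 20] each
```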
Q: Which layer in a CNN reduces spatial dimensions?
A) Convolutional  B) Pooling  C) Fully connected  D) Dropout
Answer: B) Pooling. Pooling layers downsample feature maps to reduce dimensions and computation.
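A shape check in PyTorch (tensor sizes are arbitrary) showing how a 2x2 max-pooling layer halves the spatial dimensions of a feature map.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
feature_map = torch.randn(1, 16, 32, 32)   # (batch, channels, height, width)
print(pool(feature_map).shape)             # torch.Size([1, 16, 16, 16])
```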
Q: Batch normalization helps in:
A) Increasing model size  B) Faster convergence and stability  C) Reducing training data  D) Overfitting
Answer: B) Faster convergence and stability. Batch normalization normalizes activations, speeding up training and reducing internal covariate shift.
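A small PyTorch sketch (the shapes and the artificial shift are illustrative): batch normalization standardizes each channel over the batch and then applies a learnable scale and shift.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)
x = torch.randn(8, 16, 32, 32) * 5 + 3     # deliberately shifted, widely spread activations
y = bn(x)
print(round(y.mean().item(), 3), round(y.std().item(), 3))   # roughly 0.0 and 1.0
```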
Q: RNNs face vanishing gradient issues with:
A) Small datasets  B) Long sequences  C) Large batch sizes  D) Linear models
Answer: B) Long sequences. Gradients shrink over long dependencies, making them hard to learn.

Q: Which optimizer uses momentum to accelerate training?
A) SGD  B) Adam  C) RMSProp  D) None
Answer: A) SGD (with momentum). A momentum term added to SGD accelerates learning by accumulating previous gradients.
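A minimal sketch of SGD with momentum in PyTorch (the learning rate and momentum value are just common defaults): the momentum term accumulates a velocity from past gradients.

```python
import torch

w = torch.randn(5, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01, momentum=0.9)   # momentum blends in past gradients

loss = (w ** 2).sum()
loss.backward()
optimizer.step()     # the update uses the accumulated velocity, not just the current gradient
```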
Q: Autoencoders are primarily used for:
A) Classification  B) Feature learning and dimensionality reduction  C) Sorting data  D) Reinforcement learning
Answer: B) Feature learning and dimensionality reduction. Autoencoders compress and reconstruct data to learn useful features.
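A tiny autoencoder sketch in PyTorch (layer sizes are arbitrary): the narrow middle layer forces the network to learn a compressed representation, and the reconstruction error drives training.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))

    def forward(self, x):
        code = self.encoder(x)        # 8-dimensional compressed representation
        return self.decoder(code)     # reconstruction of the 64-dimensional input

model = AutoEncoder()
x = torch.randn(16, 64)
loss = nn.functional.mse_loss(model(x), x)   # reconstruction loss
```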
Q: What is the vanishing gradient problem mainly associated with?
A) ReLU activation  B) Sigmoid/Tanh activations  C) Batch normalization  D) Dropout layers
Answer: B) Sigmoid/Tanh activations. In deep networks with sigmoid or tanh, gradients shrink during backpropagation.

Q: Which optimizer adapts the learning rate individually for each parameter?
A) SGD  B) Adam  C) Momentum  D) Gradient Descent
Answer: B) Adam. Adam adjusts learning rates per parameter dynamically.

Q: Convolution in CNNs is primarily used for:
A) Sequence modeling  B) Feature extraction from spatial data  C) Memory allocation  D) Weight initialization
Answer: B) Feature extraction from spatial data. Convolutions capture patterns such as edges and textures in images.

Q: Dropout prevents:
A) Overfitting  B) Underfitting  C) Gradient descent  D) Data preprocessing
Answer: A) Overfitting. Dropout randomly disables neurons during training to improve generalization.

Q: LSTM networks solve:
A) Long-term dependency problems  B) Linear regression  C) Image classification  D) Memory fragmentation
Answer: A) Long-term dependency problems. LSTMs control gradient flow with gates, preserving memory over time.

Q: Which technique helps reduce exploding gradients?
A) Weight decay  B) Gradient clipping  C) Dropout  D) Early stopping
Answer: B) Gradient clipping. Gradient clipping caps gradients during backpropagation to stabilize training.
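A sketch of where gradient clipping fits in a PyTorch training step (the model, data, and clipping threshold are placeholders): clip after backward() and before the optimizer step.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the gradient norm
optimizer.step()
```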
Q: Which layer in a CNN is responsible for reducing spatial dimensions?
A) Convolution layer  B) Pooling layer  C) Fully connected layer  D) Normalization layer
Answer: B) Pooling layer. Pooling layers downsample feature maps, reducing computation.

Q: In GANs, the generator’s role is to:
A) Detect fake data  B) Create realistic data samples  C) Train the discriminator  D) Minimize batch size
Answer: B) Create realistic data samples. The generator produces fake samples intended to fool the discriminator.
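A bare-bones generator sketch in PyTorch (the noise size and output size are illustrative, e.g. a flattened 28x28 image): it maps random noise to fake samples, and GAN training pushes it to make those samples indistinguishable from real ones.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.Linear(100, 128),   # 100-dimensional noise vector in
    nn.ReLU(),
    nn.Linear(128, 784),   # fake sample out (e.g. a flattened 28x28 image)
    nn.Tanh(),
)

noise = torch.randn(16, 100)
fake_samples = generator(noise)      # these are shown to the discriminator
print(fake_samples.shape)            # torch.Size([16, 784])
```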
Q: Which optimizer adapts the learning rate using first and second moment estimates?
A) SGD  B) RMSProp  C) Adam  D) Adagrad
Answer: C) Adam. Adam combines RMSProp-style scaling with momentum for efficient updates.

Q: Capsule networks improve CNNs by:
A) Removing pooling  B) Capturing spatial hierarchies  C) Using fewer layers  D) Replacing ReLU
Answer: B) Capturing spatial hierarchies. Capsules model part-whole relationships in images.

Q: What does the "vanishing gradient" typically affect the most?
A) Output layer  B) Hidden layers in deep networks  C) Input layer  D) Dropout layer
Answer: B) Hidden layers in deep networks. Gradients become too small during backpropagation, slowing or stopping learning.

Q: Which network type is best for image segmentation tasks?
A) CNN  B) RNN  C) GAN  D) U-Net
Answer: D) U-Net. The U-Net architecture is designed for precise image segmentation.

Q: Batch normalization helps in:
A) Increasing bias  B) Reducing overfitting only  C) Stabilizing training and improving convergence  D) Adding noise
Answer: C) Stabilizing training and improving convergence. It normalizes layer inputs to prevent internal covariate shift.

Q: In a CNN, pooling layers mainly help in:
A) Adding more neurons  B) Increasing image size  C) Reducing spatial dimensions and computation  D) Generating gradients
Answer: C) Reducing spatial dimensions and computation. Pooling downsamples feature maps, making the model more efficient.

Q: Which neural-network layer type is most effective for sequence-to-sequence models like translation?
A) Fully connected  B) Convolutional  C) Recurrent  D) Dropout
Answer: C) Recurrent. RNNs process sequential data where order matters.

Q: Gradient clipping prevents:
A) Overfitting  B) Exploding gradients  C) Vanishing gradients  D) Data leakage
Answer: B) Exploding gradients. It limits large updates that would destabilize training.

Q: The “attention” mechanism helps a model:
A) Focus on relevant input parts  B) Reduce model size  C) Drop neurons  D) Normalize data
Answer: A) Focus on relevant input parts. Attention assigns higher weights to the most important sequence elements.
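A minimal scaled dot-product attention sketch in PyTorch (shapes are arbitrary): query-key similarity scores become softmax weights, and the output is the correspondingly weighted sum of the values.

```python
import torch

def attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # similarity of each query to each key
    weights = torch.softmax(scores, dim=-1)         # how much to attend to each position
    return weights @ v, weights

q = torch.randn(1, 5, 16)   # (batch, sequence length, dimension)
k = torch.randn(1, 5, 16)
v = torch.randn(1, 5, 16)
out, weights = attention(q, k, v)
print(out.shape, weights.shape)   # [1, 5, 16] and [1, 5, 5]
```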
Q: Dropout layers are mainly used to:
A) Increase training speed  B) Prevent overfitting  C) Reduce bias  D) Add layers
Answer: B) Prevent overfitting. They randomly deactivate neurons during training.

Q: A ReLU neuron outputs:
A) Negative inputs  B) Only positive values  C) Binary values  D) Weighted average
Answer: B) Only positive values. ReLU(x) = max(0, x) passes positive inputs through unchanged and outputs zero otherwise, improving gradient flow.

Q: When deploying a deep learning model to production, which approach ensures consistent inference results across hardware types?
A) Retraining  B) Model quantization  C) Model serialization (ONNX/TensorFlow SavedModel)  D) GPU scaling
Answer: C) Model serialization. Saved formats such as ONNX ensure model compatibility across platforms.
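A serialization sketch using PyTorch's ONNX export (the model, example input, and file name are placeholders): the exported graph can then be run by ONNX-compatible runtimes on different hardware.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

dummy_input = torch.randn(1, 10)                 # example input fixes the graph's shapes
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])
```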
Q: Which technique helps prevent overfitting in deep neural networks?
A) Gradient Descent  B) Dropout  C) Batch Normalization  D) Data Augmentation
Answer: B) Dropout. Dropout randomly deactivates neurons during training to improve generalization.

Q: What does the vanishing gradient problem primarily affect?
A) Recurrent networks  B) Convolutional layers  C) Output layers  D) Pooling layers
Answer: A) Recurrent networks. RNNs often suffer from tiny gradient updates during backpropagation through time.

Q: Which optimizer adapts learning rates for individual parameters?
A) SGD  B) RMSProp  C) AdaGrad  D) Adam
Answer: D) Adam. Adam combines momentum and adaptive learning rates for efficient convergence.

Q: Which activation function outputs values between 0 and 1?
A) ReLU  B) Sigmoid  C) Tanh  D) Softmax
Answer: B) Sigmoid. Sigmoid squashes its input into the (0, 1) range and is commonly used in binary output layers.

Q: What is the role of a loss function in deep learning?
A) To initialize weights  B) To measure model error  C) To increase accuracy  D) To store model parameters
Answer: B) To measure model error. The loss function quantifies how far predictions are from the true labels.

Q: What is a convolutional layer mainly used for?
A) Text processing  B) Image feature extraction  C) Data compression  D) Model tuning
Answer: B) Image feature extraction. Convolutional layers extract spatial features from images in deep networks.

Q: Which activation function reduces vanishing gradients?
A) Sigmoid  B) ReLU  C) Softmax  D) Tanh
Answer: B) ReLU. ReLU helps gradients flow through deep layers, preventing them from vanishing.

Q: What is dropout used for in neural networks?
A) Data augmentation  B) Preventing overfitting  C) Weight initialization  D) Speeding up convergence
Answer: B) Preventing overfitting. Dropout randomly disables neurons during training for better generalization.

Q: LSTM networks are most useful for:
A) Static images  B) Sequential data  C) Random noise  D) Structured tables
Answer: B) Sequential data. LSTMs capture long-term dependencies in time-series or text data.

Q: What does “epoch” represent in training?
A) One batch  B) One full dataset pass  C) One gradient step  D) One model test
Answer: B) One full dataset pass. Each epoch is one complete iteration over all training samples.
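A sketch that makes the epoch/batch distinction concrete (random data, arbitrary sizes): with 256 samples and a batch size of 32, each epoch runs 8 batches, i.e. 8 gradient steps.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):          # 5 epochs = 5 full passes over all 256 samples
    for x, y in loader:         # 256 / 32 = 8 batches (gradient steps) per epoch
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
```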
Q: Which of the following is used to prevent overfitting in deep neural networks?
A) Dropout  B) Pooling  C) Batch normalization  D) Activation
Answer: A) Dropout. Dropout randomly disables neurons during training to prevent overfitting.

Q: What is the main function of an activation function?
A) Increase accuracy  B) Add non-linearity  C) Reduce loss  D) Adjust weights
Answer: B) Add non-linearity. Activation functions introduce non-linear behavior so the network can model complex patterns.

Q: ReLU stands for:
A) Rectified Linear Unit  B) Regular Linear Unit  C) Randomized Learning Unit  D) Recursive Layer Unit
Answer: A) Rectified Linear Unit. ReLU passes positive values through unchanged and zeros out negatives, improving training efficiency.

Q: The vanishing gradient problem occurs in:
A) Shallow networks  B) CNNs only  C) Small datasets  D) None of these
Answer: D) None of these. The problem arises in deep networks: as gradients propagate backward through many layers, they shrink and slow training.

Q: Batch normalization helps to:
A) Reduce computation  B) Normalize activations  C) Avoid backpropagation  D) Increase dropout
Answer: B) Normalize activations. It normalizes layer inputs, stabilizing and speeding up training.

Q: In autoencoders, the middle layer is known as the:
A) Output layer  B) Hidden layer  C) Bottleneck layer  D) ReLU layer
Answer: C) Bottleneck layer. It holds the compressed representation of the input data.

Q: What is the main purpose of a cost function?
A) Measure model accuracy  B) Evaluate loss between prediction and truth  C) Increase learning rate  D) Regularize weights
Answer: B) Evaluate loss between prediction and truth. Cost functions guide optimization by measuring prediction error.

Q: Which architecture uses skip connections?
A) AlexNet  B) VGGNet  C) ResNet  D) LeNet
Answer: C) ResNet. Residual networks use skip connections to avoid vanishing gradients.

Q: In deep learning, what is a key benefit of using residual connections in ResNet architectures?
A) They reduce the dataset size  B) They help train very deep networks  C) They normalize gradients  D) They compress input data
Answer: B) They help train very deep networks. Residual connections let gradients flow through skip paths, mitigating the vanishing gradient problem in deep models.
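A minimal residual block sketch in PyTorch (the channel count and layer choices are illustrative): the input is added back to the block's output, so gradients have a direct path through very deep stacks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)      # skip connection: add the input back

block = ResidualBlock(16)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)                  # shape preserved: torch.Size([1, 16, 32, 32])
```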
Q: What does “early stopping” help prevent during model training?
A) Overfitting  B) Underfitting  C) Gradient clipping  D) Model bias
Answer: A) Overfitting. Early stopping halts training once the validation loss stops improving.
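A bare-bones early-stopping loop (the random numbers stand in for real training and validation steps; the patience value is arbitrary): training stops once the validation loss has not improved for a few consecutive epochs.

```python
import random

best_val_loss = float("inf")
patience, bad_epochs = 3, 0

for epoch in range(100):
    val_loss = random.random()          # placeholder for the real validation loss
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # no improvement for `patience` epochs
            print(f"stopping early at epoch {epoch}")
            break
```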
Q: In CNNs, what is the purpose of pooling layers?
A) Increase feature size  B) Reduce spatial dimensions  C) Normalize weights  D) Add noise to input
Answer: B) Reduce spatial dimensions. Pooling layers downsample feature maps, improving computational efficiency and translation invariance.

Q: What is transfer learning most useful for?
A) Training models from scratch  B) Applying knowledge from a pre-trained model  C) Data labeling  D) Model compression
Answer: B) Applying knowledge from a pre-trained model. Transfer learning reuses features learned on large datasets to improve performance on smaller ones.
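A transfer-learning sketch with torchvision (assumes torchvision 0.13 or newer and an arbitrary 5-class target task): load ImageNet-pre-trained weights, freeze the feature extractor, and train only a new classification head.

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained on ImageNet
for param in model.parameters():
    param.requires_grad = False        # freeze the learned feature extractor

model.fc = nn.Linear(model.fc.in_features, 5)   # new head for the 5-class task, trainable
```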
Q: Which neural network architecture is most suited for sequence data?
A) CNN  B) RNN  C) GAN  D) Autoencoder
Answer: B) RNN. Recurrent neural networks handle sequential data such as time series and language.

Q: What is the main purpose of the activation function?
A) Add non-linearity  B) Normalize output  C) Increase bias  D) Speed up gradient
Answer: A) Add non-linearity. Activation functions such as ReLU introduce non-linearity so the model can fit complex data.

Q: What is the key feature of convolution in CNNs?
A) Feature extraction  B) Memory sharing  C) Pooling  D) Backpropagation
Answer: A) Feature extraction. Convolutional layers extract spatial hierarchies from images.

Q: Which activation function helps avoid the vanishing gradient problem?
A) Sigmoid  B) ReLU  C) Tanh  D) Softmax
Answer: B) ReLU. ReLU (Rectified Linear Unit) keeps gradients from shrinking for positive inputs, preventing vanishing gradients.

Q: What does “epoch” mean in deep learning?
A) One forward pass  B) One backward pass  C) One full cycle through training data  D) A batch of samples
Answer: C) One full cycle through training data. An epoch is one complete iteration over the entire dataset.

Q: Dropout is used to:
A) Increase accuracy on training data  B) Reduce overfitting  C) Enhance convergence  D) Remove neurons permanently
Answer: B) Reduce overfitting. Dropout randomly deactivates neurons during training to improve generalization.

Q: What is the main advantage of batch normalization?
A) Decreases data size  B) Normalizes inputs to each layer  C) Improves hardware speed  D) Reduces epochs
Answer: B) Normalizes inputs to each layer. Batch normalization stabilizes learning and speeds up training by normalizing layer inputs.

Q: Which activation function is most commonly used in the output layer for multi-class classification?
A) Sigmoid  B) ReLU  C) Softmax  D) Tanh
Answer: C) Softmax. Softmax converts output scores into a probability distribution over classes.

Q: Which algorithm is used for weight optimization in neural networks?
A) Decision Tree  B) Backpropagation  C) Gradient Boosting  D) K-Means
Answer: B) Backpropagation. Backpropagation computes gradients that are used to update weights and minimize the loss.

Q: The vanishing gradient problem occurs mainly in:
A) Shallow networks  B) Deep networks  C) Linear models  D) Clustering
Answer: B) Deep networks. Gradients diminish through deep layers, making training slow or unstable.