$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$$
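To make the recurrence above concrete, here is a minimal NumPy sketch that applies it step by step over a short sequence. The sizes, the random weights, and the helper name `rnn_forward` are illustrative assumptions rather than anything from a library. The sketch uses the column-vector convention of the formula (`W_x @ x_t`).

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0=None):
    """Apply h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence.

    xs has shape (T, input_dim); the result has shape (T, hidden_dim).
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in xs:
        # The previous hidden state h feeds into the current step.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Illustrative sizes and random weights (assumptions, not from the post).
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
xs = rng.normal(size=(T, input_dim))

states = rnn_forward(xs, W_x, W_h, b)
print(states.shape)  # (5, 4)
```

Because `h` is carried across loop iterations, the state at step `t` depends on every earlier input, which is the "memory" the rest of the post refers to.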
Recurrent Neural Networks (RNNs) have powered many breakthroughs in sequence modeling: speech recognition, machine translation, time series forecasting, and even music generation. Whereas a standard feedforward network treats each input independently, an RNN carries a hidden state (its "memory") from one step to the next, so each output can depend on earlier inputs.
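```python
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM, GRU, SimpleRNN, Dense, Embedding
from keras.preprocessing import sequence

max_features = 20000
maxlen = 100  # truncate reviews to 100 words
batch_size = 32

# Load and pad the data. Assumption: the data-loading step is not shown in
# the post, so the IMDB review set bundled with Keras is used as a stand-in.
(x_train, y_train), (x_val, y_val) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = sequence.pad_sequences(x_val, maxlen=maxlen)

# Build model
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))  # or GRU(128)
model.add(Dense(1, activation='sigmoid'))

# Compile (Theano backend)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_val, y_val))
```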
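```python
import theano
import theano.tensor as T
import numpy as np

# Illustrative sizes (assumed; the original snippet leaves them undefined)
input_dim, hidden_dim = 10, 20

# Symbolic inputs: the current input and the previous hidden state
x_t = T.matrix('input')
h_prev = T.matrix('hidden_prev')

# Shared (trainable) parameters
W_xh = theano.shared(np.random.randn(input_dim, hidden_dim))
W_hh = theano.shared(np.random.randn(hidden_dim, hidden_dim))
b_h = theano.shared(np.zeros(hidden_dim))
```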
```python
import numpy as np
from keras.models import Sequential
from keras.layers import GRU, Dense

def generate_sine_wave(seq_length, num_samples):
    X, y = [], []
    for _ in range(num_samples):
        start = np.random.uniform(0, 4*np.pi)
        seq = np.sin(np.linspace(start, start + seq_length, seq_length + 1))
        X.append(seq[:-1].reshape(-1, 1))
        y.append(seq[-1])
    return np.array(X), np.array(y)
```
In this post, we'll cut through the hype and get practical. You'll learn the core RNN architectures (Simple RNN, LSTM, GRU) and implement them in Python using Theano (via the Keras wrapper, which historically supported Theano as a backend). Even if you now use TensorFlow or PyTorch, understanding the Theano-era patterns will solidify your fundamentals.
```python
h_t = T.tanh(T.dot(x_t, W_xh) + T.dot(h_prev, W_hh) + b_h)
```
By [Your Name]
| Architecture | # Gates | Cell State | Best for |
|--------------|---------|------------|-----------|
| Simple RNN | 0 | No | Very short sequences |
| LSTM | 3 | Yes | Long dependencies, complex data |
| GRU | 2 | No | Smaller datasets, faster training |

While Theano is no longer actively developed (it was a pioneer, but most practitioners have moved to TensorFlow or PyTorch), many legacy systems and research codebases still use it. Here's how you'd build an LSTM for sentiment analysis using Theano with the Keras 2.x API:
Vanilla RNNs suffer from the vanishing/exploding gradient problem: as the error signal is propagated back through many time steps it is repeatedly multiplied by the recurrent weights, so it tends to shrink toward zero or blow up. In practice this means they struggle to learn long-range dependencies (e.g., information from 50 steps ago). This is where LSTM and GRU come in.

LSTM (Long Short-Term Memory)

LSTMs introduce a cell state (a conveyor belt of information) and three gates: forget, input, and output. These gates learn what to remember, what to write, and what to output.
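The gate description above can be sketched as a single LSTM step in plain NumPy. This is a minimal illustration under stated assumptions, not the Keras or Theano implementation: the function name `lstm_step`, the stacked weight layout (forget, input, output, candidate), and the sizes are all choices made here for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4H, D), U: (4H, H), b: (4H,), with rows stacked in the
    (assumed) order [forget, input, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])          # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2*H])        # input gate: how much new content to write
    o = sigmoid(z[2*H:3*H])      # output gate: what to expose as h_t
    g = np.tanh(z[3*H:4*H])      # candidate values to write
    c_t = f * c_prev + i * g     # update the cell state (the "conveyor belt")
    h_t = o * np.tanh(c_t)       # hidden state read out from the cell
    return h_t, c_t

# Illustrative sizes and random weights (assumptions, not from the post).
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(6, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state is updated additively (`f * c_prev + i * g`) and is not squashed through a nonlinearity at every step, which is the property usually credited with letting gradients survive over longer spans.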
In Python (with Theano-style tensors), a naive implementation looks like: